Task #12139

closed

Create GRCh38 tile assembly

Added by Abram Connelly over 6 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
High
Assigned To:
Target version:
-

Description

Create the appropriate assembly files for GRCh38.

This involves creating the following files:

  • assembly.00.grch38.fw.gz
  • assembly.00.grch38.fw.fwi
  • assembly.00.grch38.fw.gzi

Here, assembly.00.grch38.fw.gz is the compressed "fixed width" assembly file, and the other two are the index files used to access it.

This should involve mapping the tagset onto the GRCh38 sequences. We'll need to figure out what to do with alternative assembly regions in GRCh38. The focus should be on the main assembly regions.

Actions #1

Updated by Abram Connelly over 6 years ago

  • Status changed from New to In Progress
  • Assigned To set to Abram Connelly
  • Priority changed from Normal to High

This is a necessary step to be able to convert genomes that use GRCh38 as a reference.

Actions #2

Updated by Abram Connelly over 6 years ago

  • Status changed from In Progress to Closed
  • Remaining (hours) set to 0.0

There is now an hg38 tile assembly in the assembly collection in Keep.

The l7g repo has been updated with an hg38 tile liftover script to create the new assembly file.

Some notes:
  • There are empty tile paths in the hg38 tile assembly which might need special consideration when converting to cgf
  • Some tags that were unique in hg19 are now duplicated in places in hg38
  • Some tiles can be significantly longer than the original tiles in hg19

As a reminder, each record has the format "<tilestep> <end position>". The end position is 0-referenced and non-inclusive. It marks the end of the tile step, except for the last tile step in a tile path, where it marks the end of the tile path.

For example,

...
1536       2367984
1537       2368209
1538       2368410
>hg38:chr1:0001
0000       2368786
0001       2369403
0002       2369634
...

The fields are tab delimited and padded with spaces to make everything fixed width. The .gzi and .fwi index files are also provided for efficient random access into the file.
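
For anyone who wants to read the records directly, here is a minimal Python sketch of a parser for the layout described above. It is not the code used in the l7g repo. The function name `parse_tile_assembly` is hypothetical, the whole ">" header line is kept as an opaque key rather than split into parts, and the tile step is kept as a string because the base it is written in is not stated here. It also lists tile paths that have a header but no tile steps, which is one way the empty tile paths mentioned in the notes could be spotted.

```python
import io


def parse_tile_assembly(lines):
    """Group (tilestep, end_position) records under their '>' header line.

    Assumption: every record line holds exactly two whitespace-separated
    fields, and every record is preceded by a '>' header somewhere above it.
    """
    paths = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            current = line[1:]  # e.g. "hg38:chr1:0001", kept as-is
            paths[current] = []
            continue
        if current is None:
            raise ValueError("tile step record found before any '>' header")
        # split() with no argument handles both the tab and the space padding
        tilestep, end_pos = line.split()
        paths[current].append((tilestep, int(end_pos)))
    return paths


# Records copied from the example in this update.
SAMPLE = """\
>hg38:chr1:0001
0000       2368786
0001       2369403
0002       2369634
"""

paths = parse_tile_assembly(io.StringIO(SAMPLE))

# Tile paths with a header but no tile steps.
empty_paths = [name for name, steps in paths.items() if not steps]

for name, steps in paths.items():
    print(name, steps)
print("empty tile paths:", empty_paths)
```

The function accepts any iterable of text lines, so the same call should work on the real file opened in text mode, assuming the .gz file can be read as ordinary gzip (for example with `gzip.open(..., "rt")`). This reads the file sequentially and does not use the .gzi and .fwi indexes.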

I will mark this as done with the understanding that this might need to be updated in the future if any errors are discovered.
