Story #11673

Extend CGF to include genotyping

Added by Abram Connelly about 5 years ago. Updated about 3 years ago.

Extend the CGFv3 format to include genotyping information, not just whole-genome information.

We should be able to convert 23andMe and other genotyping data to a tiling/CGF representation.

The specification should be updated, and the cgft tool should be extended so that it can take in the new band information, store it as CGF, and convert it back to band information.

The main difference in storage at the CGF layer is that the interpretation of the "low quality" data will be inverted: instead of recording the positions and lengths of low-quality regions, it will record the positions and lengths of high-quality regions.
A flag needs to be set in the header to differentiate between the two storage interpretations.
Further, since the genotyping data is so small, we can "stuff" the original data into a data field in the CGF so that the original genotyping information can be recreated to a reasonable approximation.
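As a rough illustration of the inverted interpretation, the sketch below converts between "low-quality regions" and "high-quality regions" by taking the complement of a list of intervals within a span of known length. The 0-based `(start, length)` representation and the name `invert_intervals` are assumptions for illustration only, not the CGFv3 on-disk encoding. Because genotyping calls cover only a small fraction of positions, listing the called regions is expected to be more compact than listing the uncalled ones.

```python
from typing import List, Tuple

# Hypothetical representation: (start, length) pairs, 0-based, covering the
# half-open range [start, start + length) within a span of known length.
Interval = Tuple[int, int]


def invert_intervals(intervals: List[Interval], total_len: int) -> List[Interval]:
    """Return the complement of `intervals` within [0, total_len).

    Assumes `intervals` is sorted by start, non-overlapping, non-adjacent and
    within bounds. Under those assumptions the operation is its own inverse,
    so one function maps low-quality regions to high-quality regions and back.
    """
    out: List[Interval] = []
    pos = 0  # first position not yet accounted for
    for start, length in intervals:
        if start > pos:
            # The gap before this interval belongs to the complement.
            out.append((pos, start - pos))
        pos = max(pos, start + length)
    if pos < total_len:
        # Trailing gap after the last interval.
        out.append((pos, total_len - pos))
    return out


if __name__ == "__main__":
    # A 10-base span where only position 3 and positions 7-8 were genotyped.
    high_quality = [(3, 1), (7, 2)]
    low_quality = invert_intervals(high_quality, 10)
    print(low_quality)                        # [(0, 3), (4, 3), (9, 1)]
    print(invert_intervals(low_quality, 10))  # [(3, 1), (7, 2)]
```

Since the conversion is lossless under the stated assumptions, a reader only needs the header flag to know which of the two lists it is looking at.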


#1 Updated by Abram Connelly about 5 years ago

  • Story points changed from 2.0 to 1.0

Part of this has been done with the "noc-inv" function in the cgft program, which marks the file by adding the string "noc-inv" to the CGFVersion.

The CGFVersion is a string of fields delimited by commas (,).

Some documentation remains to be written so that this is stated explicitly in the CGFv3 specification. The cgft tool should also provide an explicit option to query whether a CGF file is "default" or "noc-inv", as well as an option to change only that field (adding it if it is not already present).
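A minimal Python sketch of the two requested cgft options, operating only on the comma-delimited CGFVersion string. The function names and the example version value are hypothetical; the real cgft flags and header layout are whatever the CGFv3 specification and tool define.

```python
from typing import List

NOC_INV = "noc-inv"  # field string named in this ticket


def version_fields(cgf_version: str) -> List[str]:
    """Split a comma-delimited CGFVersion string into its non-empty fields."""
    return [f.strip() for f in cgf_version.split(",") if f.strip()]


def storage_mode(cgf_version: str) -> str:
    """Report which low-quality interpretation the version string declares."""
    return NOC_INV if NOC_INV in version_fields(cgf_version) else "default"


def set_noc_inv(cgf_version: str, enable: bool = True) -> str:
    """Add or remove only the "noc-inv" field; other fields keep their order."""
    fields = version_fields(cgf_version)
    if enable:
        if NOC_INV not in fields:
            fields.append(NOC_INV)
    else:
        fields = [f for f in fields if f != NOC_INV]
    return ",".join(fields)


if __name__ == "__main__":
    v = "0.3.0"  # hypothetical version string, for illustration only
    print(storage_mode(v))     # default
    v = set_noc_inv(v)
    print(v, storage_mode(v))  # 0.3.0,noc-inv noc-inv
    print(set_noc_inv(v))      # 0.3.0,noc-inv (already present, unchanged)
```

Appending the field rather than rebuilding the whole string keeps any other fields in their original order, which matches the "change only that field" behaviour described above.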

#2 Updated by Abram Connelly about 5 years ago

Also, we should convert all the Harvard PGP genotyping information we have. There will probably be some overlap between the Harvard PGP and openSNP data (and other sources, like OpenHumans, if we import them), so we should consider how to handle that gracefully.

#3 Updated by Abram Connelly about 5 years ago

  • Target version set to Lightning Sprint (2017-05-15 to 2017-05-29)

#4 Updated by Abram Connelly almost 5 years ago

  • Status changed from New to In Progress

#5 Updated by Jiayong Li about 3 years ago

  • Status changed from In Progress to New
  • Target version changed from Lightning Sprint (2017-05-15 to 2017-05-29) to Tiling 1.1
