Idea #11673
open
Extend CGF to include genotyping
Added by Abram Connelly over 7 years ago.
Updated over 5 years ago.
Description
Extend the CGFv3 format to include genotyping information, not just whole genome information.
We should be able to convert both 23andMe and Ancestry.com data to a tiling/CGF representation.
The specification and the cgft tool should both be updated so that cgft can take in the new band information, store it as CGF, and convert it back to band information.
The main difference in storage at the CGF layer is that the "low quality" data will be inverted: the stored positions and lengths will describe the high quality regions instead of the low quality regions.
A flag needs to be set in the header in order to differentiate between the two storage interpretations.
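To illustrate the inversion, here is a minimal sketch in Python. It assumes regions are held as (start, length) pairs over a range of known total length; the function name `invert_intervals` and this in-memory layout are hypothetical and do not reflect the actual CGF byte encoding or the cgft implementation.

```python
def invert_intervals(intervals, total_length):
    """Return the complement of (start, length) intervals within [0, total_length).

    Hypothetical helper for illustration only: given the low quality
    regions it yields the high quality regions, and vice versa.
    """
    inverted = []
    cursor = 0  # first position not yet covered by an input interval
    for start, length in sorted(intervals):
        if start > cursor:
            # The gap before this interval belongs to the complement.
            inverted.append((cursor, start - cursor))
        cursor = max(cursor, start + length)
    if cursor < total_length:
        # Trailing gap after the last input interval.
        inverted.append((cursor, total_length - cursor))
    return inverted


low_quality = [(2, 3), (8, 1)]
high_quality = invert_intervals(low_quality, 10)
```

Applying the same function to the result gives back the original intervals, which is why a header flag is enough to tell the two interpretations apart.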
Further, since the genotyping data is so small, we can "stuff" the original data into a data field in the CGF so that we can recreate, up to a reasonable approximation, the original genotyping information.
- Story points changed from 2.0 to 1.0
Part of this has been done with the "noc-inv" function in the cgft program. This is added to the CGFVersion as the string "noc-inv".
The CGFVersion is a string with each field comma (,) delimited.
There's some documentation left to do in the CGFv3 specification to make sure this is explicitly stated there. The cgft tool should also provide an explicit option to query whether the CGF file is "default" or "noc-inv", as well as an option to change only that field (adding "noc-inv" if it is not already present).
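A rough sketch of what the query and update could look like, written in Python rather than as cgft code. It assumes only what is stated above: CGFVersion is a comma-delimited string and "noc-inv" appears as one of its fields. The function names and the example value "0.3.0" are hypothetical placeholders, not the real cgft options or a real CGFVersion value.

```python
NOC_INV = "noc-inv"


def version_fields(cgf_version):
    """Split a comma-delimited CGFVersion string into its non-empty fields."""
    return [field.strip() for field in cgf_version.split(",") if field.strip()]


def storage_mode(cgf_version):
    """Report "noc-inv" if that field is present, otherwise "default"."""
    return NOC_INV if NOC_INV in version_fields(cgf_version) else "default"


def with_noc_inv(cgf_version):
    """Return the version string with "noc-inv" appended if it is missing."""
    fields = version_fields(cgf_version)
    if NOC_INV not in fields:
        fields.append(NOC_INV)
    return ",".join(fields)


# "0.3.0" is a made-up version field used only for this example.
mode_before = storage_mode("0.3.0")
updated = with_noc_inv("0.3.0")
mode_after = storage_mode(updated)
```

Matching on whole fields rather than on a substring avoids a false positive if some other field ever contains the text "noc-inv".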
Also, we should convert all the Harvard PGP genotyping information we have. There will probably be some overlap between the Harvard PGP and openSNP (and others if we import them, like OpenHumans), so we should consider how to handle that gracefully.
- Target version set to Lightning Sprint (2017-05-15 to 2017-05-29)
- Status changed from New to In Progress
- Status changed from In Progress to New
- Target version changed from Lightning Sprint (2017-05-15 to 2017-05-29) to Tiling 1.1