Task #12111
closed
Tool to identify which mitochondrial reference an input genome file (gVCF, GFF, 23andMe, etc.) was referenced against
Added by Abram Connelly over 6 years ago.
Updated over 6 years ago.
Description
There are two different mitochondrial reference files that we've seen when trying to convert build37/hg19 referenced genomes. To help automate the choice of which reference mitochondrial DNA to use, we should build a tool to help with this.
I'm imagining a tool that will the mitochondrial portion of a variant file and report back which is the most closely matched reference mitochondrial DNA.
A rough initial prototype is available at which-ref.
The tool needs to be cleaned up in terms of usage explanation, error handling and reporting but a basic usage is as follows:
./which-ref \
<( samtools faidx /data-sdd/data/ref/human_g1k_v37.fa.gz MT | egrep -v '^>' | tr '[:upper:]' '[:lower:]' | tr 'm' 'n' | tr -d '\n' ) \
<( samtools faidx /data-sdd/data/ref/hg38.fa.gz chrM | egrep -v '^>' | tr '[:upper:]' '[:lower:]' | tr 'm' 'n' | tr -d '\n' ) \
<( samtools faidx /data-sdd/data/ref/hg19.fa.gz chrM | egrep -v '^>' | tr '[:upper:]' '[:lower:]' | tr 'm' 'n' | tr -d '\n' ) \
<( tabix /data-sdd/data/pgp-gff/hu826751-GS03052-DNA_B01.gff.gz chrM | pasta -a gff-rotini -r <( refstream chrM ) | pasta -a rotini-alt0 | tr -d '\n' )
This reports (on lightning-dev1
):
...
min_score: 63
min_idx:2
name:/dev/fd/61
TODO:
- print usage and allow for other command line options
- option for all matches
- warn when there are identical input reference sequences or when there are identical scores
- allow for max score bailout
- check for errors
- Status changed from New to In Progress
- Status changed from In Progress to Closed
- Remaining (hours) set to 0.0
Also available in: Atom
PDF