Project

General

Profile

Actions

Task #12111

closed

Tool to identify which mitochondrial reference an input genome file (gVCF, GFF, 23andMe, etc.) was referenced against

Added by Abram Connelly over 6 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
Normal
Assigned To:
Target version:
-

Description

There are two different mitochondrial reference files that we've seen when trying to convert build37/hg19 referenced genomes. To help automate the choice of which reference mitochondrial DNA to use, we should build a tool to help with this.

I'm imagining a tool that will the mitochondrial portion of a variant file and report back which is the most closely matched reference mitochondrial DNA.

Actions #1

Updated by Abram Connelly over 6 years ago

A rough initial prototype is available at which-ref.

The tool needs to be cleaned up in terms of usage explanation, error handling and reporting but a basic usage is as follows:

./which-ref \
 <( samtools faidx /data-sdd/data/ref/human_g1k_v37.fa.gz MT | egrep -v '^>' | tr '[:upper:]' '[:lower:]' | tr 'm' 'n' | tr -d '\n' ) \
 <( samtools faidx /data-sdd/data/ref/hg38.fa.gz chrM | egrep -v '^>' | tr '[:upper:]' '[:lower:]' | tr 'm' 'n' | tr -d '\n' ) \
 <( samtools faidx /data-sdd/data/ref/hg19.fa.gz chrM | egrep -v '^>' | tr '[:upper:]' '[:lower:]' | tr 'm' 'n' | tr -d '\n' ) \
 <( tabix /data-sdd/data/pgp-gff/hu826751-GS03052-DNA_B01.gff.gz chrM | pasta -a gff-rotini -r <( refstream chrM ) | pasta -a rotini-alt0 | tr -d '\n' )

This reports (on lightning-dev1):

...
min_score: 63
min_idx:2
name:/dev/fd/61
TODO:
  • print usage and allow for other command line options
  • option for all matches
  • warn when there are identical input reference sequences or when there are identical scores
  • allow for max score bailout
  • check for errors
Actions #2

Updated by Abram Connelly over 6 years ago

  • Status changed from New to In Progress
Actions #3

Updated by Abram Connelly over 6 years ago

  • Status changed from In Progress to Closed
  • Remaining (hours) set to 0.0

A rudimentary version is on github.com/curoverse/l7g/tools/which-ref. Better usage reporting, error reporting and command line processing should be added but the tool is functional now.

Actions

Also available in: Atom PDF