Bug #12245

pasta rotini-fastj does not handle tagset without ending newline

Added by Abram Connelly almost 2 years ago. Updated 3 months ago.

Status: New
Priority: Normal
Assigned To: -
Target version:
Start date: 09/13/2017
Due date:
% Done: 0%
Estimated time:
Story points: -

Description

The pasta rotini-fastj conversion has a bug: if the tagset provided does not end with a trailing newline, the last tag is ignored.

For example, the following should work but doesn't:

pasta -action rotini-fastj \
  -start 0 \
  -tilepath 0000 \
  -chrom chr1 \
  -build hg19 \
  -i stage/hu034DB1-GS00253-DNA_A02/0000.pa \
  -assembly <( l7g assembly assembly.00.hg19.fw.gz 0000 ) \
  -tag <( samtools faidx tagset.fa.gz 0000.00 | egrep -v '^>' | tr -d '\n' | fold -w 24 ) > stage/hu034DB1-GS00253-DNA_A02/0000.fj

The culprit is the -tag pipeline, whose output does not end with a newline:

samtools faidx tagset.fa.gz 0000.00 | egrep -v '^>' | tr -d '\n' | fold -w 24
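
The missing newline can be checked without the real tagset. This is a minimal sketch, assuming a made-up two-line FASTA record in place of the samtools faidx output (the fake_faidx and tags helpers and the bases are hypothetical). It counts the newlines in the stream and prints the hex value of its final byte, where 0a would indicate a trailing newline.

```shell
# Hypothetical FASTA record standing in for `samtools faidx tagset.fa.gz 0000.00`.
fake_faidx() {
  printf '>0000.00\n'
  printf 'ACGTACGTACGTACGTACGTACGT\n'
  printf 'TTTTGGGGCCCCAAAATTTTGGGG\n'
}

# Same filter chain as the -tag argument in the report.
tags() {
  fake_faidx | egrep -v '^>' | tr -d '\n' | fold -w 24
}

# Count newline characters and total bytes in the stream.
newlines=$(tags | wc -l | tr -d ' ')
bytes=$(tags | wc -c | tr -d ' ')

# Hex value of the final byte; 0a would mean the stream ends in a newline.
last_byte=$(tags | tail -c 1 | od -An -tx1 | tr -d ' \n')

echo "newlines=$newlines bytes=$bytes last_byte=$last_byte"
```

With two 24-base tags, the stream holds only the one newline that fold inserts between them, so the second tag is not newline-terminated.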

As a workaround, adding an extra newline will correct the issue:

pasta -action rotini-fastj \
  -start 0 \
  -tilepath 0000 \
  -chrom chr1 \
  -build hg19 \
  -i stage/hu034DB1-GS00253-DNA_A02/0000.pa \
  -assembly <( l7g assembly assembly.00.hg19.fw.gz 0000 ) \
  -tag <( cat <( samtools faidx tagset.fa.gz 0000.00 | egrep -v '^>' | tr -d '\n' | fold -w 24 ) <( echo "" )  ) > stage/hu034DB1-GS00253-DNA_A02/0000.fj
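
The effect of the workaround on the byte stream can be checked in isolation. This is a sketch, assuming a made-up 48-base sequence (the raw_tags and fixed_tags helpers are hypothetical). The `{ ...; echo ""; }` grouping used here is a POSIX-portable stand-in for the `cat <( ... ) <( echo "" )` form, not the form used in the command above.

```shell
# Hypothetical 48-base tag stream, folded to 24 columns, with no trailing newline.
raw_tags() {
  printf 'ACGTACGTACGTACGTACGTACGTTTTTGGGGCCCCAAAATTTTGGGG' | fold -w 24
}

# Workaround: append a single newline after the stream.
fixed_tags() {
  { raw_tags; echo ""; }
}

# Newline counts before and after the workaround.
raw_lines=$(raw_tags | wc -l | tr -d ' ')
fixed_lines=$(fixed_tags | wc -l | tr -d ' ')

# Hex value of the final byte of the fixed stream; 0a is a newline.
fixed_last=$(fixed_tags | tail -c 1 | od -An -tx1 | tr -d ' \n')

echo "raw_lines=$raw_lines fixed_lines=$fixed_lines fixed_last=$fixed_last"
```

After the workaround, every tag in the stream, including the last one, is terminated by a newline.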

The place to look is the readTag function in pasta_fastj.go, along with the places where the g.TagFinished flag is referenced; the details of what goes wrong still need to be investigated.

This bug specifically happens when there is a variant on the next-to-last tag in the stream being converted, as is the case for data set hu034DB1-GS00253-DNA_A02 in tilepath 0000.

History

#1 Updated by Sarah Zaranek over 1 year ago

Abram, has this been handled?

#2 Updated by Abram Connelly about 1 year ago

Note that this happens with pasta version 0.2.6 and lower.

#3 Updated by Sarah Zaranek 3 months ago

The CWL pipeline currently uses pasta 0.2.6. We need to ensure that this fix is in all relevant code.

#4 Updated by Jiayong Li 3 months ago

In convert2fastj in the gvcf source code, there is the following line:

    $pasta -action rotini-fastj -start $tilepath_start0 -tilepath $tilepath -chrom $chrom -build $ref \
      -i $odir/$tilepath.pa \
      -assembly <( $tile_assembly tilepath $afn $tilepath ) \
      -tag <( cat <( samtools faidx $tagdir $tilepath.00 | egrep -v '^>' | tr -d '\n' | fold -w 24 ) <( echo )  ) > $odir/$tilepath.fj

We need to verify that 'echo' is functionally the same as 'echo ""'.
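
One way to run that check is to compare the bytes each form writes. This is a minimal sketch using only standard tools; the variable names are illustrative. It confirms the result only for the shell it is run in, so it should be run under the same shell that executes convert2fastj.

```shell
# Capture the bytes each command writes, as hex.
plain=$(echo | od -An -tx1 | tr -d ' \n')
quoted=$(echo "" | od -An -tx1 | tr -d ' \n')

# Compare the two byte strings.
if [ "$plain" = "$quoted" ]; then
  verdict="identical"
else
  verdict="different"
fi

echo "echo -> $plain, echo \"\" -> $quoted ($verdict)"
```

In bash and POSIX sh, both forms write a single newline byte (0a), so either one supplies the trailing newline the workaround relies on.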

#5 Updated by Jiayong Li 3 months ago

  • Target version set to Tiling 1.0

#6 Updated by Forrest Hartley 3 months ago

Pasta 0.2.8 can be found in the l7g abramVersion branch. This needs to be tested thoroughly.
