Bug #12245

open

pasta rotini-fastj does not handle tagset without ending newline

Added by Abram Connelly over 6 years ago. Updated almost 5 years ago.

Status:
New
Priority:
Normal
Assigned To:
-
Target version:
Story points:
-

Description

There is a bug in the pasta rotini-fastj conversion: if the tagset provided does not end with a trailing newline, the last tag is ignored.

For example, the following should work but doesn't:

pasta -action rotini-fastj \
  -start 0 \
  -tilepath 0000 \
  -chrom chr1 \
  -build hg19 \
  -i stage/hu034DB1-GS00253-DNA_A02/0000.pa \
  -assembly <( l7g assembly assembly.00.hg19.fw.gz 0000 ) \
  -tag <( samtools faidx tagset.fa.gz 0000.00 | egrep -v '^>' | tr -d '\n' | fold -w 24 ) > stage/hu034DB1-GS00253-DNA_A02/0000.fj

The culprit is the tagset pipeline, whose output does not end with a newline:

samtools faidx tagset.fa.gz 0000.00 | egrep -v '^>' | tr -d '\n' | fold -w 24

As a workaround, adding an extra newline will correct the issue:

pasta -action rotini-fastj \
  -start 0 \
  -tilepath 0000 \
  -chrom chr1 \
  -build hg19 \
  -i stage/hu034DB1-GS00253-DNA_A02/0000.pa \
  -assembly <( l7g assembly assembly.00.hg19.fw.gz 0000 ) \
  -tag <( cat <( samtools faidx tagset.fa.gz 0000.00 | egrep -v '^>' | tr -d '\n' | fold -w 24 ) <( echo "" )  ) > stage/hu034DB1-GS00253-DNA_A02/0000.fj
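
The difference between the failing and the working invocation comes down to the last byte of the -tag stream. The following is a minimal sketch of that, not taken from the pasta sources: fake_tags is a hypothetical stand-in for the samtools/egrep output (two made-up 24-base tags), and folded and last_byte are helper names introduced only for this illustration. It assumes a fold that behaves like GNU coreutils, i.e. one that does not append a newline at end of input.

```shell
#!/bin/sh
# Hypothetical stand-in for `samtools faidx ... | egrep -v '^>'`:
# two made-up 24-base tags, one per line.
fake_tags() {
  printf 'ACGTACGTACGTACGTACGTACGT\nTTTTCCCCGGGGAAAATTTTCCCC\n'
}

# Same reshaping as in the report: join all lines, then re-wrap at 24 columns.
folded() {
  fake_tags | tr -d '\n' | fold -w 24
}

# Print the last byte of stdin as two hex digits (0a means newline).
last_byte() {
  tail -c 1 | od -An -tx1 | tr -d ' \n'
}

# Without the workaround the stream ends on a base, not on a newline.
printf 'without extra newline: %s\n' "$(folded | last_byte)"

# With the workaround (an extra echo) the stream ends on a newline.
printf 'with extra newline:    %s\n' "$({ folded; echo; } | last_byte)"
```

The `{ folded; echo; }` grouping produces the same bytes as the `cat <( ... ) <( echo "" )` form above, so it could be used inside a single `<( ... )` if the nested process substitution is considered hard to read.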

The place to look is the readTag function in pasta_fastj.go, and wherever the g.TagFinished flag is referenced, but the details of what is wrong still need to be investigated.

This bug specifically happens when there is a variant on the next-to-last tag in the stream being converted, as is the case for data set hu034DB1-GS00253-DNA_A02 in tilepath 0000.

Actions #1

Updated by Sarah Zaranek about 6 years ago

Abram, has this been handled?

Actions #2

Updated by Abram Connelly almost 6 years ago

Note this happens with pasta version 0.2.6 and lower.

Actions #3

Updated by Sarah Zaranek almost 5 years ago

Currently the CWL pipeline is using 0.2.6. We need to ensure that this fix is in all relevant code.

Actions #4

Updated by Jiayong Li almost 5 years ago

In convert2fastj in the gvcf source code, there is the line:

    $pasta -action rotini-fastj -start $tilepath_start0 -tilepath $tilepath -chrom $chrom -build $ref \
      -i $odir/$tilepath.pa \
      -assembly <( $tile_assembly tilepath $afn $tilepath ) \
      -tag <( cat <( samtools faidx $tagdir $tilepath.00 | egrep -v '^>' | tr -d '\n' | fold -w 24 ) <( echo )  ) > $odir/$tilepath.fj

We need to verify 'echo' is functionally the same as 'echo ""'.
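
One way to check this is to compare the raw bytes each form writes. This is a small sketch, assuming a bash or POSIX sh builtin echo; the variable names are made up for the illustration.

```shell
#!/bin/sh
# Capture the bytes written by each form as hex (0a is a newline).
bytes_plain=$(echo | od -An -tx1 | tr -d ' \n')
bytes_empty=$(echo "" | od -An -tx1 | tr -d ' \n')

printf 'echo    -> %s\n' "$bytes_plain"
printf 'echo "" -> %s\n' "$bytes_empty"

# Report whether the two forms are byte-for-byte identical.
if [ "$bytes_plain" = "$bytes_empty" ]; then
  echo "identical output"
else
  echo "outputs differ"
fi
```

If both lines show a single 0a byte, the two forms are interchangeable for the purpose of terminating the tag stream.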

Actions #5

Updated by Jiayong Li almost 5 years ago

  • Target version set to Tiling 1.0

Actions #6

Updated by Anonymous almost 5 years ago

Pasta 0.2.8 can be found in the l7g abramVersion branch. This needs to be tested thoroughly.
