Bug #12245

open

pasta rotini-fastj does not handle tagset without ending newline

Added by Abram Connelly over 6 years ago. Updated about 5 years ago.

Status: New
Priority: Normal
Assigned To: -
Target version:
Story points: -

Description

There is a bug in the pasta rotini-fastj conversion: if the tagset provided does not end with a trailing newline, the last tag is ignored.

For example, the following should work but doesn't:

pasta -action rotini-fastj \
  -start 0 \
  -tilepath 0000 \
  -chrom chr1 \
  -build hg19 \
  -i stage/hu034DB1-GS00253-DNA_A02/0000.pa \
  -assembly <( l7g assembly assembly.00.hg19.fw.gz 0000 ) \
  -tag <( samtools faidx tagset.fa.gz 0000.00 | egrep -v '^>' | tr -d '\n' | fold -w 24 ) > stage/hu034DB1-GS00253-DNA_A02/0000.fj

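The culprit is the tag pipeline, whose output does not end with a newline, since tr -d '\n' strips every newline and fold only inserts breaks between the 24-character lines:

samtools faidx tagset.fa.gz 0000.00 | egrep -v '^>' | tr -d '\n' | fold -w 24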

As a workaround, appending an extra newline to the tag stream corrects the issue:

pasta -action rotini-fastj \
  -start 0 \
  -tilepath 0000 \
  -chrom chr1 \
  -build hg19 \
  -i stage/hu034DB1-GS00253-DNA_A02/0000.pa \
  -assembly <( l7g assembly assembly.00.hg19.fw.gz 0000 ) \
  -tag <( cat <( samtools faidx tagset.fa.gz 0000.00 | egrep -v '^>' | tr -d '\n' | fold -w 24 ) <( echo "" )  ) > stage/hu034DB1-GS00253-DNA_A02/0000.fj

The place to look is the readTag function in pasta_fastj.go, along with the places where the g.TagFinished flag is referenced; the details of what is wrong still need to be investigated.

This bug specifically happens when there is a variant on the next-to-last tag in the stream being converted, as is the case for dataset hu034DB1-GS00253-DNA_A02 in tilepath 0000.
