Task #13234

closed

FastJ checks for FastJ files

Added by Abram Connelly about 6 years ago. Updated about 6 years ago.

Status: Resolved
Priority: Normal
Assigned To: Keldin Sergheyev
Target version: -

Description

Perform data integrity checks on FastJ files to make sure they are internally consistent.

Here are some basic checks that can be done:

  • Check that the reported "n" equals the number of bases in the tile
  • Check that the reported number of no-call bases matches the no-calls that appear in the sequence
  • Check that the reported tagsets are consistent with the tile sequence
  • Check that the reported hashes are consistent with the tile sequence
  • Check that the reported beginning and end tiles are at the beginning and end of the tilepath
  • Check that the reported TileIDs are consistent
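The last two checks in the list (beginning/end tiles and TileID consistency) can be sketched as follows. This is a minimal sketch, not the fjt.cpp implementation: it assumes tileIDs use the dotted-hex layout path.version.step.variant (e.g. 035e.00.0000.000), and it assumes the file covers one whole tilepath, so the largest step seen in the file stands in for the external "number of tile steps" information. TileId, TileHeader, parse_tileid and check_tileids are hypothetical names, and multi-step tiles (seedTileLength > 1) are ignored.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical parsed tileID, assuming the layout "path.version.step.variant".
struct TileId { unsigned path, version, step, variant; };

// Hypothetical subset of a tile header: only what these checks need.
struct TileHeader {
  std::string tileID;
  bool startTile;
  bool endTile;
};

// Parses four dot-separated hex fields; rejects trailing characters.
bool parse_tileid(const std::string& s, TileId& out) {
  int consumed = 0;
  int fields = std::sscanf(s.c_str(), "%x.%x.%x.%x%n", &out.path, &out.version,
                           &out.step, &out.variant, &consumed);
  return fields == 4 && consumed == static_cast<int>(s.size());
}

// Checks that every tileID parses, that all tiles share one tilePath, that
// startTile is set exactly on step-0 tiles, and that endTile only appears
// on the last step seen in the file.
bool check_tileids(const std::vector<TileHeader>& tiles) {
  if (tiles.empty()) return true;  // avoids indexing element size - 1 of an empty list
  std::vector<TileId> ids(tiles.size());
  unsigned maxStep = 0;
  for (std::size_t i = 0; i < tiles.size(); ++i) {
    if (!parse_tileid(tiles[i].tileID, ids[i])) return false;
    if (ids[i].path != ids[0].path) return false;  // all tilePaths must match
    if (ids[i].step > maxStep) maxStep = ids[i].step;
  }
  for (std::size_t i = 0; i < tiles.size(); ++i) {
    if (tiles[i].startTile != (ids[i].step == 0)) return false;
    if (tiles[i].endTile && ids[i].step != maxStep) return false;
  }
  return true;
}
```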

There should be levels of checks: some that need external information, such as the reference sequence, the tagset for the tilepath, or the number of tile steps in the tilepath, and others that only need the FastJ information.

The tool should accept a command-line option specifying which level of checks to run.


Subtasks: 2 (0 open, 2 closed)

Task #13268: Correct python script and rewrite in c++ | Resolved | Keldin Sergheyev | 04/05/2018 | 03/30/2018
Task #13379: Review 13234 | Resolved | Abram Connelly | 04/24/2018
Actions #1

Updated by Keldin Sergheyev about 6 years ago

  • Assigned To set to Keldin Sergheyev
Actions #2

Updated by Keldin Sergheyev about 6 years ago

  • Status changed from New to In Progress
  • Parent task set to #13205
Actions #3

Updated by Keldin Sergheyev about 6 years ago

Started with a copy of Abram's fjt.cpp, located in l7g/tools/fjt.
  • Added new fields for n and the nocall count to the JSON header.
  • Added command line option -T for Tests.
  • Added function fastj_check that, for now, checks n and nocall, but will eventually contain all the checks.
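The n and nocall part of fastj_check amounts to two counts over the tile sequence. A minimal sketch, assuming the sequence has already been joined into one string without line breaks and that no-calls are written as 'n' (or 'N'). TileCounts, its field names and check_n_and_nocall are hypothetical, not the actual fjt.cpp names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical subset of a FastJ tile: only the fields this check needs.
struct TileCounts {
  std::size_t n;            // reported "n" from the JSON header
  std::size_t nocallCount;  // reported no-call count from the JSON header
  std::string seq;          // tile sequence with line breaks removed
};

// True when "n" equals the sequence length and the reported no-call count
// equals the number of 'n'/'N' bases actually present in the sequence.
bool check_n_and_nocall(const TileCounts& t) {
  std::size_t nocalls = 0;
  for (char c : t.seq) {
    if (c == 'n' || c == 'N') ++nocalls;
  }
  return t.n == t.seq.size() && t.nocallCount == nocalls;
}
```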
Actions #4

Updated by Keldin Sergheyev about 6 years ago

  • Working directly on fjt.cpp in branch 13234-fastjchecks of the l7g repo (as one is supposed to do) rather than editing locally.
  • Started startTag check.
  • Added a modified 035e.fj to testdata in the repo that tests the startTag check
  • Need to implement endTag check
  • Consider replacing beginTag with startTag to match precedent
  • Need to move the test data tailored to trigger each error message from the local copy to l7g
  • Need to finish creating tests for the start and end tags, including tags that contain nocalls
  • Need to take out debugging messages
  • Need to add a verbose option for error reporting (line number of the error)
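The startTag/endTag checks compare header tags against the ends of the sequence. A minimal sketch under stated assumptions: comparison is case-insensitive, a nocall ('n') in the sequence is treated as matching any tag base, and an empty tag (assumed here for a start tile's startTag or an end tile's endTag) is skipped. tag_matches and check_tags are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Compares tag against seq starting at pos. A no-call in the sequence is
// treated as a wildcard (an assumption of this sketch).
bool tag_matches(const std::string& tag, const std::string& seq, std::size_t pos) {
  if (pos + tag.size() > seq.size()) return false;
  for (std::size_t i = 0; i < tag.size(); ++i) {
    char s = static_cast<char>(std::tolower(static_cast<unsigned char>(seq[pos + i])));
    char t = static_cast<char>(std::tolower(static_cast<unsigned char>(tag[i])));
    if (s != 'n' && s != t) return false;
  }
  return true;
}

// startTag must be a prefix and endTag a suffix of the sequence;
// empty tags are not checked.
bool check_tags(const std::string& startTag, const std::string& endTag,
                const std::string& seq) {
  if (!startTag.empty() && !tag_matches(startTag, seq, 0)) return false;
  if (!endTag.empty()) {
    if (endTag.size() > seq.size()) return false;
    if (!tag_matches(endTag, seq, seq.size() - endTag.size())) return false;
  }
  return true;
}
```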
Actions #5

Updated by Keldin Sergheyev about 6 years ago

  • Changed beginTag to startTag, to match precedent.
Actions #6

Updated by Keldin Sergheyev about 6 years ago

  • endTag check added
  • 035e.broken_endTag.fj added to testdata that correctly triggers endTag check error
  • Hash consistency check implemented. The md5sum (from the header) is called seqHash
  • 035e.broken_md5sum.fj added to testdata that correctly triggers hash check error
  • startTile and endTile check added
  • tileID consistency check
  • Eliminate risk of trying to access -1 index of array (size - 1 when size is 0)
  • Do a valgrind check
  • Need to create more test FastJ files
  • Add another flag for the startTile, endTile, and tileID consistency checks (run them only if that flag is specified, not whenever -T is specified).
    • Add the extra options to the shell help message
  • Check that all tilePaths are the same
  • Update the shell script to run the new tests.
    • Make sure it fails on known bad fastj files and succeeds on known good fastj files.
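The hash consistency check needs an MD5 digest of the tile sequence. The sketch below carries a small self-contained MD5 (RFC 1321) so it has no library dependency; it assumes the header's md5sum (seqHash) is the lowercase hex MD5 of the sequence with line breaks removed, and that input convention is an assumption rather than something this ticket states. md5_hex and check_hash are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <initializer_list>
#include <string>
#include <vector>

// Minimal MD5 (RFC 1321), returned as 32 lowercase hex characters.
std::string md5_hex(const std::string& msg) {
  static const int s[64] = {7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
                            5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20,
                            4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
                            6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21};
  std::uint32_t K[64];  // K[i] = floor(2^32 * |sin(i + 1)|)
  for (int i = 0; i < 64; ++i)
    K[i] = static_cast<std::uint32_t>(std::floor(std::fabs(std::sin(i + 1.0)) * 4294967296.0));

  // Padding: 0x80, zeros up to 56 mod 64, then the bit length (64-bit little-endian).
  std::vector<std::uint8_t> data(msg.begin(), msg.end());
  std::uint64_t bitLen = static_cast<std::uint64_t>(msg.size()) * 8;
  data.push_back(0x80);
  while (data.size() % 64 != 56) data.push_back(0);
  for (int i = 0; i < 8; ++i) data.push_back(static_cast<std::uint8_t>(bitLen >> (8 * i)));

  std::uint32_t a0 = 0x67452301, b0 = 0xefcdab89, c0 = 0x98badcfe, d0 = 0x10325476;
  for (std::size_t off = 0; off < data.size(); off += 64) {
    std::uint32_t M[16];  // 64-byte chunk as 16 little-endian words
    for (int j = 0; j < 16; ++j) {
      const std::uint8_t* p = &data[off + 4 * j];
      M[j] = std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
             (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
    }
    std::uint32_t A = a0, B = b0, C = c0, D = d0;
    for (int i = 0; i < 64; ++i) {
      std::uint32_t F;
      int g;
      if (i < 16)      { F = (B & C) | (~B & D); g = i; }
      else if (i < 32) { F = (D & B) | (~D & C); g = (5 * i + 1) % 16; }
      else if (i < 48) { F = B ^ C ^ D;          g = (3 * i + 5) % 16; }
      else             { F = C ^ (B | ~D);       g = (7 * i) % 16; }
      F = F + A + K[i] + M[g];
      A = D; D = C; C = B;
      B = B + ((F << s[i]) | (F >> (32 - s[i])));  // left rotate
    }
    a0 += A; b0 += B; c0 += C; d0 += D;
  }

  // Digest bytes are a0..d0, each emitted little-endian.
  std::string hex;
  char buf[3];
  for (std::uint32_t w : {a0, b0, c0, d0})
    for (int i = 0; i < 4; ++i) {
      std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>((w >> (8 * i)) & 0xffu));
      hex += buf;
    }
  return hex;
}

// Compares the reported hash with the MD5 of the (newline-free) sequence.
bool check_hash(const std::string& reportedMd5, const std::string& seq) {
  return md5_hex(seq) == reportedMd5;
}
```

In fjt.cpp itself an existing MD5 routine or library could serve the same purpose; the inline version only keeps the sketch dependency-free.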
Actions #7

Updated by Keldin Sergheyev about 6 years ago

  • Added -t flag (test tileID consistency)
  • Added -t to shell help message
  • Not sure if FJT_ACTION enum needs to be edited
Actions #8

Updated by Keldin Sergheyev about 6 years ago

  • Use -T for the regular fastj checks; add the -t flag to also check tileID consistency
  • tileID consistency check split off from function fastj_check and put in its own function fastj_check_tileid
  • Edited error messages. They now go to stderr.
  • Edited success messages. They are only shown if the verbose flag is specified.
  • Housekeeping: exit(-1) added
  • Added the check_tileid_option flag, so no extra FJT_ACTION entry is needed.
  • Added a conditional that checks whether fj_tile has size 0, to prevent an out-of-bounds access
  • Ran valgrind. No problems.
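The option split above (-T for the regular checks, -t adding the tileID check, tracked by a check_tileid_option flag instead of a new FJT_ACTION entry) could be modeled as below. This is an illustrative sketch only: CheckOptions, parse_check_options and should_check_tileid are hypothetical, the -v letter for verbose output is an assumption, and fjt.cpp's real option handling (which may use getopt) is not reproduced here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical option state mirroring the ticket's flags.
struct CheckOptions {
  bool run_checks = false;           // -T: regular FastJ checks
  bool check_tileid_option = false;  // -t: also check tileID consistency
  bool verbose = false;              // -v: assumed letter for the verbose flag
};

// Parses single-letter flags, including bundled forms such as "-Tt".
// Arguments not starting with '-' (e.g. the FastJ file) are left to the caller.
// Returns false on an unknown flag.
bool parse_check_options(const std::vector<std::string>& args, CheckOptions& opt) {
  for (const std::string& a : args) {
    if (a.size() < 2 || a[0] != '-') continue;
    for (std::size_t i = 1; i < a.size(); ++i) {
      switch (a[i]) {
        case 'T': opt.run_checks = true; break;
        case 't': opt.check_tileid_option = true; break;
        case 'v': opt.verbose = true; break;
        default: return false;
      }
    }
  }
  return true;
}

// Assumption: -t only extends -T, so the tileID check runs when both are given.
bool should_check_tileid(const CheckOptions& opt) {
  return opt.run_checks && opt.check_tileid_option;
}
```

With this design, "fjt -T" runs only the regular checks, while "fjt -T -t" adds the tileID consistency check without touching the action enum.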
Actions #9

Updated by Keldin Sergheyev about 6 years ago

  • Status changed from In Progress to Resolved
  • Added all tests to fjt_test.sh, including checks on known healthy fastj files to confirm that the tool produces NO errors on them.
Comments upon declaring ticket "ready for review":
  • fjt -T and fjt -T -t perform all the fastj checks specified in the ticket.
  • fjt_test.sh has been updated so that future work on fjt.cpp can be tested for introducing bugs.
  • fastj checks do not require any reference sequence information (unlike what the ticket description anticipated)
  • Future work:
    • Add verbose option, so that running fastj checks can give you the line number that triggered the error.
  • Next steps:
    • Run on larger fastj files (larger than 035e.fj)
    • Create a pipeline in Arvados
Actions #10

Updated by Keldin Sergheyev about 6 years ago

  • Status changed from Resolved to In Progress
Actions #11

Updated by Abram Connelly about 6 years ago

Review comments:

  • Change the compilation option in the Makefile to -O3 instead of debug (-g)
  • Remove trailing whitespace (fjt_test.sh, fjt.cpp)

Otherwise, it looks good to me.

Actions #12

Updated by Keldin Sergheyev about 6 years ago

  • Changed compilation option to -O3 in Makefile
  • Used a search for \s\n to find and remove trailing whitespace in fjt.cpp and fjt_test.sh
Actions #13

Updated by Keldin Sergheyev about 6 years ago

  • Status changed from In Progress to Resolved