Revision 1 (Abram Connelly, 11/15/2016 04:41 PM) → Revision 2/3 (Abram Connelly, 11/15/2016 04:42 PM)
h1. cgb

@cgb@ is a tool for accessing files in the binary compact genome format (CGF). The tool is still in the prototyping stage.

Code for @cgb@ can be found on "github.com/abeconnelly/cgf":https://github.com/abeconnelly/cgf.

h2. Quick start

<pre>
$ git clone https://github.com/abeconnelly/cgf
$ cd cgf/cpp
$ ./cmp.sh
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -s 0 -B -k -p 862
[ 79 8 0 0 0 0 0 -1 0 0 0 389 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 -1 34 -1 185 1]
[ 79 2 0 0 0 0 0 -1 0 0 0 390 0 0 0 0 0 1 0 0 0 0 0 0 26 0 0 1 0 0 -1 34 -1 185 1]
[[ ][ ][ ][ ][ ][ ][ 903 1 ][ ][ 16 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 96 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 291 2 ]]
[[ ][ ][ ][ ][ ][ ][ 903 1 ][ ][ 16 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 96 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 291 2 ]]
</pre>

h2. Brief overview

@cgb@ is meant to help debug and inspect CGF files. Its two main features are reporting the contents of a CGF file (tile variants and low-quality information) and performing basic tile concordance operations.

@cgb@ shares its code with the Lightning CGF server and is meant in part to test functionality used there.

h3. Concordance

CGF stores information in different 'tiers': a bit vector recording whether each tile is canonical, a cache holding the first 8 tile variants, and overflow tables used when the cache is exceeded. For testing and rough estimates, @cgb@ supports different 'levels' of concordance:

* Level 0 - compare canonical tiles only
* Level 1 - compare canonical tiles and cache
* Level 2 - a full tile concordance

h5. example

<pre>
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -i data/hg19.cgf -l 0
level: 0, canonical match: 6491163
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -i data/hg19.cgf -l 1
level: 1, canonical+cache match: 6519788, loq: 148760
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -i data/hg19.cgf -l 2
#match_tot: 6610685
</pre>

h3. CGF Inspection

h5. JSON Tile Path information

Get tile path 862 (0x35e, which is chrM) starting at tile 0, including low-quality information, and print it in JSON format.

<pre>
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -p 862 -s 0 -B
{
  "035e":{
    "tilepath":862,
    "start_tilestep":0,
    "allele":[
      [ 79, 8, 0, 0, 0, 0, 0, -1, 0, 0, 0, 389, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, -1, 34, -1, 185, 1 ],
      [ 79, 2, 0, 0, 0, 0, 0, -1, 0, 0, 0, 390, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 26, 0, 0, 1, 0, 0, -1, 34, -1, 185, 1 ]
    ],
    "loq_info":[
      [ [ ], [ ], [ ], [ ], [ ], [ ], [ 903, 1 ], [ ], [ 16, 1 ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ 96, 1 ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ 291, 2 ] ],
      [ [ ], [ ], [ ], [ ], [ ], [ ], [ 903, 1 ], [ ], [ 16, 1 ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ 96, 1 ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ 291, 2 ] ]
    ]
  }
}
</pre>

h5. Tile Path Compact Representation

Get tile path 862 (0x35e, which is chrM) starting at tile 0, including low-quality information.

<pre>
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -p 862 -s 0 -B -L -k
[ 79 8 0 0 0 0 0 -1 0 0 0 389 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 -1 34 -1 185 1]
[ 79 2 0 0 0 0 0 -1 0 0 0 390 0 0 0 0 0 1 0 0 0 0 0 0 26 0 0 1 0 0 -1 34 -1 185 1]
[[ ][ ][ ][ ][ ][ ][ 903 1 ][ ][ 16 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 96 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 291 2 ]]
[[ ][ ][ ][ ][ ][ ][ 903 1 ][ ][ 16 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 96 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 291 2 ]]
</pre>

h5. Inspect Binary File (Debug)

Get a debugging printout of the information in the CGF file.

<pre>
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -D -i data/hg19.cgf
Magic: "cgf.b"{ (7b22622e66676322)
CGFVersion: 0.1.0
LibVersion: 0.1.0
PathCount: 863
TileMapLength: 7044
TileMap: [[0+1],[0+1]], [[0+1],[1+1]], [[1+1],[0+1]], [[1+1],[1+1]], [[0+1],[2+1]], [[2+1],[0+1]], [[0+1,0+1],[1+2]], [[1+2],[0+1,0+1]], [[0+2],[0+2]], [[0+1],[3+1]], [[3+1],[0+1]], [[1+1,0+1],[0+2]], [[0+2],[1+1,0+1]], [[0+1],[4+1]], [[4+1],[0+1]], [[1+2],[1+2]], [[2+1],[2+1]], [[1+1],[3+1]], [[3+1],[1+1]], [[1+1],[2+1]], [[2+1],[1+1]], [[0+1],[5+1]], [[5+1],[0+1]], [[0+1],[6+1]], [[6+1],[0+1]], [[0+1,0+1],[2+2]], [[2+2],[0+1,0+1]], [[0+1,0+1],[3+2]], [[3+2],[0+1,0+1]], [[3+1],[3+1]], [[0+1],[7+1]], [[7+1],[0+1]],
...
035e.Loq.LoqFlagByteCount: 5
035e.Loq.LoqFlag[5]: 40 01 20 00 04
035e.Loq.LoqInfoByteCount: 18
035e.Loq.LoqInfo[18]: 01 02 83 87 01 01 02 10 01 01 02 60 01 01 02 81 23 02
</pre>
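
A few values in the debug printout can be cross-checked against the other examples on this page. The Python sketch below is based only on the output shown here, not on the CGF specification or the @cgb@ source. The function names are hypothetical, and the little-endian and least-significant-bit-first readings are assumptions that happen to fit this example.

```python
import struct


def tilepath_key(tilepath):
    """Format a tile path number as the 4-digit hex key seen in the JSON output."""
    return format(tilepath, "04x")


def magic_as_uint64(magic):
    """Read 8 magic bytes as one little-endian unsigned 64-bit integer (assumed layout)."""
    (value,) = struct.unpack("<Q", magic)
    return value


def set_bit_positions(flags):
    """List the positions of set bits, counting from the least significant bit of byte 0.

    Assumption: LoqFlag is a bit vector with one bit per tile position.
    """
    return [
        8 * byte_index + bit
        for byte_index, byte in enumerate(flags)
        for bit in range(8)
        if (byte >> bit) & 1
    ]


if __name__ == "__main__":
    # Tile path 862 is the key "035e" in the JSON example.
    print(tilepath_key(862))  # 035e

    # The 8 characters printed after "Magic:" match the hex value in parentheses.
    print(format(magic_as_uint64(b'"cgf.b"{'), "x"))  # 7b22622e66676322

    # LoqFlag bytes from the printout: 40 01 20 00 04
    print(set_bit_positions(bytes.fromhex("4001200004")))  # [6, 8, 21, 34]
```

Under these assumptions, the set bits (6, 8, 21, 34) line up with the positions of the non-empty @loq_info@ entries in the tile path 862 examples, which suggests that @LoqFlag@ marks the tile positions that carry low-quality information. Treat this as an observation about one file rather than a description of the format.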