Revision 1 (Abram Connelly, 11/15/2016 04:41 PM) → Revision 2/3 (Abram Connelly, 11/15/2016 04:42 PM)

h1. cgb 

@cgb@ is a tool for accessing files in the binary compact genome format (CGF). It is still in the prototyping stage.

Code for @cgb@ can be found on "github.com/abeconnelly/cgf":https://github.com/abeconnelly/cgf.

h2. Quick start

 <pre> 
 $ git clone https://github.com/abeconnelly/cgf 
 $ cd cgf/cpp 
 $ ./cmp.sh 
 $ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -s 0 -B -k -p 862 
 [ 79 8 0 0 0 0 0 -1 0 0 0 389 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 -1 34 -1 185 1] 
 [ 79 2 0 0 0 0 0 -1 0 0 0 390 0 0 0 0 0 1 0 0 0 0 0 0 26 0 0 1 0 0 -1 34 -1 185 1] 
 [[ ][ ][ ][ ][ ][ ][ 903 1 ][ ][ 16 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 96 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 291 2 ]] 
 [[ ][ ][ ][ ][ ][ ][ 903 1 ][ ][ 16 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 96 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 291 2 ]] 
 </pre> 

h2. Brief overview

@cgb@ is meant to help debug and inspect CGF files. It has two main features: reporting the contents of a CGF file as tile variants and low quality information, and performing basic tile concordance operations. The code that @cgb@ uses is shared with the Lightning CGF server, so @cgb@ also serves in part to test functionality used there.

h3. Concordance

A CGF file stores information in different 'tiers': a bit vector recording whether each tile is canonical, a cache holding the first 8 tile variants, and overflow tables that are used if the cache is exceeded. For testing and for rough estimates, @cgb@ offers different 'levels' of concordance:

* Level 0 - compare canonical tiles only
* Level 1 - compare canonical tiles and cache
* Level 2 - a full tile concordance

h5. Example

 <pre> 
 $ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -i data/hg19.cgf -l 0 
 level: 0, canonical match: 6491163 
 $ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -i data/hg19.cgf -l 1 
 level: 1, canonical+cache match: 6519788, loq: 148760 
 $ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -i data/hg19.cgf -l 2 
 #match_tot: 6610685 
 </pre> 
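
The three levels can be pictured with a toy model. The sketch below is not the implementation in @cgb@ and does not read CGF files. It assumes each tile position has already been decoded into a (tier, variant) pair; the @Tile@ type, the tier names and the @concordance@ function are hypothetical. It also assumes that a position is counted at a given level only when both genomes resolve it within the tiers that the level covers.

```python
from typing import List, NamedTuple


class Tile(NamedTuple):
    tier: str     # hypothetical label: "canonical", "cache" or "overflow"
    variant: int  # tile variant id; 0 is assumed here to mean canonical


# Tiers each level may look at (an assumption based on the level list above).
LEVEL_TIERS = {
    0: {"canonical"},
    1: {"canonical", "cache"},
    2: {"canonical", "cache", "overflow"},
}


def concordance(a: List[Tile], b: List[Tile], level: int) -> int:
    """Count positions where both genomes agree, using only the tiers for `level`."""
    tiers = LEVEL_TIERS[level]
    return sum(
        1
        for x, y in zip(a, b)
        if x.tier in tiers and y.tier in tiers and x.variant == y.variant
    )


# Made-up data for illustration only.
genome_a = [Tile("canonical", 0), Tile("cache", 3), Tile("overflow", 12), Tile("canonical", 0)]
genome_b = [Tile("canonical", 0), Tile("cache", 3), Tile("overflow", 12), Tile("cache", 1)]

for level in (0, 1, 2):
    print(level, concordance(genome_a, genome_b, level))
```

In this model the count can only grow as the level increases, which is consistent with the match counts in the example output above.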

h3. CGF Inspection

h5. JSON Tile Path information

 Get tile path 862 (0x35e, which is chrM) starting at tile 0 including low quality information and print in JSON format. 

 <pre> 
 $ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -p 862 -s 0 -B 
 { 
   "035e":{ 
     "tilepath":862, 
     "start_tilestep":0, 
     "allele":[ 
       [ 79, 8, 0, 0, 0, 0, 0, -1, 0, 0, 0, 389, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, -1, 34,  
       -1, 185, 1 ], 
       [ 79, 2, 0, 0, 0, 0, 0, -1, 0, 0, 0, 390, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 26, 0, 0, 1, 0, 0, -1, 34,  
       -1, 185, 1 ] 
     ], 
     "loq_info":[ 
       [ [ ], [ ], [ ], [ ], [ ], [ ], [ 903, 1 ], [ ], [ 16, 1 ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ 96, 1 ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ],  
         [ ], [ ], [ 291, 2 ] ], 
       [ [ ], [ ], [ ], [ ], [ ], [ ], [ 903, 1 ], [ ], [ 16, 1 ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ 96, 1 ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ],  
         [ ], [ ], [ 291, 2 ] ] 
     ] 
   } 
 } 
 </pre> 
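
Because the output is JSON, it can be post-processed with standard tools. The following sketch lists the tile steps that carry low quality information. It uses a shortened copy of the output above (first 9 tile steps only) and assumes that the entries of @allele@ and @loq_info@ line up index by index, starting at @start_tilestep@. The @loq_steps@ helper is hypothetical, and the meaning of the numbers inside each low quality entry is not interpreted here.

```python
import json

# Shortened copy of the cgb output above: first 9 tile steps only.
cgb_output = """
{
  "035e":{
    "tilepath":862,
    "start_tilestep":0,
    "allele":[
      [ 79, 8, 0, 0, 0, 0, 0, -1, 0 ],
      [ 79, 2, 0, 0, 0, 0, 0, -1, 0 ]
    ],
    "loq_info":[
      [ [ ], [ ], [ ], [ ], [ ], [ ], [ 903, 1 ], [ ], [ 16, 1 ] ],
      [ [ ], [ ], [ ], [ ], [ ], [ ], [ 903, 1 ], [ ], [ 16, 1 ] ]
    ]
  }
}
"""


def loq_steps(doc: dict) -> list:
    """Return (path key, allele index, tile step, variant, loq entry) for non-empty loq entries."""
    rows = []
    for key, path in doc.items():
        start = path["start_tilestep"]
        for allele_idx, (variants, loq) in enumerate(zip(path["allele"], path["loq_info"])):
            for offset, (variant, info) in enumerate(zip(variants, loq)):
                if info:  # an empty list means no low quality information
                    rows.append((key, allele_idx, start + offset, variant, info))
    return rows


rows = loq_steps(json.loads(cgb_output))
for row in rows:
    print(row)

# The top-level key is the tile path in hexadecimal.
print(int("035e", 16))
```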

h5. Tile Path Compact Representation

Get tile path 862 (0x35e, which is chrM) starting at tile 0, including low quality information, and print it in the compact representation.

 <pre> 
 $ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -p 862 -s 0 -B -L -k 
 [ 79 8 0 0 0 0 0 -1 0 0 0 389 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 -1 34 -1 185 1] 
 [ 79 2 0 0 0 0 0 -1 0 0 0 390 0 0 0 0 0 1 0 0 0 0 0 0 26 0 0 1 0 0 -1 34 -1 185 1] 
 [[ ][ ][ ][ ][ ][ ][ 903 1 ][ ][ 16 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 96 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 291 2 ]] 
 [[ ][ ][ ][ ][ ][ ][ 903 1 ][ ][ 16 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 96 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 291 2 ]] 
 </pre> 
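
The compact lines are not valid JSON, but they are simple to parse. The sketch below is one possible reader, written only against the output shown above. @parse_allele_line@ and @parse_loq_line@ are hypothetical helper names, and the code assumes the format is exactly as printed here (whitespace-separated integers, one bracket pair per tile step on the low quality lines).

```python
import re
from typing import List

ALLELE_LINE = "[ 79 8 0 0 0 0 0 -1 0 0 0 389 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 -1 34 -1 185 1]"
LOQ_LINE = (
    "[[ ][ ][ ][ ][ ][ ][ 903 1 ][ ][ 16 1 ]"
    "[ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 96 1 ]"
    "[ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 291 2 ]]"
)


def parse_allele_line(line: str) -> List[int]:
    """Parse a line such as '[ 79 8 0 -1 1]' into a list of integers."""
    return [int(tok) for tok in line.strip().strip("[]").split()]


def parse_loq_line(line: str) -> List[List[int]]:
    """Parse a line such as '[[ ][ 903 1 ]]' into one list per tile step."""
    # Match only innermost bracket pairs; the outer pair is skipped.
    groups = re.findall(r"\[([^\[\]]*)\]", line)
    return [[int(tok) for tok in group.split()] for group in groups]


allele = parse_allele_line(ALLELE_LINE)
loq = parse_loq_line(LOQ_LINE)
print(allele[:12])
print([(step, info) for step, info in enumerate(loq) if info])
```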

h5. Inspect Binary File (Debug)

Get a debugging printout of the information in the CGF file.

 <pre> 
 $ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -D -i data/hg19.cgf 
 Magic: "cgf.b"{ (7b22622e66676322) 
 CGFVersion: 0.1.0 
 LibVersion: 0.1.0 
 PathCount: 863 
 TileMapLength: 7044 
 TileMap: 
    [[0+1],[0+1]], [[0+1],[1+1]], [[1+1],[0+1]], [[1+1],[1+1]], [[0+1],[2+1]], [[2+1],[0+1]], [[0+1,0+1],[1+2]], [[1+2],[0+1,0+1]], [[0+2],[0+2]], [[0+1],[3+1]], [[3+1],[0+1]], [[1+1,0+1],[0+2]], [[0+2],[1+1,0+1]], [[0+1],[4+1]], [[4+1],[0+1]], [[1+2],[1+2]], [[2+1],[2+1]], [[1+1],[3+1]], [[3+1],[1+1]], [[1+1],[2+1]], [[2+1],[1+1]], [[0+1],[5+1]], [[5+1],[0+1]], [[0+1],[6+1]], [[6+1],[0+1]], [[0+1,0+1],[2+2]], [[2+2],[0+1,0+1]], [[0+1,0+1],[3+2]], [[3+2],[0+1,0+1]], [[3+1],[3+1]], [[0+1],[7+1]], [[7+1],[0+1]], 
 ... 

   035e.Loq.LoqFlagByteCount: 5 
   035e.Loq.LoqFlag[5]: 
      40 01 20 00 04 

   035e.Loq.LoqInfoByteCount: 18 
   035e.Loq.LoqInfo[18]: 
      01 02 83 87 01 01 02 10 01 01 02 60 01 01 02 81 23 02 

 </pre>
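
One way to read the @LoqFlag@ bytes is as a bit vector with one bit per tile step, least significant bit first. This reading is inferred only from the output on this page, not from a format specification, and @set_bit_positions@ is a hypothetical helper. Under that assumption the five bytes shown for path 035e decode to the same tile steps that have non-empty low quality entries in the tile path 862 output earlier on this page.

```python
from typing import List


def set_bit_positions(flag_bytes: bytes) -> List[int]:
    """Indices of set bits, counting from the least significant bit of byte 0."""
    return [
        8 * byte_idx + bit
        for byte_idx, byte in enumerate(flag_bytes)
        for bit in range(8)
        if byte & (1 << bit)
    ]


# Bytes copied from the 035e.Loq.LoqFlag[5] line of the debug printout.
loq_flag = bytes.fromhex("40 01 20 00 04")
print(set_bit_positions(loq_flag))  # [6, 8, 21, 34]
```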