Cgb » History » Version 3

Abram Connelly, 11/15/2016 04:43 PM

h1. cgb

@cgb@ is a tool for reading files in the binary compact genome format (CGF). The tool is still in the prototyping stage.

Code for @cgb@ can be found on "github.com/abeconnelly/cgf":https://github.com/abeconnelly/cgf.

h2. Quick start

<pre>
$ git clone https://github.com/abeconnelly/cgf
$ cd cgf/cpp
$ ./cmp.sh
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -s 0 -B -k -p 862
[ 79 8 0 0 0 0 0 -1 0 0 0 389 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 -1 34 -1 185 1]
[ 79 2 0 0 0 0 0 -1 0 0 0 390 0 0 0 0 0 1 0 0 0 0 0 0 26 0 0 1 0 0 -1 34 -1 185 1]
[[ ][ ][ ][ ][ ][ ][ 903 1 ][ ][ 16 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 96 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 291 2 ]]
[[ ][ ][ ][ ][ ][ ][ 903 1 ][ ][ 16 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 96 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 291 2 ]]
</pre>

h2. Brief overview

@cgb@ is meant to help debug and inspect CGF files. Its two main features are reporting the contents of a CGF file in terms of tile variants and low-quality information, and performing some basic tile concordance operations. The code that @cgb@ uses is shared with the Lightning CGF server, and @cgb@ is in part meant to test functionality used there.

---

h2. Concordance

A CGF file stores information in different 'tiers': a bit vector that records whether each tile is canonical, a cache that holds the first 8 tile variants, and overflow tables that are used when the cache is exceeded. For testing and for rough estimates, @cgb@ supports different 'levels' of concordance:

* Level 0 - compare canonical tiles only
* Level 1 - compare canonical tiles and cache
* Level 2 - a full tile concordance
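One way to picture the levels is as progressively wider comparisons. The sketch below is a toy model, not the @cgb@ implementation. It assumes each tile position has already been decoded into a variant id together with the tier it was stored in (the @CANONICAL@, @CACHE@ and @OVERFLOW@ constants are hypothetical names), and it assumes the levels are cumulative.

```python
# Toy model of cumulative concordance levels (illustration only, not cgb code).
# Assumption: each tile position is a (tier, variant) pair, where the tier
# records where the variant was stored in the CGF file.
CANONICAL, CACHE, OVERFLOW = 0, 1, 2


def concordance(genome_a, genome_b, level):
    """Count positions where both genomes agree, using only tiers <= level."""
    matches = 0
    for (tier_a, var_a), (tier_b, var_b) in zip(genome_a, genome_b):
        if max(tier_a, tier_b) <= level and var_a == var_b:
            matches += 1
    return matches


# Two made-up genomes with four tile positions each.
a = [(CANONICAL, 0), (CACHE, 3), (OVERFLOW, 41), (CANONICAL, 0)]
b = [(CANONICAL, 0), (CACHE, 3), (OVERFLOW, 41), (CACHE, 2)]

for level in (0, 1, 2):
    print(level, concordance(a, b, level))
```

In this model a higher level can only add matches, which fits the role of the lower levels as cheap, rough estimates.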

h5. Example

<pre>
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -i data/hg19.cgf -l 0
level: 0, canonical match: 6491163
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -i data/hg19.cgf -l 1
level: 1, canonical+cache match: 6519788, loq: 148760
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -i data/hg19.cgf -l 2
#match_tot: 6610685
</pre>
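The three runs print their counts in slightly different formats. As a minimal sketch, the counts can be pulled out with one regular expression and compared. The pattern is based only on the three lines shown above, so it may need adjusting if the prototype's output changes. Reading the differences as "matches found only at the higher level" is an assumption.

```python
import re

# Output lines copied from the example session above.
OUTPUT = {
    0: "level: 0, canonical match: 6491163",
    1: "level: 1, canonical+cache match: 6519788, loq: 148760",
    2: "#match_tot: 6610685",
}


def match_count(line):
    """Extract the match count from one line of concordance output."""
    found = re.search(r"match(?:_tot)?: (\d+)", line)
    if found is None:
        raise ValueError("no match count in: " + line)
    return int(found.group(1))


counts = {level: match_count(line) for level, line in OUTPUT.items()}
print(counts[1] - counts[0])  # additional matches reported at level 1
print(counts[2] - counts[1])  # additional matches reported at level 2
```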

---

h2. CGF Inspection

h5. JSON Tile Path information

Get tile path 862 (0x35e, which is chrM), starting at tile 0 and including low-quality information, and print it in JSON format.

<pre>
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -p 862 -s 0 -B
{
  "035e":{
    "tilepath":862,
    "start_tilestep":0,
    "allele":[
      [ 79, 8, 0, 0, 0, 0, 0, -1, 0, 0, 0, 389, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, -1, 34,
      -1, 185, 1 ],
      [ 79, 2, 0, 0, 0, 0, 0, -1, 0, 0, 0, 390, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 26, 0, 0, 1, 0, 0, -1, 34,
      -1, 185, 1 ]
    ],
    "loq_info":[
      [ [ ], [ ], [ ], [ ], [ ], [ ], [ 903, 1 ], [ ], [ 16, 1 ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ 96, 1 ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ],
        [ ], [ ], [ 291, 2 ] ],
      [ [ ], [ ], [ ], [ ], [ ], [ ], [ 903, 1 ], [ ], [ 16, 1 ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ 96, 1 ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ],
        [ ], [ ], [ 291, 2 ] ]
    ]
  }
}
</pre>
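Because this output is plain JSON, it can be loaded with any JSON library. The sketch below uses an abbreviated copy of the output above (the @loq_info@ arrays are left out) and lists the positions at which the two alleles differ. It assumes that each array entry corresponds to one position counted from @start_tilestep@, and that the top-level key is the tile path written as four hex digits, as it is in this sample.

```python
import json

# Abbreviated copy of the output above; "loq_info" is omitted for brevity.
text = """
{
  "035e": {
    "tilepath": 862,
    "start_tilestep": 0,
    "allele": [
      [79, 8, 0, 0, 0, 0, 0, -1, 0, 0, 0, 389, 0, 0, 0, 0, 0, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, -1, 34, -1, 185, 1],
      [79, 2, 0, 0, 0, 0, 0, -1, 0, 0, 0, 390, 0, 0, 0, 0, 0, 1,
       0, 0, 0, 0, 0, 0, 26, 0, 0, 1, 0, 0, -1, 34, -1, 185, 1]
    ]
  }
}
"""

report = json.loads(text)
key = format(862, "04x")  # tile path 862 written as four hex digits
path = report[key]
allele_a, allele_b = path["allele"]

# Positions at which the two alleles carry different values.
differing = [i for i, (x, y) in enumerate(zip(allele_a, allele_b)) if x != y]
print(key, len(allele_a), differing)
```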

h5. Tile Path Compact Representation

Get tile path 862 (0x35e, which is chrM), starting at tile 0 and including low-quality information.

<pre>
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -p 862 -s 0 -B -L -k
[ 79 8 0 0 0 0 0 -1 0 0 0 389 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 -1 34 -1 185 1]
[ 79 2 0 0 0 0 0 -1 0 0 0 390 0 0 0 0 0 1 0 0 0 0 0 0 26 0 0 1 0 0 -1 34 -1 185 1]
[[ ][ ][ ][ ][ ][ ][ 903 1 ][ ][ 16 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 96 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 291 2 ]]
[[ ][ ][ ][ ][ ][ ][ 903 1 ][ ][ 16 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 96 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 291 2 ]]
</pre>
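The compact form drops the commas, so a JSON parser will not read it directly. The following is a minimal sketch of a reader for these lines. @parse_compact@ is a hypothetical helper, not part of @cgb@, and it assumes a line contains nothing but brackets and integers, as in the output above.

```python
import re


def parse_compact(line):
    """Parse one line of space-separated, bracketed integers into nested lists."""
    stack = [[]]  # stack[0] collects the single top-level list
    for token in re.findall(r"\[|\]|-?\d+", line):
        if token == "[":
            stack.append([])
        elif token == "]":
            if len(stack) < 2:
                raise ValueError("unexpected ']'")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(int(token))
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("expected exactly one bracketed list")
    return stack[0][0]


# First allele line and first low-quality line from the output above.
allele = parse_compact(
    "[ 79 8 0 0 0 0 0 -1 0 0 0 389 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 -1 34 -1 185 1]"
)
loq = parse_compact(
    "[[ ][ ][ ][ ][ ][ ][ 903 1 ][ ][ 16 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ]"
    "[ 96 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 291 2 ]]"
)

# Positions that carry low-quality information.
loq_positions = [i for i, entry in enumerate(loq) if entry]
print(len(allele), loq_positions)
```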

h5. Inspect Binary File (Debug)

Get a debugging printout of the information in the CGF file.

<pre>
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -D -i data/hg19.cgf
Magic: "cgf.b"{ (7b22622e66676322)
CGFVersion: 0.1.0
LibVersion: 0.1.0
PathCount: 863
TileMapLength: 7044
TileMap:
   [[0+1],[0+1]], [[0+1],[1+1]], [[1+1],[0+1]], [[1+1],[1+1]], [[0+1],[2+1]], [[2+1],[0+1]], [[0+1,0+1],[1+2]], [[1+2],[0+1,0+1]], [[0+2],[0+2]], [[0+1],[3+1]], [[3+1],[0+1]], [[1+1,0+1],[0+2]], [[0+2],[1+1,0+1]], [[0+1],[4+1]], [[4+1],[0+1]], [[1+2],[1+2]], [[2+1],[2+1]], [[1+1],[3+1]], [[3+1],[1+1]], [[1+1],[2+1]], [[2+1],[1+1]], [[0+1],[5+1]], [[5+1],[0+1]], [[0+1],[6+1]], [[6+1],[0+1]], [[0+1,0+1],[2+2]], [[2+2],[0+1,0+1]], [[0+1,0+1],[3+2]], [[3+2],[0+1,0+1]], [[3+1],[3+1]], [[0+1],[7+1]], [[7+1],[0+1]],
...

  035e.Loq.LoqFlagByteCount: 5
  035e.Loq.LoqFlag[5]:
     40 01 20 00 04

  035e.Loq.LoqInfoByteCount: 18
  035e.Loq.LoqInfo[18]:
     01 02 83 87 01 01 02 10 01 01 02 60 01 01 02 81 23 02

</pre>
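Two of the raw values in this printout can be cross-checked by hand. The sketch below reads the magic hex word as a little-endian integer and reads the @LoqFlag@ bytes as a bit vector whose lowest bit comes first. Both readings are inferences from this sample, not statements taken from a CGF specification: with them, the hex word spells the printed magic string, and the set bits land on the same positions that have non-empty low-quality entries in the tile path 862 output earlier on this page.

```python
# Values copied from the debug printout above.
magic_hex = "7b22622e66676322"
loq_flag = bytes.fromhex("40 01 20 00 04")


def magic_string(hex_word):
    """Treat the hex word as a little-endian integer and return its bytes as text."""
    return bytes.fromhex(hex_word)[::-1].decode("ascii")


def set_bits(flag_bytes):
    """Return indices of set bits, counting from the lowest bit of the first byte."""
    return [
        8 * index + bit
        for index, byte in enumerate(flag_bytes)
        for bit in range(8)
        if (byte >> bit) & 1
    ]


print(magic_string(magic_hex))
print(set_bits(loq_flag))
```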