Pipeline Optimization » History » Version 1
Bryan Cosca, 04/14/2016 03:08 PM
h1. Pipeline Optimization

h2. Crunchstat Summary

h3. How to install crunchstat-summary

h3. How to use crunchstat-summary

In @--text@ mode, use the node recommendations and the recommended Keep cache size.

In @--html@ mode, check whether your job is CPU-bound or I/O-bound, and whether tasks are behaving unexpectedly (for example, the GATK Queue case).

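One rough way to put a number on "CPU-bound" is CPU utilization: total CPU time divided by (wall-clock time × allocated cores). The sketch below assumes you already have those three numbers for a task; the function names and the 0.8 threshold are hypothetical and are not part of crunchstat-summary.

```python
# Rough CPU-bound check for one task.
# Assumption: cpu_seconds (user + system), wall_seconds and cores are numbers
# you have already collected; the names and the 0.8 threshold are hypothetical
# and are not crunchstat-summary's API or output format.


def cpu_utilization(cpu_seconds: float, wall_seconds: float, cores: int) -> float:
    """Fraction of the allocated core-time (wall_seconds * cores) actually used."""
    if wall_seconds <= 0 or cores <= 0:
        raise ValueError("wall_seconds and cores must be positive")
    return cpu_seconds / (wall_seconds * cores)


def classify(cpu_seconds: float, wall_seconds: float, cores: int,
             threshold: float = 0.8) -> str:
    """Return 'cpu-bound' if the task kept its cores busy most of the time.

    Low utilization means the cores sat idle: the task was waiting on I/O,
    or the tool did not run enough threads to use every allocated core.
    """
    util = cpu_utilization(cpu_seconds, wall_seconds, cores)
    return "cpu-bound" if util >= threshold else "underusing cpu"


if __name__ == "__main__":
    # 3800 of 4000 available core-seconds used -> 0.95
    print(classify(cpu_seconds=3800, wall_seconds=1000, cores=4))
    # 1000 of 4000 available core-seconds used -> 0.25
    print(classify(cpu_seconds=1000, wall_seconds=1000, cores=4))
```

A result of 0.25 on four cores could mean the job is waiting on I/O, or that the tool is effectively single-threaded; the HTML charts help tell these cases apart.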
When to pipe and when to write to Keep: in general, writing output straight to Keep is beneficial. If you run @crunchstat-summary --html@ and see Keep I/O stop periodically, your job is CPU-bound: during those gaps the tool is busy computing rather than reading or writing data.

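The "Keep I/O stops once in a while" pattern can also be checked programmatically. This minimal sketch assumes you have the number of Keep bytes transferred in each fixed sampling interval as a list of integers; that input shape and the @min_run@ default are hypothetical, not a crunchstat-summary format. It reports runs of consecutive zero-I/O samples that are at least @min_run@ samples long.

```python
# Detect stretches where Keep I/O stops.
# Assumption: samples[i] is the number of Keep bytes transferred during the
# i-th fixed sampling interval (hypothetical input, not a crunchstat format).
from typing import List, Optional, Tuple


def find_io_stalls(samples: List[int], min_run: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for every run of at
    least min_run consecutive zero-byte samples."""
    stalls: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, nbytes in enumerate(samples):
        if nbytes == 0:
            if start is None:
                start = i  # a stall begins here
        elif start is not None:
            # I/O resumed; keep the stall only if it lasted long enough.
            if i - start >= min_run:
                stalls.append((start, i))
            start = None
    # A stall may run to the end of the series.
    if start is not None and len(samples) - start >= min_run:
        stalls.append((start, len(samples)))
    return stalls


if __name__ == "__main__":
    keep_bytes = [500, 480, 0, 0, 0, 0, 510, 490, 0, 520, 0, 0, 0]
    print(find_io_stalls(keep_bytes))
```

For the sample series this prints @[(2, 6), (10, 13)]@: one four-sample stall and one three-sample stall at the end, while the single zero at index 8 is too short to count. Repeated stalls like these point to a CPU-bound job.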
h3. How to parallelize when a tool has no native multithreading

Some tools, such as GATK, have native multithreading: you pass the number of threads on the command line (for example, @-t@).

Other tools, such as VarScan and FreeBayes, don't have native multithreading, so you need a workaround. Generally, some tools accept an option such as @-L@/@--intervals@ that restricts work to certain loci. If you have a BED file you can split, you can create a new task per interval.

example here

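A minimal sketch of the one-task-per-interval approach, assuming a tool that accepts a region via @-L@ in @chr:start-end@ form (1-based, inclusive), as GATK's @-L@ does. It converts BED intervals, which are 0-based and half-open, into region strings and builds one command line per interval. @my-caller@, @ref.fa@ and @sample.bam@ are hypothetical placeholders, and submitting each command as a separate task is left out.

```python
# One task per BED interval.
# Assumption: the tool takes a region via -L as "chr:start-end" (1-based,
# inclusive). "my-caller", "ref.fa" and "sample.bam" are hypothetical.
from typing import Iterable, List


def bed_to_regions(lines: Iterable[str]) -> List[str]:
    """Convert BED lines (0-based, half-open) to 1-based inclusive regions."""
    regions = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        # BED start is 0-based, so add 1; BED end is exclusive, which equals
        # the 1-based inclusive end.
        regions.append(f"{chrom}:{int(start) + 1}-{int(end)}")
    return regions


def task_commands(regions: List[str], base_cmd: List[str]) -> List[List[str]]:
    """Build one command line per region; each would run as its own task."""
    return [base_cmd + ["-L", region] for region in regions]


if __name__ == "__main__":
    bed = ["chr1\t0\t1000000\n", "chr1\t1000000\t2000000\n", "chr2\t500\t1500\n"]
    base = ["my-caller", "-R", "ref.fa", "-I", "sample.bam"]
    for cmd in task_commands(bed_to_regions(bed), base):
        print(" ".join(cmd))
```

The first command printed is @my-caller -R ref.fa -I sample.bam -L chr1:1-1000000@. Splitting this way turns one single-threaded run into as many independent tasks as there are intervals.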
h3. Piping between tools or writing to a tmpdir

h3. Choosing the right number of jobs

Each job must output a collection, so if you don't want to output a file, then