Pipeline Optimization » History » Version 19
Bryan Cosca, 04/21/2016 03:38 PM
1 | 1 | Bryan Cosca | h1. Pipeline Optimization |
---|---|---|---|
2 | |||
3 | 17 | Bryan Cosca | h2. Overview |
4 | 1 | Bryan Cosca | |
5 | 19 | Bryan Cosca | This wiki page is designed to help users make their Arvados pipelines cost and compute efficient for production level data. It will teach users how to make the most out of their pipeline to save money or compute time on their clusters. |
6 | 17 | Bryan Cosca | |
7 | h2. Pipeline design |
||
8 | |||
9 | h3. Choosing the right number of jobs |
||
10 | 1 | Bryan Cosca | |
11 | 19 | Bryan Cosca | Currently, your pipeline may have one script that runs everything from alignment to variant calling. If something unexpected happens and your pipeline fails, you don't want to start from the beginning again, you want to resume from where you left off! |
12 | 1 | Bryan Cosca | |
13 | 19 | Bryan Cosca | This is where the notion of choosing which jobs to create comes into play. The right number of jobs depends on how versatile you want your pipeline to be. Specifically, how many steps do you want your pipeline to have? |
14 | |||
15 | Questions you may ask yourself are: |
||
16 | |||
17 | 18 | Bryan Cosca | Do I want to output all your intermediate files to keep? |
18 | 1 | Bryan Cosca | How many checkpoints do I want my pipeline to have? |
19 | 19 | Bryan Cosca | Do I want to do alignment and variant calling in one step? Should I separate them? (usually, no!) |
20 | 1 | Bryan Cosca | |
21 | 19 | Bryan Cosca | Each job must output a collection of files. This is how you can checkpoint your pipeline. The outputs of jobs are always available, so you can reuse them when the next job fails. |
22 | 18 | Bryan Cosca | |
23 | 19 | Bryan Cosca | If you don't want to have a lot of jobs or a lot of outputs, you should combine multiple commands in a job. You can do all your bam realignment/recalibration in one step, and you don't need to output all the intermediate bam files to keep. Keep in mind, you always choose what to upload to keep, so if you don't need files later on, its best to leave it on the compute node. One way to do so is to use arvados.current_task().tmpdir to store all your intermediate files, and then only upload what you want to keep. |
24 | |||
25 | If you choose to do multiple computations in a job, you should try piping them together. Creating pipes between tools has shown to sometimes be faster than writing/reading from disk. Feel free to pipe your tools together, for example using subprocess.PIPE in the "python subprocess module":https://docs.python.org/2/library/subprocess.html. Sometimes piping is faster, sometimes it's not. You'll have to try for yourself. |
||
26 | |||
27 | 1 | Bryan Cosca | If you want a lot of checkpoints you should have a job for each command. You'll be able to resume/restart work easily if any unexpected interruption happens. Also, you can choose different node types for each command. For example, BWA-mem can scale a lot better than fastqc or varscan, so having a 16 core node for something that doesn't have native multi-threading would be wasteful. |
28 | |||
29 | 19 | Bryan Cosca | h3. Writing to keep |
30 | 1 | Bryan Cosca | |
31 | 19 | Bryan Cosca | Once you've chosen how to create jobs, you'll need a way to write your outputs to keep. |
32 | 1 | Bryan Cosca | |
33 | 19 | Bryan Cosca | There are two ways to write your output collection to keep. Writing straight to keep ( arvados.crunch.TaskOutputDir() ) and staging a file in a temporary directory and then uploading to keep. |
34 | |||
35 | In general, writing straight to keep will reap more benefits. TaskOutputDir acts like a pipe, so you never have to spend node time on uploading data. |
||
36 | One problem though is if your job is dependent on using your output directory as a temporary space for files. If your job uses its output directory for computation, then your job will be trying to compute over a network and could become very slow. That being said, it's very safe for a job to write to a temporary directory then spending compute time uploading to keep. If you have time, it's worth trying both and seeing how much time you save by doing both. Most of the time, writing straight to keep using TaskOutputDir will be the right option, but using a tmpdir is always the safe alternative. |
||
37 | |||
38 | 1 | Bryan Cosca | h3. Choosing the right number of tasks |
39 | |||
40 | 19 | Bryan Cosca | Once you have your job architecture set up, you need to start looking at the core level of the compute node. Specifically, how much compute are you using and how can you take advantage of a machine. |
41 | 1 | Bryan Cosca | |
42 | 19 | Bryan Cosca | If you want multiple computations (tasks) on a node, setting max_tasks_per_node in "runtime_constraints":http://doc.arvados.org/api/schema/Job.html allows you to choose how many tasks you would like to run on a machine. For example, if you have a lot of small tasks that use 1 core/1GB ram, you can put multiple of those on a bigger machine. For example, 8 tasks on an 8 core machine. If you want to utilize machines better for cost savings, you should use crunchstat-summary to find out the maximum memory/cpu usage for one task, and see if you can fit more than 1 of those on a machine. One warning, however is if you do run out of RAM (some compute nodes can't swap) your process will die. |
43 | 17 | Bryan Cosca | |
44 | 19 | Bryan Cosca | If you know how much RAM you want your job to have, you can set min_ram_mb_per_node in runtime_constraints to ensure that you get a compute node big enough for your computation. Most tools scale up with more RAM, so feel free to test your job on different nodetypes to see what it works on the best. |
45 | 17 | Bryan Cosca | |
46 | 19 | Bryan Cosca | If you have a job that will multithread, you should set num_cores_per_node in runtime_constraints to the number of threads you want. Most clouds have machines up to 16 threads, so it would be worth testing on multiple nodetypes to see what works best for you. |
47 | 17 | Bryan Cosca | |
48 | 19 | Bryan Cosca | If you have a job that's very i/o intensive, its worth it to change keep_cache_mb_per_task in runtime_constraints |
49 | 17 | Bryan Cosca | |
50 | |||
51 | 19 | Bryan Cosca | h3. How to optimize the number of tasks when you don't have native multithreading |
52 | 17 | Bryan Cosca | |
53 | 19 | Bryan Cosca | Tools like GATK have native multi-threading where if you pass a -t, it will use the correct number of cores on the node. You usually want take advantage of this, and choose the min_cores_per_node that equals your threading parameter. You can use any number of min_tasks_per_node making sure that your tool-threading*min_tasks_per_node is <= min_cores_per_node. Also making sure that your node has enough RAM to allocate to all the tasks. |
54 | |||
55 | Tools like varscan/freebayes don't have native multi-threading so you need to find a workaround. Generally, these tools have a -L/--intervals to pass in certain loci to work on. If you have a bed file you can split reads on, then you can create a new task per interval. Then, have a job merge the outputs together. |
||
56 | |||
57 | 2 | Bryan Cosca | h2. Crunchstat Summary |
58 | 16 | Bryan Cosca | |
59 | 3 | Bryan Cosca | Crunchstat-summary is an arvados tool to help choose optimal configurations for arvados jobs and pipeline instances. It helps you choose "runtime_constraints":http://doc.arvados.org/api/schema/Job.html specified in the pipeline template under each job by graphing job statistics. For example: CPU usage, RAM, and Keep network traffic over time. |
60 | 1 | Bryan Cosca | |
61 | 2 | Bryan Cosca | h3. How to install crunchstat-summary |
62 | 3 | Bryan Cosca | |
63 | <pre> |
||
64 | $ git clone https://github.com/curoverse/arvados.git |
||
65 | $ cd arvados/tools/crunchstat-summary/ |
||
66 | $ python setup.py build |
||
67 | $ python setup.py install --user |
||
68 | </pre> |
||
69 | 1 | Bryan Cosca | |
70 | h3. How to use crunchstat-summary |
||
71 | 3 | Bryan Cosca | |
72 | <pre> |
||
73 | $ ./bin/crunchstat-summary --help |
||
74 | usage: crunchstat-summary [-h] |
||
75 | [--job UUID | --pipeline-instance UUID | --log-file LOG_FILE] |
||
76 | [--skip-child-jobs] [--format {html,text}] |
||
77 | [--verbose] |
||
78 | |||
79 | Summarize resource usage of an Arvados Crunch job |
||
80 | |||
81 | optional arguments: |
||
82 | -h, --help show this help message and exit |
||
83 | --job UUID Look up the specified job and read its log data from |
||
84 | Keep (or from the Arvados event log, if the job is |
||
85 | still running) |
||
86 | 1 | Bryan Cosca | --pipeline-instance UUID |
87 | 3 | Bryan Cosca | Summarize each component of the given pipeline |
88 | instance |
||
89 | --log-file LOG_FILE Read log data from a regular file |
||
90 | --skip-child-jobs Do not include stats from child jobs |
||
91 | --format {html,text} Report format |
||
92 | --verbose, -v Log more information (once for progress, twice for |
||
93 | debug) |
||
94 | </pre> |
||
95 | 1 | Bryan Cosca | |
96 | 16 | Bryan Cosca | There are two ways of using crunchstat-summary: a text view for an overall view of a job or an html page, which graphs usage over time. |
97 | 14 | Bryan Cosca | |
98 | 16 | Bryan Cosca | Case 1: A job that does bwa-aln mapping and converts to bam using samtools. |
99 | 8 | Bryan Cosca | |
100 | <pre> |
||
101 | 19 | Bryan Cosca | $ ~/arvados/tools/crunchstat-summary/bin/crunchstat-summary --format text --job qr1hi-8i9sb-bzn6hzttfu9cetv |
102 | 8 | Bryan Cosca | category metric task_max task_max_rate job_total |
103 | blkio:202:0 read 310334464 - 913853440 |
||
104 | blkio:202:0 write 2567127040 - 7693406208 |
||
105 | blkio:202:16 read 8036201472 155118884.01 4538585088 |
||
106 | blkio:202:16 write 55502038016 0 0 |
||
107 | blkio:202:32 read 2756608 100760.59 6717440 |
||
108 | blkio:202:32 write 53570560 0 99514368 |
||
109 | cpu cpus 8 - - |
||
110 | cpu sys 1592.34 1.17 805.32 |
||
111 | cpu user 11061.28 7.98 4620.17 |
||
112 | cpu user+sys 12653.62 8.00 5425.49 |
||
113 | mem cache 7454289920 - - |
||
114 | mem pgmajfault 1859 - 830 |
||
115 | mem rss 7965265920 - - |
||
116 | mem swap 5537792 - - |
||
117 | net:docker0 rx 2023609029 - 2093089079 |
||
118 | net:docker0 tx 21404100070 - 49909181906 |
||
119 | net:docker0 tx+rx 23427709099 - 52002270985 |
||
120 | net:eth0 rx 44750669842 67466325.07 14233805360 |
||
121 | net:eth0 tx 2126085781 20171074.09 3670464917 |
||
122 | 1 | Bryan Cosca | net:eth0 tx+rx 46876755623 67673532.73 17904270277 |
123 | time elapsed 949 - 1899 |
||
124 | 8 | Bryan Cosca | # Number of tasks: 3 |
125 | # Max CPU time spent by a single task: 12653.62s |
||
126 | # Max CPU usage in a single interval: 799.88% |
||
127 | # Overall CPU usage: 285.70% |
||
128 | # Max memory used by a single task: 7.97GB |
||
129 | # Max network traffic in a single task: 46.88GB |
||
130 | # Max network speed in a single interval: 67.67MB/s |
||
131 | # Keep cache miss rate 0.00% |
||
132 | # Keep cache utilization 0.00% |
||
133 | 1 | Bryan Cosca | #!! qr1hi-8i9sb-bzn6hzttfu9cetv max CPU usage was 800% -- try runtime_constraints "min_cores_per_node":8 |
134 | 8 | Bryan Cosca | #!! qr1hi-8i9sb-bzn6hzttfu9cetv max RSS was 7597 MiB -- try runtime_constraints "min_ram_mb_per_node":7782 |
135 | </pre> |
||
136 | |||
137 | 19 | Bryan Cosca | $ ~/arvados/tools/crunchstat-summary/bin/crunchstat-summary --format html --job qr1hi-8i9sb-bzn6hzttfu9cetv > qr1hi-8i9sb-bzn6hzttfu9cetv.html |
138 | |||
139 | 1 | Bryan Cosca | !86538baca4ecef099d9fad76ad9c7180.png! |
140 | |||
141 | 16 | Bryan Cosca | Here, you can see the distinct computation steps between the bwa-aln and the samtools step. Since there is a noticeable plateau on CPU usage for both computations, it would be worth trying to run the job on a bigger node. For example, a 16 core node to see if the computation can scale higher than 8 cores. |
142 | 1 | Bryan Cosca | |
143 | 16 | Bryan Cosca | Another thing to note is you can also see the runtime_constraints recommendations. These recommendations are for you to set to ensure the job will be able to call the right node type and run reliably when reproduced. |
144 | |||
145 | 12 | Bryan Cosca | Case study 2: FastQC |
146 | |||
147 | <pre> |
||
148 | 19 | Bryan Cosca | $ ~/arvados/tools/crunchstat-summary/bin/crunchstat-summary --format text --job qr1hi-8i9sb-nxqqxravvapt10h |
149 | 12 | Bryan Cosca | category metric task_max task_max_rate job_total |
150 | blkio:0:0 read 174349211138 65352499.20 174349211138 |
||
151 | blkio:0:0 write 0 0 0 |
||
152 | cpu cpus 8 - - |
||
153 | cpu sys 364.95 0.17 364.95 |
||
154 | cpu user 17589.59 6.59 17589.59 |
||
155 | 1 | Bryan Cosca | cpu user+sys 17954.54 6.72 17954.54 |
156 | fuseops read 1330241 498.40 1330241 |
||
157 | 16 | Bryan Cosca | fuseops write 0 0 0 |
158 | 1 | Bryan Cosca | keepcache hit 2655806 1038.00 2655806 |
159 | keepcache miss 2633 1.60 2633 |
||
160 | keepcalls get 2658439 1039.00 2658439 |
||
161 | 16 | Bryan Cosca | keepcalls put 0 0 0 |
162 | 1 | Bryan Cosca | mem cache 19836608512 - - |
163 | 16 | Bryan Cosca | mem pgmajfault 19 - 19 |
164 | 1 | Bryan Cosca | mem rss 1481367552 - - |
165 | 16 | Bryan Cosca | net:eth0 rx 178321 17798.40 178321 |
166 | net:eth0 tx 7156 685.00 7156 |
||
167 | 11 | Bryan Cosca | net:eth0 tx+rx 185477 18483.40 185477 |
168 | 16 | Bryan Cosca | net:keep0 rx 175959092914 107337311.20 175959092914 |
169 | 1 | Bryan Cosca | net:keep0 tx 0 0 0 |
170 | net:keep0 tx+rx 175959092914 107337311.20 175959092914 |
||
171 | time elapsed 3301 - 3301 |
||
172 | 16 | Bryan Cosca | # Number of tasks: 1 |
173 | 1 | Bryan Cosca | # Max CPU time spent by a single task: 17954.54s |
174 | 16 | Bryan Cosca | # Max CPU usage in a single interval: 672.01% |
175 | 1 | Bryan Cosca | # Overall CPU usage: 543.91% |
176 | 16 | Bryan Cosca | # Max memory used by a single task: 1.48GB |
177 | 1 | Bryan Cosca | # Max network traffic in a single task: 175.96GB |
178 | 16 | Bryan Cosca | # Max network speed in a single interval: 107.36MB/s |
179 | # Keep cache miss rate 0.10% |
||
180 | 11 | Bryan Cosca | # Keep cache utilization 99.09% |
181 | #!! qr1hi-8i9sb-nxqqxravvapt10h max CPU usage was 673% -- try runtime_constraints "min_cores_per_node":7 |
||
182 | 5 | Bryan Cosca | #!! qr1hi-8i9sb-nxqqxravvapt10h max RSS was 1413 MiB -- try runtime_constraints "min_ram_mb_per_node":1945 |
183 | </pre> |
||
184 | 19 | Bryan Cosca | |
185 | ~/arvados/tools/crunchstat-summary/bin/crunchstat-summary --format html --job qr1hi-8i9sb-nxqqxravvapt10h > qr1hi-8i9sb-nxqqxravvapt10h.html |
||
186 | 16 | Bryan Cosca | |
187 | 1 | Bryan Cosca | !62222dc72a51c18c15836796e91f3bc7.png! |
188 | 16 | Bryan Cosca | |
189 | 5 | Bryan Cosca | One thing to point out here is "keep_cache utilization":http://doc.arvados.org/api/schema/Job.html, which can be changed using 'keep_cache_mb_per_task'. You can see keep cache utilization at 99.09%, which means its at a good point. You can try increasing this since it is almost at 100%, but it may not yield significant gains. |
190 | 11 | Bryan Cosca | |
191 | 1 | Bryan Cosca | Another thing to note is to look at the CPU usage and keep transfer rate graphs. You should look to see if they ever mirror each other, which is a sign of a cpu bound job, or an i/o bound job. For example, if keep transfer is low but CPU usage is high, then your job is highly dependent on CPU, which means you should upgrade to a higher core node. If CPU usage is low and keep transfer is high, then you may want to increase the keep_cache_mb_per_task in order to be able to compute on more data. |