Ward Vandewege, 11/06/2017 07:44 PM

h1. Crunch - container orchestration

Arvados includes a container orchestration system called 'Crunch', which executes Common Workflow Language (CWL) workflows while maintaining provenance and reproducibility.

h2. Design Goals

Notable design goals and features include:

* Make use of multiple cores and nodes to produce results faster
* Integrate with [[Keep]] and git repositories to maintain provenance
* Use off-the-shelf software tools in distributed computations
* Run efficiently over a wide range of problem sizes
* Allow maximum flexibility in the choice of programming language
* Allow maximum flexibility in the choice of execution environment
* Provide tools for building reusable pipelines
* Lower the entry barrier for users

h2. Benefits of Crunch

Although some of the workflow and provenance features in Arvados could in theory be implemented using Hadoop MapReduce, Crunch offers distinct benefits:

* *Provenance and Reproducibility* - Like Keep, the Arvados distributed file system, Crunch is designed to automate tracking the origin of result data, reproducing complex pipelines, and comparing pipelines to one another.

* *Performance* - Most genomics problems are embarrassingly parallel and can benefit from horizontal scaling. In the cloud, Crunch can deliver cost-effective performance for genomics-related analyses by automatically adjusting the available compute resources to the workload.

* *Standardization* - "Common Workflow Language (CWL)":http://commonwl.org is the workflow description standard in bioinformatics. It is the native workflow language in Crunch.
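
To make "embarrassingly parallel" in the Performance point concrete, here is a minimal, generic Python sketch. It is not Crunch's API and does not use CWL: the function names @gc_fraction@ and @scatter@ and the toy sequences are hypothetical, chosen only for illustration. The point it shows is that each sample is processed without reference to any other, so the work can be spread across as many workers as are available.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict


def gc_fraction(sequence: str) -> float:
    """Return the fraction of G/C bases in one sequence (hypothetical per-sample task)."""
    if not sequence:
        return 0.0
    gc_count = sum(1 for base in sequence.upper() if base in "GC")
    return gc_count / len(sequence)


def scatter(samples: Dict[str, str], max_workers: int = 4) -> Dict[str, float]:
    """Run gc_fraction on every sample independently and gather the results.

    Threads are used only to keep the sketch short and self-contained. On a
    cluster, each independent task would instead run on its own core or node.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with the sample names.
        results = pool.map(gc_fraction, samples.values())
        return dict(zip(samples.keys(), results))


if __name__ == "__main__":
    toy_samples = {"sample_a": "GGCC", "sample_b": "ATAT", "sample_c": "ATGC"}
    for name, fraction in scatter(toy_samples).items():
        print(name, fraction)
```

Because no task depends on another task's output, adding workers shortens the total run time until the number of workers matches the number of samples. That property is what makes horizontal scaling effective for this class of problem.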