Crunch » History » Version 17
Ward Vandewege, 11/06/2017 07:47 PM
h1. Crunch - container orchestration

Arvados has a robust container orchestration system called 'Crunch', which executes CWL workflows while maintaining provenance and reproducibility.

h2. Design Goals

Notable design goals and features include:

* Make use of multiple cores and nodes to produce results faster
* Integrate with [[Keep]] and git repositories to maintain provenance
* Use off-the-shelf software tools in distributed computations
* Run efficiently over a wide range of problem sizes
* Maximum flexibility of programming language choice
* Maximum flexibility of execution environment
* Tools for building reusable pipelines
* Low entry barrier for bioinformaticians: "CWL":http://commonwl.org, the standard workflow description language in bioinformatics, is the native workflow description language in Crunch

h2. Benefits of Crunch

Although some of the workflow and provenance features in Arvados could theoretically be implemented using Hadoop MapReduce, there are distinct benefits to Crunch:

* *Provenance and Reproducibility* - Like Keep, the Arvados distributed file system, Crunch is designed to automatically track the origin of result data. It can also efficiently reproduce complex workflows and compare workflows to one another.

* *Performance* - Most genomics problems are embarrassingly parallel and can benefit from horizontal scaling. In the cloud, Crunch can deliver cost-effective performance for genomics-related analyses by automatically adjusting the available compute resources to the workload.

* *Standardization* - "Common Workflow Language (CWL)":http://commonwl.org is the workflow description standard in bioinformatics. It is the native workflow language in Crunch.