Project

General

Profile

Actions

Idea #6264

closed

[CWL] Prototype Arvados CWL pipeline runner

Added by Brett Smith almost 9 years ago. Updated almost 9 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
07/01/2015
Due date:
Story points:
1.0

Description

Implement a tool that acts like arv-run-pipeline-instance that accepts CWL workflows as inputs. It creates and monitors Arvados jobs, continuing until the workflow reaches an end state.

Goals:

  • Have an implementation to demonstrate at BOSC.
  • Build a base that can be developed into a workflow runner usable under Crunch v2.

Simplifying assumptions:

  • For now, the runner only supports an active monitoring mode, akin to arv-run-pipeline-instance's default run mode. No need to support switches like --submit or --no-wait.
  • The runner runs on a host where it can start a Docker container to support CWL's scripting engine capabilities—probably an Arvados shell node.
  • The workflow is only expected to succeed if it uses Docker images with bare minimum Arvados support. At the very least, they need to have virtualenv installed, which Crunch can use to install an Arvados SDK. Accepting arbitrary Docker images will probably require more support from Crunch v2 so it's out of scope for now.

Subtasks 3 (0 open3 closed)

Task #6294: Review 6264-cwl-runnerResolvedPeter Amstutz07/02/2015Actions
Task #6365: Write runner using cwltool and run-commandResolvedPeter Amstutz07/01/2015Actions
Task #6464: Review 6264-run-command-task-envResolvedPeter Amstutz07/01/2015Actions

Related issues

Related to Arvados - Idea #4685: [Crunch] CWL prototype workflow runner in ArvadosResolvedPeter Amstutz04/07/2015Actions
Actions #1

Updated by Brett Smith almost 9 years ago

  • Description updated (diff)
  • Story points set to 3.0
Actions #2

Updated by Brett Smith almost 9 years ago

  • Subject changed from [DRAFT] [CWL] Prototype Arvados CWL pipeline runner to [CWL] Prototype Arvados CWL pipeline runner
Actions #3

Updated by Peter Amstutz almost 9 years ago

  • Assigned To set to Peter Amstutz
Actions #4

Updated by Peter Amstutz almost 9 years ago

  • Status changed from New to In Progress
Actions #5

Updated by Brett Smith almost 9 years ago

6264-run-command-task-env @ 15ccbba is good to merge.

One thing I noticed is that if the user specifies a non-string value in task.env (most likely a numeric type), they'll get an apparently random exception from subst.do_substitution that doesn't really enlighten what the problem is. This same issue exists elsewhere in the code, but this sort of accidental type error seems more likely for task.env than, say, task.stdout. I wanted to flag it, but if you think dealing with it is out of scope for the branch, that's fair.

Thanks.

Actions #6

Updated by Brett Smith almost 9 years ago

Reviewing 6264-cwl-runner @ 53755b6

Considering the circumstances, I think this is good to merge. I would like to see tests, but I figure that's not doable before BOSC. I know you've put a lot of work into manual testing—maybe a quick demo or tour of how that works on Monday would be good.

Suggestions I'd make from here:

  • I think I'd have had an easier time following the code if these variables had more descriptive names:
    • i in _match
    • t and r in arvExecutor
    • (I know this isn't new code, but) n and c in uploadfiles
  • Putting the reuse command line switches in a mutually exclusive group would make a nice user affordance.
  • I think the three-line for loop to copy self.environment to task.env can just become script_paramters["task_env"].update(self.environment).
  • Why does the regexp in ArvPathMapper capture the path (in a group, natch)? I don't see it doing anything with that, so it looks like needless complexity.

Thanks.

Actions #7

Updated by Brett Smith almost 9 years ago

  • Target version changed from 2015-07-08 sprint to 2015-07-22 sprint
Actions #8

Updated by Brett Smith almost 9 years ago

  • Story points changed from 3.0 to 1.0
Actions #9

Updated by Peter Amstutz almost 9 years ago

Brett Smith wrote:

Reviewing 6264-cwl-runner @ 53755b6

Considering the circumstances, I think this is good to merge. I would like to see tests, but I figure that's not doable before BOSC. I know you've put a lot of work into manual testing—maybe a quick demo or tour of how that works on Monday would be good.

Suggestions I'd make from here:

  • I think I'd have had an easier time following the code if these variables had more descriptive names:
    • i in _match
    • t and r in arvExecutor

Fixed.

  • (I know this isn't new code, but) n and c in uploadfiles

Didn't fix.

  • Putting the reuse command line switches in a mutually exclusive group would make a nice user affordance.

Good idea, done.

  • I think the three-line for loop to copy self.environment to task.env can just become script_paramters["task_env"].update(self.environment).

Thanks, done.

  • Why does the regexp in ArvPathMapper capture the path (in a group, natch)? I don't see it doing anything with that, so it looks like needless complexity.

Can't remember, either that's was leftover from copying the regex from somewhere else (where unfortunately the source regex wasn't exactly what I needed...) or I was using it and then refactored the part that needed the path somewhere else. In either case, I removed the capture group.

Thanks

Actions #10

Updated by Brett Smith almost 9 years ago

Peter Amstutz wrote:

Brett Smith wrote:

  • Why does the regexp in ArvPathMapper capture the path (in a group, natch)? I don't see it doing anything with that, so it looks like needless complexity.

Can't remember, either that's was leftover from copying the regex from somewhere else (where unfortunately the source regex wasn't exactly what I needed...) or I was using it and then refactored the part that needed the path somewhere else. In either case, I removed the capture group.

But what I'm saying is, why match the path characters at all? When all you're doing is a boolean match, ^string.* is functionally identical to ^string.

But this is still good to merge, thanks.

Actions #11

Updated by Peter Amstutz almost 9 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 67 to 100

Applied in changeset arvados|commit:b92203411f6f6adaef1c2af62495830f13f4fa14.

Actions

Also available in: Atom PDF