Story #6264

[CWL] Prototype Arvados CWL pipeline runner

Added by Brett Smith over 6 years ago. Updated over 6 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
07/01/2015
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
1.0

Description

Implement a tool that acts like arv-run-pipeline-instance that accepts CWL workflows as inputs. It creates and monitors Arvados jobs, continuing until the workflow reaches an end state.

Goals:

  • Have an implementation to demonstrate at BOSC.
  • Build a base that can be developed into a workflow runner usable under Crunch v2.

Simplifying assumptions:

  • For now, the runner only supports an active monitoring mode, akin to arv-run-pipeline-instance's default run mode. No need to support switches like --submit or --no-wait.
  • The runner runs on a host where it can start a Docker container to support CWL's scripting engine capabilities—probably an Arvados shell node.
  • The workflow is only expected to succeed if it uses Docker images with bare minimum Arvados support. At the very least, they need to have virtualenv installed, which Crunch can use to install an Arvados SDK. Accepting arbitrary Docker images will probably require more support from Crunch v2 so it's out of scope for now.

Subtasks

Task #6294: Review 6264-cwl-runnerResolvedPeter Amstutz

Task #6365: Write runner using cwltool and run-commandResolvedPeter Amstutz

Task #6464: Review 6264-run-command-task-envResolvedPeter Amstutz


Related issues

Related to Arvados - Story #4685: [Crunch] CWL prototype workflow runner in ArvadosResolved04/07/2015

Associated revisions

Revision 8048e6da
Added by Peter Amstutz over 6 years ago

Merge branch '6264-run-command-task-env' refs #6264

Revision b9220341
Added by Peter Amstutz over 6 years ago

Merge branch '6264-cwl-runner' closes #6264

History

#1 Updated by Brett Smith over 6 years ago

  • Description updated (diff)
  • Story points set to 3.0

#2 Updated by Brett Smith over 6 years ago

  • Subject changed from [DRAFT] [CWL] Prototype Arvados CWL pipeline runner to [CWL] Prototype Arvados CWL pipeline runner

#3 Updated by Peter Amstutz over 6 years ago

  • Assigned To set to Peter Amstutz

#4 Updated by Peter Amstutz over 6 years ago

  • Status changed from New to In Progress

#5 Updated by Brett Smith over 6 years ago

6264-run-command-task-env @ 15ccbba is good to merge.

One thing I noticed is that if the user specifies a non-string value in task.env (most likely a numeric type), they'll get an apparently random exception from subst.do_substitution that doesn't really enlighten what the problem is. This same issue exists elsewhere in the code, but this sort of accidental type error seems more likely for task.env than, say, task.stdout. I wanted to flag it, but if you think dealing with it is out of scope for the branch, that's fair.

Thanks.

#6 Updated by Brett Smith over 6 years ago

Reviewing 6264-cwl-runner @ 53755b6

Considering the circumstances, I think this is good to merge. I would like to see tests, but I figure that's not doable before BOSC. I know you've put a lot of work into manual testing—maybe a quick demo or tour of how that works on Monday would be good.

Suggestions I'd make from here:

  • I think I'd have had an easier time following the code if these variables had more descriptive names:
    • i in _match
    • t and r in arvExecutor
    • (I know this isn't new code, but) n and c in uploadfiles
  • Putting the reuse command line switches in a mutually exclusive group would make a nice user affordance.
  • I think the three-line for loop to copy self.environment to task.env can just become script_paramters["task_env"].update(self.environment).
  • Why does the regexp in ArvPathMapper capture the path (in a group, natch)? I don't see it doing anything with that, so it looks like needless complexity.

Thanks.

#7 Updated by Brett Smith over 6 years ago

  • Target version changed from 2015-07-08 sprint to 2015-07-22 sprint

#8 Updated by Brett Smith over 6 years ago

  • Story points changed from 3.0 to 1.0

#9 Updated by Peter Amstutz over 6 years ago

Brett Smith wrote:

Reviewing 6264-cwl-runner @ 53755b6

Considering the circumstances, I think this is good to merge. I would like to see tests, but I figure that's not doable before BOSC. I know you've put a lot of work into manual testing—maybe a quick demo or tour of how that works on Monday would be good.

Suggestions I'd make from here:

  • I think I'd have had an easier time following the code if these variables had more descriptive names:
    • i in _match
    • t and r in arvExecutor

Fixed.

  • (I know this isn't new code, but) n and c in uploadfiles

Didn't fix.

  • Putting the reuse command line switches in a mutually exclusive group would make a nice user affordance.

Good idea, done.

  • I think the three-line for loop to copy self.environment to task.env can just become script_paramters["task_env"].update(self.environment).

Thanks, done.

  • Why does the regexp in ArvPathMapper capture the path (in a group, natch)? I don't see it doing anything with that, so it looks like needless complexity.

Can't remember, either that's was leftover from copying the regex from somewhere else (where unfortunately the source regex wasn't exactly what I needed...) or I was using it and then refactored the part that needed the path somewhere else. In either case, I removed the capture group.

Thanks

#10 Updated by Brett Smith over 6 years ago

Peter Amstutz wrote:

Brett Smith wrote:

  • Why does the regexp in ArvPathMapper capture the path (in a group, natch)? I don't see it doing anything with that, so it looks like needless complexity.

Can't remember, either that's was leftover from copying the regex from somewhere else (where unfortunately the source regex wasn't exactly what I needed...) or I was using it and then refactored the part that needed the path somewhere else. In either case, I removed the capture group.

But what I'm saying is, why match the path characters at all? When all you're doing is a boolean match, ^string.* is functionally identical to ^string.

But this is still good to merge, thanks.

#11 Updated by Peter Amstutz over 6 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 67 to 100

Applied in changeset arvados|commit:b92203411f6f6adaef1c2af62495830f13f4fa14.

Also available in: Atom PDF