Idea #6264
closed[CWL] Prototype Arvados CWL pipeline runner
Description
Implement a tool that acts like arv-run-pipeline-instance that accepts CWL workflows as inputs. It creates and monitors Arvados jobs, continuing until the workflow reaches an end state.
Goals:
- Have an implementation to demonstrate at BOSC.
- Build a base that can be developed into a workflow runner usable under Crunch v2.
Simplifying assumptions:
- For now, the runner only supports an active monitoring mode, akin to arv-run-pipeline-instance's default run mode. No need to support switches like
--submit
or--no-wait
. - The runner runs on a host where it can start a Docker container to support CWL's scripting engine capabilities—probably an Arvados shell node.
- The workflow is only expected to succeed if it uses Docker images with bare minimum Arvados support. At the very least, they need to have virtualenv installed, which Crunch can use to install an Arvados SDK. Accepting arbitrary Docker images will probably require more support from Crunch v2 so it's out of scope for now.
Related issues
Updated by Brett Smith over 9 years ago
- Description updated (diff)
- Story points set to 3.0
Updated by Brett Smith over 9 years ago
- Subject changed from [DRAFT] [CWL] Prototype Arvados CWL pipeline runner to [CWL] Prototype Arvados CWL pipeline runner
Updated by Peter Amstutz over 9 years ago
- Status changed from New to In Progress
Updated by Brett Smith over 9 years ago
6264-run-command-task-env @ 15ccbba is good to merge.
One thing I noticed is that if the user specifies a non-string value in task.env (most likely a numeric type), they'll get an apparently random exception from subst.do_substitution
that doesn't really enlighten what the problem is. This same issue exists elsewhere in the code, but this sort of accidental type error seems more likely for task.env than, say, task.stdout. I wanted to flag it, but if you think dealing with it is out of scope for the branch, that's fair.
Thanks.
Updated by Brett Smith over 9 years ago
Reviewing 6264-cwl-runner @ 53755b6
Considering the circumstances, I think this is good to merge. I would like to see tests, but I figure that's not doable before BOSC. I know you've put a lot of work into manual testing—maybe a quick demo or tour of how that works on Monday would be good.
Suggestions I'd make from here:
- I think I'd have had an easier time following the code if these variables had more descriptive names:
i
in_match
t
andr
inarvExecutor
- (I know this isn't new code, but)
n
andc
inuploadfiles
- Putting the reuse command line switches in a mutually exclusive group would make a nice user affordance.
- I think the three-line for loop to copy
self.environment
totask.env
can just becomescript_paramters["task_env"].update(self.environment)
. - Why does the regexp in ArvPathMapper capture the path (in a group, natch)? I don't see it doing anything with that, so it looks like needless complexity.
Thanks.
Updated by Brett Smith over 9 years ago
- Target version changed from 2015-07-08 sprint to 2015-07-22 sprint
Updated by Peter Amstutz over 9 years ago
Brett Smith wrote:
Reviewing 6264-cwl-runner @ 53755b6
Considering the circumstances, I think this is good to merge. I would like to see tests, but I figure that's not doable before BOSC. I know you've put a lot of work into manual testing—maybe a quick demo or tour of how that works on Monday would be good.
Suggestions I'd make from here:
- I think I'd have had an easier time following the code if these variables had more descriptive names:
i
in_match
t
andr
inarvExecutor
Fixed.
- (I know this isn't new code, but)
n
andc
inuploadfiles
Didn't fix.
- Putting the reuse command line switches in a mutually exclusive group would make a nice user affordance.
Good idea, done.
- I think the three-line for loop to copy
self.environment
totask.env
can just becomescript_paramters["task_env"].update(self.environment)
.
Thanks, done.
- Why does the regexp in ArvPathMapper capture the path (in a group, natch)? I don't see it doing anything with that, so it looks like needless complexity.
Can't remember, either that's was leftover from copying the regex from somewhere else (where unfortunately the source regex wasn't exactly what I needed...) or I was using it and then refactored the part that needed the path somewhere else. In either case, I removed the capture group.
Thanks
Updated by Brett Smith over 9 years ago
Peter Amstutz wrote:
Brett Smith wrote:
- Why does the regexp in ArvPathMapper capture the path (in a group, natch)? I don't see it doing anything with that, so it looks like needless complexity.
Can't remember, either that's was leftover from copying the regex from somewhere else (where unfortunately the source regex wasn't exactly what I needed...) or I was using it and then refactored the part that needed the path somewhere else. In either case, I removed the capture group.
But what I'm saying is, why match the path characters at all? When all you're doing is a boolean match, ^string.*
is functionally identical to ^string
.
But this is still good to merge, thanks.
Updated by Peter Amstutz over 9 years ago
- Status changed from In Progress to Resolved
- % Done changed from 67 to 100
Applied in changeset arvados|commit:b92203411f6f6adaef1c2af62495830f13f4fa14.