Idea #9684
closed[Crunch2] [API] Add Workflow resource
Description
Overview¶
Workflows serve the same purpose for Crunch v2 that PipelineTemplates do for Crunch v1. They store a parameterized workflow, written in CWL. Clients like Workbench and arvados-cwl-runner can let users browse Workflows, or run a specific Workflow with specified inputs.
Much like PipelineInstances include an instantiated copy of their PipelineTemplate, clients that want to run Workflows are expected to copy the entire workflow text into the right place (e.g., arvados-cwl-runner will make it a parameter of a ContainerRequest; crunch-run will mount that parameter as text inside the running Container).
Functional Requirements¶
Add a Workflow resource to the API server. It supports all the same fields and methods as PipelineTemplate, except that instead of a components
attribute, it has a workflow
attribute. This is a text column, intended to store a CWL workflow in its native YAML representation.
The Workflow has a validation that the workflow
attribute is either "blank" (nil or empty) or is valid YAML. [TODO: Is there an easy way to check that it's valid CWL?] A Workflow with workflow
text that is not good YAML is invalid.
The name
and description
attributes are set automatically from the CWL's label
and doc
fields, respectively, unless explicitly overridden by the client.
- On create, if either attribute is unset, if the
workflow
CWL includes the corresponding field, set our attribute from that value. - On update, if not
attribute_changed?
, andattribute_was
matches the value fromworkflow_was
, set our attribute from the corresponding field in theworkflow
CWL (including setting the attribute to nil if the field is no longer set in theworkflow
CWL).
Yes, this means that if a client manually sets the Workflow'sname
to match thelabel
in the CWL, then updates the CWL, thename
can be updated as well. This is a smidgen less than ideal, but we consider it worth the tradeoff for a simple implementation, and not too confusing anyway.
Update the API documentation to include information about the Workflow resource and methods. It should mention that workflow
should be CWL YAML, and the auto-update behavior of the name
and description
attributes.
Updated by Tom Morris over 8 years ago
- Target version set to Arvados Future Sprints
- Story points set to 2.0
Updated by Brett Smith over 8 years ago
Updated with a little background from discussion yesterday, plus the specific names of the CWL fields we should be looking at.
Updated by Peter Amstutz over 8 years ago
Suggest slightly different design:
Add "inputs" field which is the inputs section copied from the CWL file.
The "workflow" field is a keep file reference to the workflow file in the form abcd+123/workflow.cwl
Updated by Radhika Chippada over 8 years ago
- Target version changed from Arvados Future Sprints to 2016-08-17 sprint
Updated by Radhika Chippada over 8 years ago
- Assigned To set to Radhika Chippada
Updated by Radhika Chippada over 8 years ago
- Status changed from New to In Progress
Updated by Tom Clegg over 8 years ago
at 3723f69 on 9684-workflows
How are we handling YAML security problems?- http://ruby-doc.org/stdlib-2.1.9/libdoc/yaml/rdoc/YAML.html: "Do not use YAML to load untrusted data"
- https://github.com/dtao/safe_yaml
Is there some reason for the special treatment of whitespace-only strings? " \t ".blank?
is true, but " \t "
is not valid YAML according to YAML.load, so this exemption seems to let invalid YAML into the database. Can we allow nil, and let YAML.load decide everything else?
I don't think after_save is the right hook for the name/description sync -- the current code doesn't write the synchronized name/description to the database unless the caller hits "save" a second time. Suggest inserting "w.reload" after each "w.update_attributes!" in the unit test to expose this.
In set_name_and_description:- Catching all exceptions and reporting them to the client isn't right. Instead, please check for error cases and return appropriate messages. (I can't tell which exceptions we're supposed to expect here...)
changes = self.changes
is mysterious. Why not just useself.changes.include?(a)
?need_save = false
is superfluous.self.workflow_was.blank?
should be.nil?
- I suppose it works as
[]
, but I thinkold_wf = {}
would be a bit clearer, given that we expect the YAML to express a hash when it is present.
The "valid yaml" fixture should use some more realistic YAML, e.g., "name: foo\ndescription: bar\n"
Please add a unit test that creates a workflow with YAML that encodes something other than a hash, e.g., "foo" or "- foo\n- bar\n"
Updated by Radhika Chippada over 8 years ago
How are we handling YAML security problems ...
Added 'safe_yaml' gem. According to the safe_yaml documentation, the default mode is 'safe' and hence didn't do anything else
Is there some reason for the special treatment of whitespace-only strings? " \t ".blank? is true, but " \t " is not valid YAML according to YAML.load, so this exemption seems to let invalid YAML into the database. Can we allow nil, and let YAML.load decide everything else?
Changed it accordingly
I don't think after_save is the right hook for the name/description sync -- the current code doesn't write the synchronized name/description to the database unless the caller hits "save" a second time. Suggest inserting "w.reload" after each "w.update_attributes!" in the unit test to expose this.
You are right. After verifying the test failures with w.reload, changed from after_save to before_save
In set_name_and_description:
Catching all exceptions and reporting them to the client isn't right. Instead, please check for error cases and return appropriate messages. (I can't tell which exceptions we're supposed to expect here...)
Since it appears that workflow_was yaml validation is the only possible exception in this case, updated to cache only this error
changes = self.changes is mysterious. Why not just use self.changes.include?(a)?
Updated
need_save = false is superfluous.
Yes. Removed
self.workflow_was.blank? should be .nil?
Updated
I suppose it works as [], but I think old_wf = {} would be a bit clearer, given that we expect the YAML to express a hash when it is present.
Updated
The "valid yaml" fixture should use some more realistic YAML, e.g., "name: foo\ndescription: bar\n"
Updated
Please add a unit test that creates a workflow with YAML that encodes something other than a hash, e.g., "foo" or "- foo\n- bar\n"
Added this test. Also added additional test assertions to check name and description when workflow yaml is unset.
Updated by Tom Clegg over 8 years ago
Hmm, does YAML.load actually raise ActiveRecord::RecordInvalid? Whatever it raises, the "rescue" should only cover the YAML.load line, not the whole function.
Also, if the old YAML is invalid we should probably just leave name and description alone and return true. Raising an error seems nice because it acknowledges the error condition, but it also makes it impossible to fix the broken item: "can't update record because something is wrong with the old version"...
The "is not valid yaml" error message probably shouldn't include the whole (non)YAML string -- that could be rather large. It would be nice to include the error provided by the YAML parser, though.
Errors should refer to the client-provided parameter name, not the name of the function doing the check. E.g.:
-errors.add :validate_workflow, "#{...} is not valid yaml"
+errors.add :workflow, "is not valid yaml: #{ ...error detail... }"
Updated by Radhika Chippada over 8 years ago
Addressed those comments at 35806ff
Updated by Radhika Chippada over 8 years ago
Tom: before we wrap this up, can you also please take a look at my note-6 on #9767-6 and let me know if we need to add any more info to the workflow objects?
Updated by Radhika Chippada over 8 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|commit:2755c8e5060fc5f84c91e089b43d5019c84cd93b.
Updated by Radhika Chippada over 8 years ago
The addition of safe_yaml gem requires the addition of an initializer to set deserialize_symbols: https://github.com/dtao/safe_yaml
SafeYAML::OPTIONS[:deserialize_symbols] = true
Updated by Radhika Chippada over 8 years ago
Deleted the initializer and instead update arvados_model -> recursive_stringify to convert ":foo" to "foo". Also, need an extra check to omit this for "::1" which was used by create_superuser_token.rb as the created_by_ip_address. (It appears that it might be desirable if recursive_stringify and has_symbols? in arvados_model.rb omitted checking a "value" of a "key" when the value is not map instead.)