Story #9684

[Crunch2] [API] Add Workflow resource

Added by Brett Smith over 4 years ago. Updated over 4 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Radhika Chippada
Category:
API
Target version:
Start date:
08/02/2016
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
2.0

Description

Overview

Workflows serve the same purpose for Crunch v2 that PipelineTemplates do for Crunch v1. They store a parameterized workflow, written in CWL. Clients like Workbench and arvados-cwl-runner can let users browse Workflows, or run a specific Workflow with specified inputs.

Much like PipelineInstances include an instantiated copy of their PipelineTemplate, clients that want to run Workflows are expected to copy the entire workflow text into the right place (e.g., arvados-cwl-runner will make it a parameter of a ContainerRequest; crunch-run will mount that parameter as text inside the running Container).

Functional Requirements

Add a Workflow resource to the API server. It supports all the same fields and methods as PipelineTemplate, except that instead of a components attribute, it has a workflow attribute. This is a text column, intended to store a CWL workflow in its native YAML representation.

The Workflow has a validation that the workflow attribute is either "blank" (nil or empty) or is valid YAML. [TODO: Is there an easy way to check that it's valid CWL?] A Workflow with workflow text that is not good YAML is invalid.

The name and description attributes are set automatically from the CWL's label and doc fields, respectively, unless explicitly overridden by the client.

  • On create, if either attribute is unset, if the workflow CWL includes the corresponding field, set our attribute from that value.
  • On update, if not attribute_changed?, and attribute_was matches the value from workflow_was, set our attribute from the corresponding field in the workflow CWL (including setting the attribute to nil if the field is no longer set in the workflow CWL).
    Yes, this means that if a client manually sets the Workflow's name to match the label in the CWL, then updates the CWL, the name can be updated as well. This is a smidgen less than ideal, but we consider it worth the tradeoff for a simple implementation, and not too confusing anyway.

Update the API documentation to include information about the Workflow resource and methods. It should mention that workflow should be CWL YAML, and the auto-update behavior of the name and description attributes.


Subtasks

Task #9720: Review branch 9684-workflowsResolvedRadhika Chippada


Related issues

Related to Arvados - Bug #9826: [API] crunch-dispatch crash on startResolved08/22/2016

Associated revisions

Revision 2755c8e5
Added by Radhika Chippada over 4 years ago

closes #9684
Merge branch '9684-workflows'

Revision d4bb1f8a
Added by Tom Clegg over 4 years ago

Merge branch '9684-workflows'

refs #9684

History

#1 Updated by Tom Morris over 4 years ago

  • Target version set to Arvados Future Sprints
  • Story points set to 2.0

#2 Updated by Brett Smith over 4 years ago

  • Description updated (diff)

#3 Updated by Brett Smith over 4 years ago

Updated with a little background from discussion yesterday, plus the specific names of the CWL fields we should be looking at.

#4 Updated by Peter Amstutz over 4 years ago

Suggest slightly different design:

Add "inputs" field which is the inputs section copied from the CWL file.
The "workflow" field is a keep file reference to the workflow file in the form abcd+123/workflow.cwl

#5 Updated by Radhika Chippada over 4 years ago

  • Target version changed from Arvados Future Sprints to 2016-08-17 sprint

#6 Updated by Radhika Chippada over 4 years ago

  • Assigned To set to Radhika Chippada

#7 Updated by Radhika Chippada over 4 years ago

  • Status changed from New to In Progress

#8 Updated by Tom Clegg over 4 years ago

at 3723f69 on 9684-workflows

How are we handling YAML security problems?

Is there some reason for the special treatment of whitespace-only strings? " \t ".blank? is true, but " \t " is not valid YAML according to YAML.load, so this exemption seems to let invalid YAML into the database. Can we allow nil, and let YAML.load decide everything else?

I don't think after_save is the right hook for the name/description sync -- the current code doesn't write the synchronized name/description to the database unless the caller hits "save" a second time. Suggest inserting "w.reload" after each "w.update_attributes!" in the unit test to expose this.

In set_name_and_description:
  • Catching all exceptions and reporting them to the client isn't right. Instead, please check for error cases and return appropriate messages. (I can't tell which exceptions we're supposed to expect here...)
  • changes = self.changes is mysterious. Why not just use self.changes.include?(a)?
  • need_save = false is superfluous.
  • self.workflow_was.blank? should be .nil?
  • I suppose it works as [], but I think old_wf = {} would be a bit clearer, given that we expect the YAML to express a hash when it is present.

The "valid yaml" fixture should use some more realistic YAML, e.g., "name: foo\ndescription: bar\n"

Please add a unit test that creates a workflow with YAML that encodes something other than a hash, e.g., "foo" or "- foo\n- bar\n"

#9 Updated by Radhika Chippada over 4 years ago

How are we handling YAML security problems ...

Added 'safe_yaml' gem. According to the safe_yaml documentation, the default mode is 'safe' and hence didn't do anything else

Is there some reason for the special treatment of whitespace-only strings? " \t ".blank? is true, but " \t " is not valid YAML according to YAML.load, so this exemption seems to let invalid YAML into the database. Can we allow nil, and let YAML.load decide everything else?

Changed it accordingly

I don't think after_save is the right hook for the name/description sync -- the current code doesn't write the synchronized name/description to the database unless the caller hits "save" a second time. Suggest inserting "w.reload" after each "w.update_attributes!" in the unit test to expose this.

You are right. After verifying the test failures with w.reload, changed from after_save to before_save

In set_name_and_description:

Catching all exceptions and reporting them to the client isn't right. Instead, please check for error cases and return appropriate messages. (I can't tell which exceptions we're supposed to expect here...)

Since it appears that workflow_was yaml validation is the only possible exception in this case, updated to cache only this error

changes = self.changes is mysterious. Why not just use self.changes.include?(a)?

Updated

need_save = false is superfluous.

Yes. Removed

self.workflow_was.blank? should be .nil?

Updated

I suppose it works as [], but I think old_wf = {} would be a bit clearer, given that we expect the YAML to express a hash when it is present.

Updated

The "valid yaml" fixture should use some more realistic YAML, e.g., "name: foo\ndescription: bar\n"

Updated

Please add a unit test that creates a workflow with YAML that encodes something other than a hash, e.g., "foo" or "- foo\n- bar\n"

Added this test. Also added additional test assertions to check name and description when workflow yaml is unset.

#10 Updated by Tom Clegg over 4 years ago

Hmm, does YAML.load actually raise ActiveRecord::RecordInvalid? Whatever it raises, the "rescue" should only cover the YAML.load line, not the whole function.

Also, if the old YAML is invalid we should probably just leave name and description alone and return true. Raising an error seems nice because it acknowledges the error condition, but it also makes it impossible to fix the broken item: "can't update record because something is wrong with the old version"...

The "is not valid yaml" error message probably shouldn't include the whole (non)YAML string -- that could be rather large. It would be nice to include the error provided by the YAML parser, though.

Errors should refer to the client-provided parameter name, not the name of the function doing the check. E.g.:

-errors.add :validate_workflow, "#{...} is not valid yaml" 
+errors.add :workflow, "is not valid yaml: #{ ...error detail... }" 

#11 Updated by Radhika Chippada over 4 years ago

Addressed those comments at 35806ff

#12 Updated by Radhika Chippada over 4 years ago

Tom: before we wrap this up, can you also please take a look at my note-6 on #9767-6 and let me know if we need to add any more info to the workflow objects?

#13 Updated by Tom Clegg over 4 years ago

LGTM, thanks!

#14 Updated by Radhika Chippada over 4 years ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados|commit:2755c8e5060fc5f84c91e089b43d5019c84cd93b.

#15 Updated by Radhika Chippada over 4 years ago

The addition of safe_yaml gem requires the addition of an initializer to set deserialize_symbols: https://github.com/dtao/safe_yaml

SafeYAML::OPTIONS[:deserialize_symbols] = true

#16 Updated by Radhika Chippada over 4 years ago

Deleted the initializer and instead update arvados_model -> recursive_stringify to convert ":foo" to "foo". Also, need an extra check to omit this for "::1" which was used by create_superuser_token.rb as the created_by_ip_address. (It appears that it might be desirable if recursive_stringify and has_symbols? in arvados_model.rb omitted checking a "value" of a "key" when the value is not map instead.)

Also available in: Atom PDF