Story #13429

[API] [arvados-cwl-runner] Save workflow outputs to desired storage classes

Added by Tom Morris 12 months ago. Updated 9 months ago.

Status:
Closed
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
06/04/2018
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
1.0
Release:
Release relationship:
Auto

Description

arvados-cwl-runner should accept a command line option that determines the storage classes used to save the workflow output.

The minimal implementation is to update the storage_classes_desired field of the output collection. It's OK that this wastes storage resources by first writing the output data to the default class, then copying it to the desired class, and then (after blobSignatureTTL) deleting it from default.


Subtasks

Task #13524: Review 13429-cwl-runner-storage-classes-supportResolvedFuad Muhic


Related issues

Blocks Arvados - Feature #11184: [Keep] Support multiple storage classesIn Progress

Associated revisions

Revision 3fbabc1b
Added by Fuad Muhic 10 months ago

Merge branch '13429-cwl-runner-storage-classes-support'

closes #13429

Arvados-DCO-1.1-Signed-off-by: Fuad Muhic <>

History

#1 Updated by Tom Morris 12 months ago

  • Related to Feature #11184: [Keep] Support multiple storage classes added

#2 Updated by Tom Morris 12 months ago

  • Target version set to To Be Groomed

#3 Updated by Tom Clegg 12 months ago

  • Description updated (diff)
  • Subject changed from arvados-cwl-runner support storage tiers to [API] [arvados-cwl-runner] Save workflow outputs to desired storage classes

#4 Updated by Tom Clegg 12 months ago

  • Related to deleted (Feature #11184: [Keep] Support multiple storage classes)

#5 Updated by Tom Clegg 12 months ago

  • Blocks Feature #11184: [Keep] Support multiple storage classes added

#6 Updated by Tom Clegg 12 months ago

  • Description updated (diff)

#7 Updated by Tom Morris 12 months ago

  • Target version changed from To Be Groomed to Arvados Future Sprints
  • Story points set to 1.0

#8 Updated by Tom Morris 11 months ago

  • Target version changed from Arvados Future Sprints to 2018-06-06 Sprint

#9 Updated by Peter Amstutz 11 months ago

  • Assigned To set to Fuad Muhic

#10 Updated by Lucas Di Pentima 11 months ago

Reviewing 95c01d22571043d7b6405f928538204cf930d453

  • File sdk/cwl/arvados_cwl/__init__.py
    • Line 604: I think the default storage classes value could be set on the parser.add_argument() call
    • Line 606: On 13430 one of the requirements was to accept just one storage class for the moment, it that also desirable on this story?
    • Line 727: Typo on help message: 'wortkflow'
  • I think some tests are needed for cases when the required storage classes is not the default value, and to check that it errors out when passed multiple storage classes.

#11 Updated by Fuad Muhic 11 months ago

  • Status changed from New to In Progress

#12 Updated by Fuad Muhic 11 months ago

I added check and tests for multiple storage classes. I think it would be nice to have a test that checks if output collection
contains storage_classes_desired field set to proper value defined by --storage-classes argument. However I'm really not sure how to write that test, I tried but I guess I'm not experienced enough with cwl-runner.

#13 Updated by Fuad Muhic 11 months ago

  • Target version changed from 2018-06-06 Sprint to 2018-06-20 Sprint

#14 Updated by Lucas Di Pentima 11 months ago

Reviewing b01b480adb45c3bbfcfab13e343e08c16854dedc

  • File sdk/cwl/tests/test_submit.py * Line 338: capture_stdout is not being used after the main() call so I suppose it can avoided or read from it to assure that the correct error message is being printed?
  • File sdk/cwl/arvados_cwl/__init__.py * Line 789: On error I think the way to go is to log it and return 1 (as in lines 804 & 824)
  • I think a test that assures an output collection has correct correct storage_classes_desired field is not necessary as the Python SDK Collection’s test already proves that. OTOH I think having a test that confirms that default & non-default —storage-classes values are passed correctly to make_output_collection() will be very useful. I think this can be done using a similar approach as your test_error_when_multiple_storage_classes_specified test and mocking make_output_collection to check how it was called.
  • I realized that the storage classes argument is being used only when —submit is not added. When it is used, a-c-r submits a container that runs another a-c-r with all necessary params (see test at test_submit.py line 591, for example), so we’ll need a similar test that assures that the correct@ —storage-classes@ param is being passed along. I’m not sure where this needs to be added, please talk to Peter if you don’t find it.
  • Also, remember to merge master into this branch so Jenkins test don't fail because of ciso8601

#15 Updated by Fuad Muhic 10 months ago

tests: https://ci.curoverse.com/job/developer-run-tests/759/ (with merged master)

  • Added test which check that storage_classes parameters is propagated correctly to make_output_collection
  • In --submit mode, storage_classes are passes to submitted container. (I opted to pass --storage-classes parameter only if user specified if in the first place. If user don't specify --storage-classes, submitted container will not have --storage-classes=default. Everything still works correctly and we don't have to add 100+ lines to test_submit.py). I added tests for this.

#16 Updated by Lucas Di Pentima 10 months ago

This LGTM. Thanks!

#17 Updated by Fuad Muhic 10 months ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

#18 Updated by Fuad Muhic 10 months ago

  • Status changed from Resolved to Closed

#19 Updated by Tom Morris 9 months ago

  • Release set to 13

Also available in: Atom PDF