Project

General

Profile

Actions

Idea #13429

closed

[API] [arvados-cwl-runner] Save workflow outputs to desired storage classes

Added by Tom Morris over 6 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
06/04/2018
Due date:
Story points:
1.0
Release:
Release relationship:
Auto

Description

arvados-cwl-runner should accept a command line option that determines the storage classes used to save the workflow output.

The minimal implementation is to update the storage_classes_desired field of the output collection. It's OK that this wastes storage resources by first writing the output data to the default class, then copying it to the desired class, and then (after blobSignatureTTL) deleting it from default.


Subtasks 1 (0 open1 closed)

Task #13524: Review 13429-cwl-runner-storage-classes-supportResolvedFuad Muhic06/04/2018Actions

Related issues 1 (0 open1 closed)

Blocks Arvados - Feature #11184: [Keep] Support multiple storage classesResolvedTom MorrisActions
Actions #1

Updated by Tom Morris over 6 years ago

  • Related to Feature #11184: [Keep] Support multiple storage classes added
Actions #2

Updated by Tom Morris over 6 years ago

  • Target version set to To Be Groomed
Actions #3

Updated by Tom Clegg over 6 years ago

  • Subject changed from arvados-cwl-runner support storage tiers to [API] [arvados-cwl-runner] Save workflow outputs to desired storage classes
  • Description updated (diff)
Actions #4

Updated by Tom Clegg over 6 years ago

  • Related to deleted (Feature #11184: [Keep] Support multiple storage classes)
Actions #5

Updated by Tom Clegg over 6 years ago

  • Blocks Feature #11184: [Keep] Support multiple storage classes added
Actions #6

Updated by Tom Clegg over 6 years ago

  • Description updated (diff)
Actions #7

Updated by Tom Morris over 6 years ago

  • Target version changed from To Be Groomed to Arvados Future Sprints
  • Story points set to 1.0
Actions #8

Updated by Tom Morris over 6 years ago

  • Target version changed from Arvados Future Sprints to 2018-06-06 Sprint
Actions #9

Updated by Peter Amstutz over 6 years ago

  • Assigned To set to Fuad Muhic
Actions #10

Updated by Lucas Di Pentima over 6 years ago

Reviewing 95c01d22571043d7b6405f928538204cf930d453

  • File sdk/cwl/arvados_cwl/__init__.py
    • Line 604: I think the default storage classes value could be set on the parser.add_argument() call
    • Line 606: On 13430 one of the requirements was to accept just one storage class for the moment, it that also desirable on this story?
    • Line 727: Typo on help message: 'wortkflow'
  • I think some tests are needed for cases when the required storage classes is not the default value, and to check that it errors out when passed multiple storage classes.
Actions #11

Updated by Fuad Muhic over 6 years ago

  • Status changed from New to In Progress
Actions #12

Updated by Fuad Muhic over 6 years ago

I added check and tests for multiple storage classes. I think it would be nice to have a test that checks if output collection
contains storage_classes_desired field set to proper value defined by --storage-classes argument. However I'm really not sure how to write that test, I tried but I guess I'm not experienced enough with cwl-runner.

Actions #13

Updated by Fuad Muhic over 6 years ago

  • Target version changed from 2018-06-06 Sprint to 2018-06-20 Sprint
Actions #14

Updated by Lucas Di Pentima over 6 years ago

Reviewing b01b480adb45c3bbfcfab13e343e08c16854dedc

  • File sdk/cwl/tests/test_submit.py * Line 338: capture_stdout is not being used after the main() call so I suppose it can avoided or read from it to assure that the correct error message is being printed?
  • File sdk/cwl/arvados_cwl/__init__.py * Line 789: On error I think the way to go is to log it and return 1 (as in lines 804 & 824)
  • I think a test that assures an output collection has correct correct storage_classes_desired field is not necessary as the Python SDK Collection’s test already proves that. OTOH I think having a test that confirms that default & non-default —storage-classes values are passed correctly to make_output_collection() will be very useful. I think this can be done using a similar approach as your test_error_when_multiple_storage_classes_specified test and mocking make_output_collection to check how it was called.
  • I realized that the storage classes argument is being used only when —submit is not added. When it is used, a-c-r submits a container that runs another a-c-r with all necessary params (see test at test_submit.py line 591, for example), so we’ll need a similar test that assures that the correct@ —storage-classes@ param is being passed along. I’m not sure where this needs to be added, please talk to Peter if you don’t find it.
  • Also, remember to merge master into this branch so Jenkins test don't fail because of ciso8601
Actions #15

Updated by Fuad Muhic over 6 years ago

tests: https://ci.curoverse.com/job/developer-run-tests/759/ (with merged master)

  • Added test which check that storage_classes parameters is propagated correctly to make_output_collection
  • In --submit mode, storage_classes are passes to submitted container. (I opted to pass --storage-classes parameter only if user specified if in the first place. If user don't specify --storage-classes, submitted container will not have --storage-classes=default. Everything still works correctly and we don't have to add 100+ lines to test_submit.py). I added tests for this.
Actions #16

Updated by Lucas Di Pentima over 6 years ago

This LGTM. Thanks!

Actions #17

Updated by Fuad Muhic over 6 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100
Actions #18

Updated by Fuad Muhic over 6 years ago

  • Status changed from Resolved to Closed
Actions #19

Updated by Tom Morris over 6 years ago

  • Release set to 13
Actions

Also available in: Atom PDF