Story #13430

[arv-put] [Python] Allow caller to specify storage classes when writing data to Keep

Added by Tom Morris over 2 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
05/22/2018
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
2.0
Release:
Release relationship:
Auto

Description

Examples
  • Command line (arv-put --storage-classes=archive ...)
  • Create a new empty Collection and set desired storage classes (default being "default")
  • Load an existing Collection, write some new data, and save
  • Load an existing Collection, change its desired storage classes, write some new data, and save

If the caller asks for multiple storage classes, error out early, instead of writing all the data and then hitting an error when saving the collection record.


Subtasks

Task #13464: Review 13430-allow-storage-classesResolvedFuad Muhic


Related issues

Related to Arvados - Feature #13382: [keepstore] Write new blocks to appropriate storage classNew

Blocks Arvados - Feature #11184: [Keep] Support multiple storage classesIn Progress

Associated revisions

Revision 215fe1cc
Added by Fuad Muhic over 2 years ago

Merge branch '13430-allow-storage-classes'

refs #13430

Arvados-DCO-1.1-Signed-off-by: Fuad Muhic <>

History

#1 Updated by Tom Morris over 2 years ago

  • Related to Feature #11184: [Keep] Support multiple storage classes added

#2 Updated by Tom Morris over 2 years ago

  • Target version set to To Be Groomed

#3 Updated by Tom Clegg over 2 years ago

  • Subject changed from arv-put supports storage tiers to [arv-put] [Python] Allow caller to specify storage classes when writing data to Keep
  • Description updated (diff)

#4 Updated by Tom Clegg over 2 years ago

  • Related to deleted (Feature #11184: [Keep] Support multiple storage classes)

#5 Updated by Tom Clegg over 2 years ago

  • Blocks Feature #11184: [Keep] Support multiple storage classes added

#6 Updated by Tom Clegg over 2 years ago

  • Description updated (diff)

#7 Updated by Peter Amstutz over 2 years ago

  • Related to Feature #13382: [keepstore] Write new blocks to appropriate storage class added

#8 Updated by Tom Morris over 2 years ago

  • Story points set to 2.0

#9 Updated by Tom Morris over 2 years ago

  • Target version changed from To Be Groomed to Arvados Future Sprints

#10 Updated by Tom Morris over 2 years ago

  • Target version changed from Arvados Future Sprints to 2018-05-23 Sprint

#11 Updated by Peter Amstutz over 2 years ago

  • Assigned To set to Fuad Muhic

#12 Updated by Fuad Muhic over 2 years ago

  • Status changed from New to In Progress

#13 Updated by Fuad Muhic over 2 years ago

Quick notes:

Storage classes are separated with commas instead of space:
arv-put --storage-classes foo,bar,baz ./myFile.txt
If specifying them as space separated list of strings is better I can change it.
(In that case storage classes will have to be specified as last parameter)

Updating storage_classes_desired field always happens at the end when collection is saved (by calling save and save_new methods) which means only one API call is made.
That means that running following commands one after another:

arv-put --update-collection uuid --storage-classes hot ./myFile.txt
arv-put --update-collection uuid --storage-classes cold ./myFile.txt

will not change collections storage_classes_desired to cold, if myFile.txt didn't change, because manifest didn't change and Collection's save method will not make API call. (If this is not what we want I can change it but in that case updating collection while specifying storage classes will usually require 2 API calls.)

#14 Updated by Lucas Di Pentima over 2 years ago

  • Non-related: I think you should set up your git so that your Veritas account appears on the commits.
  • Maybe some Python SDK tests are needed to cover the additions on save() and save_new() methods.
  • Adding docstring documentation for storage_classes on both methods would also be helpful
  • If storage_classes is not None on save() & save_new(), maybe it would be convenient to validate that a list of strings was passed and error out early, similar to what the story ask on the multiple storage case.
  • Regarding updating the storage class when updating a collection that didn’t change: Maybe we can add a new behavior on the save() method that updates the collection whenever the storage_classes argument is not None, even if self.committed() == True.

#15 Updated by Lucas Di Pentima over 2 years ago

Reviewing 287d269fa0ea5740715bfa6d3e93b44bea3b8584

Test run: https://ci.curoverse.com/job/developer-run-tests/722/

  • storage_classes validation on sdk/python/arvados/collection.py lines 1474 & 1487 could be condensed to just one check at the beginning of the save() method.
  • Would also be nice to have a test that proves that saving a committed collection updates the storage_classes_desired attribute.
  • With that, it LGTM.

#16 Updated by Fuad Muhic over 2 years ago

  • Status changed from In Progress to Resolved

#17 Updated by Tom Morris over 2 years ago

  • Release set to 13

Also available in: Atom PDF