Feature #17351

closed

[arv-put] Storage classes

Added by Nico César about 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Keep
Target version:
Story points: -
Release relationship: Auto

Description

Goals of this ticket:
  • to define the command line arguments used to specify storage classes
  • to define all the expected behaviour for arv-put
  • to document this behaviour
  • to add the necessary tests to make sure we comply with this behaviour
To discuss:
  • should we migrate arv-put to Go as part of this ticket, or is that future work?

Current command line arguments (Arvados 2.1.0):

arv-put --help 
usage: arv-put [-h] [--version] [--normalize | --dry-run]
               [--as-stream | --stream | --as-manifest | --in-manifest | --manifest | --as-raw | --raw]
               [--update-collection UUID] [--use-filename FILENAME]
               [--filename FILENAME] [--portable-data-hash] [--replication N]
               [--storage-classes STORAGE_CLASSES] [--threads N]
               [--exclude PATTERN] [--follow-links | --no-follow-links]
               [--trash-at YYYY-MM-DDTHH:MM | --trash-after DAYS]
               [--project-uuid UUID] [--name NAME]
               [--progress | --no-progress | --batch-progress] [--silent]
               [--resume | --no-resume] [--cache | --no-cache]
               [--retries RETRIES]
               [path [path ...]]
(..)
  --replication N       Set the replication level for the new collection: how
                        many different physical storage devices (e.g., disks)
                        should have a copy of each data block. Default is to
                        use the server-provided default (if any) or 2.
  --storage-classes STORAGE_CLASSES
                        Specify comma separated list of storage classes to be
                        used when saving data to Keep.

base case

arv-put --replication N  --storage-classes STORAGE_CLASSES directory

Expected behaviour: ...

updating an existing collection

arv-put --replication N  --storage-classes STORAGE_CLASSES directory --update-collection zzzzz-4zz18-xxxxxxxxxxxxxxx

Expected behaviour: ...

giving conflicting options when resuming a transfer

arv-put --replication N  --storage-classes STORAGE_CLASSES directory
arv-put --replication M  --storage-classes DIFFERENT_STORAGE_CLASSES directory --resume

Expected behaviour: ...


Subtasks 1 (0 open, 1 closed)

Task #17736: Review 17351-arvput-keepclient-storage-support (Resolved, Tom Clegg, 06/03/2021)

Related issues

Related to Arvados Epics - Idea #16107: Storage classes (Resolved, 03/01/2021 to 09/30/2021)
Related to Arvados - Idea #17465: Support writing blocks to correct storage classes in Python SDK (Resolved, Lucas Di Pentima, 06/01/2021)
Actions #1

Updated by Nico César about 3 years ago

Actions #2

Updated by Nico César about 3 years ago

  • Target version set to To Be Groomed
  • Category set to Keep
Actions #3

Updated by Nico César about 3 years ago

  • Description updated (diff)
  • Subject changed from [arv-put] [and other keep clients] Storage tiers design to [arv-put] Storage tiers design
Actions #4

Updated by Nico César about 3 years ago

  • Description updated (diff)
Actions #5

Updated by Nico César about 3 years ago

  • Subject changed from [arv-put] Storage tiers design to [arv-put] Storage classes revisit
Actions #6

Updated by Nico César about 3 years ago

  • Subject changed from [arv-put] Storage classes revisit to [arv-put] Storage tiers design
Actions #7

Updated by Nico César about 3 years ago

  • Subject changed from [arv-put] Storage tiers design to [arv-put] Storage classes revisit
Actions #8

Updated by Lucas Di Pentima about 3 years ago

  • Target version changed from To Be Groomed to 2021-04-14 sprint
Actions #9

Updated by Peter Amstutz about 3 years ago

  • Target version changed from 2021-04-14 sprint to 2021-05-26 sprint
Actions #10

Updated by Peter Amstutz almost 3 years ago

  • Subject changed from [arv-put] Storage classes revisit to [arv-put] Storage classes
Actions #11

Updated by Peter Amstutz almost 3 years ago

  • Assigned To set to Lucas Di Pentima
  • Subject changed from [arv-put] Storage classes to [arv-put] Storage classes
Actions #12

Updated by Lucas Di Pentima almost 3 years ago

  • Target version changed from 2021-05-26 sprint to 2021-06-09 sprint
Actions #13

Updated by Lucas Di Pentima almost 3 years ago

  • Status changed from New to In Progress
Actions #14

Updated by Lucas Di Pentima almost 3 years ago

Updates at a0fcd46cb - branch 17351-arvput-keepclient-storage-support
Test run: developer-run-tests: #2508

  • Removes the limit of at most one storage class. (I couldn't find the reason for that limitation, which was introduced in #13430.)
  • Passes the storage classes at Collection instantiation time instead of passing them to the .save() or .save_new() methods. As a result, the Keep client used to upload files writes blocks directly to the specified classes.
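
The second point can be pictured with a toy model: when the collection object knows its storage classes from the moment it is created, every block upload can carry them. `RecordingKeepClient` and `SketchCollection` are hypothetical stand-ins written for this sketch; they are not the Arvados Python SDK classes and make no claim about its real signatures.

```python
class RecordingKeepClient:
    """Hypothetical stand-in for a Keep client; records each block write."""

    def __init__(self):
        self.writes = []

    def put(self, data, classes):
        self.writes.append((data, tuple(classes)))


class SketchCollection:
    """Hypothetical collection that receives its storage classes at creation."""

    def __init__(self, keep_client, storage_classes):
        self.keep_client = keep_client
        self.storage_classes = list(storage_classes)

    def write_block(self, data):
        # The classes are already known here, so each block is uploaded with
        # them instead of being tagged only when the record is saved later.
        self.keep_client.put(data, self.storage_classes)


keep = RecordingKeepClient()
coll = SketchCollection(keep, ["archive"])
coll.write_block(b"block-1")
coll.write_block(b"block-2")
print(keep.writes)
```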
Actions #15

Updated by Tom Clegg almost 3 years ago

  • Related to Idea #17465: Support writing blocks to correct storage classes in Python SDK added
Actions #16

Updated by Lucas Di Pentima almost 3 years ago

Rebased to the latest #17465 changes at 57a26e5
Test run: developer-run-tests: #2510

Actions #17

Updated by Lucas Di Pentima almost 3 years ago

  • Target version changed from 2021-06-09 sprint to 2021-06-23 sprint
Actions #18

Updated by Lucas Di Pentima almost 3 years ago

While working on #17572 I realized that making arv-put honor a previously created collection's desired_storage_classes field may produce surprising results for the user, for example:

1. The user creates an empty collection via the CLI tools, assigning a desired_storage_classes list with nonexistent classes.
2. The RailsAPI will be OK with that, so it gets created.
3. Then, the user executes arv-put without any --storage-classes argument but using --update-collection UUID with the previously created collection's UUID.
4. The user will get an error from arv-put because Keep returns 503 (I think) when an invalid class is specified. This happens because the command honors the storage classes set on the collection record when none are specified on the command line.
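
The fallback described in step 4 can be sketched as a small function. `effective_storage_classes` and its parameters are hypothetical names for illustration only; the actual arv-put code may structure this differently.

```python
def effective_storage_classes(cli_classes, record_classes):
    """Pick the classes to write to when updating an existing collection.

    Hypothetical sketch: command-line classes win; otherwise fall back to
    whatever the existing collection record asks for.
    """
    if cli_classes:
        return list(cli_classes)
    return list(record_classes or [])


# No --storage-classes given, so the record's (possibly invalid) classes are
# used, which is how the surprising error in step 4 can arise.
print(effective_storage_classes(None, ["nonexistent"]))
```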

If our priority is making sure that Keep writes go to the correct classes or nowhere, I think the solution would be to make RailsAPI or controller return an error when an invalid class is requested. WDYT?

Actions #19

Updated by Tom Clegg almost 3 years ago

I think the "error because existing collection has unwritable classes" outcome is acceptable. Even if we validate classes when creating/saving a collection, this same condition can happen if all volumes with a given class become read-only, temporarily unreachable, or full.

We should probably check that the error message in such cases is not too confusing, though.

Actions #20

Updated by Tom Clegg almost 3 years ago

57a26e595 LGTM, thanks

Actions #21

Updated by Lucas Di Pentima almost 3 years ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Resolved
Actions #22

Updated by Peter Amstutz over 2 years ago

  • Release set to 42