Project

General

Profile

Actions

Idea #17465

closed

Support writing blocks to correct storage classes in Python SDK

Added by Peter Amstutz about 3 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
SDKs
Target version:
Story points:
-
Release relationship:
Auto

Description

Depends on #13382

The client is responsible for keeping track of how many replicas have been written for each storage class. If one storage class is satisfied but not another, the satisfied one can be removed from the X-Keep-Storage-Classes list. The overall write is a success if all storage classes were satisfied, and a failure if any of the storage classes could not be satisfied.

Add as an option to Python keepclient and Collection class.

"We are not expecting to skip servers, we just need to send the storage classes and know when we are done"

Keepclient code in Go and Python SDK support:

Setting storage classes to use when uploading a block
Passing storage classes in the X-Keep-Storage-Classes header

The client is responsible for keeping track of how many replicas have been written for each storage class. If one storage class is satisfied but not another, the satisfied one can be removed from the X-Keep-Storage-Classes list. The overall write is a success if all storage classes were satisfied, and a failure if any of the storage classes could not be satisfied.

  • Collection instance knows the storage classes
  • Storage classes are passed to the block manager instance
  • Block manager passes storage classes to the put() call via a new optional parameter

Subtasks 1 (0 open1 closed)

Task #17511: Review 17465-pysdk-storage-classes-supportResolvedLucas Di Pentima06/01/2021Actions

Related issues

Related to Arvados Epics - Idea #16107: Storage classesResolved03/01/202109/30/2021Actions
Related to Arvados - Feature #17351: [arv-put] Storage classesResolvedLucas Di Pentima06/03/2021Actions
Blocked by Arvados - Feature #13382: [keepstore] Write new blocks to appropriate storage classResolvedTom Clegg04/02/2021Actions
Blocks Arvados - Feature #17572: arv-mount understands storage classesResolvedLucas Di Pentima06/21/2021Actions
Actions #1

Updated by Peter Amstutz about 3 years ago

Actions #2

Updated by Peter Amstutz about 3 years ago

  • Description updated (diff)
Actions #3

Updated by Peter Amstutz about 3 years ago

  • Description updated (diff)
Actions #4

Updated by Peter Amstutz about 3 years ago

  • Target version set to 2021-04-14 sprint
Actions #5

Updated by Peter Amstutz about 3 years ago

  • Category set to SDKs
Actions #6

Updated by Lucas Di Pentima about 3 years ago

  • Assigned To set to Lucas Di Pentima
Actions #7

Updated by Peter Amstutz about 3 years ago

  • Target version changed from 2021-04-14 sprint to 2021-05-12 sprint
  • Assigned To deleted (Lucas Di Pentima)
Actions #8

Updated by Lucas Di Pentima about 3 years ago

  • Blocked by Feature #13382: [keepstore] Write new blocks to appropriate storage class added
Actions #9

Updated by Peter Amstutz almost 3 years ago

Actions #10

Updated by Peter Amstutz almost 3 years ago

  • Description updated (diff)
Actions #11

Updated by Peter Amstutz almost 3 years ago

  • Assigned To set to Lucas Di Pentima
Actions #12

Updated by Lucas Di Pentima almost 3 years ago

  • Target version changed from 2021-05-12 sprint to 2021-05-26 sprint
Actions #13

Updated by Lucas Di Pentima almost 3 years ago

  • Status changed from New to In Progress
Actions #14

Updated by Lucas Di Pentima almost 3 years ago

  • Target version changed from 2021-05-26 sprint to 2021-06-09 sprint
Actions #15

Updated by Lucas Di Pentima almost 3 years ago

Status update:

Updates at a641967 - branch 17465-pysdk-storage-classes-support
Test run: developer-run-tests: #2499

  • KeepClient.put() supports storage classes.
  • When talking to a older cluster, it falls back to the previous behavior, only checking for the number of copies.
  • Added tests.

Note: Some of the new tests make KeepWriterQueue to freeze. I'm not quite sure what's going on yet, but I avoided the problem my adding 1 to the number of tries (see line 574 at sdk/python/arvados/keep.py)

Pending: Add storage classes support to arvados.Collection

Actions #16

Updated by Lucas Di Pentima almost 3 years ago

Updates at aeb518534
Test run: developer-run-tests: #2500

  • Adds KeepClient storage classes support to Collection class through BlockManager.
  • Collections being loaded from a manifest locator now also load their storage_classes_desired attribute.
  • Added tests.
Actions #17

Updated by Lucas Di Pentima almost 3 years ago

While reading through arv-put code to see how to tackle #17351, I'm realizing I did something wrong in regards to the precedence of the storage classes being used when calling Collection.save() or Collection.save_new() and passing storage classes there: The current proposal disregards those passed arguments if the Collection instance was created with other storage classes (something that can happen when the code is called with those arguments or when loading a Collection from the API with storage classes desired attributes already set).

I'll fix that and update the ticket when ready.

Actions #18

Updated by Lucas Di Pentima almost 3 years ago

Updates at 5ab34436e
Test run: developer-run-tests: #2507

  • Makes Collection.save() and Collection.save_new() to set the storage classes desired data as a Collection instance variable, so it can be overwritten if already set at instantiation time.
  • Makes BlockManager to access storage classes desired data from its Collection instead of having an internal copy of the same information. This will allow to immediately start writing blocks on a different set of classes if requested.
  • Adds one more test.
Actions #19

Updated by Tom Clegg almost 3 years ago

I'm not sure this is a great solution, but I think it explains/avoids the deadlock you were hitting, where pending_tries reaches 0 because #copies reaches the target, but there are still unsatisfied classes. With this patch, I found tests pass but generate a lot of noise from a queue.Empty exception in testutil -- ISTR the test strategy involves adding extra mocked keep responses in situations like this.

diff --git a/sdk/python/arvados/keep.py b/sdk/python/arvados/keep.py
index 2f20132ae..8fe05f347 100644
--- a/sdk/python/arvados/keep.py
+++ b/sdk/python/arvados/keep.py
@@ -571,7 +571,7 @@ class KeepClient(object):
             self.response = None
             self.storage_classes_tracking = True
             self.queue_data_lock = threading.Lock()
-            self.pending_tries = max(copies, len(classes))+1
+            self.pending_tries = max(copies, len(classes))
             self.pending_tries_notification = threading.Condition()

         def write_success(self, response, replicas_nr, classes_confirmed):
@@ -585,6 +585,11 @@ class KeepClient(object):
                             self.confirmed_storage_classes[st_class] += st_copies
                         except KeyError:
                             self.confirmed_storage_classes[st_class] = st_copies
+                    pending_classes = 0
+                    for st_class in self.wanted_storage_classes:
+                        if st_class not in self.confirmed_storage_classes or self.confirmed_storage_classes[st_class] < self.wanted_copies:
+                            pending_classes += 1
+                    self.pending_tries = max(self.wanted_copies - self.successful_copies, pending_classes)
                 self.response = response
             with self.pending_tries_notification:
                 self.pending_tries_notification.notify_all()

source:sdk/python/arvados/collection.py L1427 ("classes = ...") seems to be unused/unneeded:

        if self._block_manager is None:
            copies = (self.replication_desired or
                      self._my_api()._rootDesc.get('defaultCollectionReplication',
                                                   2))
            classes = self.storage_classes_desired or []
            self._block_manager = _BlockManager(self._my_keep(), copies=copies, put_threads=self.put_threads, num_retries=self.num_retries, storage_classes_func=self.storage_classes_desired)

source:sdk/python/arvados/keep.py L1256 seems like it will print this warning once per block on a big upload. Maybe use an "already warned" flag on the KeepClient object or something like that?

                _logger.warning("X-Keep-Storage-Classes header not supported by the cluster")
Actions #20

Updated by Tom Clegg almost 3 years ago

Actions #21

Updated by Lucas Di Pentima almost 3 years ago

Thanks for your suggestion, I applied it with some tweaks to avoid code duplication.

Updates at 205aecd08
Test run: developer-run-tests: #2509

  • Applied suggested fix to recompute pending tries.
  • Avoids a "unsupported feature" warning logging for every block upload.
  • Fixed tests.
Actions #22

Updated by Tom Clegg almost 3 years ago

205aecd08 LGTM, thanks!

Actions #23

Updated by Lucas Di Pentima almost 3 years ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Resolved
Actions #24

Updated by Peter Amstutz over 2 years ago

  • Release set to 42
Actions

Also available in: Atom PDF