Story #17465

Support writing blocks to correct storage classes in Python SDK

Added by Peter Amstutz 7 months ago. Updated 4 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
SDKs
Target version:
Start date:
06/01/2021
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
-

Description

Depends on #13382

The client is responsible for keeping track of how many replicas have been written for each storage class. If one storage class is satisfied but not another, the satisfied one can be removed from the X-Keep-Storage-Classes list. The overall write is a success if all storage classes were satisfied, and a failure if any of the storage classes could not be satisfied.

Add as an option to Python keepclient and Collection class.

"We are not expecting to skip servers, we just need to send the storage classes and know when we are done"

Keepclient code in Go and Python SDK support:

Setting storage classes to use when uploading a block
Passing storage classes in the X-Keep-Storage-Classes header

The client is responsible for keeping track of how many replicas have been written for each storage class. If one storage class is satisfied but not another, the satisfied one can be removed from the X-Keep-Storage-Classes list. The overall write is a success if all storage classes were satisfied, and a failure if any of the storage classes could not be satisfied.

  • Collection instance knows the storage classes
  • Storage classes are passed to the block manager instance
  • Block manager passes storage classes to the put() call via a new optional parameter

Subtasks

Task #17511: Review 17465-pysdk-storage-classes-supportResolvedLucas Di Pentima


Related issues

Related to Arvados Epics - Story #16107: Storage classesIn Progress03/01/202109/30/2021

Related to Arvados - Feature #17351: [arv-put] Storage classesResolved06/03/2021

Blocked by Arvados - Feature #13382: [keepstore] Write new blocks to appropriate storage classResolved04/02/2021

Blocks Arvados - Feature #17572: arv-mount understands storage classesResolved06/21/2021

Associated revisions

Revision 8227b8a1
Added by Lucas Di Pentima 4 months ago

Merge branch '17465-pysdk-storage-classes-support'
Closes #17465

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <>

History

#1 Updated by Peter Amstutz 7 months ago

#2 Updated by Peter Amstutz 7 months ago

  • Description updated (diff)

#3 Updated by Peter Amstutz 7 months ago

  • Description updated (diff)

#4 Updated by Peter Amstutz 7 months ago

  • Target version set to 2021-04-14 sprint

#5 Updated by Peter Amstutz 7 months ago

  • Category set to SDKs

#6 Updated by Lucas Di Pentima 7 months ago

  • Assigned To set to Lucas Di Pentima

#7 Updated by Peter Amstutz 7 months ago

  • Target version changed from 2021-04-14 sprint to 2021-05-12 sprint
  • Assigned To deleted (Lucas Di Pentima)

#8 Updated by Lucas Di Pentima 7 months ago

  • Blocked by Feature #13382: [keepstore] Write new blocks to appropriate storage class added

#9 Updated by Peter Amstutz 6 months ago

#10 Updated by Peter Amstutz 6 months ago

  • Description updated (diff)

#11 Updated by Peter Amstutz 6 months ago

  • Assigned To set to Lucas Di Pentima

#12 Updated by Lucas Di Pentima 5 months ago

  • Target version changed from 2021-05-12 sprint to 2021-05-26 sprint

#13 Updated by Lucas Di Pentima 5 months ago

  • Status changed from New to In Progress

#14 Updated by Lucas Di Pentima 5 months ago

  • Target version changed from 2021-05-26 sprint to 2021-06-09 sprint

#15 Updated by Lucas Di Pentima 5 months ago

Status update:

Updates at a641967 - branch 17465-pysdk-storage-classes-support
Test run: https://ci.arvados.org/job/developer-run-tests/2499/

  • KeepClient.put() supports storage classes.
  • When talking to a older cluster, it falls back to the previous behavior, only checking for the number of copies.
  • Added tests.

Note: Some of the new tests make KeepWriterQueue to freeze. I'm not quite sure what's going on yet, but I avoided the problem my adding 1 to the number of tries (see line 574 at sdk/python/arvados/keep.py)

Pending: Add storage classes support to arvados.Collection

#16 Updated by Lucas Di Pentima 5 months ago

Updates at aeb518534
Test run: https://ci.arvados.org/job/developer-run-tests/2500/

  • Adds KeepClient storage classes support to Collection class through BlockManager.
  • Collections being loaded from a manifest locator now also load their storage_classes_desired attribute.
  • Added tests.

#17 Updated by Lucas Di Pentima 5 months ago

While reading through arv-put code to see how to tackle #17351, I'm realizing I did something wrong in regards to the precedence of the storage classes being used when calling Collection.save() or Collection.save_new() and passing storage classes there: The current proposal disregards those passed arguments if the Collection instance was created with other storage classes (something that can happen when the code is called with those arguments or when loading a Collection from the API with storage classes desired attributes already set).

I'll fix that and update the ticket when ready.

#18 Updated by Lucas Di Pentima 5 months ago

Updates at 5ab34436e
Test run: https://ci.arvados.org/job/developer-run-tests/2507/

  • Makes Collection.save() and Collection.save_new() to set the storage classes desired data as a Collection instance variable, so it can be overwritten if already set at instantiation time.
  • Makes BlockManager to access storage classes desired data from its Collection instead of having an internal copy of the same information. This will allow to immediately start writing blocks on a different set of classes if requested.
  • Adds one more test.

#19 Updated by Tom Clegg 4 months ago

I'm not sure this is a great solution, but I think it explains/avoids the deadlock you were hitting, where pending_tries reaches 0 because #copies reaches the target, but there are still unsatisfied classes. With this patch, I found tests pass but generate a lot of noise from a queue.Empty exception in testutil -- ISTR the test strategy involves adding extra mocked keep responses in situations like this.

diff --git a/sdk/python/arvados/keep.py b/sdk/python/arvados/keep.py
index 2f20132ae..8fe05f347 100644
--- a/sdk/python/arvados/keep.py
+++ b/sdk/python/arvados/keep.py
@@ -571,7 +571,7 @@ class KeepClient(object):
             self.response = None
             self.storage_classes_tracking = True
             self.queue_data_lock = threading.Lock()
-            self.pending_tries = max(copies, len(classes))+1
+            self.pending_tries = max(copies, len(classes))
             self.pending_tries_notification = threading.Condition()

         def write_success(self, response, replicas_nr, classes_confirmed):
@@ -585,6 +585,11 @@ class KeepClient(object):
                             self.confirmed_storage_classes[st_class] += st_copies
                         except KeyError:
                             self.confirmed_storage_classes[st_class] = st_copies
+                    pending_classes = 0
+                    for st_class in self.wanted_storage_classes:
+                        if st_class not in self.confirmed_storage_classes or self.confirmed_storage_classes[st_class] < self.wanted_copies:
+                            pending_classes += 1
+                    self.pending_tries = max(self.wanted_copies - self.successful_copies, pending_classes)
                 self.response = response
             with self.pending_tries_notification:
                 self.pending_tries_notification.notify_all()

source:sdk/python/arvados/collection.py L1427 ("classes = ...") seems to be unused/unneeded:

        if self._block_manager is None:
            copies = (self.replication_desired or
                      self._my_api()._rootDesc.get('defaultCollectionReplication',
                                                   2))
            classes = self.storage_classes_desired or []
            self._block_manager = _BlockManager(self._my_keep(), copies=copies, put_threads=self.put_threads, num_retries=self.num_retries, storage_classes_func=self.storage_classes_desired)

source:sdk/python/arvados/keep.py L1256 seems like it will print this warning once per block on a big upload. Maybe use an "already warned" flag on the KeepClient object or something like that?

                _logger.warning("X-Keep-Storage-Classes header not supported by the cluster")

#20 Updated by Tom Clegg 4 months ago

#21 Updated by Lucas Di Pentima 4 months ago

Thanks for your suggestion, I applied it with some tweaks to avoid code duplication.

Updates at 205aecd08
Test run: https://ci.arvados.org/job/developer-run-tests/2509/

  • Applied suggested fix to recompute pending tries.
  • Avoids a "unsupported feature" warning logging for every block upload.
  • Fixed tests.

#22 Updated by Tom Clegg 4 months ago

205aecd08 LGTM, thanks!

#23 Updated by Lucas Di Pentima 4 months ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Resolved

Also available in: Atom PDF