Story #9180

[PySDK] Avoid overreplication in KeepClient

Added by Brett Smith almost 5 years ago. Updated almost 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
06/08/2016
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
-

Description

When you upload data to Keep in the Python KeepClient, it starts a thread for each Keep service to do the uploads. It lets N threads run at a time, where N is the desired replication of the block, until N of those threads succeeds, or it has to give up trying.

This can frequently lead to situations where KeepClient overreplicates when things are running smoothly. Imagine a client uploads a block with desired replication 2 and 4 Keep services available. This can happen:

Start upload thread 1
Start upload thread 2
Upload thread 1 succeeds
Start upload thread 3
Upload thread 2 succeeds
Don't start any more threads, because there have been two successes
Upload thread 3 succeeds

Adjust the Python SDK logic so it doesn't let more threads run than are necessary to achieve the desired replication. The simplest possible change would be to adjust the thread limiter so that, rather than simply allowing N threads to run, it only lets (N - successes achieved) threads run.


Subtasks

Task #9387: Review 9180-avoid-overreplication-keepclientResolvedLucas Di Pentima


Related issues

Related to Arvados - Story #9446: [SDK] Refactor keep parallel write strategyResolved06/27/2016

Associated revisions

Revision 4aaa1f6a (diff)
Added by Lucas Di Pentima almost 5 years ago

9180: Changed some of the logic on ThreadLimiter and made unit tests to validate the new behaviour
refs #9180

Revision 22e066d4
Added by Lucas Di Pentima almost 5 years ago

Merge branch '9180-avoid-overreplication-keepclient'

Closes #9180

History

#1 Updated by Brett Smith almost 5 years ago

  • Target version set to Arvados Future Sprints

#2 Updated by Radhika Chippada almost 5 years ago

  • Assigned To set to Lucas Di Pentima

#3 Updated by Radhika Chippada almost 5 years ago

  • Target version changed from Arvados Future Sprints to 2016-06-22 sprint

#4 Updated by Lucas Di Pentima almost 5 years ago

  • Status changed from New to In Progress

#5 Updated by Tom Clegg almost 5 years ago

(Notes from offline discussion)

The fix might be as easy as skipping self._todo_lock.release() in ThreadLimiter.__exit__ if save_response() was called with replicas_stored > 0

Unit-testing ThreadLimiter directly might be easier than (and preferable to) testing whether it's having the desired effect on KeepClient.

#6 Updated by Lucas Di Pentima almost 5 years ago

Commit 4aaa1f6

Wrote a couple unit tests to reproduce the behaviour testing ThreadLimiter directly as Tom suggested (thanks!), and proposed a slight change in its logic. It was no enough to avoid releasing the lock when a thread ended successfully because on total success the code locked, so I added an additional counter to keep track of failed threads and check if the wanted replication is achieved at exit()

#7 Updated by Peter Amstutz almost 5 years ago

Looks good to me, please merge.

#8 Updated by Lucas Di Pentima almost 5 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:22e066d43cb09beb21c4cd5b12a787b81a00f97e.

Also available in: Atom PDF