Story #12708

[keep-balance] Move blocks to satisfy storage_classes_desired

Added by Tom Clegg almost 3 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Keep
Target version:
Start date:
04/19/2018
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
0.5
Release:
Release relationship:
Auto

Description

keep-balance needs to take all relevant collections' storage_classes_desired fields into account when deciding to pull/trash replicas of a given block.

When sending pull requests, keep-balance should supply (in "mount_uuid") the UUID of a mount that offers the appropriate storage classes.

keep-balance's log should indicate whether all desired storage classes are achieved. (TBD: how much detail / how should details be reported?)

keep-balance's log should indicate whether all desired storage classes are achievable.

Prerequisites:
  • #7931: keep-balance using the /mounts/{uuid}/blocks API instead of the server-wide index API
  • #11644: keepstore supporting a new "destination mount" field in pull and trash list entries
  • Add `json:"mount_uuid"` tag to the MountUUID field added to PullRequest in keepstore in #11644
Examples:
  • replication_desired=2, storage_classes_desired=["foo","bar"] means at least 2 replicas should be stored on volumes that offer both "foo" and "bar" classes.
  • If a block appears in collection A (replication=2, classes=[foo,bar]) and collection B (replication=3, classes=[default]) then any of the following scenarios are accepted:
    • 2 replicas on 2 volumes with classes=[foo,bar] plus 3 replicas on 3 volumes with classes=[default]
    • 3 replicas on 3 volumes with classes=[foo,bar,default]
    • 2 replicas on 2 volumes with classes=[foo,bar,default] plus 1 replica on a volume with classes=[default]

Subtasks

Task #13295: Review 12708-balance-storage-classesResolvedPeter Amstutz

Task #13415: Report replicas/bytes desired/existing in each storage classResolvedTom Clegg

Task #13483: Review 12708-report-storage-classesResolvedTom Clegg


Related issues

Blocks Arvados - Feature #11184: [Keep] Support multiple storage classesIn Progress

Blocked by Arvados - Story #12707: [API] Add columns for desired/actual storage classes for each collectionResolved02/22/2018

Associated revisions

Revision 932e3d6e
Added by Tom Clegg over 2 years ago

Merge branch '12708-balance-storage-classes'

refs #12708

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

Revision a0f18645
Added by Tom Clegg over 2 years ago

Merge branch '12708-report-storage-classes'

refs #12708

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

History

#1 Updated by Tom Clegg almost 3 years ago

  • Blocks Feature #11184: [Keep] Support multiple storage classes added

#2 Updated by Tom Clegg almost 3 years ago

  • Blocked by Story #12707: [API] Add columns for desired/actual storage classes for each collection added

#3 Updated by Tom Clegg almost 3 years ago

  • Tracker changed from Feature to Task

#4 Updated by Tom Clegg almost 3 years ago

  • Tracker changed from Task to Story
  • Story points set to 2.0

#5 Updated by Tom Clegg almost 3 years ago

  • Description updated (diff)
  • Category set to Keep

#6 Updated by Tom Clegg over 2 years ago

  • Description updated (diff)

#7 Updated by Tom Morris over 2 years ago

  • Target version changed from To Be Groomed to Arvados Future Sprints

#8 Updated by Tom Clegg over 2 years ago

  • Description updated (diff)

#9 Updated by Tom Morris over 2 years ago

  • Target version changed from Arvados Future Sprints to 2018-04-11 Sprint

#10 Updated by Tom Clegg over 2 years ago

  • Assigned To set to Tom Clegg

#11 Updated by Tom Clegg over 2 years ago

  • Status changed from New to In Progress

#12 Updated by Tom Clegg over 2 years ago

  • Target version changed from 2018-04-11 Sprint to 2018-04-25 Sprint

#13 Updated by Tom Clegg over 2 years ago

12708-balance-storage-classes @ 3ce3d9f884aed70cc84155554defe614a1bcfaaa

https://ci.curoverse.com/job/developer-run-tests/689/

These parts are done:
  • keep-balance needs to take all relevant collections' storage_classes_desired fields into account when deciding to pull/trash replicas of a given block.
  • When sending pull requests, keep-balance should supply (in "mount_uuid") the UUID of a mount that offers the appropriate storage classes.
Not done:
  • keep-balance's log should indicate whether all desired storage classes are achieved. (TBD: how much detail / how should details be reported?) (can't really tell the difference between "no pulls because repl is OK" and "repl not OK but there's nowhere left to pull to")
  • keep-balance's log should indicate whether all desired storage classes are achievable.

Need to figure out the most useful way to report these things. Meanwhile "move to desired storage classes where possible" is working.

#14 Updated by Tom Clegg over 2 years ago

  • Target version changed from 2018-04-25 Sprint to 2018-05-09 Sprint

#15 Updated by Peter Amstutz over 2 years ago

  • func (bal *Balancer) setupCaches() {
    

nit: The purpose of this method seems to be to fill in mountsByClass but I wouldn't use the word "cache", it is more like a "lookup table" (so suggesting setupCaches() → setupLookupTables()

  • As a minor optimization, you could calculate the number of slots ahead of time, so that on each run of "balanceBlock" the slots can be allocated once with correct capacity to avoid any reallocation.
  • I see that a newly written block can't be trashed until the TTL expires. I also see that it preferentially pulls blocks to slots with the desired storage class. This means blocks written to "default" but intended for a different storage class will tend to be over-replicated for some period of time, which is counter-productive if the goal of storage classes is cost savings. Maybe the sorting should always prefer read-only slots? Ideally the SDKs will write to the intended storage class directly (#7929 and #7930).
  • We need a documentation page in the admin/ sections about using keep-balance. I suppose it would be good to merge #13282 first.
  • We discussed that a collection can specify multiple storage classes, in which case the block is supposed to go only to mounts that satisfy all the storage classes. As a simplifying assumption we're only allowing one storage class. If I squint a little bit I think I see a way to handle multiple storage classes without having to totally overhaul the "distribute to slots" algorithm, but if you have any thoughts on that, it might make a nice TODO comment for whenever we need to implement that feature.
  • I know this is not part of the ticket but I really hate the notation 0=have<want, it is not common mathematical notation and it is important for admins to clearly understand what the report is saying, could we please print something less ambiguous like (0=have)<want or have<want & have=0 (this could also be addressed with a keep-balance admin page).
    bal.logf("%s lost (0=have<want)", bal.stats.lost)
    bal.logf("%s underreplicated (0<have<want)", bal.stats.underrep)
    bal.logf("%s just right (have=want)", bal.stats.justright)
    bal.logf("%s overreplicated (have>want>0)", bal.stats.overrep)
    bal.logf("%s unreferenced (have>want=0, new)", bal.stats.unref)
    bal.logf("%s garbage (have>want=0, old)", bal.stats.garbage)
  • What's your sense of how this is all going to scale when it needs to balance ten million blocks?

#16 Updated by Peter Amstutz over 2 years ago

How would we do an integration test for this? Ideally we should demonstrate it successfully moving blocks on an actual cluster before we try and put it into production...

#17 Updated by Tom Clegg over 2 years ago

Peter Amstutz wrote:

nit: The purpose of this method seems to be to fill in mountsByClass but I wouldn't use the word "cache", it is more like a "lookup table" (so suggesting setupCaches() → setupLookupTables()

Agreed, done.

  • As a minor optimization, you could calculate the number of slots ahead of time, so that on each run of "balanceBlock" the slots can be allocated once with correct capacity to avoid any reallocation.

Done.

  • I see that a newly written block can't be trashed until the TTL expires. I also see that it preferentially pulls blocks to slots with the desired storage class. This means blocks written to "default" but intended for a different storage class will tend to be over-replicated for some period of time, which is counter-productive if the goal of storage classes is cost savings.

"Everything is written to default, and later moved to cheaper storage" is the first use case we want to address, and in that case temporary over-replication is inevitable. Aside from that, no amount of client-side support would have been particularly useful without keep-balance support anyway -- keep-balance would just end up unbalancing whatever the client managed to achieve.

Maybe the sorting should always prefer read-only slots?

It does prefer read-only slots if they satisfy the requirements. Do you mean it should prefer read-only slots that don't satisfy the requirements? That would be an optimization if there was a strict ordering of classes. But I don't think it would work if you had classes like "more-private" and "more-durable": "you asked me to put a copy in more-durable, but we already have a copy in a readonly more-private volume, so I didn't bother making a more-durable copy".

  • We discussed that a collection can specify multiple storage classes, in which case the block is supposed to go only to mounts that satisfy all the storage classes. As a simplifying assumption we're only allowing one storage class. If I squint a little bit I think I see a way to handle multiple storage classes without having to totally overhaul the "distribute to slots" algorithm, but if you have any thoughts on that, it might make a nice TODO comment for whenever we need to implement that feature.

Added.

  • I know this is not part of the ticket but I really hate the notation 0=have<want, it is not common mathematical notation and it is important for admins to clearly understand what the report is saying, could we please print something less ambiguous like (0=have)<want or have<want & have=0 (this could also be addressed with a keep-balance admin page).

Improving the report sounds great but I'm not sure that's a good place to start. Anyway, since this isn't part of the ticket, I think we should leave it alone for now.

  • What's your sense of how this is all going to scale when it needs to balance ten million blocks?

Time is dominated by retrieving collections from API server, and I don't expect that to change here. A map[string]int is bigger than an int, so memory use will be higher. I've added a TODO for pooling Desired maps, so we can reduce memory per megablock if it becomes an issue.

(Aside: The ultimate scaling device is sharding the blocks: balance [0-7].*, then [8-f].*, etc. This makes each balancing cycle cost more CPU time, but scales to an arbitrary number of blocks with a given amount of memory per keep-balance machine.)

12708-balance-storage-classes @ 2957b917aaaefc1485e1c5293d413f0931b3030a

#18 Updated by Peter Amstutz over 2 years ago

12708-balance-storage-classes LGTM

I'd like to make an additional request for a keep-balance documentation page for the "admin" section of the docs. Among other things, it should explain to ops how to use storage classes. (Maybe also your recent adventures in "how to make sure data gets trashed")

#19 Updated by Tom Clegg over 2 years ago

  • Target version changed from 2018-05-09 Sprint to 2018-05-23 Sprint

#20 Updated by Tom Clegg over 2 years ago

  • Story points changed from 2.0 to 0.5

#22 Updated by Lucas Di Pentima over 2 years ago

I was going to ask for logging test additions, but I see that there aren’t any of those about logged stats, are they worth the effort?
Other than that, it lgtm.

#23 Updated by Tom Clegg over 2 years ago

  • Subject changed from [keep-balance] Move blocks to satisfy storage_classes_desired, and update storage_classes_confirmed* to [keep-balance] Move blocks to satisfy storage_classes_desired
  • Status changed from In Progress to Resolved

#24 Updated by Tom Morris over 2 years ago

  • Release set to 13

Also available in: Atom PDF