Project

General

Profile

Actions

Idea #12708

closed

[keep-balance] Move blocks to satisfy storage_classes_desired

Added by Tom Clegg over 6 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Keep
Target version:
Story points:
0.5
Release:
Release relationship:
Auto

Description

keep-balance needs to take all relevant collections' storage_classes_desired fields into account when deciding to pull/trash replicas of a given block.

When sending pull requests, keep-balance should supply (in "mount_uuid") the UUID of a mount that offers the appropriate storage classes.

keep-balance's log should indicate whether all desired storage classes are achieved. (TBD: how much detail / how should details be reported?)

keep-balance's log should indicate whether all desired storage classes are achievable.

Prerequisites:
  • #7931: keep-balance using the /mounts/{uuid}/blocks API instead of the server-wide index API
  • #11644: keepstore supporting a new "destination mount" field in pull and trash list entries
  • Add `json:"mount_uuid"` tag to the MountUUID field added to PullRequest in keepstore in #11644
Examples:
  • replication_desired=2, storage_classes_desired=["foo","bar"] means at least 2 replicas should be stored on volumes that offer both "foo" and "bar" classes.
  • If a block appears in collection A (replication=2, classes=[foo,bar]) and collection B (replication=3, classes=[default]) then any of the following scenarios are accepted:
    • 2 replicas on 2 volumes with classes=[foo,bar] plus 3 replicas on 3 volumes with classes=[default]
    • 3 replicas on 3 volumes with classes=[foo,bar,default]
    • 2 replicas on 2 volumes with classes=[foo,bar,default] plus 1 replica on a volume with classes=[default]

Subtasks 3 (0 open3 closed)

Task #13295: Review 12708-balance-storage-classesResolvedPeter Amstutz04/19/2018Actions
Task #13415: Report replicas/bytes desired/existing in each storage classResolvedTom Clegg04/19/2018Actions
Task #13483: Review 12708-report-storage-classesResolvedTom Clegg04/19/2018Actions

Related issues

Blocks Arvados - Feature #11184: [Keep] Support multiple storage classesResolvedTom MorrisActions
Blocked by Arvados - Idea #12707: [API] Add columns for desired/actual storage classes for each collectionResolvedLucas Di Pentima02/22/2018Actions
Actions #1

Updated by Tom Clegg over 6 years ago

  • Blocks Feature #11184: [Keep] Support multiple storage classes added
Actions #2

Updated by Tom Clegg over 6 years ago

  • Blocked by Idea #12707: [API] Add columns for desired/actual storage classes for each collection added
Actions #3

Updated by Tom Clegg over 6 years ago

  • Tracker changed from Feature to Task
Actions #4

Updated by Tom Clegg over 6 years ago

  • Tracker changed from Task to Idea
  • Story points set to 2.0
Actions #5

Updated by Tom Clegg over 6 years ago

  • Description updated (diff)
  • Category set to Keep
Actions #6

Updated by Tom Clegg about 6 years ago

  • Description updated (diff)
Actions #7

Updated by Tom Morris almost 6 years ago

  • Target version changed from To Be Groomed to Arvados Future Sprints
Actions #8

Updated by Tom Clegg almost 6 years ago

  • Description updated (diff)
Actions #9

Updated by Tom Morris almost 6 years ago

  • Target version changed from Arvados Future Sprints to 2018-04-11 Sprint
Actions #10

Updated by Tom Clegg almost 6 years ago

  • Assigned To set to Tom Clegg
Actions #11

Updated by Tom Clegg almost 6 years ago

  • Status changed from New to In Progress
Actions #12

Updated by Tom Clegg almost 6 years ago

  • Target version changed from 2018-04-11 Sprint to 2018-04-25 Sprint
Actions #13

Updated by Tom Clegg almost 6 years ago

12708-balance-storage-classes @ 3ce3d9f884aed70cc84155554defe614a1bcfaaa

https://ci.curoverse.com/job/developer-run-tests/689/

These parts are done:
  • keep-balance needs to take all relevant collections' storage_classes_desired fields into account when deciding to pull/trash replicas of a given block.
  • When sending pull requests, keep-balance should supply (in "mount_uuid") the UUID of a mount that offers the appropriate storage classes.
Not done:
  • keep-balance's log should indicate whether all desired storage classes are achieved. (TBD: how much detail / how should details be reported?) (can't really tell the difference between "no pulls because repl is OK" and "repl not OK but there's nowhere left to pull to")
  • keep-balance's log should indicate whether all desired storage classes are achievable.

Need to figure out the most useful way to report these things. Meanwhile "move to desired storage classes where possible" is working.

Actions #14

Updated by Tom Clegg almost 6 years ago

  • Target version changed from 2018-04-25 Sprint to 2018-05-09 Sprint
Actions #15

Updated by Peter Amstutz almost 6 years ago

  • func (bal *Balancer) setupCaches() {
    

nit: The purpose of this method seems to be to fill in mountsByClass but I wouldn't use the word "cache", it is more like a "lookup table" (so suggesting setupCaches() → setupLookupTables()

  • As a minor optimization, you could calculate the number of slots ahead of time, so that on each run of "balanceBlock" the slots can be allocated once with correct capacity to avoid any reallocation.
  • I see that a newly written block can't be trashed until the TTL expires. I also see that it preferentially pulls blocks to slots with the desired storage class. This means blocks written to "default" but intended for a different storage class will tend to be over-replicated for some period of time, which is counter-productive if the goal of storage classes is cost savings. Maybe the sorting should always prefer read-only slots? Ideally the SDKs will write to the intended storage class directly (#7929 and #7930).
  • We need a documentation page in the admin/ sections about using keep-balance. I suppose it would be good to merge #13282 first.
  • We discussed that a collection can specify multiple storage classes, in which case the block is supposed to go only to mounts that satisfy all the storage classes. As a simplifying assumption we're only allowing one storage class. If I squint a little bit I think I see a way to handle multiple storage classes without having to totally overhaul the "distribute to slots" algorithm, but if you have any thoughts on that, it might make a nice TODO comment for whenever we need to implement that feature.
  • I know this is not part of the ticket but I really hate the notation 0=have<want, it is not common mathematical notation and it is important for admins to clearly understand what the report is saying, could we please print something less ambiguous like (0=have)<want or have<want & have=0 (this could also be addressed with a keep-balance admin page).
    bal.logf("%s lost (0=have<want)", bal.stats.lost)
    bal.logf("%s underreplicated (0<have<want)", bal.stats.underrep)
    bal.logf("%s just right (have=want)", bal.stats.justright)
    bal.logf("%s overreplicated (have>want>0)", bal.stats.overrep)
    bal.logf("%s unreferenced (have>want=0, new)", bal.stats.unref)
    bal.logf("%s garbage (have>want=0, old)", bal.stats.garbage)
  • What's your sense of how this is all going to scale when it needs to balance ten million blocks?
Actions #16

Updated by Peter Amstutz almost 6 years ago

How would we do an integration test for this? Ideally we should demonstrate it successfully moving blocks on an actual cluster before we try and put it into production...

Actions #17

Updated by Tom Clegg almost 6 years ago

Peter Amstutz wrote:

nit: The purpose of this method seems to be to fill in mountsByClass but I wouldn't use the word "cache", it is more like a "lookup table" (so suggesting setupCaches() → setupLookupTables()

Agreed, done.

  • As a minor optimization, you could calculate the number of slots ahead of time, so that on each run of "balanceBlock" the slots can be allocated once with correct capacity to avoid any reallocation.

Done.

  • I see that a newly written block can't be trashed until the TTL expires. I also see that it preferentially pulls blocks to slots with the desired storage class. This means blocks written to "default" but intended for a different storage class will tend to be over-replicated for some period of time, which is counter-productive if the goal of storage classes is cost savings.

"Everything is written to default, and later moved to cheaper storage" is the first use case we want to address, and in that case temporary over-replication is inevitable. Aside from that, no amount of client-side support would have been particularly useful without keep-balance support anyway -- keep-balance would just end up unbalancing whatever the client managed to achieve.

Maybe the sorting should always prefer read-only slots?

It does prefer read-only slots if they satisfy the requirements. Do you mean it should prefer read-only slots that don't satisfy the requirements? That would be an optimization if there was a strict ordering of classes. But I don't think it would work if you had classes like "more-private" and "more-durable": "you asked me to put a copy in more-durable, but we already have a copy in a readonly more-private volume, so I didn't bother making a more-durable copy".

  • We discussed that a collection can specify multiple storage classes, in which case the block is supposed to go only to mounts that satisfy all the storage classes. As a simplifying assumption we're only allowing one storage class. If I squint a little bit I think I see a way to handle multiple storage classes without having to totally overhaul the "distribute to slots" algorithm, but if you have any thoughts on that, it might make a nice TODO comment for whenever we need to implement that feature.

Added.

  • I know this is not part of the ticket but I really hate the notation 0=have<want, it is not common mathematical notation and it is important for admins to clearly understand what the report is saying, could we please print something less ambiguous like (0=have)<want or have<want & have=0 (this could also be addressed with a keep-balance admin page).

Improving the report sounds great but I'm not sure that's a good place to start. Anyway, since this isn't part of the ticket, I think we should leave it alone for now.

  • What's your sense of how this is all going to scale when it needs to balance ten million blocks?

Time is dominated by retrieving collections from API server, and I don't expect that to change here. A map[string]int is bigger than an int, so memory use will be higher. I've added a TODO for pooling Desired maps, so we can reduce memory per megablock if it becomes an issue.

(Aside: The ultimate scaling device is sharding the blocks: balance [0-7].*, then [8-f].*, etc. This makes each balancing cycle cost more CPU time, but scales to an arbitrary number of blocks with a given amount of memory per keep-balance machine.)

12708-balance-storage-classes @ 2957b917aaaefc1485e1c5293d413f0931b3030a

Actions #18

Updated by Peter Amstutz almost 6 years ago

12708-balance-storage-classes LGTM

I'd like to make an additional request for a keep-balance documentation page for the "admin" section of the docs. Among other things, it should explain to ops how to use storage classes. (Maybe also your recent adventures in "how to make sure data gets trashed")

Actions #19

Updated by Tom Clegg almost 6 years ago

  • Target version changed from 2018-05-09 Sprint to 2018-05-23 Sprint
Actions #20

Updated by Tom Clegg almost 6 years ago

  • Story points changed from 2.0 to 0.5
Actions #22

Updated by Lucas Di Pentima almost 6 years ago

I was going to ask for logging test additions, but I see that there aren’t any of those about logged stats, are they worth the effort?
Other than that, it lgtm.

Actions #23

Updated by Tom Clegg almost 6 years ago

  • Subject changed from [keep-balance] Move blocks to satisfy storage_classes_desired, and update storage_classes_confirmed* to [keep-balance] Move blocks to satisfy storage_classes_desired
  • Status changed from In Progress to Resolved
Actions #24

Updated by Tom Morris over 5 years ago

  • Release set to 13
Actions

Also available in: Atom PDF