Idea #12708
closed[keep-balance] Move blocks to satisfy storage_classes_desired
Description
keep-balance needs to take all relevant collections' storage_classes_desired
fields into account when deciding to pull/trash replicas of a given block.
When sending pull requests, keep-balance should supply (in "mount_uuid") the UUID of a mount that offers the appropriate storage classes.
keep-balance's log should indicate whether all desired storage classes are achieved. (TBD: how much detail / how should details be reported?)
keep-balance's log should indicate whether all desired storage classes are achievable.
Prerequisites:- #7931: keep-balance using the
/mounts/{uuid}/blocks
API instead of the server-wide index API - #11644: keepstore supporting a new "destination mount" field in pull and trash list entries
- Add
`json:"mount_uuid"`
tag to the MountUUID field added to PullRequest in keepstore in #11644
- replication_desired=2, storage_classes_desired=["foo","bar"] means at least 2 replicas should be stored on volumes that offer both "foo" and "bar" classes.
- If a block appears in collection A (replication=2, classes=[foo,bar]) and collection B (replication=3, classes=[default]) then any of the following scenarios are accepted:
- 2 replicas on 2 volumes with classes=[foo,bar] plus 3 replicas on 3 volumes with classes=[default]
- 3 replicas on 3 volumes with classes=[foo,bar,default]
- 2 replicas on 2 volumes with classes=[foo,bar,default] plus 1 replica on a volume with classes=[default]
Updated by Tom Clegg about 7 years ago
- Blocks Feature #11184: [Keep] Support multiple storage classes added
Updated by Tom Clegg about 7 years ago
- Blocked by Idea #12707: [API] Add columns for desired/actual storage classes for each collection added
Updated by Tom Clegg about 7 years ago
- Tracker changed from Task to Idea
- Story points set to 2.0
Updated by Tom Clegg about 7 years ago
- Description updated (diff)
- Category set to Keep
Updated by Tom Morris almost 7 years ago
- Target version changed from To Be Groomed to Arvados Future Sprints
Updated by Tom Morris almost 7 years ago
- Target version changed from Arvados Future Sprints to 2018-04-11 Sprint
Updated by Tom Clegg almost 7 years ago
- Target version changed from 2018-04-11 Sprint to 2018-04-25 Sprint
Updated by Tom Clegg almost 7 years ago
12708-balance-storage-classes @ 3ce3d9f884aed70cc84155554defe614a1bcfaaa
https://ci.curoverse.com/job/developer-run-tests/689/
These parts are done:- keep-balance needs to take all relevant collections' storage_classes_desired fields into account when deciding to pull/trash replicas of a given block.
- When sending pull requests, keep-balance should supply (in "mount_uuid") the UUID of a mount that offers the appropriate storage classes.
- keep-balance's log should indicate whether all desired storage classes are achieved. (TBD: how much detail / how should details be reported?) (can't really tell the difference between "no pulls because repl is OK" and "repl not OK but there's nowhere left to pull to")
- keep-balance's log should indicate whether all desired storage classes are achievable.
Need to figure out the most useful way to report these things. Meanwhile "move to desired storage classes where possible" is working.
Updated by Tom Clegg over 6 years ago
- Target version changed from 2018-04-25 Sprint to 2018-05-09 Sprint
Updated by Peter Amstutz over 6 years ago
func (bal *Balancer) setupCaches() {
nit: The purpose of this method seems to be to fill in mountsByClass
but I wouldn't use the word "cache", it is more like a "lookup table" (so suggesting setupCaches() → setupLookupTables()
- As a minor optimization, you could calculate the number of slots ahead of time, so that on each run of "balanceBlock" the slots can be allocated once with correct capacity to avoid any reallocation.
- I see that a newly written block can't be trashed until the TTL expires. I also see that it preferentially pulls blocks to slots with the desired storage class. This means blocks written to "default" but intended for a different storage class will tend to be over-replicated for some period of time, which is counter-productive if the goal of storage classes is cost savings. Maybe the sorting should always prefer read-only slots? Ideally the SDKs will write to the intended storage class directly (#7929 and #7930).
- We need a documentation page in the admin/ sections about using keep-balance. I suppose it would be good to merge #13282 first.
- We discussed that a collection can specify multiple storage classes, in which case the block is supposed to go only to mounts that satisfy all the storage classes. As a simplifying assumption we're only allowing one storage class. If I squint a little bit I think I see a way to handle multiple storage classes without having to totally overhaul the "distribute to slots" algorithm, but if you have any thoughts on that, it might make a nice TODO comment for whenever we need to implement that feature.
- I know this is not part of the ticket but I really hate the notation
0=have<want
, it is not common mathematical notation and it is important for admins to clearly understand what the report is saying, could we please print something less ambiguous like(0=have)<want
orhave<want & have=0
(this could also be addressed with a keep-balance admin page).
bal.logf("%s lost (0=have<want)", bal.stats.lost) bal.logf("%s underreplicated (0<have<want)", bal.stats.underrep) bal.logf("%s just right (have=want)", bal.stats.justright) bal.logf("%s overreplicated (have>want>0)", bal.stats.overrep) bal.logf("%s unreferenced (have>want=0, new)", bal.stats.unref) bal.logf("%s garbage (have>want=0, old)", bal.stats.garbage)
- What's your sense of how this is all going to scale when it needs to balance ten million blocks?
Updated by Peter Amstutz over 6 years ago
How would we do an integration test for this? Ideally we should demonstrate it successfully moving blocks on an actual cluster before we try and put it into production...
Updated by Tom Clegg over 6 years ago
Peter Amstutz wrote:
nit: The purpose of this method seems to be to fill in
mountsByClass
but I wouldn't use the word "cache", it is more like a "lookup table" (so suggesting setupCaches() → setupLookupTables()
Agreed, done.
- As a minor optimization, you could calculate the number of slots ahead of time, so that on each run of "balanceBlock" the slots can be allocated once with correct capacity to avoid any reallocation.
Done.
- I see that a newly written block can't be trashed until the TTL expires. I also see that it preferentially pulls blocks to slots with the desired storage class. This means blocks written to "default" but intended for a different storage class will tend to be over-replicated for some period of time, which is counter-productive if the goal of storage classes is cost savings.
"Everything is written to default, and later moved to cheaper storage" is the first use case we want to address, and in that case temporary over-replication is inevitable. Aside from that, no amount of client-side support would have been particularly useful without keep-balance support anyway -- keep-balance would just end up unbalancing whatever the client managed to achieve.
Maybe the sorting should always prefer read-only slots?
It does prefer read-only slots if they satisfy the requirements. Do you mean it should prefer read-only slots that don't satisfy the requirements? That would be an optimization if there was a strict ordering of classes. But I don't think it would work if you had classes like "more-private" and "more-durable": "you asked me to put a copy in more-durable, but we already have a copy in a readonly more-private volume, so I didn't bother making a more-durable copy".
- We discussed that a collection can specify multiple storage classes, in which case the block is supposed to go only to mounts that satisfy all the storage classes. As a simplifying assumption we're only allowing one storage class. If I squint a little bit I think I see a way to handle multiple storage classes without having to totally overhaul the "distribute to slots" algorithm, but if you have any thoughts on that, it might make a nice TODO comment for whenever we need to implement that feature.
Added.
- I know this is not part of the ticket but I really hate the notation
0=have<want
, it is not common mathematical notation and it is important for admins to clearly understand what the report is saying, could we please print something less ambiguous like(0=have)<want
orhave<want & have=0
(this could also be addressed with a keep-balance admin page).
Improving the report sounds great but I'm not sure that's a good place to start. Anyway, since this isn't part of the ticket, I think we should leave it alone for now.
- What's your sense of how this is all going to scale when it needs to balance ten million blocks?
Time is dominated by retrieving collections from API server, and I don't expect that to change here. A map[string]int is bigger than an int, so memory use will be higher. I've added a TODO for pooling Desired maps, so we can reduce memory per megablock if it becomes an issue.
(Aside: The ultimate scaling device is sharding the blocks: balance [0-7].*
, then [8-f].*
, etc. This makes each balancing cycle cost more CPU time, but scales to an arbitrary number of blocks with a given amount of memory per keep-balance machine.)
12708-balance-storage-classes @ 2957b917aaaefc1485e1c5293d413f0931b3030a
Updated by Peter Amstutz over 6 years ago
12708-balance-storage-classes LGTM
I'd like to make an additional request for a keep-balance documentation page for the "admin" section of the docs. Among other things, it should explain to ops how to use storage classes. (Maybe also your recent adventures in "how to make sure data gets trashed")
Updated by Tom Clegg over 6 years ago
- Target version changed from 2018-05-09 Sprint to 2018-05-23 Sprint
Updated by Tom Clegg over 6 years ago
12708-report-storage-classes @ 5012102bb1854af9ae591c755feb2ea9c874b708
Updated by Lucas Di Pentima over 6 years ago
I was going to ask for logging test additions, but I see that there aren’t any of those about logged stats, are they worth the effort?
Other than that, it lgtm.
Updated by Tom Clegg over 6 years ago
- Subject changed from [keep-balance] Move blocks to satisfy storage_classes_desired, and update storage_classes_confirmed* to [keep-balance] Move blocks to satisfy storage_classes_desired
- Status changed from In Progress to Resolved