Feature #21126 (Closed)
Keepstore volume flag to enable/disable delete on a volume
Added by Peter Amstutz about 1 year ago. Updated about 1 year ago.
Description
Add flag to Keep Volume config "AllowTrashWhenReadOnly"
Default false; when true, the "ReadOnly" flag rejects writes but allows Trash behavior.
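For illustration, a volume stanza with the new flag might look like the following sketch (the volume ID and driver are placeholders, not taken from this ticket):

```yaml
# Hypothetical keepstore volume config sketch -- IDs and driver are placeholders.
Volumes:
  zzzzz-nyw5e-000000000000000:
    Driver: S3
    ReadOnly: true                # reject new writes to this volume
    AllowTrashWhenReadOnly: true  # but still allow trash/delete operations
```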
Updated by Tom Clegg about 1 year ago
- Status changed from New to In Progress
21126-trash-when-ro @ da83807d6bcef1c1f0bb78479c5ec17f150f5eda -- developer-run-tests: #3874
Updated by Tom Clegg about 1 year ago
21126-trash-when-ro @ c9b029a3b2efaf6babe458cfaa70b08a3debd4cc -- developer-run-tests: #3878
- All agreed upon points are implemented / addressed.
  - ✅ AllowTrashWhenReadOnly volume parameter, default false
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- n/a
- Code is tested and passing, both automated and manual, what manual testing was done is described
- ✅ tests added/updated
- Documentation has been updated.
- Is this self-explanatory enough?
- Behaves appropriately at the intended scale (describe intended scale).
- n/a
- Considered backwards and forwards compatibility issues between client and server.
- In the KeepMount struct (used by keepstore to send its config info to keep-balance) the ReadOnly flag is replaced with AllowWrite and AllowTrash.
  - Old keepstore + new keep-balance: the old ReadOnly flag will be present and ignored by keep-balance, and AllowWrite and AllowTrash will be absent (implying false), so keep-balance will treat the volume as non-writable and non-trashable, and will not try to collect garbage on the relevant volume(s) or use them to create additional needed replicas. At worst, this will result in additional needed replicas being written to other volumes (running on servers where keepstore has been upgraded) even though they are not in the optimal rendezvous order. This would be corrected automatically by subsequent keep-balance runs after the affected keepstore servers are upgraded.
  - New keepstore + old keep-balance: the old ReadOnly flag will not be present (implying false), so keep-balance will treat the volume as writable and might decide to send trash/pull requests to the server for the relevant volume(s). These trash/pull requests will be ignored/skipped by keepstore.
  - With a tiny bit more trouble we could continue to also send/obey the ReadOnly flag (keep-balance would expect to receive all three fields until a future version when ReadOnly gets dropped). I'm mostly inclined to skip that because the side effects described above seem benign -- but a compatibility shim would be somewhat beneficial if there's a significant amount of time between keep-balance and keepstore upgrades.
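As a rough sketch of the compatibility matrix above (field names are those discussed here; this is not the exact wire format keepstore uses):

```yaml
# Sketch only: what each keepstore version reports to keep-balance for a
# read-only-but-trashable volume.
old_keepstore:
  ReadOnly: true        # new keep-balance ignores this field
new_keepstore:
  AllowWrite: false     # absent (to an old keep-balance) implies false
  AllowTrash: true      # absent implies false
```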
- Follows our coding standards and GUI style guidelines.
- ✅
Updated by Tom Clegg about 1 year ago
Merge main and add upgrade note ("upgrade keepstore first").
21126-trash-when-ro @ 235f2d60f025823a72927d7ce56f4214c92cc875 -- developer-run-tests: #3880
Updated by Peter Amstutz about 1 year ago
Tom Clegg wrote in #note-8:
Merge main and add upgrade note ("upgrade keepstore first").
21126-trash-when-ro @ 235f2d60f025823a72927d7ce56f4214c92cc875 -- developer-run-tests: #3880
Although "AllowTrashWhenReadOnly" is sort of self-explanatory, it would still benefit from a comment about what it does and why you'd use it.
Rest LGTM
Updated by Peter Amstutz about 1 year ago
It would also be a good idea to put a note in the admin documentation somewhere describing this procedure of putting a volume in unsafe delete + no trash + delete-only mode when large numbers of blocks need to be deleted from object storage.
Updated by Tom Clegg about 1 year ago
21126-trash-when-ro @ 357858970aef82a2a87b5105d133a52196987381
Adds admin doc page with a "collect a lot of garbage" recipe.
In the single-bucket case (where you need to start by creating a new bucket for clients to write to while the target volume is read-only):
- at the end of the recipe your cluster now uses two buckets
- there's still no way to merge them in either direction
Updated by Peter Amstutz about 1 year ago
- Target version changed from Development 2023-11-08 sprint to Development 2023-11-29 sprint
Updated by Peter Amstutz about 1 year ago
Tom Clegg wrote in #note-11:
21126-trash-when-ro @ 357858970aef82a2a87b5105d133a52196987381
Adds admin doc page with a "collect a lot of garbage" recipe.
In the single-bucket case (where you need to start by creating a new bucket for clients to write to while the target volume is read-only):
- at the end of the recipe your cluster now uses two buckets
- there's still no way to merge them in either direction
My one comment about the process here is that it would be nice if there was a CommitPulls configuration option that was equivalent to -commit-pulls, instead of having to muck with the service file. (Being able to set it through the configuration file means you don't have to go outside the installer's deploy process.) How hard would it be to add that flag?
Updated by Tom Clegg about 1 year ago
- enable pull if Collections.BalancePull (default true) is true and -commit-pulls=false is not given on the command line
- enable trash if Collections.BalanceTrash (default true) is true and -commit-trash=false is not given on the command line
(Ignoring the command-line arguments seems a bit sketchy, and erroring out into a restart loop seems unnecessarily annoying.)
That said, the idea of implementing those as command line arguments was that you'd probably want to run keep-balance on the command line in special cases like this one, and changing arguments is more convenient than changing the config file (especially if the config file is managed by an orchestration tool). Using that logic we could change these instructions to something like
systemctl stop keep-balance
keep-balance -commit-pulls=false -commit-trash=false -once 2>&1 | tee >(systemd-cat -t keep-balance)
# (check that everything looks ok)
keep-balance -commit-pulls=false -commit-trash=true 2>&1 | tee >(systemd-cat -t keep-balance)
systemctl start keep-balance
I'm still leaning "move to config" though.
Updated by Brett Smith about 1 year ago
Tom Clegg wrote in #note-14:
(Ignoring the command-line arguments seems a bit sketchy, and erroring out into a restart loop seems unnecessarily annoying.)
If you can dedicate a specific exit code to this, you could error out without the restart loop by adding that code to RestartPreventExitStatus= in the service definition. Maybe one of the standard-ish Process Exit Codes like EX_USAGE or EX_CONFIG?
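For example, assuming keep-balance exited with EX_USAGE (64) or EX_CONFIG (78) as defined in sysexits.h, a systemd drop-in could look like this sketch (the file path and choice of codes are illustrative, not from this ticket):

```ini
# /etc/systemd/system/keep-balance.service.d/override.conf (hypothetical drop-in)
[Service]
# 64 = EX_USAGE, 78 = EX_CONFIG (sysexits.h); exiting with either of these
# is treated as a deliberate stop, so systemd will not restart the service.
RestartPreventExitStatus=64 78
```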
Updated by Peter Amstutz about 1 year ago
Tom Clegg wrote in #note-14:
That said, but the idea of implementing those as command line arguments was that you'd probably want to run keep-balance on the command line in special cases like this one, and changing arguments is more convenient than changing the config file (especially if the config file is managed by an orchestration tool). Using that logic we could change these instructions to something like
I think from an ops perspective, the purpose of orchestration and configuration management is to avoid having to log into the node and run things on the command line. Doing that as a special case is more complicated and offers more ways to mess up, so it is actually significantly less convenient.
I'm still leaning "move to config" though.
Yes please.
Updated by Brett Smith about 1 year ago
Brett Smith wrote in #note-15:
If you can dedicate a specific exit code to this, you could error out without the restart loop by adding that code to RestartPreventExitStatus= in the service definition. Maybe one of the standard-ish Process Exit Codes like EX_USAGE or EX_CONFIG?
Well, this is awkward… but, for what it's worth, exiting 2 seems equally good. It's just as standard as EX_USAGE; it's EXIT_INVALIDARGUMENT from LSB. It also has the benefit of being consistent with our Python stuff: by default argparse exits 2 when you ask it to handle invalid arguments.
Sorry, I wrote my earlier comment not realizing our Go services were already settled on EXIT_INVALIDARGUMENT. If I had known I definitely would've suggested just codifying that in RestartPreventExitStatus. That said, I don't think it's a big deal either way. If we want to carry on with EX_USAGE, sure. Just flagging it.
Updated by Tom Clegg about 1 year ago
arguments→config change is happening in #21189 (we're adding configurable limits there, so disabling trash or pull will just be "limit 0"). When that's merged we can update the admin instructions here accordingly.
Updated by Tom Clegg about 1 year ago
Docs updated from -commit-... flags to Collections.BalancePullLimit and Collections.BalanceTrashLimit configs.
21126-trash-when-ro @ 3fa9678a0a6caef757209f8666de66fb0896c953
Updated by Peter Amstutz about 1 year ago
Tom Clegg wrote in #note-19:
Docs updated from -commit-... flags to Collections.BalancePullLimit and Collections.BalanceTrashLimit configs.
21126-trash-when-ro @ 3fa9678a0a6caef757209f8666de66fb0896c953
Instead of describing which parameters to set in prose, I think it would be clearer to provide config snippets of the parameters that need to be changed:
Volumes:
  SAMPLE:
    ReadOnly: true
    AllowTrashWhenReadOnly: true
    DriverParameters:
      UnsafeDelete: true

Collections:
  BlobTrashLifetime: 0
  BalancePullLimit: 0
In the Collections configuration section, set BlobTrashLifetime: 0. Note this will effectively disable garbage collection on other S3-backed volumes, and garbage collection will delete blocks outright (bypassing the recoverable trash phase) on any non-S3 volumes.
This sentence is really confusing because it describes seemingly contradictory behavior between S3 and non-S3 drivers. It would probably benefit from a sentence explaining what the config item actually means, and then implications for the different drivers.
Garbage collection should now proceed faster on the target volume.
I would phrase this differently, like "objects will be deleted immediately instead of being first copied to trash on the S3 volume, which should significantly speed up cleanup of trashed objects."
Rest LGTM.
Updated by Tom Clegg about 1 year ago
Updated wording as suggested.
21126-trash-when-ro @ 2e88c72ca46a0912c8fb580ca3d63c4f57911c22
Updated by Peter Amstutz about 1 year ago
Tom Clegg wrote in #note-21:
Updated wording as suggested.
21126-trash-when-ro @ 2e88c72ca46a0912c8fb580ca3d63c4f57911c22
Much better. LGTM.
Updated by Tom Clegg about 1 year ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|45ebdd1005f12c3c18355ab511e7a2e7f623358a.