Support #16421
closed
[doc] document deletion lifecycle of collections, and steps to undelete collections
Description
We have some 'Keep collection lifecycle' documentation at https://doc.arvados.org/v2.0/user/tutorials/tutorial-keep-collection-lifecycle.html, which is intended for Arvados users.
We need something similar, but more in-depth and geared toward administrators. It should cover:
a) the various phases of deletion
b) what is necessary to enable deletion (keep-balance and keepstore configuration)
c) which flags control the duration of each phase of deletion
d) steps to recover collections in each of the phases of deletion (where possible)
For d) we may want to write some code to automate the steps. See also #16427.
Please also review https://doc.arvados.org/v2.0/admin/keep-balance.html which covers some of this ground.
And also https://dev.arvados.org/projects/arvados/wiki/Recovering_lost_data as well as https://doc.arvados.org/2.0/admin/recovering-deleted-collections.html
Let's consolidate all this stuff in one place. Note that that last URL is referenced from the 1.3.3 release notes at https://lists.arvados.org/pipermail/arvados/2019-May/000210.html so it should be replaced with a link to the new canonical source, not just deleted.
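For a), the collection-level phases could be illustrated from the `trash_at` and `delete_at` timestamps on a collection record. This is a minimal sketch, not Arvados code: the `collection_phase` helper and the phase names are hypothetical. It assumes a collection whose `trash_at` has passed is trashed (and can still be restored with the API's untrash call), and one whose `delete_at` has passed is gone from the API, so that only its blocks remain, possibly recoverable from keepstore for a limited time.

```python
from datetime import datetime, timezone
from typing import Optional


def collection_phase(trash_at: Optional[datetime],
                     delete_at: Optional[datetime],
                     now: Optional[datetime] = None) -> str:
    """Classify a collection into a (hypothetical) deletion phase.

    trash_at / delete_at are timezone-aware datetimes copied from the
    collection record, or None when unset.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if delete_at is not None and delete_at <= now:
        # Past its delete time: the record is no longer visible via the API.
        # Its blocks may still exist in keepstore for a while.
        return "deleted"
    if trash_at is not None and trash_at <= now:
        # Trashed but not yet deleted: restorable with an untrash call.
        return "trashed"
    # Not trashed yet (trashing may still be scheduled for the future).
    return "live"


if __name__ == "__main__":
    now = datetime(2020, 9, 1, tzinfo=timezone.utc)
    print(collection_phase(None, None, now))
    print(collection_phase(datetime(2020, 8, 31, tzinfo=timezone.utc),
                           datetime(2020, 9, 14, tzinfo=timezone.utc), now))
```

Checking `delete_at` before `trash_at` keeps the classification correct even if both timestamps are in the past.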
Updated by Ward Vandewege over 4 years ago
- Related to Idea #16427: "undelete" command to recover trashed blocks and restore a deleted collection added
Updated by Ward Vandewege over 4 years ago
- Blocks Idea #16514: Actionable insight into keep usage added
Updated by Peter Amstutz over 4 years ago
- Target version changed from To Be Groomed to 2020-09-09 Sprint
Updated by Ward Vandewege over 4 years ago
- Status changed from New to In Progress
Updated by Ward Vandewege over 4 years ago
Ready for first look in f7f5dfb456c1ddb6db63cf6e6a79e5d468d20a20 on 16421-document-collection-deletion-lifecycle
Pushed missing files in 8a00acc17403e7836b88b1e9e66b4ff47d5505f2
Updated by Peter Amstutz over 4 years ago
Ward Vandewege wrote:
Ready for first look in f7f5dfb456c1ddb6db63cf6e6a79e5d468d20a20 on 16421-document-collection-deletion-lifecycle
Recovering data
I usually try to have a brief introduction at the top, like, "Arvados has several features to prevent accidental loss/deletion of data, but accidents happen..."
Mention that data recovery applies both to deleted collections and to overwritten collection contents.
Link to "Data lifecycle" architecture page.
Tell user to look in the trash first.
Tell the user that the collection contents may be available elsewhere, and to try searching by PDH.
Mention/link to collection versioning as a way to prevent accidental overwriting of collection contents.
When mentioning configuration items, include the section, e.g. AuditLogs.UnloggedAttributes, AuditLogs.MaxAge, Collections.BlobMissingReport
"Obviously this could be improved ...": if it is obvious, you don't need to say it, and saying it raises the question of why the example doesn't do it.
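Searching by PDH works because identical content yields an identical PDH. A minimal sketch of how a PDH can be derived, assuming the convention used by the Arvados Python SDK: drop non-size hints (such as `+A` permission signatures) from each block locator, then take the MD5 hex digest of the resulting manifest text followed by `+` and its length in bytes. The function names here are illustrative, not SDK calls.

```python
import hashlib
import re

# Assumed block locator shape: 32 hex digits followed by optional +hints.
LOCATOR_RE = re.compile(r'^[0-9a-f]{32}(\+[^+\s]+)*$')
# Hints starting with a non-digit (e.g. +A<signature>@<expiry>) are dropped;
# the numeric size hint is kept.
NON_SIZE_HINT_RE = re.compile(r'\+[^\d][^+]*')


def stripped_manifest(manifest_text: str) -> str:
    """Remove non-size hints from the block locators in a manifest."""
    clean = []
    for line in manifest_text.split("\n"):
        fields = line.split()
        if not fields:
            continue
        # First field is the stream name; the rest are locators or file segments.
        out = [fields[0]]
        for field in fields[1:]:
            out.append(NON_SIZE_HINT_RE.sub('', field)
                       if LOCATOR_RE.match(field) else field)
        clean.append(' '.join(out) + "\n")
    return ''.join(clean)


def portable_data_hash(manifest_text: str) -> str:
    """MD5 of the stripped manifest, plus '+' and its length in bytes."""
    data = stripped_manifest(manifest_text).encode()
    return hashlib.md5(data).hexdigest() + '+' + str(len(data))
```

Because signatures are stripped first, a signed and an unsigned copy of the same manifest map to the same PDH, which is what makes the search useful.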
Architecture
I would tweak the section labels to avoid parentheses, something like "Storage in Keep" and "Computation with Crunch"
"Keep - Content-Addressable Storage": the leading "Keep" in the title is redundant.
I would reorder the pages:
- Content-Addressable Storage
- Keep clients
- Manifest format
- Data lifecycle
User guide
On "Keep collection lifecycle" there's a section "Collection lifecycle attributes" which mostly duplicates the "Data lifecycle" page; it should just link to that page instead.
Updated by Ward Vandewege over 4 years ago
Peter Amstutz wrote:
Ward Vandewege wrote:
Ready for first look in f7f5dfb456c1ddb6db63cf6e6a79e5d468d20a20 on 16421-document-collection-deletion-lifecycle
Recovering data
I usually try to have a brief introduction at the top, like, "Arvados has several features to prevent accidental loss/deletion of data, but accidents happen..."
Mention that data recovery applies both to deleted collections and to overwritten collection contents.
Link to "Data lifecycle" architecture page.
Tell user to look in the trash first.
Tell the user that the collection contents may be available elsewhere, and to try searching by PDH.
Mention/link to collection versioning as a way to prevent accidental overwriting of collection contents.
OK, I've made a bunch of changes along those lines. I've also reduced duplication in the user guide by making the "Keep lifecycle" page there cover only "Trashing and untrashing data", with a link to the main "Collection lifecycle" page in the Architecture section.
When mentioning configuration items, include the section, e.g. AuditLogs.UnloggedAttributes, AuditLogs.MaxAge, Collections.BlobMissingReport
Fixed.
"Obviously this could be improved ...": if it is obvious, you don't need to say it, and saying it raises the question of why the example doesn't do it.
I dropped that line. That code is mostly redundant; the new recover-collections tool is the better way to get out of that situation. But it's helpful to show the 'manual way', so that the admin can see that it is not complicated. That increases confidence.
Architecture
I would tweak the section labels to avoid parentheses, something like "Storage in Keep" and "Computation with Crunch"
OK done.
"Keep - Content-Addressable Storage": the leading "Keep" in the title is redundant.
The title in the menu is also the title of the page. So I want to keep the word 'Keep' in there. I've renamed it to 'Introduction to Keep'.
I would reorder the pages:
- Content-Addressable Storage
- Keep clients
- Manifest format
- Data lifecycle
Done, though I swapped the lifecycle and manifest format pages, the latter being lower level.
User guide
On "Keep collection lifecycle" there's a section "Collection lifecycle attributes" which mostly duplicates the "Data lifecycle" page; it should just link to that page instead.
Ah, yes, I forgot to do that. Done now.
Changes ready for another look at dafd68e639a416fcae216c639c6e38beab53b214 on branch 16421-document-collection-deletion-lifecycle
Updated by Ward Vandewege over 4 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|9314e5dbcf51231eba0deb7e14522a281da22925.