Support #16421
[doc] document deletion lifecycle of collections, and steps to undelete collections
100%
Description
We have some 'Keep collection lifecycle' documentation at https://doc.arvados.org/v2.0/user/tutorials/tutorial-keep-collection-lifecycle.html, which is intended for Arvados users.
We need something similar, but more in-depth and geared toward administrators. It should cover:
a) the various phases of deletion
b) what is necessary to enable deletion (keep-balance and keepstore configuration)
c) what are the flags to tweak the duration of each of the phases of deletion
d) steps to recover collections in each of the phases of deletion (where possible)
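To make point a) concrete, the collection-level phases can be sketched as a function of a collection's `trash_at` and `delete_at` timestamps. This is a minimal sketch under stated assumptions. It assumes a collection counts as trashed (hidden, but still restorable with untrash) once `trash_at` has passed, and becomes eligible for permanent removal once `delete_at` has passed, after which only its Keep blocks may still be recoverable. `Phase` and `deletion_phase` are hypothetical names, not part of any Arvados SDK. Block-level phases (keep-balance trashing unreferenced blocks, keepstore emptying its trash) are not modeled.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class Phase(Enum):
    ACTIVE = "active"    # visible and usable
    TRASHED = "trashed"  # hidden from normal listings, can still be untrashed
    DELETED = "deleted"  # past delete_at: record may be gone, only Keep blocks may remain


def deletion_phase(now: datetime,
                   trash_at: Optional[datetime],
                   delete_at: Optional[datetime]) -> Phase:
    """Classify a collection by its trash_at/delete_at timestamps.

    Hypothetical helper. Assumes delete_at, when set, is not earlier
    than trash_at, and that all datetimes are timezone-aware.
    """
    if delete_at is not None and now >= delete_at:
        # Eligible for permanent removal; recovery now depends on the
        # blocks still existing in Keep (possibly in keepstore trash).
        return Phase.DELETED
    if trash_at is not None and now >= trash_at:
        # Still restorable by untrashing the collection.
        return Phase.TRASHED
    return Phase.ACTIVE


if __name__ == "__main__":
    now = datetime(2020, 9, 1, tzinfo=timezone.utc)
    # Trashed yesterday, scheduled for deletion in 13 days.
    print(deletion_phase(now, now - timedelta(days=1), now + timedelta(days=13)))
    # Phase.TRASHED
```

The check on `delete_at` comes first so that a collection past both timestamps is reported as deleted rather than merely trashed.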
For d) we may want to write some code to automate the steps. See also #16427.
Please also review https://doc.arvados.org/v2.0/admin/keep-balance.html which covers some of this ground.
And also https://dev.arvados.org/projects/arvados/wiki/Recovering_lost_data as well as https://doc.arvados.org/2.0/admin/recovering-deleted-collections.html
Let's consolidate all this stuff in one place. Note that that last URL is referenced from the 1.3.3 release notes at https://lists.arvados.org/pipermail/arvados/2019-May/000210.html so it should be replaced with a link to the new canonical source, not just deleted.
History
#1
Updated by Ward Vandewege 9 months ago
- Description updated (diff)
#2
Updated by Ward Vandewege 9 months ago
- Description updated (diff)
#3
Updated by Ward Vandewege 9 months ago
- Description updated (diff)
#4
Updated by Ward Vandewege 8 months ago
- Related to Story #16427: "undelete" command to recover trashed blocks and restore a deleted collection added
#5
Updated by Ward Vandewege 8 months ago
- Description updated (diff)
#6
Updated by Ward Vandewege 8 months ago
- Blocks Story #16514: Actionable insight into keep usage added
#7
Updated by Peter Amstutz 6 months ago
- Release set to 25
#8
Updated by Peter Amstutz 5 months ago
- Target version changed from To Be Groomed to 2020-09-09 Sprint
#9
Updated by Peter Amstutz 5 months ago
- Assigned To set to Ward Vandewege
#10
Updated by Ward Vandewege 5 months ago
- Description updated (diff)
#11
Updated by Ward Vandewege 5 months ago
- Status changed from New to In Progress
#12
Updated by Ward Vandewege 5 months ago
Ready for first look in f7f5dfb456c1ddb6db63cf6e6a79e5d468d20a20 on 16421-document-collection-deletion-lifecycle
Pushed missing files in 8a00acc17403e7836b88b1e9e66b4ff47d5505f2
#13
Updated by Peter Amstutz 5 months ago
Ward Vandewege wrote:
Ready for first look in f7f5dfb456c1ddb6db63cf6e6a79e5d468d20a20 on 16421-document-collection-deletion-lifecycle
Recovering data
I usually try to have a brief introduction at the top, like, "Arvados has several features to prevent accidental loss/deletion of data, but accidents happen..."
Mention that data recovery can be used for both deleted collections and overwritten collection contents.
Link to "Data lifecycle" architecture page.
Tell user to look in the trash first.
Tell user collection contents may be available somewhere else and try searching by PDH.
Mention/link to collection versioning as a way to prevent accidental overwriting of collection contents.
When mentioning configuration items, include the section, e.g. AuditLogs.UnloggedAttributes, AuditLogs.MaxAge, Collections.BlobMissingReport
"Obviously this could be improved ...": if it is obvious, you don't need to say it, and saying it raises the question of why the example doesn't do it.
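Searching by PDH works because the portable data hash depends only on the manifest (block locators, sizes and file names), not on the collection record, so a copy of the same files under the same names in another collection has the same PDH. The sketch below assumes the usual definition: the MD5 hex digest of the manifest text with every locator hint after the block size (such as a permission signature) removed, followed by `+` and the length of that stripped text in bytes. `strip_hints`, `portable_data_hash` and `pdh_filter` are hypothetical helpers for illustration; the filter follows the `[attribute, operator, operand]` form of Arvados list filters. In practice the SDK and API server compute the PDH for you.

```python
import hashlib
import re

# Block locator: 32 hex digits, "+", size, then optional hints such as a
# permission signature ("+A...@...").
_LOCATOR_RE = re.compile(r"^([0-9a-f]{32}\+\d+)(\+\S*)?$")


def strip_hints(manifest_text: str) -> str:
    """Drop every hint after the size from each block locator token."""
    out_lines = []
    for line in manifest_text.split("\n"):
        tokens = []
        for token in line.split(" "):
            m = _LOCATOR_RE.match(token)
            # Keep only hash+size for locators; leave other tokens untouched.
            tokens.append(m.group(1) if m else token)
        out_lines.append(" ".join(tokens))
    return "\n".join(out_lines)


def portable_data_hash(manifest_text: str) -> str:
    """MD5 of the stripped manifest, then '+' and its length in bytes."""
    data = strip_hints(manifest_text).encode("utf-8")
    return f"{hashlib.md5(data).hexdigest()}+{len(data)}"


def pdh_filter(pdh: str) -> list:
    """A list filter selecting collections with the given PDH."""
    return [["portable_data_hash", "=", pdh]]
```

For instance, `portable_data_hash("")` returns `d41d8cd98f00b204e9800998ecf8427e+0`, the well-known PDH of an empty collection, and a signed and an unsigned copy of the same manifest produce the same PDH.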
Architecture
I would tweak the section labels to avoid parentheses, something like "Storage in Keep" and "Computation with Crunch"
"Keep - Content-Addressable Storage" the leading "Keep" in the title is redundant.
I would reorder the pages:
- Content-Addressable Storage
- Keep clients
- Manifest format
- Data lifecycle
User guide
On "Keep collection lifecycle" there's a section "Collection lifecycle attributes" which is mostly a duplicate of the "Data lifecycle" page; it should just link to that page instead.
#14
Updated by Ward Vandewege 5 months ago
Peter Amstutz wrote:
Ward Vandewege wrote:
Ready for first look in f7f5dfb456c1ddb6db63cf6e6a79e5d468d20a20 on 16421-document-collection-deletion-lifecycle
Recovering data
I usually try to have a brief introduction at the top, like, "Arvados has several features to prevent accidental loss/deletion of data, but accidents happen..."
Mention that data recovery can be used for both deleted collections and overwritten collection contents.
Link to "Data lifecycle" architecture page.
Tell user to look in the trash first.
Tell user collection contents may be available somewhere else and try searching by PDH.
Mention/link to collection versioning as a way to prevent accidental overwriting of collection contents.
OK, I've made a bunch of changes along those lines. I've also reduced duplication in the user guide, by making the "Keep lifecycle" page there just about "Trashing and untrashing data", linking to the main "Collection lifecycle" page in the Architecture section.
When mentioning configuration items, include the section, e.g. AuditLogs.UnloggedAttributes, AuditLogs.MaxAge, Collections.BlobMissingReport
Fixed.
"Obviously this could be improved ...": if it is obvious, you don't need to say it, and saying it raises the question of why the example doesn't do it.
I dropped that line. That code is mostly redundant, the new recover-collections tool is the better way to get out of that situation. But it's helpful to show the 'manual way' so that the admin can see that there is a manual way that is not complicated. Increases confidence.
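The "manual way" starts from the manifest text of the lost collection (for example an older version recovered from the audit logs, if manifest_text was not excluded via AuditLogs.UnloggedAttributes) and lists the Keep blocks it references, so each one can be checked for, or untrashed on, the keepstore servers. This is a minimal sketch, assuming the manifest text is already in hand: `block_locators` is a hypothetical helper, the sample locators and the `+Asig@...` signature are made up, and the untrash step itself is not shown.

```python
import re

# Block locator: 32 hex digits, "+", size, then optional hints.
_LOCATOR_RE = re.compile(r"^([0-9a-f]{32})\+(\d+)(?:\+\S*)?$")


def block_locators(manifest_text: str) -> list:
    """Unique hash+size locators in first-seen order, hints stripped.

    Hypothetical helper. Assumes manifest_text was recovered elsewhere,
    for example from an older audit log entry.
    """
    seen = {}  # dict preserves insertion order and de-duplicates
    for line in manifest_text.splitlines():
        # The first token of each line is the stream name; skip it.
        for token in line.split(" ")[1:]:
            m = _LOCATOR_RE.match(token)
            if m:
                seen[f"{m.group(1)}+{m.group(2)}"] = None
    return list(seen)


if __name__ == "__main__":
    manifest = (
        ". acbd18db4cc2f85cedef654fccc4a4d8+3+Asig@5f612ec6 "
        "37b51d194a7513e45b56f6524f2d51f2+3 0:3:foo.txt 3:3:bar.txt\n"
        "./sub acbd18db4cc2f85cedef654fccc4a4d8+3 0:3:baz.txt\n"
    )
    for loc in block_locators(manifest):
        print(loc)
```

The example prints the two distinct locators once each, without the signature, even though the first block is referenced from two streams. Stripping hints matters because permission signatures expire, while the hash+size locator stays valid.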
Architecture
I would tweak the section labels to avoid parentheses, something like "Storage in Keep" and "Computation with Crunch"
OK done.
"Keep - Content-Addressable Storage" the leading "Keep" in the title is redundant.
The title in the menu is also the title of the page. So I want to keep the word 'Keep' in there. I've renamed it to 'Introduction to Keep'.
I would reorder the pages:
- Content-Addressable Storage
- Keep clients
- Manifest format
- Data lifecycle
Done, though I swapped the lifecycle and manifest format pages, the latter being lower level.
User guide
On "Keep collection lifecycle" there's a section "Collection lifecycle attributes" which is mostly a duplicate of the "Data lifecycle" page; it should just link to that page instead.
Ah, yes, I forgot to do that. Done now.
Changes are ready for another look at dafd68e639a416fcae216c639c6e38beab53b214 on branch 16421-document-collection-deletion-lifecycle
#15
Updated by Peter Amstutz 5 months ago
LGTM
#16
Updated by Ward Vandewege 5 months ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|9314e5dbcf51231eba0deb7e14522a281da22925.
Merge branch '16421-document-collection-deletion-lifecycle'
closes #16421
Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <ward@curii.com>