Support #16421

[doc] document deletion lifecycle of collections, and steps to undelete collections

Added by Ward Vandewege 11 months ago. Updated 7 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date: 09/02/2020
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Release relationship: Auto

Description

We have some 'Keep collection lifecycle' documentation at https://doc.arvados.org/v2.0/user/tutorials/tutorial-keep-collection-lifecycle.html, which is intended for Arvados users.

We need something similar, but more in-depth and geared toward administrators. It should cover:

a) the various phases of deletion
b) what is necessary to enable deletion (keep-balance and keepstore configuration)
c) the flags that control the duration of each phase of deletion
d) steps to recover collections in each of the phases of deletion (where possible)

For d) we may want to write some code to automate the steps. See also #16427.

Please also review https://doc.arvados.org/v2.0/admin/keep-balance.html which covers some of this ground.

And also https://dev.arvados.org/projects/arvados/wiki/Recovering_lost_data as well as https://doc.arvados.org/2.0/admin/recovering-deleted-collections.html

Let's consolidate all of this in one place. Note that the last URL is referenced from the 1.3.3 release notes at https://lists.arvados.org/pipermail/arvados/2019-May/000210.html, so it should be replaced with a link to the new canonical source, not just deleted.


Subtasks

Task #16767: Review 16421-document-collection-deletion-lifecycle (Resolved, Peter Amstutz)


Related issues

Related to Arvados - Story #16427: "undelete" command to recover trashed blocks and restore a deleted collection (Resolved, 06/01/2020)

Blocks Arvados Epics - Story #16514: Actionable insight into keep usage (New, 09/01/2021 to 12/31/2021)

Associated revisions

Revision 9314e5db
Added by Ward Vandewege 7 months ago

Merge branch '16421-document-collection-deletion-lifecycle'

closes #16421

Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <>

History

#1 Updated by Ward Vandewege 11 months ago

  • Description updated (diff)

#2 Updated by Ward Vandewege 11 months ago

  • Description updated (diff)

#3 Updated by Ward Vandewege 11 months ago

  • Description updated (diff)

#4 Updated by Ward Vandewege 11 months ago

  • Related to Story #16427: "undelete" command to recover trashed blocks and restore a deleted collection added

#5 Updated by Ward Vandewege 11 months ago

  • Description updated (diff)

#6 Updated by Ward Vandewege 10 months ago

  • Blocks Story #16514: Actionable insight into keep usage added

#7 Updated by Peter Amstutz 9 months ago

  • Release set to 25

#8 Updated by Peter Amstutz 8 months ago

  • Target version changed from To Be Groomed to 2020-09-09 Sprint

#9 Updated by Peter Amstutz 8 months ago

  • Assigned To set to Ward Vandewege

#10 Updated by Ward Vandewege 8 months ago

  • Description updated (diff)

#11 Updated by Ward Vandewege 8 months ago

  • Status changed from New to In Progress

#12 Updated by Ward Vandewege 8 months ago

Ready for first look in f7f5dfb456c1ddb6db63cf6e6a79e5d468d20a20 on 16421-document-collection-deletion-lifecycle

Pushed missing files in 8a00acc17403e7836b88b1e9e66b4ff47d5505f2

#13 Updated by Peter Amstutz 8 months ago

Ward Vandewege wrote:

> Ready for first look in f7f5dfb456c1ddb6db63cf6e6a79e5d468d20a20 on 16421-document-collection-deletion-lifecycle

Recovering data

I usually try to have a brief introduction at the top, like, "Arvados has several features to prevent accidental loss/deletion of data, but accidents happen..."

Mention that data recovery can be used for both deleted collections and overwritten collection contents.

Link to "Data lifecycle" architecture page.

Tell user to look in the trash first.

Tell user collection contents may be available somewhere else and try searching by PDH.

Mention/link to collection versioning as a way to prevent accidental overwriting of collection contents.

When mentioning configuration items, include the section, e.g. AuditLogs.UnloggedAttributes, AuditLogs.MaxAge, Collections.BlobMissingReport

"Obviously this could be improved ...": if it is obvious, you don't need to say it, and saying it raises the question of why the example doesn't do it.

Architecture

I would tweak the section labels to avoid parentheses, something like "Storage in Keep" and "Computation with Crunch"

"Keep - Content-Addressable Storage": the leading "Keep" in the title is redundant.

I would reorder the pages:

  1. Content-Addressable Storage
  2. Keep clients
  3. Manifest format
  4. Data lifecycle

User guide

On "Keep collection lifecycle" there's a section "Collection lifecycle attributes" which is mostly a duplicate of the "Data lifecycle" page; it should just link to that page instead.

#14 Updated by Ward Vandewege 8 months ago

Peter Amstutz wrote:

> Ward Vandewege wrote:
>
> > Ready for first look in f7f5dfb456c1ddb6db63cf6e6a79e5d468d20a20 on 16421-document-collection-deletion-lifecycle
>
> Recovering data
>
> I usually try to have a brief introduction at the top, like, "Arvados has several features to prevent accidental loss/deletion of data, but accidents happen..."
>
> Mention that data recovery can be used for both deleted collections and overwritten collection contents.
>
> Link to "Data lifecycle" architecture page.
>
> Tell user to look in the trash first.
>
> Tell user collection contents may be available somewhere else and try searching by PDH.
>
> Mention/link to collection versioning as a way to prevent accidental overwriting of collection contents.

OK, I've made a bunch of changes along those lines. I've also reduced duplication in the user guide by making the "Keep lifecycle" page there just about "Trashing and untrashing data", linking to the main "Collection lifecycle" page in the Architecture section.

> When mentioning configuration items, include the section, e.g. AuditLogs.UnloggedAttributes, AuditLogs.MaxAge, Collections.BlobMissingReport

Fixed.

> "Obviously this could be improved ...": if it is obvious, you don't need to say it, and saying it raises the question of why the example doesn't do it.

I dropped that line. That code is mostly redundant; the new recover-collections tool is the better way to get out of that situation. But it's helpful to show the 'manual way' so that the admin can see that there is a manual way that is not complicated. That increases confidence.

> Architecture
>
> I would tweak the section labels to avoid parentheses, something like "Storage in Keep" and "Computation with Crunch"

OK, done.

> "Keep - Content-Addressable Storage": the leading "Keep" in the title is redundant.

The title in the menu is also the title of the page, so I want to keep the word 'Keep' in there. I've renamed it to 'Introduction to Keep'.

> I would reorder the pages:
>
>   1. Content-Addressable Storage
>   2. Keep clients
>   3. Manifest format
>   4. Data lifecycle

Done, though I swapped the lifecycle and manifest format pages, since the latter is lower level.

> User guide
>
> On "Keep collection lifecycle" there's a section "Collection lifecycle attributes" which is mostly a duplicate of the "Data lifecycle" page; it should just link to that page instead.

Ah, yes, I forgot to do that. Done now.

Changes ready for another look at dafd68e639a416fcae216c639c6e38beab53b214 on branch 16421-document-collection-deletion-lifecycle

#15 Updated by Peter Amstutz 7 months ago

LGTM

#16 Updated by Ward Vandewege 7 months ago

  • Status changed from In Progress to Resolved
