Support #16421

closed

[doc] document deletion lifecycle of collections, and steps to undelete collections

Added by Ward Vandewege over 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Due date:
Story points: -
Release relationship: Auto

Description

We have some 'Keep collection lifecycle' documentation at https://doc.arvados.org/v2.0/user/tutorials/tutorial-keep-collection-lifecycle.html, which is intended for Arvados users.

We need something similar but more in-depth, geared toward administrators. It should cover:

a) the various phases of deletion
b) what is necessary to enable deletion (keep-balance and keepstore configuration)
c) the flags that adjust the duration of each phase of deletion
d) steps to recover collections in each phase of deletion (where possible)

For d) we may want to write some code to automate the steps. See also #16427.

Please also review https://doc.arvados.org/v2.0/admin/keep-balance.html which covers some of this ground.

And also https://dev.arvados.org/projects/arvados/wiki/Recovering_lost_data as well as https://doc.arvados.org/2.0/admin/recovering-deleted-collections.html

Let's consolidate all of this in one place. Note that the last URL is referenced from the 1.3.3 release notes at https://lists.arvados.org/pipermail/arvados/2019-May/000210.html, so it should be replaced with a link to the new canonical source, not just deleted.
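
As an illustration of item a), here is a minimal sketch of how the collection-level phases could be told apart. It assumes the collection record carries `trash_at` and `delete_at` timestamps; the phase names and the `deletion_phase` helper are illustrative, not taken from this ticket. It covers only the collection record, not what keep-balance and keepstore later do with the underlying blocks.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def deletion_phase(
    trash_at: Optional[datetime],
    delete_at: Optional[datetime],
    now: Optional[datetime] = None,
) -> str:
    """Classify a collection record by its (assumed) trash_at/delete_at fields.

    Hypothetical helper: the phase names are labels chosen for this sketch.
    """
    now = now or datetime.now(timezone.utc)
    if delete_at is not None and now >= delete_at:
        # Past delete_at: the record is eligible for permanent deletion.
        return "deleted"
    if trash_at is not None and now >= trash_at:
        # Past trash_at but before delete_at: in the trash, can be untrashed.
        return "trashed"
    if trash_at is not None:
        # trash_at is set but still in the future.
        return "expiring"
    # No trash_at at all: the collection is kept indefinitely.
    return "persisted"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(deletion_phase(None, None, now))
    print(deletion_phase(now + timedelta(days=1), now + timedelta(days=15), now))
    print(deletion_phase(now - timedelta(days=1), now + timedelta(days=13), now))
```

Only the "trashed" phase is recoverable by simply untrashing the record; recovery after that point depends on whether the blocks still exist, which is what items b) through d) need to document.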


Subtasks 1 (0 open, 1 closed)

Task #16767: Review 16421-document-collection-deletion-lifecycle | Resolved | Peter Amstutz | 09/02/2020

Related issues

Related to Arvados - Idea #16427: "undelete" command to recover trashed blocks and restore a deleted collection | Resolved | Tom Clegg | 06/01/2020
Blocks Arvados Epics - Idea #16514: Actionable insight into keep usage | New
Actions #1

Updated by Ward Vandewege over 4 years ago

  • Description updated (diff)
Actions #2

Updated by Ward Vandewege over 4 years ago

  • Description updated (diff)
Actions #3

Updated by Ward Vandewege over 4 years ago

  • Description updated (diff)
Actions #4

Updated by Ward Vandewege over 4 years ago

  • Related to Idea #16427: "undelete" command to recover trashed blocks and restore a deleted collection added
Actions #5

Updated by Ward Vandewege over 4 years ago

  • Description updated (diff)
Actions #6

Updated by Ward Vandewege over 4 years ago

  • Blocks Idea #16514: Actionable insight into keep usage added
Actions #7

Updated by Peter Amstutz over 4 years ago

  • Release set to 25
Actions #8

Updated by Peter Amstutz over 4 years ago

  • Target version changed from To Be Groomed to 2020-09-09 Sprint
Actions #9

Updated by Peter Amstutz about 4 years ago

  • Assigned To set to Ward Vandewege
Actions #10

Updated by Ward Vandewege about 4 years ago

  • Description updated (diff)
Actions #11

Updated by Ward Vandewege about 4 years ago

  • Status changed from New to In Progress
Actions #12

Updated by Ward Vandewege about 4 years ago

Ready for first look in f7f5dfb456c1ddb6db63cf6e6a79e5d468d20a20 on 16421-document-collection-deletion-lifecycle

Pushed missing files in 8a00acc17403e7836b88b1e9e66b4ff47d5505f2

Actions #13

Updated by Peter Amstutz about 4 years ago

Ward Vandewege wrote:

Ready for first look in f7f5dfb456c1ddb6db63cf6e6a79e5d468d20a20 on 16421-document-collection-deletion-lifecycle

Recovering data

I usually try to have a brief introduction at the top, like, "Arvados has several features to prevent accidental loss/deletion of data, but accidents happen..."

Mention that data recovery can be used for both deleted collections and overwritten collection contents.

Link to "Data lifecycle" architecture page.

Tell user to look in the trash first.

Tell user collection contents may be available somewhere else and try searching by PDH.

Mention/link to collection versioning as a way to prevent accidental overwriting of collection contents.

When mentioning configuration items, include the section, e.g., AuditLogs.UnloggedAttributes, AuditLogs.MaxAge, Collections.BlobMissingReport

"Obviously this could be improved ...": if it is obvious, you don't need to say it, and saying it raises the question of why the example doesn't do it.
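
The "look in the trash first" and "search by PDH" suggestions above could be sketched roughly as follows. This is an assumption-laden sketch, not the code under review: it takes an already-constructed Arvados Python SDK client as `api`, and assumes the collections `list` call accepts `filters` and `include_trash` and that an `untrash` call exists. The helper names `find_by_pdh` and `untrash_collection` are hypothetical.

```python
from typing import Any, Dict, List


def find_by_pdh(api: Any, pdh: str) -> List[Dict[str, Any]]:
    """Return collections with the given portable data hash, trashed or not.

    Only the first page of results is returned; a real tool would paginate.
    """
    response = api.collections().list(
        filters=[["portable_data_hash", "=", pdh]],
        include_trash=True,  # assumed parameter: also return trashed records
    ).execute()
    return response["items"]


def untrash_collection(api: Any, uuid: str) -> Dict[str, Any]:
    """Move a trashed collection back out of the trash (assumed API call)."""
    return api.collections().untrash(uuid=uuid).execute()


# Usage, assuming the Arvados Python SDK is installed and configured:
#
#   import arvados
#   api = arvados.api("v1")
#   for c in find_by_pdh(api, "<portable data hash>"):
#       if c.get("is_trashed"):
#           untrash_collection(api, c["uuid"])
```

If any untrashed collection with the same PDH turns up, the contents are still available there and no recovery is needed.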

Architecture

I would tweak the section labels to avoid parentheses, something like "Storage in Keep" and "Computation with Crunch"

"Keep - Content-Addressable Storage": the leading "Keep" in the title is redundant.

I would reorder the pages:

  1. Content-Addressable Storage
  2. Keep clients
  3. Manifest format
  4. Data lifecycle

User guide

On "Keep collection lifecycle" there's a section "Collection lifecycle attributes" which is mostly a duplicate of the "Data lifecycle" page; it should just link to that page instead.

Actions #14

Updated by Ward Vandewege about 4 years ago

Peter Amstutz wrote:

Ward Vandewege wrote:

Ready for first look in f7f5dfb456c1ddb6db63cf6e6a79e5d468d20a20 on 16421-document-collection-deletion-lifecycle

Recovering data

I usually try to have a brief introduction at the top, like, "Arvados has several features to prevent accidental loss/deletion of data, but accidents happen..."

Mention that data recovery can be used for both deleted collections and overwritten collection contents.

Link to "Data lifecycle" architecture page.

Tell user to look in the trash first.

Tell user collection contents may be available somewhere else and try searching by PDH.

Mention/link to collection versioning as a way to prevent accidental overwriting of collection contents.

OK, I've made a bunch of changes along those lines. I've also reduced duplication in the user guide, by making the "Keep lifecycle" page there just about "Trashing and untrashing data", linking to the main "Collection lifecycle" page in the Architecture section.

When mentioning configuration items, include the section, e.g., AuditLogs.UnloggedAttributes, AuditLogs.MaxAge, Collections.BlobMissingReport

Fixed.

"Obviously this could be improved ...": if it is obvious, you don't need to say it, and saying it raises the question of why the example doesn't do it.

I dropped that line. That code is mostly redundant; the new recover-collections tool is the better way to get out of that situation. But it's helpful to show the 'manual way' so that the admin can see that a manual way exists and is not complicated, which increases confidence.

Architecture

I would tweak the section labels to avoid parentheses, something like "Storage in Keep" and "Computation with Crunch"

OK done.

"Keep - Content-Addressable Storage": the leading "Keep" in the title is redundant.

The title in the menu is also the title of the page. So I want to keep the word 'Keep' in there. I've renamed it to 'Introduction to Keep'.

I would reorder the pages:

  1. Content-Addressable Storage
  2. Keep clients
  3. Manifest format
  4. Data lifecycle

Done, though I swapped the lifecycle and manifest format pages, the latter being lower level.

User guide

On "Keep collection lifecycle" there's a section "Collection lifecycle attributes" which is mostly a duplicate of the "Data lifecycle" page; it should just link to that page instead.

Ah, yes, I forgot to do that. Done now.

Changes ready for another look at dafd68e639a416fcae216c639c6e38beab53b214 on branch 16421-document-collection-deletion-lifecycle

Actions #15

Updated by Peter Amstutz about 4 years ago

LGTM

Actions #16

Updated by Ward Vandewege about 4 years ago

  • Status changed from In Progress to Resolved