Idea #22458

open

Ability to intentionally turn a collection into a "ghost" collection

Added by Peter Amstutz 3 months ago. Updated about 20 hours ago.

Status: New
Priority: Normal
Assigned To: -
Category: Keep
Target version:
Start date:
Due date:
Story points: -

Description

For provenance, I would like to keep collection records around.

However, in some cases I don't want to store the intermediate data. For example, I might have processing steps where the output is as large as or larger than the input data.

I propose allowing replication_desired to be set to zero, indicating that keep-balance may trash the underlying blocks without reporting them as "missing" blocks. Once set to zero, replication_desired cannot be increased. I call these "ghost collections".

(Other names that just came to me are "dehydrated" or "freeze-dried" collections.)

Fetching a ghost collection returns an unsigned manifest.
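
As a rough illustration of what "unsigned manifest" could mean here, the sketch below removes permission signature hints from block locators in a manifest. It assumes signatures appear as a "+A<hex>@<hex>" hint on each locator; the function name unsign_manifest and the sample signature are made up for this example and are not part of any existing API.

```python
import re

# Assumed shape of a signature hint on a block locator:
# "+A<hex signature>@<hex expiry timestamp>".
SIGNATURE_HINT = re.compile(r"\+A[0-9a-f]+@[0-9a-f]+")
# A locator token: 32 hex digits followed by optional "+hint" parts.
LOCATOR = re.compile(r"^[0-9a-f]{32}(\+\S+)*$")


def unsign_manifest(manifest_text: str) -> str:
    """Return manifest_text with signature hints removed from block locators.

    Only tokens that look like block locators are touched, so stream names
    and file tokens pass through unchanged.
    """
    out_lines = []
    for line in manifest_text.split("\n"):
        tokens = [
            SIGNATURE_HINT.sub("", token) if LOCATOR.match(token) else token
            for token in line.split(" ")
        ]
        out_lines.append(" ".join(tokens))
    return "\n".join(out_lines)


# Made-up signature, for illustration only.
signed = (
    ". acbd18db4cc2f85cedef654fccc4a4d8+3"
    "+A0123456789abcdef0123456789abcdef01234567@5f5e1000 0:3:foo.txt\n"
)
print(unsign_manifest(signed), end="")
```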

Ghost collection records should behave similarly to frozen projects: read-only, except for being moved between projects (it might be ok to edit metadata such as name and properties as well).
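
The update rules described so far (a one-way switch to zero, then read-only apart from moving and possibly metadata) might be modeled roughly as follows. This is only a sketch of the proposed behavior, not the API server's implementation: validate_update and GhostUpdateError are hypothetical names, and using owner_uuid to represent "moved between projects" is an assumption.

```python
# Fields assumed to stay editable on a ghost collection: owner_uuid (moving
# between projects), plus name and properties if metadata edits are allowed.
MUTABLE_ON_GHOST = {"owner_uuid", "name", "properties"}


class GhostUpdateError(ValueError):
    """Raised when an update is not permitted on a ghost collection."""


def validate_update(current: dict, changes: dict) -> dict:
    """Apply changes to a collection record, enforcing the proposed ghost rules."""
    # Ignore attributes whose value would not actually change.
    changed = {k: v for k, v in changes.items() if current.get(k) != v}
    if current.get("replication_desired") == 0:
        if "replication_desired" in changed:
            raise GhostUpdateError(
                "replication_desired cannot be increased once set to zero"
            )
        forbidden = sorted(set(changed) - MUTABLE_ON_GHOST)
        if forbidden:
            raise GhostUpdateError(
                "ghost collection is read-only; cannot change: " + ", ".join(forbidden)
            )
    return {**current, **changed}


live = {"name": "step 3 output", "replication_desired": 2, "manifest_text": "..."}
ghost = validate_update(live, {"replication_desired": 0})      # allowed: one-way switch
renamed = validate_update(ghost, {"name": "step 3 (ghosted)"})  # allowed: metadata edit
```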

Similar to trash_at / delete_at, it would also be nice to have a ghost_at field, and a corresponding output_ghost_ttl on container requests that lets you specify that a collection should be ghosted at some point in the future. This would help keep intermediate results around for a little while, but not forever.
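
One way ghost_at could be derived from output_ghost_ttl is sketched below. It assumes the TTL is a number of seconds counted from the time the output is finalized, and that zero means "never ghost automatically"; neither detail is specified in this proposal, and both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def compute_ghost_at(finished_at: datetime, output_ghost_ttl: int) -> Optional[datetime]:
    """Return when the output collection should be ghosted, or None for never.

    Assumption: output_ghost_ttl is in seconds, and 0 disables auto-ghosting.
    """
    if output_ghost_ttl <= 0:
        return None
    return finished_at + timedelta(seconds=output_ghost_ttl)


def is_due_for_ghosting(ghost_at: Optional[datetime], now: datetime) -> bool:
    """True once the scheduled ghost_at time has been reached."""
    return ghost_at is not None and now >= ghost_at


finished = datetime(2025, 5, 14, 12, 0, tzinfo=timezone.utc)
ghost_at = compute_ghost_at(finished, 7 * 24 * 3600)  # keep data for one week
print(ghost_at.isoformat())
```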

Clients such as Workbench, keep-web, Python SDK, etc. should be made aware of ghost collections so that they return a sensible error when the user tries to read a file, instead of a scary "failed to read block" error.
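
A client-side check along these lines could produce the friendlier error. This is a minimal sketch that treats replication_desired == 0 as the ghost marker, as proposed above; GhostCollectionError, open_file, and the read_blocks callback are hypothetical and do not correspond to existing SDK functions.

```python
class GhostCollectionError(IOError):
    """Raised instead of a low-level block read failure for ghost collections."""


def open_file(collection: dict, path: str, read_blocks):
    """Read a file from a collection, failing early and clearly for ghosts.

    read_blocks is a stand-in for whatever actually fetches the file data.
    """
    if collection.get("replication_desired") == 0:
        raise GhostCollectionError(
            f"Cannot read {path!r}: collection {collection.get('uuid')} is a ghost "
            "collection. Its record is kept for provenance, but its data blocks "
            "are no longer stored."
        )
    return read_blocks(collection, path)


ghost = {"uuid": "hypothetical-collection-uuid", "replication_desired": 0}
try:
    open_file(ghost, "results.csv", read_blocks=lambda c, p: b"")
except GhostCollectionError as err:
    print(err)
```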

If the ghost collection exists on another cluster readable by the user, it should be possible to fetch the blocks automatically via federation. Alternatively, it should be possible to rematerialize/rehydrate the collection by downloading all the blocks from somewhere else and rewriting the manifest with current block signatures, as proof that the collection is readable again.
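
The rehydration step could look roughly like the sketch below: walk the unsigned manifest, fetch each block from wherever it is still available, store it locally, and substitute the freshly signed locator. It assumes each locator begins with the MD5 of the block content, and that storing a block returns a signed locator; fetch_block and store_block are hypothetical callbacks, not real client methods.

```python
import hashlib
import re

# A locator token: 32 hex digits, "+size", then optional "+hint" parts.
LOCATOR = re.compile(r"^([0-9a-f]{32})\+(\d+)(\+\S+)*$")


def rehydrate_manifest(manifest_text: str, fetch_block, store_block) -> str:
    """Re-store every block and return a manifest with fresh signed locators.

    fetch_block(locator) -> bytes   reads the block from some other source.
    store_block(data) -> str        writes it locally, returns a signed locator.
    """
    out_lines = []
    for line in manifest_text.split("\n"):
        tokens = []
        for token in line.split(" "):
            match = LOCATOR.match(token)
            if match:
                data = fetch_block(token)
                # Verify content before trusting it (assumes MD5-based locators).
                if hashlib.md5(data).hexdigest() != match.group(1):
                    raise ValueError(f"block {match.group(1)} failed checksum verification")
                token = store_block(data)
            tokens.append(token)
        out_lines.append(" ".join(tokens))
    return "\n".join(out_lines)
```

A real implementation would likely fetch each distinct block only once and handle partial failures; this sketch ignores both.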


Related issues 1 (1 open, 0 closed)

Related to Arvados - Idea #22459: Manual "empty trash" command (New)

#1

Updated by Peter Amstutz 3 months ago

  • Position changed from -933268 to -933261

#2

Updated by Peter Amstutz 3 months ago

  • Description updated (diff)
  • Subject changed from Ability to intentional make a collection a "ghost" collection to Ability to intentionally turn a collection a "ghost" collection

#3

Updated by Peter Amstutz 3 months ago

  • Description updated (diff)

#4

Updated by Peter Amstutz 3 months ago

  • Related to Idea #22459: Manual "empty trash" command added

#5

Updated by Peter Amstutz 3 days ago

  • Target version changed from Future to Development 2025-05-14

#6

Updated by Peter Amstutz 3 days ago

  • Target version changed from Development 2025-05-14 to Development 2025-05-28

#7

Updated by Peter Amstutz 3 days ago

  • Target version changed from Development 2025-05-28 to Development 2025-05-14

#8

Updated by Peter Amstutz about 20 hours ago

  • Description updated (diff)