Idea #22458
Updated by Peter Amstutz about 1 month ago
For provenance, I would like to keep collection records around.
However, in some cases I don't want to store the intermediate data. For example, I might have processing steps where the output is just as large or larger than the input data.
I propose allowing @replication_desired@ to be set to zero, indicating that keep-balance may trash the underlying blocks without reporting them as "missing" blocks. Once set to zero, @replication_desired@ cannot be increased. I call these "ghost collections".
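As a rough sketch of the one-way rule, the update check could look something like the following. The function name is hypothetical and this is not existing API server code; it also assumes that a null @replication_desired@ (meaning "use the default") counts as an increase and should be rejected once a collection is ghosted.

```python
from typing import Optional


def validate_replication_change(old: Optional[int], new: Optional[int]) -> None:
    """Hypothetical check: once replication_desired is 0, it must stay 0.

    None stands for "use the cluster default", which is assumed here to
    count as an increase from zero.
    """
    if new is not None and new < 0:
        raise ValueError("replication_desired must not be negative")
    if old == 0 and new != 0:
        raise ValueError(
            "replication_desired cannot be increased once set to zero"
        )
```

Setting zero on an ordinary collection is allowed, and re-saving a ghost collection with zero is a no-op; any other value on a ghost collection raises.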
Fetching a ghost collection returns an unsigned manifest.
Ghost collection records should behave similarly to frozen projects: read-only, except for being moved between projects (it might be ok to edit metadata such as name and properties as well).
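To make the read-only rule concrete, here is a minimal sketch of an attribute filter. It assumes that moving a collection between projects corresponds to changing @owner_uuid@, and it treats editing @name@ and @properties@ as an optional policy, since that part is still undecided. The function and constant names are hypothetical.

```python
# Attributes that may always change on a ghost collection (assumption:
# moving between projects means changing owner_uuid).
GHOST_MUTABLE = frozenset({"owner_uuid"})
# Metadata that might also be editable, pending a decision.
GHOST_METADATA = frozenset({"name", "properties"})


def rejected_attrs(changes: dict, allow_metadata: bool = False) -> list:
    """Return the sorted attribute names in `changes` that a ghost
    collection would refuse to update."""
    allowed = GHOST_MUTABLE | (GHOST_METADATA if allow_metadata else frozenset())
    return sorted(set(changes) - allowed)
```

An update would be accepted only when the returned list is empty.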
Similar to @trash_at@ / @delete_at@, it would also be nice to have a @ghost_at@ field, and a corresponding @output_ghost_ttl@ on container requests that lets you specify that a collection should be ghosted at some point in the future -- helpful to keep intermediate results around for a little while, but not forever.
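A possible way to derive @ghost_at@ from @output_ghost_ttl@ is sketched below. It assumes the TTL is expressed in seconds and that a value of zero or null means "never ghost", by analogy with how @output_ttl@ feeds @trash_at@; neither detail is settled by this proposal, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def compute_ghost_at(finalized_at: datetime,
                     output_ghost_ttl: Optional[int]) -> Optional[datetime]:
    """Hypothetical helper: time at which the output collection should
    be ghosted, or None if it should keep its data indefinitely.

    Assumes output_ghost_ttl is a number of seconds.
    """
    if output_ghost_ttl is None or output_ghost_ttl <= 0:
        return None
    return finalized_at + timedelta(seconds=output_ghost_ttl)


# Example: keep intermediate results for one week after the request is finalized.
week = 7 * 24 * 3600
ghost_at = compute_ghost_at(datetime(2025, 1, 1, tzinfo=timezone.utc), week)
```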
Clients such as Workbench, keep-web, the Python SDK, etc. should be made aware of ghost collections, so that they return a sensible error if the user tries to read a file, instead of a scary "failed to read block" error.
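On the client side, the check could be as simple as the following sketch. The exception class and function are hypothetical and not part of the current Python SDK; the collection record is represented as a plain dict, and a ghost collection is identified by @replication_desired@ being zero.

```python
class GhostCollectionError(IOError):
    """Hypothetical error raised when reading data from a ghost collection."""


def ensure_readable(collection: dict, path: str) -> None:
    """Raise a descriptive error before attempting any block reads
    if the collection record marks it as a ghost collection."""
    if collection.get("replication_desired") == 0:
        raise GhostCollectionError(
            f"cannot read {path!r}: collection {collection.get('uuid')} is a "
            "ghost collection; its record is kept for provenance but its "
            "data may no longer be stored"
        )
```

A client would call this before opening a file, so the user sees an explanation rather than a low-level block read failure.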