Story #11016

Document how to choose a suitable blob signature TTL

Added by Tom Clegg over 2 years ago. Updated 10 days ago.

Status: In Progress
Priority: Normal
Assigned To:
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time: (Total: 0.00 h)
Story points: -

Subtasks

Task #15482: Review, New, Tom Morris

History

#1 Updated by Tom Morris almost 2 years ago

  • Target version set to Arvados Future Sprints

#2 Updated by Nico César about 2 months ago

I am re-using this ticket, since we have had a lot of inquiries about this. Here is one example:

We are trying to delete data from the keepstore. But after we run "arv collection delete --uuid=zzzzz-4zz18-7p9s7j1qa" it trashes it, but sets the final delete date 2 weeks in the future.

I ask because on our test server we have filled up the keepstore and can't delete data to continue testing. It's put a stop to the project.

 "delete_at":"2019-07-11T20:45:30.557593000Z",
 "trash_at":"2019-06-27T20:45:30.557593000Z",
 "is_trashed":true,

Even if we change the trash times in " /etc/arvados/keepstore/keepstore.yml" and reinstall keepstore + restart the service.

 # How often to check for (and delete) trashed blocks whose
 # TrashLifetime has expired.
 TrashCheckInterval: 1h0m0s

 # Time duration after a block is trashed during which it can be
 # recovered using an /untrash request.
 TrashLifetime: 1h0m0s

This was my answer:

Just to make sure we are all on the same page in terms of terminology, there is a good explanation here: https://doc.arvados.org/user/tutorials/tutorial-keep-collection-lifecycle.html

And the method you used is "delete" in collections, from https://doc.arvados.org/v1.4/api/methods/collections.html
----------
delete

Put a Collection in the trash. This sets the trash_at field to now and delete_at field to now + token TTL. A trashed collection is invisible to most API calls unless the include_trash parameter is true.
-----------

As you can see, the "token TTL" mentioned there defaults to 2 weeks. It corresponds to the Collections->BlobSigningTTL and Collections->DefaultTrashLifetime parameters in the configuration. Here is the description from https://doc.arvados.org/v1.4/admin/config.html

-------

# Lifetime (in seconds) of blob permission signatures generated by
# the API server. This determines how long a client can take (after
# retrieving a collection record) to retrieve the collection data
# from Keep. If the client needs more time than that (assuming the
# collection still has the same content and the relevant user/token
# still has permission) the client can retrieve the collection again
# to get fresh signatures.
#
# This must be exactly equal to the -blob-signature-ttl flag used by
# keepstore servers.  Otherwise, reading data blocks and saving
# collections will fail with HTTP 403 permission errors.
#
# Modifying blob_signature_ttl invalidates existing signatures; see
# blob_signing_key note above.
#
# The default is 2 weeks.
BlobSigningTTL: 336h

# Default lifetime for ephemeral collections: 2 weeks. This must not
# be less than blob_signature_ttl.
DefaultTrashLifetime: 336h

------

This assumes that you have the central configuration in /etc/arvados/config.yml and keep-balance.service up and running.

As you can see, we have 3 different places with pieces of the information. And usually "on our test server we have filled up the keepstore" is the reason they need a quick "delete all this" process without having to wait 2 weeks.
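
For what it's worth, the two-week figure can be checked against the timestamps in the report above. This is only a sketch: parse_ts is a hypothetical helper written for this comment, not part of any Arvados SDK, and it assumes the timestamps always carry a fractional-seconds part as they do in the report.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(ts: str) -> datetime:
    """Parse a timestamp like 2019-06-27T20:45:30.557593000Z (hypothetical helper)."""
    # The report shows nanosecond precision; strptime's %f accepts at most
    # 6 digits, so trim the fraction to microseconds.
    head, frac = ts.rstrip("Z").split(".")
    parsed = datetime.strptime(f"{head}.{frac[:6]}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


# Values copied from the collection record quoted in the report.
trash_at = parse_ts("2019-06-27T20:45:30.557593000Z")
delete_at = parse_ts("2019-07-11T20:45:30.557593000Z")

gap = delete_at - trash_at
gap_hours = gap.total_seconds() / 3600
print(f"delete_at - trash_at = {gap_hours:.0f}h")  # 336h, i.e. 2 weeks
```

The gap is 336h, which matches the 336h defaults quoted above for BlobSigningTTL and DefaultTrashLifetime.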

#3 Updated by Tom Morris about 2 months ago

One of the main things that's missing from https://doc.arvados.org/user/tutorials/tutorial-keep-collection-lifecycle.html (which is really conceptual documentation, not a tutorial) is "when do I get my disk space back?" and the associated Keep store pieces of the data lifecycle.

#4 Updated by Tom Morris about 2 months ago

  • Target version changed from Arvados Future Sprints to 2019-07-31 Sprint

#5 Updated by Tom Morris about 1 month ago

  • Assigned To set to Tom Clegg

#6 Updated by Tom Clegg 24 days ago

  • Target version changed from 2019-07-31 Sprint to 2019-08-14 Sprint

#7 Updated by Tom Clegg 10 days ago

  • Status changed from New to In Progress

BlobSigningTTL determines the minimum lifetime of transient data, i.e., blocks that have been written to disk/cloud backend devices, but are not referenced by collections.

If BlobSigningTTL is too long, data will still be stored long after the collections are deleted, and you will needlessly fill up disks or waste money on cloud storage.

If BlobSigningTTL is too short, long-running processes/containers will fail when they take too long (a) between writing blocks and writing collections that reference them, or (b) between reading collections and reading the referenced blocks.
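
A rough sketch of that trade-off, as one possible way to arrive at a lower bound. Everything here is an assumption for illustration: min_blob_signing_ttl and format_hours are hypothetical helpers (not Arvados functions), the 36h and 12h gaps are made-up workload figures, and the one-hour safety margin is arbitrary. The idea is simply that the TTL has to cover the longer of gap (a) and gap (b), and anything beyond that only extends how long unreferenced blocks stay on disk.

```python
import math
from datetime import timedelta


def min_blob_signing_ttl(write_gap: timedelta, read_gap: timedelta,
                         margin: timedelta = timedelta(hours=1)) -> timedelta:
    """Lower bound for the TTL (hypothetical helper).

    write_gap: longest expected time between writing blocks and saving
               the collection that references them (case a).
    read_gap:  longest expected time between reading a collection and
               reading its blocks (case b).
    """
    return max(write_gap, read_gap) + margin


def format_hours(ttl: timedelta) -> str:
    """Render a duration in the "336h" style, rounded up to whole hours."""
    return f"{math.ceil(ttl.total_seconds() / 3600)}h"


# Made-up workload: containers may take up to 36h between writing blocks
# and saving the collection, and up to 12h between reading a collection
# and reading its blocks.
suggested = min_blob_signing_ttl(timedelta(hours=36), timedelta(hours=12))
print(format_hours(suggested))  # 37h
```

Note that the config comments quoted earlier in this ticket say the default trash lifetime must not be less than the blob signature TTL, so any value chosen this way would need to respect that as well.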

#8 Updated by Tom Clegg 10 days ago

  • Target version changed from 2019-08-14 Sprint to 2019-08-28 Sprint
