Feature #14930
closed
Add flag to arv-put to set a trash_at date
Added by Bryan Cosca over 5 years ago.
Updated over 5 years ago.
Release relationship:
Auto
Description
The use case would be for when sequencers finish uploading to arvados, we can set the timer to delete that data after some designated time period.
Trash times could be specified in 2 ways:
1. Absolute datetime: Could be accepted from a param like --trash_at "YYYY-MM-DD HH:MM"
2. Relative times: Could be accepted from a param like --trash_after XX
(where XX is number of days)
Both parameters would be mutually exclusive.
The accepted format for absolute datetimes would be the one described at https://en.wikipedia.org/wiki/ISO_8601
The accepted relative time parameter unit will be number of days (to be converted to amount of seconds) and it should take into account possible timezone changes.
The relative trash time should take note of the upload finish process datetime. When uploading using the "rsync mode", it should update the trash_at value when checkpointing.
- Target version set to To Be Groomed
- Description updated (diff)
- Description updated (diff)
- Subject changed from Add flag to arv-put to set a trash_at / delete_at date to Add flag to arv-put to set a trash_at date
- Description updated (diff)
- Target version changed from To Be Groomed to Arvados Future Sprints
- Story points set to 1.0
- Target version changed from Arvados Future Sprints to 2019-06-05 Sprint
- Assigned To set to Lucas Di Pentima
- Status changed from New to In Progress
Updates at 56e6ffd4a - branch 14930-arvput-trash-at
Test run: https://ci.curoverse.com/job/developer-run-tests/1276/
arv-put
accepts --trash-at DATE
where DATE's format is the subset of ISO8601 supported by ciso8601 python module.
arv-put
also accepts --trash-after N
, being N the number of days in the future from the upload finish datetime.
- When not passing timezone information to
--trash-at
, it will assume the provided date is expressed in the local system's timezone configuration.
- When using
--update-collection UUID
and --trash-after N
, it periodically updates the updated collection's trash_at
attribute.
- Target version changed from 2019-06-05 Sprint to 2019-06-19 Sprint
+ if trash_at.tzinfo is not None:
+ # Timezone aware datetime provided.
+ utcoffset = trash_at.utcoffset()
+ else:
+ # Timezone naive datetime provided. Assume is local.
+ utcoffset = datetime.timedelta(hours=-time.timezone/3600)
+ # Convert to UTC timezone naive datetime.
+ trash_at = trash_at.replace(tzinfo=None) - utcoffset
Not sure why you're rounding off to hours if the timezone struct has the offset in seconds? Also why calculate a negative offset and then subtract it when the source value is meant to be added?
+ logger.error("--trash-at argument should be set in the future")
+ sys.exit(1)
+ if args.trash_after is not None:
+ if args.trash_after < 1:
+ logger.error("--trash-after argument should be >= 1")
"should" → "must" since it is being enforced.
What happens if the user gives YYYY-MM-DD with no time? My guess is that you probably get 00:00 (midnight) at the start of the day as the expire date, but we should confirm. We should also consider if it would be more user friendly to make the expire date 23:59 at the end of the day. Does it accept YYYY-MM or YYYY? ¿En Español se escribe "AAAA-MM-DD"?
Updates at 64eadab02
Test run: https://ci.curoverse.com/job/developer-run-tests/1291/
- Don't round off to hours the timezone
- Don't double substract the utc offset on the code
- Error message corrections
- Don't accept day-less dates (eg: 2020-01) on
--trash-at
- If the user doesn't provide HH:MM on
--trash-at
, assume it to be at the end of the specified day (local time, of course) instead of at 00:00
- Log the expiration date after saving the collection.
- Tests additions
I think you have a daylight savings time bug (time zones are awful, and daylight savings is the worst):
$ date
Mon Jun 10 10:57:07 EDT 2019
$ arv-put --trash-after 2 lightning-2.6.3-sm+tb-linux.xpi
2019-06-10 10:57:39 arvados.arv_put[6212] INFO: Creating new cache file at /home/peter/.cache/arvados/arv-put/5b9195b264c8dda29afac74006b05663
3M / 3M 100.0% 2019-06-10 10:57:42 arvados.arv_put[6212] INFO:
2019-06-10 10:57:42 arvados.arv_put[6212] INFO: Collection saved as 'Saved at 2019-06-10 14:57:39 UTC by peter@petervg'. It will expire on 2019-06-12 09:57:40 -0400.
c97qk-4zz18-rj4m8nvzt0lokcv
I would have expected now + 2 days to be 2019-06-12 10:57:40 -0400
Daylight savings time works.
I don't want to go back and forth on this one any more but there was one more thing I noticed with manual testing.
$ arv-put --trash-at '2019-06-12 4:51:00' arvados_version.py
2019-06-11 14:40:55 arvados.arv_put[15306] ERROR: --trash-at argument format invalid, use --help to see examples. (X-Request-Id: req-fl7ankvzwv2h14tusx71)
(venv) peter@petervg:[pts/3]:~/.arvbox/arvbox/arvados/sdk/python [(HEAD detached at origin/14930-arvput-trash-at)]
$ arv-put --trash-at '2019-06-12 04:51:00' arvados_version.py
2019-06-11 14:41:00 arvados.arv_put[15334] INFO: Resuming upload from cache file /home/peter/.cache/arvados/arv-put/4eaf371b61459e2188bc4e4cbeaa497e
0M / 0M 100.0% 2019-06-11 14:41:00 arvados.arv_put[15334] INFO:
2019-06-11 14:41:00 arvados.arv_put[15334] INFO: Collection saved as 'Saved at 2019-06-11 18:41:00 UTC by peter@petervg'. It will expire on 2019-06-12 04:51:00 -0400.
4xphq-4zz18-yps644jfbjt1v8f
So '2019-06-12 4:51:00' fails but '2019-06-12 04:51:00' works, it looks like the parser is very particular about field widths. (Also noticed that it is documented to have a 'T' in between the date and time but it also accepts a space).
I'll leave this one up to you if you want to do anything about it, otherwise LGTM.
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Also available in: Atom
PDF