Add flag to arv-put to set a trash_at date
The use case: when sequencers finish uploading to Arvados, we can set a timer to delete that data after a designated time period.
Trash times could be specified in 2 ways:
1. Absolute datetime: Could be accepted from a param like
--trash_at "YYYY-MM-DD HH:MM"
2. Relative times: Could be accepted from a param like
--trash_after XX (where XX is number of days)
Both parameters would be mutually exclusive.
The accepted format for absolute datetimes would be the one described at https://en.wikipedia.org/wiki/ISO_8601
The relative time parameter will be expressed in days (to be converted to a number of seconds), and it should take possible timezone changes into account.
The relative trash time should be counted from the datetime at which the upload finishes. When uploading using the "rsync mode", it should update the trash_at value when checkpointing.
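The proposal above could be sketched with argparse roughly as follows. This is only an illustration, not the arv-put code: `build_parser` and `days_to_seconds` are hypothetical names, and the option spellings follow the hyphenated form used later in this ticket.

```python
import argparse

SECONDS_PER_DAY = 24 * 60 * 60


def build_parser():
    """Hypothetical parser showing the two mutually exclusive options."""
    parser = argparse.ArgumentParser(prog="arv-put-sketch")
    group = parser.add_mutually_exclusive_group()
    # Absolute datetime, kept as a string here; parsing happens elsewhere.
    group.add_argument("--trash-at", metavar="DATE", default=None)
    # Relative time, expressed as a whole number of days.
    group.add_argument("--trash-after", metavar="DAYS", type=int, default=None)
    return parser


def days_to_seconds(days):
    """Convert a number of days to seconds (assumes 24-hour days)."""
    return days * SECONDS_PER_DAY
```

Passing both options to this parser makes argparse exit with a usage error, which is the mutually exclusive behavior described above.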
#8 Updated by Lucas Di Pentima about 2 months ago
Updates at 56e6ffd4a - branch
Test run: https://ci.curoverse.com/job/developer-run-tests/1276/
--trash-at DATE, where DATE's format is the subset of ISO 8601 supported by the ciso8601 Python module.
--trash-after N, where N is the number of days in the future, counted from the upload finish datetime.
- When not passing timezone information to --trash-at, it will assume the provided date is expressed in the local system's timezone configuration.
- When using --trash-after N, it periodically updates the updated collection's
#10 Updated by Peter Amstutz about 2 months ago
+    if trash_at.tzinfo is not None:
+        # Timezone aware datetime provided.
+        utcoffset = trash_at.utcoffset()
+    else:
+        # Timezone naive datetime provided. Assume is local.
+        utcoffset = datetime.timedelta(hours=-time.timezone/3600)
+    # Convert to UTC timezone naive datetime.
+    trash_at = trash_at.replace(tzinfo=None) - utcoffset
Not sure why you're rounding off to hours if the timezone struct has the offset in seconds? Also why calculate a negative offset and then subtract it when the source value is meant to be added?
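For illustration, one way to address both points is to keep the offset in seconds and add it directly. This is a sketch under assumptions, not the branch's code: `to_naive_utc` is a hypothetical helper, and its `local_seconds_west` argument is meant to be a value such as `time.timezone` (seconds west of UTC), which by itself ignores daylight saving time.

```python
import datetime


def to_naive_utc(dt, local_seconds_west):
    """Return dt as a timezone-naive UTC datetime.

    local_seconds_west: offset of the local zone in seconds west of UTC,
    used only when dt carries no timezone information (assumption: the
    caller supplies it, e.g. from time.timezone).
    """
    if dt.tzinfo is not None:
        # Timezone-aware input: utcoffset() is east of UTC, so subtract it.
        return dt.replace(tzinfo=None) - dt.utcoffset()
    # Timezone-naive input: seconds west of UTC are added, with no rounding.
    return dt + datetime.timedelta(seconds=local_seconds_west)
```

Because the offset stays in seconds, zones with half-hour offsets convert without being rounded to a whole number of hours.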
+        logger.error("--trash-at argument should be set in the future")
+        sys.exit(1)
+    if args.trash_after is not None:
+        if args.trash_after < 1:
+            logger.error("--trash-after argument should be >= 1")
"should" → "must" since it is being enforced.
What happens if the user gives YYYY-MM-DD with no time? My guess is that you probably get 00:00 (midnight) at the start of the day as the expiration date, but we should confirm. We should also consider whether it would be more user friendly to make the expiration date 23:59 at the end of the day. Does it accept YYYY-MM or YYYY? In Spanish, is it written "AAAA-MM-DD"?
#11 Updated by Lucas Di Pentima about 2 months ago
Updates at 64eadab02
Test run: https://ci.curoverse.com/job/developer-run-tests/1291/
- Don't round off the timezone offset to hours
- Don't subtract the UTC offset twice in the code
- Error message corrections
- Don't accept day-less dates (e.g. 2020-01) on --trash-at
- If the user doesn't provide HH:MM on --trash-at, assume it to be at the end of the specified day (local time, of course) instead of at 00:00
- Log the expiration date after saving the collection.
- Tests additions
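The date handling listed above might look roughly like this sketch. It uses the standard library's `strptime` rather than ciso8601, so the set of accepted formats is an assumption, as are the name `parse_trash_at` and the choice of 23:59:59 as "end of the day".

```python
import datetime

# Formats with an explicit time part (assumed set, for illustration only).
_DATETIME_FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d %H:%M",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%dT%H:%M",
)


def parse_trash_at(value):
    """Parse a naive local datetime; date-only input means end of that day."""
    try:
        day = datetime.datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        pass
    else:
        # No time given: use the last second of the day instead of 00:00.
        return day.replace(hour=23, minute=59, second=59)
    for fmt in _DATETIME_FORMATS:
        try:
            return datetime.datetime.strptime(value, fmt)
        except ValueError:
            continue
    # Day-less dates such as "2020-01" match none of the formats.
    raise ValueError("invalid --trash-at value: %r" % value)
```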
#13 Updated by Peter Amstutz about 1 month ago
I think you have a daylight savings time bug (time zones are awful, and daylight savings is the worst):
$ date
Mon Jun 10 10:57:07 EDT 2019
$ arv-put --trash-after 2 lightning-2.6.3-sm+tb-linux.xpi
2019-06-10 10:57:39 arvados.arv_put INFO: Creating new cache file at /home/peter/.cache/arvados/arv-put/5b9195b264c8dda29afac74006b05663
3M / 3M 100.0%
2019-06-10 10:57:42 arvados.arv_put INFO:
2019-06-10 10:57:42 arvados.arv_put INFO: Collection saved as 'Saved at 2019-06-10 14:57:39 UTC by peter@petervg'. It will expire on 2019-06-12 09:57:40 -0400.
c97qk-4zz18-rj4m8nvzt0lokcv
I would have expected now + 2 days to be 2019-06-12 10:57:40 -0400
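As an illustration of that expectation, the sketch below computes the expiry from a POSIX timestamp and renders it with the UTC offset in effect at that instant, rather than a fixed standard-time offset. The helper names are hypothetical, "days" is assumed to mean 24-hour periods, and `tm_gmtoff` is only populated on platforms that support it (Linux does).

```python
import datetime
import time


def local_utcoffset_at(timestamp):
    """UTC offset of the system's local zone at the given POSIX timestamp."""
    # tm_gmtoff already reflects daylight saving time for that instant.
    return datetime.timedelta(seconds=time.localtime(timestamp).tm_gmtoff)


def format_expiry(upload_finished_ts, days):
    """Render 'upload finish + days' in local time with its numeric offset."""
    expiry_ts = upload_finished_ts + days * 24 * 60 * 60
    zone = datetime.timezone(local_utcoffset_at(expiry_ts))
    local = datetime.datetime.fromtimestamp(expiry_ts, zone)
    return local.strftime("%Y-%m-%d %H:%M:%S %z")
```

With the system timezone set to US Eastern time, an upload finishing at 2019-06-10 14:57:40 UTC and `days=2` should render as 2019-06-12 10:57:40 -0400.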
#14 Updated by Lucas Di Pentima about 1 month ago
Was able to handle DST without including any additional dependencies. Not sure how to write a test covering different DST scenarios. I tested it manually, though.
#15 Updated by Peter Amstutz about 1 month ago
Daylight savings time works.
I don't want to go back and forth on this one any more but there was one more thing I noticed with manual testing.
$ arv-put --trash-at '2019-06-12 4:51:00' arvados_version.py
2019-06-11 14:40:55 arvados.arv_put ERROR: --trash-at argument format invalid, use --help to see examples. (X-Request-Id: req-fl7ankvzwv2h14tusx71)
(venv) peter@petervg:[pts/3]:~/.arvbox/arvbox/arvados/sdk/python [(HEAD detached at origin/14930-arvput-trash-at)]
$ arv-put --trash-at '2019-06-12 04:51:00' arvados_version.py
2019-06-11 14:41:00 arvados.arv_put INFO: Resuming upload from cache file /home/peter/.cache/arvados/arv-put/4eaf371b61459e2188bc4e4cbeaa497e
0M / 0M 100.0%
2019-06-11 14:41:00 arvados.arv_put INFO:
2019-06-11 14:41:00 arvados.arv_put INFO: Collection saved as 'Saved at 2019-06-11 18:41:00 UTC by peter@petervg'. It will expire on 2019-06-12 04:51:00 -0400.
4xphq-4zz18-yps644jfbjt1v8f
So '2019-06-12 4:51:00' fails but '2019-06-12 04:51:00' works; it looks like the parser is very particular about field widths. (I also noticed that the documentation shows a 'T' between the date and the time, but a space is accepted too.)
I'll leave this one up to you if you want to do anything about it, otherwise LGTM.
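If the field-width strictness is ever worth relaxing, one option is to zero-pad a single-digit hour before handing the string to the strict parser. This is only a sketch of that idea; `pad_hour_field` is a hypothetical helper and not part of arv-put.

```python
import re

# A lone digit right after the date/time separator ('T' or space) and
# right before the first ':' is a single-digit hour.
_SINGLE_DIGIT_HOUR = re.compile(r"(?<=[T ])(\d)(?=:)")


def pad_hour_field(value):
    """Zero-pad a single-digit hour, leaving well-formed input untouched."""
    return _SINGLE_DIGIT_HOUR.sub(r"0\1", value)
```

For example, `pad_hour_field('2019-06-12 4:51:00')` yields `'2019-06-12 04:51:00'`, which is the form the parser accepted in the session above.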