[DRAFT] [SDKs] arv-keepdocker should be smarter about caching the image to upload
As of this writing, when arv-keepdocker uploads an image to Keep, it saves the image to disk first, under ~/.cache/arvados/docker. It will use this cache to resume an interrupted upload.
On shell nodes, this has been known to fill the home directory partition. When this happens, you get the lovely error message "write /dev/stdout: no space left on device".
arv-keepdocker should be smarter about this cache. Possibilities:
- Check space available on the home directory partition. If there's not enough space to save the Docker image, don't try; just pipe `docker save` directly into
- Support caching the image somewhere else of the user's choosing.
- Others? We should spec out a preferred approach before implementation begins.
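To make the first two possibilities concrete, here is a minimal sketch of a free-space check over a list of candidate cache locations. The function names, the `safety_margin` parameter, and the idea of passing in the image size are all assumptions for illustration; none of this is existing arv-keepdocker code, and how the image size is obtained is left to the caller.

```python
import os
import shutil


def has_room_for(image_size, cache_dir, safety_margin=0.1):
    """Return True if cache_dir's filesystem can hold image_size bytes.

    safety_margin is a hypothetical knob: the fraction of extra headroom
    to require on top of the image size.
    """
    # The cache directory may not exist yet, so probe the nearest
    # existing ancestor; it lives on the filesystem that would be used.
    probe = os.path.abspath(os.path.expanduser(cache_dir))
    while not os.path.exists(probe):
        probe = os.path.dirname(probe)
    free = shutil.disk_usage(probe).free
    return free >= image_size * (1 + safety_margin)


def choose_cache_dir(image_size, candidates):
    """Return the first candidate directory with enough room, or None.

    A None result means "don't cache at all; stream the image instead".
    """
    for candidate in candidates:
        if has_room_for(image_size, candidate):
            return candidate
    return None
```

A caller might pass `["~/.cache/arvados/docker", user_chosen_dir]` as the candidates and fall back to piping when the result is `None`.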
#4 Updated by Joshua Randall over 3 years ago
I can understand why you'd want to be able to resume an interrupted arv-put, since it might take a while depending on available bandwidth, but the `docker save` operation seems to be fairly lightweight -- why can't it just be run again rather than cached? Is `docker save` not pure? Oh, I see that it isn't - nice one, docker: https://github.com/docker/docker/issues/8819
In more recent versions of docker (1.8.0+) where all of the issues regarding actual nondeterministic output from `docker save` have been resolved, I believe the output of `docker save` can be "purified" by zeroing out the mtimes in the tar file. I've written a script to do that:
#!/usr/bin/env python
import tarfile
from sys import stdin, stdout

tar_in = tarfile.open(fileobj=stdin, mode="r|")
tar_out = tarfile.open(fileobj=stdout, mode="w|", format=tarfile.PAX_FORMAT)
for tarinfo in tar_in:
    tarinfo.mtime = 0
    tar_out.addfile(tarinfo, fileobj=tar_in.extractfile(tarinfo))
tar_in.close()
tar_out.close()
Using this always results in the same image (assuming new versions of docker):
$ docker save | ./tar-mtime-zero.py | md5sum
8a1ec35dc408270d0dbcceb6d2f064fe -
$ docker save | ./tar-mtime-zero.py | md5sum
8a1ec35dc408270d0dbcceb6d2f064fe -
$ docker save | ./tar-mtime-zero.py | md5sum
8a1ec35dc408270d0dbcceb6d2f064fe -
Given this, transfers could be resumed by recreating the temporary file using the "purified" `docker save` output, or potentially with no temporary file at all, given an appropriate interface to arv-put and the ability to skip ahead in the stream to the place where the previous transfer left off.
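As a rough sketch of the "skip ahead in the stream" idea: if the regenerated stream is byte-for-byte reproducible, a resumed transfer could re-read the already-uploaded prefix, confirm it hashes to the value recorded earlier, and continue from there. The function name, its parameters, and the choice of SHA-256 are hypothetical; arv-put is not known to expose an interface like this.

```python
import hashlib


def skip_verified_prefix(stream, offset, expected_sha256, chunk_size=1 << 20):
    """Consume `offset` bytes from `stream` and check them against a hash.

    Returns True if the consumed prefix hashes to expected_sha256, in
    which case the stream is positioned at `offset` and the upload can
    continue. Returns False if the stream is too short or the prefix
    differs, meaning the transfer has to start over.
    """
    digest = hashlib.sha256()
    remaining = offset
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            return False  # stream ended before reaching the offset
        digest.update(chunk)
        remaining -= len(chunk)
    return digest.hexdigest() == expected_sha256
```

Verifying the prefix rather than blindly skipping it guards against the case where the regenerated output is not identical after all, for example with an older docker version.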
#5 Updated by Brett Smith over 3 years ago
Joshua Randall wrote:
I can understand why you'd want to be able to resume an interrupted arv-put, since it might take a while depending on available bandwidth, but the `docker save` operation seems to be fairly lightweight -- why can't it just be run again rather than cached?
One contributing factor: in order to safely accommodate the general case, arv-put only resumes uploads for files whose inode hasn't changed. If we write to /tmp or another location that gets cleaned regularly, arv-put won't resume after the original file is removed, even if we put the exact same file in the exact same location.
Of course, from here you might simply argue that arv-put should use some more interesting metadata to ensure a file hasn't changed. Some wacky harebrained technology like a checksum.
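For illustration only, here is how an inode-based identity and a checksum-based identity differ. Both helpers are hypothetical and are not taken from arv-put; SHA-256 is an arbitrary choice of hash.

```python
import hashlib
import os


def inode_key(path):
    """Identify a file by where it lives: device and inode number.

    This changes whenever the file is deleted and recreated, even if
    the new file has identical contents.
    """
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def content_key(path, chunk_size=1 << 20):
    """Identify a file by what it contains: size plus SHA-256 digest.

    This survives deletion and recreation as long as the bytes match,
    at the cost of reading the whole file once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return (os.path.getsize(path), digest.hexdigest())
```

The trade-off is that the content key requires a full read of the image before a resume decision can be made, while the inode check is a single `stat` call.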