Bug #4438

[DRAFT] [SDKs] arv-keepdocker should be smarter about caching the image to upload

Added by Brett Smith over 4 years ago. Updated over 3 years ago.

Status: New
Priority: Normal
Assigned To: -
Category: SDKs
Target version:
Start date: 11/05/2014
Due date:
% Done: 0%
Estimated time:
Story points: -
Description

As of this writing, when arv-keepdocker uploads an image to Keep, it first saves the image to disk under ~/.cache/arvados/docker. It uses this cache to resume an interrupted upload.

On shell nodes, this has been known to fill the home directory partition. When this happens, you get the lovely error message "write /dev/stdout: no space left on device".

arv-keepdocker should be smarter about this cache. Possibilities:

  • Check space available on the home directory partition. If there's not enough space to save the Docker image, don't try; just pipe docker save directly into arv-put.
  • Support caching the image somewhere else of the user's choosing.
  • Others? We should spec out a preferred approach before implementation begins.
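
To make the first two possibilities concrete, here is a minimal sketch of the decision logic, not a proposed implementation. The environment variable name `ARV_KEEPDOCKER_CACHE_DIR`, the function names, and the 10% headroom are all hypothetical; arv-keepdocker defines none of them. The caller is assumed to already know the image size in bytes.

```python
import os
import shutil

# Hypothetical environment variable; arv-keepdocker does not define it.
CACHE_DIR_ENV = "ARV_KEEPDOCKER_CACHE_DIR"
DEFAULT_CACHE_DIR = os.path.join("~", ".cache", "arvados", "docker")


def pick_cache_dir(environ=None):
    """Return the user's chosen cache directory, or the current default."""
    environ = os.environ if environ is None else environ
    return os.path.expanduser(environ.get(CACHE_DIR_ENV) or DEFAULT_CACHE_DIR)


def nearest_existing_dir(path):
    """Walk up from path until reaching a directory that exists."""
    path = os.path.abspath(path)
    while not os.path.isdir(path):
        parent = os.path.dirname(path)
        if parent == path:
            break
        path = parent
    return path


def can_cache(image_size, cache_dir, headroom=0.1):
    """True if the partition holding cache_dir has room for the image
    plus a safety margin (10% of the image size by default)."""
    free = shutil.disk_usage(nearest_existing_dir(cache_dir)).free
    return free >= image_size * (1 + headroom)


def choose_strategy(image_size, environ=None):
    """Return ("cache", dir) when caching looks safe, else ("stream", None),
    meaning: pipe `docker save` straight into arv-put."""
    cache_dir = pick_cache_dir(environ)
    if can_cache(image_size, cache_dir):
        return ("cache", cache_dir)
    return ("stream", None)
```

The free-space check is only advisory: another process can fill the partition between the check and the write, so the save step would still need to handle "no space left on device" gracefully.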

Related issues

Has duplicate Arvados - Bug #7203: arv keep docker fails to save image when home directory is full (Duplicate, 09/03/2015)

History

#1 Updated by Ward Vandewege over 4 years ago

  • Target version changed from Bug Triage to Arvados Future Sprints

#2 Updated by Tom Clegg over 4 years ago

  • Subject changed from [SDKs] arv-keepdocker should be smarter about caching the image to upload to [DRAFT] [SDKs] arv-keepdocker should be smarter about caching the image to upload

#3 Updated by Brett Smith over 4 years ago

I think basically the entire science team has seen this, so fixing it would be a worthwhile use of time.

#4 Updated by Joshua Randall over 3 years ago

I can understand why you'd want to be able to resume an interrupted arv-put, since it might take a while depending on available bandwidth, but the `docker save` operation seems to be fairly lightweight -- why can't it just be run again rather than cached? Is `docker save` not pure? Oh, I see that it isn't - nice one, docker: https://github.com/docker/docker/issues/8819

In more recent versions of docker (1.8.0+) where all of the issues regarding actual nondeterministic output from `docker save` have been resolved, I believe the output of `docker save` can be "purified" by zeroing out the mtimes in the tar file. I've written a script to do that:

tar-mtime-zero.py:

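#!/usr/bin/env python

import sys
import tarfile

# Tar data is binary: on Python 3 use the underlying byte streams;
# on Python 2, sys.stdin and sys.stdout are already byte streams.
stdin = getattr(sys.stdin, "buffer", sys.stdin)
stdout = getattr(sys.stdout, "buffer", sys.stdout)

# "r|" and "w|" process the archive as a non-seekable stream.
tar_in = tarfile.open(fileobj=stdin, mode="r|")
tar_out = tarfile.open(fileobj=stdout, mode="w|", format=tarfile.PAX_FORMAT)

for tarinfo in tar_in:
    # Zero the modification time so repeated runs give identical output.
    tarinfo.mtime = 0
    if tarinfo.islnk() or tarinfo.issym():
        # Links carry no data, and extractfile() raises StreamError
        # for them when reading from a stream.
        fileobj = None
    else:
        fileobj = tar_in.extractfile(tarinfo)
    tar_out.addfile(tarinfo, fileobj=fileobj)

tar_in.close()
tar_out.close()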

Using this always results in the same image (assuming new versions of docker):

$ docker save | ./tar-mtime-zero.py | md5sum
8a1ec35dc408270d0dbcceb6d2f064fe  -
$ docker save | ./tar-mtime-zero.py | md5sum
8a1ec35dc408270d0dbcceb6d2f064fe  -
$ docker save | ./tar-mtime-zero.py | md5sum
8a1ec35dc408270d0dbcceb6d2f064fe  -

Given this, transfers could be resumed by recreating the temporary file using the "purified" `docker save` output, or potentially with no temporary file given an appropriate interface to arv-put and the ability to skip ahead in the stream to the place where the previous transfer was left off.
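
A minimal sketch of the "skip ahead in the stream" idea, assuming the purified `docker save` output really is byte-for-byte reproducible. The function name, and the idea that the uploader recorded an offset and a SHA-256 of the bytes already sent, are hypothetical; arv-put exposes no such interface today.

```python
import hashlib
import io


def skip_uploaded_prefix(stream, offset, expected_sha256, chunk_size=1 << 20):
    """Read and discard the first `offset` bytes of `stream`, hashing them
    on the way. Raise ValueError if the stream ends early or the prefix
    does not match what was uploaded before. On success the stream is
    positioned at `offset`, ready for the upload to continue."""
    digest = hashlib.sha256()
    remaining = offset
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            raise ValueError("stream ended before the resume offset")
        digest.update(chunk)
        remaining -= len(chunk)
    if digest.hexdigest() != expected_sha256:
        raise ValueError("stream prefix differs from what was uploaded")
    return stream


# Usage with an in-memory stand-in for the regenerated stream:
data = b"0123456789" * 100
prefix_hash = hashlib.sha256(data[:300]).hexdigest()
stream = skip_uploaded_prefix(io.BytesIO(data), 300, prefix_hash)
rest = stream.read()  # the bytes that still need uploading
```

Hashing the skipped prefix costs a full re-read of that part of the stream, but it catches the case where the regenerated output is not identical after all (for example with an older docker), instead of silently uploading a corrupt image.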

#5 Updated by Brett Smith over 3 years ago

Joshua Randall wrote:

I can understand why you'd want to be able to resume an interrupted arv-put, since it might take a while depending on available bandwidth, but the `docker save` operation seems to be fairly lightweight -- why can't it just be run again rather than cached?

One contributing factor: in order to safely accommodate the general case, arv-put only resumes uploads for files whose inode hasn't changed. If we write to /tmp or another location that gets cleaned regularly, arv-put won't resume after the original file is removed, even if we put the exact same file in the exact same location.

Of course, from here you might simply argue that arv-put should use some more interesting metadata to ensure a file hasn't changed. Some wacky harebrained technology like a checksum.
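
For illustration, a minimal sketch of what a content-based check could look like, as opposed to an inode-based one. This is not how arv-put's resume logic works; the function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import os


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_content(path, recorded_size, recorded_checksum):
    """True if `path` still holds the bytes recorded when the upload
    started, regardless of which inode the file now lives on."""
    if os.path.getsize(path) != recorded_size:
        return False  # cheap check first; skips hashing on a size mismatch
    return file_checksum(path) == recorded_checksum
```

The trade-off is that the whole file has to be read again before resuming, which is slower than comparing inodes but still much cheaper than re-uploading a large image over a slow link.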
