Bug #14495

[crunch2] include space required to download/unpack docker image in tmp disk request

Added by Ward Vandewege 27 days ago. Updated about 4 hours ago.

Status:
New
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
12/10/2018
Due date:
% Done:

0%

Estimated time:
(Total: 0.00 h)
Story points:
-

Description

Proposed fix:

API server should add Docker image size multiplied by 3 to the disk space request. (The multiplication factor is to account for expansion of compressed layers, and staging the layers to scratch space while they are decompressed.)

Original description:

Container request d79c1-xvhdp-28emqt3jby9s2a8 was seemingly stuck: its child container requests remained in the Queued state after many hours.

What was actually happening was that the child container d79c1-dz642-3apw65ik2snziqh was scheduled on a compute node that didn't have sufficient scratch space available to load the (large) docker image:

2018-11-14T13:33:31.066367347Z Docker response: {"errorDetail":{"message":"Error processing tar file(exit status 1): write /195522a76483d96f4cc529d0b11e5e840596992eddfb89f4f86499a4af381832/layer.tar: no space left on device"},"error":"Error processing tar file(exit status 1): write /195522a76483d96f4cc529d0b11e5e840596992eddfb89f4f86499a4af381832/layer.tar: no space left on device"}
2018-11-14T13:33:31.066643166Z Running [arv-mount --foreground --allow-other --read-write --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id /tmp/crunch-run.d79c1-dz642-3apw65ik2snziqh.694256924/keep198730187]

This kept happening, and because the container couldn't be started, it remained in the Queued state in Workbench; the only hint was the above lines in the logs.

The workaround is easy: specify a large enough tmpdirMin in the workflow. But it is far too hard for the user to figure that out.

We should probably error out immediately when this happens, and we need to make it clear to the user what the actual problem is.

Or maybe we can take the size of the docker image into account before allocating a job to a compute node? That would be even better.


Subtasks

Task #14544: Review 14495-crunch-docker-space (In Progress, Peter Amstutz)


Related issues

Related to Arvados - Bug #14540: [API] Limit number of container lock/unlock cycles (New)

History

#1 Updated by Ward Vandewege 27 days ago

  • Target version set to To Be Groomed

#2 Updated by Ward Vandewege 27 days ago

  • Subject changed from [crunch2] fail container if the compute node it is being run on doesn't have sufficient space to load the docker image to [crunch2] containers are retried indefinitely if the compute node it is being run on doesn't have sufficient space to load the docker image
  • Description updated (diff)

#3 Updated by Peter Amstutz 26 days ago

Yes, this should be handled by the infrastructure.

I had a discussion on this exact topic with the Cromwell folks recently; the heuristic we came up with was to reserve 3x the sum of the sizes of the image layers (that is just the size of the image tarball).

We also allocate nodes based on total space, not available space, so if a node has been up for a while, it could end up caching multiple Docker images, reducing available space and throwing off the calculation.

#4 Updated by Peter Amstutz 26 days ago

Also we should limit the number of lock/unlock cycles to avoid this "infinite retry" problem.

#5 Updated by Tom Morris 14 days ago

  • Target version changed from To Be Groomed to 2018-12-12 Sprint

#6 Updated by Peter Amstutz 14 days ago

  • Subject changed from [crunch2] containers are retried indefinitely if the compute node it is being run on doesn't have sufficient space to load the docker image to [crunch2] include space required to download/unpack docker image in tmp disk request

#7 Updated by Peter Amstutz 14 days ago

  • Related to Bug #14540: [API] Limit number of container lock/unlock cycles added

#8 Updated by Peter Amstutz 14 days ago

  • Description updated (diff)

#9 Updated by Peter Amstutz 14 days ago

  • Assigned To set to Peter Amstutz

#10 Updated by Peter Amstutz 8 days ago

  • Description updated (diff)

#11 Updated by Peter Amstutz 5 days ago

We calculate the disk space request as the sum of the requested capacity of "tmp" mounts, not a single number stored in runtime_constraints (unlike the ram request). So the dispatcher should be the one that incorporates image size into the disk space request, not the API server. On the plus side, not modifying the container record means it doesn't invalidate reuse.

#12 Updated by Tom Clegg 5 days ago

Peter Amstutz wrote:

the dispatcher should be the one that incorporates image size into the disk space request, not the API server.

Agreed

#13 Updated by Peter Amstutz 2 days ago

14495-crunch-docker-space @ 934d880aa5d10ed3382f9924a9a9f5694b41f266

  • Estimate size of docker image
  • Incorporate estimate into disk space request

https://ci.curoverse.com/view/Developer/job/developer-run-tests/1006/

#14 Updated by Lucas Di Pentima 2 days ago

  • Nice code comments and clever way to do the estimation!
  • In the two cases where the estimate is 0, can we log a warning message for potential debugging needs?
  • Apart from that, LGTM.

#15 Updated by Peter Amstutz about 4 hours ago

Need to go and double check that this fix would have fixed the original problem report.

#16 Updated by Peter Amstutz about 4 hours ago

  • Target version changed from 2018-12-12 Sprint to 2018-12-21 Sprint
