Idea #2492: Run Job tasks in a Docker container - Arvados

Idea #2492

closed

Run Job tasks in a Docker container

Added by Brett Smith about 11 years ago. Updated almost 11 years ago.

Status: Resolved
Priority: Normal
Assigned To: Brett Smith
Category:
Target version: 2014-05-07 Storing and Organizing Data
Start date: 05/05/2014
Due date:
Story points: 1.0

Description

Write a tool that can set up an appropriate environment to run a Job. Crunch v2, at the latest, will use this to run Jobs in a more stable, predictable environment.

Subtasks 3 (0 open — 3 closed)

Updated by Brett Smith about 11 years ago

Description updated
Assigned To set to Brett Smith

Updated by Brett Smith about 11 years ago

Interesting issue: right now we can't build one Dockerfile that accommodates arv-crunch-job. It calls arv-mount, which needs FUSE, which in turn needs /dev/fuse, and Docker has only just gained the ability to mknod inside a Dockerfile.

We have a few options:

- Use one of the hacky mknod workarounds that people have relied on to date, most likely docker run --privileged in the Makefile.
- Rearchitect Crunch so that the mount always lives on the compute node, and then expose it to the Job container as a volume.
- Wait for this PR to make it to release, and then rely on it.
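
To make the first two options concrete, here is a minimal Python sketch that only assembles a docker run command line. The function name build_docker_run, the /mnt/keep and /keep paths, and the read-only mount are assumptions for illustration and are not taken from the Arvados code; --privileged, --rm and -v are standard docker run flags.

```python
from typing import Dict, List, Optional


def build_docker_run(
    image: str,
    command: List[str],
    volumes: Optional[Dict[str, str]] = None,
    privileged: bool = False,
    docker_bin: str = "docker",
) -> List[str]:
    """Return an argv list for `docker run`; nothing is executed here."""
    argv = [docker_bin, "run", "--rm"]
    if privileged:
        # Option 1: give the container enough rights to create /dev/fuse itself.
        argv.append("--privileged")
    # Option 2: the mount lives on the host and is exposed as a read-only volume.
    for host_path, container_path in sorted((volumes or {}).items()):
        argv += ["-v", f"{host_path}:{container_path}:ro"]
    argv.append(image)
    argv += command
    return argv


if __name__ == "__main__":
    # Hypothetical paths: a FUSE mount at /mnt/keep on the compute node.
    print(build_docker_run("arvados/jobs", ["ls", "/keep"], volumes={"/mnt/keep": "/keep"}))
```

With the volume approach the container needs neither FUSE nor extra privileges, which is the main attraction of the second option.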

Updated by Tom Clegg about 11 years ago

Subject changed from Run Jobs in a Docker container to Run Job tasks in a Docker container

Updated by Tom Clegg about 11 years ago

Status changed from New to In Progress

Updated by Brett Smith almost 11 years ago

Project changed from Arvados to 35
Status changed from In Progress to New
Target version deleted (2014-04-16 Dev tools and data/resource management)

The branch 2492-docker-crunch-jobs has a Dockerfile with all the SDKs installed, as well as a proposed patch to crunch-job to support a specified docker_image as a runtime constraint. I'm coordinating with Ward to test this in the staging environment. It requires a new Linux, so that's at least a little involved.
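
As a rough illustration of the docker_image runtime constraint, the Python sketch below wraps a task command in docker run only when the job asks for an image. It is not the crunch-job patch itself: the job dictionary layout, the function name wrap_task_command and the docker_bin parameter are assumptions.

```python
from typing import Any, Dict, List


def wrap_task_command(
    job: Dict[str, Any],
    task_argv: List[str],
    docker_bin: str = "docker",
) -> List[str]:
    """Prefix the task command with `docker run IMAGE` if the job requests one."""
    constraints = job.get("runtime_constraints") or {}
    image = constraints.get("docker_image")
    if not image:
        # No image requested: run the task directly on the compute node.
        return list(task_argv)
    return [docker_bin, "run", image] + list(task_argv)


if __name__ == "__main__":
    job = {"runtime_constraints": {"docker_image": "arvados/jobs"}}
    print(wrap_task_command(job, ["python", "task.py"]))
```

Jobs that do not set the constraint keep their current behavior, so a change along these lines can be rolled out without touching existing jobs.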

Updated by Brett Smith almost 11 years ago

Project changed from 35 to Arvados
Status changed from New to In Progress
Target version set to 2014-04-16 Dev tools and data/resource management

Updated by Tom Clegg almost 11 years ago

Target version changed from 2014-04-16 Dev tools and data/resource management to 2014-05-07 Storing and Organizing Data

Updated by Brett Smith almost 11 years ago

Estimated time set to 8.00 h
Story points changed from 2.0 to 1.0

Updated numbers for this sprint.

Updated by Peter Amstutz almost 11 years ago

If the user uses a symbolic name for the docker image, can we resolve that to a hash and record the hash for the job, like we do for script versions?

#10

Updated by Brett Smith almost 11 years ago

Peter Amstutz wrote:

If the user uses a symbolic name for the docker image, can we resolve that to a hash and record the hash for the job, like we do for script versions?

This is definitely the direction we ultimately want to take with Docker: applying the same job reuse logic to images that we do to script versions. And the output of docker.io images --no-trunc is easy enough to parse to do the translation.
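
A sketch of that translation in Python, assuming the listing is whitespace-separated columns whose first three fields are repository, tag and image ID. The sample listing and its ID are made up for illustration, and the names parse_images and resolve are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Made-up 64-character ID and listing, standing in for real command output.
FAKE_ID = "0123456789abcdef" * 4
SAMPLE = (
    "REPOSITORY     TAG      IMAGE ID     CREATED       VIRTUAL SIZE\n"
    f"arvados/jobs   latest   {FAKE_ID}   2 weeks ago   1.2 GB\n"
)


def parse_images(listing: str) -> Dict[Tuple[str, str], str]:
    """Map (repository, tag) to image ID, skipping the header line."""
    table: Dict[Tuple[str, str], str] = {}
    for line in listing.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or malformed row
        repo, tag, image_id = fields[:3]
        table[(repo, tag)] = image_id
    return table


def resolve(listing: str, name: str) -> Optional[str]:
    """Resolve 'repo' or 'repo:tag' to an image ID, defaulting to tag 'latest'."""
    repo, sep, tag = name.rpartition(":")
    if not sep or "/" in tag:
        # No tag given (a colon before a slash belongs to a registry host:port).
        repo, tag = name, "latest"
    return parse_images(listing).get((repo, tag))


if __name__ == "__main__":
    print(resolve(SAMPLE, "arvados/jobs"))
```

Recording the resolved ID alongside the job would mirror how script versions are pinned, although the listing only reflects images already present on the node where the command runs.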

Unfortunately, we have a logistical snafu in that the necessary information is far away from the API server. Docker is only installed on the compute nodes, which the API server can only interact with via SLURM. And as far as I can tell, the only way to find out from the command line if a new image is available from the repository is to try to pull it. All this means that it wouldn't be too difficult to record the information, but then trying to use that hash to figure out job reuse would be unreasonably expensive: you'd have to wait for a compute node to become available, run docker.io pull on it, wait for a potentially lengthy install to finish, and then see if the image hash changed.

Our story around Docker image management still needs to be hammered out. We've talked generally about storing those images in Keep and then identifying them by their Collection hash, which would go much further to enable the kind of smarts you're anticipating. I fully expect we'll do that, and I think it's a story for a future sprint. This story is first about building the arvados/jobs image, and second about providing a consistent environment for Jobs. Containerizing Jobs in crunch-job is something we knew we wanted, and it lets us prove that arvados/jobs works to spec. The docker pull logic in it is more of a stopgap, the quickest way to get the desired image on all the compute nodes. This means support for provenance is admittedly not fully baked yet, and I think solving that means more consideration about how Docker images live in Arvados as a whole. Any halfway effort to address it now will probably get replaced when that happens.

tl;dr: It's a great idea, but I'm unsure now is the right time.

(Writing this up made me realize we have a bug in that specifying an image hash is fine for docker.io run but not docker.io pull. I'll have to figure out a bugfix for that.)
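
One simple way to sidestep that mismatch, sketched in Python, is to skip the pull step whenever the image spec looks like a hash rather than a repository name. This is only an illustration and not necessarily how the bug was fixed; the 12-to-64 hex character heuristic and the function names are assumptions.

```python
import re

# Heuristic: Docker image IDs are hex strings, 12 characters when abbreviated
# and 64 in full. A repository whose name is all hex would be misclassified.
_HEX_ID = re.compile(r"^[0-9a-f]{12,64}$")


def looks_like_image_hash(spec: str) -> bool:
    """Return True if the spec looks like an image ID rather than a name."""
    return bool(_HEX_ID.match(spec))


def should_pull(spec: str) -> bool:
    """Only symbolic names can be pulled; a hash must already exist locally."""
    return not looks_like_image_hash(spec)


if __name__ == "__main__":
    for spec in ("arvados/jobs", "0123456789ab"):
        print(spec, should_pull(spec))
```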

#11

Updated by Brett Smith almost 11 years ago

Brett Smith wrote:

(Writing this up made me realize we have a bug in that specifying an image hash is fine for docker.io run but not docker.io pull. I'll have to figure out a bugfix for that.)

Did the simplest possible thing in 2e31424. It's ready for another look.

#12

Updated by Brett Smith almost 11 years ago

Status changed from In Progress to Resolved

Applied in changeset 222ce386e36b3d146e718a5d2f64a95fb30996bb (arvados).
