Containers API » History » Revision 16

Revision 15 (Tom Clegg, 07/01/2015 06:04 PM) → Revision 16/64 (Tom Clegg, 07/06/2015 08:53 PM)

{{>TOC}} 

h1. Containers API (DRAFT)

A Container resource is a record of a computational process.
 * Its goal is to capture, unambiguously, as much information as possible about the environment in which the process was run. For example, git trees, data collections, and docker images are stored as content addresses. This makes it possible to reason about the difference between two processes, and to replay a process at a different time and place. 
* Clients can read Container records, but only the system can create or modify them.

 *Note about the term "containers" vs. "jobs" and "services":* Here, we focus on the use of containers as producers of output data. We anticipate extending the feature set to cover service containers as well. The distinguishing feature of a service container is that _having it running_ is inherently valuable because of the way it interacts with the outside world. 

A ContainerRequest is a client's expression of interest in knowing the outcome of a computational process.
* Typically, in this context the client's description of the process is less precise than a Container: a ContainerRequest describes container _constraints_ which can have different interpretations over time. For example, a ContainerRequest with a @{"kind":"git_tree","commit_range":"abc123..master",...}@ mount might be satisfiable by any of several different source trees, and this set of satisfying source trees can change when the repository's "master" branch is updated.
* The system is responsible for finding suitable Containers and assigning them to ContainerRequests. (Currently this is expected to be done synchronously during the containerRequests.create and containerRequests.update transactions.)
* A ContainerRequest may indicate that it can _only_ be satisfied by a new Container record (i.e., existing results should not be reused). In this case creating a ContainerRequest amounts to a submission to the container queue. This is appropriate when the purpose of the ContainerRequest is to test whether a process is repeatable.
* A ContainerRequest may indicate that it _cannot_ be satisfied by a new Container record. This is an appropriate way to test whether a result is already available.

When the system has assigned a Container to a ContainerRequest, anyone with permission to read the ContainerRequest also has permission to read the Container.

 h2. Use cases 

 h3. Preview 

Tell me how you would satisfy container request X. Which pdh/commits would be used? Is the satisfying container already started? finished?

h3. Submit a previewed existing container

I'm happy with the already-running/finished container you showed me in "preview". Give me access to that container, its logs, and [when it finishes] its output.

h3. Submit a previewed new container

I'm happy with the new container the "preview" response proposed to run. Run that container.

h3. Submit a new container (disable reuse)

I don't want to use an already-running/finished container. Run a new container that satisfies my container request.

h3. Submit a new duplicate container (disable reuse)

I'm happy with the already-running/finished container you showed me in "preview". Run a new container exactly like that one.

h3. Select a container and associate it with my ContainerRequest

I'm not happy with the container you chose, but I know of another container that satisfies my request. Assuming I'm right about that, attach my ContainerRequest to the existing container of my choice.

h3. Just do the right thing without a preview

Satisfy container request X one way or another, and tell me the resulting container's UUID.

h2. ContainerRequest/Container life cycle

Illustrating container re-use and the preview facility:
# Client ClientA creates a ContainerRequest CRA with priority=0.
# Server creates container CX and assigns CX to CRA, but does not try to run CX yet because max(priority)=0.
# Client ClientA presents CX to the user. "We haven't computed this result yet, so we'll have to run a new container. Is this OK?"
# Client ClientB creates a ContainerRequest CRB with priority=1.
# Server assigns CX to CRB and puts CX in the execution queue with priority=1.
# Client ClientA updates CRA with priority=2.
# Server updates CX with priority=2.
# Container CX starts.
# Client ClientA updates CRA with priority=0. (This is as close as we get to a "cancel" operation.)
# Server updates CX with priority=1. (CRB still wants this container to complete.)
# Container CX finishes.
# Clients ClientA and ClientB have permission to read CX (ever since CX was assigned to their respective ContainerRequests) as well as its progress indicators, output, and log.
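The priority bookkeeping in the steps above reduces to a simple rule: a container runs at the highest priority any of its assigned requests asks for, and is not runnable when every request has priority zero. A minimal sketch (the helper name is hypothetical, not part of the API):

```python
def container_priority(request_priorities):
    """Derive a container's priority from its assigned ContainerRequests.

    The container runs at the highest priority any request asks for;
    if every request has priority 0, it should not run at all.
    (Hypothetical helper -- the real server logic is not specified here.)
    """
    return max(request_priorities, default=0)

# Walking through the life cycle above:
assert container_priority([0]) == 0       # step 2: only CRA, not runnable
assert container_priority([0, 1]) == 1    # step 5: CRB added
assert container_priority([2, 1]) == 2    # step 7: CRA raised to 2
assert container_priority([0, 1]) == 1    # step 10: CRA "cancelled", CRB remains
```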

h2. "ContainerRequest" schema

 |Attribute|Type|Description|Discussion|Examples| 
 |uuid, owner_uuid, modified_by_client_uuid,    modified_by_user_uuid|string|Usual Arvados model attributes||| 
 | 
 |created_at, modified_at|datetime|Usual Arvados model attributes||| 
 | 
 |name|string|Unparsed||| 
 | 
 |description|text|Unparsed||| 
 | 
|properties|object|Client-defined structured data that does not affect how the container is run.|||
 | 
 |state|string|Once a request is committed, priority is the only attribute that can be modified.||@"Uncommitted"@ 
 @"Committed"@| 
 | 
|requesting_container_uuid|string|When the referenced container ends, the container request is automatically cancelled.|Can be null. If changed to a non-null value, it must refer to a container that is running.||
 | 
|container_uuid|uuid|The container that satisfies this container request.|See "methods" below.||
 | 
 |mounts|hash|Objects to attach to the container's filesystem and stdin/stdout. 
 Keys starting with a forward slash indicate objects mounted in the container's filesystem. 
 Other keys are given special meanings here.| 
 We use "stdin" instead of "/dev/stdin" because literally replacing /dev/stdin with a file would have a confusing effect on many unix programs. The stdin feature only affects the standard input of the first process started in the container; after that, the usual rules apply.| 
 <pre>{ 
  "/input/foo":{ 
   "kind":"collection", 
   "portable_data_hash":"d41d8cd98f00b204e9800998ecf8427e+0" 
  }, 
  "stdin":{ 
   "kind":"collection_file", 
   "uuid":"zzzzz-4zz18-yyyyyyyyyyyyyyy", 
   "path":"/foo.txt" 
  }, 
  "stdout":{ 
   "kind":"regular_file", 
   "path":"/tmp/a.out" 
  } 
 }</pre>| 
 | 
|runtime_constraints|hash|Restrict the container's access to compute resources and the outside world (in addition to its explicitly stated inputs and output).
-- Each key is the name of a capability, like "internet" or "API" or "clock". The corresponding value is @true@ (the capability must be available in the container's runtime environment) or @false@ (must not) or a value or an array of two numbers indicating an inclusive range. If a key is omitted, availability of the corresponding capability is acceptable but not necessary.|This is a generalized version of "enforce purity restrictions": it is not a claim that the container will be pure. Rather, it helps us control and track runtime restrictions, which can be helpful when reasoning about whether a given container was pure.
 -- In the most basic implementation, no capabilities are defined, and the only acceptable value of this attribute is the empty hash. 
 (TC)Should this structure be extensible like mounts?| 
 <pre> 
 { 
   "ram":12000000000, 
   "vcpus":[1,null] 
 }</pre>| 
 | 
 |container_image|string|Docker image repository and tag, docker image hash, collection UUID, or collection PDH.||| 
 | 
|environment|hash|Environment variables and values that should be set in the container environment (@docker run --env@). This augments and (when conflicts exist) overrides environment variables given in the image's Dockerfile.|||
 | 
 |cwd|string|initial working directory, given as an absolute path (in the container) or a path relative to the WORKDIR given in the image's Dockerfile. The default is @"."@.||<pre>"/tmp"</pre>| 
 | 
 |command|array of strings|Command to execute in the container. Default is the CMD given in the image's Dockerfile.| 
 To use a UNIX pipeline, like "echo foo &#124; tr f b", or to interpolate environment variables, make sure your container image has a shell, and use a command like @["sh","-c","echo $PATH &#124; wc"]@.|| 
 | 
|output_path|string|Path to a directory or file inside the container that should be preserved as the container's output when it finishes.|This path _must_ be, or be inside, one of the mount targets.
 For best performance, point output_path to a writable collection mount.|| 
 | 
|priority|number|Higher number means spend more resources (e.g., go ahead of other queued containers, bring up more nodes).
-- Zero means a container should not be run on behalf of this request. (Clients are expected to submit ContainerRequests with zero priority in order to preview the container that will be used to satisfy them.)
 -- Priority is null if and only if @state="Uncommitted"@.|| 
 null 
 @0@ 
 @1000.5@ 
 @-1@| 
 | 
|expires_at|datetime|After this time, priority is considered to be zero. If the assigned container is running at that time, the container _may_ be cancelled to conserve resources.||
 null 
 @2015-07-01T00:00:01Z@| 
 | 
|filters|array|Additional constraints for satisfying the request, given in the same form as the @filters@ parameter accepted by the @containers.list@ API.||
 @["created_at","<","2015-07-01T00:00:01Z"]@| 
 | 
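The runtime_constraints matching rule described above (boolean flags, exact values, and two-element inclusive ranges with @null@ meaning unbounded) can be sketched as follows; @constraint_satisfied@ is a hypothetical helper, not part of the API:

```python
def constraint_satisfied(requested, actual):
    """Check one runtime_constraints entry against an actual capability.

    requested may be True (must be available), False (must not be),
    a single value (must match exactly), or a two-element [min, max]
    inclusive range where None means unbounded. A sketch of the rule
    stated in the table above, not the server implementation.
    """
    if requested is True:
        return bool(actual)
    if requested is False:
        return not actual
    if isinstance(requested, list) and len(requested) == 2:
        lo, hi = requested
        return (lo is None or actual >= lo) and (hi is None or actual <= hi)
    return actual == requested

# Mirroring the example value {"ram":12000000000, "vcpus":[1,null]}:
assert constraint_satisfied(12000000000, 12000000000)  # exact value
assert constraint_satisfied([1, None], 4)              # at least 1 vcpu
assert not constraint_satisfied([1, 2], 4)             # outside the range
```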

h2. "Container" schema

 |Attribute|Type|Description|Discussion|Examples| 
 | 
 |uuid, owner_uuid, created_at, modified_at, modified_by_client_uuid,    modified_by_user_uuid|string|Usual Arvados model attributes||| 
 | 
 |state|string||| 
 <pre> 
 "Queued" 
 "Running" 
 "Cancelled" 
 "Failed" 
 "Complete" 
 </pre>| 
 | 
|started_at, finished_at, log||Same as current container|||
 | 
|environment|hash|Must be equal to a ContainerRequest's environment in order to satisfy the ContainerRequest.|(TC)We could offer a "resolve" process here like we do with mounts: e.g., hash values in the ContainerRequest environment could be resolved according to the given "kind". I propose we leave room for this feature but don't add it yet.||
 | 
|cwd, command, output_path|string|Must be equal to the corresponding values in a ContainerRequest in order to satisfy that ContainerRequest.|||
 | 
|mounts|hash|Must contain the same keys as the ContainerRequest being satisfied. Each value must be within the range of values described in the ContainerRequest _at the time the Container is assigned to the ContainerRequest._|||
 | 
|runtime_constraints|hash|Compute resources, and access to the outside world, that are/were available to the container.
-- Generally this will contain additional keys that are not present in any corresponding ContainerRequests: for example, even if no ContainerRequests specified constraints on the number of CPU cores, the number of cores actually used will be recorded here.|
Permission/access types will change over time and it may be hard/impossible to translate old types to new. Such cases may cause old Containers to be ineligible for assignment to new ContainerRequests.
-- (TC)Is it permissible for this to gain keys over time? For example, a container scheduler might not be able to predict how many CPU cores will be available until the container starts.||
 | 
 |output|string|Portable data hash of the output collection.||| 
 | 
|-pure-|-boolean-|-The container's output is thought to be dependent solely on its inputs, i.e., it is expected to produce identical output if repeated.-|
We want a feature along these lines, but "pure" seems to be a conclusion we can come to after examining various facts -- rather than a property of an individual container execution event -- and it probably needs something more subtle than a boolean.||
 | 
|container_image|string|Portable data hash of a collection containing the docker image used to run the container.|(TC) *If* docker image hashes can be verified efficiently, we can use the native docker image hash here instead of a collection PDH.||
 | 
 |progress|number|A number between 0.0 and 1.0 describing the fraction of work done.| 
If a container submits containers of its own, it should update its own progress as the child containers progress/finish.||
 | 
|priority|number|Priority assigned by the system, taking into account the priorities of all associated ContainerRequests.|||
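Putting the two schemas together, a reuse check follows the equality rules stated in the Container table: environment, cwd, command and output_path must match exactly, and mounts must share the same keys. The sketch below uses a hypothetical helper name and deliberately omits mount *value* range-matching, which is resolver-specific:

```python
def could_satisfy(request, container):
    """Rough reuse check between a ContainerRequest and a Container,
    per the equality rules in the schema tables above.

    Mount values only need to fall within the request's ranges, which
    depends on per-kind resolution and is omitted from this sketch.
    """
    for attr in ("environment", "cwd", "command", "output_path"):
        if request.get(attr) != container.get(attr):
            return False
    # Mounts must have the same keys on both sides.
    return set(request.get("mounts", {})) == set(container.get("mounts", {}))

req = {"environment": {"A": "1"}, "cwd": "/tmp", "command": ["true"],
       "output_path": "/out", "mounts": {"/out": {"kind": "tmp"}}}
assert could_satisfy(req, dict(req))            # identical values match
assert not could_satisfy(req, dict(req, cwd="/"))  # any mismatch disqualifies
```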

 h2. Mount types 

 The "mounts" hash is the primary mechanism for adding data to the container at runtime (beyond what is already in the container image). 

 Each value of the "mounts" hash is itself a hash, whose "kind" key determines the handler used to attach data to the container. 
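As a rough illustration of this kind-based dispatch, the sketch below summarizes the required keys of the mount types documented in the table that follows; the validator itself is hypothetical, not part of the API:

```python
# Key requirements per documented mount kind (a summary sketch of the
# table in this section, not an authoritative validator).
MOUNT_KINDS = {
    "collection": set(),        # portable_data_hash/uuid/writable/path optional
    "git_tree": set(),          # git-url/repository_name/uuid etc. optional here
    "tmp": {"capacity"},        # capacity in bytes is required
    "keep": set(),              # no extra keys
}

def check_mount(mount):
    """Return True if the mount hash names a known kind and carries
    that kind's required keys."""
    required = MOUNT_KINDS.get(mount.get("kind"))
    if required is None:
        return False  # unknown kind, no handler
    return required <= set(mount)

assert check_mount({"kind": "tmp", "capacity": 10000000000})
assert not check_mount({"kind": "tmp"})       # missing capacity
assert check_mount({"kind": "keep"})
```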

 |Mount type|@kind@|Expected keys|Description|Examples|Discussion| 
 | 
 |Arvados data collection|@collection@| 
@"portable_data_hash"@ _or_ @"uuid"@ _may_ be provided. If not provided, a new collection will be created. This is useful when @"writable":true@ and the container's @output_path@ is (or is a subdirectory of) this mount target.
 @"writable"@ may be provided with a @true@ or @false@ to indicate the path must (or must not) be writable. If not specified, the system can choose. 
 @"path"@ may be provided, and defaults to @"/"@.| 
At container startup, the target path will have the same directory structure as the given path within the collection. Even if the files/directories are writable in the container, modifications will _not_ be saved back to the original collections when the container ends.|
 <pre> 
 { 
  "kind":"collection", 
  "uuid":"...", 
  "path":"/foo.txt" 
 } 

 { 
  "kind":"collection", 
  "uuid":"..." 
 } 
 </pre>|| 
 | 
 |Git tree|@git_tree@| 
 One of {@"git-url"@, @"repository_name"@, @"uuid"@} must be provided. 
 One of {@"commit"@, @"revisions"@} must be provided. 
 "path" may be provided. The default path is "/".| 
At container startup, the target path will have the source tree indicated by the given revision. The @.git@ metadata directory _will not_ be available: typically the system will use @git-archive@ rather than @git-checkout@ to prepare the target directory.
-- If a value is given for @"revisions"@, it will be resolved to a set of commits (as described in the "ranges" section of gitrevisions(7)) and the container request will be satisfiable by any commit in that set.
 -- If a value is given for @"commit"@, it will be resolved to a single commit, and the tree resulting from that commit will be used. 
 -- @"path"@ can be used to select a subdirectory or a single file from the tree indicated by the selected commit. 
 -- Multiple commits can resolve to the same tree: for example, the file/directory given in @"path"@ might not have changed between commits A and B. 
-- The resolved mount (found in the Container record) will have only the "kind" key and a "blob" or "tree" key indicating the 40-character hash of the git tree/blob used.|
 <pre> 
 { 
  "kind":"git_tree", 
  "uuid":"zzzzz-s0uqq-xxxxxxxxxxxxxxx", 
  "commit":"master" 
 } 

 { 
  "kind":"git_tree", 
  "uuid":"zzzzz-s0uqq-xxxxxxxxxxxxxxx", 
  "commit_range":"bugfix^..master", 
  "path":"/crunch_scripts/grep" 
 } 
 </pre>|| 
 | 
 |Temporary directory|@tmp@| 
 @"capacity"@: capacity (in bytes) of the storage device| 
At container startup, the target path will be empty. When the container finishes, the content will be discarded. This will be backed by a memory-based filesystem where possible.|
 <pre> 
 { 
  "kind":"tmp", 
  "capacity":10000000000 
 } 
 </pre>|| 
 | 
 |Keep|@keep@| 
 Expose all readable collections via arv-mount.|Requires suitable runtime constraints.| 
 <pre> 
 { 
  "kind":"keep" 
 } 
 </pre>|| 
 | 
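For reference, the 40-character identifiers recorded in a resolved @git_tree@ mount are ordinary git object hashes. A blob id, for example, is the SHA-1 of a short header plus the file content; this sketch shows the blob case (tree hashes are computed analogously over serialized tree objects):

```python
import hashlib

def git_blob_hash(data):
    """Compute the 40-character git blob id for raw file content:
    SHA-1 over the header b"blob <size>\\0" followed by the bytes.
    Illustrates the kind of content address a resolved git_tree
    mount records in its "blob" key.
    """
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()

# The well-known id of the empty blob:
assert git_blob_hash(b"") == "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
```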


 h2. Permissions 

Users own ContainerRequests but the system owns Containers. Users get permission to read Containers by virtue of linked ContainerRequests.

 h2. API methods 

 Changes from the usual REST APIs: 

h3. arvados.v1.container_requests.create and .update

 These methods can fail when objects referenced in the "mounts" hash do not exist, or the acting user has insufficient permission on them. 

 If @state="Uncommitted"@: 
 * has null @priority@. 
* can have its @container_uuid@ reset to null by a client.
* can have its @container_uuid@ set to a non-null value by a system process.

 If @state="Committed"@: 
 * has non-null @priority@. 
 * cannot be modified, except that its @priority@ can be changed to another non-null value. 
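The update rules above amount to a small predicate; this sketch (a hypothetical helper, not server code) makes them concrete:

```python
def update_allowed(state, changed_fields):
    """Which ContainerRequest updates the rules above permit.

    Uncommitted requests are freely editable; committed requests may
    only change priority (to another non-null value). A sketch of the
    stated rules, not the actual API server implementation.
    """
    if state == "Uncommitted":
        return True
    if state == "Committed":
        return set(changed_fields) <= {"priority"}
    return False

assert update_allowed("Uncommitted", ["mounts", "command"])
assert update_allowed("Committed", ["priority"])
assert not update_allowed("Committed", ["mounts"])
```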

h3. arvados.v1.container_requests.cancel

 Set @priority@ to zero. 

h3. arvados.v1.container_requests.satisfy

Find or create a suitable container, and update @container_uuid@.

Return an error if @container_uuid@ is not null.

 Q: Can this be requested during create? Create+satisfy is a common operation so having a way to do it in a single API call might be a worthwhile convenience. 

 Q: Better name? 

h3. arvados.v1.containers.create and .update

 These methods are not callable except by system processes. 

h3. arvados.v1.containers.progress

This method permits specific types of updates while a container is running: update progress, record success/failure.

Q: [How] can a client submitting container B indicate it shouldn't run unless/until container A succeeds?

 h2. Debugging 

 Q: Need any infrastructure debug-logging controls in this API? 

Q: Need any container debug-logging controls in this API? Or just use environment vars?

h2. Scheduling and running containers

Q: When/how should we implement hooks for futures/promises: e.g., "run container Y when containers X0, X1, and X2 have finished"?

 h2. Accounting 

 A complete design for resource accounting and quota is out of scope here, but we do assert here that the API makes it feasible to retain accounting data. 

It should be possible to retrieve, for a given container, a complete set of resource allocation intervals, each one including:
 * interval start time 
 * interval end time (presented as null or now if the interval hasn't ended yet) 
 * user uuid 
* container request id 
* container request priority 
* container state
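Given interval records of this shape, per-user accounting is a straightforward aggregation. A sketch, assuming intervals are provided as tuples in the order listed above, with an open interval (null end time) charged up to "now":

```python
from datetime import datetime, timezone

def total_seconds_by_user(intervals, now=None):
    """Sum allocated time per user over interval records shaped as
    (start, end, user_uuid, request_id, request_priority, state).
    end may be None for an interval that has not ended yet, in which
    case it is charged up to "now". Sketch only; the accounting
    service itself is out of scope here.
    """
    now = now or datetime.now(timezone.utc)
    totals = {}
    for start, end, user_uuid, _req, _priority, _state in intervals:
        elapsed = ((end or now) - start).total_seconds()
        totals[user_uuid] = totals.get(user_uuid, 0.0) + elapsed
    return totals
```

For example, a single closed one-hour interval for a user yields 3600 seconds charged to that user's uuid.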