Containers API » History » Version 9

Tom Clegg, 06/05/2015 07:05 PM

{{>TOC}}

h1. Jobs API (DRAFT)

Clients control JobRequests. The system controls Jobs, and assigns them to JobRequests. When the system has assigned a Job to a JobRequest, anyone with permission to read the JobRequest also has permission to read the Job.

A JobRequest describes job _constraints_ which can have different interpretations over time. For example, a JobRequest with a @{"kind":"git_tree","commit_range":"abc123..master",...}@ mount might be satisfiable by any of several different source trees, and this set of satisfying source trees can change when the repository's "master" branch is updated.

A Job is an unambiguously specified process. Git trees, data collections, docker images, etc. are specified using content addresses. A Job serves as a statement of exactly _what computation will be attempted_ and, later, a record of _what computation was done_.

h2. Use cases

h3. Preview

Tell me how you would satisfy job request X. Which pdh/commits would be used? Is the satisfying job already started? finished?

h3. Submit a previewed existing job

I'm happy with the already-running/finished job you showed me in "preview". Give me access to that job, its logs, and [when it finishes] its output.

h3. Submit a previewed new job

I'm happy with the new job the "preview" response proposed to run. Run that job.

h3. Submit a new job (disable reuse)

I don't want to use an already-running/finished job. Run a new job that satisfies my job request.

h3. Submit a new duplicate job (disable reuse)

I'm happy with the already-running/finished job you showed me in "preview". Run a new job exactly like that one.

h3. Select a job and associate it with my JobRequest

I'm not happy with the job you chose, but I know of another job that satisfies my request. Assuming I'm right about that, attach my JobRequest to the existing job of my choice.

h3. Just do the right thing without a preview

Satisfy job request X one way or another, and tell me the resulting job's UUID.

h2. JobRequest/Job life cycle

Illustrating job re-use and preview facility:
# Client CA creates a JobRequest JRA with priority=0.
# Server creates job JX and assigns JX to JRA, but does not try to run JX yet because max(priority)=0.
# Client CA presents JX to the user. "We haven't computed this result yet, so we'll have to run a new job. Is this OK?"
# Client CB creates a JobRequest JRB with priority=1.
# Server assigns JX to JRB and puts JX in the execution queue with priority=1.
# Client CA updates JRA with priority=2.
# Server updates JX with priority=2.
# Job JX starts.
# Client CA updates JRA with priority=0. (This is the "cancel" operation.)
# Server updates JX with priority=1. (JRB still wants this job to complete.)
# Job JX finishes.
# Clients CA and CB have permission to read JX (ever since JX was assigned to their respective JobRequests) as well as its progress indicators, output, and log.
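
The priority changes in the sequence above can be sketched in a few lines of Python. This is only an illustration, not the server's implementation: the @JobRequest@ and @Job@ classes and the @effective_priority@ method are hypothetical names, and the sketch assumes that a Job's priority is simply the highest priority among the JobRequests it is assigned to.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JobRequest:
    name: str
    priority: float = 0  # zero means "do not run a job on my behalf"


@dataclass
class Job:
    name: str
    requests: List[JobRequest] = field(default_factory=list)

    def effective_priority(self) -> float:
        # Assumed rule: highest priority among all assigned JobRequests.
        return max((r.priority for r in self.requests), default=0)


jra = JobRequest("JRA", priority=0)       # CA previews
jx = Job("JX", [jra])                     # server assigns JX to JRA
history = [jx.effective_priority()]       # JX is not queued yet

jrb = JobRequest("JRB", priority=1)       # CB submits; JX is reused
jx.requests.append(jrb)
history.append(jx.effective_priority())

jra.priority = 2                          # CA raises its priority
history.append(jx.effective_priority())

jra.priority = 0                          # CA cancels; JRB still wants JX
history.append(jx.effective_priority())

print(history)  # [0, 1, 2, 1]
```

The printed values match the priorities the server gives JX in steps 2, 5, 7, and 10.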

h2. "JobRequest" schema

|Attribute|Type|Description|Discussion|Examples|
|uuid, owner_uuid, modified_by_client_uuid, modified_by_user_uuid|string|Usual Arvados model attributes|||
|
|created_at, modified_at|datetime|Usual Arvados model attributes|||
|
|name|string|Unparsed|||
|
|description|text|Unparsed|||
|
|job_uuid|uuid|The job that satisfies this job request.|
Can be null if a suitable job has not yet been found or queued.
Assigned by the system: cannot be modified directly by clients.
If null, it can be changed by the system at any time.
If not null, it can be reset to null by a client _if priority is zero_.||
|
|mounts|hash|Objects to attach to the container's filesystem and stdin/stdout.
Keys starting with a forward slash indicate objects mounted in the container's filesystem.
Other keys are given special meanings here.|
We use "stdin" instead of "/dev/stdin" because literally replacing /dev/stdin with a file would have a confusing effect on many unix programs. The stdin feature only affects the standard input of the first process started in the container; after that, the usual rules apply.|
<pre>{
 "/input/foo":{
  "kind":"collection",
  "portable_data_hash":"d41d8cd98f00b204e9800998ecf8427e+0"
 },
 "stdin":{
  "kind":"collection_file",
  "uuid":"zzzzz-4zz18-yyyyyyyyyyyyyyy",
  "path":"/foo.txt"
 },
 "stdout":{
  "kind":"regular_file",
  "path":"/tmp/a.out"
 }
}</pre>|
|
|runtime_permissions|hash|Restrict the job's access to the outside world (apart from its explicitly stated inputs and output).
Each key is the name of a capability, like "internet" or "API" or "clock". The corresponding value is @true@ (the capability must be available in the job's runtime environment) or @false@ (must not). If a key is omitted, availability of the corresponding capability is acceptable but not necessary.|This is a generalized version of "enforce purity restrictions": it is not a claim that the job will be pure. Rather, it helps us control and track runtime restrictions, which can be helpful when reasoning about whether a given job was pure.
In the most basic implementation, no capabilities are defined, and the only acceptable value of this attribute is the empty hash.
(TC)This name isn't great, and conflicts with the "readable/writable" kind of permissions. Perhaps something along the lines of capabilities or interfaces?
(TC)Is this the same type of feature as requesting memory/disk/cores? Or are those resources assumed not to affect reproducibility?|
<pre>{}</pre>|
|
|docker_image|string|Docker image repository and tag, docker image hash, collection UUID, or collection PDH.|(TC)Could this be just another mount point, with target "docker_image"?||
|
|environment|hash|Environment variables and values that should be set in the container environment (docker run --env). This augments and (when conflicts exist) overrides environment variables given in the image's Dockerfile.|||
|
|cwd|string|Initial working directory, given as an absolute path (in the container) or a path relative to the WORKDIR given in the image's Dockerfile. The default is @"."@.||<pre>"/tmp"</pre>|
|
|command|array of strings|Command to execute in the container. Default is the CMD given in the image's Dockerfile.|
(TC)Possible to specify a pipe, like "echo foo &#124; tr f b"? Any shell variables supported? Or do you just use @["sh","-c","echo $PATH &#124; wc"]@ if you want a shell?||
|
|output_path|string|Path to a directory or file inside the container that should be preserved as the job's output when it finishes.|For best performance, point output_path to a writable collection mount.||
|
|priority|number|Higher number means spend more resources (e.g., go ahead of other queued jobs, bring up more nodes).
Zero means a job should not be run on behalf of this request. (Clients are expected to submit JobRequests with zero priority in order to preview the job that will be used to satisfy them.)||@0@, @1000.5@, @-1@|
|
|expires_at|datetime|After this time, priority is considered to be zero. If the assigned job is running at that time, the job _may_ be cancelled to conserve resources.||@null@, @2015-07-01T00:00:01Z@|
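
The key convention described in the @mounts@ row above can be sketched as follows. This is a hedged illustration only: @split_mounts@ and @SPECIAL_KEYS@ are hypothetical names, and the sketch assumes that "stdin" and "stdout" (the two non-path keys used in the example) are the only special keys.

```python
from typing import Dict, Tuple

# Assumption: only the special keys that appear in the schema example.
SPECIAL_KEYS = {"stdin", "stdout"}


def split_mounts(mounts: Dict[str, dict]) -> Tuple[Dict[str, dict], Dict[str, dict]]:
    """Separate filesystem targets (keys starting with "/") from special keys."""
    filesystem, special = {}, {}
    for key, spec in mounts.items():
        if key.startswith("/"):
            filesystem[key] = spec
        elif key in SPECIAL_KEYS:
            special[key] = spec
        else:
            raise ValueError(f"unrecognized mount key: {key!r}")
    return filesystem, special


mounts = {
    "/input/foo": {"kind": "collection",
                   "portable_data_hash": "d41d8cd98f00b204e9800998ecf8427e+0"},
    "stdin": {"kind": "collection_file",
              "uuid": "zzzzz-4zz18-yyyyyyyyyyyyyyy", "path": "/foo.txt"},
    "stdout": {"kind": "regular_file", "path": "/tmp/a.out"},
}
filesystem, special = split_mounts(mounts)
print(sorted(filesystem))  # ['/input/foo']
print(sorted(special))     # ['stdin', 'stdout']
```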


h2. "Job" schema

|Attribute|Type|Description|Discussion|Examples|
|
|uuid, owner_uuid, created_at, modified_at, modified_by_client_uuid, modified_by_user_uuid|string|Usual Arvados model attributes|||
|
|state|string|||
<pre>
"Queued"
"Running"
"Cancelled"
"Failed"
"Complete"
</pre>|
|
|started_at, finished_at, log||Same as current job|||
|
|environment|hash|Must be equal to a JobRequest's environment in order to satisfy the JobRequest.|(TC)We could offer a "resolve" process here like we do with mounts: e.g., hash values in the JobRequest environment could be resolved according to the given "kind". I propose we leave room for this feature but don't add it yet.||
|
|cwd, command, output_path|string|Must be equal to the corresponding values in a JobRequest in order to satisfy that JobRequest.|||
|
|mounts|hash|Must contain the same keys as the JobRequest being satisfied. Each value must be within the range of values described in the JobRequest _at the time the Job is assigned to the JobRequest._|||
|
|runtime_permissions|hash|The types of access to the outside world (apart from its explicitly stated inputs and output) available to the job when it runs/ran.|
Permission/access types will change over time and it may be hard/impossible to translate old types to new. Such cases may cause old Jobs to be ineligible for assignment to new JobRequests.||
|
|output|string|Portable data hash of the output collection.|||
|
|-pure-|-boolean-|-The job's output is thought to be dependent solely on its inputs, i.e., it is expected to produce identical output if repeated.-|
We want a feature along these lines, but "pure" seems to be a conclusion we can come to after examining various facts -- rather than a property of an individual job execution event -- and it probably needs something more subtle than a boolean.||
|
|docker_image_pdh|string|Portable data hash of a collection containing the docker image used to run the job.|(TC) *If* docker image hashes can be verified efficiently, we can use the native docker image hash here instead of a collection PDH.||
|
|progress|number|A number between 0.0 and 1.0 describing the fraction of work done.|
(TC)How does this relate to child tasks? E.g., is a job supposed to update this itself as its child tasks complete?||
|
|priority|number|Highest priority of all associated JobRequests|||
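
The matching rules in the two schema tables could be checked roughly as follows. This is a sketch under stated assumptions, not a definitive implementation: @may_satisfy@ and @permissions_ok@ are hypothetical helpers, the Job's @runtime_permissions@ is assumed to be a hash of capability name to boolean, and a capability missing from the Job record is assumed to have been unavailable. The sketch compares only mount _keys_; checking that each resolved mount value falls within the range described by the JobRequest is left out.

```python
from typing import Any, Dict


def permissions_ok(requested: Dict[str, bool], actual: Dict[str, bool]) -> bool:
    """true = capability must be available, false = must not be, omitted = either."""
    for capability, required in requested.items():
        # Assumption: a capability absent from the Job record was not available.
        if actual.get(capability, False) != required:
            return False
    return True


def may_satisfy(job: Dict[str, Any], request: Dict[str, Any]) -> bool:
    # environment, cwd, command and output_path must be equal.
    for attr in ("environment", "cwd", "command", "output_path"):
        if job.get(attr) != request.get(attr):
            return False
    # The Job must contain the same mount keys as the JobRequest.
    if set(job.get("mounts", {})) != set(request.get("mounts", {})):
        return False
    return permissions_ok(request.get("runtime_permissions", {}),
                          job.get("runtime_permissions", {}))


request = {"environment": {}, "cwd": ".", "command": ["true"],
           "output_path": "/out", "mounts": {"/out": {"kind": "tmp"}},
           "runtime_permissions": {"internet": False}}
job_offline = dict(request, runtime_permissions={})
job_online = dict(request, runtime_permissions={"internet": True})

print(may_satisfy(job_offline, request))  # True
print(may_satisfy(job_online, request))   # False
```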

h2. Mount types

The "mounts" hash is the primary mechanism for adding data to the container at runtime (beyond what is already in the container image).

Each value of the "mounts" hash is itself a hash, whose "kind" key determines the handler used to attach data to the container.

|Mount type|@kind@|Expected keys|Description|Examples|Discussion|
|
|Arvados data collection|@collection@|
@"portable_data_hash"@ _or_ @"uuid"@ _may_ be provided. If not provided, a new collection will be created. This is useful when @"writable":true@ and the job's @output_path@ is (or is a subdirectory of) this mount target.
@"writable"@ may be provided with a @true@ or @false@ to indicate the path must (or must not) be writable. If not specified, the system can choose.
@"path"@ may be provided, and defaults to @"/"@.|
At job startup, the target path will have the same directory structure as the given path within the collection. Even if the files/directories are writable in the container, modifications will _not_ be saved back to the original collections when the job ends.|
<pre>
{
 "kind":"collection",
 "uuid":"...",
 "path":"/foo.txt"
}

{
 "kind":"collection",
 "uuid":"..."
}
</pre>|
|
|Git tree|@git_tree@|
One of {@"git-url"@, @"repository_name"@, @"uuid"@} must be provided.
One of {@"commit"@, @"revisions"@} must be provided.
@"path"@ may be provided. The default path is @"/"@.|
At job startup, the target path will have the source tree indicated by the given revision. The @.git@ metadata directory _will not_ be available: typically the system will use @git-archive@ rather than @git-checkout@ to prepare the target directory.
If a value is given for @"revisions"@, it will be resolved to a set of commits (as described in the "Specifying Ranges" section of gitrevisions(7)) and the job request will be satisfiable by any commit in that set.
If a value is given for @"commit"@, it will be resolved to a single commit, and the tree resulting from that commit will be used.
@"path"@ can be used to select a subdirectory or a single file from the tree indicated by the selected commit.
Note that multiple commits can resolve to the same tree: for example, the file/directory given in @"path"@ might not have changed between commits A and B.|
<pre>
{
 "kind":"git_tree",
 "uuid":"zzzzz-s0uqq-xxxxxxxxxxxxxxx",
 "commit":"master"
}

{
 "kind":"git_tree",
 "uuid":"zzzzz-s0uqq-xxxxxxxxxxxxxxx",
 "commit_range":"bugfix^..master",
 "path":"/crunch_scripts/grep"
}
</pre>|The resolved mount (found in the Job record) will have only the "kind" key and a "blob" or "tree" key indicating the 40-character hash of the git tree/blob used.|
|
|Temporary directory|@tmp@|
None|
At job startup, the target path will be empty. When the job finishes, the content will be discarded. This will be backed by a memory-based filesystem where possible.|
<pre>
{
 "kind":"tmp"
}
</pre>|
(TC)Should add a "max size" feature, to help memfs-backed implementations.|
|
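
The "Expected keys" column above can be turned into a small validation sketch. The names @MOUNT_RULES@ and @mount_errors@ are hypothetical, and reading "one of" as "exactly one" is an assumption of this sketch rather than something the draft states.

```python
from typing import Any, Dict, List

# Allowed keys per kind, plus groups from which exactly one key is required.
# Treating "one of" as "exactly one" is an assumption of this sketch.
MOUNT_RULES = {
    "collection": {"allowed": {"portable_data_hash", "uuid", "writable", "path"},
                   "one_of": []},
    "git_tree": {"allowed": {"git-url", "repository_name", "uuid",
                             "commit", "revisions", "path"},
                 "one_of": [{"git-url", "repository_name", "uuid"},
                            {"commit", "revisions"}]},
    "tmp": {"allowed": set(), "one_of": []},
}


def mount_errors(mount: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the mount looks valid."""
    kind = mount.get("kind")
    rules = MOUNT_RULES.get(kind)
    if rules is None:
        return [f"unknown kind: {kind!r}"]
    keys = set(mount) - {"kind"}
    errors = [f"unexpected key: {k}" for k in sorted(keys - rules["allowed"])]
    for group in rules["one_of"]:
        if len(keys & group) != 1:
            errors.append("need exactly one of: " + ", ".join(sorted(group)))
    return errors


print(mount_errors({"kind": "tmp"}))  # []
print(mount_errors({"kind": "git_tree",
                    "uuid": "zzzzz-s0uqq-xxxxxxxxxxxxxxx",
                    "commit": "master"}))  # []
print(mount_errors({"kind": "git_tree",
                    "uuid": "zzzzz-s0uqq-xxxxxxxxxxxxxxx",
                    "commit_range": "bugfix^..master"}))
# ['unexpected key: commit_range', 'need exactly one of: commit, revisions']
```

Note that the last call mirrors the second git_tree example in the table: it uses a @commit_range@ key, which this sketch rejects because the "Expected keys" column lists only @commit@ and @revisions@.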


h2. Permissions

Users own JobRequests but the system owns Jobs. Users get permission to read Jobs by virtue of linked JobRequests.

h2. API methods

Changes from the usual REST APIs:

h3. arvados.v1.job_requests.create and .update

These methods can fail when objects referenced in the "mounts" hash do not exist, or the acting user has insufficient permission on them.

h3. arvados.v1.job_requests.update

The @job_uuid@ attribute is special:
* It cannot be changed from null to non-null by a regular client.
* It cannot be changed from non-null to null by system processes.
* It _can_ be reset from non-null to null by the system _during a client-initiated update transaction that modifies attributes other than @state@ and @priority@._

Apart from @job_uuid@, updates are restricted by the current @state@ of the job request.
* When @state="Preview"@, all attributes can be updated.
* When @state="Request"@, only @priority@ and @state@ can be updated.
* When @state="Done"@, no attributes can be updated.

@state@ cannot be null. The following state transitions are the only ones permitted.
* Preview &rarr; Request
* Preview &rarr; Done
* Request &rarr; Done
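
The update restrictions and state transitions above amount to a small state machine, sketched below. @check_update@, @ALLOWED_TRANSITIONS@ and @UPDATABLE@ are hypothetical names. The sketch ignores the special handling of @job_uuid@ and assumes that writing the current state back unchanged is acceptable wherever @state@ itself is updatable.

```python
from typing import Dict, Optional, Set

ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "Preview": {"Request", "Done"},
    "Request": {"Done"},
    "Done": set(),
}

# Attributes a client may change in each state; None means "all attributes".
UPDATABLE: Dict[str, Optional[Set[str]]] = {
    "Preview": None,
    "Request": {"priority", "state"},
    "Done": set(),
}


def check_update(current_state: str, changes: dict) -> None:
    """Raise ValueError if a client-initiated update is not permitted."""
    allowed = UPDATABLE[current_state]
    if allowed is not None:
        forbidden = set(changes) - allowed
        if forbidden:
            raise ValueError(
                f"cannot update {sorted(forbidden)} in state {current_state}")
    if "state" in changes:
        new_state = changes["state"]
        if new_state is None:
            raise ValueError("state cannot be null")
        if (new_state != current_state
                and new_state not in ALLOWED_TRANSITIONS[current_state]):
            raise ValueError(
                f"transition {current_state} -> {new_state} not permitted")


check_update("Preview", {"command": ["ls"], "state": "Request"})  # permitted
try:
    check_update("Request", {"command": ["ls"]})
except ValueError as err:
    print(err)  # cannot update ['command'] in state Request
```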

h3. arvados.v1.jobs.create and .update

These methods are not callable except by system processes.

h3. arvados.v1.jobs.progress

This method permits specific types of updates while a job is running: update progress, record success/failure.

Q: [How] can a client submitting job B indicate it shouldn't run unless/until job A succeeds?

h2. Debugging

Q: Need any infrastructure debug-logging controls in this API?

Q: Need any job debug-logging controls in this API? Or just use environment vars?

h2. Scheduling and running jobs

Q: If two users submit identical pure jobs and ask to reuse existing jobs, whose token does the job get to use?
* Should pure jobs be run as a pseudo-user that is given read access to the relevant objects for the duration of the job? (This would make it safer to share jobs -- see #5823)

Q: If two users submit identical pure jobs with different priority, which priority is used?
* Choices include "whichever is greater" and "sum".

Q: If two users submit identical pure jobs and one cancels -- or one user submits two identical jobs and cancels one -- does the work stop, or continue? What do the job records look like after this?