Containers API » History » Version 6

Tom Clegg, 06/02/2015 01:55 PM

h1. Jobs API (DRAFT)

Clients control JobRequests. The system controls Jobs, and assigns them to JobRequests. When the system has assigned a Job to a JobRequest, anyone with permission to read the JobRequest also has permission to read the Job.

A JobRequest describes job _constraints_ which can have different interpretations over time. For example, a JobRequest with @{"git_revision":"abc123..master"}@ might be satisfiable by any of several different source trees, and this set of satisfying source trees changes whenever the repository's "master" branch is updated.

A Job is an unambiguously specified process. Git revisions, data collections, docker images, etc. are specified using content addresses. A Job serves as a statement of exactly _what computation will be attempted_ and, later, a record of _what computation was done_.
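
To illustrate the distinction, the same process might be described like this in each record. The attribute names come from the schemas below; the values are invented for the example:

<pre>
# JobRequest: constraints, as written by the client
{"git_revision": "abc123..master",
 "docker_image": "arvados/jobs:latest",
 "input": {"foo": "zzzzz-4zz18-yyyyyyyyyyyyyyy"}}

# Job: the unambiguous process the system resolved them to
{"git_commit_sha1": "f00dfacef00dfacef00dfacef00dfacef00dface",
 "docker_image_pdh": "d41d8cd98f00b204e9800998ecf8427e+0",
 "input": {"foo": "d41d8cd98f00b204e9800998ecf8427e+0"}}
</pre>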

h2. JobRequest/Job life cycle

Illustrating job re-use and the preview facility (a code sketch of this flow follows the list):
# Client CA creates a JobRequest JRA with priority=0.
# Server creates job JX and assigns JX to JRA, but does not try to run JX yet because max(priority)=0.
# Client CA presents JX to the user. "We haven't computed this result yet, so we'll have to run a new job. Is this OK?"
# Client CB creates a JobRequest JRB with priority=1.
# Server assigns JX to JRB and puts JX in the execution queue with priority=1.
# Client CA updates JRA with priority=2.
# Server updates JX with priority=2.
# Job JX starts.
# Client CA updates JRA with priority=0. (This is the "cancel" operation.)
# Server updates JX with priority=1. (JRB still wants this job to complete.)
# Job JX finishes.
# Clients CA and CB have permission to read JX (ever since JX was assigned to their respective JobRequests) as well as its progress indicators, output, and log.
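
A minimal sketch of client CA's side of this flow, assuming the draft API is exposed through the Python SDK as a @job_requests@ resource. That resource name, and the assumption that it follows the same call pattern as existing Arvados resources, are part of this sketch, not an existing API:

<pre>
import arvados

api = arvados.api('v1')

# Step 1: priority=0 means "find or create the Job, but don't run it yet".
jr = api.job_requests().create(body={'job_request': {
    'priority': 0,
    'git_repository': 'you/crunchapp',
    'git_revision': 'abc123..master',
    'docker_image': 'arvados/jobs:latest',
    'command': ['do-stuff'],
    'input': {'foo': 'd41d8cd98f00b204e9800998ecf8427e+0'},
}}).execute()

# Step 3: preview the Job the server assigned (possibly a re-used one).
# (job_uuid may still be null if the server has not picked a Job yet.)
job = api.jobs().get(uuid=jr['job_uuid']).execute()
print("Will run job %s (state %s). OK?" % (job['uuid'], job['state']))

# Step 6: raise the priority so the Job is actually queued and run.
api.job_requests().update(uuid=jr['uuid'],
                          body={'job_request': {'priority': 2}}).execute()

# Step 9: "cancel" by dropping the priority back to zero. The Job keeps
# running if another JobRequest (here, JRB) still has nonzero priority.
api.job_requests().update(uuid=jr['uuid'],
                          body={'job_request': {'priority': 0}}).execute()
</pre>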

h2. "JobRequest" schema

|Attribute|Type|Description|Discussion|Examples|
|uuid, owner_uuid, modified_by_client_uuid, modified_by_user_uuid|string|Usual Arvados model attributes|||
|
|created_at, modified_at|datetime|Usual Arvados model attributes|||
|
|name|string|Unparsed|||
|
|description|text|Unparsed|||
|
|job_uuid|uuid|The job that satisfies this job request.|
Can be null if a suitable job has not yet been found or queued.
Assigned by the system: cannot be modified directly by clients.
If null, it can be changed by the system at any time.
If not null, it can be reset to null by a client _if priority is zero_.||
|
|input|hash|Hash of arbitrary keys and values.|Any collection UUID appearing here (as an array element or hash value) will be resolved to a PDH in order to find or create a Job record.
It is an error to refer to a collection here (by UUID or PDH) unless it exists and is readable by the submitting user.|<pre>{
 "foo":"d41d8cd98f00b204e9800998ecf8427e+0",
 "bar":123
}</pre>|
|
|pure|boolean|Process is thought to be pure (see below).|(TC)What do we do when given two JobRequests that are identical except that "pure" is different?||
|
|git_repository, git_revision|string|Set of git commits suitable for running the job.
git_revision can be either a commit or a range -- see @gitrevisions(1)@.|
(TC)Perhaps we should take the opportunity to support these semantics on multiple git repositories per job (#3820).||
|
|docker_image|string|Docker image repository and tag, docker image hash, collection UUID, or collection PDH.|||
|
|git_checkout_dir, temp_dir, output_dir, keep_dir|string|Desired paths *inside the docker container* where the git checkout, temporary directory, output directory and keep mount should go.|
(TC)What are the defaults? This flexibility seems useful for a job that submits other jobs (like a workflow/pipeline runner) but would be cumbersome to specify every time ("remind me, where does workflow runner X expect its keep mount to be?").
(TC)What is the significance of output_dir? [How] does Crunch merge the content of the @output_dir@ and the value of the @output@ attribute to arrive at the final output?||
|
|stdin|string|A file in Keep that should be sent to standard input.
Given as an absolute path (relative to the container filesystem root).
The process must not rely on stdin being a regular file (the system is not required to set up stdin so that it's seekable).
This cannot be used to make additional inputs available to the process beyond those listed in the input hash.|
(TC)If given as a relative path, relative to where?
(TC)How does stdin refer to one of the inputs in the input hash?
(TC)If the job does not finish reading it, is that an error, like @set -o pipefail@ in bash?|
@/data/foo.txt@|
|
|stdout|string|A filename in the output directory to which standard output should be directed.|(TC)If this is not given, is stdout sent to stderr/logs as it is now?
(TC)The relationship between stdout and output is unclear. If I specify a "stdout" but the job process sets its output by itself, is Crunch expected to clobber that output with the collection resulting from the "stdout" mechanism?||
|
|environment|hash|Environment variables and values that should be set in the container environment (docker run --env).|
(TC)If this contains variables already used by Crunch (TASK_KEEPMOUNT), which has precedence?||
|
|initial_collection|uuid|A collection describing the starting contents of the output directory.|
(TC)Not a fan of this attribute name.
(TC)Is it an error if this collection is not one of the inputs? Or do all provenance queries need to treat this separately?
(TC)Perhaps better if each @input@ item were available at @{job_workdir}/input/{inputkey}@ and the "preload" behavior could be achieved by setting @output_dir@ to @input/foo@?||
|
|cwd|string|Initial working directory, given as an absolute path (in the container) or relative to {job_workdir}. Default "output".||/tmp
output
input/foo|
|
|command|array of strings|Parameters for the actual executable command line.|
(TC)Possible to specify a pipe, like "echo foo &#124; tr f b"? Any shell variables supported? Or do you just use @["sh","-c","echo $PATH &#124; wc"]@ if you want a shell?||
|
|runtime_debugging|boolean|Enable debug logging for the infrastructure (such as arv-mount); this might get logged privately, away from the end user.|
(TC)This doesn't sound like it should be a job attribute. Infrastructure debugging shouldn't require touching users' job records. An analogous user feature would be useful, but perhaps it just boils down to adding DEBUG=1 to @environment@?||
|
|priority|number|Higher number means spend more resources (e.g., go ahead of other queued jobs, bring up more nodes).
Zero means a job should not be run. Clients are expected to submit JobRequests with zero priority in order to preview the job that will be used to satisfy it.|(TC)Do we need something more subtle than a single number?
(TC)What if a high priority job is waiting for a low priority job to finish?|@0@, @1000.5@, @-1@|
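
Putting the attributes above together, a complete JobRequest might look like the following. This is only an illustrative sketch: every value is made up, and the defaults questioned above (directory paths, stdout handling) remain open:

<pre>
{
 "name": "example request",
 "description": "Count the lines in foo.",
 "priority": 1,
 "pure": true,
 "git_repository": "you/crunchapp",
 "git_revision": "abc123..master",
 "docker_image": "arvados/jobs:latest",
 "git_checkout_dir": "/src",
 "temp_dir": "/tmp",
 "output_dir": "/output",
 "keep_dir": "/keep",
 "cwd": "output",
 "command": ["wc", "-l"],
 "stdin": "/data/foo.txt",
 "stdout": "line_count.txt",
 "environment": {"DEBUG": "1"},
 "input": {"foo": "d41d8cd98f00b204e9800998ecf8427e+0"}
}
</pre>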

h2. "Job" schema

|Attribute|Type|Description|Discussion|Examples|
|state, started_at, finished_at, log||Same as current job|||
|
|input, stdin, stdout, environment, initial_collection, cwd, command, runtime_debugging, git_checkout_dir, temp_dir, output_dir, keep_dir||Copied from the relevant JobRequest(s) and made available to the job process.|
||
|
|output|hash|Arbitrary hash provided by the job process.|
(PA)Changing the basic output type from a collection to a JSON object is important for native CWL support.
(TC)Need examples of how "output is one collection", "output is multiple collections", "output is collections plus other stuff(?)", and "output is other stuff without collections" are to be encoded.||
|
|pure|boolean|The job's output is thought to be dependent solely on its inputs (i.e., it is expected to produce identical output if repeated).|
(TC)Is this merely an assertion by the submitter? Is the job itself expected to set or reset it? Does the system behave differently while running the job (e.g., different firewall rules, some APIs disabled)? [Under what conditions] is the system allowed to change it from true to false? Is null allowed, presumably signifying "not known"?|@null@ (?)
@true@
@false@|
|
|git_commit_sha1|string|Full 40-character commit hash used to run the job.|(TC)Should we store the tree hash as well? Or _instead_ of the commit hash, if we prevent the job from seeing the git metadata, which would be good for reproducibility (consider a job that starts by doing "git checkout master" in its working directory).
(TC)Do we need to store git_repository here too? Presumably, the relevant git tree should be in the internal git repository as a prerequisite of Job creation. And if two repositories have the same commit/tree, it shouldn't matter which we pull it from when running the job.||
|
|docker_image_pdh|string|Portable data hash of a collection containing the docker image used to run the job.|(TC) *If* docker image hashes can be verified efficiently, we can use the native docker image hash here instead of a collection PDH.||
|
|progress|number|A number between 0.0 and 1.0 describing the fraction of work done.|
(TC)How does this relate to child tasks? E.g., is a job supposed to update this itself as its child tasks complete?||
|
|priority|number|Highest priority of all associated JobRequests|||
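
For example, a finished Job record might carry values like these for the run-time attributes above. This is only a sketch; in particular, the encoding of @output@ is still an open question in the table:

<pre>
{
 "state": "Complete",
 "started_at": "2015-06-02T18:00:00Z",
 "finished_at": "2015-06-02T18:20:00Z",
 "log": "d41d8cd98f00b204e9800998ecf8427e+0",
 "progress": 1.0,
 "priority": 2,
 "pure": true,
 "output": {"result": "d41d8cd98f00b204e9800998ecf8427e+0"}
}
</pre>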

h2. Permissions

Users own JobRequests but the system owns Jobs. Users get permission to read Jobs by virtue of linked JobRequests.

h2. "jobs" API methods

*TODO: bring this section up to speed with distinct JobRequest and Job records.*

Reuse and reproducibility require some changes to the usual REST APIs.

h3. arvados.v1.jobs.create

Q: How does "find or create" work?

Q: How does a client submitting job B indicate it shouldn't run unless/until job A succeeds?

h3. arvados.v1.jobs.update

Most attributes cannot be changed after a job starts. Some attributes _can_ be changed (see the sketch after this list):
* name, description, priority
* output, progress, state, finished_at, log (ideally only by the job itself - should this be enforced?)
* modified_*
* Q: (any more?)
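
A minimal sketch of what these rules would allow and reject, using the Python SDK's existing call pattern. The UUID is made up, and whether the server actually enforces the rejection is the open question raised in the list above:

<pre>
import arvados

api = arvados.api('v1')
job_uuid = 'zzzzz-8i9sb-xxxxxxxxxxxxxxx'  # made-up example UUID

# Mutable while the job runs (ideally only by the job itself):
# output, progress, state, finished_at, log.
api.jobs().update(uuid=job_uuid,
                  body={'job': {'progress': 0.5}}).execute()

# Not mutable after the job starts: the attributes that define the
# process itself. An update like this one should be rejected.
api.jobs().update(uuid=job_uuid,
                  body={'job': {'command': ['something', 'else']}}).execute()
</pre>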

h3. arvados.v1.jobs.get

Q: Should this omit mutable attributes when retrieved by a pure job? (Ideally, pure jobs should not be able to retrieve data other than their stated immutable / content-addressed inputs, either through Keep or through the API.)

h2. Scheduling and running jobs

Q: If two users submit identical pure jobs and ask to reuse existing jobs, whose token does the job get to use?
* Should pure jobs be run as a pseudo-user that is given read access to the relevant objects for the duration of the job? (This would make it safer to share jobs -- see #5823)

Q: If two users submit identical pure jobs with different priority, which priority is used?
* Choices include "whichever is greater" and "sum".

Q: If two users submit identical pure jobs and one cancels -- or one user submits two identical jobs and cancels one -- does the work stop, or continue? What do the job records look like after this?