Containers API » History » Version 4

Peter Amstutz, 05/07/2015 03:04 PM

h1. Jobs API (DRAFT)

h2. "Job" schema

|Attribute|Type|Description|Discussion|Examples|
|uuid, owner_uuid, modified_by_client_uuid, modified_by_user_uuid, created_at, modified_at||Standard fields|||
|name, description||User-friendly information about the job|(TC)Does "user friendly" just mean "user controlled", or is Arvados expected to do something here?||
|state, started_at, finished_at, log||Same as current job|||
|created_by_job_uuid|uuid|The job that spawned this job, or null if it is a root job initiated by a user.|||
|input_object|hash|Functionally the same as script_parameters.|(TC)Why _object, not just input?|@{"input":"d41d8cd98f00b204e9800998ecf8427e+0"}@|
|output_object|hash|Output of the job. (Jobs are no longer required to write to Keep; this could also have several fields for multiple output collections.)|(PA)Changing the basic output type from a collection to a JSON object is important for native CWL support. (TC)Need examples of how "output is one collection", "output is multiple collections", "output is collections plus other stuff(?)", and "output is other stuff without collections" are to be encoded. (TC)Ditto re _object||
|pure|boolean|Whether this job can be reused (== "nondeterministic" ref #3555)|(TC)"Can be reused" can only be judged by the reuser, not the job itself. If the field is called "pure" it should mean "pure", i.e., output depends only on inputs, not randomness or external state. (TC)Is this merely an assertion by the submitter? Is the job itself expected to set or reset it? Does the system behave differently while running the job (e.g., different firewall rules, some APIs disabled)? Under what conditions is the system allowed to change it from true to false? Is null allowed, presumably signifying "not known"?|@null@ (?), @true@, @false@|
|git_repository, git_commit, resolved_git_commit|string|Basically the same as before, except that the user supplies git_commit and the API server fills in resolved_git_commit with the full SHA1 hash, instead of rewriting the user-supplied field.|(TC)Perhaps we should take the opportunity to support these semantics on multiple git repositories per job (#3820). (TC)Not keen on resolved_git_commit; prefer more git-like language, like git_commit_sha1. (TC)If git_commit were git_commit_range, the original "which versions are acceptable" constraint wouldn't be lost, and it would be possible to change git_commit_sha1 while a job is queued in order to increase reuse.||
|docker_image, resolved_docker_image|string|Similar to git: the user supplies docker_image and the API server resolves that to resolved_docker_image. Also, this ought to be the Docker image hash, not the collection PDH.|(TC)We can use a docker image hash only if we can safely verify docker image hashes. Otherwise, renaming a new docker image to {old-hash}.tar breaks reproducibility.||
|git_checkout_dir, temp_dir, output_dir, keep_dir|string|Desired paths *inside the docker container* where the git checkout, temporary directory, output directory, and Keep mount should go.|(TC)What are the defaults? This flexibility seems useful for a job that submits other jobs (like a workflow/pipeline runner) but would be cumbersome to specify every time ("remind me, where does workflow runner X expect its keep mount to be?"). (TC)What is the significance of output_dir? How does Crunch merge the content of the @output_dir@ and the value of the @output@ attribute to arrive at the final output?||
|stdin|string|A file in Keep that should be sent to standard input.|(TC)Is this required to be a regular file, or can it be a pipe? (TC)If the job does not finish reading it, is that an error, like @set -o pipefail@ in bash?|@{pdh}/foo.txt@|
|stdout|string|A filename in the output directory to which standard output should be directed.|(TC)If this is not given, is stdout sent to stderr/logs as it is now?||
|environment|hash|Environment variables and values that should be set in the container environment (docker run --env).|(TC)If this contains variables already used by Crunch (TASK_KEEPMOUNT), which has precedence?||
|initial_collection|uuid|A collection describing the starting contents of the output directory.|(TC)Not a fan of this attribute name. (TC)Is it an error if this collection is not one of the inputs? Or do all provenance queries need to treat this separately? (TC)Perhaps better if each @input@ item were available at @{job_workdir}/input/{inputkey}@ and the "preload" behavior could be achieved by setting @output_dir@ to @input/foo@?||
|cwd|string|Initial working directory, given as an absolute path (in the container) or relative to {job_workdir}. Default "output".||@/tmp@, @output@, @input/foo@|
|command|array of strings|Parameters to the actual executable command line.|(TC)Possible to specify a pipe, like "echo foo | tr f b"? Any shell variables supported? Or do you just use @["sh","-c","echo $PATH | wc"]@ if you want a shell?||
|progress|number|A number between 0.0 and 1.0 describing the fraction of work done.|(TC)How does this relate to child tasks? E.g., is a job supposed to update this itself as its child tasks complete?||
|runtime_debugging|boolean|Enable debug logging for the infrastructure (such as arv-mount); this might be logged privately, away from the end user.|(TC)This doesn't sound like it should be a job attribute. Infrastructure debugging shouldn't require touching users' job records. An analogous user feature would be useful, but perhaps it just boils down to adding DEBUG=1 to @environment@?||
|priority|number|Higher number means spend more resources (e.g., go ahead of other queued jobs, bring up more nodes).|(TC)Do we need something more subtle than a single number? (TC)What if a high-priority job is waiting for a low-priority job to finish?|@0@, @1000.5@, @-1@|

Q: When two identical pure jobs were submitted with reuse enabled, and only one runs, how do the two job records differ?
* (TC)I'm assuming this has to result in two job records, not one: otherwise fields like name, description, and priority will be confusing.
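
To make the schema above concrete, here is a hypothetical job record expressed as a Python dict. All values are invented for illustration, and the multi-collection @output_object@ shape is just one possible answer to the open encoding question in the table, not a settled design:

```python
# Hypothetical job record using the draft schema fields above.
# The UUID, hashes, and outputs are made up; the output_object layout
# (one key per named output, plus a non-collection value) is only one
# candidate encoding for the "multiple collections" case.
job = {
    "uuid": "zzzzz-8i9sb-0123456789abcde",   # hypothetical UUID
    "name": "exome-alignment",
    "state": "Complete",
    "pure": True,
    "git_repository": "arvados",
    "git_commit": "master",
    "resolved_git_commit": "31ce37fe365b3dc204300a3e3fa6a8feb9cbbfeb",
    "input_object": {"input": "d41d8cd98f00b204e9800998ecf8427e+0"},
    "output_object": {
        "aligned_reads": "d41d8cd98f00b204e9800998ecf8427e+0",  # collection PDH
        "report": "d41d8cd98f00b204e9800998ecf8427e+0",          # collection PDH
        "mean_coverage": 37.2,               # non-collection output value
    },
    "environment": {"DEBUG": "1"},
    "cwd": "output",
    "command": ["sh", "-c", "bwa mem ref.fa reads.fq > out.sam"],
    "priority": 0,
    "progress": 1.0,
}
```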
h2. Separate job requests and work items (proposal) (PA)

Propose separating the concepts of "job request" and "job work" into separate object types:
* A job request is created by the user with input object, command, git_commit, docker_image, name, description, etc.
* A job work item represents an actual unit of work that is queued, running, complete, failed, etc., and has the git_commit hash, docker_image hash, output_object, etc. that describe the actual job.
* The job request is fulfilled by linking it to a job work item. A single job work item may be used to fulfill multiple job requests.
* "When two identical pure jobs were submitted with reuse enabled, and only one runs, how do the two job records differ?"
** You would have two job_request objects (with separate name, description, priority) that both link to the same job_work object.

Users would own job request records but not the actual job work items. Access to a job work item flows only through access to a job request that links to the work items.
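
The proposed split can be sketched as plain dicts. Everything here is hypothetical: the UUID formats, the hash values, and especially the link-field name @fulfilled_by_uuid@ are invented for illustration:

```python
# Sketch of the proposed request/work split. Two identical pure
# submissions produce two job_request records sharing one job_work
# record; name and priority remain per-request.
job_work = {
    "uuid": "zzzzz-work-0000000000000001",   # hypothetical UUID
    "state": "Running",
    "git_commit_sha1": "31ce37fe365b3dc204300a3e3fa6a8feb9cbbfeb",
    "docker_image_hash": "sha256:0123abcd",  # made-up hash
    "output_object": None,                   # filled in on completion
}

request_a = {
    "uuid": "zzzzz-jreq-000000000000000a",
    "name": "alice's alignment run",
    "priority": 500,
    "fulfilled_by_uuid": job_work["uuid"],   # hypothetical link field
}
request_b = {
    "uuid": "zzzzz-jreq-000000000000000b",
    "name": "bob's alignment run",
    "priority": 100,
    "fulfilled_by_uuid": job_work["uuid"],   # same work item reused
}
```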

h2. "jobs" API methods

Reuse and reproducibility require some changes to the usual REST APIs.

h3. arvados.v1.jobs.create

Q: How does "find or create" work?
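
One possible reading of "find or create", sketched against an in-memory job list (not the real API server): reuse an existing job only if it is pure and all reuse-relevant fields match; otherwise create a new record. The function name, the @REUSE_KEYS@ set, and the matching rule are all assumptions, not settled design:

```python
# Hypothetical find-or-create matching rule: a job is reusable only if
# it is pure and agrees on every reuse-relevant field.
REUSE_KEYS = ("resolved_git_commit", "resolved_docker_image",
              "input_object", "command", "environment")

def find_or_create(jobs, spec):
    for job in jobs:
        if job.get("pure") and all(job.get(k) == spec.get(k) for k in REUSE_KEYS):
            return job                       # reuse the existing job
    job = dict(spec, state="Queued")         # otherwise enqueue a new one
    jobs.append(job)
    return job
```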
Q: How does a client submitting job B indicate it shouldn't run unless/until job A succeeds?

h3. arvados.v1.jobs.update

Most attributes cannot be changed after a job starts. Some attributes _can_ be changed:
* name, description, priority
* output, progress, state, finished_at, log (ideally only by the job itself - should this be enforced?)
* modified_*
* Q: (any more?)
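
The mutability rule above could be enforced along these lines. The helper name @check_update@, the state test, and the choice of @ValueError@ are made up for illustration; the attribute lists come straight from the bullets above:

```python
# Sketch of enforcing update rules: once a job has left the queue, only
# the attributes listed above (plus modified_* bookkeeping) may change.
MUTABLE_AFTER_START = {"name", "description", "priority",
                       "output", "progress", "state", "finished_at", "log"}
BOOKKEEPING = {"modified_at", "modified_by_user_uuid", "modified_by_client_uuid"}

def check_update(job, attrs):
    if job["state"] != "Queued":             # job has started (or finished)
        illegal = set(attrs) - MUTABLE_AFTER_START - BOOKKEEPING
        if illegal:
            raise ValueError("immutable after start: %s" % sorted(illegal))
    job.update(attrs)
```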

h3. arvados.v1.jobs.get

Q: Should this omit mutable attributes when retrieved by a pure job? (Ideally, pure jobs should not be able to retrieve data other than their stated immutable / content-addressed inputs, either through Keep or through the API.)

h2. Scheduling and running jobs

Q: If two users submit identical pure jobs and ask to reuse existing jobs, whose token does the job get to use?
* Should pure jobs be run as a pseudo-user that is given read access to the relevant objects for the duration of the job? (This would make it safer to share jobs -- see #5823)
Q: If two users submit identical pure jobs with different priority, which priority is used?
* Choices include "whichever is greater" and "sum".
Q: If two users submit identical pure jobs and one cancels -- or one user submits two identical jobs and cancels one -- does the work stop, or continue? What do the job records look like after this?