Idea #13219
Allow user to specify time limit for submitted jobs (Closed)
Description
Allow a user to specify a maximum amount of run time which can be used by a job before it is cancelled.
Involves:
- Document "time_limit" in container_request/container scheduling_parameters (time in seconds)
- crunch-dispatch-slurm passes "--time" parameter to sbatch
- crunch-run also stops container after exceeding time_limit (so the feature works independently of slurm, eg crunch-dispatch-local)
- Support TimeLimit requirement in arvados-cwl-runner (upcoming feature of CWL v1.1)
Updated by Peter Amstutz over 6 years ago
Here's the relevant sbatch feature:
-t, --time=<time>
Set a limit on the total run time of the job allocation. If the requested time limit exceeds the partition's time limit, the job will be left in a PENDING state (possibly indefinitely). The default time limit is the partition's default time limit. When the time limit is reached, each task in each job step is sent SIGTERM followed by SIGKILL. The interval between signals is specified by the Slurm configuration parameter KillWait. The OverTimeLimit configuration parameter may permit the job to run longer than scheduled. Time resolution is one minute and second values are rounded up to the next minute.
A time limit of zero requests that no time limit be imposed. Acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".
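Since scheduling_parameters stores the limit in seconds while sbatch's --time has one-minute resolution with seconds rounded up, the dispatcher needs a conversion along these lines (an illustrative sketch, not the actual crunch-dispatch-slurm code; the function name is hypothetical):

```go
package main

import "fmt"

// slurmTimeArg converts a time limit given in seconds (as in
// scheduling_parameters) to a value for sbatch --time. Slurm's time
// resolution is one minute and second values are rounded up to the
// next minute, so rounding up here makes the effective limit explicit.
func slurmTimeArg(seconds int64) string {
	minutes := (seconds + 59) / 60 // round up to the next whole minute
	return fmt.Sprintf("%d", minutes)
}

func main() {
	fmt.Println(slurmTimeArg(90)) // 90 s becomes a 2-minute limit
}
```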
Updated by Peter Amstutz over 6 years ago
Also:
--time-min=<time>
Set a minimum time limit on the job allocation. If specified, the job may have its --time limit lowered to a value no lower than --time-min if doing so permits the job to begin execution earlier than otherwise possible. The job's time limit will not be changed after the job is allocated resources. This is performed by a backfill scheduling algorithm to allocate resources otherwise reserved for higher priority jobs. Acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".
Updated by Peter Amstutz over 6 years ago
- Subject changed from Allow user to specify resource limits for submitted jobs to Allow user to specify time limit for submitted jobs
- Description updated (diff)
Updated by Tom Morris over 6 years ago
- Target version changed from To Be Groomed to Arvados Future Sprints
Updated by Tom Morris over 6 years ago
- Target version changed from Arvados Future Sprints to 2018-07-03 Sprint
Updated by Lucas Di Pentima over 6 years ago
- Assigned To set to Lucas Di Pentima
Updated by Tom Clegg over 6 years ago
Might be better to implement this in sdk/go/dispatch / crunch-dispatch-slurm rather than asking slurm to do it. It sounds like slurm will send SIGKILL to crunch-run after KillWait, which can prevent logs and (partial) outputs from being written to Keep.
Also suggest a more specific name than "time_limit", which (as a scheduling parameter) sounds like it could also mean queue time or queue+run time. "max_run_time"?
Updated by Peter Amstutz over 6 years ago
Tom Clegg wrote:
Might be better to implement this in sdk/go/dispatch / crunch-dispatch-slurm rather than asking slurm to do it. It sounds like slurm will send SIGKILL to crunch-run after KillWait, which can prevent logs and (partial) outputs from being written to Keep.
I think we want both. I agree we should prefer a graceful shutdown. However, slurm uses the time limit for its backfill scheduler. I don't think we really care on cloud, but it is relevant for HPC. Maybe the slurm time limit should have some extra head room.
Also suggest a more specific name than "time_limit", which (as a scheduling parameter) sounds like it could also mean queue time or queue+run time. "max_run_time"?
Updated by Lucas Di Pentima over 6 years ago
- Status changed from New to In Progress
Updated by Tom Clegg over 6 years ago
Peter Amstutz wrote:
I think we want both. I agree we should prefer a graceful shutdown. However, slurm uses the time limit for its backfill scheduler. I don't think we really care on cloud, but it is relevant for HPC. Maybe the slurm time limit should have some extra head room.
I see -- if we give slurm a time limit, it can make better scheduling decisions. But what would be the appropriate amount of time to allow for writing logs/outputs? It might be better to offer the "abandon the job completely, even if that means abandoning logs/outputs of a successful run" behavior with a separate knob. It seems like that's the only kind of limit slurm could use for scheduling purposes.
The objective isn't mentioned explicitly here but I think it's to reduce the cost of user containers that sometimes deadlock, or have pathologically low resource usage (e.g., arv-mount cache thrashing).
IMO we should implement this in a way that isn't slurm-specific at all, and avoid introducing other side effects like killing crunch-run while it's wrapping up.
Updated by Lucas Di Pentima over 6 years ago
(WIP) updates at 57fd9fa6b - branch 13219-jobs-time-limit
Tom: Do you think the updates to dispatch.go in this commit are the correct approach? I want to check with you before starting to write tests.
Updated by Tom Clegg over 6 years ago
Advantages of enforcing the limit in crunch-run itself:
- easy to log the "max runtime exceeded" message live and to the permanent log
- crunch-run already knows the container start time, so we don't have to start tracking that separately
- WaitFinish() can just create a time.NewTimer() and add a case to the select block, similar to the "arv-mount exited" case.
Other comment: scheduling_parameters should continue to be empty by default, in both container and container_request.
Updated by Lucas Di Pentima over 6 years ago
Updates at 1f9519fba
Test run: https://ci.curoverse.com/job/developer-run-tests/780/
- Removed unnecessary default scheduling_parameter on the API server
- Moved timeout code from the dispatch library to crunch-run
- Added CWL TimeLimit support on arvados-cwl-runner
Updated by Tom Clegg over 6 years ago
Can the cwl test assertion be more focused? Diffing the whole submission seems like a bad habit that makes it hard to diagnose failing tests. Looking at test_initial_work_dir() I'm guessing we can do something like this:
_, kwargs = runner.api.container_requests().create.call_args
self.assertEqual(42, kwargs['body']['scheduling_parameters'].get('max_run_time'))
The rest LGTM, thanks
Updated by Lucas Di Pentima over 6 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|816764a283c2cbf2d41b4582113065922b99bd52.
Updated by Lucas Di Pentima over 6 years ago
Update at 380e4da5a - branch 13219-arv-cwl-schema-fix
Test run: https://ci.curoverse.com/job/developer-run-tests/783/
Arvados CWL schema updated -- now arvados-cwl-runner accepts the TimeLimit parameter. Example:
class: CommandLineTool
cwlVersion: v1.0
$namespaces:
  cwltool: "http://commonwl.org/cwltool#"
inputs: []
outputs: []
requirements:
  cwltool:TimeLimit:
    timelimit: 5
baseCommand: [sleep, "30"]
Updated by Peter Amstutz over 6 years ago
Lucas Di Pentima wrote:
Update at 380e4da5a - branch 13219-arv-cwl-schema-fix
Test run: https://ci.curoverse.com/job/developer-run-tests/783/
Arvados CWL schema updated -- now arvados-cwl-runner accepts the TimeLimit parameter. Example: [...]
This needs to be documented in http://doc.arvados.org/user/cwl/cwl-extensions.html
Updated by Peter Amstutz over 6 years ago
Tom Clegg wrote:
Peter Amstutz wrote:
I think we want both. I agree we should prefer a graceful shutdown. However, slurm uses the time limit for its backfill scheduler. I don't think we really care on cloud, but it is relevant for HPC. Maybe the slurm time limit should have some extra head room.
I see -- if we give slurm a time limit, it can make better scheduling decisions. But what would be the appropriate amount of time to allow for writing logs/outputs? It might be better to offer the "abandon the job completely, even if that means abandoning logs/outputs of a successful run" behavior with a separate knob. It seems like that's the only kind of limit slurm could use for scheduling purposes.
The objective isn't mentioned explicitly here but I think it's to reduce the cost of user containers that sometimes deadlock, or have pathologically low resource usage (e.g., arv-mount cache thrashing).
IMO we should implement this in a way that isn't slurm-specific at all, and avoid introducing other side effects like killing crunch-run while it's wrapping up.
I agree that doing it in crunch-run so that it only applies to the runtime of the actual job (and not the setup/teardown overhead) is the right way to do it, but I still think crunch-dispatch-slurm should also be setting a SLURM time limit for scheduling, with some configurable amount of head room. Perhaps we should reach out to our HPC users and see what they think?
Updated by Lucas Di Pentima over 6 years ago
Update at 9b6abcd04
Adds documentation to CWL extension user's guide page.
Updated by Peter Amstutz over 6 years ago
- Related to Idea #13760: Provide more information to SLURM to make scheduling decisions on HPC added