Story #13219

Allow user to specify time limit for submitted jobs

Added by Tom Morris over 3 years ago. Updated about 3 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
07/02/2018
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
3.0
Release:
Release relationship:
Auto

Description

Allow a user to specify a maximum amount of run time which can be used by a job before it is cancelled.

Involves:

  • Document "time_limit" in container_request/container scheduling_parameters (time in seconds)
  • crunch-dispatch-slurm passes "--time" parameter to sbatch
  • crunch-run also stops container after exceeding time_limit (so the feature works independently of slurm, eg crunch-dispatch-local)
  • Support TimeLimit requirement in arvados-cwl-runner (upcoming feature of CWL v1.1)
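As a sketch, the relevant portion of a container request carrying this limit might look like the following. (The description's working name is "time_limit", but the test assertion in note #15 below shows the field landed as "max_run_time" in scheduling_parameters; the 86400-second value is just an example.)

```json
{
  "scheduling_parameters": {
    "max_run_time": 86400
  }
}
```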

Subtasks

Task #13652: Review 13219-jobs-time-limit (Resolved, Lucas Di Pentima)


Related issues

Related to Arvados - Story #13760: Provide more information to SLURM to make scheduling decisions on HPC (New)

Associated revisions

Revision 816764a2
Added by Lucas Di Pentima over 3 years ago

Merge branch '13219-jobs-time-limit'
Closes #13219

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <>

Revision 66c644ef
Added by Lucas Di Pentima over 3 years ago

Merge branch '13219-arv-cwl-schema-fix'
Refs #13219

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <>

History

#1 Updated by Peter Amstutz over 3 years ago

Here's the relevant sbatch feature:

-t, --time=<time>
Set a limit on the total run time of the job allocation. If the requested time limit exceeds the partition's time limit, the job will be left in a PENDING state (possibly indefinitely). The default time limit is the partition's default time limit. When the time limit is reached, each task in each job step is sent SIGTERM followed by SIGKILL. The interval between signals is specified by the Slurm configuration parameter KillWait. The OverTimeLimit configuration parameter may permit the job to run longer than scheduled. Time resolution is one minute and second values are rounded up to the next minute.

A time limit of zero requests that no time limit be imposed. Acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".
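Because sbatch's time resolution is one minute and second values round up, a dispatcher translating a seconds-based limit into a "--time" argument needs ceiling division. A minimal sketch (the function name is illustrative, not the actual crunch-dispatch-slurm code):

```go
package main

import "fmt"

// sbatchTimeArg converts a limit given in seconds into the minutes
// value sbatch expects for --time, rounding up because slurm's time
// resolution is one minute. A zero or negative limit yields "",
// meaning no --time argument at all (passing 0 would tell slurm
// "no limit", which is a different thing than "unset").
func sbatchTimeArg(seconds int64) string {
	if seconds <= 0 {
		return ""
	}
	minutes := (seconds + 59) / 60 // ceiling division
	return fmt.Sprintf("--time=%d", minutes)
}

func main() {
	fmt.Println(sbatchTimeArg(90))   // 90s rounds up to 2 minutes
	fmt.Println(sbatchTimeArg(3600)) // exactly 60 minutes
}
```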

#2 Updated by Peter Amstutz over 3 years ago

Also:

--time-min=<time>
Set a minimum time limit on the job allocation. If specified, the job may have its --time limit lowered to a value no lower than --time-min if doing so permits the job to begin execution earlier than otherwise possible. The job's time limit will not be changed after the job is allocated resources. This is performed by a backfill scheduling algorithm to allocate resources otherwise reserved for higher priority jobs. Acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".

#3 Updated by Peter Amstutz over 3 years ago

  • Subject changed from Allow user to specify resource limits for submitted jobs to Allow user to specify time limit for submitted jobs
  • Description updated (diff)

#4 Updated by Tom Morris over 3 years ago

  • Story points set to 3.0

#5 Updated by Tom Morris over 3 years ago

  • Target version changed from To Be Groomed to Arvados Future Sprints

#6 Updated by Tom Morris over 3 years ago

  • Target version changed from Arvados Future Sprints to 2018-07-03 Sprint

#7 Updated by Lucas Di Pentima over 3 years ago

  • Assigned To set to Lucas Di Pentima

#8 Updated by Tom Clegg over 3 years ago

Might be better to implement this in sdk/go/dispatch / crunch-dispatch-slurm rather than asking slurm to do it. It sounds like slurm will send SIGKILL to crunch-run after KillWait, which can prevent logs and (partial) outputs from being written to Keep.

Also suggest a more specific name than "time_limit", which (as a scheduling parameter) sounds like it could also mean queue time or queue+run time. "max_run_time"?

#9 Updated by Peter Amstutz over 3 years ago

Tom Clegg wrote:

Might be better to implement this in sdk/go/dispatch / crunch-dispatch-slurm rather than asking slurm to do it. It sounds like slurm will send SIGKILL to crunch-run after KillWait, which can prevent logs and (partial) outputs from being written to Keep.

I think we want both. I agree we should prefer a graceful shutdown. However, slurm uses the time limit for its backfill scheduler. I don't think we really care on cloud, but it is relevant for HPC. Maybe the slurm time limit should have some extra headroom.

Also suggest a more specific name than "time_limit", which (as a scheduling parameter) sounds like it could also mean queue time or queue+run time. "max_run_time"?

#10 Updated by Lucas Di Pentima over 3 years ago

  • Status changed from New to In Progress

#11 Updated by Tom Clegg over 3 years ago

Peter Amstutz wrote:

I think we want both. I agree we should prefer a graceful shutdown. However, slurm uses the time limit for its backfill scheduler. I don't think we really care on cloud, but it is relevant for HPC. Maybe the slurm time limit should have some extra headroom.

I see -- if we give slurm a time limit, it can make better scheduling decisions. But what would be the appropriate amount of time to allow for writing logs/outputs? It might be better to offer the "abandon the job completely, even if that means abandoning logs/outputs of a successful run" behavior with a separate knob. It seems like that's the only kind of limit slurm could use for scheduling purposes.

The objective isn't mentioned explicitly here but I think it's to reduce the cost of user containers that sometimes deadlock, or have pathologically low resource usage (e.g., arv-mount cache thrashing).

IMO we should implement this in a way that isn't slurm-specific at all, and avoid introducing other side effects like killing crunch-run while it's wrapping up.

#12 Updated by Lucas Di Pentima over 3 years ago

(WIP) updates at 57fd9fa6b - branch 13219-jobs-time-limit

Tom: Do you think the updates to dispatch.go in this commit are the correct approach? I want to check with you before starting to write tests.

#13 Updated by Tom Clegg over 3 years ago

Now that I see the logging awkwardness and the missing "start time" information, I'm thinking this would be simpler to implement in crunch-run:
  • easy to log the "max runtime exceeded" message live + to the permanent log
  • crunch-run already knows the container start time, so we don't have to start tracking that separately
  • WaitFinish() can just make a time.NewTimer() and add a section to the select block, similar to the "arv-mount exited" case.

Other comment: scheduling_parameters should continue to be empty by default, in both container and container_request.

#14 Updated by Lucas Di Pentima over 3 years ago

Updates at 1f9519fba
Test run: https://ci.curoverse.com/job/developer-run-tests/780/

  • Removed unnecessary default scheduling_parameter on the API server
  • Moved timeout code from the dispatch library to crunch-run
  • Added CWL TimeLimit support on arvados-cwl-runner

#15 Updated by Tom Clegg over 3 years ago

Can the cwl test assertion be more focused? Diffing the whole submission seems like a bad habit that makes it hard to diagnose failing tests. Looking at test_initial_work_dir() I'm guessing we can do something like this:

_, kwargs = runner.api.container_requests().create.call_args
self.assertEqual(42, kwargs['body']['scheduling_parameters'].get('max_run_time'))

The rest LGTM, thanks

#16 Updated by Lucas Di Pentima over 3 years ago

  • Status changed from In Progress to Resolved

#17 Updated by Lucas Di Pentima over 3 years ago

Update at 380e4da5a - branch 13219-arv-cwl-schema-fix
Test run: https://ci.curoverse.com/job/developer-run-tests/783/

Arvados CWL schema updated -- now arvados-cwl-runner accepts the TimeLimit parameter. Example:

class: CommandLineTool
cwlVersion: v1.0
$namespaces:
  cwltool: "http://commonwl.org/cwltool#" 
inputs: []
outputs: []
requirements:
  cwltool:TimeLimit:
    timelimit: 5
baseCommand: [sleep, "30"]

#18 Updated by Peter Amstutz over 3 years ago

Lucas Di Pentima wrote:

Update at 380e4da5a - branch 13219-arv-cwl-schema-fix
Test run: https://ci.curoverse.com/job/developer-run-tests/783/

Arvados CWL schema updated -- now arvados-cwl-runner accepts the TimeLimit parameter. Example:

[...]

This needs to be documented in http://doc.arvados.org/user/cwl/cwl-extensions.html

#19 Updated by Peter Amstutz over 3 years ago

Tom Clegg wrote:

Peter Amstutz wrote:

I think we want both. I agree we should prefer a graceful shutdown. However, slurm uses the time limit for its backfill scheduler. I don't think we really care on cloud, but it is relevant for HPC. Maybe the slurm time limit should have some extra headroom.

I see -- if we give slurm a time limit, it can make better scheduling decisions. But what would be the appropriate amount of time to allow for writing logs/outputs? It might be better to offer the "abandon the job completely, even if that means abandoning logs/outputs of a successful run" behavior with a separate knob. It seems like that's the only kind of limit slurm could use for scheduling purposes.

The objective isn't mentioned explicitly here but I think it's to reduce the cost of user containers that sometimes deadlock, or have pathologically low resource usage (e.g., arv-mount cache thrashing).

IMO we should implement this in a way that isn't slurm-specific at all, and avoid introducing other side effects like killing crunch-run while it's wrapping up.

I agree that doing it in crunch-run so that it only applies to the runtime of the actual job (and not the setup/teardown overhead) is the right way to do it, but I still think crunch-dispatch-slurm should also be setting a SLURM time limit for scheduling, with some configurable amount of head room. Perhaps we should reach out to our HPC users and see what they think?

#20 Updated by Lucas Di Pentima over 3 years ago

Update at 9b6abcd04

Adds documentation to the CWL extensions user guide page.

#21 Updated by Peter Amstutz over 3 years ago

  • Related to Story #13760: Provide more information to SLURM to make scheduling decisions on HPC added

#22 Updated by Tom Morris about 3 years ago

  • Release set to 13
