Bug #14117

c-d-s reniceAll sets nice on jobs that are not pending

Added by Joshua Randall over 2 years ago. Updated over 2 years ago.

Status:
New
Priority:
Normal
Assigned To:
-
Category:
Crunch
Target version:
-
Start date:
Due date:
% Done:

0%

Estimated time:
Story points:
-

Description

The reniceAll function makes no distinction between the state of jobs, and SqueueChecker runs `squeue` with the `--all` option, which returns jobs in all states.

As a result, it appears that reniceAll ends up setting priority on jobs whose priority has no impact on scheduling, including jobs that are already running and those which have recently completed, been cancelled, or failed.

I would suggest adding `state` to the `slurmJob` struct and then adding a new conditional block before the `if j.wantPriority == 0` one that is something like:

if j.state != "PENDING" {
    continue
}

History

#1 Updated by Joshua Randall over 2 years ago

According to the squeue docs (https://slurm.schedmd.com/squeue.html), the complete set of possible job states is:

BF BOOT_FAIL
Job terminated due to launch failure, typically due to a hardware failure (e.g. unable to boot the node or block and the job can not be requeued).
CA CANCELLED
Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated.
CD COMPLETED
Job has terminated all processes on all nodes with an exit code of zero.
CF CONFIGURING
Job has been allocated resources, but are waiting for them to become ready for use (e.g. booting).
CG COMPLETING
Job is in the process of completing. Some processes on some nodes may still be active.
DL DEADLINE
Job terminated on deadline.
F FAILED
Job terminated with non-zero exit code or other failure condition.
NF NODE_FAIL
Job terminated due to failure of one or more allocated nodes.
OOM OUT_OF_MEMORY
Job experienced out of memory error.
PD PENDING
Job is awaiting resource allocation.
PR PREEMPTED
Job terminated due to preemption.
R RUNNING
Job currently has an allocation.
RD RESV_DEL_HOLD
Job is held.
RF REQUEUE_FED
Job is being requeued by a federation.
RH REQUEUE_HOLD
Held job is being requeued.
RQ REQUEUED
Completing job is being requeued.
RS RESIZING
Job is about to change size.
RV REVOKED
Sibling was removed from cluster due to other cluster starting the job.
SE SPECIAL_EXIT
The job was requeued in a special state. This state can be set by users, typically in EpilogSlurmctld, if the job has terminated with a particular exit value.
ST STOPPED
Job has an allocation, but execution has been stopped with SIGSTOP signal. CPUS have been retained by this job.
S SUSPENDED
Job has an allocation, but execution has been suspended and CPUs have been released for other jobs.
TO TIMEOUT
Job terminated upon reaching its time limit.

Of those, I think the ones relevant to prioritisation are:
NODE_FAIL - because in some SLURM configurations jobs that experience node failure can be automatically requeued
PENDING - this is the normal state that requires prioritisation
PREEMPTED - because when job preemption is configured, preempted jobs are automatically requeued
RESV_DEL_HOLD - not really sure when this would happen, but it sounds like it could still be queued if the priority changes
REQUEUE_FED - probably not relevant to the Arvados use case, but if it is being requeued, then priority still matters
REQUEUE_HOLD - again, if it is requeued then priority probably matters
REQUEUED - same
RESIZING - I think this probably only happens to running jobs but the priority may influence whether the resize is successful (I'm not sure)
SPECIAL_EXIT - this says that it means the job has been requeued so I guess priority may matter
SUSPENDED - because when using preemption SLURM can be configured to suspend jobs rather than requeuing them, and a priority change could be relevant to a resume decision

And the ones that I would argue should not be subject to ongoing renice adjustments are:
BOOT_FAIL - failed and can not be requeued in this state
CANCELLED - jobs that are done should not be prioritised
COMPLETED - jobs that are done should not be prioritised
CONFIGURING - as the job has already been allocated resources, the prioritisation decision has already been made
COMPLETING - no need to prioritise jobs that are already running
DEADLINE - jobs that are done should not be prioritised
FAILED - jobs that are done should not be prioritised
OUT_OF_MEMORY - jobs that are done should not be prioritised
RUNNING - no need to prioritise jobs that are already running
REVOKED - this seems unlikely to happen in an Arvados configuration, but it sounds like it is a final state for this cluster
STOPPED - no need to prioritise jobs that are already running
TIMEOUT - jobs that are done should not be prioritised

#2 Updated by Joshua Randall over 2 years ago

so, perhaps:

        for _, j := range sqc.queue {
+               switch j.state {
+               case
+                       "BOOT_FAIL",
+                       "CANCELLED",
+                       "COMPLETED",
+                       "CONFIGURING",
+                       "COMPLETING",
+                       "DEADLINE",
+                       "FAILED",
+                       "OUT_OF_MEMORY",
+                       "RUNNING",
+                       "REVOKED",
+                       "STOPPED",
+                       "TIMEOUT":
+                               continue
+               }
                if j.wantPriority == 0 {
