Currently Arvados uses SLURM to dispatch containers to worker hosts. (Related: Container dispatch)
Arvados supports a range of SLURM versions and configurations, but there are some sensitivities.
Limited "nice" values (SLURM 15)
Background: crunch-dispatch-slurm needs to adjust SLURM job priorities so that job priority order matches container priority order. It uses SLURM's "nice" feature to do this. This is preferable to adjusting priority directly because it doesn't require crunch-dispatch-slurm to have SLURM administrator privileges.
Older versions of SLURM (including version 15, shipped with Ubuntu 16.04) do not accept nice values ≥10000. When many SLURM jobs are being submitted and containers run for a long time, this limitation can prevent crunch-dispatch-slurm from achieving the desired priority order. In most cases containers will continue to run. However, in some cases this can contribute to a dispatch deadlock in which all worker nodes are consumed by containers that are waiting for their child containers to be dispatched.
When SLURM rejects a nice value, messages like the following appear in the crunch-dispatch-slurm logs:
2018/04/25 20:12:39 "/usr/bin/scontrol" ["scontrol" "update" "JobName=zzzzz-dz642-abcdefghijklmno" "Nice=12052"]: "scontrol: error: Invalid nice value, must be between -10000 and 10000"
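To check how often this happens, the logs can be scanned for rejected updates. The following is a minimal sketch in Python, not part of Arvados. The regular expression is derived only from the single example line above, so other message formats are not covered, and the function name find_rejected_nice is hypothetical.

```python
import re

# Pattern derived from the one example log line shown above (an assumption:
# other SLURM or crunch-dispatch-slurm versions may word this differently).
REJECTED_NICE = re.compile(
    r'"JobName=(?P<job>[^"]+)" "Nice=(?P<nice>-?\d+)"\]: '
    r'"scontrol: error: Invalid nice value'
)


def find_rejected_nice(lines):
    """Return a list of (job_name, nice) pairs, one per rejected update."""
    hits = []
    for line in lines:
        match = REJECTED_NICE.search(line)
        if match:
            hits.append((match.group("job"), int(match.group("nice"))))
    return hits
```

Passing an open log file (or any iterable of lines) to find_rejected_nice yields the affected job names and the nice values SLURM refused, which gives a rough idea of how far over the limit the dispatcher is trying to go.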
In some cases, this can be avoided by reducing PrioritySpread in the crunch-dispatch-slurm configuration file. See https://doc.arvados.org/install/crunch2-slurm/install-dispatch.html.
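To see why a smaller PrioritySpread can help, consider a deliberately simplified model in which the job at position rank in the priority order receives a nice value of rank * spread. This model is an assumption made for illustration; the actual calculation in crunch-dispatch-slurm may differ. The function names and the max_nice default of 9999 (chosen because values ≥10000 are rejected) are likewise hypothetical.

```python
def nice_values(num_jobs, spread):
    """Nice values under the simplified model: rank * spread, rank from 0."""
    return [rank * spread for rank in range(num_jobs)]


def largest_safe_spread(num_jobs, max_nice=9999):
    """Largest spread that keeps every nice value at or below max_nice.

    Returns None when there are fewer than two jobs, because under this
    model the only nice value is 0 and any spread is acceptable.
    """
    if num_jobs < 2:
        return None
    return max_nice // (num_jobs - 1)
```

Under this model, the usable spread shrinks as the queue grows, which is consistent with the limitation showing up mainly when many jobs are queued.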