Bug #13868

Updated by Peter Amstutz over 5 years ago

<pre> 
 Jul 19 14:50:50 manage.e51c5.arvadosapi.com env[110136]: 2018-07-19 14:50:50 ComputeNodeUpdateActor.5af93592d98f[110137] ERROR: SLURM update ['scontrol', 'update', u'NodeName=compute138', 'Weight=9999999000', 'Features=instancetype=invalid'] failed 
 Jul 19 14:50:50 manage.e51c5.arvadosapi.com env[110136]: Traceback (most recent call last): 
 Jul 19 14:50:50 manage.e51c5.arvadosapi.com env[110136]:     File "/usr/lib/python2.7/dist-packages/arvnodeman/computenode/dispatch/slurm.py", line 26, in _update_slurm_node 
 Jul 19 14:50:50 manage.e51c5.arvadosapi.com env[110136]:       subprocess.check_output(cmd) 
 Jul 19 14:50:50 manage.e51c5.arvadosapi.com env[110136]:     File "/usr/lib/python2.7/dist-packages/subprocess32.py", line 343, in check_output 
 Jul 19 14:50:50 manage.e51c5.arvadosapi.com env[110136]:       raise CalledProcessError(retcode, process.args, output=output) 
 Jul 19 14:50:50 manage.e51c5.arvadosapi.com env[110136]: CalledProcessError: Command '['scontrol', 'update', u'NodeName=compute138', 'Weight=9999999000', 'Features=instancetype=invalid']' returned non-zero exit status 1. 
 Jul 19 14:50:50 manage.e51c5.arvadosapi.com env[110136]: scontrol: error: Weight value (9999999000) is greater than 4294967280 
 </pre> 

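 Invalid nodes have a weight of 9999999, and the @scontrol@ command in the log above passes @Weight=9999999000@, which exceeds the limit of 4294967280 that SLURM reports.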

 Two problems: 

 We should make the invalid weight smaller, so that the value passed to @scontrol@ stays within the 4294967280 limit.

 If there are nodes that don't have the "arvados_node_size" tag, it is set to "None" instead of using the regular @size.id@ like before. 
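
 A minimal sketch of how both problems might be guarded against. This is not the actual arvnodeman code: the function names, the @tags@ dictionary, and the @size_id@ argument are hypothetical, and the limit is taken only from the @scontrol@ error message above.

```python
# Upper bound taken from the scontrol error message in the log above;
# treat it as an assumption rather than a documented SLURM constant.
SLURM_MAX_WEIGHT = 4294967280


def slurm_update_cmd(nodename, weight, features):
    """Build an scontrol update command with the weight clamped to the limit.

    Hypothetical helper: it only constructs the argument list and does not
    run scontrol.
    """
    safe_weight = max(0, min(int(weight), SLURM_MAX_WEIGHT))
    return ['scontrol', 'update',
            'NodeName=' + nodename,
            'Weight=%i' % safe_weight,
            'Features=' + features]


def instance_type_for(tags, size_id):
    """Return the arvados_node_size tag, falling back to size_id.

    Hypothetical helper: `tags` is assumed to be a plain dict of node tags.
    A missing or None tag yields size_id instead of the string "None".
    """
    return tags.get('arvados_node_size') or size_id
```

 With this sketch, @slurm_update_cmd('compute138', 9999999000, 'instancetype=invalid')@ produces @Weight=4294967280@ rather than a value SLURM rejects, and @instance_type_for({}, size_id)@ returns @size_id@ when the tag is absent. Clamping only hides an oversized value, so choosing a smaller invalid weight in the first place would still be the cleaner fix.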
