Bug #4121

[Crunch] cancelled job did not get cancelled at the slurm level

Added by Ward Vandewege over 7 years ago. Updated over 5 years ago.

Status: Closed
Priority: Normal
Assigned To: -
Category: -
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:
Story points: 1.0

History

#1 Updated by Ward Vandewege over 7 years ago

Job 9tee4-8i9sb-z5mxjnqgda5di0z was cancelled, but slurm never got the message. The allocation is still in the running state (ST = R) after more than 16 hours:

 squeue_long 
  JOBID PARTITION NAME     USER ST       TIME  NODES NODELIST(REASON)
     87   compute 9tee4-8i9sb-z5mxjnqgda5di0z   crunch  R   16:22:31      1 compute1
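
A possible way to spot and clean up such leftover allocations by hand is sketched below. It relies on the fact, visible in the dispatch log, that salloc is started with --job-name set to the job UUID, so the slurm job name can be matched against a list of cancelled job UUIDs. This is only a sketch and not how crunch-dispatch itself handles cancellation; the function names (parse_squeue, stale_allocations, scancel_commands, read_squeue) are hypothetical, and the list of cancelled UUIDs is assumed to come from elsewhere. It only builds the scancel commands and does not run them.

```python
import subprocess


def read_squeue():
    """Return live squeue output as 'JOBID NAME STATE' lines (needs slurm installed)."""
    result = subprocess.run(
        ["squeue", "--noheader", "--format=%i %j %t"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_squeue(text):
    """Parse 'JOBID NAME STATE' lines into (job_id, name, state) tuples."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        jobs.append(tuple(fields))
    return jobs


def stale_allocations(squeue_text, cancelled_uuids):
    """Return slurm job IDs whose job name is a UUID known to be cancelled."""
    cancelled = set(cancelled_uuids)
    return [
        job_id
        for job_id, name, _state in parse_squeue(squeue_text)
        if name in cancelled
    ]


def scancel_commands(job_ids):
    """Build (but do not run) one scancel command per slurm job ID."""
    return [["scancel", job_id] for job_id in job_ids]


# Demo with the allocation from this report, in the three-column format above.
sample = "87 9tee4-8i9sb-z5mxjnqgda5di0z R\n"
for cmd in scancel_commands(
    stale_allocations(sample, ["9tee4-8i9sb-z5mxjnqgda5di0z"])
):
    print(" ".join(cmd))
```

On a real cluster, read_squeue() would replace the sample string, and each printed command would be reviewed before being run.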

Crunch-dispatch logs don't say a lot:

@4000000054331716018b706c.s:2014-10-06_21:58:52.06739 git --git-dir=/var/lib/arvados/internal.git tag 9tee4-8i9sb-z5mxjnqgda5di0z 3985ead6428cf6d847e107a4f449609a47b1f25b
@4000000054331716018b706c.s:2014-10-06_21:58:52.14675 dispatch: sudo -E -u crunch PATH=/var/www/9tee4.arvadosapi.com/releases/20141006150429/vendor/bundle/ruby/2.1.0/bin:/usr/local/rvm/gems/ruby-2.1.2/bin:/usr/local/rvm/gems/ruby-2.1.2@global/bin:/usr/local/rvm/rubies/ruby-2.1.2/bin:/usr/local/rvm/bin:/usr/local/bin:/usr/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/X11R6/bin:/usr/local/arvados/src/services/crunch PERLLIB=/usr/local/arvados/src/sdk/perl/lib PYTHONPATH= RUBYLIB=/usr/local/rvm/gems/ruby-2.1.2@global/gems/bundler-1.6.2/lib GEM_PATH= salloc --chdir=/ --immediate --exclusive --no-kill --job-name=9tee4-8i9sb-z5mxjnqgda5di0z --nodelist=compute1 /usr/local/arvados/src/services/crunch/crunch-job --job-api-token 1rt2034l6xkz1k9mrb5kt13hjdyehklif12f5um8mr5x9e6oyc --job 9tee4-8i9sb-z5mxjnqgda5di0z --git-dir /var/lib/arvados/internal.git
@4000000054331716018b706c.s:2014-10-06_21:58:52.22979 dispatch: job 9tee4-8i9sb-z5mxjnqgda5di0z
@4000000054331716018b706c.s:2014-10-06_21:58:52.44741 dispatch: update compute1 state to {:state=>"alloc", :job=>"9tee4-8i9sb-z5mxjnqgda5di0z"}
@4000000054331716018b706c.s:2014-10-06_21:58:52.52334 9tee4-8i9sb-z5mxjnqgda5di0z ! salloc: Granted job allocation 87
@4000000054331716018b706c.s:2014-10-06_21:58:53.00822 9tee4-8i9sb-z5mxjnqgda5di0z 23395  check slurm allocation
@4000000054331716018b706c.s:2014-10-06_21:58:53.00838 9tee4-8i9sb-z5mxjnqgda5di0z 23395  node compute1 - 20 slots
@4000000054331716018b706c.s:2014-10-06_21:58:53.24730 9tee4-8i9sb-z5mxjnqgda5di0z 23395  start
@4000000054331716018b706c.s:2014-10-06_21:58:53.48397 9tee4-8i9sb-z5mxjnqgda5di0z 23395  Install revision 3985ead6428cf6d847e107a4f449609a47b1f25b
@4000000054331716018b706c.s:2014-10-06_21:58:53.48406 9tee4-8i9sb-z5mxjnqgda5di0z ! /bin/fusermount: entry for /tmp/crunch-job/work/0.12178.keep not found in /etc/mtab
@4000000054331716018b706c.s:2014-10-06_21:58:53.48419 9tee4-8i9sb-z5mxjnqgda5di0z ! /bin/fusermount: entry for /tmp/crunch-job/work/11.20175.keep not found in /etc/mtab
@4000000054331716018b706c.s:2014-10-06_21:58:53.48425 9tee4-8i9sb-z5mxjnqgda5di0z ! /bin/fusermount: entry for /tmp/crunch-job/work/12.20190.keep not found in /etc/mtab
@4000000054331716018b706c.s:2014-10-06_21:58:53.72786 9tee4-8i9sb-z5mxjnqgda5di0z ! /bin/fusermount: entry for /tmp/crunch-job/work/14.20214.keep not found in /etc/mtab
@4000000054331716018b706c.s:2014-10-06_21:58:53.75874 9tee4-8i9sb-z5mxjnqgda5di0z ! /bin/fusermount: entry for /tmp/crunch-job/work/18.20264.keep not found in /etc/mtab
@4000000054331716018b706c.s:2014-10-06_21:58:53.98341 9tee4-8i9sb-z5mxjnqgda5di0z ! /bin/fusermount: entry for /tmp/crunch-job/work/7.20114.keep not found in /etc/mtab
@4000000054331716018b706c.s:2014-10-06_21:58:53.98349 9tee4-8i9sb-z5mxjnqgda5di0z ! /bin/fusermount: entry for /tmp/crunch-job/work/8.20128.keep not found in /etc/mtab

#2 Updated by Ward Vandewege over 7 years ago

  • Target version changed from Bug Triage to Arvados Future Sprints

#3 Updated by Ward Vandewege over 7 years ago

  • Story points set to 1.0

#4 Updated by Tom Morris over 5 years ago

  • Status changed from New to Closed

#5 Updated by Tom Morris over 5 years ago

  • Target version deleted (Arvados Future Sprints)
