Bug #10808

Updated by Ward Vandewege almost 8 years ago

Job c97qk-8i9sb-bj9c3ojdng85osz cannot be cancelled with the Cancel button in Workbench.

 There are several pipeline instances waiting on it: 

 <pre> 
 2017-01-04_18:27:11.93074 2017-01-04 18:27:11 +0000 -- pipeline_instance c97qk-d1hrv-n6pik83zizjk5hn 
 2017-01-04_18:27:11.93074 cwl-runner c97qk-8i9sb-bj9c3ojdng85osz {:running=>1, :done=>0, :failed=>0, :todo=>0} 
 2017-01-04_18:27:13.03719  
 2017-01-04_18:27:13.03721 2017-01-04 18:27:12 +0000 -- pipeline_instance c97qk-d1hrv-0thxn81rmpaedyo 
 2017-01-04_18:27:13.03721 cwl-runner c97qk-8i9sb-bj9c3ojdng85osz {:running=>1, :done=>0, :failed=>0, :todo=>0} 
 2017-01-04_18:27:14.53057  
 2017-01-04_18:27:14.53060 2017-01-04 18:27:14 +0000 -- pipeline_instance c97qk-d1hrv-frf2e4vls4gq22v 
 2017-01-04_18:27:14.53062 cwl-runner c97qk-8i9sb-bj9c3ojdng85osz {:running=>1, :done=>0, :failed=>0, :todo=>0} 
 2017-01-04_18:27:15.74509  
 2017-01-04_18:27:15.74511 2017-01-04 18:27:15 +0000 -- pipeline_instance c97qk-d1hrv-5dzt55sa9wlq495 
 2017-01-04_18:27:15.74512 cwl-runner c97qk-8i9sb-bj9c3ojdng85osz {:running=>1, :done=>0, :failed=>0, :todo=>0} 
 2017-01-04_18:27:16.69833  
 2017-01-04_18:27:16.69834 2017-01-04 18:27:16 +0000 -- pipeline_instance c97qk-d1hrv-1uwxdzktqgl8hr6 
 2017-01-04_18:27:16.69835 cwl-runner c97qk-8i9sb-bj9c3ojdng85osz {:running=>1, :done=>0, :failed=>0, :todo=>0} 
 2017-01-04_18:27:20.34010  
 </pre> 
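 To confirm which pipeline instances are blocked on a given job, an excerpt like the one above can be grouped by job UUID. This is only a sketch: @waiting_pipelines@ and the regular expressions are hypothetical helpers written against the line format shown here (a leading log timestamp, a @-- pipeline_instance <uuid>@ header, then a component line with a job UUID and a counts hash), and the @8i9sb@ infix is assumed to mark job UUIDs because both job UUIDs in this report carry it.

```python
import re
from collections import defaultdict

# Leading log timestamp, e.g. "2017-01-04_18:27:11.93074 " (format assumed from the excerpt).
PREFIX = re.compile(r"^\s*\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2}\.\d+\s*")
# Header line naming the pipeline instance being reported.
PIPELINE = re.compile(r"-- pipeline_instance (\S+)")
# Component line: "<component> <job uuid> {<counts>}".
COMPONENT = re.compile(r"^(\S+) (\S+-8i9sb-\S+) \{(.*)\}")
# A non-zero running count inside the counts hash.
RUNNING = re.compile(r":running=>[1-9]")


def waiting_pipelines(log_text):
    """Map each job UUID to the pipeline instances that report it as running."""
    waiting = defaultdict(list)
    current = None
    for line in log_text.splitlines():
        body = PREFIX.sub("", line).strip()
        header = PIPELINE.search(body)
        if header:
            current = header.group(1)
            continue
        component = COMPONENT.match(body)
        if component and current and RUNNING.search(component.group(3)):
            waiting[component.group(2)].append(current)
    return dict(waiting)


if __name__ == "__main__":
    import sys

    # Usage: paste or pipe the log excerpt on stdin.
    for job, pipelines in waiting_pipelines(sys.stdin.read()).items():
        print(job, "<-", ", ".join(pipelines))
```

 Run against the excerpt above, this should list the five pipeline instances under the single job UUID, which makes it easy to see what else is affected once the job is finally marked failed or cancelled.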

 The job is not actually running. SLURM reports every compute node as down or drained, and the queue is empty:

 <pre> 
 c97qk:/etc/service# sinfo 
 PARTITION AVAIL    TIMELIMIT    NODES    STATE NODELIST 
 compute*       up     infinite        7 drain* compute[3-9] 
 compute*       up     infinite      249    down* compute[0-2,10-255] 
 crypto         up     infinite        7 drain* compute[3-9] 
 crypto         up     infinite      249    down* compute[0-2,10-255] 
 c97qk:/etc/service# squeue_long  
   JOBID PARTITION NAME       USER ST         TIME    NODES NODELIST(REASON) 
 </pre> 

 I ran the stale jobs script, which cleaned up two stale jobs but not c97qk-8i9sb-bj9c3ojdng85osz:

 <pre> 
 c97qk:/var/www/arvados-api/current/script# RAILS_ENV=production bundle exec ./fail-jobs.rb --before reboot 
 Called 'load' without the :safe option -- defaulting to safe mode. 
 You can avoid this warning in the future by setting the SafeYAML::OPTIONS[:default_mode] option (to :safe or :unsafe). 
 dispatch: c97qk-8i9sb-ip0ts72w6z9nty9: cleaned up stale job: started before 2016-11-18 02:31:23 +0000 
 dispatch: c97qk-8i9sb-7wzv76a2p7jbi9d: cleaned up stale job: started before 2016-11-18 02:31:23 +0000 
 </pre>
