Idea #8000

Updated by Peter Amstutz over 8 years ago

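Apparently Node Manager only shuts down nodes that are "idle" in SLURM; nodes that are "down" don't get shut down? In the output below, compute0 is "down" in SLURM, yet Node Manager logs "shutdown window open but node busy" for it.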

 <pre> 
 2015-12-11_20:41:05.08909 2015-12-11 20:41:05 arvnodeman.cloud_nodes[11545] DEBUG: CloudNodeListMonitorActor (at 140548410010704) got response with 1 items 
 2015-12-11_20:41:05.09007 2015-12-11 20:41:05 arvnodeman.daemon[11545] INFO: Registering new cloud node /subscriptions/a731f419-596b-4b64-a278-364e76506b06/resourceGroups/c97qk/providers/Microsoft.Compute/virtualMachines/compute-tj4hwdsw3yjiyjt-c97qk 
 2015-12-11_20:41:05.09273 2015-12-11 20:41:05 pykka[11545] DEBUG: Registered ComputeNodeMonitorActor (urn:uuid:83697dab-e718-4fd5-8595-b6563015585c) 
 2015-12-11_20:41:05.09280 2015-12-11 20:41:05 pykka[11545] DEBUG: Starting ComputeNodeMonitorActor (urn:uuid:83697dab-e718-4fd5-8595-b6563015585c) 
 2015-12-11_20:41:05.09391 2015-12-11 20:41:05 arvnodeman.computenode[11545] DEBUG: Node /subscriptions/a731f419-596b-4b64-a278-364e76506b06/resourceGroups/c97qk/providers/Microsoft.Compute/virtualMachines/compute-tj4hwdsw3yjiyjt-c97qk suggesting shutdown. 
 2015-12-11_20:41:05.09584 2015-12-11 20:41:05 arvnodeman.cloud_nodes[11545] DEBUG: <pykka.proxy._CallableProxy object at 0x7fd3f81b0850> subscribed to events for '/subscriptions/a731f419-596b-4b64-a278-364e76506b06/resourceGroups/c97qk/providers/Microsoft.Compute/virtualMachines/compute-tj4hwdsw3yjiyjt-c97qk' 
 2015-12-11_20:41:05.09804 2015-12-11 20:41:05 arvnodeman.daemon[11545] INFO: Cloud node /subscriptions/a731f419-596b-4b64-a278-364e76506b06/resourceGroups/c97qk/providers/Microsoft.Compute/virtualMachines/compute-tj4hwdsw3yjiyjt-c97qk has associated with Arvados node c97qk-7ekkf-tj4hwdsw3yjiyjt 
 2015-12-11_20:41:05.09921 2015-12-11 20:41:05 arvnodeman.computenode[11545] DEBUG: Node /subscriptions/a731f419-596b-4b64-a278-364e76506b06/resourceGroups/c97qk/providers/Microsoft.Compute/virtualMachines/compute-tj4hwdsw3yjiyjt-c97qk shutdown window open but node busy. 
 2015-12-11_20:41:05.10064 2015-12-11 20:41:05 arvnodeman.arvados_nodes[11545] DEBUG: <pykka.proxy._CallableProxy object at 0x7fd3f8e11250> subscribed to events for 'c97qk-7ekkf-tj4hwdsw3yjiyjt' 
 </pre> 

 <pre> 
 $ arv node get -u c97qk-7ekkf-tj4hwdsw3yjiyjt 
 { 
  "href":"/nodes/c97qk-7ekkf-tj4hwdsw3yjiyjt", 
  "kind":"arvados#node", 
  "etag":"984qlz3msed6utdnndclhuz0o", 
  "uuid":"c97qk-7ekkf-tj4hwdsw3yjiyjt", 
  "owner_uuid":"c97qk-tpzed-000000000000000", 
  "created_at":"2015-09-09T14:26:19.832861000Z", 
  "modified_by_client_uuid":null, 
  "modified_by_user_uuid":"c97qk-tpzed-000000000000000", 
  "modified_at":"2015-12-11T20:58:01.734010000Z", 
  "hostname":"compute0", 
  "domain":"c97qk.arvadosapi.com", 
  "ip_address":"10.25.64.10", 
  "last_ping_at":"2015-12-11T20:58:01.734010000Z", 
  "slot_number":0, 
  "status":"running", 
  "job_uuid":null, 
  "crunch_worker_state":"down", 
  "properties":{ 
   "cloud_node":{ 
    "price":0, 
    "size":"Standard_D1" 
   }, 
   "total_cpu_cores":1, 
   "total_ram_mb":3442, 
   "total_scratch_mb":51172 
  }, 
  "first_ping_at":"2015-12-08T02:17:01.949316000Z", 
  "info":{ 
   "ec2_instance_id":"/subscriptions/a731f419-596b-4b64-a278-364e76506b06/resourceGroups/c97qk/providers/Microsoft.Compute/virtualMachines/compute-tj4hwdsw3yjiyjt-c97qk", 
   "last_action":"Prepared by Node Manager", 
   "ping_secret":"35vaizroj3kkoqzm2vad92t6fewg7hbdix8jgj0wpklh3rdo4v", 
   "slurm_state":"down" 
  }, 
  "nameservers":[ 
   "10.25.0.6" 
  ] 
 } 
 </pre> 

 <pre> 
 PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST 
 compute*     up   infinite      2 drain* compute[2-3] 
 compute*     up   infinite    252  down* compute[1,4-14,16-255] 
 compute*     up   infinite      1   idle compute15 
 compute*     up   infinite      1   down compute0 
 </pre> 
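
The behavior described above could be sketched roughly as follows. This is only an illustration of the suspected logic, not Node Manager's actual code: the function and constant names are hypothetical, and the assumption is that a node counts as shutdown-eligible only when its SLURM state is in an allowed set. With an idle-only set, a "down" node is treated as busy, which matches the log; adding "down" to the set would make it eligible.

```python
# Hypothetical sketch; these names are illustrative and are NOT taken
# from the arvnodeman source.

# Suspected current behavior: only "idle" nodes may be shut down.
IDLE_ONLY = frozenset({"idle"})
# Possible alternative: also allow nodes that SLURM reports as "down".
IDLE_OR_DOWN = frozenset({"idle", "down"})


def normalize_state(slurm_state):
    """Lower-case the state and drop the trailing '*' that sinfo
    appends to nodes that are not responding (e.g. 'down*')."""
    return slurm_state.strip().lower().rstrip("*")


def shutdown_eligible(slurm_state, window_open, eligible_states=IDLE_ONLY):
    """Return True if the shutdown window is open and the node's SLURM
    state is one of the states considered safe to shut down."""
    return window_open and normalize_state(slurm_state) in eligible_states


if __name__ == "__main__":
    # compute0 from the sinfo output above is "down".
    print(shutdown_eligible("down", window_open=True))
    print(shutdown_eligible("down", window_open=True,
                            eligible_states=IDLE_OR_DOWN))
```

Whether "drain" or non-responding ("down*") nodes should also be eligible is a separate design question that this sketch leaves open.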
