Bug #17186
open
[dispatch] broken node logs should also be copied to a-d-c logs
Story points: -
Release:
Release relationship: Auto
Description
Currently, when crunch-run detects a broken node, it reports this in the container logs. For example, for su92l-dz642-uyoqykf2i604pma (https://workbench.su92l.arvadosapi.com/collections/04c43ec5454a350c37c0affd7d331e63+1236/crunch-run.txt?disposition=inline&size=1358):
2020-12-01T20:35:27.009845935Z Error suggests node is unable to run containers: While loading container image into Docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
2020-12-01T20:35:27.009904337Z Writing /var/lock/crunch-run-broken to mark node as broken
2020-12-01T20:35:27.009991541Z error in Run: While loading container image: While loading container image into Docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
2020-12-01T20:35:27.073795048Z crunch-run finished
Meanwhile, the arvados-dispatch-cloud (a-d-c) logs don't provide any detail:
Dec 01 20:34:43 su92l.arvadosapi.com arvados-dispatch-cloud[120842]: {"Address":"10.28.64.31","ContainerUUID":"su92l-dz642-n9nu9htkcj4ofp6","Instance":"/subscriptions/3fa048dc-aa38-4820-85ba-68498da5f26b/resourceGroups/su92l/providers/Microsoft.Compute/virtualMachines/compute-f51710e302afe4aef4a97c634a7c2ed3-tyxs4w1m1dwwkfj","InstanceType":"Standard_D32s_v3","PID":120842,"level":"info","msg":"crunch-run process started","time":"2020-12-01T20:34:43.820244851Z"}
Dec 01 20:35:54 su92l.arvadosapi.com arvados-dispatch-cloud[120842]: {"Address":"10.28.64.31","ContainerUUID":"su92l-dz642-n9nu9htkcj4ofp6","Instance":"/subscriptions/3fa048dc-aa38-4820-85ba-68498da5f26b/resourceGroups/su92l/providers/Microsoft.Compute/virtualMachines/compute-f51710e302afe4aef4a97c634a7c2ed3-tyxs4w1m1dwwkfj","InstanceType":"Standard_D32s_v3","PID":120842,"Reason":"state=Queued","level":"info","msg":"killing crunch-run process","time":"2020-12-01T20:35:54.226894349Z"}
It would be helpful for debugging to copy the broken node details into the a-d-c logs.
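To illustrate the request, here is a rough sketch (in Python, for brevity; it is not the dispatcher's actual code) of picking the broken-node lines out of crunch-run output and wrapping them in JSON entries shaped like the a-d-c log lines above. The marker phrases are copied from the crunch-run excerpt in this ticket and may not cover every broken-node message. The function names, the "warning" level, the message prefix, and the placeholder UUID and instance name are all hypothetical.

```python
import json
import re

# Assumption: marker phrases taken from the crunch-run excerpt in this ticket;
# the real set of broken-node messages may be larger.
BROKEN_NODE_MARKERS = (
    "Error suggests node is unable to run containers",
    "to mark node as broken",
)

# Each crunch-run line starts with an RFC 3339 timestamp followed by a space.
LINE_RE = re.compile(r"^(?P<time>\S+Z) (?P<msg>.*)$")

SAMPLE_LOG = """\
2020-12-01T20:35:27.009845935Z Error suggests node is unable to run containers: While loading container image into Docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
2020-12-01T20:35:27.009904337Z Writing /var/lock/crunch-run-broken to mark node as broken
2020-12-01T20:35:27.009991541Z error in Run: While loading container image: While loading container image into Docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
2020-12-01T20:35:27.073795048Z crunch-run finished
"""


def broken_node_lines(crunch_run_log):
    """Return (timestamp, message) pairs for lines that report a broken node."""
    found = []
    for line in crunch_run_log.splitlines():
        match = LINE_RE.match(line.strip())
        if match and any(m in match["msg"] for m in BROKEN_NODE_MARKERS):
            found.append((match["time"], match["msg"]))
    return found


def dispatcher_log_entries(crunch_run_log, container_uuid, instance):
    """Build JSON entries (hypothetical format) to copy into the dispatcher log."""
    return [
        json.dumps(
            {
                "ContainerUUID": container_uuid,
                "Instance": instance,
                "level": "warning",  # hypothetical choice of level
                "msg": "crunch-run reported broken node: " + msg,
                "time": time,
            },
            sort_keys=True,
        )
        for time, msg in broken_node_lines(crunch_run_log)
    ]


if __name__ == "__main__":
    # Placeholder UUID and instance name, not real identifiers.
    for entry in dispatcher_log_entries(
        SAMPLE_LOG, "zzzzz-dz642-xxxxxxxxxxxxxxx", "example-instance"
    ):
        print(entry)
```

With entries like these, the a-d-c log would show why the node was marked broken without anyone having to open the container's crunch-run.txt.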
Updated by Ward Vandewege about 4 years ago
- Related to Feature #17185: [adc] add broken node metrics added