Bug #14026

Error response from daemon: Range of CPUs is from 0.01 to 1.00, as there are only 1 CPUs available

Added by Bryan Cosca over 1 year ago. Updated about 1 month ago.

Status: Closed
Priority: Normal
Assigned To: -
Category: -
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:
Story points: -

Description

https://workbench.e51c5.arvadosapi.com/container_requests/e51c5-xvhdp-ugp7a5guqc4brhh#Log

2018-08-14T10:11:28.088035478Z crunch-run 1.1.4 started
2018-08-14T10:11:28.088079978Z Executing container 'e51c5-dz642-uc8ccn1nzs9q9u0'
2018-08-14T10:11:28.088104778Z Executing on host 'compute62.e51c5.arvadosapi.com'
2018-08-14T10:11:28.176213953Z Fetching Docker image from collection 'be17bc91682c86583461bf461858492b+426'
2018-08-14T10:11:28.188391149Z Using Docker image id 'sha256:d849cf08d27d02f19c5ae1ea0a5d49dc4b000e495a727762983342f7168a4199'
2018-08-14T10:11:28.190780848Z Docker image is available
2018-08-14T10:11:28.190948348Z Running [arv-mount --foreground --allow-other --read-write --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id /tmp/crunch-run.e51c5-dz642-uc8ccn1nzs9q9u0.460952586/keep648490465]
2018-08-14T10:11:28.176360052Z notice: reading stats from /sys/fs/cgroup/cpuacct/cgroup.procs
2018-08-14T10:11:28.176402052Z notice: reading stats from /sys/fs/cgroup/memory/memory.stat
2018-08-14T10:11:28.176727852Z mem 20295680 cache 88 pgmajfault 536576 rss
2018-08-14T10:11:28.176742552Z notice: reading stats from /sys/fs/cgroup/cpuacct/cpuacct.stat
2018-08-14T10:11:28.176781152Z notice: reading stats from /sys/fs/cgroup/cpuset/cpuset.cpus
2018-08-14T10:11:28.176800152Z cpu 25.6000 user 22.7300 sys 1 cpus
2018-08-14T10:11:28.176925952Z net:docker0 0 tx 0 rx
2018-08-14T10:11:28.176936152Z net:eth0 1141280 tx 565449770 rx
2018-08-14T10:11:29.097092488Z Creating Docker container
2018-08-14T10:11:29.100320488Z While creating container: Error response from daemon: Range of CPUs is from 0.01 to 1.00, as there are only 1 CPUs available
2018-08-14T10:11:29.145217775Z Running [arv-mount --unmount-timeout=8 --unmount /tmp/crunch-run.e51c5-dz642-uc8ccn1nzs9q9u0.460952586/keep648490465]
2018-08-14T10:11:29.432344492Z fusermount: failed to unmount /tmp/crunch-run.e51c5-dz642-uc8ccn1nzs9q9u0.460952586/keep648490465: Invalid argument
2018-08-14T10:11:29.482877378Z crunch-run finished
2018-08-14T10:15:40.149987687Z crunch-run 1.1.4 started
2018-08-14T10:15:40.150028988Z Executing container 'e51c5-dz642-uc8ccn1nzs9q9u0'
2018-08-14T10:15:40.150051888Z Executing on host 'compute29.e51c5.arvadosapi.com'
2018-08-14T10:15:40.217954197Z Fetching Docker image from collection 'be17bc91682c86583461bf461858492b+426'
2018-08-14T10:15:40.231564039Z Using Docker image id 'sha256:d849cf08d27d02f19c5ae1ea0a5d49dc4b000e495a727762983342f7168a4199'
2018-08-14T10:15:40.233583245Z Docker image is available
2018-08-14T10:15:40.233748646Z Running [arv-mount --foreground --allow-other --read-write --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id /tmp/crunch-run.e51c5-dz642-uc8ccn1nzs9q9u0.545151285/keep410107664]
2018-08-14T10:15:41.002139315Z Creating Docker container
2018-08-14T10:15:41.005905626Z While creating container: Error response from daemon: Range of CPUs is from 0.01 to 1.00, as there are only 1 CPUs available

Related issues

Is duplicate of Arvados - Bug #14036: a-n-m spawned a container on the wrong sized node (Closed)

History

#1 Updated by Peter Amstutz over 1 year ago

It appears a scheduling error put a job requesting 2 cores on a 1-core machine, which probably means slurm was acting on bad information when deciding how to schedule the job.
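To make the mismatch concrete, here is a minimal sketch of the kind of range check the daemon's error message implies. The function name `fits_on_host` is hypothetical, and the bounds (0.01 up to the host's CPU count) are assumed from the wording of the error message only; this is not code from Arvados or Docker.

```python
def fits_on_host(requested_cpus: float, host_cpus: int) -> bool:
    """Return True if a CPU request falls within the range the error message describes.

    Assumption: the accepted range is 0.01 up to the number of CPUs on the host,
    as suggested by "Range of CPUs is from 0.01 to 1.00".
    """
    return 0.01 <= requested_cpus <= host_cpus


# The situation described above: a 2-core request landing on a 1-core node.
print(fits_on_host(2, 1))
# The same request on a 2-core node.
print(fits_on_host(2, 2))
```

A check like this before container creation would only report the mismatch earlier; it would not fix whatever caused the job to be placed on an undersized node.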

#2 Updated by Tom Clegg over 1 year ago

The log indicates we tried to run it on the wrong instance type 3 times before choosing one with 2 CPUs: compute62 and compute29 (x2) failed with that error, then compute68 succeeded. Unfortunately, the host info logs aren't saved for the earlier attempts, so I'm just assuming docker is counting the host's CPUs correctly.
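The per-host tally above can be reproduced from a container log with a short script. This is only a sketch: it assumes the log keeps the "Executing on host '...'" and "While creating container: ..." line shapes seen in the description, and the function name `count_cpu_range_failures` is made up for illustration. The sample lines are copied from the excerpt in the description, which shows only the first two attempts.

```python
import re
from collections import Counter

# Matches the crunch-run line that names the compute node for an attempt.
HOST_RE = re.compile(r"Executing on host '([^']+)'")
CPU_ERROR = "Range of CPUs is from"


def count_cpu_range_failures(log_lines):
    """Count container-creation failures with the CPU range error, per host."""
    failures = Counter()
    current_host = None
    for line in log_lines:
        match = HOST_RE.search(line)
        if match:
            # A new attempt started; remember which host it ran on.
            current_host = match.group(1)
        elif (
            "While creating container" in line
            and CPU_ERROR in line
            and current_host is not None
        ):
            failures[current_host] += 1
    return failures


sample = [
    "2018-08-14T10:11:28.088104778Z Executing on host 'compute62.e51c5.arvadosapi.com'",
    "2018-08-14T10:11:29.100320488Z While creating container: Error response from daemon: Range of CPUs is from 0.01 to 1.00, as there are only 1 CPUs available",
    "2018-08-14T10:15:40.150051888Z Executing on host 'compute29.e51c5.arvadosapi.com'",
    "2018-08-14T10:15:41.005905626Z While creating container: Error response from daemon: Range of CPUs is from 0.01 to 1.00, as there are only 1 CPUs available",
]

for host, count in count_cpu_range_failures(sample).items():
    print(host, count)
```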

#3 Updated by Tom Clegg over 1 year ago

  • Is duplicate of Bug #14036: a-n-m spawned a container on the wrong sized node added

#4 Updated by Peter Amstutz about 1 month ago

  • Status changed from New to Closed
