Idea #11139
closed
[Node manager] Expected MemTotal for each cloud node size
Description
There's a discrepancy between the advertised RAM of a VM size, which node manager uses to choose what size node to boot for a job, and the amount of memory actually available to the job. A job falls in the "donut hole" when its memory request is larger than the memory actually available but no larger than the advertised size. Such a job is unable to run, yet node manager won't boot a properly sized node because it believes the request is already satisfied.
tetron@compute3.c97qk:/usr/local/share/arvados-compute-ping-controller.d$ awk '($1 == "MemTotal:"){print ($2 / 1024)}' </proc/meminfo
3440.54
df -m /tmp | perl -e '
> my $index = index(<>, " 1M-blocks ");
> substr(<>, 0, $index + 10) =~ / (\d+)$/;
> print "$1\n";
> '
51170
tetron@compute3.c97qk:/usr/local/share/arvados-compute-ping-controller.d$ sinfo -n compute3 --format "%c %m %d"
CPUS MEMORY TMP_DISK
1 3440 51169
>>> szd["Standard_D1_v2"]
<NodeSize: id=Standard_D1_v2, name=Standard_D1_v2, ram=3584 disk=50 bandwidth=0 price=0 driver=Azure Virtual machines ...>
>>>
For Standard_D1_v2 there is a ~144 MiB discrepancy between the advertised RAM size and the amount of RAM considered available by Linux.
CPUS MEMORY TMP_DISK
2 6968 102344
<NodeSize: id=Standard_D2_v2, name=Standard_D2_v2, ram=7168 disk=100 bandwidth=0 price=0 driver=Azure Virtual machines ...>
For Standard_D2_v2 it is 200 MiB.
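The two measurements above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the `observed` table and the `shortfall` helper are hypothetical names, and the numbers are the `ram=` values from the NodeSize entries and the MEMORY values from sinfo quoted above.

```python
# Advertised RAM (MiB, from the NodeSize entries) and MEMORY (MiB, from sinfo),
# copied from the output quoted in this ticket.
observed = {
    "Standard_D1_v2": (3584, 3440),
    "Standard_D2_v2": (7168, 6968),
}


def shortfall(advertised_mib, actual_mib):
    """Return (missing MiB, missing fraction of the advertised size)."""
    missing = advertised_mib - actual_mib
    return missing, missing / advertised_mib


for name, (advertised, actual) in observed.items():
    missing, fraction = shortfall(advertised, actual)
    print(f"{name}: {missing} MiB short ({fraction:.1%} of advertised)")
```

Both shortfalls come out below 5% of the advertised size (roughly 4.0% for ram=3584 and 2.8% for ram=7168).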
Based on discussion: node manager should reduce the RAM of each node size by 5% from the "sticker value" in the ServerCalculator (jobqueue.py).
The scale factor should be settable in the configuration file.
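A minimal sketch of the proposed behaviour, assuming the scale factor is passed in as a plain number. Everything here is hypothetical (`DEFAULT_RAM_SCALE`, `usable_ram_mib`, `cheapest_fitting_size`); it is not the actual ServerCalculator code, and how the value would be read from the configuration file is left open.

```python
# Hypothetical names throughout; not the real ServerCalculator in jobqueue.py.
DEFAULT_RAM_SCALE = 0.95  # 5% below the sticker value, as proposed above


def usable_ram_mib(sticker_ram_mib, scale=DEFAULT_RAM_SCALE):
    """Estimate the RAM (MiB) a job can expect on a node of this size."""
    return int(sticker_ram_mib * scale)


def cheapest_fitting_size(sizes, requested_ram_mib, scale=DEFAULT_RAM_SCALE):
    """Return the first size whose scaled RAM covers the request, else None.

    `sizes` is a list of (name, sticker RAM in MiB), ordered smallest first.
    """
    for name, sticker_ram in sizes:
        if usable_ram_mib(sticker_ram, scale) >= requested_ram_mib:
            return name
    return None


sizes = [("Standard_D1_v2", 3584), ("Standard_D2_v2", 7168)]

# A 3500 MiB request sits in the "donut hole" for Standard_D1_v2:
# below the sticker value (3584) but above what sinfo reports (3440).
print(cheapest_fitting_size(sizes, 3500, scale=1.0))  # unscaled sticker values
print(cheapest_fitting_size(sizes, 3500))             # with the 5% reduction
```

With unscaled sticker values the 3500 MiB request is matched to Standard_D1_v2, which cannot actually run it. With the 5% reduction the same request is matched to Standard_D2_v2.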