Arvados - Bug #16600: Compute nodes missing attached scratch spacehttps://dev.arvados.org/issues/16600?journal_id=854302020-07-15T13:44:31ZPeter Amstutzpeter.amstutz@curii.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul> Arvados - Bug #16600: Compute nodes missing attached scratch spacehttps://dev.arvados.org/issues/16600?journal_id=854312020-07-15T13:45:47ZPeter Amstutzpeter.amstutz@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/85431/diff?detail_id=82201">diff</a>)</li></ul> Arvados - Bug #16600: Compute nodes missing attached scratch spacehttps://dev.arvados.org/issues/16600?journal_id=854322020-07-15T13:47:47ZPeter Amstutzpeter.amstutz@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/85432/diff?detail_id=82202">diff</a>)</li></ul> Arvados - Bug #16600: Compute nodes missing attached scratch spacehttps://dev.arvados.org/issues/16600?journal_id=854352020-07-15T14:22:14ZPeter Amstutzpeter.amstutz@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/85435/diff?detail_id=82206">diff</a>)</li></ul> Arvados - Bug #16600: Compute nodes missing attached scratch spacehttps://dev.arvados.org/issues/16600?journal_id=854952020-07-16T11:00:54ZJavier Bértolijbertoli@curii.com
<ul></ul><p>As Ward suggested, I reused the code, but it fails:</p>
<ul>
<li>The AWS base AMI doesn't have LVM (which we use to manage the ephemeral images)</li>
<li><code>arvados-docker-cleaner</code> fails with this error<br /><pre>
Jul 16 10:50:34 ip-10-254-254-103 systemd[1]: arvados-docker-cleaner.service: Main process exited, code=exited, status=1/FAILURE
Jul 16 10:50:34 ip-10-254-254-103 systemd[1]: arvados-docker-cleaner.service: Failed with result 'exit-code'.
Jul 16 10:50:37 ip-10-254-254-103 dhclient[450]: XMT: Solicit on ens5, interval 121430ms.
Jul 16 10:50:41 ip-10-254-254-103 sudo[4925]: admin : TTY=unknown ; PWD=/home/admin ; USER=root ; COMMAND=/usr/bin/docker ps -q
Jul 16 10:50:41 ip-10-254-254-103 sudo[4925]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jul 16 10:50:41 ip-10-254-254-103 sudo[4925]: pam_unix(sudo:session): session closed for user root
Jul 16 10:50:44 ip-10-254-254-103 systemd[1]: arvados-docker-cleaner.service: Service RestartSec=10s expired, scheduling restart.
Jul 16 10:50:44 ip-10-254-254-103 systemd[1]: arvados-docker-cleaner.service: Scheduled restart job, restart counter is at 47.
Jul 16 10:50:44 ip-10-254-254-103 systemd[1]: Stopped Arvados Docker Image Cleaner.
Jul 16 10:50:44 ip-10-254-254-103 systemd[1]: Started Arvados Docker Image Cleaner.
Jul 16 10:50:44 ip-10-254-254-103 sh[4933]: Traceback (most recent call last):
Jul 16 10:50:44 ip-10-254-254-103 sh[4933]: File "/usr/bin/arvados-docker-cleaner", line 5, in <module>
Jul 16 10:50:44 ip-10-254-254-103 sh[4933]: from arvados_docker.cleaner import main
Jul 16 10:50:44 ip-10-254-254-103 sh[4933]: File "/usr/share/python3/dist/arvados-docker-cleaner/lib/python3.7/site-packages/arvados_docker/cleaner.py", line 21, in <module>
Jul 16 10:50:44 ip-10-254-254-103 sh[4933]: import docker
Jul 16 10:50:44 ip-10-254-254-103 sh[4933]: File "/usr/share/python3/dist/arvados-docker-cleaner/lib/python3.7/site-packages/docker/__init__.py", line 20, in <module>
Jul 16 10:50:44 ip-10-254-254-103 sh[4933]: from .client import Client, AutoVersionClient # flake8: noqa
Jul 16 10:50:44 ip-10-254-254-103 sh[4933]: File "/usr/share/python3/dist/arvados-docker-cleaner/lib/python3.7/site-packages/docker/client.py", line 25, in <module>
Jul 16 10:50:44 ip-10-254-254-103 sh[4933]: from . import api
Jul 16 10:50:44 ip-10-254-254-103 sh[4933]: File "/usr/share/python3/dist/arvados-docker-cleaner/lib/python3.7/site-packages/docker/api/__init__.py", line 2, in <module>
Jul 16 10:50:44 ip-10-254-254-103 sh[4933]: from .build import BuildApiMixin
Jul 16 10:50:44 ip-10-254-254-103 sh[4933]: File "/usr/share/python3/dist/arvados-docker-cleaner/lib/python3.7/site-packages/docker/api/build.py", line 9, in <module>
Jul 16 10:50:44 ip-10-254-254-103 sh[4933]: from .. import utils
Jul 16 10:50:44 ip-10-254-254-103 sh[4933]: File "/usr/share/python3/dist/arvados-docker-cleaner/lib/python3.7/site-packages/docker/utils/__init__.py", line 1, in <module>
Jul 16 10:50:44 ip-10-254-254-103 sh[4933]: from .utils import (
Jul 16 10:50:44 ip-10-254-254-103 sh[4933]: File "/usr/share/python3/dist/arvados-docker-cleaner/lib/python3.7/site-packages/docker/utils/utils.py", line 24, in <module>
Jul 16 10:50:44 ip-10-254-254-103 sh[4933]: from distutils.version import StrictVersion
Jul 16 10:50:44 ip-10-254-254-103 sh[4933]: File "/usr/share/python3/dist/arvados-docker-cleaner/lib/python3.7/distutils/__init__.py", line 44, in <module>
Jul 16 10:50:44 ip-10-254-254-103 sh[4933]: from distutils import dist, sysconfig # isort:skip
Jul 16 10:50:44 ip-10-254-254-103 sh[4933]: ImportError: cannot import name 'dist' from 'distutils' (/usr/share/python3/dist/arvados-docker-cleaner/lib/python3.7/distutils/__init__.py)
Jul 16 10:50:44 ip-10-254-254-103 systemd[1]: arvados-docker-cleaner.service: Main process exited, code=exited, status=1/FAILURE
Jul 16 10:50:44 ip-10-254-254-103 systemd[1]: arvados-docker-cleaner.service: Failed with result 'exit-code'.
</pre></li>
<li>docker does not start</li>
</ul> Arvados - Bug #16600: Compute nodes missing attached scratch spacehttps://dev.arvados.org/issues/16600?journal_id=854982020-07-16T11:28:28ZJavier Bértolijbertoli@curii.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-3 priority-4 priority-default closed" href="/issues/16611">Bug #16611</a>: arvados-docker-cleaner package broken on Debian 10 and Ubuntu 18.04</i> added</li></ul> Arvados - Bug #16600: Compute nodes missing attached scratch spacehttps://dev.arvados.org/issues/16600?journal_id=855012020-07-16T13:37:57ZJavier Bértolijbertoli@curii.com
<ul></ul><p>After refactoring the image, it didn't show any scratch space (perhaps my fault), but while discussing this with Lucas &amp; Nico I found these differing formats:</p>
<ul>
<li><a href="https://doc.arvados.org/v2.0/admin/config.html" class="external">Documentation reference</a><br /><pre>
IncludedScratch: 16GB
AddedScratch: 0
</pre></li>
</ul>
<ul>
<li>pirca/lugli (iirc, Peter added the node info in these clusters):<br /><pre>
Scratch: 200GB
AddedScratch: 200GB
</pre></li>
</ul>
<ul>
<li>su92l:<br /><pre>
Scratch: 100000000000
IncludedScratch: 100000000000
</pre></li>
</ul>
<p>So, which is the correct format to use?</p> Arvados - Bug #16600: Compute nodes missing attached scratch spacehttps://dev.arvados.org/issues/16600?journal_id=855022020-07-16T13:54:50ZTom Cleggtom@curii.com
<ul></ul><p>This is preferred for current versions of Arvados:</p>
<pre><code>AddedScratch: 0 # portion added separately<br />IncludedScratch: 100000000000 # portion included with node type</code></pre>
<pre><code>AddedScratch: 100000000000 # portion added separately<br />IncludedScratch: 0 # portion included with node type</code></pre>
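<p>The relationship between the three fields seen in this thread can be sketched as follows. This is only an illustration of the "total = included + added" reading implied by the comments in these examples, not Arvados's actual config loader; <code>normalize_scratch</code> is a hypothetical helper and values are assumed to be plain byte counts.</p>

```python
def normalize_scratch(entry):
    """Fill in whichever scratch field is missing (hypothetical helper).

    Assumes Scratch is the total, IncludedScratch the portion that comes
    with the node type, and AddedScratch the portion added separately.
    """
    scratch = entry.get("Scratch", 0)
    included = entry.get("IncludedScratch", 0)
    added = entry.get("AddedScratch", 0)

    if scratch == 0:
        # Only the two portions were given: derive the total.
        scratch = included + added
    elif added == 0:
        # Total given, added portion missing: derive it.
        added = scratch - included
    elif included == 0:
        # Total given, included portion missing: derive it.
        included = scratch - added

    if scratch != included + added:
        raise ValueError("Scratch must equal IncludedScratch + AddedScratch")

    return {
        "Scratch": scratch,
        "IncludedScratch": included,
        "AddedScratch": added,
    }
```

<p>Under these assumptions, an entry that sets only <code>IncludedScratch</code> and one that sets <code>Scratch</code> and <code>IncludedScratch</code> to the same value normalize to the same result.</p>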
<p>This has the same effect as the first example, and is useful if you're trying to make the same config file work with an older version of Arvados that doesn't pay attention to <code>IncludedScratch</code>/<code>AddedScratch</code>:</p>
<pre><code>Scratch: 100000000000 # total<br />IncludedScratch: 100000000000 # portion included with node type</code></pre> Arvados - Bug #16600: Compute nodes missing attached scratch spacehttps://dev.arvados.org/issues/16600?journal_id=855082020-07-16T19:22:36ZJavier Bértolijbertoli@curii.com
<ul><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li><li><strong>Assigned To</strong> changed from <i>Javier Bértoli</i> to <i>Peter Amstutz</i></li><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Feedback</i></li><li><strong>Category</strong> set to <i>Deployment</i></li></ul><p>This should be fixed in commit 1d2304b@packer, branch compute-image-simplified-script, and pushed via commit 9c740af@saltstack.</p>
<p>On a running compute image:<br /><pre>
root@ip-10-255-254-215:~# df -h
Filesystem       Size  Used Avail Use% Mounted on
...
/dev/nvme1n1p1   7.7G  1.6G  5.8G  22% /
...
/dev/mapper/tmp   47G   81M   47G   1% /tmp
</pre></p>
<p>I'm testing with <a href="https://workbench.lugli.arvadosapi.com/container_requests/lugli-xvhdp-hs7bsikuma5ycyf#Status" class="external">this job</a> and waiting for it to finish to check that all is OK before closing this issue.</p> Arvados - Bug #16600: Compute nodes missing attached scratch spacehttps://dev.arvados.org/issues/16600?journal_id=855122020-07-17T14:23:12ZPeter Amstutzpeter.amstutz@curii.com
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li></ul><p>This is working. The job failed because it ran out of RAM, but I can see from the stats that it used 10 GB of disk when the root disk only has about 5 GB available, so it was clearly using the scratch space.</p>
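<p>The reasoning above (more disk used than the root filesystem could hold, so the data must have gone to scratch) can be sketched as a small check. This is a rough illustration only: the helper names are hypothetical, the mount points <code>/</code> and <code>/tmp</code> are taken from the <code>df -h</code> output earlier in this thread, and the numbers in the usage note are approximations of the figures quoted there.</p>

```python
import shutil

GB = 10**9


def free_bytes(path):
    """Return the bytes available on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def must_have_used_scratch(disk_used, root_free):
    """True if a job wrote more data than the root filesystem could have held."""
    return disk_used > root_free


if __name__ == "__main__":
    # Report free space on the root disk and on the scratch mount
    # (assumed to be /tmp, as in the df output in this thread).
    for mount in ("/", "/tmp"):
        print(mount, free_bytes(mount) // GB, "GB free")
```

<p>For example, <code>must_have_used_scratch(10 * GB, 5 * GB)</code> is true, which matches the conclusion drawn from the job stats.</p>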