<p>Arvados - Bug #18713: [gpu] nvidia-persistenced.service fails when booted on a node without GPUs</p>
<p>Ward Vandewege (ward@curii.com), 2022-02-03T18:53:25Z, <a href="https://dev.arvados.org/issues/18713?journal_id=100591">https://dev.arvados.org/issues/18713?journal_id=100591</a>:</p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul>
<p>Ward Vandewege (ward@curii.com), 2022-02-03T18:55:34Z, <a href="https://dev.arvados.org/issues/18713?journal_id=100592">https://dev.arvados.org/issues/18713?journal_id=100592</a>:</p>
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/100592/diff?detail_id=97209">diff</a>)</li></ul>
<p>Ward Vandewege (ward@curii.com), 2022-02-03T19:00:43Z, <a href="https://dev.arvados.org/issues/18713?journal_id=100596">https://dev.arvados.org/issues/18713?journal_id=100596</a>:</p>
<ul><li><strong>Related to</strong> <i><a class="issue tracker-6 status-3 priority-4 priority-default closed behind-schedule" href="/issues/15957">Idea #15957</a>: GPU support</i> added</li></ul>
<p>Peter Amstutz (peter.amstutz@curii.com), 2022-02-03T19:04:10Z, <a href="https://dev.arvados.org/issues/18713?journal_id=100599">https://dev.arvados.org/issues/18713?journal_id=100599</a>:</p>
<ul></ul><p>It is probably fine to have it disabled, because crunch-run does some GPU driver initialization on its own already.</p>
<p>Ward Vandewege (ward@curii.com), 2022-02-03T20:05:20Z, <a href="https://dev.arvados.org/issues/18713?journal_id=100600">https://dev.arvados.org/issues/18713?journal_id=100600</a>:</p>
<ul></ul><p>I updated the script that builds the compute node image to disable the nvidia-persistenced service in <a class="changeset" title="18713: disable the nvidia-persistenced service in the compute image. Arvados-DCO-1.1-Signed-off-..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/ac52d7ee23b39779712c702945eb9db7e17dd814">ac52d7ee23b39779712c702945eb9db7e17dd814</a> on branch 18713-nvidia-persistenced. Ready for review.</p>
<p>I then built a compute image for Tordo from this commit, and that made Tordo work again, cf. <a class="external" href="https://workbench.tordo.arvadosapi.com/container_requests/tordo-xvhdp-x824fng56ciyvoo">https://workbench.tordo.arvadosapi.com/container_requests/tordo-xvhdp-x824fng56ciyvoo</a></p>
<p>Peter Amstutz (peter.amstutz@curii.com), 2022-02-03T20:09:30Z, <a href="https://dev.arvados.org/issues/18713?journal_id=100601">https://dev.arvados.org/issues/18713?journal_id=100601</a>:</p>
<ul></ul><p>Ward Vandewege wrote:</p>
<blockquote>
<p>I updated the script that builds the compute node image to disable the nvidia-persistenced service in <a class="changeset" title="18713: disable the nvidia-persistenced service in the compute image. Arvados-DCO-1.1-Signed-off-..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/ac52d7ee23b39779712c702945eb9db7e17dd814">ac52d7ee23b39779712c702945eb9db7e17dd814</a> on branch 18713-nvidia-persistenced. Ready for review.</p>
<p>I then built a compute image for Tordo from this commit, and that made Tordo work again, cf. <a class="external" href="https://workbench.tordo.arvadosapi.com/container_requests/tordo-xvhdp-x824fng56ciyvoo">https://workbench.tordo.arvadosapi.com/container_requests/tordo-xvhdp-x824fng56ciyvoo</a></p>
</blockquote>
<p>In the comment I would include a note that this doesn't matter, because crunch-run does its own basic CUDA initialization.</p>
<p>We should also confirm that in fact GPUs still work.</p>
<p>Peter Amstutz (peter.amstutz@curii.com), 2022-02-03T20:25:02Z, <a href="https://dev.arvados.org/issues/18713?journal_id=100602">https://dev.arvados.org/issues/18713?journal_id=100602</a>:</p>
<ul><li><strong>Target version</strong> deleted (<del><i>2022-02-16 sprint</i></del>)</li><li><strong>Assigned To</strong> deleted (<del><i>Ward Vandewege</i></del>)</li><li><strong>File</strong> <a href="/attachments/2962">tf-mnist-tutorial-gpu.cwl</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/2962/tf-mnist-tutorial-gpu.cwl">tf-mnist-tutorial-gpu.cwl</a> added</li><li><strong>File</strong> <a href="/attachments/2961">tf-mnist-tutorial.py</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/2961/tf-mnist-tutorial.py">tf-mnist-tutorial.py</a> added</li></ul>
<p>Peter Amstutz (peter.amstutz@curii.com), 2022-02-03T20:25:15Z, <a href="https://dev.arvados.org/issues/18713?journal_id=100603">https://dev.arvados.org/issues/18713?journal_id=100603</a>:</p>
<ul><li><strong>Target version</strong> set to <i>2022-02-16 sprint</i></li><li><strong>Assigned To</strong> set to <i>Ward Vandewege</i></li></ul>
<p>Ward Vandewege (ward@curii.com), 2022-02-03T20:43:15Z, <a href="https://dev.arvados.org/issues/18713?journal_id=100604">https://dev.arvados.org/issues/18713?journal_id=100604</a>:</p>
<ul></ul><p>Peter Amstutz wrote:</p>
<blockquote>
<p>Ward Vandewege wrote:</p>
<blockquote>
<p>I updated the script that builds the compute node image to disable the nvidia-persistenced service in <a class="changeset" title="18713: disable the nvidia-persistenced service in the compute image. Arvados-DCO-1.1-Signed-off-..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/ac52d7ee23b39779712c702945eb9db7e17dd814">ac52d7ee23b39779712c702945eb9db7e17dd814</a> on branch 18713-nvidia-persistenced. Ready for review.</p>
<p>I then built a compute image for Tordo from this commit, and that made Tordo work again, cf. <a class="external" href="https://workbench.tordo.arvadosapi.com/container_requests/tordo-xvhdp-x824fng56ciyvoo">https://workbench.tordo.arvadosapi.com/container_requests/tordo-xvhdp-x824fng56ciyvoo</a></p>
</blockquote>
<p>In the comment I would include a note that this doesn't matter, because crunch-run does its own basic CUDA initialization.</p>
</blockquote>
<p>Sure, updated in <a class="changeset" title="18713: expand comment. Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <ward@curii.com>" href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/12c1c51313e897abd0e9d1801b42bc8dc3b8d1d9">12c1c51313e897abd0e9d1801b42bc8dc3b8d1d9</a> on branch 18713-nvidia-persistenced.</p>
<blockquote>
<p>We should also confirm that in fact GPUs still work.</p>
</blockquote>
<p>Thanks for the sample workflow, it completed at <a href="https://arvadosapi.com/tordo-xvhdp-h7cu2u53dtjf3ag">tordo-xvhdp-h7cu2u53dtjf3ag</a> (without reuse!).</p>
<p>Peter Amstutz (peter.amstutz@curii.com), 2022-02-04T16:30:03Z, <a href="https://dev.arvados.org/issues/18713?journal_id=100609">https://dev.arvados.org/issues/18713?journal_id=100609</a>:</p>
<ul></ul><p>LGTM</p>
<p>Ward Vandewege (ward@curii.com), 2022-02-04T18:49:19Z, <a href="https://dev.arvados.org/issues/18713?journal_id=100620">https://dev.arvados.org/issues/18713?journal_id=100620</a>:</p>
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li></ul><p>Applied in changeset arvados-private:commit:arvados|8685251f024c4519c5f61413b9dcb66a86e3efd6.</p>
<p>Peter Amstutz (peter.amstutz@curii.com), 2022-03-24T19:28:44Z, <a href="https://dev.arvados.org/issues/18713?journal_id=102143">https://dev.arvados.org/issues/18713?journal_id=102143</a>:</p>
<ul><li><strong>Release</strong> set to <i>46</i></li></ul>