Arvados: Issues https://dev.arvados.org/ 2024-03-28T17:14:24Z Arvados Redmine
Arvados - Idea #21634 (New): Go through WB2 and ensure that loading indicators behave consistentl... https://dev.arvados.org/issues/21634 2024-03-28T17:14:24Z Peter Amstutz peter.amstutz@curii.com
Arvados - Task #21625 (New): Review at engineering meeting https://dev.arvados.org/issues/21625 2024-03-27T16:08:13Z Peter Amstutz peter.amstutz@curii.com
Arvados - Feature #21623 (New): Display filters active on a data table as chips on the column header https://dev.arvados.org/issues/21623 2024-03-27T15:55:43Z Peter Amstutz peter.amstutz@curii.com
Arvados - Bug #21622 (New): Mail delivery failure should not cause API calls to fail https://dev.arvados.org/issues/21622 2024-03-27T15:52:03Z Peter Amstutz peter.amstutz@curii.com
Arvados Workbench 2 - Feature #21621 (New): Inputs/Outputs panel allows JSON data to be copied to... https://dev.arvados.org/issues/21621 2024-03-27T14:19:45Z Lucas Di Pentima lucas.dipentima@curii.com
<p>Sometimes it's necessary to get that data, especially for inputs when you need to re-run something from the CLI. Having a copy-to-clipboard button would make it trivial.</p>
Arvados Workbench 2 - Idea #21615 (New): Details Panel should show details for every type of reso... https://dev.arvados.org/issues/21615 2024-03-22T13:25:49Z Lisa Knox
<p>The details panel currently shows details only when viewing a project, workflow, process, or collection, and is empty everywhere else. This includes the Shell Access view, Instance Types, and a few other places where there is really nothing to display, but the details panel will remain open if it was open when navigating to that view.</p>
<p>The details panel should display something useful if the current view allows for it, or should close automatically when navigating to a view where it is irrelevant. Having it display an empty element with the word "Projects" at the top in all non-specified cases is not helpful.</p>
Arvados - Feature #21614 (New): User can open things in new tab with middle-click/Ctrl+click https://dev.arvados.org/issues/21614 2024-03-21T14:54:40Z Brett Smith brett.smith@curii.com
<p>Please please please, this would help a lot for situations where Workbench is struggling under the load of a large item and I need to get at adjacent items.</p>
<ul>
<li>Breadcrumbs parent links</li>
<li>The "Open collection" button of a process' logs pane</li>
<li>Subprocesses</li>
<li>Files in a collection listing</li>
</ul>
Arvados - Bug #21612 (New): a-c-r with --debug can try to log entire input/output objects, which ... https://dev.arvados.org/issues/21612 2024-03-20T20:22:22Z Brett Smith brett.smith@curii.com
<p>User got this error while running aws-s3-bulk-download.cwl with >6K input URLs, using <code>a-c-r --submit --debug</code>.</p>
<p>I don't think it actually interfered with the workflow's run at all, but it clogs the logs and looks scary.</p>
<p>IMO a-c-r (along with the rest of our code) should not try to log data that can be arbitrarily large.</p>
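<p>A minimal sketch of one way to bound the size of a debug message, assuming a wrapper object is acceptable at the logging call site. <code>TruncatedJSON</code>, <code>MAX_LOG_CHARS</code>, and <code>log_output</code> are hypothetical names for illustration; they are not part of cwltool or arvados-cwl-runner.</p>

```python
import json
import logging

logger = logging.getLogger("example")

# Hypothetical cap on how many characters of a dumped object reach the log.
MAX_LOG_CHARS = 2000


class TruncatedJSON:
    """Serialize an object only when the log record is formatted, then cap it.

    Because the work happens in __str__, nothing is serialized when the
    debug level is disabled. When it is enabled, the object is still
    serialized in full; only the text handed to the log stream is bounded.
    """

    def __init__(self, obj, limit=MAX_LOG_CHARS):
        self.obj = obj
        self.limit = limit

    def __str__(self):
        text = json.dumps(self.obj, indent=4)
        if len(text) <= self.limit:
            return text
        omitted = len(text) - self.limit
        return f"{text[:self.limit]}... [{omitted} more characters omitted]"


def log_output(step_name, jobout):
    # Same message shape as the debug calls in the tracebacks below,
    # but the argument is wrapped instead of dumped eagerly.
    logger.debug("[%s] produced output %s", step_name, TruncatedJSON(jobout))
```

<p>The cap keeps each record small enough that a slow or non-blocking stream is less likely to be overwhelmed, at the cost of losing the tail of very large objects.</p>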
<p>Three instances where this came up:</p>
<pre>
--- Logging error ---
Traceback (most recent call last):
File "/usr/lib/python3.7/logging/__init__.py", line 1037, in emit
stream.write(msg + self.terminator)
BlockingIOError: [Errno 11] write could not complete without blocking
Call stack:
File "/usr/bin/arvados-cwl-runner", line 8, in <module>
sys.exit(main())
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/arvados_cwl/__init__.py", line 440, in main
input_required=not workflow_op)
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/main.py", line 1302, in main
tool, initialized_job_order_object, runtimeContext, logger=_logger
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/arvados_cwl/executor.py", line 874, in arv_executor
self.start_run(runnable, runtimeContext)
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/arvados_cwl/executor.py", line 248, in start_run
self.workflow_eval_lock, self.stop_polling)
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/task_queue.py", line 85, in add
task()
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/command_line_tool.py", line 202, in run
self.output_callback(cast(Optional[CWLObjectType], ev), "success")
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/arvados_cwl/executor.py", line 321, in wrapped_callback
cb(obj, st)
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/workflow.py", line 429, in receive_output
output_callback(output, processStatus)
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/workflow_job.py", line 564, in receive_output
_logger.debug("[%s] produced output %s", step.name, json_dumps(jobout, indent=4))
</pre>
<pre>--- Logging error ---
Traceback (most recent call last):
File "/usr/lib/python3.7/logging/__init__.py", line 1037, in emit
stream.write(msg + self.terminator)
BlockingIOError: [Errno 11] write could not complete without blocking
Call stack:
File "/usr/bin/arvados-cwl-runner", line 8, in <module>
sys.exit(main())
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/arvados_cwl/__init__.py", line 440, in main
input_required=not workflow_op)
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/main.py", line 1302, in main
tool, initialized_job_order_object, runtimeContext, logger=_logger
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/arvados_cwl/executor.py", line 863, in arv_executor
for runnable in jobiter:
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/workflow.py", line 175, in job
yield from job.job(builder.job, output_callbacks, runtimeContext)
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/workflow_job.py", line 821, in job
for newjob in step.iterable:
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/workflow_job.py", line 751, in try_make_job
yield from jobs
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/workflow_job.py", line 77, in job
yield from self.step.job(joborder, output_callback, runtimeContext)
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/workflow.py", line 462, in job
runtimeContext,
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/workflow.py", line 175, in job
yield from job.job(builder.job, output_callbacks, runtimeContext)
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/workflow_job.py", line 821, in job
for newjob in step.iterable:
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/workflow_job.py", line 735, in try_make_job
json_dumps(inputobj, indent=4),
</pre>
<pre>--- Logging error ---
Traceback (most recent call last):
File "/usr/lib/python3.7/logging/__init__.py", line 1037, in emit
stream.write(msg + self.terminator)
BlockingIOError: [Errno 11] write could not complete without blocking
Call stack:
File "/usr/bin/arvados-cwl-runner", line 8, in <module>
sys.exit(main())
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/arvados_cwl/__init__.py", line 440, in main
input_required=not workflow_op)
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/main.py", line 1302, in main
tool, initialized_job_order_object, runtimeContext, logger=_logger
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/arvados_cwl/executor.py", line 874, in arv_executor
self.start_run(runnable, runtimeContext)
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/arvados_cwl/executor.py", line 248, in start_run
self.workflow_eval_lock, self.stop_polling)
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/task_queue.py", line 85, in add
task()
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/command_line_tool.py", line 202, in run
self.output_callback(cast(Optional[CWLObjectType], ev), "success")
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/arvados_cwl/executor.py", line 321, in wrapped_callback
cb(obj, st)
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/workflow.py", line 429, in receive_output
output_callback(output, processStatus)
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/workflow_job.py", line 582, in receive_output
self.do_output_callback(final_output_callback)
File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.7/site-packages/cwltool/workflow_job.py", line 541, in do_output_callback
_logger.debug("[%s] outputs %s", self.name, json_dumps(wo, indent=4))
</pre>
Arvados - Feature #21611 (New): crunch-run updates copy of container.json in log collection when ... https://dev.arvados.org/issues/21611 2024-03-20T16:08:35Z Peter Amstutz peter.amstutz@curii.com
Arvados - Idea #21610 (New): Evaluate the feasibility of exporting a prometheus-compatible API fo... https://dev.arvados.org/issues/21610 2024-03-20T15:55:49Z Peter Amstutz peter.amstutz@curii.com
<p>Useful links</p>
<p><a class="external" href="https://prometheus.io/docs/prometheus/latest/querying/api/">https://prometheus.io/docs/prometheus/latest/querying/api/</a></p>
<p><a class="external" href="https://www.npmjs.com/package/chartjs-plugin-datasource-prometheus">https://www.npmjs.com/package/chartjs-plugin-datasource-prometheus</a></p>
Arvados - Feature #21609 (New): Display how to use container shell somewhere on process page when... https://dev.arvados.org/issues/21609 2024-03-20T15:20:29Z Peter Amstutz peter.amstutz@curii.com
Arvados - Bug #21607 (New): arv-mount memory usage grows over time https://dev.arvados.org/issues/21607 2024-03-19T13:15:14Z Peter Amstutz peter.amstutz@curii.com
<p>arv-mount releases metadata (collection and project listings) for files and directories that haven't been used recently to prevent unlimited memory growth.</p>
<p>Ideally it should reach a ceiling and then level off as new stuff replaces the memory used by old stuff. However, in the current version, memory usage still creeps up.</p>
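<p>One way to look for allocations that keep growing instead of leveling off is to compare <code>tracemalloc</code> snapshots taken at two points in time. The sketch below uses only the Python standard library; <code>top_growth</code> and <code>demo</code> are hypothetical helpers, and the list of byte strings merely stands in for cached metadata that is never released. It does not touch arv-mount itself.</p>

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return the allocation sites whose total size grew between two snapshots."""
    stats = after.compare_to(before, "lineno")
    return [stat for stat in stats if stat.size_diff > 0][:limit]


def demo():
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    # Stand-in for objects held past their intended lifetime.
    leaked = [bytes(1024) for _ in range(1000)]
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    return leaked, top_growth(before, after)


if __name__ == "__main__":
    _, growth = demo()
    for stat in growth:
        print(stat)
```

<p>In a long-running process, the same comparison could be repeated at intervals: sites that appear near the top of every comparison are candidates for objects that are not being released.</p>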
<p>arv-mount would benefit from additional debugging and memory profiling to determine if there are objects being held past their intended lifetime.</p>
Arvados - Task #21605 (In Progress): Review https://dev.arvados.org/issues/21605 2024-03-18T15:35:51Z Peter Amstutz peter.amstutz@curii.com
Arvados - Idea #21595 (New): 'shared' should use usernames, not full names https://dev.arvados.org/issues/21595 2024-03-14T14:52:12Z Peter Amstutz peter.amstutz@curii.com
<p>The 'shared' directory uses full names. These have a couple of problems:</p>
<ul>
<li>Always contain spaces and may have other characters that make it awkward with Unix tooling</li>
<li>Not unique. For example pirca has multiple accounts with full_name "Peter Amstutz". FUSE ends up picking one account and the other accounts just can't be accessed through FUSE.</li>
</ul>
<p>It should use 'username' instead, which is unique on a given Arvados instance.</p>
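<p>The collision problem can be illustrated with a small sketch. The user records below are invented sample data, and <code>directory_names</code> and <code>collisions</code> are hypothetical helpers, not arv-mount code; the only assumption taken from the text above is that <code>username</code> is unique per instance while <code>full_name</code> is not.</p>

```python
from collections import defaultdict

# Invented sample records; the UUIDs and names are placeholders.
USERS = [
    {"uuid": "zzzzz-tpzed-000000000000001", "username": "pat", "full_name": "Pat Example"},
    {"uuid": "zzzzz-tpzed-000000000000002", "username": "pat2", "full_name": "Pat Example"},
    {"uuid": "zzzzz-tpzed-000000000000003", "username": "sam", "full_name": "Sam Sample"},
]


def directory_names(users, key):
    """Map each directory name derived from `key` to the user UUIDs that claim it."""
    names = defaultdict(list)
    for user in users:
        names[user[key]].append(user["uuid"])
    return dict(names)


def collisions(names):
    """Return only the directory names claimed by more than one user."""
    return {name: uuids for name, uuids in names.items() if len(uuids) > 1}


if __name__ == "__main__":
    print(collisions(directory_names(USERS, "full_name")))
    print(collisions(directory_names(USERS, "username")))
```

<p>With a one-name-to-one-directory mapping, every entry reported by <code>collisions</code> corresponds to at least one account that cannot be reached by name.</p>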
<p>I think the only question is whether it is worth the effort to maintain backwards compatibility (by making the 'username' behavior a new option) or we just change the existing behavior in place.</p>
<p>I suppose one way to do it would be to change to using usernames by default but add an option that restores the previous behavior of using full names.</p>
Arvados - Idea #21581 (New): Crunch saves compute node journals to collections readable only by a... https://dev.arvados.org/issues/21581 2024-03-12T17:57:35Z Brett Smith brett.smith@curii.com
<p>Problem:</p>
<ul>
<li>Compute nodes and tasks can fail for any number of reasons. You basically need a full system log to diagnose some problems.</li>
<li>We can't just give users the system log: there's too much sensitive information in there and it's practically impossible to reliably know what needs to be redacted.</li>
<li>And even if it weren't, regular users mostly can't act on this information, and it may need to be subject to different retention policies than regular container logs, etc.</li>
</ul>
<p>Big idea: Crunch occasionally saves the system journal (and other logs?) to a collection that should only be readable by Arvados administrators. Administrators can go back and review these logs to diagnose problems.</p>
<p>Implementation idea:</p>
<ul>
<li>crunch-run gains a subcommand to upload the journal to a collection. When you run it, it:
<ul>
<li>Runs <code>journalctl --sync</code> to make sure all entries so far are written to disk
<ul>
<li>TBD: Does this need sudo?</li>
<li>The rest of the work should probably continue even if this command fails. Even if it means we can't get all the logs, we might as well capture what we can.</li>
</ul>
</li>
<li>Creates a collection from the recursive contents of <code>/var/log/journal</code>
<ul>
<li>TBD: Any other log files we should throw in?</li>
<li>The collection should have a property that indicates which container(s) these system logs correspond to. This should be a system property with the <code>arv:</code> prefix that's documented.</li>
<li>The collection should have a <code>trashed_at</code> time in the future. TBD: Should this time be configurable? If it's set to zero, should this functionality be disabled?</li>
</ul>
</li>
</ul>
</li>
<li>crunch-dispatch calls this crunch-run command when specific events occur
<ul>
<li>When a container finishes</li>
<li>When the cloud dispatcher decides to terminate a node</li>
</ul></li>
</ul>
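<p>The first steps of the implementation idea above could be sketched roughly as follows. This is only an illustration in Python, not crunch-run code: <code>sync_journal</code> and <code>journal_collection_attrs</code> are hypothetical helpers, the property name <code>arv:journal_for_containers</code> is an invented placeholder for the documented system property, and <code>trashed_at</code> is used as written above and should be checked against the collection schema. Uploading the contents of <code>/var/log/journal</code> is left out.</p>

```python
import datetime
import subprocess


def sync_journal(run=subprocess.run):
    """Ask journald to write pending entries to disk.

    Returns False instead of raising, so the caller can carry on and
    capture whatever is already on disk if the command fails.
    """
    try:
        run(["journalctl", "--sync"], check=True, timeout=60)
        return True
    except (OSError, subprocess.SubprocessError):
        return False


def journal_collection_attrs(container_uuids, retention, now=None):
    """Build attributes for the journal collection (a sketch, not an API call)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return {
        # Hypothetical property name; the real one would carry the arv: prefix
        # and be documented.
        "properties": {"arv:journal_for_containers": sorted(container_uuids)},
        # Field name as written in the list above.
        "trashed_at": (now + retention).isoformat(),
    }


if __name__ == "__main__":
    synced = sync_journal()
    attrs = journal_collection_attrs(
        ["zzzzz-dz642-000000000000000"], datetime.timedelta(days=30)
    )
    print(synced, attrs)
```

<p>Passing <code>run</code> as a parameter keeps the sync step testable without journald, and returning a boolean rather than raising matches the idea that the rest of the work should continue even if the sync fails.</p>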
<p>Setup that needs to happen:</p>
<ul>
<li>There needs to be a dedicated Unix account on the compute nodes to run this
<ul>
<li>It should be a member of the <code>systemd-journal</code> group to read the journal</li>
<li>It may need sudo permission to run <code>journalctl --sync</code> passwordless</li>
</ul>
</li>
<li>Permissions can be limited on the Arvados side
<ul>
<li>This can't use the same API token as the container because the permissions are completely different</li>
<li>The token could be scoped pretty narrowly: just permission to <code>PUT</code> a collection and <code>GET</code> the owning project and similar related resources</li>
<li>It seems like we either need (a) a dedicated user account that just has all these journal collections in its home project, or (b) a configurable UUID of a project where all these journal collections are saved</li>
</ul></li>
</ul>
<p>Background:</p>
<ul>
<li>Read a saved journal with <code>journalctl --root=PATH</code></li>
<li>We considered setting something up that automatically does this when the node goes down (a service that's <code>WantedBy=shutdown.target</code>?) It has the advantage that it could work even if crunch-dispatch has trouble coordinating with the compute node, but:
<ul>
<li>The upload might take a while and we're not sure if systemd and/or the cloud provider would be patient enough to let it run</li>
<li>It would require us to permanently store credentials somewhere, which isn't insurmountable but something we generally avoid doing</li>
</ul></li>
</ul>