Idea #9318
Closed
[Workbench] Update Dashboard to show processes
Added by Brett Smith over 8 years ago. Updated over 8 years ago.
Description
We want to call work units without parents "processes" in the UI. These are deemed to be all pipeline instances, and container requests with requested_by=null. (Jobs without parents would ordinarily meet this definition too, but they're difficult to query and uncommon in practice, so we're going to ignore them.)
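As a rough illustration of that definition, the check could be sketched as below. The `Struct` types are hypothetical stand-ins for the API records, and `requested_by` is simply the field name used in this description; nothing here is the actual Workbench code.

```ruby
# Hypothetical stand-ins for API records; only the fields named in the
# description above are modelled.
PipelineInstance = Struct.new(:uuid)
ContainerRequest = Struct.new(:uuid, :requested_by)
Job = Struct.new(:uuid)

# A work unit counts as a "process" when it is a pipeline instance, or a
# container request that nothing else requested. Parentless jobs are
# deliberately ignored, as the description says.
def process?(work_unit)
  case work_unit
  when PipelineInstance then true
  when ContainerRequest then work_unit.requested_by.nil?
  else false
  end
end
```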
"Processes" is the term the CWL group is using, and it's reasonably neutral about it being a single-step thing, whether or not it's working toward an end goal vs. a running service, etc.
Update the Workbench Dashboard to show all processes, not just pipeline instances. The basic UI structure of the Dashboard should remain unchanged. There should be a pane of currently running processes, and a pane below that of recently completed processes, with the same basic information as displayed in the current Dashboard.
In order to show container information, you have to get the container record that's fulfilling a container request. We are still working to determine how that's going to happen.
"Run a pipeline" button can remain unchanged, because we don't have a better alternative right now.
Leave the "All pipeline instances" and "All jobs" buttons alone. They'll be dealt with in a separate story (see related).
Updated by Brett Smith over 8 years ago
- Subject changed from [Workbench] Update Dashboard to show root work units to [Workbench] Update Dashboard to show processes
- Description updated (diff)
Updated by Radhika Chippada over 8 years ago
- Status changed from New to In Progress
- Assigned To set to Radhika Chippada
- Target version set to 2016-06-08 sprint
Updated by Radhika Chippada over 8 years ago
Peter said we should also update the "Details" collapsible under "Busy nodes" in the "Compute and job status" pane of the dashboard. He said it should fetch Queued and Locked containers and display them as well.
Updated by Radhika Chippada over 8 years ago
Branch 9318-dashboard-uses-work-units implements the dashboard updates for "Active processes," "Recently finished processes" and "Compute and job status" panes.
It seems like we might want to add a queue_size method on the api server side for container as well, similar to job, to ensure the note "Note: some items in the queue are not visible to you" reflects the size more accurately.
Updated by Brett Smith over 8 years ago
Radhika Chippada wrote:
Branch 9318-dashboard-uses-work-units implements the dashboard updates for "Active processes," "Recently finished processes" and "Compute and job status" panes.
+1
It seems like we might want to add a queue_size method on the api server side for container as well, similar to job, to ensure the note "Note: some items in the queue are not visible to you" reflects the size more accurately.
We don't accurately report a job's queue_size anymore, because the performance was bad. I don't think we can do this until we figure out a performant way to report this information. I suspect we'll also want to do something more nuanced, because in Crunch v2 it's possible to have multiple dispatchers running for different purposes, making it even more difficult to predict when a container is going to run based just on the list of queued container requests.
Updated by Radhika Chippada over 8 years ago
Brett said:
We don't accurately report a job's queue_size anymore, because the performance was bad. I don't think we can do this until we figure out a performant way to report this information. I suspect we'll also want to do something more nuanced, because in Crunch v2 it's possible to have multiple dispatchers running for different purposes, making it even more difficult to predict when a container is going to run based just on the list of queued container requests.
In the meantime, can I just remove the code that is showing this text then?
Updated by Brett Smith over 8 years ago
Radhika Chippada wrote:
Brett said:
We don't accurately report a job's queue_size anymore, because the performance was bad. I don't think we can do this until we figure out a performant way to report this information. I suspect we'll also want to do something more nuanced, because in Crunch v2 it's possible to have multiple dispatchers running for different purposes, making it even more difficult to predict when a container is going to run based just on the list of queued container requests.
In the meantime, can I just remove the code that is showing this text then?
If you're in that code anyway, that sounds good, yeah.
Updated by Radhika Chippada over 8 years ago
Branch 9318-dashboard-uses-work-units at 1650c9ee is ready for review.
- One observation: The "Outputs" list displayed in the new version is different. If you are looking at 4xphq-d1hrv-ata5nivg2f9zheg in the dashboard, the new code will list 3 outputs whereas you will see 2 in staging. This is happening because the previous code is looking for "output_uuid" from component (rather than the :output on the job). However, this is not populated for the first job (hasher) and hence there are only 2 outputs listed. So, this issue is resolved with the new code. Unfortunately, I am unable to display a friendly name in the new version.
- Removed the queue_size display warning (note 8)
- Please review to make sure the logic around obtaining a container's children is correct.
- Please review to ensure the logic, sorting, and ordering of running_processes and finished_processes are correct.
- Some UI suggestions about the "Compute and job status"
- I think the panel title here should be "Compute nodes and Queued processes" or something like that?
- I think it would be easier on the eye if the title bar for the Queue table (the one that says Process Submitted Queued Priority) were at the top of the table rather than below the queued content rows.
- I also think we should left justify the Details collapse button. Right now it seems like it belongs to Busy nodes while in fact the Details table is listing Queued jobs.
Thanks.
Updated by Tom Clegg over 8 years ago
The running_processes and finished_processes methods have a lot of duplicated code. Perhaps the common parts could be moved to a separate function like
def get_processes pipeline_filters, job_filters, lim
...
end
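One possible shape for that shared helper, as a minimal sketch: the record type, the in-memory lists, and the state names are assumptions made for illustration, and the real methods would query the API server rather than filter arrays.

```ruby
# Hypothetical record type; the real Workbench models are not reproduced here.
WorkRecord = Struct.new(:kind, :uuid, :state, :created_at)

# Shared part: filter each source with its own predicate, merge the
# results, sort newest first, and keep at most `lim` items.
def get_processes(pipelines, containers, pipeline_filter, container_filter, lim)
  selected = pipelines.select(&pipeline_filter) + containers.select(&container_filter)
  selected.sort_by(&:created_at).reverse.first(lim)
end

# The two dashboard methods then differ only in their filters.
# The state names below are illustrative.
def running_processes(pipelines, containers, lim = 8)
  get_processes(pipelines, containers,
                ->(p) { p.state == "RunningOnServer" },
                ->(c) { c.state == "Running" },
                lim)
end

def finished_processes(pipelines, containers, lim = 8)
  done = %w[Complete Failed Cancelled]
  get_processes(pipelines, containers,
                ->(p) { done.include?(p.state) },
                ->(c) { done.include?(c.state) },
                lim)
end
```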
But more importantly, I don't think the "get container requests and containers" code here will hold up over time. "Get all top-level container requests" will retrieve thousands of records for old requests, even if none are currently queued/running. The basic problem seems to be that there is no cheap way to "get the last 8 container requests whose container is in a given state". But I don't think that's necessarily what we want anyway. What if we consolidate these two panes into "recent processes", and rely on the running/done/failed visual cues to classify them instead? That way, over time the processes will appear to change state (by changing color etc) rather than change type (by moving from one box to another)... which might even make more sense to users. IOW, we could show "what's happening with the last N processes you submitted" without expensive queries.
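A minimal sketch of the consolidated "recent processes" idea follows. It assumes each record type can be fetched with a cheap "newest N" query, and the struct, state names, and CSS class names are hypothetical.

```ruby
# Hypothetical record type for illustration only.
RecentItem = Struct.new(:uuid, :state, :created_at)

# Hypothetical mapping from state to the CSS class used as the visual cue.
STATE_CLASS = {
  "Running"  => "label-info",
  "Complete" => "label-success",
  "Failed"   => "label-danger",
}.freeze

# Each argument is assumed to hold the newest `n` records of its type
# already, so no state filter (and no expensive query) is needed.
def recent_processes(newest_pipelines, newest_container_requests, n)
  (newest_pipelines + newest_container_requests)
    .sort_by { |item| -item.created_at }
    .first(n)
end

# The state only changes how an item is drawn, not which pane it is in.
def state_class(item)
  STATE_CLASS.fetch(item.state, "label-default")
end
```

With this shape, a process that finishes keeps its position in the list and only its label changes.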
I agree about the "compute and job status" UI: it always looks confusing to me. I think the basic problem there is that it's trying to help you predict when your queued jobs will start by giving you a bunch of information, but it's totally unclear how you're supposed to get from "a bunch of information" to a prediction, or even that that's the intent.
I suspect it would be more useful to show just #busy and #idle nodes (with a link to /nodes page, maybe only for admins?) as a "cluster status" area. Instead of emphasizing distinctions like "queued things vs. not-queued things" we could emphasize the relationships between processes, e.g., "this pipeline is not done because it has 3 jobs waiting for resources, and 1 job still running".
Here's what our UI designer suggested:
https://dev.arvados.org/attachments/download/349/dashboard.jpg
Logic for getting children for a container -- looks correct.
nit: suggest using the work_unit method here:
- items << ContainerWorkUnit.new(c, crs[c.uuid])
+ items << c.work_unit(crs[c.uuid])
I think ContainerWorkUnit#outputs should be just "return [get(:output)]", not the outputs of the children. Thanks to pipeline instances, the work unit interface has to accommodate any number of outputs for a single process, so we have to call "outputs" everywhere when dealing with a work unit (i.e., delete the "output" method!) -- but a container provides only one output, regardless of what its "child" processes do.
I think the previous behavior was correct wrt output_uuid. Generally a pipeline has some components that produce "pipeline outputs" and some that produce "intermediate data". The "output_uuid" field indicates we're dealing with a "pipeline output". PipelineInstanceWorkUnit#outputs should not include intermediate data, so it should just look for output_uuid.
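To make the two `outputs` rules concrete, here is a simplified sketch. The classes are stand-ins with a `Sketch` suffix, not the real ContainerWorkUnit and PipelineInstanceWorkUnit, and the hash layout of `components` is an assumption.

```ruby
# Simplified stand-in for the container work unit discussed above.
class ContainerWorkUnitSketch
  def initialize(record)
    @record = record
  end

  def get(key)
    @record[key]
  end

  # A container provides exactly one output, whatever its children do.
  def outputs
    [get(:output)]
  end
end

# Simplified stand-in for the pipeline instance work unit.
class PipelineInstanceWorkUnitSketch
  def initialize(record)
    @record = record
  end

  # Only components carrying output_uuid are pipeline outputs; components
  # without it produced intermediate data and are skipped.
  def outputs
    @record.fetch(:components, {}).values.map { |c| c[:output_uuid] }.compact
  end
end
```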
Updated by Radhika Chippada over 8 years ago
- Target version changed from 2016-06-08 sprint to 2016-06-22 sprint
Updated by Radhika Chippada over 8 years ago
But more importantly, ... What if we consolidate these two panes into "recent processes"
Updated accordingly.
I agree about the "compute and job status" UI: it always looks confusing to me ... it would be more useful to show just #busy and #idle nodes (with a link to /nodes page, maybe only for admins?)
Removed the Queued job display and added an "All nodes" link that is shown when the user is an admin.
Here's what our UI designer suggested:
Not sure if you had any specific thoughts about it. Please see the latest and let me know if you have any other thoughts.
Logic for getting children for a container -- looks correct.
Ok
nit: suggest using the work_unit method here: items << ContainerWorkUnit.new(c, crs[c.uuid])
Done
I think ContainerWorkUnit#outputs should be just "return [get(:output)]", not the outputs of the children.
Done
Thanks to pipeline instances, the work unit interface has to accommodate any number of outputs for a single process, so we have to call "outputs" everywhere when dealing with a work unit (i.e., delete the "output" method!) -- but a container provides only one output, regardless of what its "child" processes do.
Deleted the :output method and used :outputs everywhere. This needed some reworking of the component detail partial etc. to be able to display any number of outputs. Refactored the "Outputs collapsible" code.
I think the previous behavior was correct wrt output_uuid. Generally a pipeline has some components that produce "pipeline outputs" and some that produce "intermediate data". The "output_uuid" field indicates we're dealing with a "pipeline output". PipelineInstanceWorkUnit#outputs should not include intermediate data, so it should just look for output_uuid.
Updated the :outputs logic for pipeline instances accordingly. Also, for a job_work_unit, I used :outputs from children if any, otherwise the job's :output
Updated tests to account for dashboard updates.
Updated by Tom Clegg over 8 years ago
I think JobWorkUnit#outputs should return only the job's own output, not its children's outputs, with the same reasoning as container outputs in note-11...?
OO habits -- t.andand.class == String should be t.is_a? String
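A small plain-Ruby illustration of the difference (the `Label` subclass is made up for the example): comparing classes exactly rejects subclasses, while `is_a?` accepts them. Neither form needs a nil guard, since `nil.class` is `NilClass` and `nil.is_a?(String)` is simply false.

```ruby
# Hypothetical String subclass, only here to show the difference.
class Label < String; end

values = ["plain", Label.new("sub"), nil, 42]

# Exact class comparison: true only for objects whose class is String itself.
exact_class = values.map { |t| t.class == String }

# is_a?: true for String and any subclass; nil and 42 are just false.
is_a_string = values.map { |t| t.is_a?(String) }
```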
Instead of counting nodes with active_nodes += 1, how about just testing if !nodes.any? below?
Everything else LGTM, thanks.
Updated by Tom Clegg over 8 years ago
Oh, one more thing -- can we change the title on the dashboard from "Recent processes" to "Recent pipelines and processes"? Might be an easier transition for users, in case any of them read the title.
Updated by Radhika Chippada over 8 years ago
I think JobWorkUnit#outputs should return only the job's own output, not its children's outputs, with the same reasoning as container outputs in note-11...?
Updated
OO habits -- t.andand.class == String should be t.is_a? String
Thanks. I just can't seem to remember it sometimes :)
Instead of counting nodes with active_nodes += 1, how about just testing if !nodes.any? below?
- The problem is not nodes.any? itself: we need to check each node's state and last_ping_at. I reshuffled the lookup and now provide the Details link only when there are active nodes.
- I also noticed that in this panel, we are displaying "Queued jobs" which was displaying "Job.queue_size". This is incomplete since it would not include queued container count. Per note 6 and our IRC conversation, it appears that this is not needed. Hence, I removed this column from this panel display
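For illustration, the "any active nodes" check can still be written with `any?` and a block instead of a counter. The node struct, the "down" state name, and the five-minute ping window below are assumptions, not the rule the branch actually uses.

```ruby
# Hypothetical node record with just the two fields mentioned above.
NodeSketch = Struct.new(:state, :last_ping_at)

# Assumption: a node is "active" when it is not down and has pinged
# within the last `window` seconds.
def active_node?(node, now, window = 300)
  !node.last_ping_at.nil? &&
    node.state != "down" &&
    (now - node.last_ping_at) <= window
end

# Show the Details link only when at least one node is active.
def show_details_link?(nodes, now)
  nodes.any? { |n| active_node?(n, now) }
end
```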
Oh, one more thing -- can we change the title on the dashboard from "Recent processes" to "Recent pipelines and processes"? Might be an easier transition for users, in case any of them read the title.
Done
Updated by Radhika Chippada over 8 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|commit:1c36703db22a4695f0a2aebaa3ffbd5d8d64997f.
Updated by Radhika Chippada over 8 years ago
Peter observed that the dashboard is incorrectly displaying Container objects instead of ContainerRequest objects. Since Container and ContainerRequest objects are sharing ContainerWorkUnit as implemented in #9372, this correction is being made in that branch.