Feature #15087

[Workbench] Show number of queued containers on dashboard (instead of busy/idle nodes)

Added by Tom Clegg 5 months ago. Updated 3 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Workbench
Target version:
Start date: 06/14/2019
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: 1.0
Release relationship: Auto

Description

Background: Currently Workbench1 has "busy/idle nodes" counters on the dashboard, but they stop working or disappear if the deprecated crunch1 services are not running. This issue suggests a low-cost way to maintain some semblance of an "is anything happening?" indicator on Workbench after migrating to crunch2.

Feature: On the Workbench1 dashboard, if crunch2 is enabled, show
  • the number of containers (visible to the current user) that have state=Locked, or state=Queued and priority>0
  • the number of containers (visible to the current user) that have state=Running
  • time since the earliest start time of any running container (visible to the current user)
  • how long the oldest visible queued container has been waiting
Benefits:
  • Easy to implement [1] in Workbench in a way that works with all dispatch setups
  • Corresponds to reasonable user expectations ("it shouldn't take 2 hours to start a container")
Shortcomings:
  • "Lots of other users' containers are queued ahead of yours" looks identical to "nothing is running at all" (assuming user is not admin)
  • "Cluster is at capacity, with long-running containers" looks identical to "cluster is unable to run anything at all"
  • Doesn't take advantage of the metrics we are (or could be) tracking in arvados-dispatch-cloud, such as recent queued-to-starting delays and the number of busy/idle/booting cloud instances.

[1] Assuming we aren't too picky about the definition of "oldest": currently we don't record how long a container has been ready to run, only when it was created (after which it might have spent a long time at priority=0) and when it was last modified (which might merely reflect its priority being raised long after it was ready to run).


Subtasks

Task #15325: Review 15087-wb-queued-containers (Resolved, Peter Amstutz)


Related issues

Related to Arvados - Bug #15036: [Crunch2] Idle/busy node count not accurate if crunch1 not running (Duplicate)

Related to Arvados - Bug #15014: [Workbench] Hide busy/idle nodes display when crunch1 is not active (In Progress)

Blocks Arvados - Story #15133: Remove crunch v1 (jobs api) (Resolved, 08/08/2019)

Associated revisions

Revision bf9803ee
Added by Peter Amstutz 3 months ago

Merge branch '15087-wb-queued-containers' closes #15087

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <>

History

#1 Updated by Tom Clegg 5 months ago

  • Description updated (diff)

#2 Updated by Tom Clegg 5 months ago

  • Description updated (diff)

#3 Updated by Tom Clegg 5 months ago

  • Target version changed from To Be Groomed to Arvados Future Sprints
  • Story points set to 1.0

#4 Updated by Peter Amstutz 5 months ago

  • Related to Story #15133: Remove crunch v1 (jobs api) added

#5 Updated by Tom Clegg 4 months ago

  • Related to Bug #15036: [Crunch2] Idle/busy node count not accurate if crunch1 not running added

#6 Updated by Tom Clegg 4 months ago

  • Related to Bug #15014: [Workbench] Hide busy/idle nodes display when crunch1 is not active added

#7 Updated by Tom Morris 4 months ago

  • Related to deleted (Story #15133: Remove crunch v1 (jobs api))

#8 Updated by Tom Morris 4 months ago

#9 Updated by Peter Amstutz 4 months ago

  • Target version changed from Arvados Future Sprints to 2019-06-19 Sprint
  • Assigned To set to Peter Amstutz

#10 Updated by Ward Vandewege 3 months ago

  • Release set to 22

#11 Updated by Tom Clegg 3 months ago

  • Description updated (diff)

#12 Updated by Peter Amstutz 3 months ago

Do we actually want the age of the oldest queued container, or the age of the container at the top of the queue (which would be the highest-priority one)?

#13 Updated by Peter Amstutz 3 months ago

15087-wb-queued-containers @ 6b9eafb0de63da57e7b1a3945e7d16823e1c25df

Adds a new panel displaying pending/running container counts, the age of the oldest container, and the longest-running container.

#14 Updated by Lucas Di Pentima 3 months ago

Tried it locally by running some simple jobs on arvbox, seems to work ok! Just one comment:

#16 Updated by Peter Amstutz 3 months ago

15087-wb-queued-containers @ 466ae87ff214d37c0765ee64845941adcbae8af4

Added links to the oldest / longest running container. Tweaked the layout. Re-running tests:

https://ci.curoverse.com/view/Developer/job/developer-run-tests/1315/

#17 Updated by Lucas Di Pentima 3 months ago

Latest updates LGTM, thanks.

#18 Updated by Peter Amstutz 3 months ago

  • % Done changed from 0 to 100
  • Status changed from New to Resolved
