Story #2883

Interactive job log browser to sort/select task logs for diagnostic and profiling purposes

Added by Tom Clegg about 5 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date: 06/11/2014
Due date:
% Done: 100%
Estimated time: (Total: 16.00 h)
Story points: 0.5

Description

The "log" link for a completed job should go to a (new) "log" tab on the "view job" page. (Perhaps /jobs/{uuid}#Log just needs some generic javascript for "on page load, if location anchor is a tab, hit that tab")
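The "on page load, if location anchor is a tab, hit that tab" idea could be sketched roughly as follows. This is only an illustration: the `data-tab-name` attribute and the function names `tabNameFromHash` and `activateTabFromHash` are hypothetical and do not come from the actual page markup, which would need its own selector.

```javascript
// Return the tab name matching a location hash, or null if none matches.
// Matching is case-insensitive, so "#Log" and "#log" select the same tab.
function tabNameFromHash(hash, tabNames) {
  const wanted = String(hash || '').replace(/^#/, '').toLowerCase();
  if (wanted === '') return null;
  const match = tabNames.find((name) => name.toLowerCase() === wanted);
  return match === undefined ? null : match;
}

// Browser wiring. ASSUMPTION: each tab link carries a data-tab-name
// attribute; this is hypothetical markup, not the real page structure.
function activateTabFromHash(doc, hash) {
  const links = Array.from(doc.querySelectorAll('[data-tab-name]'));
  const names = links.map((el) => el.getAttribute('data-tab-name'));
  const name = tabNameFromHash(hash, names);
  if (name === null) return false;
  links.find((el) => el.getAttribute('data-tab-name') === name).click();
  return true;
}

// Only hook into page load when running in a browser.
if (typeof document !== 'undefined') {
  document.addEventListener('DOMContentLoaded', () => {
    activateTabFromHash(document, window.location.hash);
  });
}
```

Keeping the hash-to-tab lookup separate from the DOM wiring means the lookup can be checked without a browser.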

The log tab should offer various views of the log data.
  • Overview
    • Obvious "N failed tasks" button (if any failed) that brings you to the "log messages" tab for all the failures
    • "123 tasks completed in 1 hour 12 minutes, using 3 nodes"
    • "3 hours 8 minutes of worker node usage, plus 28 minutes idle" ("idle" = node allocated but no tasks running there)
    • Start time, finish time
    • #complete, #incomplete, #failures. (complete + incomplete = total; failures are counted per attempt, so a task can fail twice, succeed on the third attempt, and still count as complete.)
    • For each category (completed task IDs / incomplete task IDs / failures), show min/max runtime.
  • Time chart (lower priority)
    • Perhaps like the chrome debug panel's view of network activity (attached screen shot), but more compact.
    • Or, perhaps a vertical view with y=time(down), x=node -- i.e., idle nodes show up as empty areas.
  • Messages from selected task(s)
    • Just the setup/teardown/stderr messages relating to specified task(s)
    • Sort by task qsequence, then log timestamp
    • Drop-down to select "all tasks", or "all failures", "all successes", or a single task (show sequence, qsequence, and uuid in the drop-down items)
      • Showing 1000 task IDs in a drop-down won't work well. (Either limit the drop-down to the first hard-coded N tasks, or don't even offer the "specified task" option, or some other reasonable strategy.)
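As a rough sketch of two of the views listed above, the overview counts and the "sort by task qsequence, then log timestamp" ordering might look like this. The record shapes (`taskId`, `success`, `qsequence`, `timestamp`) and the function names are assumptions made for illustration, not the data model the log viewer actually uses.

```javascript
// Count tasks and attempts for the overview.
// ASSUMPTION: each attempt record looks like { taskId, success }.
function summarizeAttempts(attempts) {
  const seen = new Set();
  const completed = new Set();
  let failures = 0;
  for (const attempt of attempts) {
    seen.add(attempt.taskId);
    if (attempt.success) completed.add(attempt.taskId);
    else failures += 1; // failures are per attempt, not per task
  }
  return {
    total: seen.size,
    complete: completed.size,
    incomplete: seen.size - completed.size,
    failures,
  };
}

// Sort log rows by task qsequence, then by log timestamp.
// ASSUMPTION: rows look like { qsequence, timestamp, message },
// with timestamp as a number (e.g. milliseconds).
function sortByTaskThenTime(rows) {
  return rows.slice().sort(
    (a, b) => a.qsequence - b.qsequence || a.timestamp - b.timestamp
  );
}
```

With these definitions, a single task that fails twice and then succeeds contributes 1 to `complete` and 2 to `failures`, so complete + incomplete still equals the number of tasks.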

Parsing should probably happen on the client side. (First iteration should just deal with a static log file but future work will include updating this page as new logs arrive by websocket. And the raw log needs to show up on the client side anyway, so might as well do everything from there?)
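A minimal sketch of client-side parsing for a static log might look like the following. The line layout shown in the comment is an assumption for illustration only, as are the UTC interpretation of timestamps and the names `parseLogLine` and `parseLog`. One detail that is certain: JavaScript's `Date.UTC` takes a zero-based month index, which makes timestamp conversion an easy place for an off-by-one-month slip.

```javascript
// ASSUMED line layout (hypothetical, adjust to the real log format):
//   YYYY-MM-DD_HH:MM:SS <job-uuid> <pid> <task-sequence> <message...>
const LOG_LINE =
  /^(\d{4})-(\d{2})-(\d{2})_(\d{2}):(\d{2}):(\d{2}) (\S+) (\d+) (\d+) (.*)$/;

// Split one log line into columns; returns null if the line does not match.
function parseLogLine(line) {
  const m = LOG_LINE.exec(line);
  if (m === null) return null;
  const [, y, mo, d, h, mi, s, jobUuid, pid, task, message] = m;
  return {
    // Date.UTC expects a zero-based month, hence the "- 1".
    // ASSUMPTION: log timestamps are in UTC.
    timestamp: Date.UTC(+y, +mo - 1, +d, +h, +mi, +s),
    jobUuid,
    pid: Number(pid),
    taskSequence: Number(task),
    message,
  };
}

// Parse a whole static log, skipping lines that do not match the layout.
function parseLog(text) {
  return text
    .split(/\r?\n/)
    .map(parseLogLine)
    .filter((row) => row !== null);
}
```

Because `parseLogLine` works on one line at a time, the same function could later be fed lines as they arrive, rather than only a complete static file.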

Screenshot from 2014-06-04 12_57_18.png (6.6 KB) Tom Clegg, 06/04/2014 12:57 PM
Screenshot from 2014-06-16 09_41_24.png (184 KB) (showing sort-by-task secondary sort key bug) Tom Clegg, 06/16/2014 09:42 AM

Subtasks

Task #2911: Add buttons to sort log rows (Resolved, Peter Amstutz)

Task #3008: Review 2883-job-log-viewer (Resolved, Peter Amstutz)

Task #2910: Split job log into columns (Resolved, Peter Amstutz)

Associated revisions

Revision 4f1085f3
Added by Peter Amstutz about 5 years ago

Merge branch 'origin-2883-job-log-viewer' closes #2883

Revision 2fd0eba4
Added by Peter Amstutz about 5 years ago

Merge branch 'origin-2883-job-log-viewer' closes #2883 refs #3027

Revision bfaad44c (diff)
Added by Tom Clegg about 5 years ago

Fix off-by-one-month in timestamp conversion. refs #2883

History

#1 Updated by Ward Vandewege about 5 years ago

  • Project changed from Umbrella Project to Arvados

#2 Updated by Peter Amstutz about 5 years ago

  • Assigned To set to Peter Amstutz

#4 Updated by Tom Clegg about 5 years ago

  • Description updated (diff)

#5 Updated by Brett Smith about 5 years ago

Reviewing a441b04

My only request is for consistent indentation in addToLogViewer. I know indentation in ERB is kind of nightmarish, and I don't think we should expect perfect consistency across a document. But I think a JavaScript function can be internally consistent—right now a dedent makes an if block look like it's closed before it actually is.

With that addressed, I think this is good to merge. Thanks.

#6 Updated by Anonymous about 5 years ago

  • Status changed from New to Resolved
  • % Done changed from 67 to 100

Applied in changeset arvados|commit:4f1085f353d44600643a8e9dd6b43a39131e7946.

#7 Updated by Peter Amstutz about 5 years ago

  • Status changed from Resolved to In Progress

#8 Updated by Tom Clegg about 5 years ago

  • Description updated (diff)

#10 Updated by Peter Amstutz about 5 years ago

<tomclegg_> A task ends up either complete or incomplete.
<tomclegg_> An attempt to complete a task ends in either success or failure.

#11 Updated by Peter Amstutz about 5 years ago

Rejected/not addressed tasks:

"3 hours 8 minutes of worker node usage, plus 28 minutes idle" ("idle" = node allocated but no tasks running there)
For each category (completed task IDs / incomplete task IDs / failures), show min/max runtime.
Time chart (lower priority)

Please review that the remaining tasks are completed.

#12 Updated by Tom Clegg about 5 years ago

  • Target version changed from 2014-06-17 Curating and Crunch to 2014-07-16 Sprint

#13 Updated by Tom Clegg about 5 years ago

  • Story points changed from 2.0 to 0.5

#14 Updated by Brett Smith about 5 years ago

Reviewing 715a760

I don't see the “Drop-down to select… a single task (show sequence, qsequence, and uuid in the drop-down items)” or equivalent. Given some of the follow-up discussion in the description, that's not especially surprising, but am I missing something? Was this scoped out too?

After I sort by node or task, trying to go back to sort by time is ineffective; nothing in the log view changes.

#15 Updated by Peter Amstutz about 5 years ago

The drop-down to select a specific task was scoped out: the obvious UI of a drop-down menu seemed like it would get unwieldy when you have a thousand tasks. Also, it is not that hard to find a specific task by sorting by task.

On further consideration, an alternate UI that might work would be a text box where the user can type in a comma-separated list or a range of task numbers that they want to filter on. I'll add this idea to story #3022.
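To illustrate the text-box idea, here is a rough sketch of parsing input such as "1, 3, 5-8" into a set of task numbers and filtering rows with it. It is not the design from story #3022: the function names, the `taskSequence` row field, and the choice to treat an empty filter as "show everything" are all assumptions.

```javascript
// Parse input such as "1, 3, 5-8" into a Set of task numbers.
// Throws on anything that is not a number or a low-high range.
function parseTaskFilter(input) {
  const selected = new Set();
  for (const part of input.split(',')) {
    const token = part.trim();
    if (token === '') continue; // tolerate stray commas and blank input
    const m = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(token);
    if (m === null) throw new Error('Invalid task filter: ' + token);
    const low = Number(m[1]);
    const high = m[2] === undefined ? low : Number(m[2]);
    if (high < low) throw new Error('Invalid range: ' + token);
    for (let n = low; n <= high; n += 1) selected.add(n);
  }
  return selected;
}

// Keep only rows whose task number is selected.
// ASSUMPTION: rows look like { taskSequence, ... }; an empty filter
// is treated as "show all rows".
function filterRowsByTask(rows, input) {
  const selected = parseTaskFilter(input);
  if (selected.size === 0) return rows.slice();
  return rows.filter((row) => selected.has(row.taskSequence));
}
```

A text box avoids rendering one drop-down item per task, so the cost of the control does not grow with the number of tasks in the job.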

Sorting is fixed in 5a4863dc. I broke it while refactoring to batch-add rows to the list object: every row was being assigned a row id of '1'. Whoops. Good catch!

#16 Updated by Brett Smith about 5 years ago

Peter Amstutz wrote:

The drop-down to select a specific task was scoped out: the obvious UI of a drop-down menu seemed like it would get unwieldy when you have a thousand tasks. Also, it is not that hard to find a specific task by sorting by task.

On further consideration, an alternate UI that might work would be a text box where the user can type in a comma-separated list or a range of task numbers that they want to filter on. I'll add this idea to story #3022.

Sounds good to me.

The "Select All" button changes the second radio selection to "Show failed tasks." This surprised me, since the text means I don't expect the button to impose any limits on the view. I feel like it should either leave the radio selection alone, or select "Show all tasks."

#17 Updated by Peter Amstutz about 5 years ago

Brett Smith wrote:

The "Select All" button changes the second radio selection to "Show failed tasks." This surprised me, since the text means I don't expect the button to impose any limits on the view. I feel like it should either leave the radio selection alone, or select "Show all tasks."

Fixed in 143165af

#18 Updated by Brett Smith about 5 years ago

  • Target version changed from 2014-07-16 Sprint to 2014-06-17 Curating and Crunch

Thanks. I think this is good to merge.

#19 Updated by Brett Smith about 5 years ago

  • Target version changed from 2014-06-17 Curating and Crunch to 2014-07-16 Sprint

#20 Updated by Anonymous about 5 years ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados|commit:2fd0eba4e138bd9cadbdf03ea2ca37bbc3f87f24.
