Bug #3782

[Workbench] Browser prompts about long-running JavaScript when trying to view a large Job log in viewer

Added by Brett Smith almost 5 years ago. Updated almost 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Tim Pierce
Category:
Workbench
Target version:
Start date:
09/02/2014
Due date:
% Done:

100%

Estimated time:
(Total: 0.50 h)
Story points:
0.5

Description

I tried to view a large Job log (qr1hi-8i9sb-zyva8qlbfmjsnse—it takes 574 pages in Workbench). While it was loading, Firefox repeatedly threw up its prompt about long-running JavaScript:

A script on this page may be busy, or it may have stopped responding. You can stop the script now, or you can continue to see if the script will complete.

The script does eventually finish if you let it continue to completion, but this requires selecting "Continue" many times, or selecting "Don't ask me again" then "Continue." I think we should refactor the viewer JavaScript so users don't have to figure this out.


Subtasks

Task #4034: Review 3782-stub-file-contentResolvedTom Clegg

Task #4017: Use ?size= URL parameter to limit job log dataResolvedTim Pierce

Associated revisions

Revision 359d96e3
Added by Tom Clegg almost 5 years ago

Merge branch '3782-stub-file-content' refs #3782

Revision a8bd120b
Added by Tim Pierce almost 5 years ago

Merge branch '3782-large-job-logs'

Closes #3782.

Revision c6b14938
Added by Tom Clegg almost 5 years ago

Merge branch '3782-stub-io-pipe' refs #3782

Revision 8cfc0583 (diff)
Added by Tom Clegg almost 5 years ago

Resolve race condition: wait for response before checking mock expectations. refs #3782

History

#1 Updated by Brett Smith almost 5 years ago

  • Subject changed from [Workbench] Browser warns about long-running JavaScript when trying to view a large log to [Workbench] Browser prompts about long-running JavaScript when trying to view a large Job log

#2 Updated by Brett Smith almost 5 years ago

  • Category set to Workbench

#3 Updated by Ward Vandewege almost 5 years ago

  • Target version set to Arvados Future Sprints

#4 Updated by Peter Amstutz almost 5 years ago

  • Subject changed from [Workbench] Browser prompts about long-running JavaScript when trying to view a large Job log to [Workbench] Browser prompts about long-running JavaScript when trying to view a large Job log in viewer

#5 Updated by Ward Vandewege almost 5 years ago

  • Story points set to 0.5

#6 Updated by Peter Amstutz almost 5 years ago

Desire quick workaround by truncating log to some reasonable size.

#7 Updated by Tim Pierce almost 5 years ago

This bug is also hitting Bryan on job https://workbench.qr1hi.arvadosapi.com/jobs/qr1hi-8i9sb-wxtrxrcq73ymexm which has 957 log entries in the database. In this case it actually crashes a Chrome browser tab, so fixing it is kind of urgent.

#8 Updated by Tim Pierce almost 5 years ago

3782-large-job-logs includes a workaround for this (display only up to 1MB of post-pipeline logs in the log viewer pane). No test, but I can try to rig up an integration test that uses KEEP_LOCAL_STORE to deliver a log.

This workaround will temporarily relieve the problem, but it doesn't seem like a good general solution. Ideally the viewer will deliver the last 1MB of logs to the user. It's not clear to me whether there's any way for a user to effectively review hundreds of megabytes of log data, so it may not be worth putting more effort into a solution than that.

#9 Updated by Tim Pierce almost 5 years ago

  • Target version changed from Arvados Future Sprints to 2014-10-08 sprint

#10 Updated by Tim Pierce almost 5 years ago

  • Assigned To set to Tim Pierce
  • Status changed from New to In Progress

#11 Updated by Tim Pierce almost 5 years ago

Per code review: Collection.show_file is used to fetch any kind of file from a collection, so the default should continue to be to fetch the entire file if possible.

So instead in 070ca0d: app/views/jobs/_show_log.html adds a ?size=1000000 parameter when fetching the log for the job, and Collection.show_file stops reading from arv-get after this limit is reached.

#12 Updated by Peter Amstutz almost 5 years ago

Tested by setting the log truncate down to 1000 bytes.

One problem: since the log viewer is parsing the log lines, it ignores things that don't look like crunch log lines. So the text "Log truncated ..." does not show up in the log viewer. Solution: prefix the truncated message with the necessary columns to pass muster with the log viewer regex.

#13 Updated by Tim Pierce almost 5 years ago

Peter Amstutz wrote:

Tested by setting the log truncate down to 1000 bytes.

One problem: since the log viewer is parsing the log lines, it ignores things that don't look like crunch log lines. So the text "Log truncated ..." does not show up in the log viewer. Solution: prefix the truncated message with the necessary columns to pass muster with the log viewer regex.

The real problem here is that if the log is truncated in the middle of a line, so that the log viewer regex won't parse it, the log viewer gives up and doesn't process any more of it. So nothing we do will help at that point.

We can make the truncated log produce a parsable line if it was terminated on a line boundary, so I've done that at 641a04f. Is there a better solution I'm missing?

#14 Updated by Tom Clegg almost 5 years ago

I'd like to see the log-viewer-specific code stay in the log viewer rather than the "download file" handler. For example:
  • Remove "append some log-format text" code from CollectionsController.
  • In the log viewer, if (bytes_received < known_log_file_size), add a "log truncated" message in the appropriate format.
  • If it's really inconvenient to propagate the known_log_file_size information to the JavaScript side, if (bytes_received == max_bytes_we_always_ask_for) will eventually produce a false positive but most of the time will be correct.
Also, I think using the "size" parameter to mean "truncate" is misleading. It looks too much like "I think the file size is N" to me.
  • I suggest we use something closer to (a subset of) the HTTP Range format, like bytes=0-999999/*, even if the byte offset is required to be 0 and the total size is required to be "*" (just to limit scope creep). This makes it more obvious what is being requested and provided, and it can be extended in the future to satisfy other classes of Range requests, accept the same parameter via standard HTTP header rather than query string, etc.

Should add a test case too, I think.

#15 Updated by Tim Pierce almost 5 years ago

Updated at 521df5c: moved log-specific checks to _show_log.html.erb. CollectionsController.FileStreamer now delivers either the entire file or exactly opts[:maxbytes] bytes, whichever comes first. Any log truncation is reported in the log-viewer-overview pane (where "empty collection" is normally reported).

Added Rails.configuration.log_viewer_max_bytes just for good measure.

Suggestions for how best to test this? I think this requires the test harness to run workbench with KEEP_LOCAL_STORE set, and to populate that directory with log collections that match API test fixtures. Any suggestions for an easier test?

Tests that would be good to exercise:
  • log viewer called on a log that is shorter than Rails.configuration.log_viewer_max_bytes
  • log viewer called on a log that is longer than Rails.configuration.log_viewer_max_bytes
  • Collection.show_file called with:
    • "Range: 0-bignum"
    • "Range: 0-smallnum"
    • "?size=bignum"
    • "?size=smallnum"
    • "Range: 0-bignum" and "?size=smallnum"
    • "Range: 0-smallnum" and "?size=bignum"

#16 Updated by Peter Amstutz almost 5 years ago

By the way, you could use "206 partial content" status code to distinguish between "log truncated" and "log is exactly 10000 bytes"

#17 Updated by Tom Clegg almost 5 years ago

Suggestions:
  • Servers "should" respond 206 to requests with Range headers, and must not respond 206 otherwise -- so we should set status=206 in FileStreamer when we receive (and do not ignore) a Range header.
  • When responding 206 we must include a header like Content-Range: 0-999999/1324567
  • The reliable way for the client to determine whether the log is truncated is "if status is 206 and Content-Range: 0-a/b and a+1 < b

#18 Updated by Tom Clegg almost 5 years ago

Rather than giving an "arv-get" command, how about:
  • Always link to the log collection page ("View or download entire log file") -- perhaps at the top, after the summary/timing sentence? This is useful even when the log isn't truncated.
  • In #log-viewer-overview mention that only the first X bytes of the log are shown here, and the timing summary is not available.

Since you (pretty much) asked for JavaScript nitpicks:

This would be more consistent with a trailing semicolon:
  • +    var log_maxbytes = <%= Rails.configuration.log_viewer_max_bytes %>
    

Thank you for adding the configuration knob using the approach described at Hacking Workbench. :)

#19 Updated by Tom Clegg almost 5 years ago

3782-stub-file-content @ 5a341ad isolates the file_enumerator stub to one test at a time.

#20 Updated by Tim Pierce almost 5 years ago

Tom Clegg wrote:

3782-stub-file-content @ 5a341ad isolates the file_enumerator stub to one test at a time.

Nice!

Can we go one step better, and put the arv-get call in its own method, and stub out that method? The stub could return a StringIO object which reads from a file. That way we can test the whole stack all the way down to the arv-get call, which is the piece that's hardest to exercise in a functional test (and arguably shouldn't be part of the functional test at all).

#21 Updated by Tim Pierce almost 5 years ago

Committed at 3cdc055:

  • CollectionsController.show_file, when the request has a Range header:
    • returns a 206 status
    • supplies a Content-Range header listing the byte range returned
  • _show_log.html.erb
    • Displays a link to download the full job log
    • Reports truncated logs only if the return status is 206 and the Content-Range indicates that the end of the log is not included

#22 Updated by Tom Clegg almost 5 years ago

Tim Pierce wrote:

Tom Clegg wrote:

3782-stub-file-content @ 5a341ad isolates the file_enumerator stub to one test at a time.

Nice!

Can we go one step better, and put the arv-get call in its own method, and stub out that method? The stub could return a StringIO object which reads from a file. That way we can test the whole stack all the way down to the arv-get call, which is the piece that's hardest to exercise in a functional test (and arguably shouldn't be part of the functional test at all).

Yes, I've started to go down that road, but I haven't quite nailed it yet. Stubbing with static content was pretty easy, but politely stubbing a kernel method that accepts a block is (so far) less obvious, and I was hoping to do this in a way that's less obtrusive to the code itself.

Does this branch look OK to merge in the meantime? At least it isolates the stubbing behavior to individual test cases instead of monkeypatching the whole controller class, which seems like a good increment.

#23 Updated by Tim Pierce almost 5 years ago

Tom Clegg wrote:

Tim Pierce wrote:

Tom Clegg wrote:

3782-stub-file-content @ 5a341ad isolates the file_enumerator stub to one test at a time.

Nice!

Can we go one step better, and put the arv-get call in its own method, and stub out that method? The stub could return a StringIO object which reads from a file. That way we can test the whole stack all the way down to the arv-get call, which is the piece that's hardest to exercise in a functional test (and arguably shouldn't be part of the functional test at all).

Yes, I've started to go down that road, but I haven't quite nailed it yet. Stubbing with static content was pretty easy, but politely stubbing a kernel method that accepts a block is (so far) less obvious, and I was hoping to do this in a way that's less obtrusive to the code itself.

It's easy to rewrite FileStreamer.each to use the IO object directly instead of using a block. The trick is that, the way the code is organized, it's still difficult to put the stub in the right place (since the FileStreamer isn't instantiated until it's time to start downloading, there's nothing to stub).

If we could put an arv_get_pipe method in the CollectionsController that returns an IO object, that could be stubbed easily, but I wasn't able to figure out how to call it from the FileStreamer.

Does this branch look OK to merge in the meantime? At least it isolates the stubbing behavior to individual test cases instead of monkeypatching the whole controller class, which seems like a good increment.

Yes, I like this approach. Good to merge.

#24 Updated by Tim Pierce almost 5 years ago

New commit available at 34e461f:

This fixes a fun bug. The log viewer never checked the actual size of the log file (as reported in the manifest) against log_viewer_max_bytes. So it sent a Range header with every request, and never included the expected log size, so the collections controller could only respond with HTTP 206 "Content-Range: 0-1000000/*" (i.e. "here are the first million bytes of that unknown-sized file you asked for, sir") and the viewer always reported that the log was truncated.

The fix: only send a Range header if the log size in the manifest exceeds log_viewer_max_bytes.

Also included: integration tests for retrieving the full log and retrieving a partial log. If the approach to the integration test seems okay, it should be easy to write additional tests here.

#25 Updated by Tom Clegg almost 5 years ago

At 34e461f...

Tim Pierce wrote:

Committed at 3cdc055:

  • CollectionsController.show_file, when the request has a Range header:
    • returns a 206 status
    • supplies a Content-Range header listing the byte range returned
  • _show_log.html.erb
    • Displays a link to download the full job log
    • Reports truncated logs only if the return status is 206 and the Content-Range indicates that the end of the log is not included
apps/workbench/app/views/jobs/_show_log.html.erb
  • has whitespace errors (tabs)
  • could use $("#log-viewer-download-pane").show() instead of $("#log-viewer-download-pane").css('display', 'inline');
apps/workbench/app/controllers/collections_controller.rb
  • has an unused bytecount variable
  • has never been through its while bytesleft > 0... loop more than once if I'm following our test regime correctly. But I don't see any problems, even after staring at it, so it must be OK.
  • bytesleft > 0 && buf = io.read(bytesleft) makes a GCC warning in my head. But again I think it's OK.

That test setup/teardown/KEEP_LOCAL_STORE stuff is pretty icky but it sure is better than no test. (I'm still volunteering to stub the arv-get pipe, so I'll clean this up when I get there. It looks like I should copy the "reset config after each test case" code from API server as well.)

The new fixtures would be slightly less mysterious (possibly even more helpful) if the job_with_real_log's uuid were changed to zzzzz-8i9sb-abcdefghijklmno to match its log collection's filename and content.

Probably not worth changing, but FYI: assert_selector does what you usually want when you say "wait_for_ajax; assert page.has_text?" -- i.e., if necessary, wait for stuff to appear.

#26 Updated by Tim Pierce almost 5 years ago

New revision at 4ff61dd:

Tom Clegg wrote:

apps/workbench/app/views/jobs/_show_log.html.erb
  • has whitespace errors (tabs)
  • could use $("#log-viewer-download-pane").show() instead of $("#log-viewer-download-pane").css('display', 'inline');

That's a nice tip, thanks. Both done.

apps/workbench/app/controllers/collections_controller.rb
  • has an unused bytecount variable
  • has never been through its while bytesleft > 0... loop more than once if I'm following our test regime correctly. But I don't see any problems, even after staring at it, so it must be OK.
  • bytesleft > 0 && buf = io.read(bytesleft) makes a GCC warning in my head. But again I think it's OK.

Removed unused bytecount.

I agree, the while loop is terrible style. Rewritten as bytesleft > 0 && (buf = io.read(bytesleft)) != nil. Let me know if this is not a helpful improvement.

Writing a stub that explicitly trickles out data so we can test multiple trips through the while loop would be pretty snazzy! I really need to close this ticket out, but I'd be happy to try to squeeze that test in at the end of this sprint.

That test setup/teardown/KEEP_LOCAL_STORE stuff is pretty icky but it sure is better than no test. (I'm still volunteering to stub the arv-get pipe, so I'll clean this up when I get there. It looks like I should copy the "reset config after each test case" code from API server as well.)

Sounds great. Happy to take ideas on better refactoring the Keep local store stuff. I tried putting the test Keep data in a fake fixture but rake didn't like that idea too much. (We could put it in a .yml file somewhere outside of the fixture directory...)

The new fixtures would be slightly less mysterious (possibly even more helpful) if the job_with_real_log's uuid were changed to zzzzz-8i9sb-abcdefghijklmno to match its log collection's filename and content.

Good catch, done. (Well, done the other way around, but whatever.)

Probably not worth changing, but FYI: assert_selector does what you usually want when you say "wait_for_ajax; assert page.has_text?" -- i.e., if necessary, wait for stuff to appear.

I tried playing around with assert_selector here but it seems to require the assertion to specify the full text of what appears in the selector, and in some of these cases that seemed both too cumbersome and too fragile to me. Happy to reconsider.

#27 Updated by Tom Clegg almost 5 years ago

Tim Pierce wrote:

apps/workbench/app/controllers/collections_controller.rb
  • has an unused bytecount variable
  • has never been through its while bytesleft > 0... loop more than once if I'm following our test regime correctly. But I don't see any problems, even after staring at it, so it must be OK.
  • bytesleft > 0 && buf = io.read(bytesleft) makes a GCC warning in my head. But again I think it's OK.

Removed unused bytecount.

I agree, the while loop is terrible style. Rewritten as bytesleft > 0 && (buf = io.read(bytesleft)) != nil. Let me know if this is not a helpful improvement.

New version also OK, yup.

Writing a stub that explicitly trickles out data so we can test multiple trips through the while loop would be pretty snazzy! I really need to close this ticket out, but I'd be happy to try to squeeze that test in at the end of this sprint.

Nope, not asking you to do anything here.

That test setup/teardown/KEEP_LOCAL_STORE stuff is pretty icky but it sure is better than no test. (I'm still volunteering to stub the arv-get pipe, so I'll clean this up when I get there. It looks like I should copy the "reset config after each test case" code from API server as well.)

Sounds great. Happy to take ideas on better refactoring the Keep local store stuff. I tried putting the test Keep data in a fake fixture but rake didn't like that idea too much. (We could put it in a .yml file somewhere outside of the fixture directory...)

"I'm still volunteering" translation: "You're off the hook."

The new fixtures would be slightly less mysterious (possibly even more helpful) if the job_with_real_log's uuid were changed to zzzzz-8i9sb-abcdefghijklmno to match its log collection's filename and content.

Good catch, done. (Well, done the other way around, but whatever.)

Ah, I suggested that way around to make it a one-line fix without updating any hashes. You'll never get those 3 minutes of your life back.

Probably not worth changing, but FYI: assert_selector does what you usually want when you say "wait_for_ajax; assert page.has_text?" -- i.e., if necessary, wait for stuff to appear.

I tried playing around with assert_selector here but it seems to require the assertion to specify the full text of what appears in the selector, and in some of these cases that seemed both too cumbersome and too fragile to me. Happy to reconsider.

Nope, not worth changing. Maybe next time around Hacking Workbench will give advice (I think you can use a regexp).

Please merge, thanks. Ha ha, just kidding! I think you need to update that effing fragile FUSE test that fails when we fiddle with the fixtures.

  • ======================================================================
    FAIL: runTest (tests.test_mount.FuseSharedTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/tom/src/arvados/services/fuse/tests/test_mount.py", line 270, in runTest
        ], d2)
    AssertionError: Lists differ: ['A Project', 'Empty collectio... != ['A Project', 'Empty collectio...
    
    Second list contains 1 additional elements.
    First extra element 11:
    real_log_collection
    

#28 Updated by Tim Pierce almost 5 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 67 to 100

Applied in changeset arvados|commit:a8bd120b4b5056e7a688e6ce4a60c1251fe7cb0f.

Also available in: Atom PDF