Project

General

Profile

Actions

Feature #20219

closed

Log panel on container view fetches live logs using periodic range request on new container log endpoint

Added by Peter Amstutz over 1 year ago. Updated about 1 year ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Workbench2
Target version:
Story points:
3.0
Release relationship:
Auto

Description

https://dev.arvados.org/projects/arvados/wiki/Efficient_live_access_to_container_logs

Stop getting things from the log table.

Get the list of files in the container log collection (via WebDAV):

/arvados/v1/containers/{uuid}/log

Then for the logs the user wants to view, get the data for the individual logs:

/arvados/v1/containers/{uuid}/log/{file}.txt

If the amount of data exceeds some limit (eg 128kb) then only fetch the first 64kb and last 64kb and include a message that some logs were omitted.

Periodically poll to see if the file sizes get bigger and request more data when it is available.


Subtasks 2 (0 open2 closed)

Task #20498: Review 20219-log-apiResolvedStephen Smith07/28/2023Actions
Task #20834: Review 20219-log-apiResolvedLucas Di Pentima07/28/2023Actions

Related issues

Related to Arvados Epics - Idea #16442: Scalable + reliable container loggingResolved03/15/202308/31/2023Actions
Related to Arvados - Bug #20036: Process view log component fetches process logs from Keep when the logs table logs have been purgedClosedActions
Blocked by Arvados - Feature #19889: access current container logs at /arvados/v1/containers/{uuid}/log/{filename}ResolvedTom Clegg03/23/2023Actions
Blocked by Arvados - Bug #20647: container_request/.../logs endpoint needs to handle CORS preflight (unauthenticated OPTIONS) requestsResolvedTom Clegg06/16/2023Actions
Actions #1

Updated by Peter Amstutz over 1 year ago

  • Description updated (diff)
  • Subject changed from Log panel fetches live logs using periodic range request on new WebDAV endpoint to Log panel on container view fetches live logs using periodic range request on new container log endpoint
Actions #2

Updated by Peter Amstutz over 1 year ago

  • Related to Idea #16442: Scalable + reliable container logging added
Actions #3

Updated by Peter Amstutz over 1 year ago

  • Related to Feature #19889: access current container logs at /arvados/v1/containers/{uuid}/log/{filename} added
Actions #4

Updated by Peter Amstutz over 1 year ago

  • Related to deleted (Feature #19889: access current container logs at /arvados/v1/containers/{uuid}/log/{filename})
Actions #5

Updated by Peter Amstutz over 1 year ago

  • Blocked by Feature #19889: access current container logs at /arvados/v1/containers/{uuid}/log/{filename} added
Actions #6

Updated by Peter Amstutz over 1 year ago

  • Story points set to 3.0
Actions #7

Updated by Peter Amstutz over 1 year ago

  • Target version changed from Future to To be scheduled
Actions #8

Updated by Peter Amstutz over 1 year ago

  • Related to Bug #20036: Process view log component fetches process logs from Keep when the logs table logs have been purged added
Actions #9

Updated by Peter Amstutz over 1 year ago

  • Target version changed from To be scheduled to Development 2023-05-24 sprint
Actions #10

Updated by Peter Amstutz over 1 year ago

  • Assigned To set to Stephen Smith
Actions #11

Updated by Peter Amstutz over 1 year ago

  • Target version changed from Development 2023-05-24 sprint to Development 2023-06-07
Actions #12

Updated by Stephen Smith over 1 year ago

  • Status changed from New to In Progress
Actions #13

Updated by Peter Amstutz over 1 year ago

  • Target version changed from Development 2023-06-07 to Development 2023-06-21 sprint
Actions #15

Updated by Tom Clegg over 1 year ago

  • Blocked by Bug #20647: container_request/.../logs endpoint needs to handle CORS preflight (unauthenticated OPTIONS) requests added
Actions #16

Updated by Peter Amstutz over 1 year ago

  • Target version changed from Development 2023-06-21 sprint to Development 2023-07-05 sprint
Actions #17

Updated by Peter Amstutz over 1 year ago

  • Target version changed from Development 2023-07-05 sprint to Development 2023-07-19 sprint
Actions #18

Updated by Peter Amstutz about 1 year ago

  • Target version changed from Development 2023-07-19 sprint to Development 2023-08-02 sprint
Actions #19

Updated by Stephen Smith about 1 year ago

Changes at arvados-workbench2|d76aa6ba31074f3d41cef48628b235aab4906ce5 branch 20219-log-api
Tests developer-tests-workbench2: #1246

Summary of changes:

  • Add apiWebdavClient to log service for making webdav requests to the API and
    differentiate from keepWebdavClient
  • Add listLogFiles and getLogFileContents methods to log service, contents can
    be fetched with range request
  • Add loadContainerLogFileContents to load logs using webdav range requests
    • Files larger than the chunk limit (128k) are loaded as beginning and end
      in 2 64k fragments, snipline is appended to the last line of the first
      fragment
    • The chunks are all requested in parallel
  • Add sorting of timestamped log types in combined log views (Main/All)
    • Lines without timestamps are merged with previous lines with timestamps
      • If no preceding lines of a sortable type have timestamps, the leading lines are
        removed from combined log views - they can still be seen in the single
        log type filter
    • Non-timestamped log types (node-info/contaner) are pushed to the bottom
      with original ordering in All view
    • Merging & sorting is not applied to single log type view
  • Remove snipline LogEventType as all sniplines now belong to specific log
    event types
  • Update logs panel store and reducer to remember the last log byte requested
  • Add polling useAsyncInterval utility function to poll log file sizes
    • Polling awaits for callback to finish to prevent race conditions
  • Update process log code snippet style to remove gap between array of loglines
  • Remove websocket handling of log events
  • Replaced cypress log manipulation commands with webdav log helpers
  • Add tests for polling, line sorting, correctly sized/positioned chunks, and
    snipline
Actions #20

Updated by Lucas Di Pentima about 1 year ago

Some comments & questions:

  • Can we have some unit testing on the new useAsyncInterval() function? I remember you mentioning some special feature it has (sorry, I didn't read the code in depth) and it would be great to showcase it on a test case.
  • Just curious: Can you explain why there's a need to differentiate apiWebdavClient versus keepWebdavClient names on LogService & CollectionService? AFAICT, they're both private members of different classes so they can be named the same (ie: webdavClient), right?
  • At src/services/log-service/log-service.ts
    • Do you think is a good idea to ignore any exception that could come from the webdav endpoint in getLogFileContents()? I think it would be nice to at least allow the exception to propagate up in the call stack is we're not going to handle it there.
    • I'm seeing that logFileToLogType() allows a directory as input parameter. Do we currently support the case of a directory being handled as a log file?
  • It seems that the log polling happens always, including the cases of completed/failed (ie: not running) containers. I think it would be better to cancel the polling on those to avoid unnecessary load on the controller. WDYT?
  • At src/store/process-logs-panel/process-logs-panel-actions.ts
    • pollProcessLogs() on polling failure, exceptions are being ignored. I'm thinking this might bite us in the future when trying to debug some issue. At the very least, I would log some error message on the console (to avoid nagging the user with a toast every 2 seconds), but also I'm thinking that it will be useful for doing exponential backoff or similar techniques in the future.
    • mergeSortLogFragments() I think non-sortable logs should be first, at least in the a case of a live log: imagine the user accessing a running container and seeing partial live stdout/stderr logs, then the non-sortable stuff and then after 2 secs, more of the sortable stuff. Especially when the non-sortable logs are things that are created at the very first moment a container runs (node-info, for example).
    • loadContainerLogFileList() Do you think it would be better to do the filtering by path inside the logService.listLogFiles(uuid) call? I'm seeing a high coupling issue there: why the client of the log service should know about log file paths? AFAIK, we only care about root-level log files, and the rest is for humans browsing the log collection, so always filtering the ones we care in wb2's log service code is clearer, do you agree?
Actions #21

Updated by Peter Amstutz about 1 year ago

  • Target version changed from Development 2023-08-02 sprint to Development 2023-08-16
Actions #22

Updated by Stephen Smith about 1 year ago

Changes at arvados-workbench2|32e9b073f9a153a02cecd6adb902b3bfa36e2ff3
Tests developer-tests-workbench2: #1249

  • Added unit tests for useAsyncInterval
    • This required using a different fake timers library since jest fake timers doesn't work reliably with async
    • Replace async/await with promise chain to make the helper more robust - handles synchronous callbacks / doesn't break if the callback doesn't return a promise / isn't a function
    • Combine useRefs so it can be mocked easier
    • pollProcessLogs explicitly resolves/rejects the promise
  • For the webdav client, I renamed the ones in tests just to make it clearer that each webdav client is specifically for keep or api and to reduce confusion
  • getLogFileContents now exposes errors through promise rejection
    • This required some reworking of loadContainerLogFileContents, which now uses Promise.allSettled to await all the promises without exiting on any failure since failed LogFragments result in a rejected promise
    • Rejecting promises in getLogfileContents has the benefit of now triggering the entire failure of both fragments of a log file when only one fragment fails to load due to still using Promise.all for each log file - no longer need to filter out partially failed promises
    • Added es2020 to TS lib array in order to use Promise.allSettled
    • The initial log file load now catches if all log files failed to load and shows a toast, and also initializes the store empty so that polling can start if the container is running
  • Switched logFileToLogType to only accept CollectionFile
  • Polling switches on only when the container is running
  • Added console log and note to pollProcessLogs on failure
  • Swapped order of sortable/non-sortable logs so that non-sortable is at the top
  • Perform log file root path filtering in listLogFiles instead
Actions #23

Updated by Lucas Di Pentima about 1 year ago

First of all, thank you for splitting the changes in bite-sized commits, it makes the reviewing process a lot easier!

I have just a couple of minor comments:
  • I think the "fake-timers" library should go on devDependencies as it's just used when running tests (there might be other dev-only packages that we have included in dependencies by accident, "tslint*", "redux-devtools-extension" for example?)
  • At src/store/process-logs-panel/process-logs-panel-actions.ts I think it's more correct to use console.error(), and I think it will be nice to include some information about the error that is happening.

Other than that, it LGTM. Thanks!

Actions #25

Updated by Peter Amstutz about 1 year ago

  • Status changed from Resolved to Feedback
Actions #26

Updated by Peter Amstutz about 1 year ago

Tested on a deployed cluster, the live logs are not working.

The issue is that /arvados/v1/container_requests/scale-xvhdp-k0oo2h4q12y5ftj/log/stderr.txt is never going to return live logs.

While the container is Running, .../log/{container_uuid}/ returns real-time logging data.

Clients should look up the container request record and use the container_uuid attribute to request files and directory listings under the per-container directory. (See https://doc.arvados.org/main/api/methods/container_requests.html)

We should be seeing reqs like /arvados/v1/container_requests/scale-xvhdp-k0oo2h4q12y5ftj/log/scale-dz642-0bue1asl6fho2tg/stderr.txt

This seems like a significant miscommunication about the API.

Tom: "huh. Yeah, I remember we ran into the "endpoint moved completely since the wiki spec was written" issue while Stephen was developing, and it looks like we still didn't update the 20219 description accordingly"

Actions #27

Updated by Stephen Smith about 1 year ago

Changes at arvados-workbench2|59cc53cbed401e6b5ad750d992f64a8f5c83c6bf branch 20219-log-api
Tests developer-tests-workbench2: #1275

  • Switched log endpoint to live logs: container_requests/{cr_uuid}/log/* to container_requests/{cr_uuid}/log/{container_uuid}/*
  • Adjusted test helper commands to manipulate logs in the container subfolder of the log_uuid on the container request
Actions #28

Updated by Peter Amstutz about 1 year ago

  • Target version changed from Development 2023-08-16 to Development 2023-08-30
Actions #29

Updated by Lucas Di Pentima about 1 year ago

Some comments, not necessarily related to the latest changes, I apologize, I might have missed some details on my first review.

  • The command cy.collectionReplaceFiles() looks almost the same as cy.updateCollection(), do you think you could make it more specific to the "replace files" part of its name? Like only receiving the params that should be set to replace_files, because the rest (I think) should be always the same.
  • I think the command's name "cy.doKeepRequest" should be changed to "cy.doWebDAVRequest" or something similar to avoid confusion, because Keep requests aren't being used in this case (in fact, we purposedly avoid making WB2 know how to talk to keep-stores).
  • When trying this on tordo, ran a wf and immediately got a "log not found" red toast, because the PROPFIND request to https://tordo.arvadosapi.com/arvados/v1/container_requests/tordo-xvhdp-m9jvxp33j4j90ix/log/tordo-dz642-v0r4ythj725nmiz returned 404. I think we should handle this exceptions by ignoring 404s because is something expected, right? Or maybe, don't make the PROPFIND request when the CR is queued?
  • Would it be too complex to write some unit tests for the LogService class methods?
Actions #30

Updated by Stephen Smith about 1 year ago

Changes at arvados-workbench2|ac6562028c3b71210988b4f845ac2945f4e88ace
Tests developer-tests-workbench2: #1319

  • Renamed doKeepRequest to doWebDAVRequest
  • Made cypress collectionReplaceFiles helper more specific to replace_files api
  • Added unit tests for log service list log files and get log file contents
  • Suppress 404 errors when doing initial load of log files
Actions #31

Updated by Lucas Di Pentima about 1 year ago

This LGTM, thanks!

Actions #32

Updated by Stephen Smith about 1 year ago

  • Status changed from Feedback to Resolved
Actions #33

Updated by Peter Amstutz about 1 year ago

  • Release set to 66
Actions

Also available in: Atom PDF