Feature #20219
closedLog panel on container view fetches live logs using periodic range request on new container log endpoint
Added by Peter Amstutz over 1 year ago. Updated about 1 year ago.
Description
https://dev.arvados.org/projects/arvados/wiki/Efficient_live_access_to_container_logs
Stop getting things from the log table.
Get the list of files in the container log collection (via WebDAV):
/arvados/v1/containers/{uuid}/log
Then for the logs the user wants to view, get the data for the individual logs:
/arvados/v1/containers/{uuid}/log/{file}.txt
If the amount of data exceeds some limit (eg 128kb) then only fetch the first 64kb and last 64kb and include a message that some logs were omitted.
Periodically poll to see if the file sizes get bigger and request more data when it is available.
Related issues
Updated by Peter Amstutz over 1 year ago
- Description updated (diff)
- Subject changed from Log panel fetches live logs using periodic range request on new WebDAV endpoint to Log panel on container view fetches live logs using periodic range request on new container log endpoint
Updated by Peter Amstutz over 1 year ago
- Related to Idea #16442: Scalable + reliable container logging added
Updated by Peter Amstutz over 1 year ago
- Related to Feature #19889: access current container logs at /arvados/v1/containers/{uuid}/log/{filename} added
Updated by Peter Amstutz over 1 year ago
- Related to deleted (Feature #19889: access current container logs at /arvados/v1/containers/{uuid}/log/{filename})
Updated by Peter Amstutz over 1 year ago
- Blocked by Feature #19889: access current container logs at /arvados/v1/containers/{uuid}/log/{filename} added
Updated by Peter Amstutz over 1 year ago
- Target version changed from Future to To be scheduled
Updated by Peter Amstutz over 1 year ago
- Related to Bug #20036: Process view log component fetches process logs from Keep when the logs table logs have been purged added
Updated by Peter Amstutz over 1 year ago
- Target version changed from To be scheduled to Development 2023-05-24 sprint
Updated by Peter Amstutz over 1 year ago
- Target version changed from Development 2023-05-24 sprint to Development 2023-06-07
Updated by Stephen Smith over 1 year ago
- Status changed from New to In Progress
Updated by Peter Amstutz over 1 year ago
- Target version changed from Development 2023-06-07 to Development 2023-06-21 sprint
Updated by Tom Clegg over 1 year ago
- Blocked by Bug #20647: container_request/.../logs endpoint needs to handle CORS preflight (unauthenticated OPTIONS) requests added
Updated by Peter Amstutz over 1 year ago
- Target version changed from Development 2023-06-21 sprint to Development 2023-07-05 sprint
Updated by Peter Amstutz over 1 year ago
- Target version changed from Development 2023-07-05 sprint to Development 2023-07-19 sprint
Updated by Peter Amstutz over 1 year ago
- Target version changed from Development 2023-07-19 sprint to Development 2023-08-02 sprint
Updated by Stephen Smith over 1 year ago
Changes at arvados-workbench2|d76aa6ba31074f3d41cef48628b235aab4906ce5 branch 20219-log-api
Tests developer-tests-workbench2: #1246
Summary of changes:
- Add apiWebdavClient to log service for making webdav requests to the API and
differentiate from keepWebdavClient - Add listLogFiles and getLogFileContents methods to log service, contents can
be fetched with range request - Add loadContainerLogFileContents to load logs using webdav range requests
- Files larger than the chunk limit (128k) are loaded as beginning and end
in 2 64k fragments, snipline is appended to the last line of the first
fragment - The chunks are all requested in parallel
- Files larger than the chunk limit (128k) are loaded as beginning and end
- Add sorting of timestamped log types in combined log views (Main/All)
- Lines without timestamps are merged with previous lines with timestamps
- If no preceding lines of a sortable type have timestamps, the leading lines are
removed from combined log views - they can still be seen in the single
log type filter
- If no preceding lines of a sortable type have timestamps, the leading lines are
- Non-timestamped log types (node-info/contaner) are pushed to the bottom
with original ordering in All view - Merging & sorting is not applied to single log type view
- Lines without timestamps are merged with previous lines with timestamps
- Remove snipline LogEventType as all sniplines now belong to specific log
event types - Update logs panel store and reducer to remember the last log byte requested
- Add polling useAsyncInterval utility function to poll log file sizes
- Polling awaits for callback to finish to prevent race conditions
- Update process log code snippet style to remove gap between array of loglines
- Remove websocket handling of log events
- Replaced cypress log manipulation commands with webdav log helpers
- Add tests for polling, line sorting, correctly sized/positioned chunks, and
snipline
Updated by Lucas Di Pentima over 1 year ago
Some comments & questions:
- Can we have some unit testing on the new
useAsyncInterval()
function? I remember you mentioning some special feature it has (sorry, I didn't read the code in depth) and it would be great to showcase it on a test case. - Just curious: Can you explain why there's a need to differentiate
apiWebdavClient
versuskeepWebdavClient
names onLogService
&CollectionService
? AFAICT, they're both private members of different classes so they can be named the same (ie: webdavClient), right? - At
src/services/log-service/log-service.ts
- Do you think is a good idea to ignore any exception that could come from the webdav endpoint in
getLogFileContents()
? I think it would be nice to at least allow the exception to propagate up in the call stack is we're not going to handle it there. - I'm seeing that
logFileToLogType()
allows a directory as input parameter. Do we currently support the case of a directory being handled as a log file?
- Do you think is a good idea to ignore any exception that could come from the webdav endpoint in
- It seems that the log polling happens always, including the cases of completed/failed (ie: not running) containers. I think it would be better to cancel the polling on those to avoid unnecessary load on the controller. WDYT?
- At
src/store/process-logs-panel/process-logs-panel-actions.ts
pollProcessLogs()
on polling failure, exceptions are being ignored. I'm thinking this might bite us in the future when trying to debug some issue. At the very least, I would log some error message on the console (to avoid nagging the user with a toast every 2 seconds), but also I'm thinking that it will be useful for doing exponential backoff or similar techniques in the future.mergeSortLogFragments()
I think non-sortable logs should be first, at least in the a case of a live log: imagine the user accessing a running container and seeing partial live stdout/stderr logs, then the non-sortable stuff and then after 2 secs, more of the sortable stuff. Especially when the non-sortable logs are things that are created at the very first moment a container runs (node-info, for example).loadContainerLogFileList()
Do you think it would be better to do the filtering by path inside thelogService.listLogFiles(uuid)
call? I'm seeing a high coupling issue there: why the client of the log service should know about log file paths? AFAIK, we only care about root-level log files, and the rest is for humans browsing the log collection, so always filtering the ones we care in wb2's log service code is clearer, do you agree?
Updated by Peter Amstutz over 1 year ago
- Target version changed from Development 2023-08-02 sprint to Development 2023-08-16
Updated by Stephen Smith over 1 year ago
Changes at arvados-workbench2|32e9b073f9a153a02cecd6adb902b3bfa36e2ff3
Tests developer-tests-workbench2: #1249
- Added unit tests for useAsyncInterval
- This required using a different fake timers library since jest fake timers doesn't work reliably with async
- Replace async/await with promise chain to make the helper more robust - handles synchronous callbacks / doesn't break if the callback doesn't return a promise / isn't a function
- Combine useRefs so it can be mocked easier
- pollProcessLogs explicitly resolves/rejects the promise
- For the webdav client, I renamed the ones in tests just to make it clearer that each webdav client is specifically for keep or api and to reduce confusion
- getLogFileContents now exposes errors through promise rejection
- This required some reworking of loadContainerLogFileContents, which now uses Promise.allSettled to await all the promises without exiting on any failure since failed LogFragments result in a rejected promise
- Rejecting promises in getLogfileContents has the benefit of now triggering the entire failure of both fragments of a log file when only one fragment fails to load due to still using Promise.all for each log file - no longer need to filter out partially failed promises
- Added es2020 to TS lib array in order to use Promise.allSettled
- The initial log file load now catches if all log files failed to load and shows a toast, and also initializes the store empty so that polling can start if the container is running
- Switched logFileToLogType to only accept CollectionFile
- Polling switches on only when the container is running
- Added console log and note to pollProcessLogs on failure
- Swapped order of sortable/non-sortable logs so that non-sortable is at the top
- Perform log file root path filtering in listLogFiles instead
Updated by Lucas Di Pentima over 1 year ago
First of all, thank you for splitting the changes in bite-sized commits, it makes the reviewing process a lot easier!
I have just a couple of minor comments:- I think the "fake-timers" library should go on
devDependencies
as it's just used when running tests (there might be other dev-only packages that we have included independencies
by accident, "tslint*", "redux-devtools-extension" for example?) - At
src/store/process-logs-panel/process-logs-panel-actions.ts
I think it's more correct to useconsole.error()
, and I think it will be nice to include some information about the error that is happening.
Other than that, it LGTM. Thanks!
Updated by Stephen Smith over 1 year ago
- Status changed from In Progress to Resolved
Updated by Peter Amstutz over 1 year ago
- Status changed from Resolved to Feedback
Updated by Peter Amstutz over 1 year ago
Tested on a deployed cluster, the live logs are not working.
The issue is that /arvados/v1/container_requests/scale-xvhdp-k0oo2h4q12y5ftj/log/stderr.txt is never going to return live logs.
While the container is Running, .../log/{container_uuid}/ returns real-time logging data.
Clients should look up the container request record and use the container_uuid attribute to request files and directory listings under the per-container directory. (See https://doc.arvados.org/main/api/methods/container_requests.html)
We should be seeing reqs like /arvados/v1/container_requests/scale-xvhdp-k0oo2h4q12y5ftj/log/scale-dz642-0bue1asl6fho2tg/stderr.txt
This seems like a significant miscommunication about the API.
Tom: "huh. Yeah, I remember we ran into the "endpoint moved completely since the wiki spec was written" issue while Stephen was developing, and it looks like we still didn't update the 20219 description accordingly"
Updated by Stephen Smith over 1 year ago
Changes at arvados-workbench2|59cc53cbed401e6b5ad750d992f64a8f5c83c6bf branch 20219-log-api
Tests developer-tests-workbench2: #1275
- Switched log endpoint to live logs:
container_requests/{cr_uuid}/log/*
tocontainer_requests/{cr_uuid}/log/{container_uuid}/*
- Adjusted test helper commands to manipulate logs in the container subfolder of the log_uuid on the container request
Updated by Peter Amstutz over 1 year ago
- Target version changed from Development 2023-08-16 to Development 2023-08-30
Updated by Lucas Di Pentima over 1 year ago
Some comments, not necessarily related to the latest changes, I apologize, I might have missed some details on my first review.
- The command
cy.collectionReplaceFiles()
looks almost the same ascy.updateCollection()
, do you think you could make it more specific to the "replace files" part of its name? Like only receiving the params that should be set toreplace_files
, because the rest (I think) should be always the same. - I think the command's name "cy.doKeepRequest" should be changed to "cy.doWebDAVRequest" or something similar to avoid confusion, because Keep requests aren't being used in this case (in fact, we purposedly avoid making WB2 know how to talk to keep-stores).
- When trying this on tordo, ran a wf and immediately got a "log not found" red toast, because the PROPFIND request to https://tordo.arvadosapi.com/arvados/v1/container_requests/tordo-xvhdp-m9jvxp33j4j90ix/log/tordo-dz642-v0r4ythj725nmiz returned 404. I think we should handle this exceptions by ignoring 404s because is something expected, right? Or maybe, don't make the PROPFIND request when the CR is queued?
- Would it be too complex to write some unit tests for the LogService class methods?
Updated by Stephen Smith over 1 year ago
Changes at arvados-workbench2|ac6562028c3b71210988b4f845ac2945f4e88ace
Tests developer-tests-workbench2: #1319
- Renamed doKeepRequest to doWebDAVRequest
- Made cypress collectionReplaceFiles helper more specific to replace_files api
- Added unit tests for log service list log files and get log file contents
- Suppress 404 errors when doing initial load of log files
Updated by Stephen Smith over 1 year ago
- Status changed from Feedback to Resolved