h1. Efficient live access to container logs 

 {{toc}} 

 h2. Goals 

 * Low latency: deliver live logs to clients (workbench/cli) as quickly as possible 
 * Scalable: don't generate significant load in RailsAPI 
 * Scalable: don't generate significant load in PostgreSQL 
 * Efficient: don't waste resources on event streams that nobody is listening to 
 * Robust: Don't let a badly behaved container (writing many gigabytes of logs) impede others 
 * Dependable: Retain complete logs (container/crunch-run/crunchstat) _in the log collection_ even in the event that crunch-run or the compute instance itself dies unexpectedly (_e.g.,_ bug, out of memory, hardware failure) 

 h2. Proposed architecture 

 * crunch-run offers a “container logs” API at its container gateway server 
 * controller offers a “container logs” API that works whether the container is finished or not 
** proxies to crunch-run _or_ keep-web (depending on whether the container is running)
 * “container logs” API has two parts: 
 ** webdav: read file content from the log collection 
 ** server-sent-events stream: get notifications as logs appear (see below) 
 * dispatcher provides “plan B” to save logs when crunch-run crashes, instance dies, etc. 
 ** dispatcher routinely retrieves live logs from container gateway, and buffers content to local disk 
 ** dispatcher discards buffered logs when crunch-run commits to log collection 
 ** when crunch-run exits, dispatcher checks whether crunch-run managed to save the full log collection and, if needed, appends the latest buffered logs 
 * command-line client can “tail” and “tail -f” logs from a running container 
 * workbench2 uses a single code path to retrieve logs (instead of choosing webdav or logs table based on whether container is running) 

 h2. API details 

@GET /arvados/v1/container_requests/{uuid}/log/{container_uuid}/stderr.txt@

 ** client will use Range headers to request partial content 
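
For example, a client that has already displayed the first 1234 bytes of stderr.txt can fetch just the newly appended data with a range request. A minimal sketch in Go; the host, token, and UUIDs below are placeholders:

<pre>
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	// Host, token, and UUIDs are placeholders.
	req, err := http.NewRequest("GET",
		"https://ctrl.example/arvados/v1/container_requests/zzzzz-xvhdp-000000000000000/log/zzzzz-dz642-000000000000000/stderr.txt", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer v2/zzzzz-gj3su-000000000000000/secret")
	// Assume a previous file_sizes event reported 1234 bytes already seen;
	// ask only for the bytes appended since then.
	req.Header.Set("Range", "bytes=1234-")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status) // expect "206 Partial Content" when the range is honored
	io.Copy(os.Stdout, resp.Body)
}
</pre>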

@GET /arvados/v1/container_requests/{uuid}/log_events@

 ** response is a "server-sent events":https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events stream, data is JSON-encoded 
 ** a “file_sizes” event indicates the current size of new/updated files in the log collection 
 @event: file_sizes@ 
@data: {"{container_uuid}/stderr.txt":1234,"{container_uuid}/stdout.txt":0,"{container_uuid}/crunch-run.txt":2345}@
 ** after the first message, unchanged files may be omitted: 
 @event: file_sizes@ 
@data: {"{container_uuid}/stderr.txt":1444,"{container_uuid}/stdout.txt":72}@
 ** a “final” message indicates that the log collection is finalized and no further file_sizes events will be sent: 
 @event: final@ 
 ** a “retry” message indicates how long, in milliseconds, the client should wait before reconnecting after a disconnect (per SSE spec): 
 @retry: 2000@ 
 ** empty comments are sent periodically to keep the connection alive during idle periods 
 @:@ 
** after the “final” message, the connection either remains open and silent, or closes after specifying a long reconnect delay in a “retry” message (the idea is to avoid sending an inattentive client into a reconnect cycle; this doesn't affect well-behaved clients because they will hang up after receiving the “final” message)
 ** @?maxInterval=X@ sets the maximum interval between messages (i.e., keepalive) to X seconds, default 15 
 ** @?minInterval=X@ increases the minimum interval between messages (i.e., throttle) to X seconds, default 1 
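
Putting these pieces together, the raw stream for a short container run might look like this on the wire (sizes and timing are illustrative only):

<pre>
retry: 2000

event: file_sizes
data: {"{container_uuid}/stderr.txt":1234,"{container_uuid}/stdout.txt":0,"{container_uuid}/crunch-run.txt":2345}

:

event: file_sizes
data: {"{container_uuid}/stderr.txt":1444,"{container_uuid}/stdout.txt":72}

event: final
</pre>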

 h2. Authentication 


The browser's EventSource API allows the caller to enable sending existing cached credentials/cookies, but not to set an Authorization header or any other custom header.
 * @?api_token=v2/...@ in the URL is easy to support on the client and server 
 ** this is not ideal: the token is likely to be logged by HTTP servers/proxies 
 ** we already accept this situation in our websocket usage 

The server also accepts the usual Authorization header, and clients that are able to set it (e.g., arvados-client) should use it.
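
A minimal sketch of such a client in Go (host, UUID, and token are placeholders):

<pre>
package main

import (
	"io"
	"net/http"
	"os"
)

func main() {
	// Host, UUID, and token are placeholders.
	req, err := http.NewRequest("GET",
		"https://ctrl.example/arvados/v1/container_requests/zzzzz-xvhdp-000000000000000/log_events", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer v2/zzzzz-gj3su-000000000000000/secret")
	req.Header.Set("Accept", "text/event-stream")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	io.Copy(os.Stdout, resp.Body) // dump the raw event stream
}
</pre>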

A future version may also support a “copy Authorization header into cookie” endpoint:
** client sends a @GET /arvados/v1/cookie@ request with the token in the Authorization request header
 ** response sets a cookie with the provided token 
 ** client sets up EventSource using @{withCredentials: true}@ 
 ** this approach is inherently flawed in that applications in different tabs (e.g., wb2.clusterA and wb2.clusterB, where A and B are in a federation) will unknowingly fight over which cookie should be sent to cluster A. 

A future version may also support a “provide short-lived token” endpoint:
 ** client requests @GET /arvados/v1/api_client_authorizations/:uuid/slt@ with the real API token in an Authorization request header 
 ** response contains a new token that will only be recognized for the next 15 seconds 
 ** client sets up EventSource using @?slt={newtoken}@ 
 ** event stream sends a message with the real token in the “id” field 
 ** when reconnecting, the browser automatically sends the real token in the Last-Event-Id header 
 ** server uses the token in the Last-Event-Id field, if provided 
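
Since this endpoint does not exist yet, the following is only a sketch of how the server side of that flow could look; the function names are hypothetical:

<pre>
package controller

import "net/http"

// tokenForEventStream is a hypothetical helper illustrating the flow above:
// prefer the real token echoed back in Last-Event-Id on reconnect, otherwise
// resolve the short-lived token from the URL. lookupSLT maps a short-lived
// token to the real token it was issued for (valid ~15 seconds).
func tokenForEventStream(req *http.Request, lookupSLT func(string) (string, bool)) (string, bool) {
	if tok := req.Header.Get("Last-Event-Id"); tok != "" {
		// Browser reconnect: the "id" field of the last event held the real token.
		return tok, true
	}
	if slt := req.URL.Query().Get("slt"); slt != "" {
		// First connection: ?slt={newtoken} from the short-lived-token endpoint.
		return lookupSLT(slt)
	}
	return "", false
}
</pre>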

 h2. Client pseudocode 

to reliably display logs for the container request with uuid CR:

 <pre> 
connect to server-sent-events endpoint /arvados/v1/container_requests/{CR}/log_events
   on unexpected disconnect: 
     delay for duration indicated in last “retry” message 
     reconnect 
   on final: 
     disconnect 
   on file_sizes: 
     decode message data as json 
     compare each entry to last size received for corresponding file, if any 
     if desired, retrieve file data via webdav endpoint and present to user 
 </pre> 
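
The same loop in Go might look roughly like the sketch below. This is illustrative only: the SSE parsing is reduced to the fields this API emits, and the webdav fetch is left as a comment.

<pre>
package logclient

import (
	"bufio"
	"encoding/json"
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"time"
)

func tailLogs(url, token string) error {
	retry := 2 * time.Second   // reconnect delay; updated by "retry:" lines
	seen := map[string]int64{} // last reported size per file
	for {
		req, err := http.NewRequest("GET", url, nil)
		if err != nil {
			return err
		}
		req.Header.Set("Authorization", "Bearer "+token)
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			time.Sleep(retry) // unexpected disconnect: wait, then reconnect
			continue
		}
		event := ""
		scanner := bufio.NewScanner(resp.Body)
		for scanner.Scan() {
			line := scanner.Text()
			switch {
			case strings.HasPrefix(line, "retry: "):
				if ms, err := strconv.Atoi(strings.TrimPrefix(line, "retry: ")); err == nil {
					retry = time.Duration(ms) * time.Millisecond
				}
			case strings.HasPrefix(line, "event: "):
				event = strings.TrimPrefix(line, "event: ")
			case strings.HasPrefix(line, "data: ") && event == "file_sizes":
				var sizes map[string]int64
				if json.Unmarshal([]byte(strings.TrimPrefix(line, "data: ")), &sizes) == nil {
					for fnm, size := range sizes {
						if size > seen[fnm] {
							// retrieve bytes [seen[fnm], size) via the webdav
							// endpoint and present them to the user
							fmt.Printf("%s: %d new bytes\n", fnm, size-seen[fnm])
						}
						seen[fnm] = size
					}
				}
			case line == "":
				if event == "final" {
					resp.Body.Close()
					return nil // log collection finalized; hang up
				}
				event = "" // blank line ends the current event
			}
		}
		resp.Body.Close()
		time.Sleep(retry)
	}
}
</pre>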

 h2. Client (workbench2 / CLI) implementation notes 

* optionally, if container state is complete/cancelled or CR state is final, client _may_ skip the log_events steps and just retrieve file data via webdav endpoint
* the client can rely on the server-sent-events stream to return a message, with sizes of all available log files, upon first connection, even if there are no available sizes yet (i.e., message.files may have zero keys) or the container is already finished
 * message.files may have keys for newly added files that have not appeared in any previous message 
 * client should have logic to skip some content if the log file size grows faster than expected 

h2. Crunch-run implementation notes

 * buffer logs and periodically flush to keep (as before) 
 * container gateway has a new webdav endpoint that reads the (live) log collection 
 ** stdlib webdav server implements range requests for us 
* container gateway has a new endpoint that returns a server-sent-events stream
** upon connection, return one event per log file: [filename, current-size, last-saved-size]
** when a file is updated, wait until the last returned event is more than 1s old, then return an event [filename, current-size, last-saved-size]
*** “current-size” enables client to construct a suitable range request to retrieve recent data
*** “current-size” enables client to skip ahead if appropriate
*** “last-saved-size” enables dispatcher to discard buffered data once it's safely written to storage and accessible via log collection
* crunch-run updates gateway_address as soon as the gateway service is up (instead of waiting until state=Running)
** railsapi should allow gateway_address to be set when state=Locked
** this allows clients to access early logs (e.g., while preparing the docker image)
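
A sketch of the event payload and the 1-second throttle (type and field names here are illustrative, not the actual crunch-run source):

<pre>
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Illustrative payload for the per-file events; field names are assumptions.
type logFileEvent struct {
	Filename      string `json:"filename"`
	CurrentSize   int64  `json:"current_size"`    // client can build a Range request from this
	LastSavedSize int64  `json:"last_saved_size"` // dispatcher can discard data already in Keep
}

// eventStream throttles events so consecutive messages are >=1s apart.
type eventStream struct {
	lastEvent time.Time
}

func (s *eventStream) emit(ev logFileEvent) {
	if d := time.Second - time.Since(s.lastEvent); d > 0 {
		time.Sleep(d)
	}
	s.lastEvent = time.Now()
	buf, _ := json.Marshal(ev)
	fmt.Printf("data: %s\n\n", buf) // stand-in for writing to the SSE connection
}

func main() {
	s := &eventStream{}
	s.emit(logFileEvent{Filename: "stderr.txt", CurrentSize: 1444, LastSavedSize: 1234})
	s.emit(logFileEvent{Filename: "stderr.txt", CurrentSize: 2048, LastSavedSize: 1234}) // delayed ~1s by the throttle
}
</pre>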

h2. Controller implementation notes for log_events

* additional logic is needed here beyond a simple proxy, so that a single client-to-controller connection can receive all log updates through the entire container lifecycle, including one with the final file sizes, even though
** crunch-run is not running yet when state=Queued
** crunch-run may exit/disconnect before returning the final update (many race possibilities, including the container finishing while controller is trying to reconnect)
** additional logs may be added by dispatcher if crunch-run exits before finalizing the container
* if state=Queued or container gateway_address is empty, send an empty file_sizes message, then send keepalive messages while waiting/polling for the gateway to come up
* if state=Complete or Cancelled, get the final log file sizes, send a file_sizes message, a long retry message, and a “final” message; delay a few seconds; and hang up
* connect to crunch-run's log_events endpoint (through controller tunnel if applicable), propagate client's last-id and min/maxInterval params, and relay messages (including keepalive) to client
* when crunch-run disconnects (or does not answer), wait a second, re-fetch the container record, and repeat from top
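
In rough Go terms, the lifecycle loop could look like the sketch below; the relaySession interface and everything behind it are stand-ins, not actual controller code:

<pre>
package controller

import "time"

// Minimal stand-ins for the controller's real dependencies.
type container struct {
	State          string
	GatewayAddress string
}

type relaySession interface {
	SendFileSizes(sizes map[string]int64)
	SendKeepalive()
	SendRetry(d time.Duration)
	SendFinal()
	RelayFromGateway(addr string) // returns when crunch-run disconnects
	FinalLogSizes() map[string]int64
	GetContainer() container
}

// relayLogEvents is the lifecycle loop sketched in the notes above.
func relayLogEvents(s relaySession) {
	sentEmpty := false
	for {
		ctr := s.GetContainer()
		switch {
		case ctr.State == "Complete" || ctr.State == "Cancelled":
			s.SendFileSizes(s.FinalLogSizes())
			s.SendRetry(time.Hour) // discourage reconnection after "final"
			s.SendFinal()
			time.Sleep(3 * time.Second)
			return
		case ctr.GatewayAddress == "":
			// Queued, or gateway not up yet.
			if !sentEmpty {
				s.SendFileSizes(map[string]int64{}) // empty file_sizes, per the notes above
				sentEmpty = true
			}
			s.SendKeepalive()
		default:
			// Relay events until crunch-run disconnects or stops answering.
			s.RelayFromGateway(ctr.GatewayAddress)
		}
		time.Sleep(time.Second) // then re-fetch the container record and repeat
	}
}
</pre>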

 h2. Controller implementation notes for log data 

* if container log is not empty, a specific range is requested, and the size of the requested file is compatible with the requested range, proxy to keep-web
* if state=Complete/Cancelled, proxy to keep-web
* if gateway_address is empty, return 404
* if state=Locked/Running and gateway_address is not empty, proxy to gateway_address
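
This decision tree is small enough to sketch directly (illustrative only; the real controller would derive these inputs from the container record and the incoming request):

<pre>
package controller

// routeLogData picks a backend for a log data request, per the rules above.
// logPDH is the container's committed log field ("" if empty), and rangeOK
// reports whether the request has a Range header that the committed log
// collection can satisfy.
func routeLogData(state, logPDH, gatewayAddress string, rangeOK bool) string {
	switch {
	case logPDH != "" && rangeOK:
		return "keep-web" // committed log collection already covers the range
	case state == "Complete" || state == "Cancelled":
		return "keep-web"
	case gatewayAddress == "":
		return "404" // no live gateway and no final collection to serve from
	default: // state=Locked/Running with a live gateway
		return "gateway"
	}
}
</pre>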

 h2. Dispatcher implementation (for retaining the last log entries when a worker instance crashes) 

* routinely connect/reconnect to crunch-run gateway server of containers that are being supervised
* when unflushed logs appear, save them in a temporary buffer on local disk, up to a configurable size limit
* when crunch-run exits unexpectedly, check the latest saved log collection, and (if needed) append the buffered logs to the respective truncated files before finalizing the container
* in HPC environments where dispatcher→worker connections are not supported, connect through the worker→controller tunnel (see #19166)
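
A sketch of the per-file buffer the dispatcher would keep (in-memory here for brevity; the notes above call for local disk, and the fetch callback stands in for a ranged read from the gateway):

<pre>
package dispatcher

// logBuffer holds the tail of one log file: bytes that crunch-run has
// produced but not yet committed to the log collection.
type logBuffer struct {
	data      []byte // buffered bytes, starting at file offset `offset`
	offset    int64  // file offset of data[0]
	sizeLimit int64  // configurable cap on buffered bytes
}

// onEvent handles a [filename, current-size, last-saved-size] event:
// fetch newly appended bytes, then discard anything already saved to Keep.
func (b *logBuffer) onEvent(fetch func(from, to int64) []byte, cur, saved int64) {
	if end := b.offset + int64(len(b.data)); cur > end {
		b.data = append(b.data, fetch(end, cur)...)
	}
	if n := saved - b.offset; n > 0 {
		// crunch-run committed everything up to `saved`; drop that prefix.
		if n > int64(len(b.data)) {
			n = int64(len(b.data))
		}
		b.data = b.data[n:]
		b.offset = saved
	}
	if int64(len(b.data)) > b.sizeLimit {
		// enforce the configurable size limit by dropping the oldest bytes
		drop := int64(len(b.data)) - b.sizeLimit
		b.data = b.data[drop:]
		b.offset += drop
	}
}
</pre>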