Feature #19889
closedaccess current container logs at /arvados/v1/containers/{uuid}/log/{filename}
Description
See Efficient live access to container logs
Provide access to the logs for a locked/running container, including the portion that has not yet been flushed to keep.
Included:- read-only webdav handler in crunch-run (may involve refactoring/exporting some parts of keep-web to avoid copying code)
- webdav handler in controller that proxies to crunch-run gateway, keep-web (webdav.InternalURLs), or an "empty collection" stub, depending on whether container is active/finished
- log_events API
- workbench2
Related issues
Updated by Tom Clegg almost 2 years ago
- Related to Idea #16442: Scalable + reliable container logging added
Updated by Tom Clegg over 1 year ago
- Release deleted (
60) - Story points set to 3.0
- Target version set to To be scheduled
Updated by Tom Clegg over 1 year ago
- Has duplicate Feature #20217: crunch-run WebDAV service providing access to latest logs added
Updated by Tom Clegg over 1 year ago
- Has duplicate Feature #20218: Container log endpoint in controller that routes to either crunch-run or keep-web added
Updated by Peter Amstutz over 1 year ago
- Related to Feature #18790: Access live container logs through arvados-client and crunch-run container gateway added
Updated by Peter Amstutz over 1 year ago
- Related to Feature #20219: Log panel on container view fetches live logs using periodic range request on new container log endpoint added
Updated by Peter Amstutz over 1 year ago
- Related to deleted (Feature #20219: Log panel on container view fetches live logs using periodic range request on new container log endpoint)
Updated by Peter Amstutz over 1 year ago
- Blocks Feature #20219: Log panel on container view fetches live logs using periodic range request on new container log endpoint added
Updated by Peter Amstutz over 1 year ago
- Target version changed from To be scheduled to Development 2023-03-29 Sprint
Updated by Tom Clegg over 1 year ago
- Status changed from New to In Progress
19889-live-log-webdav @ a7438bf8ec3a27920905d732c62baf91a355bf27 -- developer-run-tests: #3561
19889-live-log-webdav @ a7438bf8ec3a27920905d732c62baf91a355bf27 -- developer-run-tests: #3564
Most of the diff is splitting the non-ssh-specific stuff out of the ContainerSSH code and using it to controller-to-controller proxying (when crunch-run has a tunnel to a different controller node), controller-to-tunnel-to-gateway proxying (when crunch-run has a tunnel to this controller node), and controller-to-gateway proxying (when crunch-run is directly reachable by tcp).
There are a few changes to lib/controller/router to make the request method and headers usable as inputs to a routableFunc, and pass OPTIONS requests under .../log
through to the webdav handler.
Tangential / seemingly unrelated changes:
I spent a while figuring out why moving keep-web's webdavfs wrapper into a separate module caused the controller UpdatePriority test to fail. It turned out the UpdatePriority test was using the (repeatable) seeded prng to generate test data. It only passed because we got lucky with the prng sequence. The webdavfs wrapper disrupted the lucky sequence by using the prng first. I suspect the failure is related to #20240. For the sake of unblocking this branch, I just updated the test to use the lucky sequence explicitly.
railsapi was refusing to update logs on container record when it was referenced by a container request that was already in Final state. This shouldn't happen IRL anyway (the container request wouldn't be final) but we have a test fixture in that situation and it interfered with using the fixture for testing.
Keep-web's webdavfs wrapper had code that transparently created intermediate directories when writing files, to compensate for the fact that empty directories weren't preserved. Now that empty directories are preserved in collections, this isn't needed, so I removed it.
crunchrun was still using the golang.org/x/net/context package instead of the new stdlib context (ISTR this used to be necessary because the dockerclient library didn't work with stdlib context yet). Switched to import "context"
.
Updated gorilla/mux, and noticed it's no longer being maintained. :(
Updated by Lucas Di Pentima over 1 year ago
Sorry for the delay, this was not an easy diff to read for me :). I tried to start a fresh arvbox
instance to do some manual testing, but after failing and wasting too much time on that... I gave up.
- File
lib/crunchrun/container_gateway.go
- Comment at line 84 I believe should mention
/arvados/v1/container/{uuid}/log/
- Comment at line 84 I believe should mention
- File
lib/controller/localdb/container_gateway_test.go
- Lines 291-295 are commented out code that I think could be removed.
- Re:
gorilla/mux
(not a blocker, I think it can be a different story) would it be a good to time to forking it and use it on our own code? I think we can benefit from GitHub's security warnings this way? (not sure if we would get those otherwise) - IIRC, the cadaver tests were an intergration test suite for KeepWeb. Do you think it would be useful/possible to add a test for the "already finished container log" case?
- For the future: There're many aspects of how the container gateway works that aren't known or fresh in my mind; I tried to find some high-level explanation on the wiki but couldn't find, I think it would be interesting to have some coarse grained explanation for devs to get up to speed without reading the code.
Updated by Tom Clegg over 1 year ago
- Comment at line 84 I believe should mention
/arvados/v1/container/{uuid}/log/
Fixed.
- Lines 291-295 are commented out code that I think could be removed.
Fixed.
- Re:
gorilla/mux
(not a blocker, I think it can be a different story) would it be a good to time to forking it and use it on our own code? I think we can benefit from GitHub's security warnings this way? (not sure if we would get those otherwise)
I've just subscribed to security notifications at https://github.com/gorilla/mux (does it notify earlier if we have a fork?)
- IIRC, the cadaver tests were an intergration test suite for KeepWeb. Do you think it would be useful/possible to add a test for the "already finished container log" case?
Indeed, this turned out to be very useful. Cadaver worked fine in the unfinished case, but in the proxy-to-keep-web case it noticed that the paths in the webdav responses didn't match the request, and bailed out (unless using -t
to ignore).
To fix this I needed to change the proxy strategy: use the ID-in-host request form, preserve the original path, and send a new X-Webdav-Prefix header to keep-web so it knows to ignore the "/arvados/v1/containers/{uuid}/log" part.
19889-live-log-webdav @ d7f388108615d981636e464785129ce4a958a7f5 -- developer-run-tests: #3567
retry wb1 developer-run-tests-apps-workbench-integration: #3843
- For the future: There're many aspects of how the container gateway works that aren't known or fresh in my mind; I tried to find some high-level explanation on the wiki but couldn't find, I think it would be interesting to have some coarse grained explanation for devs to get up to speed without reading the code.
I think the best we have is
Efficient live access to container logs
https://doc.arvados.org/main/architecture/dispatchcloud.html
Updated by Peter Amstutz over 1 year ago
- Target version changed from Development 2023-03-29 Sprint to Development 2023-04-12 sprint
Updated by Lucas Di Pentima over 1 year ago
This LGTM, thanks!
Re:
- Re:
gorilla/mux
(not a blocker, I think it can be a different story) would it be a good to time to forking it and use it on our own code? I think we can benefit from GitHub's security warnings this way? (not sure if we would get those otherwise)I've just subscribed to security notifications at https://github.com/gorilla/mux (does it notify earlier if we have a fork?)
I think it notifies the whole team automatically instead of needing to manually subscribe, but given that you'll probably be the best person to handle those updates, this is good enough :)
Updated by Tom Clegg over 1 year ago
While working on #18790 I discovered that arvados-controller routing wasn't working because the /containers/ routes still go through the old generic railsproxy path instead of the new path.
I fixed this, and took the opportunity to add routing for .../arvados/v1/containers/{uuid}/ssh
and .../container_gateway
as well. The old .../connect/{uuid}/ssh
endpoints are still supported, and still used by client-side code, so 2.7 clients will work with 2.6 servers and vice versa. In a future version we can switch client-side code to use the new endpoints, and then eventually delete the old ones. Looking back I should have hacked the routing this way in the first place instead of having a temporary endpoint outside of /containers/
.
Also tweaked logging so "file not found" doesn't get logged as an error. Too noisy when clients are polling for stderr.txt
to exist, etc.
19889-live-log-webdav @ 6d03fdac5674dc88eff821f5e8ac70642f39a895 -- developer-run-tests: #3573
wb1 retry
Updated by Tom Clegg over 1 year ago
Another bugfix (unhandled error leading to use of nil pointer)
19889-live-log-webdav @ b402b894c535f95cdeba0631dea1fbb9c7672a07 -- developer-run-tests: #3575
Updated by Tom Clegg over 1 year ago
- % Done changed from 0 to 100
- Status changed from In Progress to Resolved
Applied in changeset arvados|d9fdb9630e56f3ccdaee6acd8b1ca4cdbdf11b0a.
Updated by Tom Clegg over 1 year ago
- Related to Feature #20319: Move containers/{uuid}/log/ to container_requests/{uuid}/log/ added
Updated by Brett Smith about 1 year ago
- Related to Idea #20955: Crunch log transition documentation issues added