Project

General

Profile

Actions

Feature #19889

closed

access current container logs at /arvados/v1/containers/{uuid}/log/{filename}

Added by Tom Clegg about 1 year ago. Updated 6 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Story points:
3.0
Release relationship:
Auto

Description

See Efficient live access to container logs

Provide access to the logs for a locked/running container, including the portion that has not yet been flushed to keep.

Included:
  • read-only webdav handler in crunch-run (may involve refactoring/exporting some parts of keep-web to avoid copying code)
  • webdav handler in controller that proxies to crunch-run gateway, keep-web (webdav.InternalURLs), or an "empty collection" stub, depending on whether container is active/finished
Not included:
  • log_events API
  • workbench2

Subtasks 1 (1 open0 closed)

Task #20244: Review 19889-live-log-webdavIn ProgressTom Clegg03/23/2023Actions

Related issues

Related to Arvados Epics - Idea #16442: Scalable + reliable container loggingResolved03/15/202308/31/2023Actions
Related to Arvados - Feature #18790: Access live container logs through arvados-client and crunch-run container gatewayResolvedTom Clegg04/04/2023Actions
Related to Arvados - Feature #20319: Move containers/{uuid}/log/ to container_requests/{uuid}/log/ResolvedTom Clegg04/21/2023Actions
Related to Arvados - Idea #20955: Crunch log transition documentation issuesResolvedPeter Amstutz09/15/2023Actions
Has duplicate Arvados - Feature #20217: crunch-run WebDAV service providing access to latest logsDuplicateActions
Has duplicate Arvados - Feature #20218: Container log endpoint in controller that routes to either crunch-run or keep-webDuplicateActions
Blocks Arvados - Feature #20219: Log panel on container view fetches live logs using periodic range request on new container log endpointResolvedStephen Smith07/28/2023Actions
Actions #1

Updated by Tom Clegg about 1 year ago

  • Related to Idea #16442: Scalable + reliable container logging added
Actions #2

Updated by Peter Amstutz about 1 year ago

  • Release set to 60
Actions #3

Updated by Tom Clegg 12 months ago

  • Release deleted (60)
  • Story points set to 3.0
  • Target version set to To be scheduled
Actions #4

Updated by Tom Clegg 12 months ago

  • Has duplicate Feature #20217: crunch-run WebDAV service providing access to latest logs added
Actions #5

Updated by Tom Clegg 12 months ago

  • Description updated (diff)
Actions #6

Updated by Tom Clegg 12 months ago

  • Has duplicate Feature #20218: Container log endpoint in controller that routes to either crunch-run or keep-web added
Actions #7

Updated by Peter Amstutz 12 months ago

  • Related to Feature #18790: Access live container logs through arvados-client and crunch-run container gateway added
Actions #8

Updated by Peter Amstutz 12 months ago

  • Related to Feature #20219: Log panel on container view fetches live logs using periodic range request on new container log endpoint added
Actions #9

Updated by Peter Amstutz 12 months ago

  • Related to deleted (Feature #20219: Log panel on container view fetches live logs using periodic range request on new container log endpoint)
Actions #10

Updated by Peter Amstutz 12 months ago

  • Blocks Feature #20219: Log panel on container view fetches live logs using periodic range request on new container log endpoint added
Actions #11

Updated by Peter Amstutz 12 months ago

  • Target version changed from To be scheduled to Development 2023-03-29 Sprint
Actions #12

Updated by Peter Amstutz 12 months ago

  • Assigned To set to Tom Clegg
Actions #13

Updated by Tom Clegg 12 months ago

  • Status changed from New to In Progress

19889-live-log-webdav @ a7438bf8ec3a27920905d732c62baf91a355bf27 -- developer-run-tests: #3561

19889-live-log-webdav @ a7438bf8ec3a27920905d732c62baf91a355bf27 -- developer-run-tests: #3564

Most of the diff is splitting the non-ssh-specific stuff out of the ContainerSSH code and using it to controller-to-controller proxying (when crunch-run has a tunnel to a different controller node), controller-to-tunnel-to-gateway proxying (when crunch-run has a tunnel to this controller node), and controller-to-gateway proxying (when crunch-run is directly reachable by tcp).

There are a few changes to lib/controller/router to make the request method and headers usable as inputs to a routableFunc, and pass OPTIONS requests under .../log through to the webdav handler.

Tangential / seemingly unrelated changes:

I spent a while figuring out why moving keep-web's webdavfs wrapper into a separate module caused the controller UpdatePriority test to fail. It turned out the UpdatePriority test was using the (repeatable) seeded prng to generate test data. It only passed because we got lucky with the prng sequence. The webdavfs wrapper disrupted the lucky sequence by using the prng first. I suspect the failure is related to #20240. For the sake of unblocking this branch, I just updated the test to use the lucky sequence explicitly.

railsapi was refusing to update logs on container record when it was referenced by a container request that was already in Final state. This shouldn't happen IRL anyway (the container request wouldn't be final) but we have a test fixture in that situation and it interfered with using the fixture for testing.

Keep-web's webdavfs wrapper had code that transparently created intermediate directories when writing files, to compensate for the fact that empty directories weren't preserved. Now that empty directories are preserved in collections, this isn't needed, so I removed it.

crunchrun was still using the golang.org/x/net/context package instead of the new stdlib context (ISTR this used to be necessary because the dockerclient library didn't work with stdlib context yet). Switched to import "context".

Updated gorilla/mux, and noticed it's no longer being maintained. :(

Actions #14

Updated by Lucas Di Pentima 11 months ago

Sorry for the delay, this was not an easy diff to read for me :). I tried to start a fresh arvbox instance to do some manual testing, but after failing and wasting too much time on that... I gave up.

  • File lib/crunchrun/container_gateway.go
    • Comment at line 84 I believe should mention /arvados/v1/container/{uuid}/log/
  • File lib/controller/localdb/container_gateway_test.go
    • Lines 291-295 are commented out code that I think could be removed.
  • Re: gorilla/mux (not a blocker, I think it can be a different story) would it be a good to time to forking it and use it on our own code? I think we can benefit from GitHub's security warnings this way? (not sure if we would get those otherwise)
  • IIRC, the cadaver tests were an intergration test suite for KeepWeb. Do you think it would be useful/possible to add a test for the "already finished container log" case?
  • For the future: There're many aspects of how the container gateway works that aren't known or fresh in my mind; I tried to find some high-level explanation on the wiki but couldn't find, I think it would be interesting to have some coarse grained explanation for devs to get up to speed without reading the code.
Actions #15

Updated by Tom Clegg 11 months ago

  • Comment at line 84 I believe should mention /arvados/v1/container/{uuid}/log/

Fixed.

  • Lines 291-295 are commented out code that I think could be removed.

Fixed.

  • Re: gorilla/mux (not a blocker, I think it can be a different story) would it be a good to time to forking it and use it on our own code? I think we can benefit from GitHub's security warnings this way? (not sure if we would get those otherwise)

I've just subscribed to security notifications at https://github.com/gorilla/mux (does it notify earlier if we have a fork?)

  • IIRC, the cadaver tests were an intergration test suite for KeepWeb. Do you think it would be useful/possible to add a test for the "already finished container log" case?

Indeed, this turned out to be very useful. Cadaver worked fine in the unfinished case, but in the proxy-to-keep-web case it noticed that the paths in the webdav responses didn't match the request, and bailed out (unless using -t to ignore).

To fix this I needed to change the proxy strategy: use the ID-in-host request form, preserve the original path, and send a new X-Webdav-Prefix header to keep-web so it knows to ignore the "/arvados/v1/containers/{uuid}/log" part.

19889-live-log-webdav @ d7f388108615d981636e464785129ce4a958a7f5 -- developer-run-tests: #3567

retry wb1 developer-run-tests-apps-workbench-integration: #3843

  • For the future: There're many aspects of how the container gateway works that aren't known or fresh in my mind; I tried to find some high-level explanation on the wiki but couldn't find, I think it would be interesting to have some coarse grained explanation for devs to get up to speed without reading the code.

I think the best we have is

Efficient live access to container logs

https://doc.arvados.org/main/architecture/dispatchcloud.html

https://doc.arvados.org/main/architecture/hpc.html

Actions #16

Updated by Peter Amstutz 11 months ago

  • Target version changed from Development 2023-03-29 Sprint to Development 2023-04-12 sprint
Actions #17

Updated by Lucas Di Pentima 11 months ago

This LGTM, thanks!

Re:

  • Re: gorilla/mux (not a blocker, I think it can be a different story) would it be a good to time to forking it and use it on our own code? I think we can benefit from GitHub's security warnings this way? (not sure if we would get those otherwise)

I've just subscribed to security notifications at https://github.com/gorilla/mux (does it notify earlier if we have a fork?)

I think it notifies the whole team automatically instead of needing to manually subscribe, but given that you'll probably be the best person to handle those updates, this is good enough :)

Actions #18

Updated by Tom Clegg 11 months ago

While working on #18790 I discovered that arvados-controller routing wasn't working because the /containers/ routes still go through the old generic railsproxy path instead of the new path.

I fixed this, and took the opportunity to add routing for .../arvados/v1/containers/{uuid}/ssh and .../container_gateway as well. The old .../connect/{uuid}/ssh endpoints are still supported, and still used by client-side code, so 2.7 clients will work with 2.6 servers and vice versa. In a future version we can switch client-side code to use the new endpoints, and then eventually delete the old ones. Looking back I should have hacked the routing this way in the first place instead of having a temporary endpoint outside of /containers/.

Also tweaked logging so "file not found" doesn't get logged as an error. Too noisy when clients are polling for stderr.txt to exist, etc.

19889-live-log-webdav @ 6d03fdac5674dc88eff821f5e8ac70642f39a895 -- developer-run-tests: #3573

wb1 retry

developer-run-tests-apps-workbench-integration: #3852

developer-run-tests-apps-workbench-integration: #3853

Actions #19

Updated by Tom Clegg 11 months ago

Another bugfix (unhandled error leading to use of nil pointer)

19889-live-log-webdav @ b402b894c535f95cdeba0631dea1fbb9c7672a07 -- developer-run-tests: #3575

Actions #20

Updated by Lucas Di Pentima 11 months ago

LGTM, thanks!

Actions #21

Updated by Tom Clegg 11 months ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Resolved
Actions #22

Updated by Tom Clegg 11 months ago

  • Related to Feature #20319: Move containers/{uuid}/log/ to container_requests/{uuid}/log/ added
Actions #23

Updated by Brett Smith 6 months ago

  • Release set to 66
Actions #24

Updated by Brett Smith 6 months ago

  • Related to Idea #20955: Crunch log transition documentation issues added
Actions

Also available in: Atom PDF