Project

General

Profile

Actions

Feature #19889

open

access current container logs at /arvados/v1/containers/{uuid}/log/{filename}

Added by Tom Clegg 3 months ago. Updated 3 days ago.

Status:
In Progress
Priority:
Normal
Assigned To:
Category:
-
Start date:
03/23/2023
Due date:
% Done:

0%

Estimated time:
(Total: 0.00 h)
Story points:
3.0

Description

See Efficient live access to container logs

Provide access to the logs for a locked/running container, including the portion that has not yet been flushed to keep.

Included:
  • read-only webdav handler in crunch-run (may involve refactoring/exporting some parts of keep-web to avoid copying code)
  • webdav handler in controller that proxies to crunch-run gateway, keep-web (webdav.InternalURLs), or an "empty collection" stub, depending on whether container is active/finished
Not included:
  • log_events API
  • workbench2

Subtasks 1 (1 open0 closed)

Task #20244: Review 19889-live-log-webdavIn ProgressLucas Di Pentima03/23/2023

Actions

Related issues

Related to Arvados Epics - Story #16442: Scalable + reliable container loggingNew03/15/202305/31/2023

Actions
Related to Arvados - Feature #18790: Access live container logs through arvados-client and crunch-run container gatewayNewTom Clegg

Actions
Has duplicate Arvados - Feature #20217: crunch-run WebDAV service providing access to latest logsDuplicate

Actions
Has duplicate Arvados - Feature #20218: Container log endpoint in controller that routes to either crunch-run or keep-webDuplicate

Actions
Blocks Arvados - Feature #20219: Log panel on container view fetches live logs using periodic range request on new container log endpointNew

Actions
Actions #1

Updated by Tom Clegg 3 months ago

  • Related to Story #16442: Scalable + reliable container logging added
Actions #2

Updated by Peter Amstutz about 2 months ago

  • Release set to 60
Actions #3

Updated by Tom Clegg 18 days ago

  • Release deleted (60)
  • Story points set to 3.0
  • Target version set to To be scheduled
Actions #4

Updated by Tom Clegg 18 days ago

  • Has duplicate Feature #20217: crunch-run WebDAV service providing access to latest logs added
Actions #5

Updated by Tom Clegg 18 days ago

  • Description updated (diff)
Actions #6

Updated by Tom Clegg 18 days ago

  • Has duplicate Feature #20218: Container log endpoint in controller that routes to either crunch-run or keep-web added
Actions #7

Updated by Peter Amstutz 18 days ago

  • Related to Feature #18790: Access live container logs through arvados-client and crunch-run container gateway added
Actions #8

Updated by Peter Amstutz 18 days ago

  • Related to Feature #20219: Log panel on container view fetches live logs using periodic range request on new container log endpoint added
Actions #9

Updated by Peter Amstutz 18 days ago

  • Related to deleted (Feature #20219: Log panel on container view fetches live logs using periodic range request on new container log endpoint)
Actions #10

Updated by Peter Amstutz 18 days ago

  • Blocks Feature #20219: Log panel on container view fetches live logs using periodic range request on new container log endpoint added
Actions #11

Updated by Peter Amstutz 17 days ago

  • Target version changed from To be scheduled to Development 2023-03-29 Sprint
Actions #12

Updated by Peter Amstutz 17 days ago

  • Assigned To set to Tom Clegg
Actions #13

Updated by Tom Clegg 11 days ago

  • Status changed from New to In Progress

19889-live-log-webdav @ a7438bf8ec3a27920905d732c62baf91a355bf27 -- developer-run-tests: #3561

19889-live-log-webdav @ a7438bf8ec3a27920905d732c62baf91a355bf27 -- developer-run-tests: #3564

Most of the diff is splitting the non-ssh-specific stuff out of the ContainerSSH code and using it to controller-to-controller proxying (when crunch-run has a tunnel to a different controller node), controller-to-tunnel-to-gateway proxying (when crunch-run has a tunnel to this controller node), and controller-to-gateway proxying (when crunch-run is directly reachable by tcp).

There are a few changes to lib/controller/router to make the request method and headers usable as inputs to a routableFunc, and pass OPTIONS requests under .../log through to the webdav handler.

Tangential / seemingly unrelated changes:

I spent a while figuring out why moving keep-web's webdavfs wrapper into a separate module caused the controller UpdatePriority test to fail. It turned out the UpdatePriority test was using the (repeatable) seeded prng to generate test data. It only passed because we got lucky with the prng sequence. The webdavfs wrapper disrupted the lucky sequence by using the prng first. I suspect the failure is related to #20240. For the sake of unblocking this branch, I just updated the test to use the lucky sequence explicitly.

railsapi was refusing to update logs on container record when it was referenced by a container request that was already in Final state. This shouldn't happen IRL anyway (the container request wouldn't be final) but we have a test fixture in that situation and it interfered with using the fixture for testing.

Keep-web's webdavfs wrapper had code that transparently created intermediate directories when writing files, to compensate for the fact that empty directories weren't preserved. Now that empty directories are preserved in collections, this isn't needed, so I removed it.

crunchrun was still using the golang.org/x/net/context package instead of the new stdlib context (ISTR this used to be necessary because the dockerclient library didn't work with stdlib context yet). Switched to import "context".

Updated gorilla/mux, and noticed it's no longer being maintained. :(

Actions #14

Updated by Lucas Di Pentima 5 days ago

Sorry for the delay, this was not an easy diff to read for me :). I tried to start a fresh arvbox instance to do some manual testing, but after failing and wasting too much time on that... I gave up.

  • File lib/crunchrun/container_gateway.go
    • Comment at line 84 I believe should mention /arvados/v1/container/{uuid}/log/
  • File lib/controller/localdb/container_gateway_test.go
    • Lines 291-295 are commented out code that I think could be removed.
  • Re: gorilla/mux (not a blocker, I think it can be a different story) would it be a good to time to forking it and use it on our own code? I think we can benefit from GitHub's security warnings this way? (not sure if we would get those otherwise)
  • IIRC, the cadaver tests were an intergration test suite for KeepWeb. Do you think it would be useful/possible to add a test for the "already finished container log" case?
  • For the future: There're many aspects of how the container gateway works that aren't known or fresh in my mind; I tried to find some high-level explanation on the wiki but couldn't find, I think it would be interesting to have some coarse grained explanation for devs to get up to speed without reading the code.
Actions #15

Updated by Tom Clegg 4 days ago

  • Comment at line 84 I believe should mention /arvados/v1/container/{uuid}/log/

Fixed.

  • Lines 291-295 are commented out code that I think could be removed.

Fixed.

  • Re: gorilla/mux (not a blocker, I think it can be a different story) would it be a good to time to forking it and use it on our own code? I think we can benefit from GitHub's security warnings this way? (not sure if we would get those otherwise)

I've just subscribed to security notifications at https://github.com/gorilla/mux (does it notify earlier if we have a fork?)

  • IIRC, the cadaver tests were an intergration test suite for KeepWeb. Do you think it would be useful/possible to add a test for the "already finished container log" case?

Indeed, this turned out to be very useful. Cadaver worked fine in the unfinished case, but in the proxy-to-keep-web case it noticed that the paths in the webdav responses didn't match the request, and bailed out (unless using -t to ignore).

To fix this I needed to change the proxy strategy: use the ID-in-host request form, preserve the original path, and send a new X-Webdav-Prefix header to keep-web so it knows to ignore the "/arvados/v1/containers/{uuid}/log" part.

19889-live-log-webdav @ d7f388108615d981636e464785129ce4a958a7f5 -- developer-run-tests: #3567

retry wb1 developer-run-tests-apps-workbench-integration: #3843

  • For the future: There're many aspects of how the container gateway works that aren't known or fresh in my mind; I tried to find some high-level explanation on the wiki but couldn't find, I think it would be interesting to have some coarse grained explanation for devs to get up to speed without reading the code.

I think the best we have is

Efficient live access to container logs

https://doc.arvados.org/main/architecture/dispatchcloud.html

https://doc.arvados.org/main/architecture/hpc.html

Actions #16

Updated by Peter Amstutz 3 days ago

  • Target version changed from Development 2023-03-29 Sprint to Development 2023-04-12 sprint
Actions

Also available in: Atom PDF