Project

General

Profile

Actions

Idea #18947

closed

Move remaining services into arvados-server subcommands (keepproxy, keep-web, arv-git-httpd)

Added by Tom Clegg almost 2 years ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Story points:
-
Release relationship:
Auto

Subtasks 6 (0 open6 closed)

Task #18957: Review 18947-keepproxyResolvedTom Clegg04/07/2022Actions
Task #19008: Review 18947-dispatch-localResolvedTom Clegg04/07/2022Actions
Task #19012: Review 18947-githttpdResolvedWard Vandewege04/07/2022Actions
Task #19085: Review 18947-keep-webResolvedTom Clegg04/07/2022Actions
Task #19090: Review 18947-keep-balanceResolvedWard Vandewege04/07/2022Actions
Task #19101: Review 18947-healthResolvedWard Vandewege04/07/2022Actions

Related issues

Has duplicate Arvados - Idea #13874: [CLI] arvados-server "health" subcommandDuplicateActions
Actions #1

Updated by Tom Clegg almost 2 years ago

  • Target version set to 2022-04-13 Sprint
Actions #2

Updated by Tom Clegg almost 2 years ago

  • Status changed from New to In Progress

keepproxy

The generic service.Command logging code gives us request/response logs, and we now have a SetResponseLogFields func, so we now get the standard http response log with the extra keepproxy-specific fields (user_full_name, locatorOut)

Sample keepproxy logs from test on 9tee4 (wb1 upload)

{"ClusterID":"9tee4","Listen":"[::]:25107","PID":30054,"Service":"keepproxy","URL":"http://0.0.0.0:25107/","Version":"e7187c72bd8b2ed54d9af129401294e13a0859f9-dev (go1.17.7)","level":"info","msg":"listening","time":"2022-04-05T19:22:22.087553652Z"}
{"ClusterID":"9tee4","PID":30054,"RequestID":"req-1ryjp6tc16t8c1eneahx","level":"info","msg":"request","remoteAddr":"10.100.35.254:35952","reqBytes":55838099,"reqForwardedFor":"23.16.189.12","reqHost":"keep.9tee4.arvadosapi.com","reqMethod":"POST","reqPath":"","reqQuery":"","time":"2022-04-05T19:23:18.209450201Z"}
{"ClusterID":"9tee4","PID":30054,"RequestID":"req-1ryjp6tc16t8c1eneahx","err":null,"expectLength":55838099,"level":"info","locator":"ba1c2af7f9ef67ee873f2e3fbf733fb7+55838099","msg":"response","remoteAddr":"10.100.35.254:35952","reqBytes":55838099,"reqForwardedFor":"23.16.189.12","reqHost":"keep.9tee4.arvadosapi.com","reqMethod":"POST","reqPath":"","reqQuery":"","respBytes":92,"respStatus":"OK","respStatusCode":200,"time":"2022-04-05T19:23:22.605628387Z","timeToStatus":4.396183,"timeTotal":4.396200,"timeWriteBody":0.000017,"userFullName":"Tom Clegg","userUUID":"9tee4-tpzed-rneoswozqjk5tcv","wantReplicas":2,"wroteReplicas":2}

Notes

  • The user fields used to be "user_uuid" and "user_full_name" -- now changed to "userUUID" and "userFullName" to match the style of the other fields.
  • The user/locator fields used to be in a separate log entry from the HTTP response details, with msg="Block upload" or "Block download" -- now they are all in one response entry, so msg=response in both cases, and one can tell whether it's read or write by checking presence/absence of wroteReplicas.

Old log entries for startup + a single request:

{"ClusterID":"9tee4","PID":8400,"level":"info","msg":"keepproxy 2.4.0~dev20220321141729 started","time":"2022-03-21T16:11:53.788787289Z"}
{"ClusterID":"9tee4","PID":8400,"level":"info","msg":"listening at [::]:25107","time":"2022-03-21T16:11:53.837066749Z"}
{"ClusterID":"9tee4","PID":8400,"RequestID":"req-nxkc81wwljg2g3gi770a","level":"info","msg":"request","remoteAddr":"10.100.35.254:57342","reqBytes":179,"reqForwardedFor":"35.193.239.132","reqHost":"keep.9tee4.arvadosapi.com","reqMethod":"PUT","reqPath":"ca398ed2b6d31ed830ed77eccbd3700a","reqQuery":"","time":"2022-03-21T16:16:20.713091292Z"}
{"ClusterID":"9tee4","PID":8400,"level":"info","locator":"ca398ed2b6d31ed830ed77eccbd3700a+179","msg":"Block upload","time":"2022-03-21T16:16:20.718637950Z","user_full_name":"Automated Diagnostics","user_uuid":"9tee4-tpzed-oyphd3jt7fhlhsy"}
{"ClusterID":"9tee4","PID":8400,"level":"info","msg":"35.193.239.132,10.100.35.254:57342 PUT /ca398ed2b6d31ed830ed77eccbd3700a 200 179 2 2 ca398ed2b6d31ed830ed77eccbd3700a+179+A15878239ce217acaf9c29bb5457701a226d8f5f1@624b19d4 \u003cnil\u003e","time":"2022-03-21T16:16:20.718724709Z"}
{"ClusterID":"9tee4","PID":8400,"RequestID":"req-nxkc81wwljg2g3gi770a","level":"info","msg":"response","remoteAddr":"10.100.35.254:57342","reqBytes":179,"reqForwardedFor":"35.193.239.132","reqHost":"keep.9tee4.arvadosapi.com","reqMethod":"PUT","reqPath":"ca398ed2b6d31ed830ed77eccbd3700a","reqQuery":"","respBytes":87,"respStatus":"OK","respStatusCode":200,"time":"2022-03-21T16:16:20.718760274Z","timeToStatus":0.005627,"timeTotal":0.005667,"timeWriteBody":0.000040}

18947-keepproxy @ e7187c72bd8b2ed54d9af129401294e13a0859f9 -- developer-run-tests: #3023

Actions #3

Updated by Lucas Di Pentima almost 2 years ago

18947-keepproxy LGTM. I got worried about the systemd file removal, but I think it's not needed by the salt installer, or at least a quick grep search didn't turned any results.

Actions #4

Updated by Tom Clegg almost 2 years ago

  • Target version changed from 2022-04-13 Sprint to 2022-04-27 Sprint
Actions #5

Updated by Tom Clegg almost 2 years ago

crunch-dispatch-local and its tests are a mess. I fixed a few things so
  • test suite runs in 30s instead of ~6 minutes
  • integration test no longer hangs if you run it twice without manually resetting the test DB fixtures
  • tests no longer panic if you use -check.f= to select tests and TestSuite.TestIntegration doesn't run first

The main remaining problem is that the whole thing is a terrible mess with globals and everything. The 27-second test case passes all its checks in 3s but then sits around waiting for goroutines to stop retrying, and there are a lot of them for some reason, and they take a long time for some reason. I ran out of patience.

Not going to bother moving this into arvados-server (see #15370) but might as well merge this so we don't waste as much CI time.

18947-dispatch-local @ 79ed8ab99bb2ad779c873620704519cf02177962 -- developer-run-tests: #3038

Actions #6

Updated by Lucas Di Pentima almost 2 years ago

18947-dispatch-local LGTM, provided tests pass.

Actions #8

Updated by Lucas Di Pentima almost 2 years ago

  • There're still some "arv-git-httpd" references on the documentation.
  • I've re-built the arvbox docker image and tried to start it, and I'm getting the following errors:
...
2022-04-25_13:15:24.15538 ++ GOPATH=/var/lib/gopath
2022-04-25_13:15:24.15538 ++ mkdir -p /var/lib/gopath
2022-04-25_13:15:24.16073 ++ cd /usr/src/arvados
2022-04-25_13:15:24.16075 ++ [[ 1000 = 0 ]]
2022-04-25_13:15:24.16076 ++ [[ ! -f /usr/local/bin/arvados-server ]]
2022-04-25_13:15:24.16694 + cd /usr/local/bin
2022-04-25_13:15:24.16703 + ln -sf arvados-server arvados-git-httpd
2022-04-25_13:15:24.17061 + test '' = --only-deps
2022-04-25_13:15:24.17080 + /usr/local/lib/arvbox/runsu.sh flock /var/lib/arvados-arvbox/cluster_config.yml.lock /usr/local/lib/arvbox/cluster-config.sh
2022-04-25_13:15:24.19017 chown: changing ownership of '/dev/stderr': Operation not permitted
2022-04-25_13:15:24.19307 chpst: fatal: unable to setgroups: permission denied
Actions #9

Updated by Tom Clegg almost 2 years ago

18947-githttpd @ 01a3368db1de44656e82fbc066e85ae4feb5eb75 -- developer-run-tests: #3074
  • update doc refs to arv-git-httpd (but left the filenames/anchors alone to avoid breaking links/bookmarks)
  • fix extra runsu.sh usage causing chpst error (I'm not sure why keepproxy didn't fail the same way with the same double-runsu.sh setup, but I fixed it in both places)
Actions #10

Updated by Ward Vandewege almost 2 years ago

Tom Clegg wrote:

18947-githttpd @ 01a3368db1de44656e82fbc066e85ae4feb5eb75 -- developer-run-tests: #3074
  • update doc refs to arv-git-httpd (but left the filenames/anchors alone to avoid breaking links/bookmarks)
  • fix extra runsu.sh usage causing chpst error (I'm not sure why keepproxy didn't fail the same way with the same double-runsu.sh setup, but I fixed it in both places)

LGTM, thanks!

Actions #11

Updated by Peter Amstutz almost 2 years ago

  • Target version changed from 2022-04-27 Sprint to 2022-05-11 sprint
Actions #13

Updated by Ward Vandewege almost 2 years ago

Tom Clegg wrote:

18947-keep-web @ 9acc8cb9cd9ca4429712b0d31b647e9a6ecf2d96 -- developer-run-tests: #3082

In

+++ b/cmd/arvados-server/keep-web.service
@@ -3,15 +3,17 @@
 # SPDX-License-Identifier: AGPL-3.0

 [Unit]
-Description=Arvados Keep web gateway
+Description=Arvados Keep WebDAV gateway
 Documentation=https://doc.arvados.org/

Shouldn't the description mention the S3 interface too, if we're going to name WebDAV?

Nice cleanup in lib/service/cmd.go removing the duplicative interceptMetricsReqs(cluster.ManagementToken, reg, log, ...) wrapper!

LGTM thanks.

Actions #15

Updated by Ward Vandewege almost 2 years ago

Tom Clegg wrote:

18947-keep-balance @ d70eee2298afe8d082c72429a919bd94523c6bde -- developer-run-tests: #3089

LGTM, thanks! This one was very straighforward.

Actions #16

Updated by Tom Clegg almost 2 years ago

18947-health @ 6b1a602cf1de1b6488c7d6b55e8d99ca8f707160 -- developer-run-tests: #3103

Another one where nearly all of the changes were already done.

Actions #18

Updated by Tom Clegg almost 2 years ago

  • Target version changed from 2022-05-11 sprint to 2022-05-25 sprint
Actions #19

Updated by Ward Vandewege almost 2 years ago

Tom Clegg wrote:

18947-dispatch-slurm @ 217aba73f6366cf1af30683baa6d0d5d1e3407a9 -- developer-run-tests: #3106

LGTM. Just one question - is this a good time to also rename crunch-dispatch-slurm to arvados-dispatch-slurm or should that be done in a future ticket?

Actions #20

Updated by Ward Vandewege almost 2 years ago

Tom Clegg wrote:

18947-health @ 6b1a602cf1de1b6488c7d6b55e8d99ca8f707160 -- developer-run-tests: #3103

Another one where nearly all of the changes were already done.

LGTM thanks!

Actions #21

Updated by Peter Amstutz almost 2 years ago

  • Target version changed from 2022-05-25 sprint to 2022-06-08 sprint
Actions #22

Updated by Tom Clegg almost 2 years ago

  • Status changed from In Progress to Resolved
Actions #23

Updated by Peter Amstutz over 1 year ago

  • Release set to 47
Actions #24

Updated by Brett Smith about 1 year ago

  • Has duplicate Idea #13874: [CLI] arvados-server "health" subcommand added
Actions

Also available in: Atom PDF