Feature #12167
closed[SDKs] [API] [keepstore] [keepproxy] Facilitate request tracing for all services
Description
Background¶
When an arv-put process is slow or broken, it is useful to grep logs for related server-side activity. To make this easy, arv-put should generate a "request ID" at startup, and all components should propagate this ID to log files and other Arvados components.
Specific improvements¶
- A request ID looks like "req-" + 20 random alphanum chars, e.g.,
req-iNjNZUDqU35yxSXFYJ8x
- (Done) If running in verbose mode, arv-put should log the request ID to stderr as soon as it's generated.
- Whether running in verbose mode or not, arv-put should log the request ID in every error log. Presumably the easiest way to do this is to include it in every log message, just like timestamp and PID. Thus, a failed arv-put will always log its request ID, because every failure mode causes a log (make sure this is true even for SIGINT).
- (Done) Each keepstore/keepproxy/API request made by arv-put should include a request header like
X-Request-Id: req-iNjNZUDqU35yxSXFYJ8x
(no ETag-esque quoting). - (Done) When keepproxy receives an X-Request-Id request header, it should propagate that same header when it makes keepstore requests in the course of servicing the incoming request instead of inventing its own. Keepproxy's request log should include whichever request ID was used. Keepproxy's generated IDs should be updated to the above format.
- keepproxy should also send the X-Request-Id when making API requests.
- (Done) When keepstore receives an X-Request-Id request header, it should include the header value in the corresponding request log; otherwise it should generate a random one.
- (Done) When API server receives an X-Request-Id request header, it should include the header value in the corresponding request log; otherwise it should generate a random one.
- (Done) Workbench should include its request ID in an X-Request-Id header instead of a "current_request_id" query parameter. (Current code tries to do this, but it doesn't work. Rails middleware is probably overwriting it.)
- (Done) Workbench's generated IDs should also be updated to the above format.
- (Done) When Workbench receives an X-Request-Id request header, it should use that header value instead of generating a new request ID.
- keep-web, too.
Of course, incoming headers can be broken or malicious, so logs should use appropriate quoting.
Related issues
Updated by Tom Clegg about 7 years ago
- Subject changed from [keep] [keepproxy] [API] [arv-put] make a "current_request_id" in arv-put just like in workbench to [SDKs] [API] [keepstore] [keepproxy] Propagate and log request IDs to permit tracing of arv-put processes
- Description updated (diff)
Updated by Tom Clegg about 7 years ago
- Subject changed from [SDKs] [API] [keepstore] [keepproxy] Propagate and log request IDs to permit tracing of arv-put processes to [SDKs] [API] [keepstore] [keepproxy] Facilitate request tracing for arv-put
- Description updated (diff)
Updated by Tom Clegg about 7 years ago
- Tracker changed from Bug to Feature
- Project changed from 35 to Arvados
Updated by Tom Morris about 7 years ago
- Target version set to Arvados Future Sprints
Updated by Tom Clegg about 7 years ago
This should be done in such a way that all clients can benefit from it, although the one we're most interested in is arv-put. Presumably, this means the SDK generates one request ID per process and passes it in all API calls; and arv-put gets it from the SDK in order to log it at all the right times.
Updated by Tom Morris about 7 years ago
I'm not sure I see the value of having the backends (Keep & API servers) generate random request IDs rather than just using a fixed string like "None" or leaving it blank. If the logging is noticeably different we can track where we need to add additional request IDs on the client side.
Updated by Tom Morris about 7 years ago
- Target version changed from Arvados Future Sprints to 2017-11-08 Sprint
Updated by Tom Clegg about 7 years ago
Workbench and API server portions:
12167-workbench-request-id @ 5f46fe83d8a15048c4a5b90553c379803e99b135
Updated by Lucas Di Pentima about 7 years ago
The updates at branch 12167-workbench-request-id
LGTM.
If the req-id is created from the API when not provided, would it be useful to add another prefix to identify which component created the id? For example: req-api-xxxxx, req-wb-xxxx, and arv-put
would be able to use it's own req-put-xxxx. I believe this would allow us do what TomM said on note-8, identify whenever there're clients not creating their own ids.
Also, it would allow us to extract some statistics to compare clients' "chattiness".
Updated by Tom Clegg about 7 years ago
- Target version changed from 2017-11-08 Sprint to 2017-11-22 Sprint
Updated by Tom Clegg almost 7 years ago
- Target version changed from 2017-11-22 Sprint to 2017-12-06 Sprint
Updated by Tom Clegg almost 7 years ago
12167-keep-request-id @ b21dbd5c2fec1c8c61d232b2cf7f0a68240d9203
todo: merge #12483 to avoid conflicts in httpserver, then restore TTFB timing in logs
Updated by Tom Clegg almost 7 years ago
restored TTFB timing without merge conflicts by adding a separate responseTimer wrapper around ResponseWriter. Could combine those two after merge.
12167-keep-request-id @ 4412d92e672d4df12ffc04c2b747b750e313cae9- keepstore logs incoming request ID if any, otherwise a generated one
- keepproxy logs incoming request ID if any, otherwise a generated one, and propagates it to keepstore requests
- keepproxy logs in JSON
Updated by Lucas Di Pentima almost 7 years ago
Some questions/comments:
- There are a couple of tests failing because of
logging_router.go
disappearance. - The story description requires an ID of ‘req-‘ + 20 random alphanum chars, should
IDGenerator
be updated to work like this? (if that’s a hard requirement) - Should these headers be truncated to some max amount of chars to avoid potentially malicious activity?
- Some tests may be needed
Updated by Tom Clegg almost 7 years ago
Lucas Di Pentima wrote:
- There are a couple of tests failing because of
logging_router.go
disappearance.
Fixed keepstore tests.
- The story description requires an ID of ‘req-‘ + 20 random alphanum chars, should
IDGenerator
be updated to work like this? (if that’s a hard requirement)
Indeed, timestamps are not very random, nor do they reach 20 chars in base 36. Fixed.
START: logger_test.go:28: Suite.TestLogRequests map[string]interface {}{"reqMethod":"GET", "time":"2017-11-29T10:56:04.996722515-05:00", "RequestID":"req-e254kixtfyxeaeyzgf3b", "msg":"request", "reqBytes":0, "reqPath":"bar", "level":"info", "remoteAddr":"", "reqForwardedFor":"1.2.3.4:12345"} map[string]interface {}{"level":"info", "remoteAddr":"", "timeToStatus":0.000227, "timeTotal":0.000246, "respBytes":11, "time":"2017-11-29T10:56:04.996982684-05:00", "msg":"response", "reqPath":"bar", "timeWriteBody":1.8e-05, "RequestID":"req-e254kixtfyxeaeyzgf3b", "reqBytes":0, "reqForwardedFor":"1.2.3.4:12345", "reqMethod":"GET", "respStatus":"", "respStatusCode":0} PASS: logger_test.go:28: Suite.TestLogRequests 0.001s
- Should these headers be truncated to some max amount of chars to avoid potentially malicious activity?
Go's http server has a MaxHeaderBytes limit, default 1MB. I guess this is as safe as anything else in the request that gets logged.
- Some tests may be needed
Added a logging test.
12167-keep-request-id @ 20c3fbd046f726590774c5d7a1897835080a34e3
Updated by Tom Clegg almost 7 years ago
- Target version changed from 2017-12-06 Sprint to 2017-12-20 Sprint
Updated by Tom Clegg almost 7 years ago
12167-python-request-id @ b7eae78616dd72ddda597f414f3c208b099c0a27
$ arv-put --no-cache output.txt 2017-12-08 10:32:19 arvados.arv_put[19849] INFO: X-Request-Id: req-33mgkwu9wqvbqufrab5d 2017-12-08 10:32:19 arvados.arv_put[19849] INFO: No cache usage requested for this run. 0M / 0M 100.0% 2017-12-08 10:32:21 arvados.arv_put[19849] INFO: 2017-12-08 10:32:21 arvados.arv_put[19849] INFO: Collection saved as 'Saved at 2017-12-08 15:32:19 UTC by tom@nelle' 9tee4-4zz18-elcxwboktnhm5wc
grep API logs for req-33mgkwu9wqvbqufrab5d:
{"method":"GET","path":"/arvados/v1/users/current","format":"json","controller":"Arvados::V1::UsersController","action":"current","status":200,"duration":5.6,"view":0.35,"db":0.34,"request_id":"req-33mgkwu9wqvbqufrab5d","client_ipaddr":"24.67.204.71","client_auth":"9tee4-gj3su-55u9mmetngu39fd","params":{"alt":"json"},"@timestamp":"2017-12-08T15:32:19.621322066Z","@version":"1","message":"[200] GET /arvados/v1/users/current (Arvados::V1::UsersController#current)"}
{"method":"GET","path":"/arvados/v1/keep_services/accessible","format":"json","controller":"Arvados::V1::KeepServicesController","action":"accessible","status":200,"duration":6.18,"view":0.44,"db":1.81,"request_id":"req-33mgkwu9wqvbqufrab5d","client_ipaddr":"24.67.204.71","client_auth":"9tee4-gj3su-55u9mmetngu39fd","params":{"alt":"json"},"@timestamp":"2017-12-08T15:32:19.752575351Z","@version":"1","message":"[200] GET /arvados/v1/keep_services/accessible (Arvados::V1::KeepServicesController#accessible)"}
{"method":"POST","path":"/arvados/v1/collections","format":"json","controller":"Arvados::V1::CollectionsController","action":"create","status":200,"duration":41.96,"view":0.43,"db":25.88,"request_id":"req-33mgkwu9wqvbqufrab5d","client_ipaddr":"24.67.204.71","client_auth":"9tee4-gj3su-55u9mmetngu39fd","params":{"manifest_text":". 875ec5b819be5be78bf7ac2eb9ba1417+18+A6a9cc649be1699084ad83d2997a9d4d723a733a8@5a3d271b 0:18:output.txt\n","owner_uuid":"9tee4-tpzed-rneoswozqjk5tcv","name":"Saved at 2017-12-08 15:32:19 UTC by tom@nelle","replication_desired":null,"ensure_unique_name":"true","alt":"json","collection":{"owner_uuid":"9tee4-tpzed-rneoswozqjk5tcv","replication_desired":null,"manifest_text":". 875ec5b819be5be78bf7ac2eb9ba1417+18+A6a9cc649be1699084ad83d2997a9d4d723a733a8@5a3d271b 0:18:output.txt\n","name":"Saved at 2017-12-08 15:32:19 UTC by tom@nelle"}},"@timestamp":"2017-12-08T15:32:21.061234728Z","@version":"1","message":"[200] POST /arvados/v1/collections (Arvados::V1::CollectionsController#create)"}
keep0 keepstore:
2017-12-08_15:34:06.77212 {"RequestID":"req-33mgkwu9wqvbqufrab5d","level":"info","msg":"request","remoteAddr":"10.100.32.7:38904","reqBytes":18,"reqForwardedFor":"","reqMethod":"PUT","reqPath":"875ec5b819be5be78bf7ac2eb9ba1417","time":"2017-12-08T15:34:06.771978647Z"}
2017-12-08_15:34:06.84978 {"RequestID":"req-33mgkwu9wqvbqufrab5d","level":"info","msg":"response","remoteAddr":"10.100.32.7:38904","reqBytes":18,"reqForwardedFor":"","reqMethod":"PUT","reqPath":"875ec5b819be5be78bf7ac2eb9ba1417","respBytes":87,"respStatus":"","respStatusCode":0,"time":"2017-12-08T15:34:06.849590663Z","timeToStatus":0.077579,"timeTotal":0.077599,"timeWriteBody":0.000020}
keep1 keepstore:
2017-12-08_15:39:07.37169 {"RequestID":"req-33mgkwu9wqvbqufrab5d","level":"info","msg":"request","remoteAddr":"10.100.32.7:55000","reqBytes":18,"reqForwardedFor":"","reqMethod":"PUT","reqPath":"875ec5b819be5be78bf7ac2eb9ba1417","time":"2017-12-08T15:39:07.371533438Z"}
2017-12-08_15:39:07.45356 {"RequestID":"req-33mgkwu9wqvbqufrab5d","level":"info","msg":"response","remoteAddr":"10.100.32.7:55000","reqBytes":18,"reqForwardedFor":"","reqMethod":"PUT","reqPath":"875ec5b819be5be78bf7ac2eb9ba1417","respBytes":87,"respStatus":"","respStatusCode":0,"time":"2017-12-08T15:39:07.453427752Z","timeToStatus":0.081692,"timeTotal":0.081911,"timeWriteBody":0.000219}
keepproxy:
2017-12-08_15:34:06.72716 {"RequestID":"req-33mgkwu9wqvbqufrab5d","level":"info","msg":"request","remoteAddr":"127.0.0.1:33230","reqBytes":18,"reqForwardedFor":"24.67.204.71","reqMethod":"PUT","reqPath":"875ec5b819be5be78bf7ac2eb9ba1417","time":"2017-12-08T15:34:06.727034881Z"}
2017-12-08_15:34:06.85703 {"RequestID":"req-33mgkwu9wqvbqufrab5d","level":"info","msg":"response","remoteAddr":"127.0.0.1:33230","reqBytes":18,"reqForwardedFor":"24.67.204.71","reqMethod":"PUT","reqPath":"875ec5b819be5be78bf7ac2eb9ba1417","respBytes":86,"respStatus":"","respStatusCode":0,"time":"2017-12-08T15:34:06.856949973Z","timeToStatus":0.129809,"timeTotal":0.129891,"timeWriteBody":0.000082}
Updated by Tom Clegg almost 7 years ago
Added tests.
12167-python-request-id @ 4453022a516e2b1deb30a71d8ee811d6593c44c3
Updated by Lucas Di Pentima almost 7 years ago
Just a couple of comments:
- Shouldn't the
KeepXRequestId
class be calledKeepXRequestIdTestCase
just for code style purposes? - Wouldn’t be convenient to add a couple of tests to confirm that
arv-get
&arv-put
select a properly formatted req-id (just in case we change thearvados.util.new_request_id()
call to something else in the future)?
Apart from that, LGTM.
Updated by Tom Clegg almost 7 years ago
- added arv-get and arv-put tests
- renamed KeepXRequestId to KeepXRequestIdTestCase
Updated by Tom Clegg almost 7 years ago
- Related to Idea #12512: [Workbench] Implement X-Request-ID to enable request tracing added
Updated by Tom Morris almost 7 years ago
- Target version changed from 2017-12-20 Sprint to 2018-01-17 Sprint
Updated by Tom Clegg almost 7 years ago
- Subject changed from [SDKs] [API] [keepstore] [keepproxy] Facilitate request tracing for arv-put to [SDKs] [API] [keepstore] [keepproxy] Facilitate request tracing for all services
- Story points changed from 2.0 to 1.0
Updated by Tom Morris almost 7 years ago
- Target version changed from 2018-01-17 Sprint to 2018-01-31 Sprint
Updated by Tom Clegg almost 7 years ago
- Target version changed from 2018-01-31 Sprint to Arvados Future Sprints
Updated by Tom Morris over 6 years ago
- Assigned To deleted (
Tom Clegg) - Target version changed from Arvados Future Sprints to 2018-03-14 Sprint
Updated by Tom Morris over 6 years ago
- Related to Idea #13147: [arv-put] Add request ID to requests for tracability added
Updated by Lucas Di Pentima over 6 years ago
Pending tasks¶
- Right now
arv-put
is logging the request-id on the first error message log, confirm if this is enough or should be logged on every error message.- Also,
arv-put
isn't logging the request id when handling SIGINT.
- Also,
- Ruby client support.
- Keep-web with related Go SDK updates.
- Confirm if keepproxy is sending the X-Request-Id when making API requests. (as it's not marked as "done" on the description)
Updated by Lucas Di Pentima over 6 years ago
- Target version changed from 2018-03-14 Sprint to 2018-03-28 Sprint
Updated by Lucas Di Pentima over 6 years ago
Branch 12167-arvput-log-on-signals
- 247f218e49da73cfcd5b27a4990f16e3a3231ddd
Test run: https://ci.curoverse.com/job/developer-run-tests/657/
arv-put
logs the request id before exiting because of SIGINT, SIGTERM & SIGQUIT.
Ran the test locally 100 times and didn't got any failure, so although it uses time.sleep(), I think it's pretty safe. Instead of waiting 1 second before sending the signal, I tried to capture some stdout from the process prior to sending it but I always got no output from pipe.stdout.read()
, even if I waited several secs.
Updated by Tom Clegg over 6 years ago
select.select(..., 10) seems to work reliably in place of time.sleep(1).
I also replaced "yes" with "sleep 10". With "yes", failing tests seemed to result in a python process trying to OOM-kill the host.
12167-arvput-log-on-signals @ ea8013a2d0cd0faeeec132f5d950fb3a5461d6ca
If that looks good to you, then LGTM. Thanks.
Updated by Lucas Di Pentima over 6 years ago
Very cool idea, thanks! I'll merge to master.
Updated by Lucas Di Pentima over 6 years ago
- Target version changed from 2018-03-28 Sprint to 2018-04-11 Sprint
Updated by Lucas Di Pentima over 6 years ago
- Target version changed from 2018-04-11 Sprint to 2018-04-25 Sprint
Updated by Tom Clegg over 6 years ago
- Assigned To changed from Lucas Di Pentima to Tom Clegg
Updated by Tom Clegg over 6 years ago
- Target version changed from 2018-04-25 Sprint to 2018-05-09 Sprint
Updated by Tom Clegg over 6 years ago
- keep-web logs requests and responses as JSON via logrus (catching up with keepstore, keepproxy, ws)
- keep-web logs the incoming request-id, if any (otherwise a newly generated one)
- keep-web propagates request-id in outgoing API and keep requests
- keepproxy propagates request-id in outgoing API (which it didn't before) and keep requests (which it did before using its custom http client, but now does with the new sdk/go/keepclient facility)
Updated by Lucas Di Pentima over 6 years ago
Some comments/questions:
- Some tests on
services/keep-web
are failing (cadaver_test.go
file) on my local run. On Jenkins it seems that the 'tests-remainder' job was cancelled - File
sdk/go/arvados/client.go
- Lines 109-112: I think the new
reqid
retrieval/generation could be moved inside the conditional at line 114 to avoid unnecessary work - Line 253: What would happen if
reqid
is an empty string? I believe it could be the case when an older client doesn’t provide the appropriateX-Request-Id
header, for example atservices/keep-web/handler.go:431
- Lines 109-112: I think the new
- File
sdk/go/keepclient/keepclient.go
- Line 551: Should
kc.RequestID
be assigned with the newly generated Id also, to avoid a new id on everysetRequestHeaders
call?
- Line 551: Should
- Do you think adding some tests would be convenient?
Updated by Tom Clegg over 6 years ago
Lucas Di Pentima wrote:
- Some tests on
services/keep-web
are failing (cadaver_test.go
file) on my local run. On Jenkins it seems that the 'tests-remainder' job was cancelled
Could be the $HOME problem we fixed in #13111 by updating an old (<1.9) Go runtime. I've merged master here, so run-tests.sh should demand Go 1.10 now.
- File
sdk/go/arvados/client.go
- Lines 109-112: I think the new
reqid
retrieval/generation could be moved inside the conditional at line 114 to avoid unnecessary work
Indeed, fixed.
- Line 253: What would happen if
reqid
is an empty string? I believe it could be the case when an older client doesn’t provide the appropriateX-Request-Id
header, for example atservices/keep-web/handler.go:431
In that specific case the header has been set by httpserver.AddRequestIDs() but I've updated client.go so WithRequestID("") causes a new ID to be generated for each request.
- File
sdk/go/keepclient/keepclient.go
- Line 551: Should
kc.RequestID
be assigned with the newly generated Id also, to avoid a new id on everysetRequestHeaders
call?
Ah yes, this was generating a new request ID for each outgoing request to service a single Get or Put. If the caller hasn't set a request ID on the client object, I think what we want is a new ID for each call to Get, Put, etc. I've updated accordingly.
- Do you think adding some tests would be convenient?
Yes! Done.
12167-go-request-id @ da570c74ba7949b5e5cc89cfa2db2af603890137
Updated by Tom Clegg over 6 years ago
- Status changed from In Progress to Resolved