Feature #11906
closedBasic authenticated http health check ("ping") for each system service
Description
- respond to “GET /_health/ping” on same addr:port as the microservice’s main http server (or “management” server in nodemanager’s case)
- expect header “Authorization: Bearer XXX” where XXX is “ManagementToken” in config file (note "Bearer", not "OAuth2")
- return 404 if configured management token is blank / missing
- return 401 if Authorization: header is missing
- return 403 if Authorization: header does not match configured token
- return JSON, either {"health":"OK"} or {"health":"ERROR"}
It’s OK if the “ping” healthcheck has lots of false-positive potential -- even if “OK” merely means the process is running. The focus here is instrumenting all services, not detecting all failure modes.
Updated by Tom Clegg over 7 years ago
- apiserver
- workbench
- arv-git-httpd
- nodemanager
- keepstore
- keepproxy
- keep-web
arvados-ws(already done #11901)keep-balance(has no http server)crunch-dispatch-slurm(has no http server)webshell/pam(has no http server)arv-mount(has no http server)login-sync(has no http server)
Updated by Radhika Chippada over 7 years ago
- Status changed from New to In Progress
- Target version set to 2017-07-19 sprint
Updated by Lucas Di Pentima over 7 years ago
Reviewing branch 11906-keepstore-ping
- 499e303783a9b5430147ac4eef3bd13ff72968b9
Local test run services/keepstore
was successful, just one question:
- File
services/keepstore/handlers.go
- Lines 643-646: Shouldn’t the response be inside an “else” clause? Go docs say that the
http.Error
caller should ensure no further writes are made toResponseWriter
.
- Lines 643-646: Shouldn’t the response be inside an “else” clause? Go docs say that the
Updated by Radhika Chippada over 7 years ago
@ e6b3f6f6
Lines 643-646 of handlers.go: Shouldn’t the response be inside an “else” clause? Go docs say that the http.Error caller should ensure no further writes are made to ResponseWriter.
You are right; I should add a "return" statement to that if block. (As a side note, go fmt complains about "pyramid" code and prefers a return in the if block than an else block)
Updated by Radhika Chippada over 7 years ago
11906-api-ping @ 943b3e807126ba5dc936ceeeddbee9801dab7308
- Implemented /_health/ping for API server and added controller tests.
- To test manually:
- Add "management_token: abcdefg" to your application.yml file (and restart)
- curl --insecure https://localhost:3030/_health/ping -H "Authorization: Bearer abcdefg" (assuming your api server is running at localhost:3030)
Updated by Lucas Di Pentima over 7 years ago
- File
services/api/app/controllers/arvados/v1/healthcheck_controller.rb
- Lines 25-29: We could save one level of conditionals by getting the
auth_header
before line 21, and chain if/elsif/elsif/end, I think it would be a little more readable.
- Lines 25-29: We could save one level of conditionals by getting the
- There’s a test failing:
services/api/integration/errors_test.rb:26
Updated by Radhika Chippada over 7 years ago
File services/api/app/controllers/arvados/v1/healthcheck_controller.rb Lines 25-29
Updated
There’s a test failing: services/api/integration/errors_test.rb
Thanks for noticing this. I updated the test to add _health/* to routes
Updated by Lucas Di Pentima over 7 years ago
All services/api
tests pass now, this LGTM. Thanks!
Updated by Radhika Chippada over 7 years ago
Branch 11906-wb-ping @ ffc117651144b197c9d650039a5403e972d54b7c
Adds _health/ping to workbench.
I also made small updates to API server implementation. The code merged in 11906-api-ping was returning responses of the format
{"errors":["authorization required"],"error_token":"1500405563+a2c23148"}. Updated to return {"errors":"authorization required"} instead.
Test run @ https://ci.curoverse.com/job/developer-run-tests/386/
Updated by Lucas Di Pentima over 7 years ago
Updates at ffc117651144b197c9d650039a5403e972d54b7c LGTM, just one tiny detail:
Line 20 at file apps/workbench/app/controllers/healthcheck_controller.rb
has its indentation a bit off.
Updated by Radhika Chippada over 7 years ago
- Target version changed from 2017-07-19 sprint to 2017-08-02 sprint
Updated by Radhika Chippada over 7 years ago
- Assigned To set to Radhika Chippada
Updated by Tom Clegg over 7 years ago
11906-health-check-lib @ 0e0dc3c8ebf442a35c41816fed42fdddb53aed53
Updated by Radhika Chippada over 7 years ago
Branch 11906-health-check-lib:
- In sdk handler.go, can we can call “type Func” as “type HealthFunc” or something to help improve readability. I am thinking if we go back to this code after 6 months, it might be a bit confusing
- The comment “Map of URI paths to health-check Func. The prefix is omitted: Routes["foo"] is the health check invoked by a request to “/_health/foo” => Here should it say a request to “{prefix}/foo” instead?
- In the setup func, is it safe to assume that the prefix always starts with “/“ or do we want to ensure it similar to HasSuffix handling? (I just tested with “_health” as prefix and the test failed)
- Do we want to add a test with wrong prefix as well in sdk handler_test.go? h.ServeHTTP(resp, s.request(“/nosuchprefix/ping", goodToken))
- TestZeroValue => TestNotConfigured?
Updated by Tom Clegg over 7 years ago
Radhika Chippada wrote:
- In sdk handler.go, can we can call “type Func” as “type HealthFunc” or something to help improve readability. I am thinking if we go back to this code after 6 months, it might be a bit confusing
https://blog.golang.org/package-names says "avoid stutter": like http.Server, this is really health.Func.
- The comment “Map of URI paths to health-check Func. The prefix is omitted: Routes["foo"] is the health check invoked by a request to “/_health/foo” => Here should it say a request to “{prefix}/foo” instead?
Fixed.
- In the setup func, is it safe to assume that the prefix always starts with “/“ or do we want to ensure it similar to HasSuffix handling? (I just tested with “_health” as prefix and the test failed)
I think the caller is already responsible for building an http.Server and must know what paths look like in that context. I'm more inclined to remove the special suffix-adding behavior if anything. The example makes it clear what callers should do, and in the long run auto-fixing things just means committing to a more subtle API. Might as well just force all callers to call the same way right from the start.
- Do we want to add a test with wrong prefix as well in sdk handler_test.go? h.ServeHTTP(resp, s.request(“/nosuchprefix/ping", goodToken))
Added.
- TestZeroValue => TestNotConfigured?
TestZeroValueIsDisabled. And added another check.
11906-health-check-lib @ da037c1372879cb0ebc221c42ebdb86af669295b
Updated by Radhika Chippada over 7 years ago
11906-health-check-lib LGTM. Thanks.
Updated by Lucas Di Pentima over 7 years ago
Updates at 91f976b99d4e7f00d2c1fbfee75812c3b1b780c6 - branch 11906-keepproxy-ping
LGTM, thanks!
Updated by Radhika Chippada over 7 years ago
Branch 11906-keep-web-ping @ 1af8267a0d11ad9f986aec5ec7853299f3cfeb0e implements /_health/ping for keep-web.
Test run @ https://ci.curoverse.com/job/developer-run-tests/398/
Updated by Tom Clegg over 7 years ago
I pushed a commit removing the func and servemux, and calling the health handler's ServeHTTP() directly instead.
The rest LGTM, if this looks OK to you.
11906-keep-web-ping @ 2d16a410f0121806bf847f9e6290dae66e96fc5f
Updated by Lucas Di Pentima over 7 years ago
Updates at dc929c778 - branch 11906-arv-git-httpd-ping
LGTM, please merge.
Updated by Lucas Di Pentima over 7 years ago
Reviewed updates at 8ae04f9f4d252a99985d7ac047413318d2f1068b for nodemanager
. Looks good to me, thanks!
Updated by Radhika Chippada over 7 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|commit:dfc4c2c74dc7a31ff0c0d307846b7e9525fdcf0e.