Bug #14804
[keepstore] Return 5xx (not 4xx) if block is not found due to transient backend device failure
Description
Currently, when keepstore tries to read a block, if one Azure-backed volume encounters a 503 error and all other volumes return 404, keepstore returns 404 to its client. Because 404 is a non-retryable error, the client gives up, even though the block may still exist on the volume that failed.
The correct behavior is to return a 502 or 503 status in this situation.
Azure error message:
storage: service returned error: StatusCode=503, ErrorCode=ServerBusy, ErrorMessage=The server is busy.
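To illustrate why the status code matters to the client, here is a minimal sketch of a retry decision. It assumes a client that retries only on 5xx responses; the helper name isRetryable is hypothetical and is not taken from the Arvados client code.

```go
package main

import "net/http"

// isRetryable is a hypothetical helper, not part of Arvados. It reports
// whether a client that retries only on server-side errors would retry
// a request that came back with the given HTTP status.
func isRetryable(status int) bool {
	// 5xx means the server (or a backend behind it) failed, possibly
	// transiently. Anything else, including 404, is treated as final.
	return status >= http.StatusInternalServerError && status <= 599
}
```

Under this assumption, a 404 caused by a busy backend is indistinguishable from a block that is really missing, while a 502 or 503 leaves the client free to try again.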
History
#2
Updated by Tom Morris about 2 years ago
- Target version changed from Arvados Future Sprints to 2019-02-27 Sprint
#3
Updated by Lucas Di Pentima about 2 years ago
- Assigned To set to Lucas Di Pentima
#4
Updated by Lucas Di Pentima about 2 years ago
- Status changed from New to In Progress
#5
Updated by Lucas Di Pentima about 2 years ago
Updates at 601764a10 - branch 14804-keepstore-transient-backend-errors
Test run: https://ci.curoverse.com/job/developer-run-tests/1082/
Previously, when a block was requested and keepstore got errors from all of its volumes, it returned 404 to the client no matter which errors the volumes had returned.
Now, when keepstore receives a VolumeBusyError (a transient error) from a volume backend, it returns a 503 status so that the client can retry instead of mistakenly concluding that the block does not exist.
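The status selection described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the branch: errVolumeBusy and errNotFound are stand-ins defined here, statusForGet is a hypothetical name, and treating every other error as "not found" is a simplification that may differ from what keepstore does.

```go
package main

import (
	"errors"
	"net/http"
)

// errVolumeBusy stands in for the VolumeBusyError mentioned in this update.
// Its definition here is an assumption made for illustration.
var errVolumeBusy = errors.New("volume busy")

// errNotFound stands in for a volume reporting that it does not hold the
// block (hypothetical).
var errNotFound = errors.New("block not found")

// statusForGet picks the HTTP status for a block read after every volume
// has been tried. volErrs holds one entry per volume; a nil entry means
// that volume returned the block.
func statusForGet(volErrs []error) int {
	sawBusy := false
	for _, err := range volErrs {
		if err == nil {
			// At least one volume served the block.
			return http.StatusOK
		}
		if errors.Is(err, errVolumeBusy) {
			sawBusy = true
		}
	}
	if sawBusy {
		// A transient failure may be hiding the block, so let the client retry.
		return http.StatusServiceUnavailable
	}
	// No volume succeeded and none reported a transient failure.
	return http.StatusNotFound
}
```

For example, one busy volume among several that report "not found" yields 503 rather than 404, which is the scenario from the description.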
#6
Updated by Lucas Di Pentima about 2 years ago
Re-running developer-run-tests-remainder at: https://ci.curoverse.com/job/developer-run-tests-remainder/1117/
#7
Updated by Eric Biagiotti about 2 years ago
Small nitpick: I would update the comment for TestGetHandler to include your test scenario. Otherwise, LGTM.
#8
Updated by Lucas Di Pentima about 2 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|7e5f0e9ca6756099f761cc3f392476f362cd1645.
#9
Updated by Tom Morris about 2 years ago
- Release set to 15
#10
Updated by Tom Clegg almost 2 years ago
- Related to Bug #15118: [keepstore] Return 5xx (not 4xx) if block is not found due to transient backend device failure added
Merge branch '14804-keepstore-transient-backend-errors'
Closes #14804
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <ldipentima@veritasgenetics.com>