Bug #14804

closed

[keepstore] Return 5xx (not 4xx) if block is not found due to transient backend device failure

Added by Tom Clegg over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Keep
Target version:
Start date: 02/22/2019
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Release relationship: Auto

Description

Currently, when keepstore tries to read a block, if one Azure-backed volume encounters a 503 error and all other volumes return 404, keepstore returns 404 to its client. This is a non-retryable error, so the client gives up.

The correct behavior is to return a 502 or 503 status in this situation.

Azure error message:

storage: service returned error: StatusCode=503, ErrorCode=ServerBusy, ErrorMessage=The server is busy.
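To illustrate why the status code matters to the client, here is a minimal sketch of a retry decision. It assumes a simplified policy in which any 5xx response is worth retrying and any 4xx response is final; the helper name `retryable` is hypothetical and is not taken from the Arvados client code, whose actual policy may differ.

```go
package main

import "fmt"

// retryable reports whether a client should retry a block read that
// failed with the given HTTP status. Hypothetical helper: it assumes
// 5xx means "possibly transient" and everything else is final.
func retryable(status int) bool {
	return status >= 500 && status < 600
}

func main() {
	// 404 tells the client the block does not exist, so it gives up.
	// 502 and 503 tell it the failure may be temporary.
	for _, status := range []int{404, 502, 503} {
		fmt.Printf("status %d retryable=%v\n", status, retryable(status))
	}
}
```

Under this assumption, a 404 caused by a busy backend makes the client abandon a block that may still exist, which is the problem described above.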


Subtasks 1 (0 open, 1 closed)

Task #14836: Review 14804-keepstore-transient-backend-errors (Resolved, Lucas Di Pentima, 02/22/2019)

Related issues

Related to Arvados - Bug #15118: [keepstore] Return 5xx (not 4xx) if block is not found due to transient backend device failure (New)

Actions #2

Updated by Tom Morris over 3 years ago

  • Target version changed from Arvados Future Sprints to 2019-02-27 Sprint
Actions #3

Updated by Lucas Di Pentima over 3 years ago

  • Assigned To set to Lucas Di Pentima
Actions #4

Updated by Lucas Di Pentima over 3 years ago

  • Status changed from New to In Progress
Actions #5

Updated by Lucas Di Pentima over 3 years ago

Updates at 601764a10 - branch 14804-keepstore-transient-backend-errors
Test run: https://ci.curoverse.com/job/developer-run-tests/1082/

Previously, when a block was requested and every volume returned an error, keepstore returned 404 to the client no matter which errors the volumes had reported.
Now, when a volume backend returns a VolumeBusyError (a transient error), keepstore returns a 503 status so that the client can retry instead of mistakenly concluding that the block is not there.
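The change in behavior can be sketched roughly as follows. This is not the code from the branch: `errVolumeBusy` and `statusForReadErrors` are hypothetical names standing in for the transient volume error and for the logic that picks a response status after every volume has failed to return the block.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"os"
)

// errVolumeBusy is a hypothetical stand-in for a transient backend
// error such as the Azure "ServerBusy" response.
var errVolumeBusy = errors.New("volume busy")

// statusForReadErrors picks the HTTP status to send when no volume
// returned the block. If any volume failed transiently, the block may
// still exist, so the result is 503; otherwise it is 404.
func statusForReadErrors(errs []error) int {
	for _, err := range errs {
		if errors.Is(err, errVolumeBusy) {
			return http.StatusServiceUnavailable
		}
	}
	return http.StatusNotFound
}

func main() {
	allMissing := []error{os.ErrNotExist, os.ErrNotExist}
	oneBusy := []error{os.ErrNotExist, errVolumeBusy, os.ErrNotExist}

	fmt.Println(statusForReadErrors(allMissing)) // 404
	fmt.Println(statusForReadErrors(oneBusy))    // 503
}
```

The point of the sketch is that a single transient failure is enough to make "not found" unprovable, so it takes precedence over any number of 404 results from other volumes.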

Actions #6

Updated by Lucas Di Pentima over 3 years ago

Re-running developer-run-tests-remainder at: https://ci.curoverse.com/job/developer-run-tests-remainder/1117/

Actions #7

Updated by Eric Biagiotti over 3 years ago

Small nitpick: I would update the comment for TestGetHandler to include your test scenario. Otherwise, LGTM.

Actions #8

Updated by Lucas Di Pentima over 3 years ago

  • Status changed from In Progress to Resolved
Actions #9

Updated by Tom Morris over 3 years ago

  • Release set to 15
Actions #10

Updated by Tom Clegg about 3 years ago

  • Related to Bug #15118: [keepstore] Return 5xx (not 4xx) if block is not found due to transient backend device failure added