Bug #15521

[keepstore] error reporting improvements

Added by Peter Amstutz about 1 year ago. Updated 7 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date: 10/31/2019
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: 2.0
Release relationship: Auto

Description

From ops bug #15520

Keepstore logging improvements:

  1. Keepstore PutBlock() calls log.Printf; this line of code has been untouched since 2014 (!). The message is logged in JSON format but lacks useful context such as the request ID.
  2. The error that is sent to the client is not logged at all.
  3. The log doesn't say anything about where the block is being fetched from -- which volume, bucket, remote cluster, anything.
  4. The error that reaches the user needs to make it clear that the problem was in fetching a remote block; this requires some combination of improving server and client error messages.

Subtasks

Task #15750: Review 15521-keepstore-logging (Resolved, Tom Clegg)


Related issues

Related to Arvados - Bug #15606: [keep-web] logging doesn't include error messages (Resolved, 10/29/2019)

Related to Arvados - Bug #15713: [Controller] Internal error not logged (Resolved, 10/24/2019)

Associated revisions

Revision 4554374c
Added by Tom Clegg 10 months ago

Merge branch '15521-keepstore-logging'

refs #15521
refs #15520

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

History

#1 Updated by Peter Amstutz about 1 year ago

  • Description updated (diff)

#2 Updated by Tom Morris about 1 year ago

  • Target version changed from 2019-08-14 Sprint to Arvados Future Sprints

#3 Updated by Peter Amstutz 12 months ago

  • Subject changed from federation error reporting improvements to [keepstore] error reporting improvements

#4 Updated by Tom Morris 12 months ago

  • Story points set to 2.0

#5 Updated by Peter Amstutz 10 months ago

  • Related to Bug #15606: [keep-web] logging doesn't include error messages added

#6 Updated by Tom Clegg 10 months ago

  • Assigned To set to Tom Clegg
  • Target version changed from Arvados Future Sprints to 2019-11-06 Sprint

#7 Updated by Tom Clegg 10 months ago

  • Status changed from New to In Progress

#8 Updated by Tom Clegg 10 months ago

  • Related to Bug #15713: [Controller] Internal error not logged added

#9 Updated by Tom Clegg 10 months ago

  1. Keepstore PutBlock() calls log.Printf; this line of code has been untouched since 2014 (!). The message is logged in JSON format but lacks useful context such as the request ID.

Updated "MD5 checksum %s did not match request" logs (and many other logs in keepstore) to use ctxlog so they include request id, loglevel, etc.
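
A minimal sketch of the idea behind this change, using only the Go standard library as a stand-in for ctxlog (this is not the Arvados ctxlog package, and it emits plain text rather than JSON). The names withRequestLogger, loggerFrom, reportMismatch, and captureMismatch are hypothetical, and the "RequestID=" prefix and the choice of format arguments are assumptions for illustration. The point is that a logger stored in the request context carries the request ID into every message without each call site passing it along.

```go
package main

import (
	"bytes"
	"context"
	"io"
	"log"
	"os"
)

// loggerKey is an unexported context key type, so other packages cannot collide with it.
type loggerKey struct{}

// withRequestLogger returns a context carrying a logger that prefixes
// every line with the request ID (hypothetical helper, not Arvados API).
func withRequestLogger(ctx context.Context, w io.Writer, requestID string) context.Context {
	logger := log.New(w, "RequestID="+requestID+" ", 0)
	return context.WithValue(ctx, loggerKey{}, logger)
}

// loggerFrom returns the logger stored in ctx, or a plain stderr logger
// if the context does not carry one.
func loggerFrom(ctx context.Context) *log.Logger {
	if logger, ok := ctx.Value(loggerKey{}).(*log.Logger); ok {
		return logger
	}
	return log.New(os.Stderr, "", log.LstdFlags)
}

// reportMismatch logs a checksum mismatch through the context logger.
// Which values fill the two %s verbs is an assumption of this sketch.
func reportMismatch(ctx context.Context, wantHash, gotHash string) {
	loggerFrom(ctx).Printf("%s: MD5 checksum %s did not match request", wantHash, gotHash)
}

// captureMismatch runs reportMismatch against an in-memory buffer and
// returns the resulting log line, which is convenient for inspection.
func captureMismatch(requestID, wantHash, gotHash string) string {
	var buf bytes.Buffer
	ctx := withRequestLogger(context.Background(), &buf, requestID)
	reportMismatch(ctx, wantHash, gotHash)
	return buf.String()
}

func main() {
	ctx := withRequestLogger(context.Background(), os.Stderr, "req-example")
	reportMismatch(ctx, "aaaa", "bbbb")
}
```

The same call made with a context that has no logger falls back to a plain stderr logger, so code paths outside a request still log, just without the request ID.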

Fixed an unreported error: PutBlock tries volmgr.NextWritable(), and if that fails, it tries all writable volumes in sequence. The error from that first failure wasn't being logged.
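
The fallback described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the keepstore implementation: the volume interface, memVolume type, putWithFallback function, and the log message format are all hypothetical. What it shows is the behavior described in the comment: a failure on the first-choice volume is logged before the remaining writable volumes are tried in order.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// volume is a hypothetical stand-in for a writable Keep volume.
type volume interface {
	Put(hash string, data []byte) error
	String() string
}

// memVolume is an in-memory volume that can be marked broken for testing.
type memVolume struct {
	name   string
	broken bool
	blocks map[string][]byte
}

func (v *memVolume) Put(hash string, data []byte) error {
	if v.broken {
		return errors.New("volume unavailable")
	}
	if v.blocks == nil {
		v.blocks = map[string][]byte{}
	}
	v.blocks[hash] = data
	return nil
}

func (v *memVolume) String() string { return v.name }

// putWithFallback tries the preferred volume first. If that fails, the
// error is logged (the step that used to be silent) and every writable
// volume is then tried in sequence. It returns the name of the volume
// that accepted the block.
func putWithFallback(preferred volume, writable []volume, hash string, data []byte,
	logf func(string, ...interface{})) (string, error) {
	if preferred != nil {
		err := preferred.Put(hash, data)
		if err == nil {
			return preferred.String(), nil
		}
		logf("%s: Put(%s) failed: %v", preferred, hash, err)
	}
	for _, v := range writable {
		err := v.Put(hash, data)
		if err == nil {
			return v.String(), nil
		}
		logf("%s: Put(%s) failed: %v", v, hash, err)
	}
	return "", fmt.Errorf("could not write %s to any of %d volumes", hash, len(writable))
}

func main() {
	a := &memVolume{name: "vol-a", broken: true}
	b := &memVolume{name: "vol-b"}
	stored, err := putWithFallback(a, []volume{a, b},
		"acbd18db4cc2f85cedef654fccc4a4d8", []byte("foo"), log.Printf)
	fmt.Println(stored, err)
}
```

Passing the logging function as a parameter keeps the sketch independent of any particular logger; a request-scoped logger could be supplied in its place.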

  2. The error that is sent to the client is not logged at all.

Already fixed in #15713.

  3. The log doesn't say anything about where the block is being fetched from -- which volume, bucket, remote cluster, anything.

If we get bad data from a remote service, we get two log entries:

msg="%s: MD5 checksum %s did not match request"

respStatusCode=502 respBody="checksum mismatch in remote response" (respBody added in #15713)
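
For illustration, here is one way a 502 with that body could be produced when remotely fetched data does not hash to the requested locator. Only the status code and the body text come from the log entry above; the function names serveRemoteBlock and respond are hypothetical, and the real keepstore handler is not shown in this ticket.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// serveRemoteBlock writes remoteData to the client only if its MD5
// matches wantHash; otherwise it answers 502 Bad Gateway, since the
// fault lies with the upstream (remote) service rather than the client.
func serveRemoteBlock(w http.ResponseWriter, wantHash string, remoteData []byte) {
	gotHash := fmt.Sprintf("%x", md5.Sum(remoteData))
	if gotHash != wantHash {
		http.Error(w, "checksum mismatch in remote response", http.StatusBadGateway)
		return
	}
	w.Write(remoteData)
}

// respond runs the handler against an in-memory recorder and returns
// the status code and trimmed body, mirroring the logged fields.
func respond(wantHash string, remoteData []byte) (int, string) {
	rec := httptest.NewRecorder()
	serveRemoteBlock(rec, wantHash, remoteData)
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	// acbd18db4cc2f85cedef654fccc4a4d8 is the MD5 of "foo", so "bar" mismatches.
	code, body := respond("acbd18db4cc2f85cedef654fccc4a4d8", []byte("bar"))
	fmt.Printf("respStatusCode=%d respBody=%q\n", code, body)
}
```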

  4. The error that reaches the user needs to make it clear that the problem was in fetching a remote block; this requires some combination of improving server and client error messages.

The server is sending "checksum mismatch in remote response".

15521-keepstore-logging @ 62d28600cbfc31f8e72c61e4519ff198cb66a02a -- https://ci.curoverse.com/view/Developer/job/developer-run-tests/1616/

#11 Updated by Lucas Di Pentima 10 months ago

This LGTM, thanks.

#12 Updated by Tom Clegg 10 months ago

  • Status changed from In Progress to Resolved

#13 Updated by Peter Amstutz 7 months ago

  • Release set to 22
