Bug #19192

closed

WebDAVCache not performing as expected for S3 requests

Added by Tom Clegg 6 months ago. Updated 4 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Keep
Target version:
Start date: 06/21/2022
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Release relationship: Auto

Files

keep-web (11 MB) keep-web dev build at commit 042f0d37dacebddef9d2d7232fa2e5b9922a9451 Tom Clegg, 06/13/2022 06:15 PM
keep-web (11 MB) keep-web dev build at commit 65c1517193b5039769f288e84a11d1ebfddbf031 Tom Clegg, 06/20/2022 04:44 PM
keep-web (11 MB) keep-web dev build at commit b5db4b70878a3907db3691b71979e6e65511a12c Tom Clegg, 06/21/2022 03:22 AM

Subtasks 1 (0 open, 1 closed)

Task #19204: Review 19192-fix-deadlock (Resolved, Lucas Di Pentima, 06/21/2022)

Related issues

Related to Arvados - Story #19205: In Go services, monitor request times and record when they continue processing after client disconnects, or exceed a maximum request time (Resolved, Tom Clegg, 06/28/2022)

Related to Arvados - Bug #19368: [keep-web] [S3] slow requests caused by logUploadOrDownload (Resolved, Tom Clegg, 08/12/2022)

Actions #1

Updated by Tom Clegg 6 months ago

  • Status changed from New to In Progress
Actions #2

Updated by Tom Clegg 6 months ago

Actions #3

Updated by Tom Clegg 5 months ago

Debug logging indicates (*cache).pruneSessions() is not running, so expired sessions stay stuck in the cache forever. I'm guessing there's a situation where fs.MemorySize() deadlocks. Once that happens, pruneSessions can never continue or resume, and we effectively have no cache.

Sending SIGABRT to the keep-web process would make the Go runtime dump a stack trace of all goroutines and exit, which would confirm this and show where it's getting stuck.

Actions #4

Updated by Tom Clegg 5 months ago

Here's a debug version that will dump a stack trace and exit if cache pruning takes longer than 30s.

Actions #5

Updated by Tom Clegg 5 months ago

  • File keep-web added

19192-fix-deadlock @ dab0c5596e39dc455d88bba797717e829fe5caf5 -- developer-run-tests: #3177

Fixes a lock-ordering bug in lookupnode that caused a deadlock when open() and readdir() were called concurrently on the same lookupnode (project) directory.
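As a generic illustration of this class of bug (not the actual lookupnode code), the sketch below uses a hypothetical dir type with two mutexes, treeLock and staleLock. A deadlock is possible when one code path takes them as treeLock then staleLock while another takes staleLock then treeLock. The usual remedy, shown here, is for every path to acquire them in the same order.

```go
package main

import (
	"fmt"
	"sync"
)

// dir is a hypothetical directory node guarded by two locks.
// The field names are assumptions made for this illustration only.
type dir struct {
	staleLock sync.Mutex // always acquired first
	treeLock  sync.Mutex // always acquired second
	entries   map[string]bool
	calls     int
}

// open looks up one entry, taking staleLock and then treeLock.
func (d *dir) open(name string) bool {
	d.staleLock.Lock()
	defer d.staleLock.Unlock()
	d.treeLock.Lock()
	defer d.treeLock.Unlock()
	d.calls++
	return d.entries[name]
}

// readdir lists all entries using the same lock order as open.
// Taking treeLock before staleLock here could deadlock against open.
func (d *dir) readdir() []string {
	d.staleLock.Lock()
	defer d.staleLock.Unlock()
	d.treeLock.Lock()
	defer d.treeLock.Unlock()
	d.calls++
	names := make([]string, 0, len(d.entries))
	for name := range d.entries {
		names = append(names, name)
	}
	return names
}

// hammer runs n concurrent open and readdir calls on the same directory
// and returns once all of them have completed.
func hammer(d *dir, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); d.open("a") }()
		go func() { defer wg.Done(); d.readdir() }()
	}
	wg.Wait()
}

func main() {
	d := &dir{entries: map[string]bool{"a": true}}
	hammer(d, 100)
	fmt.Println("completed calls:", d.calls)
}
```

With a consistent order, neither goroutine can hold one lock while waiting for a lock the other already holds, so hammer always returns.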

Actions #6

Updated by Tom Clegg 5 months ago

  • File deleted (keep-web)
Actions #7

Updated by Tom Clegg 5 months ago

Here's a dev build with both the bugfix from note-5 and the watchdog / stack dump from note-4.

Actions #8

Updated by Lucas Di Pentima 5 months ago

19192-fix-deadlock LGTM, thanks!

Actions #9

Updated by Peter Amstutz 5 months ago

  • Target version changed from 2022-06-22 Sprint to 2022-07-06
Actions #10

Updated by Tom Clegg 5 months ago

  • Related to Story #19205: In Go services, monitor request times and record when they continue processing after client disconnects, or exceed a maximum request time added
Actions #11

Updated by Peter Amstutz 5 months ago

  • Target version changed from 2022-07-06 to 2022-07-20
Actions #12

Updated by Peter Amstutz 5 months ago

  • Target version changed from 2022-07-20 to 2022-07-06
Actions #13

Updated by Peter Amstutz 5 months ago

  • Status changed from In Progress to Resolved
  • Category set to Keep
Actions #14

Updated by Peter Amstutz 4 months ago

  • Release set to 52
Actions #15

Updated by Tom Clegg 4 months ago

  • Related to Bug #19368: [keep-web] [S3] slow requests caused by logUploadOrDownload added