Feature #19428
closed
keep-web/collectionfs/sitefs performance improvements
Description
Address performance issues in keep-web, particularly in sequences of S3 requests for a single collection using a single token, which should be an ideal scenario for the session cache.
Updated by Tom Clegg over 2 years ago
Production log files indicate a bimodal timing distribution: most requests have timeToStatus around 20 ms, but some are around 1-2 seconds even though they don't make any calls out to controller.
I suspect this is caused by pruneSessions's call to fs.MemorySize() racing with the filesystem accesses that are needed to fulfil the request. When pruneSessions wins the race, fs.MemorySize() blocks all other filesystem accesses while it traverses the entire filesystem (all projects, collections, files, and data segments). Since a GET / HEAD request does multiple filesystem operations, it can even be interrupted this way more than once. Ironically this is more likely to happen when there are fewer active sessions.
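To make the contention concrete: the stall comes from one lock being held across an entire traversal. Below is a minimal sketch of a size computation that holds each node's lock only long enough to copy that node's own fields. The `node` type and its fields are hypothetical stand-ins, not the collectionfs types, and this is not necessarily how the branch implements it.

```go
package main

import "sync"

// node is a hypothetical stand-in for a filesystem tree node.
// It is NOT the real collectionfs/sitefs type.
type node struct {
	mu       sync.RWMutex
	data     []byte
	children []*node
}

// MemorySize returns an approximate size of the subtree rooted at n.
// Each node's read lock is held only while copying that node's own
// fields, so a concurrent writer never waits for the whole walk.
// The result is a best-effort estimate if the tree changes mid-walk.
func (n *node) MemorySize() int64 {
	n.mu.RLock()
	size := int64(len(n.data))
	kids := append([]*node(nil), n.children...) // snapshot of child list
	n.mu.RUnlock()

	for _, k := range kids {
		size += k.MemorySize()
	}
	return size
}
```

The trade-off in this sketch is accuracy: the total can be slightly stale, which seems acceptable for a cache-pruning heuristic.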
We might want to make pruneSessions less aggressive (e.g., max 1x per 10 s) but the most important change is to make MemorySize() non-blocking.
This branch also fixes a few less significant delays.
19428-webdav-performance @ 2c0638b0444652591fa18d6cae2d1977ee5e5731 -- developer-run-tests: #3280
- make fs.MemorySize() non-blocking
- eliminate a leading "/" in an API call that was causing a 301 redirect on each user lookup
- ask for a larger page size when populating a project directory (the per-page overhead for group#contents can be significant)
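The last two items can be illustrated with two small helpers. Both are hypothetical: the ticket does not say where the extra "/" was introduced or which page sizes were used, so the URL and numbers below are only examples.

```go
package main

import "strings"

// joinURL (hypothetical helper) joins a base URL and an API path with
// exactly one "/" between them, so a leading "/" on the path cannot
// produce a "//" that a server might answer with a redirect.
func joinURL(base, path string) string {
	return strings.TrimRight(base, "/") + "/" + strings.TrimLeft(path, "/")
}

// pageRequests returns how many list requests are needed to fetch n
// items at the given page size (ceiling division). It returns 0 for
// non-positive inputs.
func pageRequests(n, limit int) int {
	if n <= 0 || limit <= 0 {
		return 0
	}
	return (n + limit - 1) / limit
}
```

For example, a directory with 2500 entries takes 25 round trips at a page size of 100 but only 3 at a page size of 1000, which is where the per-page overhead adds up.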
Updated by Tom Clegg over 2 years ago
- Target version changed from 2022-08-31 sprint to 2022-09-14 sprint
Updated by Tom Clegg over 2 years ago
A user tested the note-2 patch in the affected production environment and reported a huge improvement.
19428-webdav-performance @ 193ebedab37c71170f649308732fe0a18d7d2ba6 -- developer-run-tests: #3283
wb1 retry developer-run-tests-apps-workbench-integration: #3531
- also avoids computing each session's MemorySize twice (won't really affect user-facing performance, but will reduce CPU load on the server host by a bit)
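A rough sketch of measuring each session once per prune pass and reusing the result for both the total and the eviction decision. The `session` type, the `sizeFn` field, and the least-recently-used eviction order are assumptions for illustration, not the actual keep-web cache code.

```go
package main

import "sort"

// session is a hypothetical cache entry. sizeFn stands in for an
// expensive call such as fs.MemorySize().
type session struct {
	id       string
	lastUsed int64 // e.g. Unix seconds
	sizeFn   func() int64
}

// prune returns the IDs to evict so that the remaining total size is
// at most limit. sizeFn is called exactly once per session; the
// measured value is reused when deciding what to evict.
func prune(sessions []session, limit int64) []string {
	type measured struct {
		id       string
		lastUsed int64
		size     int64
	}
	var total int64
	ms := make([]measured, 0, len(sessions))
	for _, s := range sessions {
		sz := s.sizeFn() // measured once, reused below
		total += sz
		ms = append(ms, measured{s.id, s.lastUsed, sz})
	}

	// Assumed policy: evict least recently used sessions first.
	sort.Slice(ms, func(i, j int) bool { return ms[i].lastUsed < ms[j].lastUsed })

	var evict []string
	for _, m := range ms {
		if total <= limit {
			break
		}
		total -= m.size
		evict = append(evict, m.id)
	}
	return evict
}
```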
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-09-14 sprint to 2022-09-28 sprint
Updated by Peter Amstutz over 2 years ago
This LGTM, please merge and cherry-pick onto 2.4.3
Updated by Tom Clegg over 2 years ago
- Status changed from In Progress to Resolved