Bug #19368

closed

[keep-web] [S3] slow requests caused by logUploadOrDownload

Added by Tom Clegg over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: -
Release relationship: Auto

Description

While investigating slow HEAD requests I noticed a few inefficiencies in logUploadOrDownload that could be causing substantial delays:
  • If the request method is anything other than PUT, POST, or GET, logUploadOrDownload() doesn't log anything, but it still does all the work of determining the collection ID and properties that would be logged if the method were different.
  • If Collections.WebDAVLogEvents is disabled, it still spends time determining the collection ID and properties in order to log them to stderr (although not to the logs table).
  • determineCollection() walks the filesystem tree from root to requested target, looking for a special .arvados#collection file at each level, which may incur several unnecessary API calls (e.g., /by_id/$projectid/.arvados#collection will try to look up a subproject or collection with that name).
  • (Most importantly?) Generating the magic .arvados#collection file involves writing a new manifest for the entire directory tree, which (a) is not actually needed here, (b) can be very large and therefore slow to generate, and (c) because of the root-to-leaf approach, is always generated for the entire collection. Reading the magic file from the same directory as the requested target would often generate a much smaller manifest and still return the correct collection UUID, although it wouldn't reveal the path relative to the collection root, which the logging feature uses.
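
The first point can be illustrated with a minimal sketch. It assumes a hypothetical logUploadOrDownloadSketch() and a lookup callback standing in for the expensive collection lookup; neither is keep-web's actual code. The method check runs before any lookup, so a HEAD request never pays for work whose result would be discarded.

```go
package main

import (
	"fmt"
	"net/http"
)

// loggedMethods holds the methods that, per the description above,
// produce an upload/download log entry.
var loggedMethods = map[string]bool{
	http.MethodGet:  true,
	http.MethodPut:  true,
	http.MethodPost: true,
}

// logUploadOrDownloadSketch is a hypothetical stand-in for the real
// function. lookup represents the expensive step of determining the
// collection ID and properties; it is only called when the result
// will actually be logged.
func logUploadOrDownloadSketch(method string, lookup func() string) (entry string, logged bool) {
	if !loggedMethods[method] {
		// Nothing would be logged, so skip the lookup entirely.
		return "", false
	}
	return method + " " + lookup(), true
}

func main() {
	calls := 0
	lookup := func() string {
		calls++
		return "collection-uuid-placeholder"
	}
	for _, m := range []string{http.MethodHead, http.MethodGet} {
		entry, logged := logUploadOrDownloadSketch(m, lookup)
		fmt.Println(m, logged, entry, "lookups so far:", calls)
	}
}
```

Passing the lookup as a callback keeps the decision ("will this be logged?") separate from the cost ("what would be logged?"), which is the ordering the description asks for.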
Proposed fix:
  • Introduce a .arvados#uuid special file that just returns the UUID of the relevant collection or project represented by that directory.
Related possible improvement:
  • Attach the user/collection IDs to the "response" log rather than creating a third ("file upload" / "file download") log entry per request with those fields added. We now have a feature in the httpserver package to make this easy.

Files

keep-web (11 MB) keep-web 265afdad112b129c36235935470d4a410161a9ef-dev Tom Clegg, 08/09/2022 04:10 PM
keep-web (11 MB) keep-web 0d8b4f1ad827a575bf74b058426eb898257592e5-dev Tom Clegg, 08/22/2022 02:48 PM
keep-web (11 MB) keep-web b896ceb55db0593631718cb13ee95b2414afe8f9-dev Tom Clegg, 08/24/2022 02:03 PM
keep-web (11 MB) keep-web 87a93969ba0b4eaa1d8c63af5c039e7fed908a31-dev Tom Clegg, 08/29/2022 10:01 PM
keep-web (11 MB) keep-web 13ebfc417bbdb8c5d325112a72248041b4ae49fd-dev Tom Clegg, 08/30/2022 02:44 PM

Subtasks 1 (0 open, 1 closed)

Task #19384: Review 19368-webdav-logging-speedup, Resolved, Tom Clegg, 08/12/2022

Related issues

Related to Arvados - Idea #17464: Logging and restricting downloads in keep-web and keepproxy, Resolved, Peter Amstutz, 06/15/2021
Related to Arvados - Bug #19192: WebDAVCache not performing as expected for S3 requests, Resolved, Tom Clegg, 06/21/2022
Actions #1

Updated by Tom Clegg over 1 year ago

  • Description updated (diff)
Actions #2

Updated by Tom Clegg over 1 year ago

  • Related to Idea #17464: Logging and restricting downloads in keep-web and keepproxy added
Actions #3

Updated by Tom Clegg over 1 year ago

  • Description updated (diff)
Actions #4

Updated by Tom Clegg over 1 year ago

  • Related to Bug #19192: WebDAVCache not performing as expected for S3 requests added
Actions #5

Updated by Tom Clegg over 1 year ago

Here's a dev build with the bugfix from #19192#note-5, the watchdog / stack dump from #19192#note-4, and a fix for the "generate manifest for entire collection on each request" issue mentioned here.

(This bypasses commits on the main branch since #19192, in the interest of minimizing any version-skew issues while testing the fix.)

Actions #8

Updated by Peter Amstutz over 1 year ago

  • Target version changed from 2022-08-17 sprint to 2022-08-31 sprint
Actions #9

Updated by Lucas Di Pentima over 1 year ago

This LGTM, thanks!

Actions #10

Updated by Tom Clegg over 1 year ago

Here's a dev build with #19368#note-5 and content-sniffing disabled on S3 GET/HEAD requests, i.e., if the requested file's extension is not in /etc/mime.types, the file data will not be read, and the returned content-type will be application/octet-stream.
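
For illustration, extension-only content-type selection can be sketched with Go's standard mime package, which consults a built-in table plus system files such as /etc/mime.types on Unix. The helper name contentTypeByName is hypothetical, and this is not necessarily how the dev build implements it.

```go
package main

import (
	"fmt"
	"mime"
	"path"
)

// contentTypeByName picks a content type from the file extension alone
// and never reads file data. Unknown or missing extensions fall back to
// application/octet-stream.
func contentTypeByName(name string) string {
	if t := mime.TypeByExtension(path.Ext(name)); t != "" {
		return t
	}
	return "application/octet-stream"
}

func main() {
	for _, name := range []string{"report.html", "data.unknownext", "README"} {
		fmt.Printf("%s -> %s\n", name, contentTypeByName(name))
	}
}
```

The trade-off is that a file with an unrecognized extension is served as application/octet-stream even if its contents are recognizable, in exchange for not having to fetch any data blocks to answer a HEAD request.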

Actions #11

Updated by Tom Clegg over 1 year ago

Another dev build, #19368#note-10 but with the Sys() solution (since merged to main) instead of the .arvados#collection_id. (When the bucket ID was a project ID, the .arvados#collection_id approach still caused an extra groups#contents API call for the bucket project itself and each intervening project, which can easily add multiple seconds to each HEAD request.)
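
As a hedged illustration of the general technique (the note does not spell out the details), Go's fs.FileInfo interface has a Sys() method that can expose the underlying data source. The sketch below assumes the directory's backing record is returned from Sys(), so a caller that has already done a Stat can read the UUID without opening a special file or making another API call. The types collectionRecord and dirInfo and the helper uuidOf are hypothetical.

```go
package main

import (
	"fmt"
	"io/fs"
	"time"
)

// collectionRecord is a hypothetical stand-in for the API record
// (collection or project) backing a directory.
type collectionRecord struct {
	UUID string
}

// dirInfo is a minimal fs.FileInfo for a directory whose Sys() method
// exposes the backing record, if any.
type dirInfo struct {
	name string
	rec  *collectionRecord
}

func (d dirInfo) Name() string       { return d.name }
func (d dirInfo) Size() int64        { return 0 }
func (d dirInfo) Mode() fs.FileMode  { return fs.ModeDir | 0o755 }
func (d dirInfo) ModTime() time.Time { return time.Time{} }
func (d dirInfo) IsDir() bool        { return true }

// Sys returns the backing record, or an untyped nil when there is none,
// so callers never see a non-nil interface wrapping a nil pointer.
func (d dirInfo) Sys() any {
	if d.rec == nil {
		return nil
	}
	return d.rec
}

// uuidOf extracts the UUID from a FileInfo when its Sys() value is a
// collectionRecord; it reports false for any other kind of file.
func uuidOf(fi fs.FileInfo) (string, bool) {
	if rec, ok := fi.Sys().(*collectionRecord); ok && rec != nil {
		return rec.UUID, true
	}
	return "", false
}

func main() {
	withRec := dirInfo{name: "bucket", rec: &collectionRecord{UUID: "uuid-placeholder"}}
	plain := dirInfo{name: "subdir"}
	for _, fi := range []fs.FileInfo{withRec, plain} {
		uuid, ok := uuidOf(fi)
		fmt.Println(fi.Name(), uuid, ok)
	}
}
```

Because the information travels with the Stat result, no extra path lookup is needed for the bucket project or the projects in between.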

Actions #12

Updated by Tom Clegg over 1 year ago

Another dev build, #19368#note-11 plus a fix to prevent cache size accounting from blocking concurrent filesystem operations.

Actions #13

Updated by Tom Clegg over 1 year ago

With more concurrency improvements from #19428:

Actions #14

Updated by Tom Clegg over 1 year ago

  • Target version changed from 2022-08-31 sprint to 2022-09-14 sprint
Actions #15

Updated by Tom Clegg over 1 year ago

  • Release set to 53
  • Status changed from In Progress to Resolved