Bug #21901 (closed)

Reduce redundant file_download logging when client uses multi-threaded/reassembly approach

Added by Tom Clegg 5 months ago. Updated 3 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Keep
Story points: -
Release:
Release relationship: Auto

Description

Background

Currently, if a client (like aws s3 cp ...) launches multiple threads to download file segments concurrently and reassembles them on the client side, and the WebDAVLogEvents config is enabled, each segment request generates a separate entry in the logs table. This causes excessive load and misleading statistics when downloading, for example, a multi-gigabyte file in 8 MiB segments: a single 1 GiB file alone yields 128 file_download entries.

Proposed behavior

Keep-web should maintain an in-memory lookup table of requests that appear to be part of an ongoing multi-request download that has already been logged, and skip the file_download log for subsequent segments. Something like:

If the request has a Range header:
  • Generate a download event key comprising the client IP address (X-Forwarded-For), token, collection ID, and filename
  • If the key is already in the "ongoing download" table with a recent timestamp, and the requested range does not include the first byte of the file, just update the timestamp in the table and don't generate a file_download log entry
  • Otherwise, add the key to the table and generate a file_download log entry

The definition of "recent" should be configurable, default 30 seconds. If configured to 0, this log consolidation behavior should be disabled.


Subtasks: 1 (0 open, 1 closed)

Task #21916: Review 21901-file-log-throttling (Resolved, Brett Smith, 08/23/2024)

Related issues

Related to Arvados - Bug #21748: awscli downloads from keep-web slowly (Resolved, Tom Clegg)
