Bug #21901
Reduce redundant file_download logging when a client uses a multi-threaded/reassembly approach
Description
Background
Currently, if a client (like aws s3 cp ...) launches multiple threads to download file segments concurrently and assemble them on the client side, and the WebDAVLogEvents config is enabled, each file segment generates a new entry in the logs table. This causes excessive load and misleading statistics when downloading, for example, a multi-gigabyte file in 8 MiB segments.
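To illustrate the scale of the problem, here is a minimal Go sketch that splits a file into fixed-size segments the way a multi-threaded client might, producing one HTTP Range header value per segment. Each value corresponds to a separate request, and so (without consolidation) to a separate log entry. The 8 MiB segment size comes from the example above; the function name segmentRanges is hypothetical, and real clients may choose different segment sizes.

```go
package main

import "fmt"

// segmentSize matches the 8 MiB example above; real clients may differ.
const segmentSize = 8 << 20

// segmentRanges returns one HTTP Range header value per segment of a file
// of the given size. Each value would be sent as a separate GET request.
func segmentRanges(fileSize int64) []string {
	var ranges []string
	for start := int64(0); start < fileSize; start += segmentSize {
		end := start + segmentSize - 1
		if end >= fileSize {
			end = fileSize - 1 // last segment may be shorter
		}
		ranges = append(ranges, fmt.Sprintf("bytes=%d-%d", start, end))
	}
	return ranges
}

func main() {
	r := segmentRanges(4 << 30) // a 4 GiB file
	fmt.Println(len(r))         // 512
	fmt.Println(r[0])           // bytes=0-8388607
}
```

So a single 4 GiB download fetched this way would currently produce 512 file_download entries, only the first of which (the range starting at byte 0) marks the start of the download.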
Proposed behavior
Keep-web should maintain an in-memory lookup table of requests that appear to be part of an ongoing multi-request download that has already been logged, and skip the file_download log for subsequent segments. Something like:
- Generate a download event key comprising the client IP address (X-Forwarded-For), token, collection ID, and filename
- If the key is already in the "ongoing download" table with a recent timestamp, and the requested range does not include the first byte of the file, just update the timestamp in the table and don't generate a file_download log entry
- Otherwise, add the key to the table and generate a file_download log entry
The definition of "recent" should be configurable, default 30 seconds. If configured to 0, this log consolidation behavior should be disabled.
Related issues
Updated by Tom Clegg 18 days ago
- Related to Bug #21748: awscli downloads from keep-web slowly? added
Updated by Peter Amstutz 14 days ago
- Target version set to Development 2024-07-03 sprint