Bug #21901

open

Reduce redundant file_download logging when client uses multi-threaded/reassembly approach

Added by Tom Clegg 18 days ago. Updated 13 days ago.

Status: New
Priority: Normal
Assigned To:
Category: Keep
Story points: -
Release:
Release relationship: Auto

Description

Background

Currently, if a client (like aws s3 cp ...) launches multiple threads to download file segments concurrently and assemble them on the client side, and the WebDAVLogEvents config is enabled, each file segment generates a new entry in the logs table. This causes excessive load and misleading statistics when downloading, for example, a multi-gigabyte file in 8 MiB segments.

Proposed behavior

Keep-web should maintain an in-memory lookup table of requests that appear to be part of an ongoing multi-request download that has already been logged, and skip the file_download log for subsequent segments. Something like:

If the request has a Range header:
  • Generate a download event key comprising the client IP address (X-Forwarded-For), token, collection ID, and filename
  • If the key is already in the "ongoing download" table with a recent timestamp, and the requested range does not include the first byte of the file, just update the timestamp in the table and don't generate a file_download log entry
  • Otherwise, add the key to the table and generate a file_download log entry
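
The "does not include the first byte" check in the steps above could be sketched in Go roughly as follows. The function name rangeIncludesFirstByte is hypothetical and not taken from keep-web. As an assumption of this sketch, suffix ranges (bytes=-N), malformed values, and non-byte units are treated as possibly covering byte 0, so the worst case is an extra log entry rather than a missing one.

```go
package main

import (
	"strconv"
	"strings"
)

// rangeIncludesFirstByte reports whether an HTTP Range header value
// might cover byte 0 of the file. (Hypothetical helper for illustration.)
//
// It returns true whenever it cannot prove that every requested range
// starts after byte 0: suffix ranges like "bytes=-500" depend on the
// file size, and malformed headers are treated like ordinary requests.
func rangeIncludesFirstByte(header string) bool {
	header = strings.TrimSpace(header)
	if !strings.HasPrefix(header, "bytes=") {
		return true
	}
	spec := strings.TrimPrefix(header, "bytes=")
	for _, part := range strings.Split(spec, ",") {
		bounds := strings.SplitN(strings.TrimSpace(part), "-", 2)
		if len(bounds) != 2 || bounds[0] == "" {
			// Malformed range, or a suffix range of unknown extent.
			return true
		}
		start, err := strconv.ParseUint(bounds[0], 10, 64)
		if err != nil || start == 0 {
			return true
		}
	}
	return false
}
```

With this helper, a request for "bytes=8388608-16777215" would be eligible for consolidation, while "bytes=0-8388607" would always produce a log entry.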

The definition of "recent" should be configurable, with a default of 30 seconds. If it is configured to 0, this log consolidation behavior should be disabled.


Subtasks 1 (1 open, 0 closed)

Task #21916: Review (New, Tom Clegg)

Related issues

Related to Arvados - Bug #21748: awscli downloads from keep-web slowly? (In Progress, Tom Clegg)

#1 Updated by Tom Clegg 18 days ago

  • Related to Bug #21748: awscli downloads from keep-web slowly? added

#2 Updated by Tom Clegg 15 days ago

  • Release set to 70

#3 Updated by Peter Amstutz 14 days ago

  • Category set to Keep

#4 Updated by Peter Amstutz 14 days ago

  • Target version set to Development 2024-07-03 sprint

#5 Updated by Peter Amstutz 13 days ago

  • Assigned To set to Brett Smith