Idea #10387

open

Faster downloading using arv-get

Added by Joshua Randall about 8 years ago. Updated 9 months ago.

Status: New
Priority: Normal
Assigned To: -
Category: -
Target version:
Start date:
Due date:
Story points: -
Release:
Release relationship: Auto

Description

As a user wanting to access data stored in Keep from an external (non-Arvados) system, I'd like my `arv keep get` requests to be able to fully utilise the available network bandwidth when downloading from Keep. The limiting factor currently seems to be that blocks are downloaded serially, starting at the beginning of a manifest and working forward to the end. My expectation would be that destination files could be preallocated and that blocks could then be downloaded concurrently, filling in the files as the blocks arrive. The number of parallel downloads could be increased until overall throughput stops increasing (indicating that the network may be saturated), at which point the client could back off a bit. The maximum number of concurrent connections to each Keep server could be limited by configuration.
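
To illustrate the idea, here is a minimal Python sketch of the preallocate-then-fill approach. It is not how `arv-get` or the Arvados SDK currently works: `parallel_download`, the `(locator, offset, size)` block tuples, and the `fetch_block` callable are all hypothetical names made up for this example, and `max_workers` stands in for the configurable connection limit. It relies on `os.pwrite`, so it assumes a Unix-like system, and it leaves out the adaptive back-off.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_download(blocks, fetch_block, dest_path, max_workers=4):
    """Fill dest_path from blocks fetched concurrently.

    blocks:      list of (locator, offset, size) tuples (hypothetical layout)
    fetch_block: callable taking a locator and returning bytes (hypothetical)
    max_workers: cap on concurrent fetches (stand-in for a configured limit)
    """
    total = sum(size for _, _, size in blocks)

    # Preallocate the destination file so blocks can land in any order.
    with open(dest_path, "wb") as f:
        f.truncate(total)

    fd = os.open(dest_path, os.O_WRONLY)
    try:
        def worker(block):
            locator, offset, size = block
            data = fetch_block(locator)
            if len(data) != size:
                raise IOError("unexpected size for block %s" % locator)
            # pwrite takes an explicit offset, so threads never share a
            # file position; loop in case of a short write.
            written = 0
            while written < len(data):
                written += os.pwrite(fd, data[written:], offset + written)

        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # Consuming the iterator re-raises any worker exception.
            list(pool.map(worker, blocks))
    finally:
        os.close(fd)
```

A real implementation would also need to limit connections per Keep server rather than globally, and to adjust the worker count based on measured throughput.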

A request for parallel downloading may be incompatible with options such as `--md5sum` to display the hash of files as they come in (because the hash needs to be computed serially), but as a user I wouldn't mind making the options mutually exclusive and being able to choose either `--parallel` (or whatever) or `--md5sum`.
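
As a sketch of what making the options mutually exclusive could look like, the snippet below uses Python's standard `argparse` module. It is not the actual `arv-get` argument parser: the `build_parser` function, the `arv-get-sketch` program name, and the `--parallel N` form are assumptions for illustration only.

```python
import argparse


def build_parser():
    # Hypothetical parser, not the real arv-get command line.
    parser = argparse.ArgumentParser(prog="arv-get-sketch")
    group = parser.add_mutually_exclusive_group()
    group.add_argument(
        "--parallel", type=int, metavar="N", default=None,
        help="download up to N blocks concurrently (hypothetical option)")
    group.add_argument(
        "--md5sum", action="store_true",
        help="print file hashes as data arrives (requires serial download)")
    return parser
```

With this arrangement, passing both `--parallel 4` and `--md5sum` makes `argparse` print a usage error and exit, while either option alone is accepted.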


Related issues

Related to Arvados - Idea #7824: [SDKs] arv-get and arv-ls should use new PySDK Collection APIs (Resolved, Lucas Di Pentima, 11/19/2015)
Actions #1

Updated by Tom Morris about 7 years ago

  • Target version set to Arvados Future Sprints
Actions #2

Updated by Ward Vandewege over 3 years ago

  • Target version deleted (Arvados Future Sprints)
Actions #3

Updated by Peter Amstutz almost 2 years ago

  • Release set to 60
Actions #4

Updated by Peter Amstutz 9 months ago

  • Target version set to Future