Idea #7824

closed

[SDKs] arv-get and arv-ls should use new PySDK Collection APIs

Added by Sarah Guthrie over 8 years ago. Updated about 7 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: SDKs
Target version:
Story points: 1.0

Description

Fix

  • Rewrite arv-ls and arv-get to use the modern CollectionReader API (keys() and open()) instead of the legacy methods all_streams() and all_files().
  • To be consistent with other tools, move the main code of arv-get into the arvados.commands module and replace bin/arv-get with a stub that calls it.
  • Update both tools to use logging consistently instead of print >>sys.stderr.
  • Do not change the command-line behavior of the existing arv-get.
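
A minimal sketch of the keys()/open() style the first bullet asks for. With the real SDK the object would come from arvados.collection.CollectionReader(locator); here FakeCollection is a hypothetical in-memory stand-in that imitates only keys(), item lookup, and open(), so the example runs without an Arvados cluster. The walk() and copy_out() helpers are illustrative, not the actual arv-ls or arv-get code, and the real SDK would tell files from subcollections using its own classes rather than FakeCollection.

```python
import io
import shutil


# Hypothetical stand-in for a collection/subcollection. It imitates only the
# calls the ticket names (keys() and open()) plus item lookup; it is NOT an
# Arvados SDK class.
class FakeCollection:
    def __init__(self, entries):
        # entries maps a name to bytes (a file) or a nested FakeCollection
        self._entries = entries

    def keys(self):
        return list(self._entries.keys())

    def __getitem__(self, name):
        return self._entries[name]

    def open(self, path, mode="rb"):
        # Resolve "dir/sub/file" one path component at a time.
        node = self
        *dirs, leaf = path.split("/")
        for d in dirs:
            node = node[d]
        return io.BytesIO(node[leaf])


def walk(collection, prefix="."):
    """Yield (stream_name, file_name) pairs: files first, then subcollections."""
    names = sorted(collection.keys(),
                   key=lambda k: (isinstance(collection[k], FakeCollection), k))
    for name in names:
        item = collection[name]
        if isinstance(item, FakeCollection):
            yield from walk(item, prefix + "/" + name)
        else:
            yield prefix, name


def copy_out(collection, path, dest):
    """Stream one file out of the collection via open(), in 64 KiB chunks."""
    with collection.open(path, "rb") as src:
        shutil.copyfileobj(src, dest, 64 * 1024)


c = FakeCollection({
    "tile.csv": b"a,b\n1,2\n",
    "sub": FakeCollection({"x.txt": b"hello"}),
})
print(list(walk(c)))  # prints: [('.', 'tile.csv'), ('./sub', 'x.txt')]
```

Because open() returns a file-like object, copy_out() reads only the file it was asked for and never enumerates the rest of the collection.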

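The logging bullet and the stub layout from the second bullet could look roughly like this. main() parses arguments, reports problems through a module logger instead of print >>sys.stderr, and returns an exit code so a thin bin/ script can call it. The logger name, the two positional arguments, the empty-locator check, and the module path in the stub comment are placeholders, not arv-get's real options, validation, or layout.

```python
import argparse
import logging

# Placeholder logger name; a real tool would choose its own.
logger = logging.getLogger("arvados.arv-get")


def main(arguments=None):
    """Parse arguments and return a process exit code (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="arv-get")
    # Placeholder subset of arguments, not arv-get's full option list.
    parser.add_argument("locator", help="collection locator, optionally with /path")
    parser.add_argument("destination", nargs="?", default="-",
                        help="where to write the data ('-' for stdout)")
    args = parser.parse_args(arguments)

    if not args.locator.strip():
        # Errors go through logging rather than print >>sys.stderr.
        logger.error("empty locator")
        return 1

    logger.debug("would fetch %s into %s", args.locator, args.destination)
    return 0


# The bin/arv-get stub then shrinks to a few lines
# (the module path is an assumption):
#
#     import sys
#     from arvados.commands.get import main
#     sys.exit(main())
```

Returning an exit code instead of calling sys.exit() inside main() keeps the function testable and lets the stub own process termination.
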
Original report

In 1.5 hours, 8 MiB of a 55 MiB file was downloaded using this command:

 arv keep get 215dd32873bfa002aa0387c6794e4b2c+54081534/tile.csv .
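
For scale, assuming the download rate stayed roughly constant, the observed throughput and the projected time for the whole file work out as follows:

```latex
\begin{aligned}
\text{rate} &= \frac{8\ \text{MiB}}{1.5\ \text{h}}
             = \frac{8 \times 2^{20}\ \text{B}}{5400\ \text{s}}
             \approx 1553\ \text{B/s} \approx 1.5\ \text{KiB/s} \\
t_{55\,\text{MiB}} &= \frac{55\ \text{MiB}}{8\ \text{MiB}} \times 1.5\ \text{h} \approx 10.3\ \text{h}
\end{aligned}
```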

Running top on the machine executing the "arv keep get" command shows:

 top - 19:47:07 up 2 days,  9:09,  8 users,  load average: 1.12, 1.26, 1.32
 Tasks: 223 total,   3 running, 217 sleeping,   0 stopped,   3 zombie
 %Cpu(s): 43.5 us,  8.7 sy,  0.0 ni, 47.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
 KiB Mem:  15535256 total, 12281116 used,  3254140 free,  1069760 buffers
 KiB Swap: 15929340 total,   221892 used, 15707448 free.  5467732 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 14366 sguthrie  20   0 2498672 2.173g   7204 R 100.0 14.7  98:02.16 arv-get

Downloading this collection from Workbench times out before the user can choose where to save the file.

Story #7729 requires multiple downloads from this qr1hi collection (qr1hi-4zz18-wuld8y0z7qluw00) and from others with similarly large manifests. To unblock #7729, I need one of the following:
  • A recipe that lets a user rewrite the manifest so that it is well behaved
  • Faster downloads from collections with very large manifests
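
The collection argument in the command above is a portable data hash: the MD5 of the collection's manifest text, a "+", and the manifest's length in bytes. Splitting it apart shows how large this manifest is. The regex and split_pdh_path() below are illustrative helpers, not Arvados SDK functions.

```python
import re

# Illustrative pattern for "md5+size" optionally followed by "/path".
# Not an SDK function; a sketch for inspecting the locator by hand.
PDH_PATH = re.compile(r"^(?P<md5>[0-9a-f]{32})\+(?P<size>\d+)(?:/(?P<path>.*))?$")


def split_pdh_path(arg):
    """Split 'md5+size/path' into (md5, manifest_size_bytes, path_or_None)."""
    m = PDH_PATH.match(arg)
    if m is None:
        raise ValueError("not a portable data hash: %r" % arg)
    return m.group("md5"), int(m.group("size")), m.group("path")


md5, manifest_size, path = split_pdh_path(
    "215dd32873bfa002aa0387c6794e4b2c+54081534/tile.csv")
# Manifest size in MiB, rounded to one decimal place.
print(path, manifest_size, round(manifest_size / 2**20, 1))
# prints: tile.csv 54081534 51.6
```

In other words, this collection's manifest alone is roughly 51.6 MiB of text, close to the size of the 55 MiB file being fetched.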

Update by Ward:

I investigated a bit while this was ongoing. There was no discernible extra load on keepproxy, the API server, or Postgres during Sally's download. But when I ran the command locally, after a while arv-get used 100% CPU (one core) and its resident memory peaked at 3 GiB until I killed it.


Subtasks: 1 (0 open, 1 closed)

Task #11272: Review 7824-arvls-arvput-collection-api-usage (Resolved, Peter Amstutz, 11/19/2015)

Related issues

Related to Arvados - Idea #10387: Faster downloading using arv-get (New)
Related to Arvados - Task #5449: Update examples to use new Python Collection SDK and add deprecation notes to old APIs (Resolved, 02/13/2015)