Feature #8707
open
Arvados job: download data from remote site into Keep
Added by Tom Clegg almost 9 years ago.
Updated over 5 years ago.
Category: Third party integration
Description
...to satisfy an API request like #8688
Implementation
One task per requested file -- this avoids retrying everything whenever one file fails
Use writable FUSE (task output dir)
Run wget or curl, probably with some sort of batch-progress flag
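A rough sketch of the one-task-per-file idea, in Python 3. Everything here is an assumption for illustration, not the actual job script: the helper names (plan_tasks, curl_command, run_task), the index-prefixed output naming, and the choice of curl over wget. The curl flags used are standard ones; the progress-reporting flag is left open, as in the notes above.

```python
import subprocess
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def plan_tasks(urls, output_dir):
    """Return one (url, destination) pair per requested file.

    Hypothetical naming scheme: an index prefix keeps two URLs with the
    same basename from colliding in the task output dir.
    """
    tasks = []
    for index, url in enumerate(urls):
        name = PurePosixPath(urlparse(url).path).name or "index"
        tasks.append((url, Path(output_dir) / f"{index:04d}-{name}"))
    return tasks


def curl_command(url, destination):
    """Build the curl argv for a single file (no shell involved)."""
    return [
        "curl",
        "--fail",        # non-zero exit status on HTTP errors
        "--location",    # follow redirects
        "--silent",      # no interactive progress meter in batch logs
        "--show-error",  # but still print the error message
        "--output", str(destination),
        url,
    ]


def run_task(url, destination):
    """Download one file; True on success, so only this task is retried on failure."""
    return subprocess.run(curl_command(url, destination)).returncode == 0
```

Because each pair is its own task, a failed download only needs run_task to be repeated for that one pair.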
- Description updated (diff)
- Category set to Third party integration
- Assigned To set to Tom Clegg
Reviewing db7bd2a. This is good to merge, these are all just "idiomatic Python" nits that you can take or leave as you like.
cStringIO provides the same API as StringIO with better performance. You can switch to it with a one-line change by changing your import to import cStringIO as StringIO.
It seems a little odd that you open the URL, then check its scheme. Maybe move that up? You might also consider saving the result of urlparse.urlparse() and reusing it, but that's really small potatoes.
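For illustration, checking the scheme before opening anything might look roughly like this. This is a Python 3 sketch (urllib.parse instead of the Python 2 urlparse module); the function name check_url and the allowed-scheme list are assumptions, not taken from the commit under review.

```python
from urllib.parse import urlparse

# Assumption: only plain HTTP(S) downloads are acceptable.
ALLOWED_SCHEMES = ("http", "https")


def check_url(url):
    """Parse the URL once, reject unsupported schemes, and return the parse result."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise ValueError("unsupported URL scheme: %r" % (parsed.scheme,))
    # The caller can reuse `parsed` instead of calling urlparse() again.
    return parsed
```

The URL would then be opened only after check_url() returns, so a bad scheme never triggers a request.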
Your download loop can be written a little DRYer as:
    with open(outpath, 'w') as outfile:
        for chunk in iter(lambda: httpresp.read(BUFFER_SIZE), ''):
            outfile.write(chunk)
            got_md5.update(chunk)
        got_size = outfile.tell()
Thanks.
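A self-contained variant of the suggested loop, for anyone trying it under Python 3, where read() returns bytes and the sentinel therefore has to be b'' rather than ''. The function name copy_with_md5 and the BUFFER_SIZE value are assumptions for the sketch; any object with a read(n) method can stand in for the HTTP response.

```python
import hashlib

BUFFER_SIZE = 1 << 20  # assumed chunk size (1 MiB)


def copy_with_md5(resp, outfile):
    """Copy resp to outfile in chunks; return (bytes written, md5 hex digest)."""
    md5 = hashlib.md5()
    # iter(callable, sentinel) keeps calling read() until it returns b'' (EOF).
    for chunk in iter(lambda: resp.read(BUFFER_SIZE), b""):
        outfile.write(chunk)
        md5.update(chunk)
    # Assumes outfile started empty at position 0, so tell() equals the size.
    return outfile.tell(), md5.hexdigest()
```

With a real download, outfile would be opened in binary mode ('wb') so the bytes are written unchanged.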
All of that sounds better, thanks. I was torn between the two uglies -- while-True-if-cond-break and duplicating the read() -- the iter solution is just what I was wishing for.
Now at aee617c with new test jobs:
Tom Clegg wrote:
Now at aee617c with new test jobs:
That looks great, thanks.
- Status changed from New to In Progress