Feature #8707
Arvados job: download data from remote site into Keep
100%
Associated revisions
History
#1
Updated by Tom Clegg about 6 years ago
- Description updated (diff)
#2
Updated by Tom Clegg about 6 years ago
- Story points set to 1.0
#3
Updated by Tom Clegg about 6 years ago
- Category set to Third party integration
- Assigned To set to Tom Clegg
#4
Updated by Tom Clegg about 6 years ago
- failure due to successful download with right size but wrong md5sum: https://crvr.se/su92l-8i9sb-ful8qhzowkshfoq
- success: https://crvr.se/su92l-8i9sb-aizw0cupzxafowf
#5
Updated by Brett Smith about 6 years ago
Reviewing db7bd2a. This is good to merge; these are all just "idiomatic Python" nits that you can take or leave as you like.
cStringIO provides the same API as StringIO with better performance. You can switch to it with a one-line change by changing your import to import cStringIO as StringIO.
It seems a little odd that you open the URL, then check its scheme. Maybe move that up? You might also consider saving the result of urlparse.urlparse() and reusing it, but that's really small potatoes.
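As a rough illustration of the reordering suggested above, the URL can be parsed once and its scheme checked before any connection is opened. This is only a sketch written for Python 3 (where the function lives in urllib.parse rather than the urlparse module); the helper name parse_download_url and the set of allowed schemes are assumptions, not taken from the reviewed commit.

```python
from urllib.parse import urlparse

# Assumption: only plain HTTP(S) downloads are accepted.
ALLOWED_SCHEMES = {'http', 'https'}


def parse_download_url(url):
    """Parse url once and reject unsupported schemes before opening it.

    Hypothetical helper; returns the ParseResult so callers can reuse it
    instead of calling urlparse() again.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise ValueError('unsupported URL scheme: %r' % parsed.scheme)
    return parsed


parsed = parse_download_url('https://example.com/data/file.bin')
# parsed.netloc and parsed.path are now available without re-parsing.
```

Checking first means a bad scheme fails fast, before any network traffic happens.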
Your download loop can be written a little DRYer as:
    with open(outpath, 'w') as outfile:
        for chunk in iter(lambda: httpresp.read(BUFFER_SIZE), ''):
            outfile.write(chunk)
            got_md5.update(chunk)
        got_size = outfile.tell()
Thanks.
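For readers on Python 3, the same two-argument iter() idiom might look like the sketch below. The snippet in this comment is Python 2, where read() returns str; in Python 3 a binary stream returns bytes, so the sentinel has to be b'' and the output opened in binary mode. The function name copy_and_hash, the buffer size, and the in-memory streams are illustrative assumptions, not the job's actual code.

```python
import hashlib
import io

# Assumption: the real BUFFER_SIZE is not shown in the thread.
BUFFER_SIZE = 1 << 20


def copy_and_hash(src, dst, bufsize=BUFFER_SIZE):
    """Copy src to dst in chunks; return (bytes_copied, md5_hexdigest)."""
    md5 = hashlib.md5()
    size = 0
    # iter(callable, sentinel) keeps calling src.read(bufsize) and stops
    # as soon as it returns b'' (end of stream).
    for chunk in iter(lambda: src.read(bufsize), b''):
        dst.write(chunk)
        md5.update(chunk)
        size += len(chunk)
    return size, md5.hexdigest()


# Usage with in-memory streams standing in for the HTTP response and file.
src = io.BytesIO(b'hello world')
dst = io.BytesIO()
size, digest = copy_and_hash(src, dst, bufsize=4)
```

The returned size and digest can then be compared against the expected values, which is the check that catches a download with the right size but the wrong md5sum.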
#6
Updated by Tom Clegg about 6 years ago
All of that sounds better, thanks. I was torn between the two uglies -- while-True-if-cond-break and duplicating the read() -- the iter solution is just what I was wishing for.
#7
Updated by Brett Smith about 6 years ago
#8
Updated by Tom Clegg about 6 years ago
- Status changed from New to In Progress
Merge branch '8707-download'
refs #8707