Bug #3616
[Workbench] Workbench uses huge amounts of RAM when user downloads a large (> 1 GiB) file. (Status: closed)
Updated by Peter Amstutz over 10 years ago
- Subject changed from Workbench uses huge amounts of RAM when user downloads a large (> 1 GiB) file. to [Workbench] Workbench uses huge amounts of RAM when user downloads a large (> 1 GiB) file.
- Category set to Workbench
Updated by Tom Clegg over 10 years ago
- Target version set to Arvados Future Sprints
Updated by Tom Clegg over 10 years ago
- Target version changed from Arvados Future Sprints to 2014-10-08 sprint
Updated by Peter Amstutz about 10 years ago
- Story points changed from 0.5 to 1.0
Updated by Peter Amstutz about 10 years ago
It appears that Rails "streaming" actually means rendering and delivering the page incrementally, as opposed to the normal mode of operation, where the entire page is rendered before being delivered. With that use case in mind, it's not surprising that it holds on to the data being sent out. So our current method is probably entirely the wrong way to do file downloads.
To actually stream data out to the socket, it looks like we need to do one of the following: use send_file() (and rig something up using arv-mount?), use Rails 4 live streaming (include ActionController::Live), write our own Rack middleware that specifically intercepts file downloads (I've written a couple of Rack middlewares already; it's not that hard), or redirect downloads to some yet-to-be-decided alternate service such as keepproxy.
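The Rack-middleware option can be sketched in plain Ruby, since a Rack response body only has to respond to `#each` (and optionally `#close`). This is a minimal illustration under stated assumptions, not necessarily the approach that was eventually committed: `ChunkedIOBody`, `DownloadInterceptor`, the `/download/` prefix and the `opener` callable are all hypothetical names, and in Workbench the IO would presumably come from the arv-get process, with a `StringIO` standing in for it here.

```ruby
require 'stringio'

# Rack response body that copies from an IO in fixed-size chunks.
# The Rack server calls #each and writes every yielded string to the
# client before asking for the next one, so on a server that does not
# buffer the whole body (which, per this thread, rules out WEBrick)
# only about one chunk is held in memory at a time.
class ChunkedIOBody
  CHUNK_SIZE = 64 * 1024 # arbitrary choice for this sketch

  def initialize(io, chunk_size = CHUNK_SIZE)
    @io = io
    @chunk_size = chunk_size
  end

  def each
    # IO#read(n) returns nil at EOF, which ends the loop.
    while (chunk = @io.read(@chunk_size))
      yield chunk
    end
  end

  # Rack calls #close on the body once the response is finished.
  def close
    @io.close unless @io.closed?
  end
end

# Hypothetical middleware: paths under DOWNLOAD_PREFIX get a streaming
# body; every other request is passed through to the wrapped app.
class DownloadInterceptor
  DOWNLOAD_PREFIX = '/download/' # hypothetical URL layout

  # opener: any callable that maps a file name to a readable IO
  def initialize(app, opener)
    @app = app
    @opener = opener
  end

  def call(env)
    path = env['PATH_INFO'].to_s
    return @app.call(env) unless path.start_with?(DOWNLOAD_PREFIX)

    io = @opener.call(path.delete_prefix(DOWNLOAD_PREFIX))
    [200, { 'content-type' => 'application/octet-stream' }, ChunkedIOBody.new(io)]
  end
end
```

In a Rails app such a class could be registered with `config.middleware.use DownloadInterceptor, opener`. Because the sketch sends no content-length, an HTTP/1.1 server such as Puma will typically fall back to chunked transfer encoding for these responses.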
Somewhat related to this is the need to support HTTP Range requests, so we can serve partial downloads of things like large log files.
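Serving partial downloads means honoring the HTTP `Range` request header (RFC 9110, section 14) and answering with `206 Partial Content` plus a `Content-Range` header. A minimal sketch of the parsing step, assuming only single byte ranges are supported; `parse_byte_range` and `content_range_header` are hypothetical helpers, not existing Workbench code:

```ruby
# Parse a single-range header such as "bytes=0-499", "bytes=500-" or
# "bytes=-500" against a known file size. Returns an inclusive Range of
# byte offsets, or nil if the header is missing, malformed, a multi-range
# request, or unsatisfiable.
def parse_byte_range(header, size)
  return nil if header.nil? || size <= 0
  m = /\Abytes=(\d*)-(\d*)\z/.match(header.strip)
  return nil unless m

  first_s, last_s = m[1], m[2]
  if first_s.empty?
    # Suffix form "bytes=-N": the final N bytes of the file.
    return nil if last_s.empty? || last_s.to_i.zero?
    first = [size - last_s.to_i, 0].max
    last = size - 1
  else
    first = first_s.to_i
    # Open-ended "bytes=N-" runs to EOF; an end offset past EOF is clamped.
    last = last_s.empty? ? size - 1 : [last_s.to_i, size - 1].min
    return nil if first >= size || first > last
  end
  first..last
end

# Value for the Content-Range header of a 206 response.
def content_range_header(range, size)
  "bytes #{range.first}-#{range.last}/#{size}"
end
```

For brevity the sketch collapses "no usable header" and "unsatisfiable range" into nil; a real handler would answer 416 for the latter and a plain 200 for the former. Note that, unlike chunked streaming, a range response needs the total size up front.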
Updated by Tom Clegg about 10 years ago
3616-live-stream @ 1586924
Tested with a 500M file:
- Before: ruby process grows from 819m to 1900m, then client starts receiving data.
- After: ruby process grows from 819m to 826m and stays there while I download.
Updated by Peter Amstutz about 10 years ago
Ok, I've verified this works with Puma and Passenger. It seems that WEBrick doesn't support streaming, so it still wants to buffer the whole thing, but that's not a blocker for production.
Unfortunately it uses chunked transfer encoding, which turns out to have a few problems:
- it adds a little bit of overhead
- you can't specify Content-Length, so the downloader doesn't know how much data to expect
- on Firefox, these downloads don't show up in the download dialog at all, although they do get downloaded (on Chromium it works). At least, this is what I'm seeing; it's possible my Firefox is broken.
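The first two points follow from how chunked transfer coding frames the body (RFC 9112, section 7.1): each chunk is sent as its size in hex, CRLF, the data, CRLF, and the body ends with a zero-length chunk. The total length is never announced, and HTTP/1.1 forbids sending Content-Length together with Transfer-Encoding. A minimal Ruby sketch of the framing, with hypothetical helper names, ignoring chunk extensions and trailers:

```ruby
# Frame one chunk: "<size in hex>\r\n<data>\r\n".
def chunk_frame(data)
  "#{data.bytesize.to_s(16)}\r\n#{data}\r\n"
end

# The zero-length chunk that terminates the body (no trailers).
LAST_CHUNK = "0\r\n\r\n"

# Encode a sequence of chunks as a complete chunked body.
def chunked_encode(chunks)
  chunks.map { |c| chunk_frame(c) }.join + LAST_CHUNK
end

# Bytes added per chunk: the hex size digits plus two CRLF pairs.
def chunk_overhead(size)
  size.to_s(16).length + 4
end
```

Assuming 64 KiB chunks, each one costs 9 extra bytes ("10000" plus two CRLFs), roughly 0.014% of the payload, plus 5 bytes for the final chunk. So the overhead really is small; the missing length is the more noticeable cost.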
Also, I suggest adding --batch-progress or --no-progress to the arv-get invocation to avoid cluttering up the logs.
Updated by Tom Clegg about 10 years ago
Peter Amstutz wrote:
> Ok, I've verified this works with Puma and Passenger. It seems that WEBrick doesn't support streaming, so it still wants to buffer the whole thing, but that's not a blocker for production.
Right, this is not the first reason not to use WEBrick in production. :) And it seems to make WEBrick no more broken than it was before this change, so I don't feel bad about anyone who happens to be running WEBrick for some reason.
> Unfortunately it uses chunked transfer encoding, which turns out to have a few problems:
> - it adds a little bit of overhead
> - you can't specify Content-Length, so the downloader doesn't know how much data to expect
> - on Firefox, these downloads don't show up in the download dialog at all, although they do get downloaded (on Chromium it works). At least, this is what I'm seeing; it's possible my Firefox is broken.
Those problems are regrettable, but less regrettable than using O(N) RAM.
> Also, I suggest adding --batch-progress or --no-progress to the arv-get invocation to avoid cluttering up the logs.
This shouldn't be necessary since arv-get already defaults to --no-progress unless stderr isatty.
Thanks
Updated by Anonymous about 10 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|commit:7b2d04380952ac79453bd0771679e40c81281f5c.