Bug #3616

[Workbench] Workbench uses huge amounts of RAM when user downloads a large (> 1 GiB) file.

Added by Peter Amstutz about 5 years ago. Updated almost 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Workbench
Target version:
Start date:
09/26/2014
Due date:
% Done:

100%

Estimated time:
(Total: 1.00 h)
Story points:
1.0

Subtasks

Task #4022: Use rails "live" feature - Resolved - Tom Clegg

Task #4014: Review 3616-live-stream - Resolved - Peter Amstutz

Task #3919: Debug streaming - Resolved - Peter Amstutz

Associated revisions

Revision 7b2d0438
Added by Tom Clegg almost 5 years ago

Merge branch '3616-live-stream' closes #3616

History

#1 Updated by Peter Amstutz about 5 years ago

  • Subject changed from Workbench uses huge amounts of RAM when user downloads a large (> 1 GiB) file. to [Workbench] Workbench uses huge amounts of RAM when user downloads a large (> 1 GiB) file.
  • Category set to Workbench

#2 Updated by Tom Clegg about 5 years ago

  • Target version set to Arvados Future Sprints

#3 Updated by Tom Clegg about 5 years ago

  • Story points set to 1.0

#4 Updated by Tom Clegg almost 5 years ago

  • Target version changed from Arvados Future Sprints to 2014-10-08 sprint

#5 Updated by Tom Clegg almost 5 years ago

  • Assigned To set to Peter Amstutz

#6 Updated by Peter Amstutz almost 5 years ago

  • Story points changed from 1.0 to 0.5

#7 Updated by Peter Amstutz almost 5 years ago

  • Story points changed from 0.5 to 1.0

#8 Updated by Peter Amstutz almost 5 years ago

It appears that Rails "streaming" actually means rendering and delivering the page incrementally, as opposed to the normal mode of operation where the entire page is rendered before being delivered. With that use case in mind, it's not so surprising that it hangs on to the data being sent out. So our current method is probably entirely the wrong way to do file downloads.

To actually stream data out to the socket, it looks like we need to do one of the following:

  • use send_file() (and rig something up using arv-mount?)
  • use Rails 4 live streaming (include ActionController::Live)
  • write our own Rack middleware to specifically intercept file downloads (I've written a couple of Rack middlewares already; it's not that hard)
  • redirect downloads to some to-be-decided alternate service such as keepproxy
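All of these options rely on the same underlying idea: the response body yields chunks one at a time instead of materializing the whole file in memory. A minimal sketch of that pattern, using the Rack convention that any object responding to #each can serve as a response body (FileStreamer and CHUNK_SIZE are illustrative names, not Arvados code):

```ruby
# Sketch of a streaming response body. A Rack-compatible server calls
# #each on the body and writes every yielded chunk to the socket as it
# is produced, so peak memory stays O(CHUNK_SIZE) instead of O(file size).
class FileStreamer
  CHUNK_SIZE = 64 * 1024  # 64 KiB per chunk

  def initialize(io)
    @io = io
  end

  def each
    # IO#read(length) returns nil at EOF, ending the loop.
    while (chunk = @io.read(CHUNK_SIZE))
      yield chunk
    end
  ensure
    @io.close
  end
end
```

In a Rack app this body would be returned as the third element of the response triple, e.g. `[200, headers, FileStreamer.new(file)]`.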

Somewhat related to this is the need to support range clauses so we can do partial downloads for things like large log files.
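For reference, a single-range request comes in as a "Range: bytes=start-end" header (where either endpoint may be omitted), and the server responds with just that byte span. A sketch of parsing such a header, illustrative only and not the Arvados implementation:

```ruby
# Parse a single-range "Range: bytes=first-last" header against a
# resource of the given size. Returns an inclusive byte Range, or nil
# if the header is unsatisfiable or malformed.
def parse_byte_range(header, size)
  m = header.match(/\Abytes=(\d*)-(\d*)\z/) or return nil
  return nil if m[1].empty? && m[2].empty?
  # "bytes=-N" means the last N bytes; "bytes=N-" means from N to the end.
  first = m[1].empty? ? size - Integer(m[2]) : Integer(m[1])
  last  = (m[1].empty? || m[2].empty?) ? size - 1 : Integer(m[2])
  last = size - 1 if last > size - 1  # clamp to the end of the resource
  first >= 0 && first <= last ? (first..last) : nil
end
```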

#9 Updated by Tom Clegg almost 5 years ago

3616-live-stream @ 1586924

Tested with a 500M file.
  • Before: ruby process grows from 819m to 1900m, then client starts receiving data.
  • After: ruby process grows from 819m to 826m and stays there while I download.

#10 Updated by Tom Clegg almost 5 years ago

  • Status changed from New to In Progress

#11 Updated by Peter Amstutz almost 5 years ago

OK, I've verified this works with Puma and Passenger. It seems that WEBrick doesn't support streaming and so it still wants to buffer the whole thing, but that's not a blocker for production.

Unfortunately it uses chunked transfer encoding, which turns out to have a few problems:

  • it adds a little bit of overhead
  • you can't specify Content-Length, so the downloader doesn't know how much data to expect
  • on Firefox these downloads don't show up in the download dialog at all, although they do get downloaded (on Chromium it works; at least, this is what I'm seeing, and it's possible my Firefox is broken)
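The overhead comes from the framing chunked transfer encoding uses: each chunk is prefixed with its length in hex, and a zero-length chunk terminates the stream, which is exactly why the total length never has to be known up front (and why Content-Length can't be sent). A sketch of the wire format, illustrative rather than Arvados code:

```ruby
# Frame a sequence of body chunks per HTTP/1.1 chunked transfer encoding:
# hex length, CRLF, chunk data, CRLF -- terminated by a zero-length chunk.
def encode_chunked(chunks)
  framed = chunks.map { |c| "#{c.bytesize.to_s(16)}\r\n#{c}\r\n" }
  framed.join + "0\r\n\r\n"  # terminating zero-length chunk
end
```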

Also, I suggest adding --batch-progress or --no-progress to the arv-get invocation to avoid cluttering up the logs.

#12 Updated by Tom Clegg almost 5 years ago

Peter Amstutz wrote:

OK, I've verified this works with Puma and Passenger. It seems that WEBrick doesn't support streaming and so it still wants to buffer the whole thing, but that's not a blocker for production.

Right, this is not the first reason not to use WEBrick in production. :) And this change seems to leave WEBrick no more broken than it was before, so I don't feel bad for anyone who happens to be running WEBrick for some reason.

Unfortunately it uses chunked transfer encoding, which turns out to have a few problems:

  • it adds a little bit of overhead
  • you can't specify Content-Length, so the downloader doesn't know how much data to expect
  • on Firefox these downloads don't show up in the download dialog at all, although they do get downloaded (on Chromium it works; at least, this is what I'm seeing, and it's possible my Firefox is broken)

Those problems are regrettable, but less regrettable than using O(N) RAM.

Also, I suggest adding --batch-progress or --no-progress to the arv-get invocation to avoid cluttering up the logs.

This shouldn't be necessary since arv-get already defaults to --no-progress unless stderr isatty.

Thanks

#13 Updated by Anonymous almost 5 years ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados|commit:7b2d04380952ac79453bd0771679e40c81281f5c.
