Bug #3616

closed

[Workbench] Workbench uses huge amounts of RAM when user downloads a large (> 1 GiB) file.

Added by Peter Amstutz over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Workbench
Target version:
Story points: 1.0

Subtasks: 3 (0 open, 3 closed)

  • Task #4022: Use rails "live" feature (Resolved, Tom Clegg, 09/26/2014)
  • Task #4014: Review 3616-live-stream (Resolved, Peter Amstutz, 09/26/2014)
  • Task #3919: Debug streaming (Resolved, Peter Amstutz, 09/26/2014)
Actions #1

Updated by Peter Amstutz over 9 years ago

  • Subject changed from Workbench uses huge amounts of RAM when user downloads a large (> 1 GiB) file. to [Workbench] Workbench uses huge amounts of RAM when user downloads a large (> 1 GiB) file.
  • Category set to Workbench
Actions #2

Updated by Tom Clegg over 9 years ago

  • Target version set to Arvados Future Sprints
Actions #3

Updated by Tom Clegg over 9 years ago

  • Story points set to 1.0
Actions #4

Updated by Tom Clegg over 9 years ago

  • Target version changed from Arvados Future Sprints to 2014-10-08 sprint
Actions #5

Updated by Tom Clegg over 9 years ago

  • Assigned To set to Peter Amstutz
Actions #6

Updated by Peter Amstutz over 9 years ago

  • Story points changed from 1.0 to 0.5
Actions #7

Updated by Peter Amstutz over 9 years ago

  • Story points changed from 0.5 to 1.0
Actions #8

Updated by Peter Amstutz over 9 years ago

It appears that Rails "streaming" actually means rendering and delivering the page incrementally, as opposed to the normal mode of operation where the entire page is rendered before being delivered. With that use case in mind, it's not so surprising that it hangs on to the data being sent out. So our current method is probably entirely the wrong way to handle file downloads.

To actually stream data out to the socket, it looks like we need to do one of the following:

  • use send_file() (and rig something up using arv-mount?)
  • use Rails 4 live streaming (include ActionController::Live)
  • write our own Rack middleware to specifically intercept file downloads (I've written a couple of Rack middlewares already, it's not that hard)
  • redirect downloads to some to-be-decided alternate service such as keepproxy
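For illustration, here is a minimal sketch of the Rack-level idea: a response body that hands data to the server one chunk at a time instead of building one big String. The names ChunkedFileBody and download_app are hypothetical and not from the Workbench code, and a StringIO stands in for whatever would really supply the data (a file, or a pipe from a download command). Whether the chunks reach the socket incrementally still depends on the web server in use.

```ruby
require "stringio"

# Hypothetical Rack-style response body: yields fixed-size chunks from an IO
# rather than reading the whole file into memory first.
class ChunkedFileBody
  def initialize(io, chunk_size = 64 * 1024)
    @io = io
    @chunk_size = chunk_size
  end

  # Rack calls #each and writes every yielded String to the client.
  def each
    # IO#read(n) returns nil at end of file, which ends the loop.
    while (chunk = @io.read(@chunk_size))
      yield chunk
    end
  ensure
    @io.close if @io.respond_to?(:close)
  end
end

# Hypothetical Rack app. The tiny chunk size (4 bytes) is only for the demo.
def download_app(io)
  lambda do |_env|
    headers = { "content-type" => "application/octet-stream" }
    [200, headers, ChunkedFileBody.new(io, 4)]
  end
end

status, _headers, body = download_app(StringIO.new("abcdefghij")).call({})
received = []
body.each { |chunk| received << chunk }
puts status
puts received.inspect
```

Memory use in this sketch is bounded by the chunk size rather than by the file size, which is the property the options above are all trying to get.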

Somewhat related to this is the need to support HTTP Range requests, so we can do partial downloads of things like large log files.
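As a rough sketch of what partial-download support would involve, the helper below turns a single-range header value such as "bytes=0-499" into an offset and a length. The name parse_byte_range is hypothetical, and it only handles the single-range forms; multi-range requests and the 206/416 response handling are left out.

```ruby
# Hypothetical helper: parse a single-range "bytes=first-last" header value
# into [offset, length] for a resource of total_size bytes.
# Returns nil for anything it does not understand or cannot satisfy.
def parse_byte_range(header, total_size)
  m = /\Abytes=(\d*)-(\d*)\z/.match(header.to_s)
  return nil unless m

  first, last = m[1], m[2]
  if first.empty?
    # Suffix form "bytes=-N": the final N bytes of the resource.
    return nil if last.empty?
    n = last.to_i
    return nil if n.zero? || total_size.zero?
    n = [n, total_size].min
    [total_size - n, n]
  else
    start = first.to_i
    return nil if start >= total_size
    # Open-ended form "bytes=N-" runs to the end; clamp an oversized end.
    stop = last.empty? ? total_size - 1 : [last.to_i, total_size - 1].min
    return nil if stop < start
    [start, stop - start + 1]
  end
end

p parse_byte_range("bytes=0-499", 1000)
p parse_byte_range("bytes=-100", 1000)
```

With an offset and length in hand, the download path could seek and send only that slice, for example the tail of a large log.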

Actions #9

Updated by Tom Clegg over 9 years ago

3616-live-stream @ 1586924

Tested with a 500M file.
  • Before: ruby process grows from 819m to 1900m, then client starts receiving data.
  • After: ruby process grows from 819m to 826m and stays there while I download.
Actions #10

Updated by Tom Clegg over 9 years ago

  • Status changed from New to In Progress
Actions #11

Updated by Peter Amstutz over 9 years ago

Ok, I've verified this works with puma and passenger. It seems that webrick doesn't support streaming and so it still wants to buffer the whole thing, but that's not a blocker for production.

Unfortunately it uses chunked transfer encoding, which turns out to have a few problems:

  • it adds a little bit of overhead
  • you can't specify Content-Length, so the downloader doesn't know how much data to expect
  • on Firefox these downloads don't show up in the download dialog at all, although they do get downloaded (on chromium it works). At least, this is what I'm seeing; it's possible my Firefox is broken.
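To make the overhead concrete, here is a small sketch of HTTP/1.1 chunked framing: each chunk is sent as its length in hex, CRLF, the data, CRLF, and a zero-length chunk ends the body. The helper names encode_chunk and encode_chunked are hypothetical; in practice the web server does this framing, not the application.

```ruby
# Frame one piece of data as an HTTP/1.1 chunk: hex length, CRLF, data, CRLF.
def encode_chunk(data)
  "#{data.bytesize.to_s(16)}\r\n#{data}\r\n"
end

# Frame a whole body. Empty pieces are skipped because a zero-length chunk
# is reserved for marking the end of the stream.
def encode_chunked(chunks)
  chunks.reject(&:empty?).map { |c| encode_chunk(c) }.join + "0\r\n\r\n"
end

puts encode_chunked(["hello", " world!"]).inspect

# Per-chunk framing cost for a 64 KiB chunk, in bytes.
payload = "x" * 65_536
puts encode_chunk(payload).bytesize - payload.bytesize
```

The total length never appears anywhere in the stream, which is why the client cannot show how much data remains.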

Also, I suggest adding --batch-progress or --no-progress to the arv-get invocation to avoid cluttering up the logs.

Actions #12

Updated by Tom Clegg over 9 years ago

Peter Amstutz wrote:

> Ok, I've verified this works with puma and passenger. It seems that webrick doesn't support streaming and so it still wants to buffer the whole thing, but that's not a blocker for production.

Right, this is not the first reason not to use Webrick in production. :) And it seems to make Webrick no more broken than it was before this change, so I don't feel bad about anyone who happens to be running Webrick for some reason.

> Unfortunately it uses chunked transfer encoding, which turns out to have a few problems:
>
>   • it adds a little bit of overhead
>   • you can't specify Content-Length, so the downloader doesn't know how much data to expect
>   • on Firefox these downloads don't show up in the download dialog at all, although they do get downloaded (on chromium it works). At least, this is what I'm seeing; it's possible my Firefox is broken.

Those problems are regrettable, but less regrettable than using O(N) RAM.

> Also, I suggest adding --batch-progress or --no-progress to the arv-get invocation to avoid cluttering up the logs.

This shouldn't be necessary, since arv-get already defaults to --no-progress unless stderr is a tty.

Thanks

Actions #13

Updated by Anonymous over 9 years ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados|commit:7b2d04380952ac79453bd0771679e40c81281f5c.
