Idea #11167
closed[Workbench] Remove arv-get file download fallback
Description
Currently, if keep-web isn't configured, Workbench will fallback to arv-get and Rails streaming to serve files. However, this is a very bad fallback:
- It fails silently if calling arv-get doesn't work
- It ties up a workbench worker for the duration of the download
- It doesn't report content-length, so user agents are unable to render a progress bar or determine if the entire file was transferred.
- It sometimes silently drops out in the middle of downloads
- It sometimes consumes huge amount of RAM, crashing the workbench server.
- It can't handle [some?] range requests
Instead we should:
- Workbench should refuse start if keep-web is not configured <- see note#22
- Documentation should be updated to emphasize that keep-web is mandatory, Workbench config for keep-web, and the new Workbench startup failure mode
- Remove the arv-get fallback code and adjust any related tests
- For file downloads, prefer to link directly to keep-web instead of redirecting through workbench (especially useful for sharing links) (done in #8784)
Related issues
Updated by Tom Morris over 7 years ago
- Description updated (diff)
- Target version set to 2017-03-15 sprint
- Story points set to 1.0
Updated by Tom Morris over 7 years ago
- Target version changed from 2017-03-15 sprint to Arvados Future Sprints
Updated by Tom Morris over 7 years ago
- Assigned To set to Lucas Di Pentima
- Target version changed from Arvados Future Sprints to 2017-07-19 sprint
Updated by Lucas Di Pentima about 7 years ago
- Status changed from New to In Progress
Updated by Lucas Di Pentima about 7 years ago
- Target version changed from 2017-07-19 sprint to 2017-08-02 sprint
Updated by Lucas Di Pentima about 7 years ago
Updates at cd698336f - branch 11167-wb-remove-arvget
Test run: https://ci.curoverse.com/job/developer-run-tests/391/
- Removed arv-get related code
- Adjusted/removed tests
- Added code to refuse start when keep-web isn't configured
Pending: update documentation
Updated by Lucas Di Pentima about 7 years ago
Some new updates at c7187ec72
Test run: https://ci.curoverse.com/job/developer-run-tests/392/
- Removed outdated comments from the code
- Corrected a test that wasn't failing when runnning them without configuring keep-web.
Updated by Lucas Di Pentima about 7 years ago
Fixed a failing integration test at 8f3070e16
Test run: https://ci.curoverse.com/job/developer-run-tests/393/
Updated by Radhika Chippada about 7 years ago
I didn't spend much time looking at the updates, but here are a couple initial comments.
- collections_controller -> show file method: Since workbench won’t start if keep-web is not enabled, I don’t think you need this check anymore: “if Rails.configuration.keep_web_url or Rails.configuration.keep_web_download_url”. I think you can remove this if statement and outdent the code block here?
- please revisit the integration tests that are removed and amend them to work with keep-web instead (unless a keep-web counterpart test already exists in that context). The test 'view log via keep-web redirect' in jobs_test.rb might come handy in this direction.
Thanks.
Updated by Lucas Di Pentima about 7 years ago
Updates at 2918ed81152a1fe8607f433e36dceddc1cb78f74
Test run: https://ci.curoverse.com/job/developer-run-tests/399/
Addressed almost all suggestions made on the previous comment. I'm still working to adjust jobs' integration test view partial job log
.
Updated by Radhika Chippada about 7 years ago
Lucas, a couple more comments in the meantime:
- application.default.yml => can you add “.com” to the test config as follows?
- keep_web_url: http://example/c=%{uuid_or_pdh} + keep_web_url: http://example.com/c=%{uuid_or_pdh}
- Would it be possible to refactor the following code (similar code in anonymous_access_test and jobs_test as well) into a reusable method?
+ use_keep_web_config + + token = api_fixture('api_client_authorizations')['active']['api_token'] + data = "foo\nfile\n" + datablock = `echo -n #{data.shellescape} | ARVADOS_API_TOKEN=#{token.shellescape} arv-put --no-progress --raw -`.strip + assert $?.success?, $? + + col = nil + use_token 'active' do + mtxt = ". #{datablock} 0:#{data.length}:foo\n" + col = Collection.create(manifest_text: mtxt) + end
Updated by Lucas Di Pentima about 7 years ago
- Target version changed from 2017-08-02 sprint to 2017-08-16 sprint
Updated by Lucas Di Pentima about 7 years ago
Updates at 57647945b08a74bab02fed6adf947eb9adb8321f
Test run: https://ci.curoverse.com/job/developer-run-tests/407/
- Addressed comments from Radhika regarding
keep_web_url
setting and integration test reusable method - Did a lot of testing on why "
view partial job log
" integration test fails, and discovered a bug onkeep-web
in the process, fixed. - The
keep-web
fix isn't enough because there's another bug on our testing environment, seems that the javascript code on the failing integration test doesn't get all the http response headers fromkeep-web
, there are 2 suspects:- One is
nginx
because it's being used as akeep-web
proxy when running tests. - Another possibility is the headless browser being used for integration tests the one losing some headers.
- One is
- The partial log view feature was tested on
arvbox
and worked correctly when trying it with thekeep-web
fix.
Next action will be follow Tom Clegg's suggestion and try using curl
from within the integration test to see if the missing headers appear, because I've been trying different nginx
configurations without any success.
Updated by Lucas Di Pentima about 7 years ago
Updates at 02a0ac50b
Test run: https://ci.curoverse.com/job/developer-run-tests/408/
Using curl
from the integration tests proved that nginx
wasn't being eating those expected response headers.
Finally found the cause of the failing integration test: PhantomJS 1.9.8 seems to have a bug, as reported here: https://github.com/ariya/phantomjs/issues/13200
Changing the test to use selenium fixed the issue.
Updated by Tom Clegg about 7 years ago
IIRC this change means "include ActionController::Live" isn't needed any more in CollectionsController.
LGTM, thanks!
Updated by Lucas Di Pentima about 7 years ago
I've tried removing it (3224524c1a2d6790322448565cb5a16ea0bcb874) and some tests started to fail: https://ci.curoverse.com/job/developer-run-tests/410/
Reverted that change at 36b4355956d23f3b7402fba0e18318fc70e31e27 and started a new test run (https://ci.curoverse.com/job/developer-run-tests/412/) to confirm all is good with the latest changes before merging to master.
Updated by Lucas Di Pentima about 7 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|commit:d0a4adabafa4d132ab2333338c941acc57ca82fb.
Updated by Nico César about 7 years ago
- Description updated (diff)
"workbench should refuse start if keep-web is not configured" <- that's a bad idea. we have all sort of restarts and different configurations that the simple "should refuse to start" will make a nightmare in operations, is better a big-fat-red-sign on the workbench that the workbench not starting at all.
Updated by Ward Vandewege over 3 years ago
- Related to Bug #17528: [Rails] package prints a backtrace on installation when the arvados config.yml file is not present/complete added