Bug #3720

[API] Fetching collection record is very slow when manifest is not small.

Added by Tom Clegg over 6 years ago. Updated over 6 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Brett Smith
Category:
API
Target version:
Start date:
09/03/2014
Due date:
% Done:

100%

Estimated time:
(Total: 2.00 h)
Story points:
0.5

Description

This makes nginx give up on Workbench and say "502 Bad Gateway".


Subtasks

Task #3806: Review 3720-big-manifest-performance-wipResolvedBrett Smith


Related issues

Related to Arvados - Bug #3842: [Workbench] Manifests with more than one entry for a file display the file multiple times instead of concatinating themResolved

Associated revisions

Revision 41c7f201
Added by Brett Smith over 6 years ago

Merge branch '3720-big-manifest-performance-wip'

Closes #3720, #3806.

Revision f94f97fe (diff)
Added by Brett Smith over 6 years ago

3720: Update arvados Gem versions.

Gemfiles need to be in sync with the latest commit on master.
Refs #3720.

Revision 01f16229 (diff)
Added by Brett Smith over 6 years ago

Install & test Ruby SDK, then use that version for other tests.

This commit gives our Ruby SDK the same treatment as other parts of
our build process: we build and install it early, then test it. We
use that built version to run other tests, to make sure all the
components are synced up throughout the run.

This required turning off `bundle install --deployment`, because
Bundler will not use locally installed Gems in that mode. This does
make our build process a little less like what we use in production,
but the benefits of consistent testing and tightening the build
loop (you can update a Gem and dependent Gemfiles in one push) seem
worth that cost.

Refs #3720.

Revision 01f16229 (diff)
Added by Brett Smith over 6 years ago

Install & test Ruby SDK, then use that version for other tests.

This commit gives our Ruby SDK the same treatment as other parts of
our build process: we build and install it early, then test it. We
use that built version to run other tests, to make sure all the
components are synced up throughout the run.

This required turning off `bundle install --deployment`, because
Bundler will not use locally installed Gems in that mode. This does
make our build process a little less like what we use in production,
but the benefits of consistent testing and tightening the build
loop (you can update a Gem and dependent Gemfiles in one push) seem
worth that cost.

Refs #3720.

Revision bafb417c (diff)
Added by Brett Smith over 6 years ago

3842: Keep::Manifest concatenates file information from manifest.

The previous implementation failed to consider the possibility that
file information would be spread across multiple lines of a manifest.
This would cause, e.g., the same file to be yielded many times from
each_file.

This requirement makes it impossible to return file size information
without parsing the entire manifest. Because of that, I have reworked
the Ruby SDK API so that method names are more consistent with their
performance characteristics. I have also added some methods to do
some basic file existence checking that do not require parsing the
whole manifest.

Closes #3842. Refs #3720.

History

#1 Updated by Tom Clegg over 6 years ago

  • Description updated (diff)
  • Category set to API

#4 Updated by Tom Clegg over 6 years ago

  • Target version set to 2014-09-17 sprint

#5 Updated by Brett Smith over 6 years ago

  • Assigned To set to Brett Smith

#6 Updated by Brett Smith over 6 years ago

  • Status changed from New to In Progress

#7 Updated by Brett Smith over 6 years ago

This bug has both an API server component and a Workbench component. Here are the slow bits when I visit the Workbench page:

API client: 0.000437652 Prepare request https://qr1hi.arvadosapi.com/arvados/v1/collections/d071779966740d0f4884c17c3712a6a6+8369520
API client: 35.640100583 API transaction
API client: 11.322907415 request_time
API client: 0.157921615 Parse response

API client: 0.000361355 Prepare request https://qr1hi.arvadosapi.com/arvados/v1/logs  {"object_uuid":"d071779966740d0f4884c17c3712a6a6+8369520"}
API client: 6.25677107 API transaction
API client: 2.137570119 request_time
API client: 0.052077863 Parse response

Rendered collections/_show_files.html.erb (113004.2ms)

#8 Updated by Brett Smith over 6 years ago

The vast majority of the API response time is in as_api_object. Here's some rough benchmarking added to my API server for Collections#show:

Entered Collections#show: 2014-08-28 17:30:25 -0400
Signed manifests: 0.126431064
Rendered Collection: 5.789607971
Completed 200 OK in 7442.9ms (Views: 5663.0ms | ActiveRecord: 77.5ms)

This Collection has almost 150K files, and part of the problem here is that we're rendering them twice: once in manifest_text, and once in the files attribute. Some brief grepping suggests that Workbench is the only likely user of the files attribute (FIXME: do better grepping). After discussion with Tom, here's our plan:

  • Remove the files attribute from API server.
  • Move the files generation code from API server to Workbench. (Really, it should go in SDKs, but we don't have a Ruby SDK, so…)
  • Because rendering the file list in Workbench itself also takes too long, cap the view at 10K files.

#9 Updated by Brett Smith over 6 years ago

3720-big-manifest-performance-wip is up for review to implement a fix for this. There have been some changes from the previous plan, so to help understanding:

Brett Smith wrote:

  • Remove the files attribute from API server.

data_size went away along with this, because it also requires parsing the manifest, which can cause performance trouble.

  • Move the files generation code from API server to Workbench. (Really, it should go in SDKs, but we don't have a Ruby SDK, so…)

It turns out that the API server still needs to be able to parse file information from manifests in some cases—e.g., to find Docker images. As a consequence, I (1) moved the Locator class from the API server to the Ruby SDK, (2) added a Manifest class to the Ruby SDK to handle regular parsing tasks, and (3) used that code from both the API server and Workbench as needed.

There are many places where Workbench looks at the files attribute. For now, I have added a method for backwards compatibility to keep that code functioning. As needed, we can update these callers to use Manifest methods directly for better performance. (The current Manifest methods all yield results as they go, so if you only want to look at, e.g., the first three files, you can do that without parsing the entire text.) For now, my changes are sufficient to fix this story: the API server response is much quicker (~1 second compared to ~6 before), and the Workbench response is smaller to avoid problems after it receives the response.

I have also prepared changes for Jenkins to let us update the Ruby SDK and Rails apps at the same time, so that we'll no longer have a two-step update process like we currently do with arvados-cli.

  • Because rendering the file list in Workbench itself also takes too long, cap the view at 10K files.

This is done with a simple take call where needed.

#10 Updated by Peter Amstutz over 6 years ago

LGTM

#11 Updated by Brett Smith over 6 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:41c7f2010ebdbb76fada25a21f184e2d1f4049b3.

Also available in: Atom PDF