Bug #4232

[Workbench] Pipeline instances with many components should render quickly (on pipeline_instances#show and projects#show)

Added by Tom Clegg about 6 years ago. Updated almost 6 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Phil Hodgson
Category:
Workbench
Target version:
Start date:
12/20/2014
Due date:
% Done:

80%

Estimated time:
(Total: 0.00 h)
Story points:
1.0

Subtasks

Task #4873: Review 4232ResolvedPhil Hodgson

Task #4668: Find a pipeline_instance that is slow to testResolvedPhil Hodgson

Task #4871: Fix Workbench slowdownsResolvedPhil Hodgson

Task #4814: Determine bottleneck when displaying pipes'n'jobs contents for a projectResolvedPhil Hodgson

Task #4872: Fix API server slowdownsResolvedPhil Hodgson

Associated revisions

Revision 9e111d1d
Added by Phil Hodgson almost 6 years ago

Merge branch '4232-slow-pipes-n-jobs' closes #4232

History

#1 Updated by Tom Clegg about 6 years ago

  • Target version changed from Arvados Future Sprints to 2014-11-19 sprint

#2 Updated by Tom Clegg about 6 years ago

  • Category set to Workbench
  • Assigned To set to Phil Hodgson

#3 Updated by Phil Hodgson about 6 years ago

Please define this further... How many is many and how quick is quickly? When I try this with a whole bunch of components, it seems just as quick/slow as any other page in Workbench.

#4 Updated by Bryan Cosca about 6 years ago

This project is probably going to be the most help: qr1hi-j7d0g-6fum19pfbaxyy9x. Theres ~1000 jobs and pipelines in the tab and the pipeline template is fairly big (~2000 lines).

This does seem to be loading faster than normal though. I'm not exactly sure if anything changed recently.

#5 Updated by Phil Hodgson about 6 years ago

Thank you, Bryan. That will be useful I'm sure: for me it takes almost 20 seconds to load those jobs'n'pipes!

#6 Updated by Ward Vandewege about 6 years ago

  • Target version changed from 2014-11-19 sprint to 2014-12-10 sprint

#7 Updated by Phil Hodgson about 6 years ago

But I was mixed up. While it takes a while to load that list of jobs and pipes, showing one pipeline template or pipeline instance, even one of those with many lines, is to me really quite fast. So in fact I'm still not certain what exactly the trouble is that I should be looking for with this issue...

#8 Updated by Tom Clegg almost 6 years ago

  • Target version changed from 2014-12-10 sprint to 2015-01-07 sprint

#9 Updated by Phil Hodgson almost 6 years ago

Comments from Tom:

It's possible the "showing components of a single pipeline is slow"
bug is already fixed or never even existed. But it seems to me we
should be able to render a page of jobs&pipelines in a lot less than
20 seconds. (IIRC this is even after we reduced the page size to 20
on the "list of jobs and pipelines" tab because it was so slow even
Capybara got impatient and gave up waiting for it.)

So maybe the "pipeline_instances#show" part of the story is obsolete,
and the real bug is just projects#show -> pipelines&jobs tab, which
retrieves projects#contents -> _show_contents_rows.

My guess is that we're doing a lot of individual "get job" and "get
pipeline template" API calls here that could be reduced with some
combination of
  • batching into an "index?filters=..." query like the links_for_object
    helper method
  • only doing one GET per object per request (ArvadosBase::find() is
    supposed to do this, but it might not be for some reason)

It's also possible the API requests themselves are slower than they
should be -- in which case the solution may look totally different --
but I'm currently guessing it's primarily a workbench problem.

#10 Updated by Tom Clegg almost 6 years ago

time python -c "import arvados; print arvados.api().groups().contents(
  uuid='qr1hi-j7d0g-6fum19pfbaxyy9x',
  limit=50, order='created_at desc',
  filters=[['uuid','is_a','arvados#pipelineInstance']]).execute()" >/dev/null

real    0m30.850s
user    0m2.736s
sys    0m0.584s

time arv group contents \
  --uuid qr1hi-j7d0g-6fum19pfbaxyy9x \
  --limit 50 \
  --order 'created_at desc' \
  --filters '[["uuid", "is_a", "arvados#pipelineInstance"]]' >/dev/null

real    0m27.063s
user    0m2.904s
sys    0m0.752s

I notice each pipelineInstance in the response includes a full pipeline_template response, which ~doubles the size of the response and most likely causes a bunch of extra queries on the API side (because we never told ActiveRecord to include(:pipeline_template)).

Even Workbench ignores this part of the pipelineInstanceList response and looks up each template in a separate API call (just so it can display its name -- select would be nice, but might be hard to work into that "preload" pattern). Perhaps :pipeline_template should be removed from PipelineInstance's API response.

The computed dependencies attribute could also be a total waste of time and response size.

#11 Updated by Phil Hodgson almost 6 years ago

Hmm... Right, so I'm also posting just below what I sent to you in an email. I was trying to figure out why the API server reports its request_time as being ~8 seconds, while on my end the actual arv call takes ~77 real seconds. What I don't understand is where the other 69 seconds go. If there are a whole bunch of extra queries going on on the API side, would they not be a part of the "request_time"?

phil@ubuntu:~$ time arv group contents \
   --uuid qr1hi-j7d0g-6fum19pfbaxyy9x \
   --limit 50 \
   --filters '[["uuid", "is_a", "arvados#pipelineInstance"]]' \
   --order 'created_at desc' > tmp/blah.out

real    1m17.680s
user    0m31.789s
sys     0m1.723s

phil@ubuntu:~$ tail tmp/blah.out
   },             
   "description":null,
   "started_at":"2014-11-18T21:21:02Z",
   "finished_at":"2014-11-18T21:28:46Z" 
  }
 ],
 "_profile":{
  "request_time":8.04581508
 }
}

#12 Updated by Phil Hodgson almost 6 years ago

And on the subject of the original subject, that of making the components tab render more quickly when there are a lot of components, perhaps AJAX loading of the accordion bits is in order?

#13 Updated by Phil Hodgson almost 6 years ago

I've pushed a Workbench fix and an API server "fix". The latter is tentative and based on the idea that Tom proposed that the pipeline_templates and dependencies are not needed in the API response and greatly slowdown or enlarge the response. The tests still pass even with these properties removed. I also looked around for something that might rely on the presence of these but found nothing.

The Workbench fix I committed is more likely to hit the mark: using #where to get all the jobs for every pipeline when rendering them in show_components was not taking advantage of the #find methods caching.

#14 Updated by Brett Smith almost 6 years ago

Phil Hodgson wrote:

Hmm... Right, so I'm also posting just below what I sent to you in an email. I was trying to figure out why the API server reports its request_time as being ~8 seconds, while on my end the actual arv call takes ~77 real seconds. What I don't understand is where the other 69 seconds go. If there are a whole bunch of extra queries going on on the API side, would they not be a part of the "request_time"?

My experience in the past has been that the request time only includes the time that "our" code handles the request. Specifically, it does not include time required to convert the object to JSON and send that over the wire, which can be a noticeable difference for large responses. See ApplicationController#render: it records the request_time, then calls super.

Reviewing 2fc8f46

Workbench change

  • I'm concerned that using find? this way will improve the performance of projects#contents at the expense of pipeline_instances#show. Caching this way is great if we're going to be requesting the same job multiple times. When that isn't the case—which it normally won't be when we're displaying one pipeline—this code replaces one round trip to the API server with N round trips. If we decide that's a worthwhile trade-off, fine, but since there's discussion in this ticket about the performance of both pages, I wanted to flag it for attention.
  • .compact at the end of .each seems to be a noop. If I'm following right, it returns a copy of jobuuids without nils, but we don't do anything with that copy.

API server change

I also can't find anything using the dependencies field, and it's not documented, so I'm OK with removing it.

  • A few of our tutorial pages show job responses with dependencies. Please remove those fields from tutorial-gatk-variantfiltration.html.textile.liquid and tutorial-job1.html.textile.liquid in doc/user/topics.
  • Following our code standards, please remove the dependencies code rather than commenting it out.

Thanks.

#15 Updated by Tom Clegg almost 6 years ago

  • Status changed from New to In Progress

#16 Updated by Tom Clegg almost 6 years ago

  • Target version changed from 2015-01-07 sprint to 2015-01-28 Sprint

#17 Updated by Phil Hodgson almost 6 years ago

Timed a few runs again. Here on the qr1hi shell:

phil@shell.qr1hi:~$ time arv group contents  \
 --uuid qr1hi-j7d0g-6fum19pfbaxyy9x \
 --limit 50 --order 'created_at desc' \
 --filters '[["uuid", "is_a", "arvados#pipelineInstance"]]' > /dev/null

real    0m14.123s
user    0m2.856s
sys     0m0.836s

So we can see right away that time seems to have improved (halved) API-side since last we looked (see above where it was 27s for Tom).

I used Chrome's Dev Tools to check the request that Workbench makes for https://workbench.qr1hi.arvadosapi.com/projects/qr1hi-j7d0g-6fum19pfbaxyy9x#Jobs_and_pipelines and for:

https://workbench.qr1hi.arvadosapi.com/projects/qr1hi-j7d0g-6fum19pfbaxyy9x?partial=contents_rows&limit=50&filters=%5B%5B%22uuid%22%2C%22is_a%22%2C%5B%22arvados%23job%22%2C%22arvados%23pipelineInstance%22%5D%5D%5D

The wait time was 18.49s and the delivery time was only 1.02s.

So if we take the difference, it's only 3s that Workbench spends doing processing of some kind.

Honestly I don't have any evidence that says that switching to using #find is truly noticeably better that #where, even for #contents. I think Brett is right that it also would slow down #show. We could rig it to use one method for #contents and the other for #show, of course, if we think it may be worthwhile. Again, I have no evidence to say it wouldn't be half a dozen of one in exchange for six of the other.

Ward has pointed out that Tom has addressed a bug where our configured JSON mechanisms weren't being used except explicitly, and instead Rails' superslow default mechanisms were kicking in. Brett had suggested that these times could be significant. Perhaps that's where some or all of the gains have come from, between the first of these rounds of tests (pre-JSON-improvement) to this next round of tests from today?

We can still keep the part where I removed the code where the API works out and sends all the dependencies, since if they're not being used or referenced then we could gain some more time there.

#18 Updated by Tom Clegg almost 6 years ago

  • Target version changed from 2015-01-28 Sprint to 2015-02-18 sprint

#19 Updated by Phil Hodgson almost 6 years ago

Suggesting that we merge in the removal of the rendering of the "dependencies" in the API and call it a day, with the assumption that Tom's JSON fix has made this bug not.

#20 Updated by Brett Smith almost 6 years ago

Phil Hodgson wrote:

Suggesting that we merge in the removal of the rendering of the "dependencies" in the API and call it a day, with the assumption that Tom's JSON fix has made this bug not.

Tom has agreed this sounds good. Can you please prepare a branch that does that based on current master?

#21 Updated by Ward Vandewege almost 6 years ago

  • Target version changed from 2015-02-18 sprint to 2015-03-11 sprint

#22 Updated by Phil Hodgson almost 6 years ago

Branch prepared with latest master, and with what I believe are the conclusions. That is, in the end I've removed the modification to use #find? but left in the modification to not send the "dependencies" of a job.

#23 Updated by Brett Smith almost 6 years ago

Reviewing ad2792e

  • You can remove the PipelineInstance#dependency_search method from API server as well. Now that the dependencies attribute is gone, nothing else calls this.
  • You removed t.add :pipeline_template, :if => :pipeline_template. Going over the source, I don't see any documentation or code referring to this, so I assume we're removing it for the same reason we're removing dependencies? That's fine, just double-checking that I understand.

Thanks.

#24 Updated by Phil Hodgson almost 6 years ago

  • PipelineInstance#dependency_search removed
  • Indeed, you surmised correctly about removing the pipeline_template from the registered list of properties: same rationale

#25 Updated by Brett Smith almost 6 years ago

ad76f76 is good to merge—love to see some cruft cleaned up. Thanks for sticking through this.

#26 Updated by Phil Hodgson almost 6 years ago

  • Status changed from In Progress to Resolved

Merged.

Also available in: Atom PDF