Bug #10078

[Performance][Workbench] tab_pane=dashboard too expensive

Added by Tom Morris over 2 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Radhika Chippada
Category:
Performance
Target version:
Start date:
09/15/2016
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
-

Description

This call https://cloud.curoverse.com/?tab_pane=dashboard takes 9 seconds every 30 seconds which is way to expensive for display idling on the dashboard page.


Subtasks

Task #10146: Review 10078-dashboard-perfResolvedNico César

Associated revisions

Revision d68cb184
Added by Radhika Chippada over 2 years ago

refs #10078
Merge branch '10078-dashboard-perf'

Revision f678c1f5 (diff)
Added by Ward Vandewege over 2 years ago

Small naming fix for the config variables.

refs #10078

Revision f853c2c7 (diff)
Added by Ward Vandewege over 2 years ago

Small naming fix for the config variables.

refs #10078

Revision 87473fd8
Added by Radhika Chippada over 2 years ago

refs #10078
Merge branch '10078-dashboard-perf'

History

#1 Updated by Tom Morris over 2 years ago

  • Target version set to 2016-10-12 sprint

#2 Updated by Radhika Chippada over 2 years ago

  • Assigned To set to Radhika Chippada

#3 Updated by Radhika Chippada over 2 years ago

  • Status changed from New to In Progress

#4 Updated by Tom Morris over 2 years ago

I looked at this again yesterday and noticed that:

- it wasn't polling as I'd seen before, so perhaps that was a fallback after web sockets connection had died
- the query itself was only taking ~2 seconds when I ran it by hand as opposed to ~9 seconds as I'd seen before

These two things both make this less urgent I think.

#5 Updated by Radhika Chippada over 2 years ago

Commit 8e6d577

  • Nodes were getting fetched thrice: fixed this
  • Recent collections were getting fetched twice; fixed this
  • Preload collections and links for all the processes and collections
  • Add config parameter to show / hide recent_collections in dashboard
  • Add config parameter to show / hide user notifications in top nav
  • The issue in su92l where permissions graph causes extensive delays in indexing collections is not addressed here.
  • With these updates, it now takes about 2 minutes in place of previous 4 minutes to display dashboard for a non-admin user in su92l (without hiding recent collections and user notifications)
  • To improve dashboard display times measurably, set the following two configuration parameter to false in su92l
      # In systems with many shared projects, dashboard loading can be
      # slow due to collections indexing; use the following parameters
      # to suppress certain properties from dashboard
      show_recent_collections_on_dashboard: true
      show_user_notifications_on_dashboard: true
    

#6 Updated by Tom Morris over 2 years ago

What are "user notifications" and why are they resource intensive?

How long does the su92l dashboard take to display with recent collections hidden? With both recent collections and user notifications hidden?

#7 Updated by Radhika Chippada over 2 years ago

What are "user notifications"

If a user has not yet uploaded a collection or run a pipeline, we display a notification in top nav of workbench. I attached notifications.jpg so that you can see it.

and why are they resource intensive?

The queries basically are trying to find a collection (and a pipeline instance) whose owner_uuid is current user id or any of his projects and hence ending up with the non-performing query of collection indexing. Not only is this query expensive, it is executed for every single page display since it is in top nav.

How long does the su92l dashboard take to display with recent collections hidden? With both recent collections and user notifications hidden?

With both recent_collections panel and notifications turned off, it now takes takes about 10s to load the dashboard in su92l for a non-admin user.

Here is the query executions timings to check for collection in notifications:

production.log:{"method":"POST","path":"/arvados/v1/collections","format":"*/*","controller":"arvados/v1/collections","action":"index","status":200,"duration":78188.8,"view":0.72,"db":78132.13,"params":{"api_token":"m51fv1765fxl8qny8bwd4t3g5fw6kftqaqgck82g5p09k39x5","reader_tokens":"[false]","current_request_id":"1475530528-710836597","_method":"GET","where":"{\"created_by\":\"su92l-tpzed-2hxfl1bjoe64dpr\"}","limit":"1","offset":"0","_profile":"true"},"@timestamp":"2016-10-03T21:38:13Z","@version":"1","message":"[200] POST /arvados/v1/collections (arvados/v1/collections#index)"}

And, this is the query timings to fetch the recent_collections:

production.log:{"method":"POST","path":"/arvados/v1/collections","format":"*/*","controller":"arvados/v1/collections","action":"index","status":200,"duration":78168.33,"view":0.65,"db":78103.15,"params":{"api_token":"m51fv1765fxl8qny8bwd4t3g5fw6kftqaqgck82g5p09k39x5","reader_tokens":"[false]","current_request_id":"1475530528-710836597","_method":"GET","order":"[\"modified_at desc\"]","limit":"8","offset":"0","_profile":"true"},"@timestamp":"2016-10-03T21:36:51Z","@version":"1","message":"[200] POST /arvados/v1/collections (arvados/v1/collections#index)"}

Uploaded the file "su92l-dashboard-improved-with-notifications-and-recent-collections" which lists the requests being made with the improvements in place while still including the notifications and recent_collections.

#9 Updated by Nico César over 2 years ago

test 8e6d577f1381da0d42481ef8dd0479241e3c50bf

I like that show_user_notifications_on_dashboard is there in application.default.yml as true, so no changes in new deploys (except su92l that we want to fix)

I notice this change:

     <div class="col-md-6">
-      <% nodes = Node.filter([["last_ping_at", ">", Time.now - 3600]]) %>
+      <% nodes = Node.filter([["last_ping_at", ">", Time.now - 3600]]).results %>
       <div class="panel panel-default" style="min-height: 10.5em">
         <div class="panel-heading"><span class="panel-title">Compute node status</span>
           <span class="pull-right compute-node-actions">

I'm puzzled, why is results needed now?

talking about performance, this line:

+                <i class="fa fa-fw fa-folder-o"></i><%= link_to_if_arvados_object recent_cs[:owners][p[:owner_uuid]], friendly_name: true %>/

invokes function link_to_if_arvados_object in apps/workbench/app/helpers/application_helper.rb which is a building block for our UI in general can we easily benchmark that function with different objects?

besides the ".results" change all the rest seems fine to me. I'd like some other engineer to take a look too, since I could be leaving something important out

#10 Updated by Tom Morris over 2 years ago

Nico - please open a separate ticket if you'd like additional performance benchmarking done on core UI building blocks.

I think Radhika's changes represent a useful increment of performance improvement and don't want them delayed by other investigations.

Nico / Radhika - please find another volunteer to assign the review task to if it needs another reviewer.

#11 Updated by Radhika Chippada over 2 years ago

Nico: regarding Node.filter("last_ping_at", ">", Time.now - 3600).results update: earlier the "nodes" were being fetched three times, once in _show_dashboard.html and twice in compute_node_summary (nodes.select statements). So making "nodes" reflect the results themselves eliminated the other two calls.

I also pushed one more update (e25a377e) to improve the recent_processes performance, just the SQL update with "select" list. If the dashboard includes large pipelines such as the one in #10144, fetching the entire pipeline will result in serious performance issues during "recent_processes". Since we do not need the "components" of the pipelines during dashboard display, I updated to select only the necessary columns.

#12 Updated by Nico César over 2 years ago

test e25a377

Ready to merge. Once it's merged I'll deploy in su92l and activate the flag .

thanks

#13 Updated by Nico César over 2 years ago

Tom Morris wrote:

Nico - please open a separate ticket if you'd like additional performance benchmarking done on core UI building blocks.

I think Radhika's changes represent a useful increment of performance improvement and don't want them delayed by other investigations.

I agree here, let's merge this as is.

Nico / Radhika - please find another volunteer to assign the review task to if it needs another reviewer.

I'm not a skilled RoR programmer, that's why I thought that a second pair of eyes could help. But since we're short in time and performance is only measured in su92l . let's merge and iterate later

#14 Updated by Ward Vandewege over 2 years ago

Thank you Radhika and reviewers, I've deployed this to su92l and disabled the expensive parts of Workbench, and now logging in as an unprivileged user is not ridiculously slow anymore.

#15 Updated by Tom Morris over 2 years ago

The original ticket for su92l was #9972.

#16 Updated by Radhika Chippada over 2 years ago

  • Status changed from In Progress to Resolved

Permission graph related issue will be addressed in #9972

Also available in: Atom PDF