Collection API - Performance enhancements » History » Version 4

Radhika Chippada, 05/11/2015 07:32 PM

h1. Collection API - Performance enhancements

h2. Problem description

We are currently experiencing severe performance issues when working with large collections in Arvados. A few scenarios are described below.

h3. 1. Fetching a large collection

Fetching a collection with a large manifest text from the API server results in timeout errors. This is suspected to be either the root cause of, or a major contributor to, the other issues listed below. Several reported issues are side effects of this one: #4953, #4943, #5614, #5901, #5902

h3. 2. Collection#show in workbench

We often see timeout errors in Workbench when showing a collection page with a large manifest text. This is probably due mostly to the problem described above with fetching large collections. See #5902, #5908

h3. 3. Create a collection by combining

Creating a new collection by combining other collections, or several files from a collection, almost always fails when one or more of the collections involved contains a large manifest text. A few issues about this: #4943, #5614

h2. Proposed solutions

The various operations on these large manifest texts are certainly the cause of these performance issues. Sending the manifest text back and forth between the API server and clients, and JSON encoding and decoding it, could all be contributing. Reducing the amount of data exchanged, and the number of times it is exchanged, can help greatly.

h3. 1. Fetching a large collection

* Compress the data transferred (we recently enabled gzip compression between the API server and Workbench)
* Use efficient JSON encoding and decoding
** We are using Oj between the API server and Workbench. Is there room for further improvement? (http://devblog.agworld.com.au/post/42586025923/the-performance-of-to-json-in-rails-sucks-and)
** Are we consistently using Oj in the Ruby SDK? (Radhika: I need to do further research to answer this question.)
* Send the data in smaller chunks (?)
** Is it possible for us to implement some form of “paging” strategy for sending the manifest text from the API server to the clients?
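
To make the paging question above more concrete, here is a minimal sketch of one possible approach. It assumes a manifest text is a sequence of newline-terminated stream lines and treats a "page" as a run of whole streams. The @manifest_page@ helper, its @offset@ / @limit@ parameters, and the sample manifest are all hypothetical; nothing like this exists in the API server today.

```ruby
# Hypothetical helper (not an existing Arvados API): return one "page" of a
# manifest, where a page is a run of whole stream lines.
def manifest_page(manifest_text, offset:, limit:)
  streams = manifest_text.lines            # each stream keeps its trailing "\n"
  page = streams[offset, limit] || []      # nil when offset is past the end
  next_offset = offset + page.length
  {
    text: page.join,
    total_streams: streams.length,
    # nil signals that there are no more pages to request
    next_offset: (next_offset < streams.length) ? next_offset : nil
  }
end

# Synthetic five-stream manifest, for illustration only.
manifest = (1..5).map { |i|
  "./dir#{i} #{'%032x' % i}+1024 0:1024:file#{i}.txt\n"
}.join

# A client would keep requesting pages until next_offset is nil.
pages = []
offset = 0
while offset
  page = manifest_page(manifest, offset: offset, limit: 2)
  pages << page[:text]
  offset = page[:next_offset]
end
```

Splitting on stream boundaries keeps every page a valid fragment that can be concatenated back into the original text. Whether clients can do useful work with a partial manifest is a separate question this sketch does not answer.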

h3. 2. Collection#show in workbench

Collection#show responses were profiled using rack-mini-profiler. With the development environment pointed at the qr1hi API server, the following observations were made (based on 20+ reloads of the page):

* On average, it took about 70 sec to show the collection qr1hi-4zz18-tcnxylwkxg0nfhi
* The most expensive operations (on average) are:
** collections/_show_source_summary -- 30 sec
** collections/show (API request to get the collection) -- 15 sec
*** Parsing the (JSON) response took 0.2 sec on average
** collections/_show_files -- 15 sec
** application/_projects_tree_menu -- 3 to 4 sec
*** For this collection, 6 requests were made to /groups, each taking 0.2 to 0.5 sec

!{width:40%}https://arvados.org/attachments/download/595/perf-profile-qr1hi-4zz18-tcnxylwkxg0nfhi.png!

* Workbench console log

<pre>
Started GET "/collections/qr1hi-4zz18-tcnxylwkxg0nfhi" for 127.0.0.1 at 2015-05-11 14:47:03 -0400
Processing by CollectionsController#show as HTML
  Parameters: {"id"=>"qr1hi-4zz18-tcnxylwkxg0nfhi"}
API client: 0.0007654 Prepare request https://qr1hi.arvadosapi.com/arvados/v1/users/current
API client: 0.313339245 API transaction
API client: 0.000289537 Parse response
API client: 9.8434e-05 Prepare request https://qr1hi.arvadosapi.com/discovery/v1/apis/arvados/v1/rest
API client: 0.250356943 API transaction
API client: 0.005898489 Parse response
API client: 0.000356541 Prepare request https://qr1hi.arvadosapi.com/arvados/v1/collections/qr1hi-4zz18-tcnxylwkxg0nfhi
API client: 21.405180053 API transaction
API client: 0.170310714 Parse response
API client: 0.000316374 Prepare request https://qr1hi.arvadosapi.com/arvados/v1/jobs  {"output":"55152d2b989c6b174e298dba10ae3ff7+57708684"}
API client: 0.221916151 API transaction
API client: 0.000178293 Parse response
API client: 0.000356427 Prepare request https://qr1hi.arvadosapi.com/arvados/v1/jobs  {"log":"55152d2b989c6b174e298dba10ae3ff7+57708684"}
API client: 0.078525257 API transaction
API client: 0.00017414 Parse response
API client: 0.000424393 Prepare request https://qr1hi.arvadosapi.com/arvados/v1/links  {"head_uuid":"qr1hi-4zz18-tcnxylwkxg0nfhi","link_class":"name"}  modified_at DESC
API client: 0.059534807 API transaction
API client: 0.000152869 Parse response
API client: 0.000302943 Prepare request https://qr1hi.arvadosapi.com/arvados/v1/groups  {"uuid":[]}
API client: 0.06070466 API transaction
API client: 0.000145646 Parse response
API client: 0.00029954 Prepare request https://qr1hi.arvadosapi.com/arvados/v1/links  {"head_uuid":"qr1hi-4zz18-tcnxylwkxg0nfhi","link_class":"permission","name":"can_read"}  modified_at DESC
API client: 0.062907368 API transaction
API client: 0.000149375 Parse response
API client: 0.00028907 Prepare request https://qr1hi.arvadosapi.com/arvados/v1/logs  {"object_uuid":"qr1hi-4zz18-tcnxylwkxg0nfhi"}  created_at DESC
API client: 0.079528074 API transaction
API client: 0.00016842 Parse response
API client: 0.000523936 Prepare request https://qr1hi.arvadosapi.com/arvados/v1/links  {"head_uuid":"qr1hi-4zz18-tcnxylwkxg0nfhi","tail_uuid":"qr1hi-tpzed-ktpvhqu89qoib9f","link_class":"resources","name":"wants"}
API client: 0.062874638 API transaction
API client: 0.000175978 Parse response
API client: 0.000377932 Prepare request https://qr1hi.arvadosapi.com/arvados/v1/api_client_authorizations   [["scopes","=",["GET /arvados/v1/collections/qr1hi-4zz18-tcnxylwkxg0nfhi","GET /arvados/v1/collections/qr1hi-4zz18-tcnxylwkxg0nfhi/","GET /arvados/v1/keep_services/accessible"]]]
API client: 0.158867905 API transaction
  Rendered application/_show_autoselect_text.html.erb (0.9ms)
  Rendered application/_show_autoselect_text.html.erb (0.2ms)
  Rendered collections/_show_source_summary.html.erb (26534.7ms)
  Rendered collections/_sharing_button.html.erb (1.1ms)
API client: 0.000270093 Prepare request https://qr1hi.arvadosapi.com/arvados/v1/groups/qr1hi-j7d0g-ixtfnxccvzm7ui4
API client: 0.227171335 API transaction
API client: 0.000226508 Parse response
  Rendered application/_title_and_buttons.html.erb (239.8ms)
  Rendered collections/_show_files.html.erb (14531.8ms)
  Rendered application/_loading_modal.html.erb (1.5ms)
  Rendered application/_content.html.erb (14545.1ms)
  Rendered application/show.html.erb (14790.7ms)
  Rendered collections/show.html.erb within layouts/application (41352.7ms)
API client: 0.000296285 Prepare request https://qr1hi.arvadosapi.com/arvados/v1/authorized_keys  {"authorized_user_uuid":"qr1hi-tpzed-ktpvhqu89qoib9f"}
API client: 0.151415184 API transaction
API client: 0.000207343 Parse response
API client: 0.000238512 Prepare request https://qr1hi.arvadosapi.com/arvados/v1/collections  {"created_by":"qr1hi-tpzed-ktpvhqu89qoib9f"}
API client: 0.330716574 API transaction
API client: 0.000160901 Parse response
API client: 0.000376137 Prepare request https://qr1hi.arvadosapi.com/arvados/v1/pipeline_instances  {"created_by":"qr1hi-tpzed-ktpvhqu89qoib9f"}
API client: 0.263794223 API transaction
API client: 0.000841214 Parse response
API client: 0.000266079 Prepare request https://qr1hi.arvadosapi.com/arvados/v1/groups   [["group_class","=","project"]] name
API client: 0.968549864 API transaction
API client: 0.001104326 Parse response
API client: 0.001218433 Prepare request https://qr1hi.arvadosapi.com/arvados/v1/groups   [["group_class","=","project"]] name
API client: 1.214942211 API transaction
API client: 0.001293295 Parse response
API client: 0.001586803 Prepare request https://qr1hi.arvadosapi.com/arvados/v1/groups   [["group_class","=","project"]] name
API client: 0.737031104 API transaction
API client: 0.000921665 Parse response
API client: 0.000487645 Prepare request https://qr1hi.arvadosapi.com/arvados/v1/groups   [["group_class","=","project"]] name
API client: 0.848679769 API transaction
API client: 0.002339424 Parse response
API client: 0.000264704 Prepare request https://qr1hi.arvadosapi.com/arvados/v1/groups   [["group_class","=","project"]] name
API client: 0.681490624 API transaction
API client: 0.000958536 Parse response
API client: 0.000484713 Prepare request https://qr1hi.arvadosapi.com/arvados/v1/groups   [["group_class","=","project"]] name
API client: 0.412487956 API transaction
API client: 0.000900607 Parse response
  Rendered application/_projects_tree_menu.html.erb (5474.8ms)
API client: 0.00045925 Prepare request https://qr1hi.arvadosapi.com/arvados/v1/groups/qr1hi-tpzed-rv6f0l8lbvdzkog
API client: 0.078490258 API transaction
  Rendered application/_browser_unsupported.html (0.6ms)
  Rendered getting_started/_getting_started_popup.html.erb (2.7ms)
  Rendered layouts/body.html.erb (6343.2ms)
Completed 200 OK in 70876ms (Views: 47804.5ms | ActiveRecord: 0.0ms)
</pre>
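
As a rough cross-check of where the time goes in this particular run, the four largest entries in the log can be added up and compared with the total. This is only a sketch: it assumes these four entries do not overlap (the collection fetch happens before rendering starts, and none of the three partials is nested inside another), and it covers a single request rather than the averages quoted earlier.

```ruby
# Timings in milliseconds, copied from the console log above.
total_ms = 70_876.0
dominant_ms = {
  "API transaction for the collection" => 21_405.180053,
  "collections/_show_source_summary"   => 26_534.7,
  "collections/_show_files"            => 14_531.8,
  "application/_projects_tree_menu"    => 5_474.8
}

dominant_total_ms = dominant_ms.values.reduce(:+)
share = dominant_total_ms / total_ms

puts format("%.1f s of %.1f s (%.1f%%)",
            dominant_total_ms / 1000, total_ms / 1000, share * 100)
# prints: 67.9 s of 70.9 s (95.9%)
```

Under that assumption, almost all of the request time in this run is spent in these four places, so the remaining API calls and partials are comparatively minor.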

* Implement paging (?) in collection#show? Get “pages” of the collection and display them as needed.
* Avoid making multiple calls to the API server for the same data by caching or preloading data (see #5908)
* Show less information on the collection page (such as not linking images that are going to 404)? (see #5908)
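
The console log shows the same /groups query being sent six times within a single page load. One way to avoid such repeats is a small cache that lives only for the duration of one request. The sketch below is illustrative: @RequestCache@ and the @fake_api@ lambda are hypothetical names, and the lambda merely stands in for whatever code actually performs the HTTP call.

```ruby
# Hypothetical request-scoped cache. `fetcher` stands in for the code that
# really talks to the API server; it is called with (path, params).
class RequestCache
  attr_reader :hits, :misses

  def initialize(fetcher)
    @fetcher = fetcher
    @responses = {}
    @hits = 0
    @misses = 0
  end

  # Identical (path, params) pairs are fetched only once per cache instance.
  def get(path, params = {})
    key = [path, params]
    if @responses.key?(key)
      @hits += 1
    else
      @misses += 1
      @responses[key] = @fetcher.call(path, params)
    end
    @responses[key]
  end
end

# Stand-in for the real API client: counts how often it is actually called.
calls = 0
fake_api = lambda do |path, params|
  calls += 1
  { "path" => path, "params" => params }
end

cache = RequestCache.new(fake_api)
filters = { "filters" => [["group_class", "=", "project"]], "order" => "name" }
6.times { cache.get("/arvados/v1/groups", filters) }
```

In this example the stand-in API is called once and the other five lookups are served from the cache. Creating a fresh cache per request avoids having to think about invalidation, at the cost of not sharing anything between page loads.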

h3. 3. Create a collection by combining

* Offer an API server method that accepts the selections array (and optionally owner_uuid and name) and creates the new collection in the backend. This can help as follows:
** When combining entire collections: We can completely eliminate the need to fetch the manifest texts of the collections in Workbench. Workbench would also no longer need to work through the combining logic and generate the manifest text for the new collection, or JSON decode and encode the manifest texts. Lastly, it would not need to send the new manifest text to the API server over the wire. Instead, the API server can do all these steps itself, create the new collection, and send the generated collection uuid to Workbench (which then reduces the performance issue to the collection#show issue; yay)
** When combining selected files from within a collection: Here too, we can expect significant performance improvements by eliminating the need to generate the combined manifest text in Workbench and send it over the wire.
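
The shape of such a server-side method could look roughly like the sketch below, which only handles the "entire collections" case. Everything here is hypothetical: the @combine_collections@ name, the in-memory @store@ hash standing in for the collections table, the naive concatenation of manifest texts, and the way the new uuid is derived. The point is only that the client sends uuids and receives a uuid, so no manifest text crosses the wire.

```ruby
require "digest/md5"

# Hypothetical server-side handler. `store` maps collection uuid to manifest
# text and stands in for the real database.
def combine_collections(store, selections, name: nil, owner_uuid: nil)
  # Naive combination: concatenate the stream lines of each selected
  # collection. Real combining logic would be more involved.
  manifest_text = selections.map { |uuid| store.fetch(uuid) }.join

  # Made-up uuid scheme for the sketch; the real server assigns uuids itself.
  new_uuid = "zzzzz-4zz18-" + Digest::MD5.hexdigest(manifest_text)[0, 15]
  store[new_uuid] = manifest_text

  # The response deliberately omits the manifest text.
  { "uuid" => new_uuid, "name" => name, "owner_uuid" => owner_uuid }
end

# Synthetic data, for illustration only.
store = {
  "zzzzz-4zz18-aaaaaaaaaaaaaaa" => "./a #{'a' * 32}+10 0:10:a.txt\n",
  "zzzzz-4zz18-bbbbbbbbbbbbbbb" => "./b #{'b' * 32}+20 0:20:b.txt\n"
}

response = combine_collections(
  store,
  ["zzzzz-4zz18-aaaaaaaaaaaaaaa", "zzzzz-4zz18-bbbbbbbbbbbbbbb"],
  name: "combined"
)
```

Combining selected files rather than whole collections would need the selections to carry file paths as well, which this sketch leaves out.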

h3. 4. Implement caching using a framework such as Memcache

* One of the issues listed above (#5901) is about being able to access a collection from multiple threads in parallel. Also, #5908 highlights several API requests being repeated within a single page display. In fact, we have this issue in several areas of the Workbench implementation.
* By implementing caching, we can reduce the need to make round-trip API requests to fetch these objects, and improve performance by fetching them from the shared cache instead.
* Question: It is not clear how well caching would work if / when we cache these huge collections.
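
One possible answer to the question above is to share a cache between threads but decline to store values above a size limit, so that huge manifest texts are always fetched fresh. The sketch below uses a plain in-process hash guarded by a mutex as a stand-in for a real cache server; the @SharedCache@ class and the @max_value_bytes@ parameter are hypothetical.

```ruby
# Hypothetical in-process stand-in for a shared cache such as Memcache.
class SharedCache
  def initialize(max_value_bytes:)
    @max_value_bytes = max_value_bytes
    @entries = {}
    @lock = Mutex.new
  end

  # Return the cached value for `key`, or run the block to load it.
  # Values larger than max_value_bytes are returned but never stored.
  def fetch(key)
    @lock.synchronize do
      return @entries[key] if @entries.key?(key)
    end
    value = yield   # load outside the lock so slow fetches do not block readers
    if value.bytesize <= @max_value_bytes
      @lock.synchronize { @entries[key] = value }
    end
    value
  end

  def cached?(key)
    @lock.synchronize { @entries.key?(key) }
  end
end

# Count how often the (simulated) expensive load really runs.
loads = 0
loads_lock = Mutex.new
loader = lambda do |size|
  loads_lock.synchronize { loads += 1 }
  "x" * size
end

cache = SharedCache.new(max_value_bytes: 1024)
cache.fetch("small") { loader.call(100) }              # warms the cache
4.times.map {
  Thread.new { cache.fetch("small") { loader.call(100) } }
}.each(&:join)                                         # all served from cache
2.times { cache.fetch("huge") { loader.call(10_000) } } # never stored
```

Note that two threads asking for the same uncached key at the same moment would both run the loader; preventing that would need extra coordination that is not shown here.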