Collection API - Performance enhancements » History » Version 1

Radhika Chippada, 05/05/2015 05:33 PM

h1. Collection API - Performance enhancements

h2. Problem description

We are currently experiencing severe performance problems when working with large collections in Arvados. A few scenarios are described below.

h3. 1. Fetching a large collection

Fetching a collection with a large manifest text from the API server results in timeout errors. This is suspected to be either the root cause of, or a major contributor to, the other issues listed below. Several reported issues are side effects of this one: #4953, #4943, #5614, #5901, #5902

h3. 2. Collection#show in Workbench

We often see timeout errors in Workbench when showing a collection page with a large manifest text. This is probably due mostly to the problem described above of fetching large collections. See #5902, #5908

h3. 3. Create a collection by combining

Creating a new collection by combining other collections, or several files from a collection, almost always fails when one or more of the collections involved has a large manifest text. Related issues: #4943, #5614

h2. Proposed solutions

The various operations that handle these large manifest texts are certainly the cause of these performance issues. Sending the manifest text back and forth between the API server and clients, and JSON encoding and decoding it, could all be contributing. Reducing the amount of data exchanged, and the number of times it is exchanged, can help greatly.

h3. 1. Fetching a large collection

* Compress the data transferred. (We recently enabled gzip compression between the API server and Workbench.)

* Use efficient JSON encoding / decoding
** We are using Oj between the API server and Workbench. Is there room for further improvement? (http://devblog.agworld.com.au/post/42586025923/the-performance-of-to-json-in-rails-sucks-and)
** Are we consistently using Oj in the Ruby SDK? (Radhika: I need to do further research to answer this question.)

* Send the data in smaller chunks (?)
** Is it possible for us to implement some form of “paging” strategy for sending the manifest text from the API server to the clients?
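
The “paging” idea above can be sketched as follows. This is only an illustration, not an existing Arvados API: the function name @paginate_manifest@ and the @max_bytes@ parameter are hypothetical, and the block locators in the sample manifest are placeholders. It relies on the manifest text being line oriented (one stream per line), so pages can be cut at line boundaries and the client can reassemble them by simple concatenation.

```python
def paginate_manifest(manifest_text, max_bytes):
    """Yield pages made of whole manifest lines.

    Hypothetical helper: each page holds as many complete lines as fit
    in max_bytes. A single line larger than max_bytes is still emitted,
    as a page of its own, because a stream line is never split here.
    """
    page, size = [], 0
    for line in manifest_text.splitlines(keepends=True):
        line_size = len(line.encode("utf-8"))
        if page and size + line_size > max_bytes:
            yield "".join(page)
            page, size = [], 0
        page.append(line)
        size += line_size
    if page:
        yield "".join(page)


# Placeholder manifest: stream name, block locator, then file tokens.
manifest = (
    ". aaa+10 0:10:a.txt\n"
    "./sub bbb+20 0:20:b.txt\n"
    "./sub2 ccc+30 0:30:c.txt\n"
)

pages = list(paginate_manifest(manifest, max_bytes=45))

# Concatenating the pages in order gives back the original manifest text.
reassembled = "".join(pages)
```

Cutting only at line boundaries keeps every page a valid fragment of the manifest, at the cost of uneven page sizes when individual streams are very large.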

h3. 2. Collection#show in Workbench

* Implement paging (?) in collection#show: get “pages” of the collection and display them as needed.

* Avoid making multiple calls to the API server for the same data, by caching or preloading it (see #5908)

* Show less information on the collection page, such as not linking images that are going to 404? (See #5908)
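
The point about avoiding repeated calls for the same data could look roughly like the sketch below: a cache that lives only for one page render and remembers every object already fetched. The class name @RequestCache@, the @fetch@ callable and the sample uuid are all assumptions made for illustration; @fake_fetch@ stands in for a real API client call.

```python
class RequestCache:
    """Remember objects fetched during a single page render (hypothetical)."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable: uuid -> object, e.g. an API call
        self._objects = {}       # uuid -> object already fetched
        self.api_calls = 0       # counts real round trips, for illustration

    def get(self, uuid):
        # Only go to the API server the first time a uuid is requested.
        if uuid not in self._objects:
            self.api_calls += 1
            self._objects[uuid] = self._fetch(uuid)
        return self._objects[uuid]


def fake_fetch(uuid):
    # Stand-in for a round trip to the API server.
    return {"uuid": uuid, "name": "collection " + uuid}


cache = RequestCache(fake_fetch)
uuid = "zzzzz-4zz18-0123456789abcde"  # made-up collection uuid

# Three lookups during one page render, one actual API call.
records = [cache.get(uuid) for _ in range(3)]
```

Because the cache is discarded at the end of the request, it needs no invalidation logic; it only removes duplicate fetches within one page display.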

h3. 3. Create a collection by combining

* Offer an API server method that accepts the selections array (and optionally owner_uuid and name) and creates the new collection on the backend. Doing so can help as follows:
** When combining entire collections: Workbench would no longer need to fetch the manifest text of the source collections, work through the combining logic, generate the manifest text for the new collection, JSON decode and encode that manifest text, or send it to the API server over the wire. Instead, the API server can do all of these steps itself, create the new collection, and send only the generated collection uuid to Workbench (which then reduces the performance problem to the collection#show issue; yay).
** When combining selected files from within a collection: here too, we can expect significant performance improvements by eliminating the need to generate the combined manifest text in Workbench and send it over the wire.
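
A rough sketch of the combining logic such a server-side method might run is shown below. It is written in Python purely for illustration; the function names, the shape of the @selections@ argument and the @manifests@ dictionary (a stand-in for a database lookup) are assumptions, not the existing API. It also simplifies the problem: file names containing escaped characters are not handled, streams that share a name are not merged, and a selected file keeps all block locators of its stream rather than only the blocks it needs.

```python
import re

# File tokens in a manifest line look like position:size:name.
FILE_TOKEN = re.compile(r"^(\d+):(\d+):(\S+)$")


def select_files(manifest_text, wanted_paths):
    """Keep only the wanted files, e.g. {"./b.txt"} (hypothetical helper)."""
    lines = []
    for line in manifest_text.splitlines():
        tokens = line.split(" ")
        stream, rest = tokens[0], tokens[1:]
        locators = [t for t in rest if not FILE_TOKEN.match(t)]
        files = []
        for token in rest:
            match = FILE_TOKEN.match(token)
            if match and stream + "/" + match.group(3) in wanted_paths:
                files.append(token)
        if files:
            # Simplification: keep every block locator of the stream.
            lines.append(" ".join([stream] + locators + files))
    return "".join(l + "\n" for l in lines)


def combine(selections, manifests):
    """Build the manifest text of the new collection on the server side.

    selections: list of {"uuid": ..., "files": set of paths, or None
    to take the whole collection}. manifests: uuid -> manifest text.
    """
    parts = []
    for selection in selections:
        text = manifests[selection["uuid"]]
        if selection["files"] is not None:
            text = select_files(text, selection["files"])
        elif text and not text.endswith("\n"):
            text += "\n"
        parts.append(text)
    return "".join(parts)


# Made-up uuids and placeholder block locators.
manifests = {
    "zzzzz-4zz18-aaaaaaaaaaaaaaa": ". " + "a" * 32 + "+10 0:4:a.txt 4:6:b.txt\n",
    "zzzzz-4zz18-bbbbbbbbbbbbbbb": "./data " + "b" * 32 + "+20 0:20:c.txt\n",
}
combined = combine(
    [
        {"uuid": "zzzzz-4zz18-aaaaaaaaaaaaaaa", "files": {"./b.txt"}},
        {"uuid": "zzzzz-4zz18-bbbbbbbbbbbbbbb", "files": None},
    ],
    manifests,
)
```

In this arrangement only the small selections array travels from Workbench to the API server, and only the new collection's uuid needs to travel back.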

h3. 4. Implement caching using a framework such as Memcache

* One of the issues listed above (#5901) is about being able to access a collection from multiple threads in parallel. Also, #5908 highlights several API requests being repeated within a single page display. In fact, we have this problem in several areas of the Workbench implementation.

* By implementing caching, we can reduce the number of round-trip API requests needed to fetch these objects, and improve performance by fetching them from the shared cache instead.

* Question: It is not clear how caching would work if / when we cache these huge collections.
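
One possible way to think about the open question is sketched below: a cache-aside wrapper that compresses each record and simply skips caching when the compressed value is still too large. Everything here is an assumption for illustration: the class name @CollectionCache@, the plain dictionary standing in for a shared cache client, and the 1 MB threshold (chosen because Memcached's default item size limit is 1 MB). Cache invalidation when a collection changes is not addressed.

```python
import gzip
import json

MAX_ITEM_BYTES = 1024 * 1024  # assumed per-item limit of the shared cache


class CollectionCache:
    """Cache-aside lookup with compression and a size cap (hypothetical)."""

    def __init__(self, store, fetch, max_item_bytes=MAX_ITEM_BYTES):
        self.store = store                # dict standing in for a cache client
        self.fetch = fetch                # callable: uuid -> record (API call)
        self.max_item_bytes = max_item_bytes
        self.hits = self.misses = self.skipped = 0

    def get(self, uuid):
        blob = self.store.get(uuid)
        if blob is not None:
            self.hits += 1
            return json.loads(gzip.decompress(blob).decode("utf-8"))
        self.misses += 1
        record = self.fetch(uuid)
        blob = gzip.compress(json.dumps(record).encode("utf-8"))
        if len(blob) <= self.max_item_bytes:
            self.store[uuid] = blob
        else:
            # Too large even after compression: serve it uncached.
            self.skipped += 1
        return record


def fake_fetch(uuid):
    # Stand-in for fetching a collection from the API server.
    return {"uuid": uuid, "manifest_text": ". aaa+10 0:10:a.txt\n" * 100}


uuid = "zzzzz-4zz18-0123456789abcde"  # made-up collection uuid
shared = CollectionCache({}, fake_fetch)
first = shared.get(uuid)   # miss: fetched, then stored compressed
second = shared.get(uuid)  # hit: served from the shared store

# With a very small limit the record is never stored.
tiny = CollectionCache({}, fake_fetch, max_item_bytes=8)
tiny.get(uuid)
```

Skipping oversized values means the largest collections, which are the ones causing the trouble, would still bypass the cache, so this sketch shows the limitation raised in the question rather than resolving it.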