https://dev.arvados.org/https://dev.arvados.org/favicon.ico?15576888422015-06-02T08:39:07ZArvadosArvados - Idea #6203: [API] [Performance] Optimize time spent on the API server side during a large collection creation.https://dev.arvados.org/issues/6203?journal_id=253822015-06-02T08:39:07ZBrett Smithbrett.smith@curii.com
<ul><li><strong>Assigned To</strong> set to <i>Radhika Chippada</i></li><li><strong>Target version</strong> set to <i>2015-06-10 sprint</i></li></ul> Arvados - Idea #6203: [API] [Performance] Optimize time spent on the API server side during a large collection creation.https://dev.arvados.org/issues/6203?journal_id=253892015-06-02T13:41:08ZRadhika Chippadaradhika@curoverse.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul> Arvados - Idea #6203: [API] [Performance] Optimize time spent on the API server side during a large collection creation.https://dev.arvados.org/issues/6203?journal_id=255372015-06-04T17:43:39ZRadhika Chippadaradhika@curoverse.com
<ul></ul><p>Below are the numbers after I tweaked the above mentioned methods:</p>
<p>Workbench tests: test/integration_performance/collection_unit_test.rb<br />In my branch:<br />0.158198771s build example<br />0.512019084s new (manifest size = 6MiB)<br />9.831454755s create<br />3.164605099s read<br />0.013937852s read(cached)<br />6.026941976s list<br />4.689271296s update(name-only)<br />13.405557288s update<br />3.584947671s delete<br />0.052028055s read(404)<br />BigCollectionTest#test_crud_cycle_for_collection_with_big_manifest_(compress=true) = 42.39 s = .</p>
<p>In master:<br />0.130952126s build example<br />0.525899681s new (manifest size = 6MiB)<br />14.075667663s create<br />4.108906145s read<br />0.012675329s read(cached)<br />8.084185956s list<br />6.344242344s update(name-only)<br />19.684942139s update<br />4.963485125s delete<br />0.039076585s read(404)<br />BigCollectionTest#test_crud_cycle_for_collection_with_big_manifest_(compress=true) = 58.95 s = .</p>
<p>API TESTS: test/integration/collections_performance_test.rb<br />In my branch: <br />CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest = 0.28716071s make example<br />0.134842282s JSON encode 18MiB manifest<br />11.402078717s create<br />3.083956521s read<br />3.086899177s list<br />12.464819769s update<br />3.742708626s delete<br />35.18 s = .</p>
<p>In master:<br />CollectionsApiPerformanceTest#test_crud_cycle_for_a_collection_with_a_big_manifest = 0.296271169s make example<br />0.131979662s JSON encode 18MiB manifest<br />16.23883748s create<br />3.903416969s read<br />3.928377929s list<br />18.550129007s update<br />4.560380538s delete<br />48.61 s = .</p> Arvados - Idea #6203: [API] [Performance] Optimize time spent on the API server side during a large collection creation.https://dev.arvados.org/issues/6203?journal_id=256262015-06-08T03:17:58ZTom Cleggtom@curii.com
<ul></ul><p>6203-collection-perf-api @ <a class="changeset" title="6203: Benchmarking revealed that regexp.match(string) is 2.5x more expensive than string =~ regex..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/12afe0eb0da3dfd2c4a136478628ec5726016596">12afe0e</a></p>
<a name="compute_pdh-and-caching"></a>
<h3 >compute_pdh and caching<a href="#compute_pdh-and-caching" class="wiki-anchor">¶</a></h3>
I think the new <code>compute_pdh</code> method is dangerous because it doesn't necessarily compute the pdh. For example, this looks like pdh2 will come from c2.manifest_text, but it doesn't:
<ul>
<li><pre><code class="ruby syntaxhl"><span class="n">pdh1</span> <span class="o">=</span> <span class="n">c1</span><span class="p">.</span><span class="nf">compute_pdh</span>
<span class="n">c1</span><span class="p">.</span><span class="nf">manifest_text</span> <span class="o">=</span> <span class="n">c2</span><span class="p">.</span><span class="nf">manifest_text</span>
<span class="n">pdh2</span> <span class="o">=</span> <span class="n">c1</span><span class="p">.</span><span class="nf">compute_pdh</span>
</code></pre></li>
</ul>
I'm finding the following things more confusing than they deserve to be:
<ul>
<li><code>@need_pdh_validation</code></li>
<li><code>compute_pdh()</code></li>
<li><code>@computed_pdh</code></li>
<li><code>computed_pdh()</code></li>
<li><code>computed_pdh</code> (local variable in check_signatures)</li>
</ul>
Perhaps we can make this correct+understandable+fast all at once by doing something like this:
<ul>
<li><code>compute_pdh</code> method computes the pdh right now.</li>
<li><code>computed_pdh</code> method implements a dependable cache of compute_pdh by doing something like this:
<ul>
<li><pre><code class="ruby syntaxhl"><span class="k">def</span> <span class="nf">computed_pdh</span>
<span class="k">if</span> <span class="vi">@computed_pdh_for_manifest_text</span> <span class="o">==</span> <span class="n">manifest_text</span>
<span class="k">return</span> <span class="vi">@computed_pdh</span>
<span class="k">end</span>
<span class="vi">@computed_pdh</span> <span class="o">=</span> <span class="n">compute_pdh</span>
<span class="vi">@computed_pdh_for_manifest_text</span> <span class="o">=</span> <span class="n">manifest_text</span><span class="p">.</span><span class="nf">dup</span>
<span class="k">end</span>
</code></pre></li>
</ul>
</li>
<li>Get rid of <code>@need_pdh_validation</code> -- just call <code>computed_pdh</code> instead of <code>compute_pdh</code>. In fact, the whole <code>set_portable_data_hash</code> method seems to exist to [a] set a flag that validation is needed, which is irrelevant since validation is always fast now that we have computed_pdh, and [b] transparently modify the "size" part of the client-provided locator to the actual size of portable_manifest_text, which (if we need it at all) would make more sense as an explicit "auto-correct" step in <code>ensure_hash_matches_manifest_text</code>.</li>
<li>Get rid of the <code>computed_pdh</code> local variable in <code>check_signatures</code> and use the <code>computed_pdh</code> method instead.</li>
<li>After <code>strip_manifest_text</code>, take care of the special case where <code>@computed_pdh</code> is still valid even though we changed <code>manifest_text</code> (i.e., if <code>@computed_pdh_for_manifest_text == manifest_text</code> before stripping, then set <code>@computed_pdh_for_manifest_text = stripped_manifest_text</code> after stripping).</li>
</ul>
<p>Aside: I think <code>strip_manifest_text_and_...</code> should be renamed to <code>strip_signatures_and_...</code>. The term "strip" is vague, and confusing if you've recently looked at the Python SDK where it refers to what <code>portable_manifest_text</code> does here.</p>
<p>Aside: <code>ensure_hash_matches_manifest_text</code> should probably be renamed to <code>ensure_pdh_matches_manifest_text</code>.</p>
<a name="New-munge-method"></a>
<h3 >New munge method<a href="#New-munge-method" class="wiki-anchor">¶</a></h3>
<p>The new version no longer modifies its argument, so it should be renamed to just <code>munge_manifest_locators</code>.</p>
<p><code>manifest.dup</code> is unnecessary in sign_manifest now that munge doesn't modify its argument. In fact, it can now reduce to:</p>
<pre><code class="diff syntaxhl"><span class="gd">- m = manifest.dup
- m = munge_manifest_locators!(m) do |match|
- Blob.sign_locator(locator_without_signature(match), signing_opts)
- end
- return m
</span><span class="gi">+ munge_manifest_locators manifest do |match|
+ Blob.sign_locator(locator_without_signature(match), signing_opts)
+ end
</span></code></pre>
<p>Likewise, in portable_manifest_text:<br /><pre><code class="diff syntaxhl"><span class="gd">- portable_manifest = self[:manifest_text].dup
- portable_manifest = self.class.munge_manifest_locators!(portable_manifest) do |match|
</span><span class="gi">+ portable_manifest = self.class.munge_manifest_locators!(manifest_text) do |match|
</span></code></pre></p>
<p>Every non-zero manifest ends with <code>"\n"</code> so I don't think we need the "omit trailing \n if the input didn't have one" code. The only valid case this covers can be handled more simply by putting <code>return "" if manifest==""</code> right at the top. Likewise, "return nil if manifest.nil?" should eliminate the <code>.andand</code> stuff.</p>
<p>Instead of appending "\n" to the string, you can do this, to avoid that last (and presumably most expensive) string concatenation:<br /><pre><code class="diff syntaxhl"><span class="gd">- if !new_lines.empty?
- ends_with_newline = manifest.end_with?("\n")
- manifest = new_lines.join("\n")
- manifest += "\n" if ends_with_newline
- end
</span><span class="gi">+ (new_lines + ['']).join("\n")
</span></code></pre></p>
<p>In <code>self.sign_manifest</code> we now sign <code>locator_without_signature(match)</code>. Is there ever a case where sign_manifest is called before the signatures are stripped? I don't think there should be, in which case this could be more efficient as just <code>match[0]</code>, instead of <code>locator_without_signature(match)</code>. AFAICT, we didn't need to remove signatures here before this branch...</p>
<p>There are two uses of "word.strip" here which seem to be superfluous, since <code>word</code> comes from <code>line.split(' ')</code>.</p>
<p>Could we use Ruby's built-in iterators (<code>manifest.each_line</code> and <code>line.each_line(' ')</code>) instead of actually instantiating arrays with split()? Manifests and lines can be many megabytes long, and we don't actually use the third copy of the manifest in order to do what we need. Assuming this change doesn't make performance worse, it seems like a worthwhile memory-consumption optimization. Unfortunately each_line includes the trailing separator with each yielded item, which might make the blocks a bit ugly. We could punt on this -- but please do note if you already tried it and found performance was worse.</p>
<a name="locator_without_signature"></a>
<h3 >locator_without_signature<a href="#locator_without_signature" class="wiki-anchor">¶</a></h3>
<p>This method <em>looks</em> like it does a lot more work than it should. It only ever needs to do a single replacement. Is the split() and the N appends really faster than this?</p>
<pre><code class="ruby syntaxhl"> <span class="k">def</span> <span class="nc">self</span><span class="o">.</span><span class="nf">locator_without_signature</span> <span class="n">match</span>
<span class="n">match</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nf">sub</span><span class="p">(</span><span class="sr">/\+A[^+]*/</span><span class="p">,</span> <span class="s1">''</span><span class="p">)</span>
<span class="k">end</span>
</code></pre>
<a name="combined-validation-method"></a>
<h3 >combined validation method<a href="#combined-validation-method" class="wiki-anchor">¶</a></h3>
<p>I guess you don't like "_maybe_" in the name, but how about "..._update_replication_confirmed"? I think the new name "..._and_clear_replication_confirmed" suggests it's always going to clear those attrs, making it less helpful than the previous name.</p>
This explanatory comment should probably move to the appropriate place in the new code instead of being lost:
<ul>
<li><pre><code class="diff syntaxhl"><span class="gd">- # If the new manifest_text contains locators whose hashes
- # weren't in the old manifest_text, storage replication is no
- # longer confirmed.
</span></code></pre></li>
</ul>
<p>We should check <code>self.replication_confirmed.nil?</code> directly, instead of adding a new <code>cleared_replication_confirmed</code> variable to track it: the "not in_old_manifest" check is equally useless regardless of when/why <code>replication_confirmed</code> became nil. We should also skip building the <code>in_old_manifest</code> hash in the first place if <code>self.replication_confirmed</code> is already nil. We should also add something to the API unit tests to reveal how slow the replication_confirmed check is -- IIRC all existing performance tests will skip it once we make this optimization.</p>
<p>AIUI, the reason we use <code>self[:manifest_text]</code> instead of the usual <code>manifest_text</code> is to sneakily avoid setting the ActiveRecord "changed" flag when stripping signatures. I think it would be good form to avoid using it in any cases where it isn't necessary. Here, we need to use it when setting, but not when getting.</p>
<p>The part where we build the <code>in_old_manifest</code> hash used to use an <code>each_manifest_locator</code> method which was more efficient because it didn't build a new manifest: it just visited every locator, and ignored the block's return values. The new code loses this optimization. Instead of dropping the <code>each_manifest_locator</code> method, I think we should give it the same optimizations that <code>munge</code> got (i.e., use <code>each</code> or <code>split</code> instead of <code>scan</code>) and use it to build <code>in_old_manifest</code>. That should make it even faster (and use less memory) than <code>munge</code>.</p>
<a name="LOCATOR_REGEXP"></a>
<h3 >LOCATOR_REGEXP<a href="#LOCATOR_REGEXP" class="wiki-anchor">¶</a></h3>
<p>This seems to be copied straight from the Ruby SDK. Can we require 'arvados/keep' and use Keep::Locator::LOCATOR_REGEXP in collection.rb, instead of copying & pasting?</p>
<p>It doesn't seem to have any unit tests. Can we add some in the Ruby SDK test suite?</p>
<p>We pass around "match" objects that depend on the position of regexp matches (<code>match[4]</code> etc). If this is meant to be a supported use of LOCATOR_REGEXP, we should write a test to ensure the right pieces show up in the right positions.</p>
<p>If the <code>=~</code> operator is faster than <code>match()</code> then Keep::Locator::valid() should be updatd to use <code>=~</code> too.</p>
<p>LOCATOR_REGEXP erroneously passes the following bogus locators, (I think) because of a spurious "+" in the "hint content" part:</p>
<pre>
d41d8cd98f00b204e9800998ecf8427e+0+B+0
d41d8cd98f00b204e9800998ecf8427e+0+B++
</pre>
<p>It would be great if we could use the exact regexp documented on the <a class="wiki-page" href="https://dev.arvados.org/projects/arvados/wiki/Keep_locator_format">Keep locator format</a> wiki page -- possibly with only the <code>()</code> groupings added.</p> Arvados - Idea #6203: [API] [Performance] Optimize time spent on the API server side during a large collection creation.https://dev.arvados.org/issues/6203?journal_id=256342015-06-08T13:55:34ZTom Cleggtom@curii.com
<ul></ul><p>6203-collection-perf-ruby-sdk @ <a class="changeset" title="6203: update arvados gem file version for workbench" href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/77bf7469d9c7321a5e38dccd40fa64af5e67542f">77bf746</a></p>
<a name="each_file_spec"></a>
<h3 >each_file_spec<a href="#each_file_spec" class="wiki-anchor">¶</a></h3>
<p>The <code>splits</code> variable is poorly named and never used anyway so this should be compressed back to</p>
<pre><code class="diff syntaxhl"><span class="gd">- splits = line.split
- splits.each do |token|
</span><span class="gi">+ line.split.each do |token|
</span></code></pre>
<a name="FILE_REGEXP"></a>
<h3 >FILE_REGEXP<a href="#FILE_REGEXP" class="wiki-anchor">¶</a></h3>
The colons have no apparent reason to be in regexp groups. How about:
<ul>
<li><pre><code class="diff syntaxhl"><span class="gd">- FILE_REGEXP = /^(\d+)(:)(\d+)(:)(.*)$/
</span><span class="gi">+ FILE_REGEXP = /^(\d+):(\d+):(.*)$/
</span></code></pre></li>
</ul>
<a name="files_count"></a>
<h3 >files_count<a href="#files_count" class="wiki-anchor">¶</a></h3>
<p>I don't see why it's a good idea to re-implement each_file_spec here.</p>
<p>The new method seems to do exactly the same amount of work as the old version, except that it saves work by introducing bugs (e.g., it doesn't unescape stream and file names, and it doesn't realize <code>["./foo","bar"]</code> and <code>[".","foo/bar"]</code> are the same file).</p>
<p>One non-bug tweak I see here is keeping a counter instead of calling seen_files.size each time -- but a quick benchmark indicates the counter method is actually slower, especially if you include the extra <code>hash.include?</code> check before incrementing the counter.</p>
<pre><code class="ruby syntaxhl"><span class="nb">require</span> <span class="s1">'benchmark'</span>
<span class="n">n</span><span class="o">=</span><span class="mi">4000000</span>
<span class="no">Benchmark</span><span class="p">.</span><span class="nf">bm</span><span class="p">(</span><span class="mi">8</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">x</span><span class="o">|</span>
<span class="n">x</span><span class="p">.</span><span class="nf">report</span><span class="p">(</span><span class="s1">'Hash#size'</span><span class="p">)</span> <span class="p">{</span> <span class="n">h</span><span class="o">=</span><span class="p">{};</span> <span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mi">1</span><span class="o">..</span><span class="n">n</span><span class="p">;</span> <span class="n">h</span><span class="p">[</span><span class="n">h</span><span class="p">.</span><span class="nf">size</span><span class="p">.</span><span class="nf">to_s</span><span class="p">]</span><span class="o">=</span><span class="kp">true</span><span class="p">;</span> <span class="k">end</span> <span class="p">}</span>
<span class="n">x</span><span class="p">.</span><span class="nf">report</span><span class="p">(</span><span class="s1">'counter1'</span><span class="p">)</span> <span class="p">{</span> <span class="n">h</span><span class="o">=</span><span class="p">{};</span> <span class="n">c</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mi">1</span><span class="o">..</span><span class="n">n</span><span class="p">;</span> <span class="n">c</span><span class="o">+=</span><span class="mi">1</span><span class="p">;</span> <span class="n">h</span><span class="p">[</span><span class="n">c</span><span class="p">.</span><span class="nf">to_s</span><span class="p">]</span><span class="o">=</span><span class="kp">true</span><span class="p">;</span> <span class="k">end</span> <span class="p">}</span>
<span class="n">x</span><span class="p">.</span><span class="nf">report</span><span class="p">(</span><span class="s1">'counter2'</span><span class="p">)</span> <span class="p">{</span> <span class="n">h</span><span class="o">=</span><span class="p">{};</span> <span class="n">c</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mi">1</span><span class="o">..</span><span class="n">n</span><span class="p">;</span> <span class="n">c</span><span class="o">+=</span><span class="mi">1</span> <span class="k">unless</span> <span class="n">h</span><span class="p">.</span><span class="nf">include?</span> <span class="s1">'foo'</span><span class="p">;</span> <span class="n">h</span><span class="p">[</span><span class="n">c</span><span class="p">.</span><span class="nf">to_s</span><span class="p">]</span><span class="o">=</span><span class="kp">true</span><span class="p">;</span> <span class="k">end</span> <span class="p">}</span>
<span class="k">end</span>
</code><br /> user system total real<br />Hash#size 6.200000 0.150000 6.350000 ( 6.360331)<br />counter1 8.740000 0.040000 8.780000 ( 8.801807)<br />counter2 9.750000 0.100000 9.850000 ( 9.878150)<br /></pre>
<p>The other non-bug tweak I see is that it uses FILE_REGEXP instead of <code>not Locator.valid?</code>. Is FILE_REGEXP faster than LOCATOR_REGEXP? If so, we should use that to make <code>each_file_spec</code> faster. (It might even turn out that the old version is the fastest if we fix <code>valid?</code> to use <code>=~</code> instead of <code>match()</code>...)</p>
<p>(Meanwhile, <code>files_size</code> <em>could</em> conceivably benefit by doing something simpler than each_file_spec, because it returns the same number regardless of how the files are named, whether they're fragmented, etc. However, we could easily shoot ourselves in the foot this way by computing files_size a fast way and then having to run the slow loop anyway because the caller also wants to know files_count. So I don't think this is a worthwhile investment.)</p>
<p>Optimizing for the case where files_count and files_size are called, but <code>files()</code> isn't called, has the same problem -- but at least we know one case where it happens, and it does make it possible to get the file count and size without keeping the big <code>@files</code> array in memory, so it seems worth keeping.</p>
<p>But if we expect callers to invoke both <code>files_count</code> and <code>files_size</code>, we should avoid being sensitive to the order in which they're called, like this:</p>
<pre><code class="ruby syntaxhl"><span class="k">def</span> <span class="nf">files_count</span><span class="p">(</span><span class="n">stop_after</span><span class="o">=</span><span class="kp">nil</span><span class="p">)</span>
<span class="o">...</span>
<span class="vi">@files_size</span> <span class="o">=</span> <span class="n">total_size</span>
<span class="vi">@files_count</span> <span class="o">=</span> <span class="n">count_so_far</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">files_size</span>
<span class="n">files_count</span>
<span class="k">return</span> <span class="vi">@files_size</span> <span class="k">if</span> <span class="vi">@files_size</span>
<span class="n">files</span><span class="p">.</span><span class="nf">reduce</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="p">{</span> <span class="o">|</span><span class="n">total</span><span class="p">,</span> <span class="p">(</span><span class="n">_</span><span class="p">,</span> <span class="n">_</span><span class="p">,</span> <span class="n">size</span><span class="p">)</span><span class="o">|</span> <span class="n">total</span> <span class="o">+</span> <span class="n">size</span> <span class="p">}</span>
<span class="k">end</span>
</code></pre>
<a name="Tests"></a>
<h3 >Tests<a href="#Tests" class="wiki-anchor">¶</a></h3>
<p>There are various new code paths here that could return different results depending on what order the methods are called. If that's our strategy, we need to test accordingly.</p>
<p>Also, when we introduce more manifest-parsing/iterating code (which we should try to avoid) it should at least get tested on things like COLON_FILENAME_MANIFEST, MANY_ESCAPES_MANIFEST, and MULTILEVEL_MANIFEST_WITH_DIRS_IN_FILENAMES.</p> Arvados - Idea #6203: [API] [Performance] Optimize time spent on the API server side during a large collection creation.https://dev.arvados.org/issues/6203?journal_id=256762015-06-09T02:29:11ZRadhika Chippadaradhika@curoverse.com
<ul></ul><p>Responses to <a class="external" href="https://arvados.org/issues/6203#note-4">https://arvados.org/issues/6203#note-4</a></p>
<ul>
<li>Addressed the comments in “compute_pdh and caching”</li>
</ul>
<ul>
<li>Updated munge method accordingly.
<ul>
<li>Also, removed “ends_with_newline” and instead corrected the failing unit test to include “\n” at the end of manifest text.</li>
<li>“In self.sign_manifest we now sign locator_without_signature(match) …” You are right. This is my mistake and corrected it.</li>
<li>manifest.each_line is indeed faster than split(“\n”). However this still includes the trailing new line character. Stripping it using rstrip! made it the same as split(“\n”). However, given the memory optimization, this is better and hence updated accordingly.</li>
<li>Continuing to use split instead of each_line(‘ ‘) since this is resulting in some performance loss. We can revisit this if in future we experience memory issues.</li>
</ul></li>
</ul>
<ul>
<li>locator_without_signature: using sub (gsub in fact) as you suggested, resulted in another 2s gain in api integration test performance.</li>
</ul>
<ul>
<li>combined validation method
<ul>
<li>Added the suggested enhancements </li>
<li>Did not add tests in the interest of time + as you said no real point in doing this at this stage</li>
<li>Please check self[:manifest_text] usages; I could not see any misuses after the updates.</li>
</ul></li>
</ul>
<ul>
<li>LOCATOR_REGEXP
<ul>
<li>Switched to Keep::Locator::LOCATOR_REGEXP as you suggested</li>
<li>Did not add the tests; let’s do this as part of <a class="issue tracker-6 status-5 priority-4 priority-default closed" title="Idea: [SDKs] [Performance] Ruby SDK performance metrics (Closed)" href="https://dev.arvados.org/issues/6270">#6270</a></li>
<li>I also did not update the LOCATOR_REGEXP. Let’s try to do it as part of <a class="issue tracker-6 status-5 priority-4 priority-default closed" title="Idea: [SDKs] [Performance] Ruby SDK performance metrics (Closed)" href="https://dev.arvados.org/issues/6270">#6270</a> or <a class="issue tracker-6 status-5 priority-4 priority-default closed" title="Idea: [SDKs] [Performance] Improve ruby SDK performance (Closed)" href="https://dev.arvados.org/issues/6271">#6271</a> rather than adding more to this story.</li>
</ul></li>
</ul>
<p>Thanks.</p> Arvados - Idea #6203: [API] [Performance] Optimize time spent on the API server side during a large collection creation.https://dev.arvados.org/issues/6203?journal_id=256772015-06-09T02:33:21ZRadhika Chippadaradhika@curoverse.com
<ul></ul><p>Regarding "Review branch: 6203-collection-perf-ruby-sdk" and <a class="external" href="https://arvados.org/issues/6203#note-5">https://arvados.org/issues/6203#note-5</a>:</p>
<p>I removed this review task and instead created <a class="issue tracker-6 status-5 priority-4 priority-default closed" title="Idea: [SDKs] [Performance] Ruby SDK performance metrics (Closed)" href="https://dev.arvados.org/issues/6270">#6270</a> and <a class="issue tracker-6 status-5 priority-4 priority-default closed" title="Idea: [SDKs] [Performance] Improve ruby SDK performance (Closed)" href="https://dev.arvados.org/issues/6271">#6271</a>. Based on what we learned from #6087 and <a class="issue tracker-6 status-3 priority-4 priority-default closed parent behind-schedule" title="Idea: [API] [Performance] Optimize time spent on the API server side during a large collection creation. (Resolved)" href="https://dev.arvados.org/issues/6203">#6203</a> (benchmarking API methods and improving performance), I think there are no short cuts and we need to do a thorough job of first measuring the performance of various methods and making updates to the ruby SDK based on those findings. Thanks for reviewing though.</p> Arvados - Idea #6203: [API] [Performance] Optimize time spent on the API server side during a large collection creation.https://dev.arvados.org/issues/6203?journal_id=257352015-06-10T06:32:21ZTom Cleggtom@curii.com
<ul></ul><p>Updates look good. I've pushed a few more fixes on 6203-collection-perf-api-TC. Mostly comments and clarity, but also saved a bit of time in "sign" by removing an unused split() left over from changing to each_line. Can you take a look, and merge if these tweaks look OK?</p> Arvados - Idea #6203: [API] [Performance] Optimize time spent on the API server side during a large collection creation.https://dev.arvados.org/issues/6203?journal_id=257382015-06-10T13:22:33ZTom Cleggtom@curii.com
<ul></ul><p>6203-collection-perf-api-TC @ <a class="changeset" title="6203: Accept (and discard) hints in client-provided portable_data_hash." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/02156d71070b21872d2dac3c5912c9a1ccd8d684">02156d7</a></p> Arvados - Idea #6203: [API] [Performance] Optimize time spent on the API server side during a large collection creation.https://dev.arvados.org/issues/6203?journal_id=257442015-06-10T14:20:12ZRadhika Chippadaradhika@curoverse.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li></ul><p>Applied in changeset arvados|commit:1a98070dcdd379ab6effdc3fc4301c9435a137b1.</p>