<p><strong>Arvados - Bug #10224: [FUSE] Fix expensive calls to log API</strong></p>
<p>Tom Clegg (tom@curii.com), 2016-10-31T20:39:13Z</p>
<p>Copied from <a class="issue tracker-1 status-3 priority-4 priority-default closed parent" title="Bug: Workbench search is very slow (>60sec) (Resolved)" href="https://dev.arvados.org/issues/10028">#10028</a>:</p>
<p>I tried some queries on test servers and found:</p>
<ul>
<li>Counting lots of rows in a huge table is slow, regardless of how good the index is: to count accurately, PostgreSQL has to visit every counted row.</li>
<li>Given this, PostgreSQL might be outsmarting us when it chooses a sequential scan in our test trials. It knows most of the rows match the query, so it will have to visit them anyway to count them; there is not much benefit in consulting the index first.</li>
</ul>
<p>Typically, the huge table that causes trouble is "logs". For example, here is the chain of calls behind an arv-mount event subscription:</p>
<ul>
<li>arv-mount: subscribe to all events<pre><code class="python syntaxhl"> <span class="bp">self</span><span class="p">.</span><span class="n">events</span> <span class="o">=</span> <span class="n">arvados</span><span class="p">.</span><span class="n">events</span><span class="p">.</span><span class="n">subscribe</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">_api_client</span><span class="p">,</span>
<span class="p">[[</span><span class="s">"event_type"</span><span class="p">,</span> <span class="s">"in"</span><span class="p">,</span> <span class="p">[</span><span class="s">"create"</span><span class="p">,</span> <span class="s">"update"</span><span class="p">,</span> <span class="s">"delete"</span><span class="p">]]],</span>
<span class="bp">self</span><span class="p">.</span><span class="n">on_event</span><span class="p">)</span>
</code></pre></li>
<li>Python SDK: <pre><code class="python syntaxhl"><span class="n">items</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">api</span><span class="p">.</span><span class="n">logs</span><span class="p">().</span><span class="nb">list</span><span class="p">(</span><span class="n">limit</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">order</span><span class="o">=</span><span class="s">"id desc"</span><span class="p">,</span> <span class="n">filters</span><span class="o">=</span><span class="n">f</span><span class="p">).</span><span class="n">execute</span><span class="p">()[</span><span class="s">'items'</span><span class="p">]</span></code></pre></li>
<li>API server: <pre><code class="ruby syntaxhl"> <span class="n">list</span><span class="p">[</span><span class="ss">:items_available</span><span class="p">]</span> <span class="o">=</span> <span class="vi">@objects</span><span class="p">.</span>
<span class="nf">except</span><span class="p">(</span><span class="ss">:limit</span><span class="p">).</span><span class="nf">except</span><span class="p">(</span><span class="ss">:offset</span><span class="p">).</span>
<span class="nf">count</span><span class="p">(</span><span class="ss">:id</span><span class="p">,</span> <span class="ss">distinct: </span><span class="kp">true</span><span class="p">)</span>
</code></pre></li>
<li>PostgreSQL: slowly counts all rows of the giant logs table with event_type ∈ {"create", "update", "delete"}.</li>
</ul>
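<p>The cost in this chain can be illustrated with a toy model. This is a sketch under stated assumptions, not the API server's code or PostgreSQL's planner: the "table" is a hypothetical in-memory list that counts how many rows each query touches, and <code>CountingTable</code>, <code>toy_list_handler</code> and its <code>compute_items_available</code> switch are illustrative names only.</p>

```python
from dataclasses import dataclass


@dataclass
class LogRow:
    id: int
    event_type: str


class CountingTable:
    """Hypothetical in-memory stand-in for the logs table that counts row visits."""

    def __init__(self, rows):
        self.rows = sorted(rows, key=lambda r: r.id, reverse=True)  # newest first
        self.visits = 0

    def scan_newest_first(self):
        # Roughly ORDER BY id DESC: yield rows one at a time, counting each visit.
        for row in self.rows:
            self.visits += 1
            yield row


def toy_list_handler(table, event_types, limit, compute_items_available=True):
    """Toy list call: up to `limit` (assumed >= 1) matching rows, newest first,
    plus an optional exact count modelled on items_available."""
    items = []
    for row in table.scan_newest_first():
        if row.event_type in event_types:
            items.append(row)
            if len(items) >= limit:
                break  # the limited query can stop early
    result = {"items": items}
    if compute_items_available:
        # Like except(:limit).except(:offset).count(...): the exact count ignores
        # `limit` and scans the whole table again (a sequential scan in this toy).
        result["items_available"] = sum(
            1 for row in table.scan_newest_first() if row.event_type in event_types
        )
    return result
```

<p>With 10,000 matching rows, the <code>limit=1</code> request touches one row for its single item and then all 10,000 again for the count; with the count switched off it touches one row in total.</p>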
<p>Nobody here actually cares about items_available.</p>
<p>All PollClient needs is max(id) from the logs table, and only for the purpose of passing it to the next logs().list() call.</p>
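<p>A minimal sketch of two ways PollClient could pick its starting threshold, assuming a hypothetical in-memory list of log rows rather than the Arvados SDK: <code>toy_logs_list</code> imitates only the filter, order and limit semantics of <code>logs().list()</code>, and <code>threshold_by_last_id</code>, <code>threshold_by_time</code> and <code>poll</code> are illustrative names. The toy compares ids as integers, whereas PollClient's filter passes <code>str(self.id)</code>.</p>

```python
from datetime import datetime, timedelta, timezone

EVENT_FILTER = [["event_type", "in", ["create", "update", "delete"]]]


def _matches(row, flt):
    # Supports only the operators used in this sketch.
    attr, op, operand = flt
    value = row[attr]
    if op == "in":
        return value in operand
    if op == ">":
        return value > operand
    if op == ">=":
        return value >= operand
    raise ValueError(f"unsupported operator: {op}")


def toy_logs_list(rows, filters, limit=None, newest_first=False):
    """Hypothetical in-memory imitation of logs().list() filter/order/limit."""
    out = [r for r in rows if all(_matches(r, f) for f in filters)]
    out.sort(key=lambda r: r["id"], reverse=newest_first)
    return out if limit is None else out[:limit]


def threshold_by_last_id(rows):
    # Startup query whose only useful output is the newest id.
    newest = toy_logs_list(rows, EVENT_FILTER, limit=1, newest_first=True)
    return [["id", ">", newest[0]["id"] if newest else 0]]


def threshold_by_time(subscription_time):
    # Clock-based threshold: no startup query at all.
    return [["created_at", ">=", subscription_time]]


def poll(rows, threshold):
    return toy_logs_list(rows, EVENT_FILTER + threshold)


t0 = datetime(2016, 11, 1, tzinfo=timezone.utc)
rows = [{"id": i, "event_type": "update", "created_at": t0 + timedelta(minutes=i)}
        for i in range(1, 6)]
subscribed_at = t0 + timedelta(minutes=10)
by_id = threshold_by_last_id(rows)          # queries the existing rows
by_time = threshold_by_time(subscribed_at)  # queries nothing
rows += [{"id": i, "event_type": "update",
          "created_at": subscribed_at + timedelta(minutes=i)} for i in (6, 7)]
print([r["id"] for r in poll(rows, by_id)])    # [6, 7]
print([r["id"] for r in poll(rows, by_time)])  # [6, 7]
```

<p>Both thresholds select the same new events here. One trade-off to keep in mind: the time-based threshold relies on <code>created_at</code> (presumably assigned by the API server) being comparable with the client's notion of subscription time, so clock skew could drop or repeat events near the boundary; the id-based threshold avoids that at the price of the startup query.</p>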
<p>Perhaps it could pass a filter like <code class="python syntaxhl"><span class="p">[[</span><span class="s">"created_at"</span><span class="p">,</span><span class="s">">="</span><span class="p">,</span><span class="n">subscription_time</span><span class="p">]]</span></code> instead of making this expensive API call to find the last known ID and then passing <code class="python syntaxhl"><span class="p">[[</span><span class="s">"logs.id"</span><span class="p">,</span><span class="s">">"</span><span class="p">,</span><span class="nb">str</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="nb">id</span><span class="p">)]]</span></code>?</p>
<p>Tom Clegg (tom@curii.com), 2016-11-01T15:30:27Z</p>
<p>10224-efficient-event-poll-startup</p>
<p>test <a class="changeset" title="10224: Choose a recent-event threshold without querying the entire event history." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/0e98d5f2f9c827deb5beb4a1765a4718d7cffd88">0e98d5f2f9c827deb5beb4a1765a4718d7cffd88</a></p>
<p>Tom Clegg (tom@curii.com), 2016-11-01T15:38:32Z</p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul>
<p>Tom Clegg (tom@curii.com), 2016-11-01T15:48:48Z</p>
<ul><li><strong>Target version</strong> set to <i>2016-11-09 sprint</i></li></ul>
<p>Tom Clegg (tom@curii.com), 2016-11-01T15:50:10Z</p>
<ul><li><strong>Assigned To</strong> set to <i>Tom Clegg</i></li></ul>
<p>Radhika Chippada (radhika@curoverse.com), 2016-11-02T14:36:44Z</p>
<p><a class="changeset" title="10224: Choose a recent-event threshold without querying the entire event history." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/0e98d5f2f9c827deb5beb4a1765a4718d7cffd88">0e98d5f</a></p>
<p>API unit tests failed (log_test).</p>
<p>One Python test failed: test_subscribe_poll (tests.test_events.WebsocketTest).</p>
<p>I didn’t run the FUSE tests.</p>
<p>With the tests fixed, LGTM.</p>
<p>Tom Clegg (tom@curii.com), 2016-11-03T18:28:06Z</p>
<p>API unit tests fixed.</p>
<p>Python SDK tests pass for me. We'll see what Jenkins thinks: build 56 is queued at <a class="external" href="https://ci.curoverse.com/job/developer-run-tests/">https://ci.curoverse.com/job/developer-run-tests/</a></p>
<p>Tom Clegg (tom@curii.com), 2016-11-09T18:58:43Z</p>
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li></ul>