https://dev.arvados.org/https://dev.arvados.org/favicon.ico?15576888422016-08-24T19:31:54ZArvadosArvados - Bug #9848: [Crunch2] [API] Ensure saved logs and outputs from containers are accessible to non-admin usershttps://dev.arvados.org/issues/9848?journal_id=422882016-08-24T19:31:54ZTom Cleggtom@curii.com
<ul></ul><p>Possible approach:</p>
<p>In ContainerRequest#container_completed! in <a class="source" href="https://dev.arvados.org/projects/arvados/repository/arvados/entry/services/api/app/models/container_request.rb">source:services/api/app/models/container_request.rb</a>, create new log and output collections with the same owner_uuid as the container request itself (copying manifest_text from the container log/output).</p>
<p>In crunch-run, set expires_at=(now + permissionTTL) when saving log and output collections (otherwise, all logs and outputs will use up storage space forever, even after users delete their copies).</p> Arvados - Bug #9848: [Crunch2] [API] Ensure saved logs and outputs from containers are accessible to non-admin usershttps://dev.arvados.org/issues/9848?journal_id=425382016-08-31T19:53:53ZTom Cleggtom@curii.com
<ul><li><strong>Assigned To</strong> set to <i>Tom Clegg</i></li></ul> Arvados - Bug #9848: [Crunch2] [API] Ensure saved logs and outputs from containers are accessible to non-admin usershttps://dev.arvados.org/issues/9848?journal_id=427592016-09-06T18:25:56ZTom Morristfmorris@veritasgenetics.com
<ul><li><strong>Story points</strong> set to <i>1.0</i></li><li><strong>Target version</strong> set to <i>Arvados Future Sprints</i></li></ul> Arvados - Bug #9848: [Crunch2] [API] Ensure saved logs and outputs from containers are accessible to non-admin usershttps://dev.arvados.org/issues/9848?journal_id=427602016-09-06T18:26:25ZTom Cleggtom@curii.com
<ul></ul><p>In the container re-use code, we'll need to check whether the log is still available (as we currently do with outputs in crunch1)</p> Arvados - Bug #9848: [Crunch2] [API] Ensure saved logs and outputs from containers are accessible to non-admin usershttps://dev.arvados.org/issues/9848?journal_id=430682016-09-14T19:22:50ZTom Cleggtom@curii.com
<ul><li><strong>Subject</strong> changed from <i>[API] Ensure saved logs and outputs from containers are accessible to non-admin users</i> to <i>[Crunch2] [API] Ensure saved logs and outputs from containers are accessible to non-admin users</i></li></ul> Arvados - Bug #9848: [Crunch2] [API] Ensure saved logs and outputs from containers are accessible to non-admin usershttps://dev.arvados.org/issues/9848?journal_id=430692016-09-14T19:23:40ZTom Cleggtom@curii.com
<ul><li><strong>Target version</strong> changed from <i>Arvados Future Sprints</i> to <i>2016-09-28 sprint</i></li></ul> Arvados - Bug #9848: [Crunch2] [API] Ensure saved logs and outputs from containers are accessible to non-admin usershttps://dev.arvados.org/issues/9848?journal_id=434492016-09-26T19:35:10ZTom Cleggtom@curii.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul> Arvados - Bug #9848: [Crunch2] [API] Ensure saved logs and outputs from containers are accessible to non-admin usershttps://dev.arvados.org/issues/9848?journal_id=434682016-09-27T17:51:20ZTom Cleggtom@curii.com
<ul></ul><p>9848-copy-container-output @ <a class="changeset" title="9848: Set expiry time on container output and log collections." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/feff3b35134c95961d9846836491a3ad1600969c">feff3b3</a></p> Arvados - Bug #9848: [Crunch2] [API] Ensure saved logs and outputs from containers are accessible to non-admin usershttps://dev.arvados.org/issues/9848?journal_id=435072016-09-28T14:44:01ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><ul>
<li>The log checking code:</li>
</ul>
<pre>
readable_pdh = Collection.
readable_by(current_user).
select('portable_data_hash')
completed = completed.
where("log in (#{readable_pdh.to_sql})").
order('finished_at asc').
limit(1)
</pre>
<p>This kind of hurts my head. I get this is using a subquery, so we're assuming the postgres query planner will DTRT. However, if it does not optimize the query, it could inefficiently enumerate every collection visible to the user. Can we verify that it is going to be smart and execute the query in a way that ensures that it uses the portable_data_hash index?</p>
<ul>
<li>Right above the new code I noticed this line (from Lucas):</li>
</ul>
<pre>
completed = candidates.where(state: Complete).where(exit_code: 0)
</pre>
<p>I don't think we want to filter on exit code, since a deterministic container would always return the same exit code? Also a few tools like "grep" return non-zero codes for conditions that are not errors.</p>
<ul>
<li>It appears that when job reuse happens, the container_request state remains in Committed (which means it won't create the output/log collections). I think there needs to be an additional hook for when the container request is in Committed state which checks to see if the referenced container is in Complete/Cancelled state and calls container_completed!</li>
</ul> Arvados - Bug #9848: [Crunch2] [API] Ensure saved logs and outputs from containers are accessible to non-admin usershttps://dev.arvados.org/issues/9848?journal_id=435102016-09-28T15:54:09ZTom Cleggtom@curii.com
<ul></ul><p>Peter Amstutz wrote:</p>
<blockquote>
<ul>
<li>The log checking code:</li>
</ul>
<p>[...]</p>
<p>This kind of hurts my head. I get this is using a subquery, so we're assuming the postgres query planner will DTRT. However, if it does not optimize the query, it could inefficiently enumerate every collection visible to the user. Can we verify that it is going to be smart and execute the query in a way that ensures that it uses the portable_data_hash index?</p>
</blockquote>
<p>"Explain analyze" on a small database doesn't necessarily tell us very much about real-world performance (postgres chooses seq scan over an index when it knows the table is small) but here it is:</p>
<pre>
arvados_production=# explain analyze SELECT "containers".* FROM "containers" WHERE (log in (SELECT portable_data_hash FROM "collections" WHERE (expires_at IS NULL or expires_at > statement_timestamp()))) ORDER BY finished_at asc LIMIT 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1136.43..1136.43 rows=1 width=2299) (actual time=0.031..0.031 rows=0 loops=1)
-> Sort (cost=1136.43..1136.43 rows=1 width=2299) (actual time=0.031..0.031 rows=0 loops=1)
Sort Key: containers.finished_at
Sort Method: quicksort Memory: 25kB
-> Hash Join (cost=1072.28..1136.42 rows=1 width=2299) (actual time=0.014..0.014 rows=0 loops=1)
Hash Cond: ((collections.portable_data_hash)::text = (containers.log)::text)
-> HashAggregate (cost=1071.16..1117.80 rows=4664 width=36) (never executed)
Group Key: (collections.portable_data_hash)::text
-> Seq Scan on collections (cost=0.00..1047.53 rows=9455 width=36) (never executed)
Filter: ((expires_at IS NULL) OR (expires_at > statement_timestamp()))
-> Hash (cost=1.05..1.05 rows=5 width=2299) (actual time=0.005..0.005 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 0kB
-> Seq Scan on containers (cost=0.00..1.05 rows=5 width=2299) (actual time=0.003..0.005 rows=5 loops=1)
Planning time: 1.122 ms
Execution time: 0.149 ms
(15 rows)
</pre>
<p>The more obvious solution is to get batches of containers and do the readable_by query manually until one is readable, which seems messier and as likely to be slower as anything. I think by default we should let Postgres do this sort of thing for us, and only poke it in the eye on an as-deserved basis.</p>
<blockquote>
<ul>
<li>Right above the new code I noticed this line (from Lucas):</li>
</ul>
<p>[...]</p>
<p>I don't think we want to filter on exit code, since a deterministic container would always return the same exit code? Also a few tools like "grep" return non-zero codes for conditions that are not errors.</p>
</blockquote>
<p>It's common for a container to exit non-zero after a fatal transient error, like OOM or a network problem that prevents it from reading its input. I don't think it would make sense to reuse those outputs, or disqualify all successful containers with identical outputs just because there was also one attempt that produced different output because it failed.</p>
<blockquote>
<ul>
<li>It appears that when job reuse happens, the container_request state remains in Committed (which means it won't create the output/log collections). I think there needs to be an additional hook for when the container request is in Committed state which checks to see if the referenced container is in Complete/Cancelled state and calls container_completed!</li>
</ul>
</blockquote>
<p>Yes, presumably committed requests should get finalized immediately if the container is already finished. I guess we didn't notice this in <a class="issue tracker-6 status-3 priority-4 priority-default closed parent" title="Idea: [Crunch2] [API] Reuses Containers to satisfy ContainerRequests (Resolved)" href="https://dev.arvados.org/issues/9623">#9623</a>? We should add a test. OK to merge this in the meantime?</p> Arvados - Bug #9848: [Crunch2] [API] Ensure saved logs and outputs from containers are accessible to non-admin usershttps://dev.arvados.org/issues/9848?journal_id=435112016-09-28T16:02:43ZTom Cleggtom@curii.com
<ul></ul><p>On the topic of exit codes, surely the only sane approach is to follow the rule: exit 0 means success, exit non-zero means failure. Programs that are broken (e.g., a shell script that says "grep x y" where it should say "grep x y || true") are broken. If we start second-guessing exit codes, we'll just never know whether anything succeeded, even programs that <em>aren't</em> broken...</p> Arvados - Bug #9848: [Crunch2] [API] Ensure saved logs and outputs from containers are accessible to non-admin usershttps://dev.arvados.org/issues/9848?journal_id=435382016-09-28T18:39:45ZRadhika Chippadaradhika@curoverse.com
<ul><li><strong>Target version</strong> changed from <i>2016-09-28 sprint</i> to <i>2016-10-12 sprint</i></li></ul> Arvados - Bug #9848: [Crunch2] [API] Ensure saved logs and outputs from containers are accessible to non-admin usershttps://dev.arvados.org/issues/9848?journal_id=438452016-10-11T21:22:02ZTom Cleggtom@curii.com
<ul></ul><p>9848-finalize-on-reuse</p>
<p>test <a class="changeset" title="9848: Finalize container request immediately if resolving to an already-finished container." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/5185a24aa8d9557cf5e5d2689f599116771caae4">5185a24aa8d9557cf5e5d2689f599116771caae4</a></p>
<p><a class="external" href="https://ci.curoverse.com/job/developer-run-tests/20/">https://ci.curoverse.com/job/developer-run-tests/20/</a></p> Arvados - Bug #9848: [Crunch2] [API] Ensure saved logs and outputs from containers are accessible to non-admin usershttps://dev.arvados.org/issues/9848?journal_id=439152016-10-12T18:42:49ZTom Cleggtom@curii.com
<ul><li><strong>Target version</strong> changed from <i>2016-10-12 sprint</i> to <i>2016-10-26 sprint</i></li></ul> Arvados - Bug #9848: [Crunch2] [API] Ensure saved logs and outputs from containers are accessible to non-admin usershttps://dev.arvados.org/issues/9848?journal_id=440282016-10-13T15:13:45ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><p>9848-finalize-on-reuse LGTM.</p> Arvados - Bug #9848: [Crunch2] [API] Ensure saved logs and outputs from containers are accessible to non-admin usershttps://dev.arvados.org/issues/9848?journal_id=440352016-10-13T15:55:05ZTom Cleggtom@curii.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li></ul><p>Applied in changeset arvados|commit:06f40b3d477f3ab1c834d9671808c3da02ca7dfd.</p>