https://dev.arvados.org/https://dev.arvados.org/favicon.ico?15576888422020-05-13T16:23:35ZArvadosArvados - Idea #16427: "undelete" command to recover trashed blocks and restore a deleted collectionhttps://dev.arvados.org/issues/16427?journal_id=840772020-05-13T16:23:35ZTom Cleggtom@curii.com
<ul><li><strong>Story points</strong> set to <i>3.0</i></li></ul> Arvados - Idea #16427: "undelete" command to recover trashed blocks and restore a deleted collectionhttps://dev.arvados.org/issues/16427?journal_id=840782020-05-13T16:24:32ZPeter Amstutzpeter.amstutz@curii.com
<ul><li><strong>Target version</strong> set to <i>2020-06-03 Sprint</i></li></ul> Arvados - Idea #16427: "undelete" command to recover trashed blocks and restore a deleted collectionhttps://dev.arvados.org/issues/16427?journal_id=841212020-05-14T20:41:15ZTom Cleggtom@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/84121/diff?detail_id=80900">diff</a>)</li></ul> Arvados - Idea #16427: "undelete" command to recover trashed blocks and restore a deleted collectionhttps://dev.arvados.org/issues/16427?journal_id=841322020-05-15T18:32:32ZTom Cleggtom@curii.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul><p>work in progress at 16427-undelete</p> Arvados - Idea #16427: "undelete" command to recover trashed blocks and restore a deleted collectionhttps://dev.arvados.org/issues/16427?journal_id=843132020-05-20T15:48:12ZPeter Amstutzpeter.amstutz@curii.com
<ul><li><strong>Assigned To</strong> set to <i>Tom Clegg</i></li></ul> Arvados - Idea #16427: "undelete" command to recover trashed blocks and restore a deleted collectionhttps://dev.arvados.org/issues/16427?journal_id=843772020-05-22T21:30:23ZWard Vandewegeward@curii.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-3 status-3 priority-4 priority-default closed parent" href="/issues/16421">Support #16421</a>: [doc] document deletion lifecycle of collections, and steps to undelete collections</i> added</li></ul> Arvados - Idea #16427: "undelete" command to recover trashed blocks and restore a deleted collectionhttps://dev.arvados.org/issues/16427?journal_id=844372020-05-29T20:40:24ZTom Cleggtom@curii.com
<ul></ul><p>16427-undelete @ <a class="changeset" title="16427: Fix test order dependency. Test was incorrectly assuming keep2, keep3 data dirs existed. ..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/167e2537365ec68fe6be6a330a2eb698f177aa05">167e2537365ec68fe6be6a330a2eb698f177aa05</a> -- <a class="external" href="https://ci.arvados.org/view/Developer/job/developer-run-tests/1879/"<a href="https://ci.arvados.org/view/Developer/job/developer-run-tests/1879/">developer-run-tests: #1879 <img src="https://ci.arvados.org/buildStatus/icon?job=developer-run-tests&build=1879" alt="" /></a></a></p>
<p>TODO: accept UUID of collection-delete/update log entry (currently must provide file containing manifest text)<br />TODO: doc page</p> Arvados - Idea #16427: "undelete" command to recover trashed blocks and restore a deleted collectionhttps://dev.arvados.org/issues/16427?journal_id=844792020-06-03T14:02:29ZWard Vandewegeward@curii.com
<ul></ul><p>Review comments:</p>
<p>lib/undelete/cmd.go</p>
<p>+ func (und undeleter) newestMtime(logger logrus.FieldLogger, blk string, svc arvados.KeepService) (time.Time, bool) {</p>
<p>It would be nice to follow the Go error pattern instead, and make the second arg an 'error'. Presumably, block not on svc is an error too!</p>
<p>+func (und undeleter) ensureSafe(ctx context.Context, logger logrus.FieldLogger, blk string, svc arvados.KeepService, blobsigttl time.Duration, blobsigexp time.Time) bool {</p>
<p>Same - why not return an error instead of a bool?</p>
<p>+func (und undeleter) RecoverManifest(mtxt string) (string, error) {</p>
<p>Is there a particular reason why half a BlobSigningTTL is a good choice for the save deadline? Can the reason go into a comment?</p>
<p>+ for i := 0; i < 2*len(services); i++ {</p>
<p>It would be great to make it a bit more clear that these are worker threads, and that the number of threads is being sized by the number of keepstore nodes (rather than the cores on the client, or something). Maybe something like</p>
<pre><code>"workerThreads := 2 * len(services)"</code></pre>
<p>+ if havenot > 0 {<br />+ if have > 0 {<br />+ und.logger.Warn("partial recovery is not implemented")<br />+ }<br />+ return "", fmt.Errorf("unable to recover %d of %d blocks", havenot, have+havenot)<br />+ }</p>
<p>Partial unrecovery will be nice. Just listing the affected blocks, could also be nice. Blocks + affected filenames even better. That should probably be part of the partial recovery feature.</p>
<p>lib/undelete/cmd_test.go</p>
<p>+func (*Suite) TestUntrashAndTouchBlock(c *check.C) {</p>
<p>+ if fi, err := os.Stat(datadir); err != nil || !fi.IsDir() {<br />+ c.Logf("skipping datadir %q, evidently neither keepstore is using it", datadir)<br />+ continue<br />+ }</p>
<p>This seems like it could generate unnecessary noise in the tests, no? Can we make this debug-only, or even just skip the logging altogether? It doesn't seem super useful.</p>
<p>Functionally: running arvados-server with the `undelete` command prints this, which seems unhelpful:</p>
<p>$ ./arvados-server undelete<br />Usage: ./arvados-server undelete [options] uuid_or_file ...<br /> -config file<br /> Site configuration file (default may be overridden by setting an ARVADOS_CONFIG environment variable) (default "/etc/arvados/config.yml")<br /> -legacy-crunch-dispatch-slurm-config file<br /> Legacy crunch-dispatch-slurm configuration file (default "/etc/arvados/crunch-dispatch-slurm/crunch-dispatch-slurm.yml")<br /> -legacy-git-httpd-config file<br /> Legacy arv-git-httpd configuration file (default "/etc/arvados/git-httpd/git-httpd.yml")<br /> -legacy-keepbalance-config file<br /> Legacy keep-balance configuration file (default "/etc/arvados/keep-balance/keep-balance.yml")<br /> -legacy-keepproxy-config file<br /> Legacy keepproxy configuration file (default "/etc/arvados/keepproxy/keepproxy.yml")<br /> -legacy-keepstore-config file<br /> Legacy keepstore configuration file (default "/etc/arvados/keepstore/keepstore.yml")<br /> -legacy-keepweb-config file<br /> Legacy keep-web configuration file (default "/etc/arvados/keep-web/keep-web.yml")<br /> -legacy-ws-config file<br /> Legacy arvados-ws configuration file (default "/etc/arvados/ws/ws.yml")<br /> -log-level string<br /> logging level (debug, info, ...) (default "info")<br /> -skip-legacy<br /> Don't load legacy config files<br />INFO[2020-06-03T09:59:49.409731940-04:00] exiting</p>
<p>Can we have some help text?</p>
<p>Otherwise, LGTM so far!</p> Arvados - Idea #16427: "undelete" command to recover trashed blocks and restore a deleted collectionhttps://dev.arvados.org/issues/16427?journal_id=845022020-06-03T15:48:06ZTom Cleggtom@curii.com
<ul><li><strong>Target version</strong> changed from <i>2020-06-03 Sprint</i> to <i>2020-06-17 Sprint</i></li></ul> Arvados - Idea #16427: "undelete" command to recover trashed blocks and restore a deleted collectionhttps://dev.arvados.org/issues/16427?journal_id=845652020-06-04T20:54:11ZTom Cleggtom@curii.com
<ul></ul><p>Ward Vandewege wrote:</p>
<blockquote>
<p>It would be nice to follow the Go error pattern instead, and make the second arg an 'error'. Presumably, block not on svc is an error too!</p>
</blockquote>
<p>Not exactly (we might go on to find it on a different server), but yes, I've updated both funcs to return error instead of an ok bool.</p>
<blockquote>
<p>Is there a particular reason why half a BlobSigningTTL is a good choice for the save deadline? Can the reason go into a comment?</p>
</blockquote>
<p>Yes, added comment.</p>
<blockquote>
<p>+ for i := 0; i < 2*len(services); i++ {</p>
</blockquote>
<p>Added explanatory comment for this too.</p>
<blockquote>
<p>Partial unrecovery will be nice. Just listing the affected blocks, could also be nice. Blocks + affected filenames even better. That should probably be part of the partial recovery feature.</p>
</blockquote>
<p>Yeah, I think the first "partial recovery" feature would remove any files that are affected by the missing blocks, and save the files that are still intact, and exit non-zero.</p>
<p>Manifests don't support sparse files, so our only options for saving partial files seem to be zero-padding and truncating, so if you start with "abcdefghi" and "def" goes missing, you can recover either "abcghi" or "abc\0\0\0ghi". (I don't think we're going to implement either right away though.)</p>
<blockquote>
<p>This seems like it could generate unnecessary noise in the tests, no? Can we make this debug-only, or even just skip the logging altogether? It doesn't seem super useful.</p>
</blockquote>
<p>Yes, it only shows up when the test fails or you run -check.vv, but I changed it to print the datadirs that <em>are</em> used instead of the ones that aren't, which is pretty much the same information but less mysterious.</p>
<blockquote>
<p>Functionally: running arvados-server with the `undelete` command prints this, which seems unhelpful:</p>
</blockquote>
<p>Removed the legacy config flags, added more explanation, and made the "Usage: ..." part appear in the "bad args" case as well as the "no positional args" case.</p>
<pre>
Usage:
./tmp/GOPATH/bin/arvados-server undelete [options ...] /path/to/manifest.txt [...]
This program recovers deleted collections. Recovery is
possible when the collection's manifest is still available and
all of its data blocks are still available or recoverable
(e.g., garbage collection is not enabled, the blocks are too
new for garbage collection, the blocks are referenced by other
collections, or the blocks have been trashed but not yet
deleted).
For each provided collection manifest, once all data blocks
are recovered/protected from garbage collection, a new
collection is saved and its UUID is printed on stdout.
Exit status will be zero if recovery is successful, i.e., a
collection is saved for each provided manifest.
Options:
-config file
Site configuration file (default may be overridden by setting an ARVADOS_CONFIG environment variable) (default "/etc/arvados/config.yml")
-log-level string
logging level (debug, info, ...) (default "info")
</pre>
<p>16427-undelete @ <a class="changeset" title="16427: Don't print legacy config path flags if they won't be used. Arvados-DCO-1.1-Signed-off-by..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/f525d01de971b8b06dabc827bc0bbf46ee7ce9cc">f525d01de971b8b06dabc827bc0bbf46ee7ce9cc</a> -- <a class="external" href="https://ci.arvados.org/view/Developer/job/developer-run-tests/1892/"<a href="https://ci.arvados.org/view/Developer/job/developer-run-tests/1892/">developer-run-tests: #1892 <img src="https://ci.arvados.org/buildStatus/icon?job=developer-run-tests&build=1892" alt="" /></a></a></p>
<p>and with master merged:</p>
<p>16427-undelete @ <a class="changeset" title="16427: Merge branch 'master' Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>" href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/2be95ce7f069d9eb131b0d2d922a5e556f75810c">2be95ce7f069d9eb131b0d2d922a5e556f75810c</a> -- <a class="external" href="https://ci.arvados.org/view/Developer/job/developer-run-tests/1893/"<a href="https://ci.arvados.org/view/Developer/job/developer-run-tests/1893/">developer-run-tests: #1893 <img src="https://ci.arvados.org/buildStatus/icon?job=developer-run-tests&build=1893" alt="" /></a></a></p> Arvados - Idea #16427: "undelete" command to recover trashed blocks and restore a deleted collectionhttps://dev.arvados.org/issues/16427?journal_id=845662020-06-05T15:11:16ZWard Vandewegeward@curii.com
<ul></ul><p>Tom Clegg wrote:</p>
<blockquote>
<p>Ward Vandewege wrote:</p>
<blockquote>
<p>It would be nice to follow the Go error pattern instead, and make the second arg an 'error'. Presumably, block not on svc is an error too!</p>
</blockquote>
<p>Not exactly (we might go on to find it on a different server), but yes, I've updated both funcs to return error instead of an ok bool.</p>
</blockquote>
<p>Cool. Looks like the comments for both functions still needs updating.</p>
<blockquote><blockquote>
<p>Partial unrecovery will be nice. Just listing the affected blocks, could also be nice. Blocks + affected filenames even better. That should probably be part of the partial recovery feature.</p>
</blockquote>
<p>Yeah, I think the first "partial recovery" feature would remove any files that are affected by the missing blocks, and save the files that are still intact, and exit non-zero.</p>
</blockquote>
<p>Yes. That would be very useful already.</p>
<blockquote>
<p>Manifests don't support sparse files, so our only options for saving partial files seem to be zero-padding and truncating, so if you start with "abcdefghi" and "def" goes missing, you can recover either "abcghi" or "abc\0\0\0ghi". (I don't think we're going to implement either right away though.)</p>
</blockquote>
<p>I would probably prefer the zero padding approach (because it shows exactly what sections are missing). I agree that this could be a follow on story.</p>
<blockquote><blockquote>
<p>This seems like it could generate unnecessary noise in the tests, no? Can we make this debug-only, or even just skip the logging altogether? It doesn't seem super useful.</p>
</blockquote>
<p>Yes, it only shows up when the test fails or you run -check.vv, but I changed it to print the datadirs that <em>are</em> used instead of the ones that aren't, which is pretty much the same information but less mysterious.</p>
</blockquote>
<p>Perfect, thanks.</p>
<blockquote><blockquote>
<p>Functionally: running arvados-server with the `undelete` command prints this, which seems unhelpful:</p>
</blockquote>
<p>Removed the legacy config flags, added more explanation, and made the "Usage: ..." part appear in the "bad args" case as well as the "no positional args" case.</p>
</blockquote>
<p>Excellent thank you.</p>
<p>LGTM with those function comments updated. Thanks!</p> Arvados - Idea #16427: "undelete" command to recover trashed blocks and restore a deleted collectionhttps://dev.arvados.org/issues/16427?journal_id=845712020-06-05T21:20:45ZTom Cleggtom@curii.com
<ul><li><strong>File</strong> <a href="/attachments/2554">16427-doc.png</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/2554/16427-doc.png">16427-doc.png</a> added</li></ul><p><img src="https://dev.arvados.org/attachments/download/2554/16427-doc.png" alt="" /></p>
<p>This branch also adds the "get old manifest from log entry UUID" feature.</p>
<p>16427-undelete @ <a class="changeset" title="16427: Add "undeleting collections" doc page in admin section. Arvados-DCO-1.1-Signed-off-by: To..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/570c793f1aa43fd9763a0368554dd395dacdd238">570c793f1aa43fd9763a0368554dd395dacdd238</a> -- <a class="external" href="https://ci.arvados.org/view/Developer/job/developer-run-tests/1897/"<a href="https://ci.arvados.org/view/Developer/job/developer-run-tests/1897/">developer-run-tests: #1897 <img src="https://ci.arvados.org/buildStatus/icon?job=developer-run-tests&build=1897" alt="" /></a></a></p>
<p>Should we also add "provide collection UUID -> get old version from most recent log entry concerning that collection" ...? That would probably be the most convenient way to do the most obvious case, i.e., undo the last change, whether or not it was a "delete".</p> Arvados - Idea #16427: "undelete" command to recover trashed blocks and restore a deleted collectionhttps://dev.arvados.org/issues/16427?journal_id=845752020-06-08T13:24:47ZWard Vandewegeward@curii.com
<ul></ul><p>Tom Clegg wrote:</p>
<blockquote>
<p><img src="https://dev.arvados.org/attachments/download/2554/16427-doc.png" alt="" /></p>
</blockquote>
<p>Nice! I like the documentation.</p>
<blockquote>
<p>This branch also adds the "get old manifest from log entry UUID" feature.</p>
<p>16427-undelete @ <a class="changeset" title="16427: Add "undeleting collections" doc page in admin section. Arvados-DCO-1.1-Signed-off-by: To..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/570c793f1aa43fd9763a0368554dd395dacdd238">570c793f1aa43fd9763a0368554dd395dacdd238</a> -- <a class="external" href="https://ci.arvados.org/view/Developer/job/developer-run-tests/1897/"<a href="https://ci.arvados.org/view/Developer/job/developer-run-tests/1897/">developer-run-tests: #1897 <img src="https://ci.arvados.org/buildStatus/icon?job=developer-run-tests&build=1897" alt="" /></a></a></p>
</blockquote>
<p>Yeah that LGTM. The test failure looks unrelated, do you agree?</p>
<blockquote>
<p>Should we also add "provide collection UUID -> get old version from most recent log entry concerning that collection" ...? That would probably be the most convenient way to do the most obvious case, i.e., undo the last change, whether or not it was a "delete".</p>
</blockquote>
<p>Yes, that would be nice. Should we call that 'undo' instead of 'undelete' ?</p> Arvados - Idea #16427: "undelete" command to recover trashed blocks and restore a deleted collectionhttps://dev.arvados.org/issues/16427?journal_id=845762020-06-08T13:38:00ZTom Cleggtom@curii.com
<ul></ul><blockquote><blockquote>
<p>Should we also add "provide collection UUID -> get old version from most recent log entry concerning that collection" ...? That would probably be the most convenient way to do the most obvious case, i.e., undo the last change, whether or not it was a "delete".</p>
</blockquote>
<p>Yes, that would be nice. Should we call that 'undo' instead of 'undelete' ?</p>
</blockquote>
<p>I think "undo" isn't quite right because we're (intentionally) saving the recovered data elsewhere, rather than restoring the same collection UUID back to an earlier state. Same reasoning applies to calling the current functionality "undelete" for that matter -- it doesn't actually make the collection UUID come back, there's no attempt to restore name/metadata, etc. Would "recover-collection-data" be better, and make sense whether passing a log uuid, collection uuid, or manifest saved in a file? It's a bit unwieldy/verbose but I'm thinking that's ok for something only used in rare/unusual cases.</p> Arvados - Idea #16427: "undelete" command to recover trashed blocks and restore a deleted collectionhttps://dev.arvados.org/issues/16427?journal_id=845822020-06-08T14:49:27ZWard Vandewegeward@curii.com
<ul></ul><p>Tom Clegg wrote:</p>
<blockquote><blockquote><blockquote>
<p>Should we also add "provide collection UUID -> get old version from most recent log entry concerning that collection" ...? That would probably be the most convenient way to do the most obvious case, i.e., undo the last change, whether or not it was a "delete".</p>
</blockquote>
<p>Yes, that would be nice. Should we call that 'undo' instead of 'undelete' ?</p>
</blockquote>
<p>I think "undo" isn't quite right because we're (intentionally) saving the recovered data elsewhere, rather than restoring the same collection UUID back to an earlier state. Same reasoning applies to calling the current functionality "undelete" for that matter -- it doesn't actually make the collection UUID come back, there's no attempt to restore name/metadata, etc. Would "recover-collection-data" be better, and make sense whether passing a log uuid, collection uuid, or manifest saved in a file? It's a bit unwieldy/verbose but I'm thinking that's ok for something only used in rare/unusual cases.</p>
</blockquote>
<p>Yes, good call. You could shorten it to recover-collection if you want.</p> Arvados - Idea #16427: "undelete" command to recover trashed blocks and restore a deleted collectionhttps://dev.arvados.org/issues/16427?journal_id=846582020-06-10T15:32:37ZPeter Amstutzpeter.amstutz@curii.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-6 status-1 priority-4 priority-default" href="/issues/16514">Idea #16514</a>: Actionable insight into keep usage</i> added</li></ul> Arvados - Idea #16427: "undelete" command to recover trashed blocks and restore a deleted collectionhttps://dev.arvados.org/issues/16427?journal_id=847342020-06-11T20:34:14ZTom Cleggtom@curii.com
<ul></ul>16427-undelete @ <a class="changeset" title="16427: Merge branch 'master' Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>" href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/e48478841828b1dbab8b69eb9453db23b42ed63f">e48478841828b1dbab8b69eb9453db23b42ed63f</a> -- <a class="external" href="https://ci.arvados.org/view/Developer/job/developer-run-tests/1913/"<a href="https://ci.arvados.org/view/Developer/job/developer-run-tests/1913/">developer-run-tests: #1913 <img src="https://ci.arvados.org/buildStatus/icon?job=developer-run-tests&build=1913" alt="" /></a></a>
<ul>
<li>rename subcommand to "recover-collection" </li>
<li>accept collection UUID (equivalent to passing UUID of most recent log entry with matching object_uuid)</li>
</ul> Arvados - Idea #16427: "undelete" command to recover trashed blocks and restore a deleted collectionhttps://dev.arvados.org/issues/16427?journal_id=847472020-06-12T20:08:03ZWard Vandewegeward@curii.com
<ul></ul><p>Tom Clegg wrote:</p>
<blockquote>
16427-undelete @ <a class="changeset" title="16427: Merge branch 'master' Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>" href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/e48478841828b1dbab8b69eb9453db23b42ed63f">e48478841828b1dbab8b69eb9453db23b42ed63f</a> -- <a class="external" href="https://ci.arvados.org/view/Developer/job/developer-run-tests/1913/"<a href="https://ci.arvados.org/view/Developer/job/developer-run-tests/1913/">developer-run-tests: #1913 <img src="https://ci.arvados.org/buildStatus/icon?job=developer-run-tests&build=1913" alt="" /></a></a>
<ul>
<li>rename subcommand to "recover-collection" </li>
<li>accept collection UUID (equivalent to passing UUID of most recent log entry with matching object_uuid)</li>
</ul>
</blockquote>
<p>LGTM thanks!</p> Arvados - Idea #16427: "undelete" command to recover trashed blocks and restore a deleted collectionhttps://dev.arvados.org/issues/16427?journal_id=847522020-06-12T20:50:43ZAnonymous
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li></ul><p>Applied in changeset <a class="changeset" title="Merge branch '16427-undelete' closes #16427 Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomcl..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/9447d2f3bd4c15a4b3bda16dc3ddef0faeb68771">arvados|9447d2f3bd4c15a4b3bda16dc3ddef0faeb68771</a>.</p> Arvados - Idea #16427: "undelete" command to recover trashed blocks and restore a deleted collectionhttps://dev.arvados.org/issues/16427?journal_id=876292020-10-07T02:11:34ZPeter Amstutzpeter.amstutz@curii.com
<ul><li><strong>Release</strong> set to <i>25</i></li></ul>