Recovering lost data » History » Version 7
Ward Vandewege, 08/03/2020 02:15 PM
h1. Recovering Lost Data

h2. Untrashing lost blocks

In some cases it is possible to recover data blocks that have been trashed by keep-balance (due to a bug like #15148, or an install/config error).

If you suspect blocks have been trashed erroneously, you should immediately do the following, so that trashed blocks are not permanently deleted while you investigate:
# On all keepstore servers: set EmptyTrashInterval to a long interval such as 2400h
# On all keepstore servers: restart keepstore
# Stop the keep-balance service

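The three emergency steps can be collected into a plan that is reviewed before anything is run. The following is a minimal sketch, not an official Arvados tool: it only prints commands. It assumes the services run under systemd units named keepstore and keep-balance and that the hosts are reachable over ssh. The function name and the host names (balance0, keep0, keep1) are placeholders invented for this example. EmptyTrashInterval still has to be edited by hand in each keepstore config before the restart.

```shell
#!/bin/bash
# Dry-run helper: prints the emergency commands instead of running them.
# Assumptions (not from the original page): systemd unit names "keepstore"
# and "keep-balance", ssh access, and the example host names.

# usage: print_freeze_plan BALANCE_HOST KEEPSTORE_HOST...
print_freeze_plan() {
  local balance_host=$1
  shift
  local h
  for h in "$@"; do
    echo "# ${h}: set EmptyTrashInterval to 2400h in the keepstore config, then:"
    echo "ssh ${h} sudo systemctl restart keepstore"
  done
  echo "ssh ${balance_host} sudo systemctl stop keep-balance"
}

# Example: print_freeze_plan balance0 keep0 keep1
```

Printing the plan rather than executing it leaves the operator in control of the order and timing, which matters when the goal is to avoid losing more data.
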
Once you believe you have corrected the underlying problem, you should:
# Set LostBlocksFile to a suitable value (perhaps "/tmp/keep-balance-lost-blocks.txt") in your keep-balance config (Arvados v1.3.3 or later)
# Start keep-balance

After keep-balance completes its first sweep, inspect /tmp/keep-balance-lost-blocks.txt. If it is not empty, you can ask all keepstores to untrash any blocks that are still recoverable with a script like this:

<pre><code class="bash">
#!/bin/bash
set -e

# see Client.AuthToken in /etc/arvados/keep-balance/keep-balance.yml
token=xxxxxxx-your-system-auth-token-xxxxxxx

# all keep server hostnames
hosts=(keep0 keep1 keep2 keep3 keep4 keep5)

# first field of each line is the block hash; the remaining fields (pdhs) are not used here
while read -r hash pdhs; do
  echo "${hash}"
  for h in "${hosts[@]}"; do
    # ask this keepstore to untrash the block; report only on success
    if curl -fgs -H "Authorization: Bearer $token" -X PUT "http://${h}:25107/untrash/$hash"; then
      echo "${hash} ok ${h}"
    fi
  done
done < /tmp/keep-balance-lost-blocks.txt
</code></pre>
||
37 | 5 | Tom Morris | |
38 | Obviously this could be improved upon with increased parallelism for large scale tasks, if needed. Any blocks which were successfully untrashed can be removed from the list of blocks and collections which need to be recovered. |
||
39 | |||
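One way to add parallelism and keep track of what was recovered is sketched below. It is an illustration under assumptions rather than a tested recovery tool: the function names, the recovered.txt file and the use of xargs -P are choices made for this example, and the port and URL layout are simply copied from the script above. It also assumes the first field of each line in the lost-blocks file is the block hash, as that script does.

```shell
#!/bin/bash
# Sketch only: function and file names are hypothetical.

# Build the untrash URL for one keepstore host and one block hash
# (same URL layout as the serial script).
untrash_url() {
  echo "http://$1:25107/untrash/$2"
}

# Ask every host to untrash one block; print the hash once if any host succeeded.
# Expects $token in the environment. usage: untrash_block HASH HOST...
untrash_block() {
  local hash=$1 h ok=0
  shift
  for h in "$@"; do
    if curl -fgs -H "Authorization: Bearer $token" -X PUT "$(untrash_url "$h" "$hash")" >/dev/null; then
      ok=1
    fi
  done
  if [ "$ok" -eq 1 ]; then echo "$hash"; fi
}

# Run untrash_block for every hash in a lost-blocks file, JOBS blocks at a time.
# usage: untrash_parallel LOST_FILE JOBS HOST...
untrash_parallel() {
  local lost=$1 jobs=$2
  shift 2
  export token
  export -f untrash_url untrash_block
  awk '{print $1}' "$lost" | xargs -P "$jobs" -I{} bash -c 'untrash_block "$@"' _ {} "$@"
}

# Print the lines of LOST_FILE whose hash is not listed in RECOVERED_FILE.
# usage: remaining_blocks LOST_FILE RECOVERED_FILE
remaining_blocks() {
  awk 'FILENAME == ARGV[1] { seen[$1] = 1; next } !($1 in seen)' "$2" "$1"
}
```

With token set, running untrash_parallel /tmp/keep-balance-lost-blocks.txt 8 keep0 keep1 > recovered.txt and then remaining_blocks /tmp/keep-balance-lost-blocks.txt recovered.txt would leave only the lines for blocks that still need to be regenerated. The job count of 8 is arbitrary and should be tuned to what the keepstore servers can handle.
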
h2. Regenerating missing blocks

For blocks that were trashed long enough ago to have been deleted, it is possible to regenerate them by rerunning the workflows that produced them. The process is:
# Delete the affected collections so that job reuse doesn't attempt to reuse them (if one block is missing, it is likely that all of them are, so the collections are unlikely to contain any useful data)
# Resubmit any container requests for which you want the output collections regenerated

There is a tool script in the Arvados repository that can generate a report to help with this task: "arvados/tools/keep-xref/keep-xref.py":https://github.com/arvados/arvados/blob/master/tools/keep-xref/keep-xref.py
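
As a lighter-weight starting point than the full report, the collections touched by lost blocks can be listed straight from the lost-blocks file. This sketch assumes each line holds a block hash followed by space-separated portable data hashes of the collections that reference it, which is what the read hash pdhs loop in the untrash script suggests. The function name is made up for this example.

```shell
#!/bin/bash
# Sketch only: affected_collections is a hypothetical helper name.
# Assumes each line is "<block hash> <pdh> [<pdh> ...]".

# Print every distinct collection PDH mentioned in the lost-blocks file, sorted.
# usage: affected_collections LOST_FILE
affected_collections() {
  awk '{ for (i = 2; i <= NF; i++) print $i }' "$1" | sort -u
}

# Example: affected_collections /tmp/keep-balance-lost-blocks.txt
```

The resulting list only identifies candidate collections; deciding which ones to delete and which container requests to resubmit remains a manual step.
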