h1. Multi-pass mode to reduce keep-balance memory footprint

Background: Currently keep-balance's RAM footprint increases with the number of stored blocks. On a large site, even the largest available machine might not have enough RAM to complete a keep-balance cycle. Scalability will be vastly improved if we can:
# run keep-balance without keeping the entire list of stored blocks in memory at once (addressed below)
# distribute the keep-balance work across multiple system hosts or compute nodes (future work)

Proposal: On large clusters, balance in N passes, where each pass considers 1/N of the possible block locators. For example, if N=16, the first pass considers blocks whose locators begin with "0", the next pass those beginning with "1", and so on. For simplicity, N must be a power of 16.

<pre><code class="yaml">
Clusters:
  xxxxx:
    Collections:
      # When rebalancing, split the stored blocks into the specified
      # number of bins and process one bin at a time. The default (1)
      # is suitable for small clusters. Larger numbers (16, 256) are
      # needed when the keep-balance host does not have enough RAM to
      # hold the entire list of block IDs.
      #
      # BalanceBins must be a power of 16.
      BalanceBins: 1
</code></pre>

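For illustration, here is a minimal Go sketch of how a @BalanceBins@ value could map onto hex locator prefixes, one pass per prefix. The function name and pass ordering are assumptions for the sketch, not part of the existing keep-balance code:

<pre><code class="go">
package main

import (
	"fmt"
	"math"
)

// balancePrefixes returns one hex locator prefix per pass. balanceBins must be
// a power of 16 (1, 16, 256, ...); with 1 bin, the single pass uses an empty
// prefix, i.e., all blocks are considered at once.
func balancePrefixes(balanceBins int) []string {
	if balanceBins == 1 {
		return []string{""}
	}
	// Hex digits per prefix: 16 bins -> 1 digit, 256 bins -> 2 digits, ...
	digits := int(math.Round(math.Log(float64(balanceBins)) / math.Log(16)))
	prefixes := make([]string, 0, balanceBins)
	for i := 0; i < balanceBins; i++ {
		prefixes = append(prefixes, fmt.Sprintf("%0*x", digits, i))
	}
	return prefixes
}

func main() {
	fmt.Println(balancePrefixes(16)) // [0 1 2 3 4 5 6 7 8 9 a b c d e f]
}
</code></pre>
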
New behaviors relating to trash/pull lists:
* When starting a new sweep, clear trash lists (no change to existing behavior)
* [keepstore] When a new trash/pull list is posted, check the @X-Keep-List-Prefix@ header, and don't clear existing entries that have a different prefix (see the sketch after this list)
* [keep-balance] When posting a new trash/pull list, set the @X-Keep-List-Prefix@ header so keepstore knows which entries to clear
* [keep-balance] Run a pass for each prefix, then merge the resulting statistics to produce a full-cluster summary

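A minimal sketch of the keepstore-side merge, assuming an illustrative @trashRequest@ type keyed by locator (the names here are placeholders, not keepstore's actual internals):

<pre><code class="go">
package keepstore

import "strings"

// trashRequest is illustrative; keepstore's real trash-list entry type may differ.
type trashRequest struct {
	Locator    string
	BlockMtime int64
}

// mergeTrashList replaces the subset of the current trash list whose locators
// match the X-Keep-List-Prefix value, and keeps entries posted by passes for
// other prefixes. An empty prefix reproduces today's behavior: the whole list
// is replaced.
func mergeTrashList(current, incoming []trashRequest, prefix string) []trashRequest {
	merged := make([]trashRequest, 0, len(current)+len(incoming))
	for _, tr := range current {
		if prefix != "" && !strings.HasPrefix(tr.Locator, prefix) {
			// Different prefix: this entry belongs to another pass; keep it.
			merged = append(merged, tr)
		}
	}
	return append(merged, incoming...)
}
</code></pre>
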
New behaviors relating to setting @replication_confirmed@:
* [rails] add collections column @replication_confirmed_partial@, default null
* [rails] reset @replication_confirmed_partial=null@ when updating a collection (just like existing behavior of @replication_confirmed@)
* [keep-balance] when starting a multi-pass sweep, clear @replication_confirmed_partial@:
** @update collections set replication_confirmed_partial=NULL@
* [keep-balance] after each pass (single prefix), set or reduce @replication_confirmed_partial@:
** @update collections set replication_confirmed_partial=least($1,coalesce(replication_confirmed_partial,$1)) where portable_data_hash=$2@
* [keep-balance] after all passes (prefixes) are done, copy @replication_confirmed_partial@ to @replication_confirmed@:
** @update collections set replication_confirmed=replication_confirmed_partial, replication_confirmed_at=$1 where replication_confirmed_partial is not NULL@

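For illustration, here is one way keep-balance could sequence these updates over a multi-pass sweep, using Go's database/sql. The function signature and the @balancePass@ callback are assumptions for the sketch, not the planned implementation:

<pre><code class="go">
package keepbalance

import (
	"context"
	"database/sql"
	"time"
)

// runSweep sketches the sequencing above: one balance pass per prefix, with
// replication_confirmed_partial acting as the running minimum across passes.
// balancePass is assumed to return, for each portable_data_hash seen in the
// pass, the lowest confirmed replication level.
func runSweep(ctx context.Context, db *sql.DB, prefixes []string,
	balancePass func(prefix string) (map[string]int, error)) error {
	// 1. Starting a multi-pass sweep: clear replication_confirmed_partial.
	if _, err := db.ExecContext(ctx,
		`update collections set replication_confirmed_partial=NULL`); err != nil {
		return err
	}
	for _, prefix := range prefixes {
		confirmed, err := balancePass(prefix)
		if err != nil {
			return err
		}
		// 2. After each pass: set or reduce replication_confirmed_partial.
		for pdh, repl := range confirmed {
			if _, err := db.ExecContext(ctx,
				`update collections set replication_confirmed_partial=least($1,coalesce(replication_confirmed_partial,$1)) where portable_data_hash=$2`,
				repl, pdh); err != nil {
				return err
			}
		}
	}
	// 3. After all passes: copy the partial values to replication_confirmed.
	_, err := db.ExecContext(ctx,
		`update collections set replication_confirmed=replication_confirmed_partial, replication_confirmed_at=$1 where replication_confirmed_partial is not NULL`,
		time.Now())
	return err
}
</code></pre>
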
Implementation tasks:
* ##22469
* ##22470
* ##22471

Concurrency:
* For now, a single keep-balance process will perform N passes serially, then merge the results.
* In the future, we should allow multiple keep-balance processes on different nodes to run passes concurrently. This will require further coordination, such that a single "coordinator" process merges the statistics produced by "worker" processes and updates @replication_confirmed@ when all workers are finished. Ideally, the workers could be dispatched automatically as containers on cloud/HPC nodes.