Multi-pass mode to reduce keep-balance memory footprint » History » Version 3

Tom Clegg, 01/16/2025 04:29 PM

h1. Multi-pass mode to reduce keep-balance memory footprint

Background: Currently, keep-balance's RAM footprint increases with the number of stored blocks. On a large site, even the largest available machine might not have enough RAM to complete a keep-balance cycle. Scalability will be vastly improved if we can

# run keep-balance without keeping the entire list of stored blocks in memory at once (addressed below)
# distribute the keep-balance work across multiple hosts or compute nodes (future work)

Proposal: On large clusters, balance in N passes, where each pass considers 1/N of the possible block locators -- for example, if N=16, the first pass considers blocks whose locators begin with "0", the next pass "1", and so on. For simplicity, N must be a power of 16.

<pre><code class="yaml">
Clusters:
  xxxxx:
    Collections:
      # When rebalancing, split the stored blocks into the specified
      # number of bins and process one bin at a time. The default (1)
      # is suitable for small clusters. Larger numbers (16, 256) are
      # needed when the keep-balance host does not have enough RAM to
      # hold the entire list of block IDs.
      #
      # BalanceBins must be a power of 16.
      BalanceBins: 1
</code></pre>
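Because block locators are hex-encoded hashes, a power-of-16 bin count maps directly onto fixed-length hex prefixes. A minimal sketch of that mapping (illustrative names, not the real keep-balance source):

```go
package main

import "fmt"

// balancePrefixes returns the hex locator prefixes covered by one full sweep.
// nBins must be a power of 16 (1, 16, 256, ...); the prefix length is
// log16(nBins). For nBins=1 the single empty prefix matches every locator.
func balancePrefixes(nBins int) []string {
	if nBins == 1 {
		return []string{""}
	}
	// Compute the prefix width: 16 bins -> 1 hex digit, 256 -> 2, etc.
	width := 0
	for n := nBins; n > 1; n /= 16 {
		width++
	}
	prefixes := make([]string, nBins)
	for i := range prefixes {
		prefixes[i] = fmt.Sprintf("%0*x", width, i)
	}
	return prefixes
}

func main() {
	fmt.Println(balancePrefixes(16))
	// [0 1 2 3 4 5 6 7 8 9 a b c d e f]
}
```

With @BalanceBins: 256@ the same scheme yields prefixes "00" through "ff", so each pass holds roughly 1/256 of the block list in memory.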
New behaviors relating to trash/pull lists:

* When starting a new sweep, clear trash lists (no change to existing behavior)
* [keepstore] When receiving a posted trash/pull list, check the @X-Keep-List-Prefix@ header, and don't clear existing entries that have a different prefix
* [keep-balance] When posting a new trash/pull list, set the @X-Keep-List-Prefix@ header, so keepstore knows which entries to clear
* [keep-balance] Run a pass for each prefix, then merge the resulting statistics to produce a full cluster summary
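The keepstore side of the prefix rule amounts to a filtered replace: keep entries belonging to other passes, swap in the new entries for this pass. A sketch under assumed (hypothetical) names — real keepstore data structures differ:

```go
package main

import (
	"fmt"
	"strings"
)

// replaceTrashListForPrefix models the proposed keepstore behavior: when a
// new trash list arrives with an X-Keep-List-Prefix header, existing entries
// whose block hash matches the prefix are replaced by the incoming list,
// while entries from other prefixes (other passes) are left intact.
func replaceTrashListForPrefix(existing, incoming []string, prefix string) []string {
	merged := make([]string, 0, len(existing)+len(incoming))
	for _, hash := range existing {
		if !strings.HasPrefix(hash, prefix) {
			merged = append(merged, hash) // belongs to another pass; keep it
		}
	}
	return append(merged, incoming...)
}

func main() {
	existing := []string{"0aaa", "1bbb", "1ccc"}
	incoming := []string{"1ddd"}
	fmt.Println(replaceTrashListForPrefix(existing, incoming, "1"))
	// [0aaa 1ddd]
}
```

An empty prefix (the single-bin case) matches every hash, which reproduces the existing clear-and-replace behavior.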

New behaviors relating to setting @replication_confirmed@:

* [rails] add collections column @replication_confirmed_partial@, default null
* [rails] reset @replication_confirmed_partial=null@ when updating a collection (just like the existing behavior of @replication_confirmed@)
* [keep-balance] when starting a multi-pass sweep, clear @replication_confirmed_partial@:
** @update collections set replication_confirmed_partial=NULL@
* [keep-balance] after each pass (single prefix), set or reduce @replication_confirmed_partial@:
** @update collections set replication_confirmed_partial=least($1,coalesce(replication_confirmed_partial,$1)) where portable_data_hash=$2@
* [keep-balance] after all passes (prefixes) are done, copy @replication_confirmed_partial@ to @replication_confirmed@:
** @update collections set replication_confirmed=replication_confirmed_partial, replication_confirmed_at=$1 where replication_confirmed_partial is not NULL@
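The per-pass update keeps the smallest replication level confirmed by any pass so far, treating NULL as "no pass has reported yet". A minimal in-memory model of that rule (hypothetical names, nil standing in for SQL NULL):

```go
package main

import "fmt"

// updatePartial mirrors the per-pass SQL update: set the partial value to the
// minimum of this pass's confirmed replication and the stored value, where a
// nil (NULL) stored value is treated as this pass's value.
func updatePartial(partial map[string]*int, pdh string, passValue int) {
	if cur := partial[pdh]; cur == nil || passValue < *cur {
		v := passValue
		partial[pdh] = &v
	}
}

func main() {
	partial := map[string]*int{}      // portable_data_hash -> partial value
	updatePartial(partial, "abc", 3)  // first pass confirms 3 replicas
	updatePartial(partial, "abc", 2)  // a later pass only confirms 2
	updatePartial(partial, "abc", 5)  // higher values never raise the minimum
	fmt.Println(*partial["abc"])
	// 2
}
```

Only after every prefix has reported is the partial value copied into @replication_confirmed@, so a half-finished sweep never overstates replication.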

Concurrency:

* For now, a single keep-balance process will perform N passes serially, then merge results.
* In future, we should allow multiple keep-balance processes on different nodes running passes concurrently. This will require further coordination such that a single "coordinator" process merges statistics produced by "worker" processes and updates @replication_confirmed@ when all workers are finished. Ideally, the workers can be automatically dispatched as containers on cloud/HPC nodes.
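The serial sweep described above reduces to a loop over prefixes that accumulates per-pass summaries into one cluster-wide total. A sketch with hypothetical types (the real statistics structs are richer):

```go
package main

import "fmt"

// PassStats is an assumed per-pass summary; each pass only examines blocks
// whose locators match one prefix, so only 1/N of the block list is in RAM.
type PassStats struct {
	Blocks, Bytes int64
}

// runSweep runs one pass per prefix serially and merges the results into a
// full-cluster summary, as proposed for the single-process case.
func runSweep(prefixes []string, runPass func(prefix string) PassStats) PassStats {
	var total PassStats
	for _, p := range prefixes {
		s := runPass(p)
		total.Blocks += s.Blocks
		total.Bytes += s.Bytes
	}
	return total
}

func main() {
	total := runSweep([]string{"0", "1"}, func(prefix string) PassStats {
		// Stand-in for a real balancing pass over one locator prefix.
		return PassStats{Blocks: 10, Bytes: 1 << 20}
	})
	fmt.Println(total.Blocks, total.Bytes)
	// 20 2097152
}
```

The future multi-node variant would replace the loop body with dispatch to worker processes and have a coordinator perform the final merge.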