Bug #7167

closed

[Deployment] Write an efficient Keep migration script

Added by Brett Smith over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Assigned To: Radhika Chippada
Category: Deployment
Target version:
Story points: 5.0

Description

This is a script aimed at system administrators who are migrating a cluster from one installation to another. It copies Keep data from the old cluster to the new one in a way that is efficient both for the migration itself and for later access to data on the destination cluster (in other words, each block ends up on the services that come first in that block's rendezvous hash order).
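Rendezvous hashing gives every block its own ordering of Keep services, and clients probe services in that order. A block written to the first N services of its ordering is therefore found on the first probe. The sketch below illustrates the idea only. The weight function (hex MD5 of the block hash concatenated with a service ID, highest first) and the service IDs are assumptions for illustration. The Arvados SDKs define their own ordering, and keep-rsync is meant to use the SDK's implementation rather than reimplement it.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"sort"
)

// probeOrder ranks services for one block by rendezvous hashing: each
// service gets a weight derived from (blockHash, serviceID), and services
// are sorted by weight, highest first.
// ASSUMPTION: the weight function here is illustrative, not the SDK's.
func probeOrder(blockHash string, serviceIDs []string) []string {
	type weighted struct {
		id     string
		weight string
	}
	ws := make([]weighted, 0, len(serviceIDs))
	for _, id := range serviceIDs {
		sum := md5.Sum([]byte(blockHash + id))
		ws = append(ws, weighted{id: id, weight: hex.EncodeToString(sum[:])})
	}
	// Equal-length hex strings compare the same way as the numbers they encode.
	sort.Slice(ws, func(i, j int) bool { return ws[i].weight > ws[j].weight })
	order := make([]string, len(ws))
	for i, w := range ws {
		order[i] = w.id
	}
	return order
}

func main() {
	// Hypothetical service IDs; with replication 2 the block is written
	// to the first two services in its probe order.
	services := []string{"keep0", "keep1", "keep2", "keep3"}
	order := probeOrder("acbd18db4cc2f85cedef654fccc4a4d8", services)
	replication := 2
	fmt.Println("write to:", order[:replication])
	fmt.Println("probe order:", order)
}
```

The ordering depends only on the block hash and the set of services, not on the order in which services are listed. Every client therefore agrees on where a block "should" live without any shared lookup table.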

Functional requirements

  • The script dynamically finds all blocks available on the source cluster. This can only be done by getting the "index" from each keepstore on the source side.
  • Get each block from the source cluster exactly once, and write it to the destination cluster, using standard Keep APIs and algorithms (e.g., rendezvous hashing, checksum validation). This can be done with the existing Keep SDKs.
  • Include a checkpointing mechanism so that if the process is interrupted, it has a record of which blocks have already been copied and doesn't re-send them. In the implementation below, the Keep block index on the destination side serves as the checkpoint.
  • When writing a block on the destination side, use the destination cluster's default replication level, as given in the discovery document.
  • The "destination cluster" may just be a series of Keepstores that are being prepped to replace an existing cluster. It must be possible for the administrator to copy data to those Keepstores without an API server in front of them.
Possible future work (specifically excluded from the requirements here):
  • Determine the desired replication level for each block by reading all collection records from the source cluster, and write to the destination cluster based on that information. (Until then, keep-rsync will use the destination cluster's default replication level, leaving further adjustments to the destination cluster's Data Manager after the database has been migrated.)
  • Verify integrity of blocks that (according to the checkpoint/index data on the destination side) already exist on the destination side. For now, we assume that some other mechanism is responsible for ensuring corrupt blocks aren't listed in keepstore index responses.

Implementation

keep-rsync will be written in Go. Source code will live in source:services/keep-rsync. Debian/RedHat packages, and the binaries they install, will be called keep-rsync.
  • Accepts -src and -dst arguments and reads settings/conf files just like arv-copy.
    • Reads ARVADOS_BLOB_SIGNING_KEY from the settings files in addition to the usual *_HOST, *_HOST_INSECURE, and *_TOKEN entries. The ARVADOS_API_TOKEN entry in each settings file must be the "data manager token" recognized by the relevant Keep servers.
  • Accepts optional -dst-keep-services-json (and -src-keep-services-json for good measure) arguments, giving files whose contents look just like the output of "arv --json keep_services accessible". This will allow the user to control the dst/src Keep services in situations where the relevant API service isn't working/reachable/configured. If these arguments are not given, keepclient discovers Keep services as usual.
  • Accepts a -replication argument (defaults to the replication level advertised in the destination cluster's discovery document).
  • Accepts a -prefix argument that passes through to index requests on both sides. This makes it possible to divide the work into (e.g.) 16 asynchronous jobs, one for each hex digit.
  • Gets indexes from the source and destination keepstores.
  • Gets data from source keepstores/keepproxy, stores in destination using configured replication level.
  • Uses regular SDK functions to get and put blocks.
  • Displays progress.
    • "getting indexes: 10... 9... [...]" (counting down the number of indexes still to fetch)
    • "copying data block 1 of 1234 (0% done, ETA 2m3s): acbd18db4cc2f85cedef654fccc4a4d8+3"
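The -prefix argument works because block locators begin with a hex MD5 hash. The 16 one-digit prefixes therefore partition the block space, and every block belongs to exactly one job. In keep-rsync the prefix is passed through to the index requests on both sides. The client-side filter below, with hypothetical helper names, only illustrates which blocks each of the 16 jobs would own.

```go
package main

import (
	"fmt"
	"strings"
)

// hexPrefixes returns "0".."f": one job per leading hex digit.
func hexPrefixes() []string {
	const digits = "0123456789abcdef"
	prefixes := make([]string, 0, len(digits))
	for _, d := range digits {
		prefixes = append(prefixes, string(d))
	}
	return prefixes
}

// filterByPrefix keeps locators whose hash starts with prefix, mimicking
// what a prefix-restricted index request would return (illustration only).
func filterByPrefix(locators []string, prefix string) []string {
	var out []string
	for _, loc := range locators {
		if strings.HasPrefix(loc, prefix) {
			out = append(out, loc)
		}
	}
	return out
}

func main() {
	locators := []string{
		"acbd18db4cc2f85cedef654fccc4a4d8+3",
		"37b51d194a7513e45b56f6524f2d51f2+3",
	}
	for _, p := range hexPrefixes() {
		if mine := filterByPrefix(locators, p); len(mine) > 0 {
			fmt.Printf("job for prefix %q copies %v\n", p, mine)
		}
	}
}
```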

Usage example

How to use in a migration:
  • Turn off Data Manager on the destination cluster.
  • Run keep-rsync.
  • Disable access to source cluster.
  • Dump database and restore to destination cluster.
  • Run keep-rsync again.

Subtasks: 3 (0 open, 3 closed)

  • Task #7494: Review 7167-blob-sign-sdk (Resolved; Tom Clegg; 09/30/2015)
  • Task #7526: review 7167-propagate-error (Resolved; Radhika Chippada; 09/30/2015)
  • Task #7414: Review branch 7167-keep-rsync (contains all the work from the test setup branch as well) (Resolved; Tom Clegg; 09/30/2015)

Related issues

  • Related to Arvados - Feature #7159: [Keep] Implement an Azure blob storage volume in keepstore (Resolved; Tom Clegg; 08/28/2015)
  • Related to Arvados - Feature #7240: [Keep] keep-rsync should support "verify-existing" flag (Closed; 09/08/2015)
  • Blocked by Arvados - Feature #7200: [Keep] keepproxy supports "index" API (Resolved; Radhika Chippada; 09/28/2015)