Bug #7167

[Deployment] Write an efficient Keep migration script

Added by Brett Smith over 5 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Radhika Chippada
Category:
Deployment
Target version:
Start date:
09/30/2015
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
5.0

Description

This is a script aimed at system administrators who are migrating a cluster from one installation to another. It copies Keep data from the old to the new cluster, in a way that's efficient both for the migration itself, and for accessing data on the destination cluster (in other words, blocks live on services early in their rendezvous hash order).

Functional requirements

  • The script dynamically finds all blocks available on the source cluster. This can only be done by getting the "index" from each keepstore on the source side.
  • Get each block from the source cluster exactly once, and write it to the destination cluster, using standard Keep APIs and algorithms (e.g., rendezvous hashing, checksum validation). This can be done with the existing Keep SDKs.
  • Include a checkpointing mechanism so that if the process is interrupted, it has a record of what blocks have already been copied and doesn't re-send them. In the implementation below, the keep block index on the destination side serves as the checkpoint mechanism.
  • When writing a block on the destination side, use the destination cluster's default replication level, as given in the discovery document.
  • The "destination cluster" may just be a series of Keepstores that are being prepped to replace an existing cluster. It must be possible for the administrator to get data copied to that destination without an API server in front of them.
Possible future work (specifically excluded from the requirements here):
  • Determine the desired replication level for each block by reading all collection records from the source cluster, and write to the destination cluster based on that information. (Until then, keep-rsync will use the destination cluster's default replication level, leaving further adjustments to the destination cluster's Data Manager after the database has been migrated.)
  • Verify integrity of blocks that (according to the checkpoint/index data on the destination side) already exist on the destination side. For now, we assume that some other mechanism is responsible for ensuring corrupt blocks aren't listed in keepstore index responses.
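The checkpoint mechanism described above amounts to a set difference: a minimal sketch (not the actual keep-rsync code; the function name and short locators are illustrative) of deciding which blocks still need copying, given the two indexes as lists of locators:

```go
package main

import "fmt"

// blocksToCopy returns the locators present in the source index but missing
// from the destination index. The destination index is the checkpoint, so an
// interrupted run simply skips everything that already landed on dst.
func blocksToCopy(srcIndex, dstIndex []string) []string {
	have := make(map[string]bool, len(dstIndex))
	for _, loc := range dstIndex {
		have[loc] = true
	}
	var todo []string
	seen := make(map[string]bool)
	for _, loc := range srcIndex {
		if !have[loc] && !seen[loc] {
			seen[loc] = true // each block is fetched exactly once
			todo = append(todo, loc)
		}
	}
	return todo
}

func main() {
	src := []string{"aaa", "bbb", "ccc", "aaa"} // "aaa" indexed by two keepstores
	dst := []string{"bbb"}                      // already copied before an interruption
	fmt.Println(blocksToCopy(src, dst))         // prints: [aaa ccc]
}
```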

Implementation

keep-rsync will be written in Go. Source code will live in source:services/keep-rsync. Debian/RedHat packages, and the binaries they install, will be called keep-rsync.
  • Accepts -src and -dst arguments and reads settings/conf files just like arv-copy.
    • Reads ARVADOS_BLOB_SIGNING_KEY from the settings files in addition to the usual *_HOST, *_HOST_INSECURE, and *_TOKEN entries. The ARVADOS_API_TOKEN entry in each settings file must be the "data manager token" recognized by the relevant Keep servers.
  • Accepts optional -dst-keep-services-json (and -src-keep-services-json for good measure) arguments, giving files whose contents look just like the output of "arv --json keep_services accessible". This will allow the user to control the dst/src Keep services in situations where the relevant API service isn't working/reachable/configured. If not given, let keepclient discover keep services as usual.
  • Accepts a -replication argument (defaulting to whatever is advertised in the destination's discovery doc).
  • Accepts a -prefix argument that passes through to index requests on both sides. This makes it possible to divide the work into (e.g.) 16 asynchronous jobs, one for each hex digit.
  • Gets indexes from the source and destination keepstores.
  • Gets data from source keepstores/keepproxy, stores in destination using configured replication level.
  • Uses regular SDK functions to get and put blocks.
  • Displays progress.
    • "getting indexes: 10... 9... [...]" (count down number of indexes todo)
    • "copying data block 1 of 1234 (0% done, ETA 2m3s): acbd18db4cc2f85cedef654fccc4a4d8+3"
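For example, the -prefix flag above could be used to fan the copy out across 16 jobs, one per leading hex digit of the block hash. A dry-run sketch (the "src"/"dst" config names are placeholders):

```shell
# Dry run: print the 16 keep-rsync commands, one per leading hex digit.
# Drop the echo (and append '&' plus a final 'wait') to run them concurrently.
for prefix in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
  echo keep-rsync -src src -dst dst -prefix "$prefix"
done
```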

Usage example

How to use in a migration:
  • Turn off data manager on destination cluster.
  • Run keep-rsync.
  • Disable access to source cluster.
  • Dump database and restore to destination cluster.
  • Run keep-rsync again.

Subtasks

Task #7494: Review 7167-blob-sign-sdk (Resolved, Tom Clegg)

Task #7526: Review 7167-propagate-error (Resolved, Radhika Chippada)

Task #7414: Review branch 7167-keep-rsync (contains all the work from the test setup branch as well) (Resolved, Tom Clegg)


Related issues

Related to Arvados - Feature #7159: [Keep] Implement an Azure blob storage volume in keepstore (Resolved, 08/28/2015)

Related to Arvados - Feature #7240: [Keep] keep-rsync should support "verify-existing" flag (Closed, 09/08/2015)

Blocked by Arvados - Feature #7200: [Keep] keepproxy supports "index" API (Resolved, 09/28/2015)

Associated revisions

Revision e9a7e33f
Added by Tom Clegg about 5 years ago

Merge branch '7167-blob-sign-sdk' refs #7167

Revision 7072110d
Added by Tom Clegg about 5 years ago

Merge branch '7167-propagate-error' refs #7167

Revision e88a0a3f
Added by Radhika Chippada about 5 years ago

closes #7167
Merge branch '7167-keep-rsync'

Revision b67e5990
Added by Radhika Chippada about 5 years ago

refs #7167
Merge branch '7167-keep-rsync-test-setup'

Revision 680d70e6
Added by Radhika Chippada about 5 years ago

refs #7167
Merge branch '7167-keep-rsync'

Revision 036656d9
Added by Radhika Chippada about 5 years ago

refs #7167
Merge branch '7167-keep-rsync'

Revision d954b409 (diff)
Added by Radhika Chippada about 5 years ago

refs #7167
Add a log statement to see why the test failed intermittently.

History

#1 Updated by Brett Smith about 5 years ago

Notes from Ward: Optimize for the following, in order:

  • Development time
  • Egress bandwidth (this is what we pay the most for)
  • Reliability
  • Scalability (e.g., you can run more than one at a time to decrease total runtime)

#2 Updated by Brett Smith about 5 years ago

  • Target version changed from Arvados Future Sprints to 2015-09-30 sprint

#3 Updated by Peter Amstutz about 5 years ago

What's the rationale for having this "pull" from the destination instead of "push" from the source? The issues of fetching the index, and the potential security question of having a copy of the data manager key and blob signing key offsite, both go away if the data is "pushed" from the source cluster to the target cluster.

#4 Updated by Tom Clegg about 5 years ago

  • Description updated (diff)
  • Story points set to 5.0

#5 Updated by Tom Clegg about 5 years ago

  • Description updated (diff)

#6 Updated by Tom Clegg about 5 years ago

Peter Amstutz wrote:

What's the rationale for having this "pull" from the destination instead of "push" from the source? The issues of fetching the index, and the potential security question of having a copy of the data manager key and blob signing key offsite, both go away if the data is "pushed" from the source cluster to the target cluster.

As specified this can be used as either a "push" or a "pull": functionally, it doesn't matter whether you run it on the source cluster, the destination cluster, or somewhere else entirely, as long as you can access Keep (either keepproxy or the individual keepstore nodes) at both ends, and your bandwidth to both ends is compatible with your patience/deadlines.

If the destination index can't be fetched, keep-rsync would have to maintain its own independent "checkpoint state" information. A checkpoint state other than the actual destination index would not be nearly as convincing as an assurance that all blocks have been copied. For example, it would be left as a manual exercise to determine whether Data Manager on the destination side had garbage-collected some of the data that (according to the checkpoint) had already been copied.

It isn't the scenario at hand, but yes, it is possible to imagine a usage scenario where source and destination clusters don't trust one another enough for either one to see the other's data manager key, but do want to copy data wholesale. To support cases like this, we could do a couple of things:
  • split the "data manager key" in two: one key for getting index, one for deleting blocks (this might be a good idea anyway). It would border on silly for the source cluster to refuse to give out its "get index" key even though it's agreeing to send all of its data, given how easy it is to change data manager keys after the copy finishes.
  • add a "maintain own destination index" feature -- essentially a checkpoint file that can be populated from a keepstore index and updated on the fly. With the right set of features, this would be very flexible: the sysadmin could decide when to copy the destination index out-of-band, when to have keep-rsync update it by adding blocks as they're copied, and when to have keep-rsync update it by using the keepstore "index" API on the destination side.

#7 Updated by Brett Smith about 5 years ago

  • Target version changed from 2015-09-30 sprint to Arvados Future Sprints

#8 Updated by Brett Smith about 5 years ago

  • Description updated (diff)

#9 Updated by Brett Smith about 5 years ago

  • Target version changed from Arvados Future Sprints to 2015-10-14 sprint

#10 Updated by Tom Clegg about 5 years ago

  • Description updated (diff)

#11 Updated by Radhika Chippada about 5 years ago

  • Assigned To set to Radhika Chippada

#12 Updated by Radhika Chippada about 5 years ago

  • Status changed from New to In Progress

#13 Updated by Tom Clegg about 5 years ago

  • Description updated (diff)

#14 Updated by Tom Clegg about 5 years ago

  • Description updated (diff)

#15 Updated by Tom Clegg about 5 years ago

  • Description updated (diff)

#16 Updated by Tom Clegg about 5 years ago

  • Description updated (diff)

#17 Updated by Radhika Chippada about 5 years ago

commit f3b5ffc2a4409d4c1b676e40ba4e582ca2beeb61

Enhanced our testing environment to allow creating a second set of keepservers to facilitate keep-rsync testing.

#18 Updated by Radhika Chippada about 5 years ago

Review comments for 7167-blob-sign-sdk branch:

  • keepclient/perms.go
    • golint suggests adding comments for ErrSignatureExpired etc. (all 4 of the errors defined)
    • Comment on VerifySignature says “ExpiredError” and “PermissionError”. It would be helpful to either use the same names as the defined errors, or plain-text descriptions like “signature expired error”, rather than the names of error definitions that don’t exist …
    • Would you hate the names expiryHex, expiryTime (instead of expHex, expTime) and signatureHex (in place of sigHex) too much?
  • keepclient/perms_test.go is failing due to error definition renames; "undefined: PermissionError" and "undefined: ExpiredError"
  • keepstore/permission_tests.go: since there are no other tests “without” a permission signature, I don’t see why “TestSignLocatorUsesPermissionSecret” can’t simply be “TestSignLocator” (same with the other test). If you prefer to use such a name, can we call it “TestSignLocatorWithPermissionSecret” or “TestSignLocatorUsingPermissionSecret”?
  • While at it, maybe you can address the golint complaints you newly introduced in keepstore/azure_blob_volume.go?

#19 Updated by Tom Clegg about 5 years ago

Radhika Chippada wrote:

Review comments for 7167-blob-sign-sdk branch:

  • keepclient/perms.go
    • golint suggests adding comments for ErrSignatureExpired etc. (all 4 of the errors defined)

Fixed (and merged "Malformed" into "Invalid")

  • Comment on VerifySignature says “ExpiredError” and “PermissionError”. It would be helpful to either use the same names as the defined errors, or plain-text descriptions like “signature expired error”, rather than the names of error definitions that don’t exist …

Fixed

  • Would you hate the names expiryHex, expiryTime (instead of expHex, expTime) and signatureHex (in place of sigHex) too much?

Fixed

  • keepclient/perms_test.go is failing due to error definition renames; "undefined: PermissionError" and "undefined: ExpiredError"

Fixed (whoops, only ran keepstore tests, not sdk tests!)

  • keepstore/permission_tests.go: since there are no other tests “without” a permission signature, I don’t see why “TestSignLocatorUsesPermissionSecret” can’t simply be “TestSignLocator” (same with the other test). If you prefer to use such a name, can we call it “TestSignLocatorWithPermissionSecret” or “TestSignLocatorUsingPermissionSecret”?

Fixed. (I was trying to say the only thing tested here that isn't already covered in the SDK tests is that PermissionSecret is actually being used.)

  • While at it, maybe you can address the golint complaints you newly introduced in keepstore/azure_blob_volume.go?

Added this to #7159, since it's still open.

#20 Updated by Radhika Chippada about 5 years ago

Branch 7167-blob-sign-sdk lgtm. Thanks.

#21 Updated by Tom Clegg about 5 years ago

SDK

This "config struct" idea seemed good when we talked about it before, but now that I'm looking at it, it seems redundant. Is there an advantage to this:
  • 
    client := arvadosclient.New(arvadosclient.APIConfig{
      APIToken: "foo",
      APIHost: "bar",
    })
    
  • ...over this:
  • 
    client := arvadosclient.ArvadosClient{
      APIToken: "foo",
      APIHost: "bar",
    }
    
  • IOW, it seems like the only real purpose of New() is to initialize the Client field. The Go stdlib seems to use "leave Client nil to use the default", so that would be ideal. For this case I don't think we actually need to deal with that, though: we can move "readConfigFromFile" to arvadosclient.NewFromFile(filename) and initialize Client there. I suppose this means adding a BlobSigningKey field to the ArvadosClient struct, which seems like a reasonable change even though most clients will just leave it empty / not use it. (Conveniently, this should fix the bug that keep-rsync uses the same blobSigningKey global for both src and dst.)

Config file reader should use strings.SplitN(line, "=", 2), and use TrimSpace() on kv[0] and kv[1]

Config file reader should accept "foo" (a config file containing no "/") to mean "read config from $HOME/.config/arvados/foo.conf", like arv-copy does.

"initializeKeepRsync" looks like it should be called "setupKeepClients", and it looks like we could fix the use of globals something like this:
  • setupKeepClients returns two *keepclient.KeepClient and an error
  • instead of tracking an arvSrc variable separately from kcSrc, just use kcSrc.Arvados when you need it

run_test_server

--keep-enforce-permissions should be action='store_true' instead of type=str

enforce_permissions should not become a global. Just pass run_keep(enforce_permissions=args.keep_enforce_permissions).

The "--keep-existing" flag looks like trouble. It seems like a circuitous way to satisfy a rather specific and unusual use case: "start two keepstore processes that the API server doesn't know about, and one that it does know about." For the sake of getting keep-rsync working I can live with it. But I think it would be better to start keepstore procs directly from arvadostest without using the Python stuff at all, and do most of the rsync tests by specifying srcKeepServicesJSON and dstKeepServicesJSON (i.e., without contacting an API server at all).

keep-rsync

copyBlocksToDst should take kcSrc and kcDst as arguments, rather than using globals. (Same goes for the other new globals -- seems like they can all be passed in to the relevant functions as arguments.)

Combine these into one line ("Getting block %d of %d: %v"?):

+               log.Printf("Getting block %d of %d", done+1, total)
+
+               log.Printf("Getting block: %v", locator)

In getUniqueLocators, how about:

-               if _, ok := uniqueLocators[locator]; !ok {
-                       uniqueLocators[locator] = true
-               }
+               uniqueLocators[locator] = true

Also in getUniqueLocators, instead of building one long index and then splitting it, how about something like:

for uuid := range kc.LocalRoots() {
  reader, err := kc.GetIndex(uuid, indexPrefix)
  // [check err]
  scanner := bufio.NewScanner(reader)
  for scanner.Scan() {
    uniqueLocators[strings.Split(scanner.Text(), " ")[0]] = true
  }
}

main() already prints the error returned by performKeepRsync(), so it shouldn't be necessary to log errors from copyBlocksToDst() as well. For example, how about:

-                       log.Printf("Error putting block data: %q %v", locator, err)
-                       return err
+                       return fmt.Errorf("Write error: %q %v", locator, err)

The log message "copying block" should say "Writing block %d of %d: %v" (pick one of %q and %v and use for both "reading" and "writing" messages)

keep-rsync tests

setupTestData has three copies of what looks like a verbose way to do something like this:
  • 
    hash, _, err := kcDst.PutB([]byte(fmt.Sprintf("test-data-%d", i)))
    c.Assert(err, IsNil)
    _, _, err = kcDst.Ask(hash)
    c.Assert(err, IsNil)
    dstLocators = append(dstLocators, hash) // hash will already be signed if necessary
    
  • I don't think the extra assertions belong here. We're not testing keepclient, we're just setting up our keep-rsync test.

c.Check(localRoots != nil, Equals, true) should be c.Check(localRoots, Not(IsNil))

#22 Updated by Radhika Chippada about 5 years ago

SDK

This "config struct" idea seemed good when we talked about it before, ...

Done

we can move "readConfigFromFile" to arvadosclient.NewFromFile(filename) and initialize Client there

With all the other changes in (the one above and the one below), I did not see the need to move this and change / rearrange / add tests. Hence, I left it here. Maybe we can move it when we have another use case?

(Conveniently, this should fix the bug that keep-rsync uses the same blobSigningKey global for both src and dst.)

Fixed this

Config file reader should use strings.SplitN(line, "=", 2), and use TrimSpace() on kv[0] and kv[1]

Done

Config file reader should accept "foo" (a config file containing no "/") to mean "read config from $HOME/.config/arvados/foo.conf", like arv-copy does.

Done

"initializeKeepRsync" looks like it should be called "setupKeepClients",

Done

and it looks like we could fix the use of globals something like this:

Done, and got rid of all the globals

run_test_server

--keep-enforce-permissions should be action='store_true' instead of type=str

Done

enforce_permissions should not become a global. Just pass run_keep(enforce_permissions=args.keep_enforce_permissions).

Done

The "--keep-existing" flag looks like trouble ...

Removed this and replaced it with "num-keep-servers". All 3 keep servers are created at once now; the first two are used as src and the third as dst

keep-rsync

copyBlocksToDst should take kcSrc and kcDst as arguments,

Done. Removed all globals

Combine these into one line ("Getting block %d of %d: %v"?):

Done

In getUniqueLocators, how about:

Done

Also in getUniqueLocators, instead of building one long index and then splitting it, how about something like:

Done

main() already prints the error returned by performKeepRsync(), so it shouldn't be necessary to log errors from copyBlocksToDst() as well. For example, how about:

Done

The log message "copying block" should say "Writing block %d of %d: %v" (pick one of %q and %v and use for both "reading" and "writing" messages)

Done

keep-rsync tests

setupTestData has three copies of what looks like a verbose way to do something like this:

Done. Also, since PutB also returns err, I added a check on PutB and did not see the need to do an Ask

c.Check(localRoots != nil, Equals, true) should be c.Check(localRoots, Not(IsNil))

Done. NotNil is available.

#23 Updated by Tom Clegg about 5 years ago

I can live with using APIConfig this way in keep-rsync for now, but let's move it from SDK to keep-rsync, unexport it, and get rid of the redundant "consists of" comment.

Rename DiscoverKeepServersFromJSON to LoadKeepServicesFromJSON

MakeKeepClientFromJSON doesn't really seem necessary: a caller can do MakeKeepClient() (and ignore any error) followed by LoadKeepServicesFromJSON(). How about func New(arv *arvadosclient.ArvadosClient) *KeepClient? This would be a step toward what I think would be a more reasonable API:
  • // always:
    kc := keepclient.New()
    // optional:
    kc.LoadKeepServicesFromJSON(...)
    // discover keep services automatically on first use if you didn't Load or Discover them yourself:
    kc.Get(foo)
    
  • For now, "load or discover" will be a mandatory step when using the New() method. When we make the "automatic" way work, new code will be a bit easier to write, and existing code will continue to work.

MakeKeepClient (and any new variants) should initialize its "insecure" flag from arv.ApiInsecure instead of an env var.

In run_test_server.py, stop_keep() should also accept a num_servers argument.

In run_test_server.py, instead of the "if args.num_keep_servers is not None:" block, get argparse to do it -- say parser.add_argument('--num-keep-servers', default=2, ...), and pass args.num_keep_servers directly to run_keep(). (This should also let you remove "default is 2 keep servers" from the help text because argparse will already know/show what the default is.)

loadConfig, readConfigFromFile, and setupKeepClients have two copies of code, one for src and one for dst. They should each just handle a single client/config, and main() should call loadConfig and setupKeepClient twice.
  • -       srcConfig, dstConfig, srcBlobSigningKey, _, err := loadConfig(srcConfigFile, dstConfigFile)
    +       srcConfig, srcBlobSigningKey, _, err := loadConfig(srcConfigFile)
    +       if err != nil { ... }
    +       dstConfig, _, _, err := loadConfig(dstConfigFile)
    

Help text for -replications should say that 0 means "use default replication level configured on destination server".

If -replications isn't provided and there's an error loading the destination discovery doc, return an error instead of using 2.

#24 Updated by Radhika Chippada about 5 years ago

let's move it from SDK to keep-rsync, unexport it, and get rid of the redundant "consists of" comment.

Done

Rename DiscoverKeepServersFromJSON to LoadKeepServicesFromJSON

Done

MakeKeepClientFromJSON doesn't really seem necessary ... For now, "load or discover" will be a mandatory step when using the New() method

Done

MakeKeepClient (and any new variants) should initialize its "insecure" flag from arv.ApiInsecure instead of an env var.

Done

In run_test_server.py, stop_keep() should also accept a num_servers argument.

Done

In run_test_server.py, ... default=2 ...

Done

loadConfig, readConfigFromFile, and setupKeepClients have two copies of code ...

Done. readConfigFromFile was already reused

Help text for -replications should say that 0 means "use default replication level configured on destination server".

Done

If -replications isn't provided and there's an error loading the destination discovery doc, return an error instead of using 2.

Done

#25 Updated by Tom Clegg about 5 years ago

The TestRsyncPutInOne_GetFromOtherShouldFail and TestRsyncWithBlobSigning_PutInOne_GetFromOtherShouldFail tests
  • should each be about two lines long -- just setupRsync(c,whatever,true,1) and then call a testNoCrosstalk helper twice, once for (src,dst) and once for (dst,src).
  • don't need to check whether blobSigningKey is empty (SignLocator is a no-op if the signing key is empty)
  • shouldn't do all those extra checks that are really just testing keepclient and keepstore -- testNoCrosstalk() probably just needs something like
  • dstLocator, _, err := kcDst.PutB([]byte("srcdata"))
    c.Assert(err, Equals, nil)
    dstLocator = strings.Split(dstLocator, "+A")[0]
    _, _, _, err = kcSrc.Get(keepclient.SignLocator(dstLocator, ...))
    c.Assert(err, NotNil)
    c.Check(err.Error(), Equals, "Block not found")
    

Make dstIndex accessible to the individual test cases (like dstLocators etc) so TestKeepRsync_WithNoSuchPrefixInSrc can check len(dstIndex) == len(dstLocators), and TestKeepRsync_WithPrefix can check len(dstIndex) > len(dstLocators).

arvadostest.StartAPI() should move from setupRsync() to SetUpSuite() (we can leave the same one running throughout the tests, right?)

arvadostest.StopKeep() should move from TearDownSuite() to TearDownTest() (since we StartKeep in each test)

The setupRsync arguments updateDstReplications bool, replications int seem to be redundant. How about just passing zero as replications when testing "discovery doc default" cases, and dropping updateDstReplications?

setupRsync calls setupKeepClients with a replications arg, but then it overwrites kcDst.Want_replicas by itself. Instead of overwriting, it should check kcDst.Want_replicas == replications (unless replications==0), right? (IOW, the checks done in TestInitializeRsyncReplicationsCount and TestInitializeRsyncDefaultReplicationsCount should be done right in setupRsync itself instead.)

The first if prefix == "" { case in testKeepRsync looks unnecessary: the "else" case is correct whether or not prefix == "".

The "else" side of the second if prefix == "" { case should comment that the magic number 2 depends on the coincidence that len(dstLocators) is 2 and len(dstLocatorsMatchingPrefix) is 0 (or would be if we computed it) for all of the prefixes that we have tests for.

Probably defer: It looks like we aren't testing whether we waste time copying blocks that already exist in dst. Perhaps we could backdate the "dst" blocks before running rsync, then filter dstIndex by mtime, and make sure none of dstLocators were touched during the rsync.

After 1 block is done, the progress display should show ETA based on time elapsed and number of blocks done/todo. (We could go further and inspect sizes, but it won't make much difference to the outcome: either there are lots of blocks, in which case the time taken for an average "done" block is a good prediction of an average "todo" block, or there aren't lots of blocks, in which case ETA is small and probably doesn't matter that much.)

Instead of ioutil.ReadAll() followed by kcDst.PutB(), how about:


reader, size, _, err := kcSrc.Get(getLocator)
// [check err]
_, _, err = kcDst.PutHR(getLocator[:32], reader, size)
// [check err]

#26 Updated by Tom Clegg about 5 years ago

Some tests like TestLoadConfig don't actually use servers, so they shouldn't be part of ServerRequiredSuite (this should save a bunch of time starting and stopping Keep servers via Python wrapper).

We should log total #blocks on src and dst before starting to copy data (otherwise it's hard to see the difference between "not much to do because not much src data found" and "not much to do because dst is already nearly complete").

This should have + ".conf" at the end:
  • filename = os.Getenv("HOME") + "/.config/arvados/" + filename
Update help message for -src and -dst. Maybe just copy the arv-copy help messages:
  • "The name of the source Arvados instance (required) - points at an Arvados config file. May be either a pathname to a config file, or (for example) "foo" as shorthand for $HOME/.config/arvados/foo.conf"

#27 Updated by Radhika Chippada about 5 years ago

The TestRsyncPutInOne_GetFromOtherShouldFail and TestRsyncWithBlobSigning_PutInOne_GetFromOtherShouldFail tests

Updated

Make dstIndex accessible to the individual test cases

Done

arvadostest.StartAPI() should move from setupRsync() to SetUpSuite()

Done

arvadostest.StopKeep() should move from TearDownSuite() to TearDownTest()

Done (this was unintentional and now corrected)

The setupRsync arguments updateDstReplications bool, replications int seem to be redundant.

Updated

setupRsync calls setupKeepClients with a replications arg, but then it overwrites ...

Improved this

The first if prefix == "" { case in testKeepRsync looks unnecessary: ...

Done

The "else" side of the second if prefix == "" { case should comment that ...

Updated comment

After 1 block is done, the progress display should show ETA ...

Done. The acronym ETA does not seem correct in this context. Maybe we should say "estimated time remaining ..."?

Instead of ioutil.ReadAll() followed by kcDst.PutB(), how about ...

Done

Some tests like TestLoadConfig don't actually use servers ...

Added ServerNotRequiredSuite

We should log total #blocks on src and dst before starting to copy data

Done

This should have + ".conf" at the end: filename = os.Getenv("HOME") + "/.config/arvados/" + filename

Done

Update help message for -src and -dst. Maybe just copy the arv-copy help messages:

Updated comment

#28 Updated by Tom Clegg about 5 years ago

I've pushed a couple of small changes:
  • change conf args to -src and -dst to match spec and arv-copy
  • update ETA logging (remove redundant "copying block X of X", compute ETA based on average time for all blocks done so far, not just the very first block)
  • in arvadostest.go, always pass --num-keep-servers to python wrapper, and update comments

The rest LGTM, thanks. I'm waiting for the full test suite to finish. If the above changes look good & pass tests for you, please merge. Thanks!

#29 Updated by Radhika Chippada about 5 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 67 to 100

Applied in changeset arvados|commit:e88a0a3ff00fd5448f893909d08ee11dc301776e.

#30 Updated by Radhika Chippada about 5 years ago

(Even though I have not reopened this ticket "yet"), I was unhappy that a big chunk of the code in keep-rsync's main func was not tested: main had all the arg parsing code, and none of it was covered. Unfortunately, this is a problem with all our Go code that has a main func. So, after discussing with Tom, I researched a little, and:

  • extended the keep-rsync main func. All the code from main is now moved into a separate doMain func.
  • doMain func returns error on any errors and main does the usual Fatalf on any errors.
  • Args are parsed using flag.FlagSet rather than the default flag package. This is needed if we need to set flags several times from different tests.
  • Added another set of tests that pass various combinations of good and bad flags. Also added a test that uses good config files and does an end-to-end keep-rsync test (though it does not actually copy anything to dst, it exercises a lot of the other code).
  • Happy to report that we can now test other go code with main func using this code as an example
  • Also, tested the script manually at command prompt using the command:
 go run keep-rsync.go -src test -replications x -dst test 
  • commit a9e08c498ac8126026d1226f34913f87c8e590a0 removes StartKeepWithParams and StopKeepWithParams funcs and exposes only StartKeep and StopKeep with the parameters. All the earlier references are updated.

Thanks.
