In keep.go → GetKeepServersAndSummarize I think we should return right away if GetKeepServers returns an error. The number of keep disks and replication level distribution won't be correct anyway, and "we'll return this error later" generally seems a bit fragile (it's normal in Go to reuse an existing err variable without worrying about whether it already has a deferred error in it, so we might do that by accident).
Returning empty objects to indicate errors seems a bit fragile too. How about adding a distinct Err field to keep.ServerResponse and collection.ReadCollections? (That way we can report the actual error, rather than "no host info found")
This should let us check if readCollections.Err != nil
instead of if len(readCollections.UUIDToCollection) == 0
...and return the actual error from singlerun instead of treating it as "success because either there's nothing to do or we hit an error and it has already been reported". (Even if there are no collections, it seems correct to continue and do the MaybeWriteData, report block stats, and so on. If we're going to flag "no collections" as a special case that's "probably an accident", perhaps we could handle it by setting the (forthcoming) "dry run" flag and printing a message explaining what's happening.)
In datamanager.go, func makeArvadosClient()
seems to be redundant now; we can delete it and just call arvadosclient.MakeArvadosClient() directly.
In datamanager.go, we should probably call loggerutil.FatalWithMessage instead of log.Fatalf when singlerun fails, so we still propagate the error to the API server logs as well.
In summary/file.go, MaybeWriteData's first return value is ignored by the only code that uses it; it might as well only return err.
In keep_test.go, use net.SplitHostPort(ksURL.Host)
instead of strings.Split(ksURL.Host, ":")
-- that way it will work with IPv6 addresses, too.
In keep_test.go, testGetKeepServersFromAPI could match the error against a testData.errorMatches string field passed by the test function rather than having its own logic about which tests expect which errors. (This should also resolve the question of what TestGetKeepServers_ServerError expects to happen -- it looks like it currently doesn't assert anything.)