Idea #7490
closed[Data Manager] Return errors instead of calling log.Fatal in code that needs to be tested
Description
Currently, parts of Data Manager handle some runtime errors by calling log.Fatal(). This practice makes it unnecessarily hard (impossible?) to test the relevant error functions: log.Fatal() causes the whole test suite to exit, rather than failing the current test.
Any offending bits of Data Manager should be fixed at least to the point where tests can demonstrate Data Manager's behavior when faced a network error during startup or while retrieving collections from the API server.
Updated by Radhika Chippada about 9 years ago
- Status changed from New to In Progress
- Assigned To set to Radhika Chippada
Updated by Radhika Chippada about 9 years ago
- Story points changed from 0.5 to 1.0
Updated by Brett Smith about 9 years ago
- Target version changed from Arvados Future Sprints to 2015-12-02 sprint
Updated by Radhika Chippada about 9 years ago
- Converted most of fatalfs into errors.
- Added several tests with errors in keep/keep_test.go; this test was actually not passing and included earlier and this was also corrected
- Updated datamanager to not continue if not all collections are read. Do let me know if this should not be done or should be done in a better way
if len(readCollections.UUIDToCollection) == 0 { return nil // no collections read so no more work to do? }
- Added several additional failing tests in datamanager_test.go
- I renamed file.go -> writeDataTo as WriteDataTo and collection.go -> heapProfileFilename as HeapProfileFilename so that I could inject errors during datamanager singlerun. I tried to convert these into FlagSet, however due to multiple packages and all, it seemed much too involved and might not be worth the trouble. Of course, this introduces the vulnerability of exposing these two variables outside the package
Updated by Tom Clegg about 9 years ago
In keep.go → GetKeepServersAndSummarize I think we should return right away if GetKeepServers returns an error. The number of keep disks and replication level distribution won't be correct anyway, and "we'll return this error later" generally seems a bit fragile (it's normal in Go to reuse an existing err variable without worrying about whether it already has a deferred error in it, so we might do that by accident).
Returning empty objects to indicate errors seems a bit fragile too. How about adding a distinct Err field to keep.ServerResponse and collection.ReadCollections? (That way we can report the actual error, rather than "no host info found")
This should let us check if readCollections.Err != nil
instead of if len(readCollections.UUIDToCollection) == 0
...and return the actual error from singlerun instead of treating it as "success because either there's nothing to do or we hit an error and it has already been reported". (Even if there are no collections, it seems correct to continue and do the MaybeWriteData, report block stats, and so on. If we're going to flag "no collections" as a special case that's "probably an accident", perhaps we could handle it by setting the (forthcoming) "dry run" flag and printing a message explaining what's happening.)
In datamanager.go, func makeArvadosClient()
seems to be redundant now; we can delete it and just call arvadosclient.MakeArvadosClient() directly.
In datamanager.go, we should probably call loggerutil.FatalWithMessage instead of log.Fatalf when singlerun fails, so we still propagate the error to the API server logs as well.
In summary/file.go, MaybeWriteData's first return value is ignored by the only code that uses it; it might as well only return err.
In keep_test.go, use net.SplitHostPort(ksURL.Host)
instead of strings.Split(ksURL.Host, ":")
-- that way it will work with IPv6 addresses, too.
In keep_test.go, testGetKeepServersFromAPI could match the error against a testData.errorMatches string field passed by the test function rather than having its own logic about which tests expect which errors. (This should also resolve the question of what TestGetKeepServers_ServerError expects to happen -- it looks like it currently doesn't assert anything.)
Updated by Tom Clegg about 9 years ago
Pushed a small fix to error messages (3ed87fe) -- using "%#q" avoids newlines in error messages, which makes it easier to match everything with ErrorMatches.
Everything else LGTM, thanks.
Updated by Radhika Chippada about 9 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:c4023fb9b27ba31926faf530a4e59b9aee0989b3.