Project

General

Profile

Actions

Idea #2853

closed

[Keep] Use rendezvous hashing to assign blocks to servers in Go and Python SDKs

Added by Peter Amstutz almost 10 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Keep
Target version:
Start date:
11/11/2014
Due date:
Story points:
2.0

Description

See: http://en.wikipedia.org/wiki/Rendezvous_hashing

Some thoughts here from earlier technical discussions:

  • Rendezvous hashing makes it harder to perform weighted block distribution (we may be able to get around this by adding virtual storage nodes). Currently we do not expect to want weighted block distribution. Rather, we want distribution to be equal in steady state, and we want to minimize excess probes during rebalancing.
  • Computing N hashes per block transaction (one for each keep node or virtual node) seems acceptable as long as N isn't huge; the more precise you want your weighting to be, the more hashes you have to compute. Using a fast hash function would be helpful here.
  • If the map of storage nodes is tree structured, you can get O(log n) performance, e.g. compare group1 to group2, then follow the branch and compare group3 to group4, etc. However, this makes it impossible to add an arbitrary number of storage nodes with equal weight.

Subtasks 4 (0 open4 closed)

Task #4492: Fix tests that expect old probe orderResolvedTom Clegg11/11/2014Actions
Task #4491: Replace shuffling in Go SDKResolvedTom Clegg11/11/2014Actions
Task #4490: Replace shuffling in Python SDKResolvedTom Clegg11/11/2014Actions
Task #4495: Review 2853-rendezvousResolvedTom Clegg11/11/2014Actions
Actions #1

Updated by Tom Clegg almost 10 years ago

  • Target version set to Arvados Future Sprints
Actions #2

Updated by Tim Pierce almost 10 years ago

  • Category set to Keep
Actions #3

Updated by Tim Pierce over 9 years ago

  • Description updated (diff)
Actions #4

Updated by Tom Clegg over 9 years ago

  • Subject changed from Keep uses consistent hashing to chose which servers get which blocks in a stable way to [Keep] Use rendezvous hashing to assign blocks to servers
  • Story points set to 1.0
Actions #5

Updated by Tom Clegg over 9 years ago

  • Subject changed from [Keep] Use rendezvous hashing to assign blocks to servers to [DRAFT] [Keep] Use rendezvous hashing to assign blocks to servers in Go and Python SDKs
Actions #6

Updated by Tom Clegg over 9 years ago

  • Story points changed from 1.0 to 2.0
Actions #7

Updated by Tom Clegg over 9 years ago

  • Description updated (diff)
Actions #8

Updated by Tom Clegg over 9 years ago

  • Target version changed from Arvados Future Sprints to 2014-11-19 sprint
Actions #9

Updated by Tom Clegg over 9 years ago

  • Subject changed from [DRAFT] [Keep] Use rendezvous hashing to assign blocks to servers in Go and Python SDKs to [Keep] Use rendezvous hashing to assign blocks to servers in Go and Python SDKs
Actions #10

Updated by Tom Clegg over 9 years ago

  • Assigned To set to Tom Clegg
Actions #11

Updated by Tom Clegg over 9 years ago

2853-rendezvous @ c7b7e4d

Notes
  • Chose MD5. Faster hash functions are out there (e.g., blake2) but it seems much more important to pick something that every client/language has built-in or readily available.
  • Added a "reference set" of probe orders using 4 easy-to-read data blocks (consisting mostly of ascii zeroes) and 16 services with contrived names.
  • Added a test for the desired rebalancing behavior: when starting with N storage nodes and adding D more, no block turns out to be further than D places away from its ideal probe order, and the average block moves about D/N places away from its ideal probe order.
  • Only the last 15 characters of the service UUID are used for sorting. (This makes it possible to clone an entire instance, which changes the UUID prefix, without doing any rebalancing.)
Didn't:
  • Benchmark
  • Test that distribution is uniform (although the rebalancing test is pretty convincing)
Actions #12

Updated by Tom Clegg over 9 years ago

  • Status changed from New to In Progress
Actions #13

Updated by Brett Smith over 9 years ago

Compatibility

  • Go's sort is not guaranteed to be stable. Python's is. Does this matter? It will only come up in the admittedly very unlikely case of a hash collision, and then it will just mean at most one excess request, so I can easily believe the answer is "no," but I wanted to flag the issue for discussion. On the flip side, fixing it is as easy as changing sort.Sort to sort.Stable.

Go

  • Can we use Md5String in RootSorter.getWeight?
  • TestShuffleServiceRoots seems obsoleted by root_sorter_test.go.
  • The reference to ../../python/arvados/keep.py in root_sorter_test.go should be ../../python/tests/test_keep_client.py.
  • This is probably a noob question, but why did you increase the size of the stub channels in many of the existing tests?

Python

  • Please don't use hash as an argument/variable name. It masks Python's built-in hash function (which makes funny syntax highlighting).
  • KeepClientServiceTestCase also has code to present a mock service list to the KeepClient, and test with it. Would it make sense to merge this and KeepRendezvousWeightTestCase somehow, to reduce code duplication and mucking with KeepClient's internal structures?
  • I've kind of idly wondered before about compatibility issues around stuff that we never advertised as part of the API, but was discoverable and looked public, like KeepClient.service_roots and KeepClient.shuffled_service_roots. I think I'm okay with renaming them if you are, but I guess I wanted to take the opportunity to flag the longer-term issue here. (I'm definitely glad that we're marking private variables and methods more consistently.)
Actions #14

Updated by Tom Clegg over 9 years ago

Brett Smith wrote:

  • Go's sort is not guaranteed to be stable. Python's is. Does this matter? It will only come up in the admittedly very unlikely case of a hash collision, and then it will just mean at most one excess request, so I can easily believe the answer is "no," but I wanted to flag the issue for discussion. On the flip side, fixing it is as easy as changing sort.Sort to sort.Stable.

I don't think even a stable sort would fully address this, since we aren't careful (in various places) to keep the incoming list of services sorted in the first place. We could use the service uuid as a secondary sort key, of course. But more importantly, I'd say "undefined order" is perfectly acceptable here: it's an extremely rare event, and even when it happens the expected performance impact is very small. Agreed?

  • Can we use Md5String in RootSorter.getWeight?

Yes! Fixed.

  • TestShuffleServiceRoots seems obsoleted by root_sorter_test.go.

Indeed. Deleted.

  • The reference to ../../python/arvados/keep.py in root_sorter_test.go should be ../../python/tests/test_keep_client.py.

Whoops. Fixed.

  • This is probably a noob question, but why did you increase the size of the stub channels in many of the existing tests?

If you write two "I did a failure" messages to a channel with buffer size 1, the second write blocks until someone reads the first message. I found that many of the tests had a small enough channel size that (1) they depended on having no more than X failures before a success where X was determined by "whatever happened when I wrote the test" rather than a maximum number of failures we actually want to protect, and (2) when that dependency wasn't met, they would fail by hanging in deadlock instead of failing an assertion. This was really annoying to debug when I first ran into it, so I thought it would be best to get rid of the probe order sensitivity except where the test's effectiveness genuinely depends on it.

(Surely a mock library could give us something much neater than these channels, but I didn't want to go that far into it here.)

  • Please don't use hash as an argument/variable name. It masks Python's built-in hash function (which makes funny syntax highlighting).

Fixed. Now data_hash.

  • KeepClientServiceTestCase also has code to present a mock service list to the KeepClient, and test with it. Would it make sense to merge this and KeepRendezvousWeightTestCase somehow, to reduce code duplication and mucking with KeepClient's internal structures?

Yes, good point. Will rearrange.

  • I've kind of idly wondered before about compatibility issues around stuff that we never advertised as part of the API, but was discoverable and looked public, like KeepClient.service_roots and KeepClient.shuffled_service_roots. I think I'm okay with renaming them if you are, but I guess I wanted to take the opportunity to flag the longer-term issue here. (I'm definitely glad that we're marking private variables and methods more consistently.)

Yes, I agree we should be much more careful about accidentally adding public (as in no-leading-underscore) APIs even when they don't come with other signs of being public, like documentation.

In this particular case, we don't have any particular reason to expose them publicly at all, right?

Now at 9ef386f with todo: Rearrange KeepRendezvousWeightTestCase mock.

Actions #15

Updated by Tom Clegg over 9 years ago

Now at 9ef386f with todo: Rearrange KeepRendezvousWeightTestCase mock.

Tests are refactored at cf1db39

Actions #16

Updated by Brett Smith over 9 years ago

Tom Clegg wrote:

Brett Smith wrote:

  • Go's sort is not guaranteed to be stable. Python's is. Does this matter? It will only come up in the admittedly very unlikely case of a hash collision, and then it will just mean at most one excess request, so I can easily believe the answer is "no," but I wanted to flag the issue for discussion. On the flip side, fixing it is as easy as changing sort.Sort to sort.Stable.

I don't think even a stable sort would fully address this, since we aren't careful (in various places) to keep the incoming list of services sorted in the first place. We could use the service uuid as a secondary sort key, of course. But more importantly, I'd say "undefined order" is perfectly acceptable here: it's an extremely rare event, and even when it happens the expected performance impact is very small. Agreed?

You're right that stable doesn't help. And yes, I agree that the impact is minimal enough to be unconcerned.

In this particular case, we don't have any particular reason to expose them publicly at all, right?

No. If anything makes sense as a public interface here, it's weighted_service_roots. Everything else makes sense as private.

cf1db398 is good to merge, thanks.

Actions #17

Updated by Anonymous over 9 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 67 to 100

Applied in changeset arvados|commit:06e402b11ad4d503feb5fa45845cb27c93478cfc.

Actions #18

Updated by Tom Clegg over 9 years ago

    def test_probe_order_visual(self):
        hashes = [
            hashlib.md5("{:064x}".format(x)).hexdigest()
            for x in range(8)]
        for nsvc in range(2,16):
            sys.stderr.write("\n")
            for i, hash in enumerate(hashes):
                api_client = self.mock_n_keep_disks(nsvc)
                keep_client = arvados.KeepClient(api_client=api_client)
                roots = keep_client.weighted_service_roots(hash)
                got_order = [
                    re.search(r'//\[?keep0x([0-9a-f]+)', root).group(1)
                    for root in roots]
                sys.stderr.write("{:20s}".format(''.join(got_order)))
        sys.stderr.write("\n")
10                  01                  01                  10                  01                  01                  10                  10
210                 021                 021                 102                 021                 201                 120                 120
3210                0213                0231                1023                0231                2013                1230                1320
32104               02413               40231               10234               02341               20413               12304               13240
325104              052413              540231              102354              023541              520413              512304              132405
3256104             0526413             5402361             1026354             6023541             5204163             5123064             1326405
32561074            07526413            54023761            10276354            60723541            52704163            51230647            13276405
325681074           075264813           540238761           810276354           607235418           527804163           512306487           138276405
3259681074          0975264813          5402387691          9810276354          6079235418          5279804163          5912306487          1382764095
3a259681074         097a5264813         540238a7691         9810276a354         6079235418a         527a9804163         591230a6487         1a382764095         
3ab259681074        097ba5264813        5b40238a7691        9810276a3b54        60792354b18a        527a9804b163        5912b30a6487        1a38276b4095        
3ab25c9681074       097ba526481c3       c5b40238a7691       981c0276a3b54       60792c354b18a       5c27a9804b163       591c2b30a6487       1a38276b40c95       
3ab2d5c9681074      097dba526481c3      c5b40238a7d691      9d81c0276a3b54      60792c354b18da      5c2d7a9804b163      d591c2b30a6487      1a3827d6b40c95      
3eab2d5c9681074     097dba52e6481c3     c5b4e0238a7d691     9d81c02e76a3b54     60792c354be18da     5c2ed7a9804b163     d591c2be30a6487     1a3827d6b40ce95     
Actions

Also available in: Atom PDF