Tom Clegg, 03/08/2017 04:08 PM

h1. Changing Keep hash algorithm

(PA's notes moved from #4424)

The MD5 algorithm is well known to be insecure. Attackers can force hash collisions on arbitrary data, for example: http://natmchugh.blogspot.com/2014/10/how-i-created-two-images-with-same-md5.html?m=1

In the case of Keep, this exposes an obvious vulnerability. An attacker who knows the hash of a block could subvert the permission system as follows:

# Generate a block that collides with the desired hash
# Upload the collision block and receive a signed token
# Use the signed token to request the block

(We may be able to tighten Keep's behavior to make this attack more difficult, for example by checking byte-for-byte that an uploaded block matches the block already stored under that hash.)
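
The byte-for-byte check mentioned above can be sketched roughly as follows. This is a toy illustration, not Keep's code: @ToyStore@, @weak_hash@ and @find_collision@ are hypothetical names, and the hash is deliberately truncated to two hex digits so that a collision can be found by brute force in a few hundred tries.

```python
import hashlib


def weak_hash(data: bytes) -> str:
    # Deliberately weak stand-in (assumption for the demo): only the first two
    # hex digits of MD5, so collisions are trivial. Keep does not do this.
    return hashlib.md5(data).hexdigest()[:2]


class ToyStore:
    """Hypothetical in-memory content-addressed store (not Keep's API)."""

    def __init__(self):
        self.blocks = {}

    def put(self, data: bytes) -> str:
        key = weak_hash(data)
        existing = self.blocks.get(key)
        if existing is not None and existing != data:
            # Same hash, different bytes: refuse the upload instead of
            # treating it as proof that the caller holds the stored block.
            raise ValueError("hash collision on " + key)
        self.blocks[key] = data
        return key


def find_collision(target: bytes) -> bytes:
    # Brute-force a different block that has the same weak hash as target.
    want = weak_hash(target)
    i = 0
    while True:
        candidate = b"attacker-%d" % i
        if candidate != target and weak_hash(candidate) == want:
            return candidate
        i += 1
```

With this check in place, re-uploading identical bytes still succeeds, while @store.put(find_collision(secret))@ raises @ValueError@ once @secret@ has been stored. A real server would also have to decide what to do when the colliding block arrives first, which this sketch does not address.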

This is a general vulnerability that attacks the assumptions of content-based addressing, so it seems very likely that there are other, more subtle attack vectors. Another possible attack is denial of service: an attacker uploads bogus blocks (garbage content crafted to have specific content hashes), preventing a user from uploading legitimate data with those hashes.

_(Note the above is somewhat overstated, since the known attack here is a chosen-prefix collision, not a collision on arbitrary data. For example, the "obvious vulnerability" would only give access to blocks that were specifically crafted by an attacker who intended them to be vulnerable. But the conclusion stands: MD5 is not a good long-term solution! -TC)_

We need to start thinking about moving to a best-practice cryptographic hash. The first obvious choice would be SHA-1 (used by git), but it is already considered vulnerable, so we should look at SHA-2 or SHA-3.

Fixing this is likely to be somewhat difficult and disruptive: a lot of existing code makes assumptions about the format and length of the content hashes used by Keep, and a stronger hash function would make them longer.
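
To make the length issue concrete, here is a small sketch using Python's standard @hashlib@. The @MD5_HEX@ pattern and @looks_like_md5@ are hypothetical stand-ins for the kind of hard-coded check that existing code might contain; they are not taken from the Keep code base.

```python
import hashlib
import re

# Hypothetical validator that hard-codes MD5's 32 hex digits.
MD5_HEX = re.compile(r"^[0-9a-f]{32}$")


def hex_lengths(data: bytes) -> dict:
    """Map each algorithm name to the length of its hex digest of data."""
    names = ["md5", "sha1", "sha256", "sha512", "sha3_256"]
    return {name: len(hashlib.new(name, data).hexdigest()) for name in names}


def looks_like_md5(hex_digest: str) -> bool:
    """Return True only for strings of exactly 32 lowercase hex digits."""
    return MD5_HEX.match(hex_digest) is not None
```

@hex_lengths(b"foo")@ reports 32 hex digits for MD5, 40 for SHA-1, 64 for SHA-256 and SHA3-256, and 128 for SHA-512, so a check like @looks_like_md5@ rejects every one of the stronger digests.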

https://github.com/multiformats/multihash -WV
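
For context, multihash prefixes a digest with a function code and the digest length, both encoded as unsigned varints, so a reader can tell which algorithm produced it. The following is a minimal sketch of that layout, not the reference implementation; the helper names are made up for illustration, and only the two function codes listed are covered.

```python
import hashlib

# Function codes from the multihash table; only these two are handled here.
CODES = {"sha1": 0x11, "sha2-256": 0x12}
# Mapping from multihash names to hashlib names (illustrative helper table).
HASHLIB_NAMES = {"sha1": "sha1", "sha2-256": "sha256"}


def uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def multihash(name: str, data: bytes) -> bytes:
    """Return <varint code><varint digest length><digest> for data."""
    digest = hashlib.new(HASHLIB_NAMES[name], data).digest()
    return uvarint(CODES[name]) + uvarint(len(digest)) + digest
```

A SHA-256 multihash built this way is 34 bytes long and starts with the bytes @0x12 0x20@. Whether Keep would adopt this format or define its own algorithm tag is an open question that this sketch does not settle.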