Tom Clegg, 03/08/2017 03:57 PM

h1. Changing Keep hash algorithm

(notes moved from #4424)

The md5 algorithm is well known to be insecure. Attackers can force hash collisions on arbitrary data, for example

In the case of Keep, this exposes an obvious vulnerability. If an attacker knows the hash of a block, they could subvert the permission system this way:

# Generate a block that collides with the desired hash
# Upload the collision block and receive a signed token
# Use the signed token to request the block

(We may be able to tighten Keep's behavior to make this attack more difficult, for example by checking byte-for-byte that an uploaded block matches the known block with the same hash.)
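
The byte-for-byte check could look roughly like the sketch below. It is not Keep's actual code: the in-memory @store@ dict, the @put_block@ function and the @CollisionError@ exception are hypothetical names introduced only for illustration.

```python
import hashlib


class CollisionError(Exception):
    """Raised when new data hashes to an existing block but differs in content."""


def put_block(store: dict, data: bytes) -> str:
    """Store data under its md5 hex digest, refusing colliding content.

    `store` is a hypothetical in-memory stand-in for a Keep volume.
    """
    digest = hashlib.md5(data).hexdigest()
    existing = store.get(digest)
    if existing is None:
        # First time this hash is seen: store the block.
        store[digest] = data
    elif existing != data:
        # Same hash, different bytes: never treat this upload as the known block.
        raise CollisionError(digest)
    return digest
```

With a check like this, uploading a colliding block would fail instead of producing a signed token for the existing block. It would not help when the attacker's block is uploaded first, which is the denial-of-service case.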

This is a general vulnerability that attacks the assumptions of content-based addressing, so it seems very likely that there are other, more subtle attack vectors. Another possible attack is denial of service: uploading bogus blocks with specific content hashes and garbage content would prevent a user from uploading the legitimate data with those hashes.

We need to start thinking about moving to a best-practice cryptographic hash function. The first obvious choice would be SHA-1 (used by git), but it is already considered vulnerable, so we should look at SHA-2 or SHA-3.

Fixing this is likely to be somewhat difficult and disruptive: a lot of existing code makes assumptions about the format and length of the content hashes used by Keep, and a stronger hash function would produce longer hashes.
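
To give a rough sense of the length difference, the sketch below compares hex digest lengths using Python's standard @hashlib@. The @MD5_ONLY@ pattern is a hypothetical example of the kind of assumption existing code might make. It is not taken from the Keep code base.

```python
import hashlib
import re

data = b"foo"

# Hex digest length, in characters, for md5 and some candidate replacements.
lengths = {
    name: len(hashlib.new(name, data).hexdigest())
    for name in ("md5", "sha1", "sha256", "sha512", "sha3_256")
}

# Hypothetical validation pattern that assumes a 32-character md5 hex digest.
MD5_ONLY = re.compile(r"[0-9a-f]{32}")

md5_hex = hashlib.md5(data).hexdigest()
sha256_hex = hashlib.sha256(data).hexdigest()

for name, n in lengths.items():
    print(f"{name}: {n} hex characters")

# Code built around the md5-only pattern would reject a longer digest.
print("md5 accepted:", MD5_ONLY.fullmatch(md5_hex) is not None)
print("sha256 accepted:", MD5_ONLY.fullmatch(sha256_hex) is not None)
```

Any code that hard-codes the 32-character length, whether in a pattern like this one, a database column width or a parser, would need to be found and updated before a longer hash could be used.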