Changing Keep hash algorithm

(PA's notes moved from #4424)

The MD5 algorithm is well known to be insecure. Attackers can force hash collisions on arbitrary data; for an example, see http://natmchugh.blogspot.com/2014/10/how-i-created-two-images-with-same-md5.html?m=1

In the case of Keep, this exposes an obvious vulnerability. If an attacker knows the hash of a block, they could subvert the permission system this way:

  1. Generate a block that collides with the desired hash
  2. Upload the collision block and receive a signed token
  3. Use the signed token to request the block

(We may be able to tighten Keep's behavior to make this attack more difficult, such as doing a byte-for-byte check that the uploaded block matches a known block.)
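
As a rough illustration of the byte-for-byte check suggested above, here is a minimal sketch in Python. It is not Keep's actual code: `BlockStore`, `CollisionError`, and the pluggable `hash_fn` parameter are hypothetical names introduced for this example, and the store is just an in-memory dict.

```python
import hashlib


class CollisionError(Exception):
    """Uploaded data hashes to an existing block but has different content."""


class BlockStore:
    """Hypothetical in-memory content-addressed store (not Keep's real API)."""

    def __init__(self, hash_fn=lambda data: hashlib.md5(data).hexdigest()):
        # hash_fn is pluggable so the check can be exercised with a
        # deliberately weak hash, without needing a real MD5 collision.
        self._hash_fn = hash_fn
        self._blocks = {}

    def put(self, data: bytes) -> str:
        digest = self._hash_fn(data)
        existing = self._blocks.get(digest)
        if existing is not None and existing != data:
            # Same hash, different bytes: refuse the upload, so no signed
            # token for the existing block would be handed out.
            raise CollisionError("hash %s already stored with other content" % digest)
        self._blocks[digest] = data
        return digest
```

With this check, step 2 of the attack fails whenever the target block is already stored on the server that receives the upload. It does nothing if the colliding block is uploaded first, or to a server that does not hold the target block.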

This is a general vulnerability that attacks the assumptions of content-based addressing, so it seems very likely that there are other, more subtle attack vectors. Another possible attack is denial of service: an attacker uploads bogus blocks whose garbage content has specific content hashes, preventing a user from uploading legitimate data with the same hashes.

(Note that the above is somewhat overstated, since the known attack here is a chosen-prefix collision, not a collision on arbitrary data. For example, the "obvious vulnerability" would only give access to blocks that were specifically crafted by an attacker who intended them to be vulnerable. But the conclusion stands: MD5 is not a good long-term solution! -TC)

We need to start thinking about moving to a best-practice cryptographic hash. The first obvious choice would be SHA-1 (used by git), but it is already considered vulnerable, so we should look at SHA-2 or SHA-3.

Fixing this is likely to be somewhat difficult and disruptive, since there is already a lot of code that makes assumptions about the format and length of content hashes used by Keep, which would become longer with a stronger hash function.
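
To make the length issue concrete, the sketch below uses Python's standard `hashlib` to compare hex digest lengths. The function name `hex_digest_lengths` and the `MD5_ONLY` pattern are hypothetical; the pattern stands in for the kind of hard-coded assumption that existing code might make, not for any specific piece of Keep code.

```python
import hashlib
import re

# Hypothetical validator of the kind that assumes a 32-character MD5 hex digest.
MD5_ONLY = re.compile(r"^[0-9a-f]{32}$")


def hex_digest_lengths(data: bytes = b"foo") -> dict:
    """Return the hex digest length for each candidate algorithm."""
    names = ("md5", "sha1", "sha256", "sha3_256")
    return {name: len(hashlib.new(name, data).hexdigest()) for name in names}


def accepted_by_md5_validator(name: str, data: bytes = b"foo") -> bool:
    """Check whether a digest from the named algorithm passes the MD5-only pattern."""
    return MD5_ONLY.match(hashlib.new(name, data).hexdigest()) is not None
```

MD5 digests are 32 hex characters, SHA-1 digests are 40, and SHA-256 and SHA3-256 digests are 64, so any code that validates or slices hashes by a fixed length of 32 would reject or mangle the longer ones.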

https://github.com/multiformats/multihash -WV
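
For reference, the multihash format linked above prefixes each digest with a function code and a digest length, both encoded as unsigned varints, so a reader can tell which algorithm produced the hash. The following is a minimal sketch of that layout written from the spec, not the official library. The helper names are hypothetical, and the two function codes (0x12 for sha2-256, 0x16 for sha3-256) should be checked against the multicodec table before being relied on.

```python
import hashlib

# Function codes as listed in the multicodec table (verify before relying on them).
SHA2_256 = 0x12
SHA3_256 = 0x16


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0):
    """Decode an unsigned varint at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def multihash_encode(code: int, digest: bytes) -> bytes:
    """Build <varint code><varint digest length><digest>."""
    return encode_varint(code) + encode_varint(len(digest)) + digest


def multihash_decode(buf: bytes):
    """Split a multihash into (function code, digest)."""
    code, pos = decode_varint(buf)
    length, pos = decode_varint(buf, pos)
    digest = buf[pos:pos + length]
    if len(digest) != length:
        raise ValueError("truncated multihash")
    return code, digest
```

For example, `multihash_encode(SHA2_256, hashlib.sha256(data).digest())` yields 34 bytes: two prefix bytes followed by the 32-byte digest. A self-describing prefix like this would let old and new hash formats coexist during a migration, at the cost of changing the hash format that existing code expects.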