Zombse

The Zombie Stack Exchanges That Just Won't Die

View the Project on GitHub anjackson/zombse

Should Digital Preservationists be Worried about cryptographic hash collisions?

Given the probability of hash collisions, are MD5 hashes sufficient for ensuring file fixity? Or should SHA-1 or SHA-2 be used? Or should folks be computing all three? I would be interested in the issues to consider both for tamper resistance and for simply knowing that what I have is what I think I have.

Trevor Owens


Answer by wizzard0

Overall, I would use 256-bit SHA-2 (SHA-256) for the next 100+ years.
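The "probabilities" part of the question covers two separate risks: accidental collisions and deliberate ones. For accidental collisions, the birthday bound gives a rough estimate. The sketch below assumes each algorithm behaves like a uniformly random function and uses a hypothetical collection of 10^12 files. It models chance only, not attacks, which is where MD5 and SHA-1 actually fall short.

```python
import math


def accidental_collision_probability(num_files: int, digest_bits: int) -> float:
    """Birthday-bound estimate of P(at least two of num_files distinct files
    share a digest), assuming the hash acts like a uniformly random function.
    This models chance collisions only, not deliberate attacks."""
    pairs = num_files * (num_files - 1) / 2
    # P(no collision) ~= exp(-pairs / 2**bits); expm1 keeps precision when tiny
    return -math.expm1(-pairs / 2.0 ** digest_bits)


if __name__ == "__main__":
    # Hypothetical collection of one trillion files
    for name, bits in [("MD5", 128), ("SHA-1", 160), ("SHA-256", 256)]:
        p = accidental_collision_probability(10**12, bits)
        print(f"{name:8} {bits:3}-bit digest: {p:.1e}")
```

Even for 128-bit MD5 the accidental figure is on the order of 10^-15, so digest length alone is rarely what matters for "is what I have what I think I have". The tamper-resistance concern comes from collision attacks, and this estimate does not capture them.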

For tamper resistance, using any two hash algorithms together (even if one of them is broken) makes an attack far harder, because a forged file would have to match both recorded digests at once. So calculating MD5 (because it is already widely used) in addition to SHA-256 should be more than enough.
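Here is a minimal sketch of recording and checking both digests with Python's standard `hashlib`. The function names `file_digests` and `fixity_ok` are hypothetical. The file is read only once, and each chunk goes to every hash object, so adding MD5 alongside SHA-256 costs no extra disk I/O.

```python
import hashlib


def file_digests(path, algorithms=("md5", "sha256"), chunk_size=1 << 20):
    """Read the file once, feeding each chunk to every requested hash."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def fixity_ok(path, expected):
    """True only if every recorded digest (e.g. both MD5 and SHA-256) matches."""
    actual = file_digests(path, algorithms=tuple(expected))
    return all(actual[name] == value for name, value in expected.items())
```

At ingest you would call `file_digests(path)` and store the resulting dictionary. Later you would pass it back to `fixity_ok(path, stored)`, which rechecks every algorithm present in the stored record.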


Answer by Nick Krabbenhoeft

MD5 is broken, and SHA-1 will be crackable within five years (I especially like the speculation about criminal organizations harnessing a botnet to crack SHA-1 cheaply). Because archives help establish the historical record and have often been attacked or manipulated to alter that record, I don't doubt that digital archives will also be the target of attacks aimed at removing or altering it.

Hash algorithms will continue to be broken over time, so digital preservation systems will need a graceful way to calculate new checksums for stored objects and store them in their database alongside the earlier checksums. Retaining hashes from a variety of algorithms will also make it harder to craft a file that collides with the original.
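One way to sketch this kind of graceful migration rests on a few stated assumptions. `FixityRecord`, `verify` and `add_algorithm` are hypothetical names. The object is held in memory as bytes for brevity, where a real system would stream it. Algorithm names can be anything `hashlib.new` accepts, such as `"sha3_256"`. The key design choice is that a new digest is added only after the object still matches every digest already on file, and old digests are never overwritten.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FixityRecord:
    """Hypothetical per-object record: one digest per algorithm, kept indefinitely."""
    object_id: str
    digests: dict = field(default_factory=dict)  # algorithm name -> hex digest


def compute(data: bytes, algorithm: str) -> str:
    return hashlib.new(algorithm, data).hexdigest()


def verify(record: FixityRecord, data: bytes) -> bool:
    """Check the object against every digest on file (vacuously True if none)."""
    return all(compute(data, alg) == value for alg, value in record.digests.items())


def add_algorithm(record: FixityRecord, data: bytes, algorithm: str) -> None:
    """Record a digest for a newer algorithm alongside the existing ones."""
    # Refuse to bless content that no longer matches the older digests
    if not verify(record, data):
        raise ValueError(f"{record.object_id}: fails existing fixity check")
    # setdefault never overwrites a digest that is already on file
    record.digests.setdefault(algorithm, compute(data, algorithm))


if __name__ == "__main__":
    data = b"example object"
    record = FixityRecord("obj-0001")
    add_algorithm(record, data, "md5")
    add_algorithm(record, data, "sha256")
    # Years later, when a stronger algorithm is adopted:
    add_algorithm(record, data, "sha3_256")
    print(sorted(record.digests))
```

Checking the old digests first means a file that was already altered does not get a fresh digest that would make the alteration look legitimate.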
