Zombse

The Zombie Stack Exchanges That Just Won't Die

What role can forward error correction techniques like erasure coding play in designing digital preservation storage system architectures?

There is a lot of general discussion about keeping multiple copies of digital content. I have heard much less, however, about forward error correction techniques such as erasure coding. As I understand it, these techniques could serve some of the same values and goals that drive decisions about how many copies of a digital file to preserve. At the same time, the fact that the original object is transformed when it is stored is a bit disconcerting.

Trevor Owens

Answer by Chris Adams

These are very interesting for reducing storage costs: here's a Microsoft Research paper describing how they used a new erasure coding scheme to dramatically reduce the raw disk requirements of the Azure Storage system to about 1.3 times the size of the original data:

http://research.microsoft.com/en-us/um/people/chengh/papers/LRC12.pdf

When you note that they did this on an exabyte-scale deployment that adds a petabyte every few days, the cost savings are appreciable, and cost is an under-appreciated preservation risk.
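
To make the 1.3x figure concrete, here is a rough sketch of the arithmetic compared with keeping three full copies. The 12+4 fragment layout is an illustrative assumption chosen because it lands near 1.3x; check the paper for the scheme actually deployed. The function name is hypothetical too.

```python
def storage_overhead(data_fragments: int, redundant_fragments: int) -> float:
    """Raw bytes stored per byte of user data."""
    return (data_fragments + redundant_fragments) / data_fragments


# Three full copies: one "data fragment" plus two redundant copies of it.
replication_3x = storage_overhead(1, 2)

# Assumed layout for illustration only: 12 data fragments + 4 parity fragments.
erasure_12_4 = storage_overhead(12, 4)

# Fraction of raw storage saved by the erasure-coded layout.
savings = 1 - erasure_12_4 / replication_3x

print(f"3 copies:          {replication_3x:.2f}x raw storage")
print(f"12+4 erasure code: {erasure_12_4:.2f}x raw storage")
print(f"saved vs 3 copies: {savings:.0%}")
```

This only compares raw capacity. It says nothing about how many simultaneous failures each layout survives, or about the cost of rebuilding lost fragments.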

That said, ideally I would keep this below the level of digital preservation (DP) architecture because it's really an implementation decision, much like a RAID scheme or filesystem layout. I would focus on setting the high-level reliability guarantees and periodically validating those contracts: I might choose to store some data on Azure or S3, some on tape or quartz crystals, etc. Erasure coding would be a great way to reduce the cost of those storage services with no user-perceptible change other than the lower price. My real focus would then be on inventory and high-level audits.
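
As a minimal sketch of what such a high-level audit could look like, the code below checks an inventory of expected SHA-256 checksums against whatever a storage service hands back, without caring how that service stores the bytes. The manifest format (relative file name mapped to a hex digest) and the function names are assumptions for illustration, not part of any particular preservation tool.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large objects need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(manifest: dict[str, str], root: Path) -> dict[str, str]:
    """Compare each inventoried file under root against its expected checksum."""
    report = {}
    for relative_name, expected in manifest.items():
        target = root / relative_name
        if not target.is_file():
            report[relative_name] = "missing"
        elif sha256_of(target) != expected:
            report[relative_name] = "mismatch"
        else:
            report[relative_name] = "ok"
    return report


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.txt").write_bytes(b"preserve me")
        # Hypothetical inventory: one file that exists, one that was never stored.
        manifest = {"a.txt": sha256_of(root / "a.txt"), "b.txt": "0" * 64}
        print(audit(manifest, root))
```

The same check works whether the copy underneath is replicated, erasure coded, or on tape, which is the point of keeping the coding scheme below the architecture level.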

Pushing its use up to a higher level would allow recovery from disparate file fragments, but the requirement to use a niche tool for reassembly seems like a high risk without much benefit: if you can't rely on at least one storage system staying available, this detail will probably be the least of your problems.
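
For a sense of what fragment-level reassembly involves, here is a toy sketch using a single XOR parity fragment (the same idea as RAID 5), not the Reed-Solomon or LRC codes that production systems use. All function names are hypothetical. It also shows that in a systematic layout like this one the data fragments are plain slices of the original bytes; only the parity fragment is derived.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal fragments (zero padded) plus one XOR parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def recover(fragments: list[Optional[bytes]], original_length: int) -> bytes:
    """Rebuild the original bytes when at most one fragment (marked None) is lost."""
    missing = [i for i, fragment in enumerate(fragments) if fragment is None]
    if len(missing) > 1:
        raise ValueError("single parity can rebuild at most one lost fragment")
    repaired = list(fragments)
    if missing:
        survivors = [f for f in fragments if f is not None]
        # XOR of all surviving fragments equals the missing one.
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    # Drop the parity fragment and strip the padding.
    return b"".join(repaired[:-1])[:original_length]


message = b"digital preservation"
stored = encode(message, k=4)
damaged: list[Optional[bytes]] = list(stored)
damaged[2] = None  # simulate one storage location being lost
assert recover(damaged, len(message)) == message
```

Even in this toy version, reassembly depends on knowing the fragment order, the fragment count, and the original length, which is the kind of out-of-band detail that makes a niche tool a preservation risk.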