Zombse

The Zombie Stack Exchanges That Just Won't Die

View the Project on GitHub anjackson/zombse

What services do libraries need from public cloud storage before they are trusted for long-term archival?

Here's what's bothering me today...

Gail Truman

Comments

Answer by nik

I think that one of the biggest problems with public cloud storage is that you have limited control over your data, i.e. you don't have physical control over it.

While you may have all the SLAs and long-term assurances, what happens when the cloud hosting company goes bankrupt or is bought out by another company? This is especially a problem for very long term projects such as archiving.

Comments

Answer by argotechnica

Whether by "public cloud storage" you mean e.g. Box.com, Dropbox, etc. (consumer-grade services); or you mean e.g. Amazon Glacier, which is specifically built for long-term archiving; then maybe there's no reason you couldn't use public cloud storage today if your "assurances for durability" were adequate, e.g. through redundancy across multiple services.

Kindura Project took the approach of using "a hybrid cloud, shared service and in house model" in which the prototype:

provides Infrastructure-as-a-Service (IaaS) components, via storage and compute services, but more importantly it will combine these, using DuraCloud and Fedora as enabling technologies, to provide an integrated Software-as-a-Service (SaaS) package of repository-centric services.

So, what @Chris said: your own strategy must ensure reliability, so redundancy is maybe a way to use cloud storage today.

But it sounds like what you're really looking for is "Archiving as a Service" and wondering, what would make such a thing possible? Zoolz, which uses Glacier, could be a good place to start. Or check out this ACM paper about cloud archiving: https://dl.acm.org/citation.cfm?id=1940782. (Includes a case study "using the records transfer process from Japanese government agencies to the National Archives of Japan as an example.")

In any case, the way you figure out "AaaS" still goes back to your archiving strategy. What problems are you actually trying to solve, and what makes cloud attractive/unattractive as a result? This sounds like an institution-by-institution question to me.

Comments