Zombse

The Zombie Stack Exchanges That Just Won't Die

View the Project on GitHub anjackson/zombse

When using a Fedora Repository what are the relative merits of storing files inside and outside the system?

So Fedora is intended as a framework for building digital repositories. With that said, some organizations chose to store digital objects and files inside the Fedora system while others chose to reference them from inside Fedora. What are the relative merits of each of these options? In particular, what preservation risks do you mitigate and expose your self to with either option.

Trevor Owens

Comments

Answer by Greg Jansen

One has to assume that in a preservation repository the external datastreams are also managed in some sense. So the risk will depend upon how well they are managed. A local solution always implies that you sustain local knowledge of the system and so there is some degree of risk in that. In contrast, the Fedora system knowledge is sustained by the DuraSpace organization and user community.

In general there is more likely to be a complete and accurate audit trail for managed data streams, since changes come in via the Fedora APIM. Timestamps in Fedora will match up with when the data was written to storage. If configured properly, Fedora will calculate checksums whenever the data stream is written. The data will also be protected by the Fedora security layer; for instance delete may be forbidden by FeSL policies. External data streams will have to be protected by some other means.

One disadvantage of managed datastreams is the ingest process. If Fedora manages the data, then Fedora must stream the data into storage. The data moves through the Fedora server. Unless you enhance the Fedora ExternalContentManager module, this means data is uploaded to the Fedora server over HTTP or HTTPS. For larger data this is a problem. Hopefully this constraint will go away at some point, allowing for other ingest strategies. There are workarounds. At UNC we extended the ExternalContentManager to support ingest from iRODS locations.

Having to physically move the data, as opposed to a logical move, seems like a risk to me. It necessitates a checksum comparison, one calculated before ingest and one calculated by Fedora.

Comments