Zombse

The Zombie Stack Exchanges That Just Won't Die

View the Project on GitHub anjackson/zombse

Are there inherent benefits to using open source software to manage content for digital preservation?

Does the open and transparent nature of open source software translate into an inherent benefit for digital preservation? It strikes me that some of the kinds of factors noted in the Sustainability of Digital Formats points can be more broadly applied to the stack of software that one uses to manage copies for digital preservation. It strikes me that ideas from the sustainability of formats, like Disclosure, Transparency, Self-documentation, and External dependencies, are all potential inherent benefits to digital preservation work.

So, do you think there is an inherent benefit? Or, does the fact that many of these systems act as appliances mean that it doesn't really matter? Or, do you think there are particular kinds of open source software (say some tool you are using to normalize a set of files, or some platform you are using to manage content and metadata itself) where this matters, and other kinds of software where it would be irrelevant if it was or wasn't open source.

Trevor Owens

Comments

Answer by Chris Adams

Yes. The primary argument for me is the combination of two facts: as with other data software is brittle, requiring regular maintenance, and libraries are not generally major power brokers in the commercial software field.

Software companies have a strong incentive to build applications for as wide a market as possible, leaving libraries with very limited buying power, and as with other areas of preservation, decisions are made based on current market realities and whims. We've seen this with systems like Apple's QuickTime which had very complex functionality (e.g. embedded Flash & other rich, programable content) added and later removed after falling into disfavor — certainly the right business decision and definitely a security benefit but completely stranding the small number of people with a significant investment in content which actually used those features.

At the other end of the market, a small vendor may go out of business or have key personnel retire, leaving all of their uss stranded with no option other than hoping an alternative is available and that the migration path is not too rocky.

Open source software helps with several of these challenges:

  1. In the simplest case, it means that it's impossible for a vendor to simply disappear. It might not be cheap but at least you have the option of hiring a software developer to maintain or improve an open source application.

    Corollary: you really want to make sure that you actually save a copy of the source for anything you use if the current maintainer starts to seem shaky.

  2. There's frequently a huge difference between a format as specified and as practiced. Using an entirely open-source stack now means that you have a known reference implementation from the beginning rather than finding out later that your data really only works well with one particular implementation. If you are forced to migrate later, it can be extremely helpful to have a known working codebase for reference.

    On a related, open source software now almost always implies an open version control system which makes it much easier to determine where and how something stopped working. There's one other interesting aspect: the dominance of distributed version control systems in the open source world is particularly powerful because it breaks the assumption of a single privileged maintainer, making it safer for your organization to maintain local customizations against upstream releases. Handling local customization gracefully traditionally has been a weak point for commercial software vendors for various reasons but it's integral to working in the open source world.

  3. There's a significant related advantage: open source software is usually considerably cheaper and you have cost saving options if you're not paying support but can perform some work in house – this is a benefit to the extent that it either frees budget for more dire preservation work and avoids unpleasant surprises down the road.

All of these generally shift as a function of how close they are to your critical path: if you need a particular viewer for a file format, it's far more important than, say, a search engine on your storage system since the latter operates independently and can be swapped out far more easily.

This strongly rewards keeping your architecture simple and standard because you lower the chances that you'll need to start supporting something which has fallen out of popularity within the community. For example, storing items in a database is far riskier than storing the same content on a filesystem simply because it introduces a massive chunk of complex, possibly expensive, software which has to work simply so you can retrieve a file and the lack of a standard interface would mean that you need to develop or pay for support in anything which needs to access the data.

Comments

Answer by Nicolas Raoul

I totally agree with Chris Adams, and just want to add one thing:

Open Source software usually have better interoperability.\ There are a few reasons for that:

No secrecy: For instance, Microsoft has long been reluctant to disclosing what the structure of their Office files' data/metadata, whereas OpenOffice has always published the specification of their formats. Almost all open source software is the work of a community, and this community needs all specifications to be open, otherwise it can not work. There is no secret, whereas non-open-source companies usually have secrets that they use as a strategic advantage.

Preference for Open standards: When an open source software needs to store something or communicate over a network, it will use an open standard/protocol if one is available. Users of other software using the same standard can easily test and report incompatibities.

Number of protocols supported: Open source software tend to support more protocols/formats that their counterparts. For instance: Our company needed to import data from a specific rare format. The software did not support it, so we paid some company to implement it, and now it is part of the software. That would not happen with non-open-source software.

Comments

Answer by Euan

If you have specified your requirements clearly and thoroughly enough then it shouldn't matter whether the solution you use is open-source or not. For example, if you specify that any files created by the software you want to use must be in open formats and have an open-source rendering application available, then you shouldn't have too many problems with making sure you can access files created by that software in the future.

On the other hand, it is very difficult to be sure that you have specified your requirements sufficiently well to anticipate all possible issues. So it might be advantageous to make sure you have a solution with a significant degree of flexibility, which will often be an open source solution.

Comments

Answer by Michael Kjörling

There is one more benefit of most open source software compared to several commercial/proprietary software packages these days, with regards to ensuring continued and uninterrupted access to content. Mandatory software activation, particularly but not exclusively when taken in combination with proprietary file formats. This is being used by more and more large software companies, and means that even if you have a valid license, have the software media, hardware on which to run it etc., if the vendor decides to no longer support your particular version, you may very well be out of luck; you cannot install and use it independent of the software vendor.

Free/open source software can sometimes be a bit rough around the edges; it might lack the polished two-click installer; and the user interface might not be as well designed as that of a commercial software product. But you can document and create procedures around that, and the same version will still work, and work the same way, two, five, ten or even twenty years from now, assuming that you have a hardware and software environment within which it can run. It can be installed, reinstalled and re-reinstalled as many times as needed on as much new hardware as is needed. And should it become necessary, the environment (hardware and software) can often be emulated, by way of emulation, virtualization or API translation; all three are viable technologies in varying situations. All that completely independently of the person or company who originally wrote the software.

To gain the full value from this, however, you have to consider the entire software stack at least from the operating system on up. A good first step for simulating something like this could be to try to install a usable copy of the software on a new computer that has no Internet connection and without contacting the software vendors in any way. If you can do that, and not end up with an installation that stops working after a short while, then you're probably okay.

Comments

Answer by gmcgath

"Inherent" may be too strong a word. Open source provides the benefits which others have mentioned. Commercial software provides a commitment by a company that needs to honor it to keep attracting customers, while people doing open source can drift away. On the other hand, commercial companies can also lose interest in their product and hope that technology lock-in gets customers to keep renewing their licenses.

You have to look at all the factors, the most important of which is the quality of the software regardless of where it comes from.

Comments