Zombse

The Zombie Stack Exchanges That Just Won't Die


In what cases should a library or archive plan on emulation as part of a digital preservation strategy?

At what point, or under what conditions, should a library or archive plan on emulation as a strategy for viewing and/or making accessible particular kinds of content?

Trevor Owens


Answer by Euan

Trevor, I’m not sure if it was intended but your question could be interpreted in many ways. I’ll attempt below to answer most of the interpretations I’ve identified. The question was:

At what point, or under what conditions, should a library or archive plan on emulation as a strategy for viewing and/or making accessible particular kinds of content?

At what point should a Library or Archive decide what their digital object interaction or access strategy should be?

Ideally Libraries and Archives should know what their interaction or access strategy will be before the objects that they will be handling are created. Knowing their strategy at such a point would enable them to provide guidance to creators that would:

  1. Enable the creators to create without unnecessary limitations on which software or formats they should use (formats can restrict functionality)
  2. Enable efficient and effective continued long term access to the objects
  3. Not be cost-prohibitive over the projected life of the objects to either the creator or future steward of the digital objects

In reality both Libraries and Archives rarely have a sufficient degree of influence over the creation of the content that they end up handling to be able to enforce the application of any guidance they might want to give. As a result it is worthwhile to consider other points at which Libraries and Archives might benefit most from deciding upon their interaction/access strategy for their digital objects. If emulation were to be enacted as a long term access/interaction (I’ve substituted interaction for viewing here as I suspect that is what Trevor is more interested in) strategy then the next best time for the Library or Archive to decide to do this is likely to be before they accept items for transfer to the Library or Archive. Making the decision at that point enables a number of key steps to be included in the transfer process that will make enacting the emulation strategy much easier:


  1. Intended rendering applications/environments can be identified via the transferring agency or donor.
  2. Software licenses and binaries can be transferred to the Library or Archive.
  3. Standard rendering/interaction environments (e.g. “all of government desktops”) can be snapshotted and moved to emulated hardware for use in interacting with or accessing the objects that are to be transferred.

All of these steps are much more difficult and costly, or impossible at a later date.

How late can a Library or Archive leave deciding what their digital object interaction or access strategy should be?

The answer to this question will depend on what the impact will be of a Library or Archive leaving their decision until very late. What might the negative consequences be for a Library or Archive if they were to leave making their decision about what their digital object interaction or access strategy should be till the point at which someone needs to view or access their objects? If they were to use emulation as their strategy at that point then:

  1. They might struggle to know which software to emulate for interacting with the objects (i.e. they may not be able to ascertain what the intended interaction environment should be).
  2. There may be no emulator available for the particular environment they want to emulate, and:

    a) No documentation may be left to enable them to create a new emulator for that environment

    b) A significant cost may need to be incurred to create a new emulator for that environment

    c) No documentation may be left to enable use of the emulated environment and effective interaction with the object by non-specialists.

What might the positive consequences be for a Library or Archive if they were to leave making their decision about what their digital object interaction or access strategy should be till the point at which someone needs to view or access their objects?

  1. Tools to identify original creation environment and associated default interaction environment may have been created and developed to a point of high functionality.
  2. Simple-to-use emulation services may be available at a low cost (e.g. submit the object and it is remotely emulated and provided for interaction via a browser or local app).
  3. End-user software documentation may be available and integrated with emulation tools and services as a “help-layer” enabling easy use of old software by users who have moved on to new interaction paradigms.

The consequences of leaving such a decision to such a late point when using other strategies will be different. When deciding which strategy your Library or Archive will use it will be worth considering the consequences of the decision with respect to all possible choices.

Under what conditions should a library or archive choose an emulation-based solution as its digital object interaction or access strategy rather than an alternative?

There are a number of ways to answer this question. Firstly, a Library or Archive should always choose an emulation based solution as its digital object interaction or access strategy when there are no feasible alternatives. When might that be the case?

At what point does emulation become the only option as a digital preservation strategy?

There is potential for an emulation solution to become the only option for a Library or Archive to use as their interaction/access strategy for their digital content. Such a situation might arise if the following conditions are met:

  1. necessary interaction software still exists
  2. an emulator to run the software exists or can be constructed and tested using original compatibility tests (for more information on how to know when to trust an emulation solution see here.)
  3. using original hardware and software is not an option
  4. no alternative interaction or viewing tools are available (such as migration/normalization tools).

Many files have been created with structures that are not openly documented and for which the only tool available for interacting with or viewing the content captured in that structure is the original software. In such cases, assuming the documentation outlining the way the software writes the files cannot be acquired, using the original hardware is not an option, and the original interaction software can be loaded into an emulated environment, then emulation will likely be the only option left available.
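The four conditions above can be read as a simple conjunction. The following is only a minimal sketch of that reading; the `AccessOptions` record and its field names are hypothetical and introduced here purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class AccessOptions:
    """Hypothetical summary of what is available for one class of objects."""
    interaction_software_exists: bool
    emulator_available_or_buildable: bool
    original_hardware_usable: bool
    alternative_tools_available: bool  # e.g. migration/normalisation tools


def emulation_is_only_option(options: AccessOptions) -> bool:
    """Return True when all four conditions listed above hold together."""
    return (
        options.interaction_software_exists
        and options.emulator_available_or_buildable
        and not options.original_hardware_usable
        and not options.alternative_tools_available
    )


# An undocumented format whose original software survives, with no working
# original hardware and no migration tool: emulation is the only route left.
undocumented_format = AccessOptions(
    interaction_software_exists=True,
    emulator_available_or_buildable=True,
    original_hardware_usable=False,
    alternative_tools_available=False,
)
print(emulation_is_only_option(undocumented_format))
```

If any one of the four fields flips, the function returns False: either another strategy is still feasible, or emulation itself is not yet possible.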

It may be possible to use reprogrammable hardware such as a Field Programmable Gate Array to recreate the original hardware in programmable hardware. But for all intents and purposes this is another (potentially faster) version of emulation.

When should a Library or Archive choose emulation as their interaction/access strategy only for particular kinds of content and another strategy for other kinds of content?

Digital object interaction/access strategies that rely solely on emulation will never be appropriate for interacting with or accessing object content that involves a hardware component. In such cases, for example when replicating the full experience of a video game as played on a particular console, emulation alone will not be enough for accessing/interacting with the content. Replica hardware, such as game controllers, may need to be produced in order to maintain the full content experience. The advent of inexpensive microcomputers such as the Raspberry Pi and inexpensive 3D printers such as the MakerBot may make this option much more viable and prevalent in the near future.

A migration strategy that is able to be tested and confirmed for every object it is used to access may be more practical in some circumstances for use as a viewing/interaction strategy. For example, if processing a lot of simple objects that have few components to them, migration may be more economical. Unfortunately most migration tools and processes currently do not incorporate comprehensive testing procedures, and anecdotal evidence suggests that migration processes can often destroy content. In such cases the processes are effectively failing at adequately enabling viewing or interaction with the object, as the object no longer exists in the same form post migration.

Most good migration candidates are simple objects that often can still be accessed by current software such as text-based formats and simple images. As they can often still be accessed by current software there is a question to be asked as to why the Library or Archive would bother migrating them at all.

Under what circumstances would emulation be a preferable option to the alternatives for accessing or interacting with content?

In almost all cases employing an emulation strategy for interacting with or viewing old digital content will be preferable to any alternatives. Emulation gives a richer, more authentic experience that can often evoke more emotion and feeling in the viewer than just seeing some raw data would. When coupled with options to enable the user to print the emulated content they are viewing, to copy text and images from the object into a modern environment, to print the content to a file, or to use the original rendering software to create a new file based on the old file but using a different save-as parameter (e.g. from a .doc file to a .rtf), the emulation experience is often much fuller and more gratifying than the alternatives.

If digital evidence were being presented in a court of law, I would personally want to see the content presented using the original software, rather than whatever content was left after the original file(s) had been run through a migration process (i.e. see the actual content rather than manipulated/different content). I have blogged about this here. In fact this would be my preference in most if not all circumstances in which something important in my life hinged on some old digital content. Whether or not this is a widely held opinion still needs to be tested; however, the Rules of the High Court of New Zealand were recently updated to include references to the need for preservation of “native electronic documents” when such objects are submitted as evidence. A native electronic document, or native file format, is defined as “an electronic document stored in the original form in which it was created by a computer software program”. This may indicate a greater recognition of the need to enable access to/interaction with unaltered versions of digital objects.

Emulation may turn out to be cheaper and economically “fairer” than alternative options.

Emulation may have two economic advantages over alternative methods:

Once emulation services are in place, the processes that Libraries or Archives need to implement if choosing an emulation strategy will likely be less onerous than the alternatives. The essential steps for implementing an emulation strategy in such an organisation are to know what environments they need for interacting with all their digital objects and to make sure a service exists that provides those environments. The organisations can then choose which business model to operate under to enable access to/interaction with the objects. They might, for example, have a few workstations in their buildings for users to use at no cost for interacting with the objects, and/or they may have a limited set of remote-access licenses. In such cases the end-users would have an option for no-cost access to the objects, but if they wanted to they could also pay for a remote service themselves.

In addition to these optional business models and the low initial cost for Libraries and Archives, emulation tools and services are likely to cost less to maintain over time. A single emulator can be used to maintain thousands of emulation environments (or more). The long term maintenance costs will relate to migrating the emulator to new host environments over time. Initiatives such as Dioscuri, the modular emulator, offer mechanisms for lowering this cost by removing the need to migrate much of the code base. Most importantly though, the cost of migrating the one emulator to enable the continued emulation of thousands of environments, which in turn enables continued access to millions/billions of digital objects, is likely to be much less than the cost to continually migrate each individual digital file over time.
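To make the shape of that cost argument concrete, here is a rough back-of-the-envelope sketch. Every figure in it is a made-up placeholder chosen only to illustrate the arithmetic, not a measured or published cost, and the function names are hypothetical. It assumes one emulator port per "cycle" (each time the host platform changes) versus one migration of every file per cycle.

```python
def emulator_upkeep_cost(cycles: int, cost_per_emulator_port: float) -> float:
    """Total cost of porting ONE emulator to a new host, once per cycle."""
    return cycles * cost_per_emulator_port


def file_migration_cost(cycles: int, n_files: int, cost_per_file: float) -> float:
    """Total cost of migrating EVERY file, once per cycle."""
    return cycles * n_files * cost_per_file


def break_even_files(cost_per_emulator_port: float, cost_per_file: float) -> float:
    """Collection size above which per-file migration costs more per cycle."""
    return cost_per_emulator_port / cost_per_file


# Illustrative placeholder numbers only (assumptions, not real data).
CYCLES = 5                  # host-platform changes over the period considered
PORT_COST = 200_000.0       # cost of one emulator port
PER_FILE_COST = 0.25        # cost to migrate and check one file
N_FILES = 10_000_000        # size of the collection

emulation_total = emulator_upkeep_cost(CYCLES, PORT_COST)
migration_total = file_migration_cost(CYCLES, N_FILES, PER_FILE_COST)

print(f"emulation: {emulation_total:,.0f}")
print(f"migration: {migration_total:,.0f}")
print(f"break-even collection size: {break_even_files(PORT_COST, PER_FILE_COST):,.0f} files")
```

The point of the sketch is only that the emulation cost does not grow with the number of files while the migration cost does, so the comparison tips towards emulation as collections get larger. Real costs for either approach would need to be estimated for the institution in question.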

Furthermore, with emulation you are always able to open objects structured with a particular format. With migration/normalisation you may have to decide how long you want to maintain a migration path for. So, for example, if in 100 years' time someone comes to you with a WordStar 6 file that they found on a server of legacy objects, there may no longer be a migration path available for it. On the other hand, if emulation had been the chosen strategy for those files, then the file will still be accessible using the emulated environment used for the other WordStar 6 files that still exist in your Library or Archive.

This leads us to the second reason why emulation may be a preferable option for your Library or Archive: it may be economically fairer. Emulation, unlike continued migration or normalisation, enables Libraries and Archives to pass the access/interaction costs on to the generation that is using the objects. Emulation is applied just in time rather than just in case, and so enables the cost to be borne by the user (or the institution) at the time of use rather than by whoever is there whenever the files need to be migrated, or by the agency when the file is transferred and normalised.

This is a good point at which to address normalisation. Functionally, normalisation is the same as migration, but conducted when the objects first come into the possession of the Library or Archive. Eventually all normalised files will also have to be migrated, just as migrated files will eventually have to be re-migrated when the new files are no longer accessible. Any decent migration strategy should be migrating content to open formats, and so should eventually produce the same results as an (open standards based) normalisation strategy. The two strategies can therefore be equated and, over the long term, fall prey to the same criticisms mentioned elsewhere in this answer.

So to readdress the questions Trevor asked:

At what point, or under what conditions, should a library or archive plan on emulation as a strategy for viewing and/or making accessible particular kinds of content?

A Library or Archive should decide on their strategy as early as possible to gain the greatest possible economic benefit. They should choose emulation in most circumstances as it provides a better outcome, and most definitely when they have no other choice. They will not be able to choose emulation alone as a strategy for all content (e.g. content that has a physical component). However, if they do choose emulation then it is likely that they will be economically better off, and the costs of implementing their strategy will be more fairly distributed to the users that actually benefit from it.

In general though the most important things a Library or Archive can do right now to keep their options open in regards to access/interaction strategies are as follows:

Emulation tools and services have the potential to provide an amazingly rich digital history experience. It is my hope that one day emulation will be the default and invisible option used for accessing older digital content. Objects will automatically be opened in their intended interaction environment and users will be able to extract data and gain insight from the original content in the intended context. Hopefully my answer here helps convince others to get behind this vision also.
