stub

What to Preserve?

Underline the assumptions: Keeping Codes, formats as malleable, etc.

The bitstream: “…the bitstream is a powerful abstraction layer…”

http://blogs.loc.gov/digitalpreservation/2013/12/bitcurators-open-source-approach-an-interview-with-cal-lee/
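
To make the point concrete, here is a minimal sketch (my own illustration, not taken from the interview) of treating any file purely as a bitstream. The toy checksum stands in for the MD5/SHA-256 fixity digests a real repository would use:

```c
#include <stdio.h>
#include <stdlib.h>

/* At the bitstream layer every digital object, from PDF to disk
   image, is just the same kind of byte sequence, so fixity can be
   managed without knowing anything about formats. */
int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return EXIT_FAILURE;
    }
    FILE *f = fopen(argv[1], "rb");
    if (f == NULL) {
        perror(argv[1]);
        return EXIT_FAILURE;
    }
    unsigned long length = 0, digest = 0;
    int byte;
    while ((byte = fgetc(f)) != EOF) {
        length++;
        digest = (digest * 31 + (unsigned long)byte) & 0xFFFFFFFFUL;
    }
    fclose(f);
    /* Length plus digest: a format-agnostic identity for the stream. */
    printf("%lu bytes, checksum %08lx\n", length, digest);
    return EXIT_SUCCESS;
}
```

If the length and digest match tomorrow, the bitstream has survived, whatever the bytes happen to mean.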

Use, only then are we known: http://theodi.org/blog/five-stages-of-data-grief

Also, Rothenberg, Bearman, UVC http://iwaw.europarchive.org/05/papers/iwaw05-hoeven.pdf KEEP VM http://www.keep-project.eu/downloads/training/09kvm.pdf etc. XCL has basically the same flaw.

Time travel - Passive history is needed too.

Preserve the Mona Lisa by duplication? Delete the original?

  • The Performance Spectrum, just the one. http://notepad.benfinoradin.info/2012/08/28/take-a-picture/

The Migration Line

Leading on from The Stack, and complementing the Performance and Information spectrum.

Migrations And Emulations

  • Strategies are where you draw the migration bubble on the stack.
    • Standards Reliance is the implied default.
    • Also, migrating to a supported environment is a double bubble.

Rothenberg's vernacular (Middle English) example is about the perception side. The O-ring data example is about the extrapolation, not the data: a very different kind of major migration loss. Note that Rothenberg's vernacular extraction example, cut-and-paste from an emulator, is itself a migration.

Software developers have been doing digital preservation for years. The most common ‘preservation action’ is porting. Calling it ‘source code migration’ underplays its importance.

http://drj11.wordpress.com/2013/09/01/on-compiling-34-year-old-c-code/
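
As a hypothetical illustration of the kind of change such porting involves (this example is mine, not from that post), here is a 1979-style K&R function definition brought up to standard C so that a modern compiler can type-check every call:

```c
#include <stdio.h>

/* The 34-year-old idiom: an unprototyped K&R definition with an
 * implicit int return type, e.g.
 *
 *     add(a, b)
 *     int a, b;
 *     {
 *         return a + b;
 *     }
 *
 * Ported to standard C with an explicit prototype: */
static int add(int a, int b)
{
    return a + b;
}

int main(void)
{
    printf("2 + 3 = %d\n", add(2, 3));
    return 0;
}
```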

Backwards compatibility: new versions run old stuff. Breaks in this, e.g. deprecation, are the issue. REF that high-change-frequency chart from last year's iPres.
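
A concrete case of such a break from C itself: gets() was deprecated in C99 and removed outright in C11, so old code that used it no longer builds against the current standard. A minimal sketch of the usual port:

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[256];

    /* Old code: gets(line); -- removed in C11 because it cannot
       bound its input. The standard replacement is fgets(): */
    if (fgets(line, sizeof line, stdin) != NULL) {
        line[strcspn(line, "\n")] = '\0'; /* drop the trailing newline */
        printf("read: %s\n", line);
    }
    return 0;
}
```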

ref The First Preservation Strategy: Format…

Spectra

Performance and Information

To outline the two halves of the communication, the performance and the understanding, or reference another page covering this.

That the difficulties in preserving access to digital media arise primarily because access to the items is mediated, not because they are digital. The same can be said of needing a speaker of an obscure language to understand a book. You can either keep the language alive (emulation), or translate the book (migration).

That RI is of two kinds:

  1. Performance RI, that is the information required to reconstruct a particular interpretation of a digital artefact, i.e. to regenerate a signal to be interpreted by an actor outside the system. e.g. rendering a PDF.
  2. Perception/Understanding RI, that is any further information required to support the actor in understanding the preserved signal.

That actor-RI is subjective. Its consequences are that it should be layered rather than designed up-front, and that it turns into educational resources if you wait long enough.

IMPORTANT: Identifiers and distinguishing ‘this version created the item’ from ‘this is the intended interpretation’. e.g. DOC is tricky here, as the writer's version can be confused with the reader's version?

Then the choice is the spectrum between attempting to preserve the performance precisely, and therefore avoiding the interpretation side, or attempting to preserve ‘the information’, trusting that we can judge whether a new performance achieves this.

What is a document etc: http://people.ischool.berkeley.edu/~buckland/whatdoc.html http://inkdroid.org/journal/2013/10/15/suzanne-briet-on-ada-lovelace-day/

http://www.theatlantic.com/health/archive/2013/02/our-comprehensive-living-archive-of-apples/273538/

Possibly a war story: http://gigaom.com/2014/01/14/the-search-for-the-lost-cray-supercomputer-os/

Continuity and Archaeology

Another spectrum, from continuous, sustainable access to leaving it and digging it up later.

Example: ideally, we just keep the stuff and allow researchers to dig it up.

Permanent Access Digital Continuity …

Use and Usability

A fundamental aspect of knowing what we want to preserve is whether we are more concerned with the way a piece of software was used, or with the ability to use the software.

  • Choosing the performances: reference the Wikipedia blackout case study
  • The Performance Spectrum, just the one.
    • http://notepad.benfinoradin.info/2012/08/28/take-a-picture/

Contrast with physical? Is this a fundamental difference due to the process nature of digital resources?

Diagram.

Examples.

Games, Second Life, Facebook, apps, dynamic websites, etc.

Contrast with mature information carriers, PDF, JPEG, CSV, where we want to use software that allows access to the data, but where precise recovery of the exact original experience is not required?
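
Part of what makes such carriers low-risk is how little machinery basic access needs; a deliberately naive sketch of my own (it ignores the RFC 4180 quoting rules) that splits a CSV record:

```c
#include <stdio.h>
#include <string.h>

/* Split one CSV record into fields. Naive on purpose: strtok()
   collapses empty fields and knows nothing of quoted commas, but
   it shows how simple basic access to a mature carrier can be. */
static void print_fields(char *record)
{
    int n = 0;
    for (char *field = strtok(record, ","); field != NULL;
         field = strtok(NULL, ","))
        printf("field %d: %s\n", n++, field);
}

int main(void)
{
    char record[] = "obj-001,application/pdf,2014-01-14";
    print_fields(record);
    return 0;
}
```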

Spectrum, with web archiving somewhere in the middle: partial functionality, partial user simulation.

A particular performance, the possible performances, the performances precisely.

Far end is full author environment stuff?

http://first-website.web.cern.ch/objectives/document-and-share-line-mode-browser-experience

Two dimensional?

Horizontal: who to preserve for? A particular user, some particular group of users, some average user, most users, all users, (any user). Vertical: Documentary - Simulation.

When to Act?

Something about monitoring user needs being more important in web archives, but monitoring the technical environment (although that's basically users) is also needed. As important as monitoring content?

How to Decide?

Lots of choices, but before leaping to conclusions, we need to consider who will decide.

  1. easy issue reporting for those that want to.
  2. building access channels together with research groups.
  3. anonymous monitoring of usage to determine preferences.

LINK TO A perfect digital preservation experiment

Designing an RCT for DP

  • imagine what a suitable RCT would look like
  • reverse engineer closest approximation
  • argue that TB (and actually PT) are cargo-cult science: post-hoc rationalisation of the first bit only, e.g. modelled on paper rather than the true process, and mapping one experiment to one decision, which is unrealistic, as a theory is usually supported by multiple experiments.
  • PT non-random sampling as a danger
  • Strategy, action and then implementation to be chosen separately?
  • Current framework evaluates whether the action was implemented well, rather than whether it succeeds in its aim of preserving digital material. The critical issues, of whether the significant properties you chose are truly significant, are poorly addressed.
  • The most important decisions - what strategies shall we consider and why? what significant properties will we use and why? - are not actually documented explicitly.
  • This is not a case of theory, but of values, as the basic ‘theory’ is known (not speculation to be tested). So PA should be led by a rich understanding of the choices, and evaluation should focus on distinguishing those choices rather than their implementation.
  • http://en.wikipedia.org/wiki/Design_of_experiments, http://en.wikipedia.org/wiki/Multivariate_testing, RCTs, A/B testing. Go back and check the DELOS paper. See also trial registries: http://blogs.worldbank.org/impactevaluations/what-can-we-learn-from-medicine-three-mistakes-to-avoid-when-designing-a-trial-registry-guest-post-b

Which should reveal the critical role of users etc.

LINK TO Credible Threats

Threats: threat models are not defined here, see Making Plans.

Trust models omitted - trust is social. See also http://en.wikipedia.org/wiki/Source_criticism

  • http://www.niemanlab.org/2013/01/is-it-real-witness-builds-an-app-to-verify-user-submitted-content/
  • https://guardianproject.info/informa/
  • http://blog.witness.org/2013/02/citizen-video-for-journalists-verification/
  • http://www.signable.co.uk/legal

People, records and power: what archives can learn from WikiLeaks

https://twitter.com/petemay/status/406397838529007616

We can imagine many risks, but following Rosenthal, we use the concept of ‘threats’ to underline the importance of knowing the likelihood. Risks are formally defined as likelihood combined with severity of outcome, but the ‘threat’ language helps avoid confusion with common usage.
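
A toy scoring sketch; the threat names and numbers below are illustrative assumptions, not measured values:

```c
#include <stdio.h>

/* Risk scored as likelihood x severity. All figures are invented
   purely to illustrate the calculation. */
struct threat {
    const char *name;
    double likelihood; /* chance of occurring per year, 0..1 */
    double severity;   /* cost of the outcome, arbitrary units */
};

int main(void)
{
    struct threat threats[] = {
        { "media failure",       0.05, 100.0 },
        { "format obsolescence", 0.01, 500.0 },
        { "operator error",      0.10,  50.0 },
    };
    size_t n = sizeof threats / sizeof threats[0];
    for (size_t i = 0; i < n; i++)
        printf("%-20s risk = %5.1f\n", threats[i].name,
               threats[i].likelihood * threats[i].severity);
    return 0;
}
```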

  • Format analysis - cf. versions don't matter, features do (see the sketch after this list).
  • Hard disk quality risks - Losing half the array - CDROM of health data
  • cf. TNA advice for the public sector?
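
To sketch the versions-versus-features point from the list above (an illustration of my own, not a real tool): the version a PDF declares in its header is trivial to read, but says little about which features the file actually exercises:

```c
#include <stdio.h>
#include <string.h>

/* Report the version declared in a PDF header ("%PDF-1.4" etc.).
   Easy to extract, but the risk lives in the features actually
   used (encryption, scripting, embedded media, ...), which no
   header version string reveals. */
int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file.pdf>\n", argv[0]);
        return 1;
    }
    FILE *f = fopen(argv[1], "rb");
    if (f == NULL) {
        perror(argv[1]);
        return 1;
    }
    char header[16] = {0};
    size_t got = fread(header, 1, sizeof header - 1, f);
    fclose(f);
    if (got >= 8 && strncmp(header, "%PDF-", 5) == 0)
        printf("declares PDF version %.3s\n", header + 5);
    else
        printf("no PDF header found\n");
    return 0;
}
```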

Obsolescence

Given that it has proven possible to access the data, some might argue that the floppy disks and the data on them are not obsolete. I would say that these long, brittle chains of migration and extensive guesswork are exactly what obsolescence looks like.

Even though a more efficient way to access the disks themselves is now available, the problem shifts to understanding how to use the disks and the software, and how to access the data within.

It is the slow death of a thousand ambiguities, rather than a sudden, jarring expiration. Obsolescence is approached, rather than attained, with the costs of access rising every step of the way.

Death of a Feature: https://groups.google.com/a/chromium.org/forum/#%21topic/blink-dev/_tHSX0IYXhQ

Real value/risk is still there:

  • CAD
  • That story about audio masters for Mac plugins
  • Keynote

Vendor/Consumer relationship

http://www.avpreserve.com/blog/preservation-fragments-1/

Vendor Obsolescence

IE6, Silverlight, Flash?

Vendors, communities and trust

The enthusiasts will be the ones who save us, the ones who are right now building the tools and gathering the information that the future will need to reconstruct its past.

  • Also, many institutions' hands are tied by the law or the fear of it. The enthusiasts carry on, generally safe from prosecution while non-commercial.
  • So how do we trust them? How do we build this expertise into our systems?
  • Earlier
  • A comment by a senior manager made me realise that, as things are now, vendor obsolescence is a real risk. The only route to establishing an access system relies on external vendors.
  • I took it for granted that an institution like the BL would maintain the people it needs to understand its own content. Surely no memory institution could expect to function without the people and code that embody that understanding? We could not function without staff who understand paper, so why should digital be different?
  • Economics, skills, etc
  • But also, contractual approach
  • or at least the wider community

LINK TO Making plans
