Zombse

The Zombie Stack Exchanges That Just Won't Die

View the Project on GitHub anjackson/zombse

Is there any real distinction between File Fixity and File Integrity?

File fixity is a key concept in digital preservation, referring to the property of a digital file being fixed, or unchanged. File integrity is a much more broadly used term in the business world which seems to mean exactly the same thing, specifically that a file is still what it was. So, are these two concepts actually exactly the same? Or, is there any subtle and important difference between the concepts based on the digital preservation context vs. the business context?

Trevor Owens

Comments

Answer by Joe

I would say that fixity and integrity are related, but not the same thing.

The problem that we can run into is that the original file that was given to the archive may not be usable or readable over time. Something like the BBC's Doomsday Project comes to mind -- insisting on maintaining the original storage format means that over time, we risk losing the ability to read it.

For science archives, there's the concept of a 'media refresh' every so many years. In some cases, it's just a matter of writing to new physical media to reduce the risk of bit-rot. In some cases, there will be a repackaging of the data as well, where the files will be read, and written back out in some different container (it may be that lots of little tiny files are bundled up into a larger file so we're tracking fewer total records; antiquated file formats are replaced with ones currently in use, etc.)

Depending on the archive (and note that the definition of 'archive' in the science informatics community may not qualify as an archive to the LIS community), we may also 'add value' to the file by changing the metadata attached to the file. In some cases, it's additive (links to documentation, computing coordinates in alternate systems), but in others it may create a different interpretation of the data (eg, changing the time of the observation (correcting for clock drift), or the location (due to corrections in spacecraft ephemeris or telescope pointing))

In these cases, provided that the changes are properly documented, the 'integrity' of the file is still intact, but the 'fixity' is not. In this case, the 'integrity' is probably best explained in the McCumber model of information assurance, where we're protecting against changes that would make someone question the validity of the data, not necessarily against all changes.

Comments