What are the advantages and disadvantages of using ZFS for storage in a digital repository?

File systems typically divide the devices they address into discrete blocks. For instance on a system with 1024 byte blocks, a 1.5 kb file will be occupy two entire blocks. What makes ZFS different from other file systems like FAT32, Ext4, and NTFS is ZFS records a checksum for every block of data, for the table of block pointers, and up the chain to the entire file system. When a file is requested, ZFS automatically computes and compares the checksums for every block of the file before delivering it to the user.

In most digital preservation systems, fixity checks are performed on the file by a program accessing a database of checksums. ZFS does this on the hardware level. Is it a suitable replacement for file-level fixity checks? Are there digital repositories using ZFS?

Edit: I'm also wondering how device fixity checks compare to software fixity checks in terms of efficiency.

Nick Krabbenhoeft

digital-preservation
digital-archiving
digital-media

Comments

dsalo: How does a system like this detect tampering? Eve gets into the system and overwrites a file she shouldn't. In a file-based checksum system, it's detected. ZFS doesn't grok "files," just "blocks," so when Eve replaces the file, it just recalculates checksums on the system, no?
Trevor Owens: @dsalo As I understand this, ZFS style systems can resolve issues around bitrot. Aside from that it's just a drive. So all of the bit level threats that come from curitorial error and damage to physical media are in play. So systems for who can do what when and geo-replication are still necessary.
dsalo: Returning to the original question, however, the answer to "is it a suitable replacement for file-level fixity checks?" would seem to be "no," because Bad Stuff can happen to files that block-level checks won't catch.
Nick Krabbenhoeft: My motivation in asking was that it might be more computationally efficient to run fixity checks on the entire repository with a low-level system like ZFS than with a software solution. Thanks for pointing out the file-level problems, I'll have to think more if they can be accounted for with other systems like write once read many.
Martin Schröder: Use a database instead of a file system. If you have to use a file system, seriously consider ZFS. `:-)`

Answer by woliveirajr

The ZFS filesystem is, as it says, a filesystem: it's how a Operating System (OS) stores his files in some media, be it one single hard-disk drive of a RAID system.

ZFS tries to address some problems of data integrity that some other solutions couldn't handle well. But it does that at file level, and just at file level: the probability of reading a file and getting a corrupted content will be smaller.

So, using ZFS would be better because it would lead to less data corruption events that necessitate restoring from backups, etc.

But there are limitations: not every OS can use ZFS. For example, the ZFS license is incompatible with Linux license, so Linux doesn't have a ZFS implementation. And the same occurs with Windows.

And another point is that it's one thing to have a system that handles a digital repository, with all securities it implements (from access, logging of changes, backup control...), and another thing to just save archives in one folder and try to control the integrity there.

So, although it would be good if you could use ZFS in your system, it's not a solution to replace digital preservation system.

Comments

Nick Krabbenhoeft: Thanks for adding the problems of OS compatibility. Are there any file systems or storage devices with fixity checking that are more widely available/usable? I know Oracle's T10000C tape drive can be accessed with Linux, but there's always the danger of Oracles's capriciousness like with the now closed-source ZFS.
woliveirajr: you can take a look at: http://en.wikipedia.org/wiki/Comparison_of_file_systems . In the metadata table, you'll e seeking for the checksum/ECC column. It's said that ext4 have it, but I'm not sure that it's as reliable as ZFS would be. Maybe you could consider first your requirements (OS running, etc) and then search for the *best* filesystem. Since you might be worried about preservation, you should care when choosing a not-widely-used filesystem, because the lack of support might be a problem in the future.

Answer by Chris Adams

Filesystem level checks are a great foundation for an archival system for two reasons:

The filesystem should perform this check every time a file is accessed, making it impossible to silently receive corrupted data. If your higher-level software doesn't provide this, it's almost certain that you'll eventually experience a problem where something is corrupted and accessed before an audit catches it.
The filesystems can perform data-integrity checks ("scrubbing") in the background, often when the storage subsystem is otherwise idle, reducing the performance impact and need for periodic manual scans.
Filesystem based checksums can be far more efficient for random I/O because they're stored at the block level rather than the entire file. This is usually not a major consideration but in some scenarios where only part of the file is read it can be a significant performance benefit while providing full integrity checks (e.g. reading file metadata, preserving large databases, or JPEG-2000s which are used for thumbnails far more often full decoded).

(Note that while ZFS is the most famous option in this regard, btrfs and even NetApp's WAFL have some similar protections, albeit less thorough in the latter case)

This is not complete, however, as there are multiple other problems to be concerned with:

Higher level failures: e.g. if data is corrupted on the network, ZFS has no way of telling that the bits it methodically protected weren't actually the ones you wanted. Similarly, you can trust the bytes you read off of ZFS to be intact but you can't verify that they were received by a remote user unless you have some end to end checks (e.g. setting and validating HTTP Content-MD5 headers)
Human error: no filesystem protects against a human accidentally or intentionally overwriting a file, although snapshots can help at least in the former case.
Replication: knowing that my bits are reliable still doesn't tell me whether you and I have the same file, or that I have a full and complete copy of a multi-file item. Something like a bag manifest is still necessary to address this and #1, and hopefully at least detect #2.

Lastly, while filesystem-level features like ZFS's send/receive can also be very handy for efficient replication, compatibility is a big concern if you're worried about the longevity of a particular platform.

One other category of software to consider would be something like OpenStack's Swift which provides similar characteristics - full integrity checks, automatic redundancy management, user-invisible scalability - but does so using any available filesystem by providing the checks at a higher level. As with something like ZFS, it's not a replacement for a higher-level repository system but it is an interesting choice for a foundation which solves a number of hard low-level problems so you can focus on the higher-level ones.

Comments

Answer by Simon Spero

ZFS does not do checks at the hardware level, but the computation is fast enough to not make that much of a difference.

The design of ZFS makes it very inexpensive to create snapshots of filesystems; these snapshots are copy-on-write, so they don't waste much space in the storage pool. This makes them suitable for use as part of scientific data workflows; the source code, data, and libraries used for a run can be snapshotted, and the output data captured as well. This facilitates replication. Snapshots can be exported for remote archiving.

The big advantage of ZFS over storing a file checksum in a bag or database is that corruption is detected and repaired. The block/checksum hierarchy makes it easier to detect whether it is the checksum itself that is corrupted.

An alternative to checksums and mirroring is to use a FEC type coding systems that can recover the correct signal from corrupted symbols. See e.g. RAPTORQ (though note the patent issues - claims are only waived for IETF standard or experimental broadcast/multicast object delivery protocol not using wide area wireless). This might permit use in a massively distributed online archival storage mesh (e.g. using unused storage space on desktop computers, as an archival equivalent of fold-at-home).

BTW, although Oracle stopped releasing ZFS as open source, they could not take back the versions that were already released, and there is a vibrant open source community maintaining the code- For example MacZFS for MacOS. FreeBSD includes ZFS by default.

A native Linux port is being developed at Lawrence Livermore - the project lead is Brian Behlendorf of Apache fame. GPL stupidity prevents ZFS from being bundled with the software, but it does not prevent the software from being downloaded and run as a kernel module- for ubuntu, try:

\$ sudo add-apt-repository ppa:zfs-native/stable

\$ sudo apt-get update

\$ sudo apt-get install ubuntu-zfs

Not the most hilarious free/open-source license complaint (the license discussed is GPL incompatible for the same reason)- what's the GOOG motto again?