Zombse

The Zombie Stack Exchanges That Just Won't Die

View the Project on GitHub anjackson/zombse

What is the best format for storing digital photos?

What is the best format for storing digital photos, having digital preservation issues in mind?

There are a plenty of formats nowadays, such as JPG and PNG, but how standarized they are, having in mind that after 50 or 100 years they may cease to exist and be replaced by much better formats?

The ideal format for photo preservation should be good documented and relatively simple to implement, so that in worst case, when after those 50 or 100 years someone will find forgotten collection, and there will be no software opening such old formats, it would be relatively quick to implement the converter?

lechlukasz

Comments

Answer by Michael Kjörling

I would argue that if you want simple to read, then uncompressed BMP without a lot of bells and whistles is probably a good choice. It is an extremely simple format (basically a header specifying color depth, width, height and some identifying information, followed by the raw pixel data), and it supports up to eight bits per pixel per channel in three channels (red, green and blue) which is the same as PNG and JPEG. From what I understand, any BMP using less than 256 distinct colors also stores the color palette used, but for photos, it makes sense to use 24 bits per pixel to capture as much color fidelity as possible. The remainder of this answer deals primarily with 24-bpp BMPs, but the general principles should hold at any bit depth.

The lack of compression means the files are very large (a 15 Mpx photo in 24-bit color will be nearly 50 MB in size), but also makes them easy to interpret if only you have cursory knowledge of the format. About all you need to know is that there is a relatively small header followed by RGB byte triplets each specifying one pixel (or an appropriate number of bytes per pixel specifying indexes into the palette), and you can probably write a decoder by simple trial and error in a few hours. Given a set of uncompressed BMP files and preferably some clue what they are supposed to look like, implementing a BMP viewer from scratch is a fairly trivial task.

Another upside of uncompressed BMP is that minor data loss (other than in the header) leads to absolutely minimal degradation: only the exact pixels actually affected by the data loss are corrupted. And with some mathematical trickery and human trial-and-error, the header can probably be reconstructed from an otherwise complete image file, even if the image data itself is degraded or corrupted, as long as the number of bytes is correct.

The major downside of BMP is that the files are huge. For a full-color image of any substantial pixel size, the resulting file size in bytes can be approximated as 3 * H * W, where H and W are the height and width in pixels, respectively. As we know, image size expressed in megapixels is simply H * W, so for a given image size of Mp megapixels this simply reduces to 3 * Mp megabytes of image data.

BMPs can also store ICC color profiles, enabling the viewer to accurately render the colors as intended. That would be more complex to implement, and I'm not sure how widespread support for ICC-carrying BMPs is even today, but also would allow more faithful visualization of the data than is possible with simply RGB triplets.

Pretty much any other format which meets the requirement of being easy to read and long-lived in the face of advancing technology, compression algorithms and so on, is going to basically boil down to the same thing (a container plus raw, uncompressed bitmap data representing pixel color value triplets). Uncompressed TIFF for example falls into that category. So unless you have specific needs which cannot be met using BMP you would gain little by choosing such a more advanced format, and risk making it more difficult to understand for someone who has to reverse-engineer it. Better then to stick to the KISS principle: keep it simple, stupid. For preservation purposes, I'd take a stack of BMPs over a single multi-page uncompressed TIFF any day.

Comments

Answer by Courtney C. Mumma

The Archivematica project made format policy decisions based on a review of significant characteristics and decided upon the best preservation and access formats based on testing open source tool outcomes, availability of open source conversion tools and ubiquity of format rendering software.

In short, for raster images, we chose uncompressed TIFF for all but JPEG2000 and PNG, which will be retained in their original formats for preservation and JPEG for access. For vector images, we chose SVG 1.1 for preservation and PDF for access. Finally, for raw camera files we chose to keep preservation copies in their original format though we may consider DNG in the future. For access of raw camera files, we again selected JPEG.

You can find detailed discussions of our decision-making and testing here: Raster images: https://www.archivematica.org/wiki/Raster_images Raw camera files: https://www.archivematica.org/wiki/Raw_camera_files Vector images: https://www.archivematica.org/wiki/Vector_images

Comments