Zombse

The Zombie Stack Exchanges That Just Won't Die

View the Project on GitHub anjackson/zombse

How do you control for objects generated invisibly by OS X, Windows, Linux, etc?

I'm running some experiments with BagIt and noticing a surprising amount of hidden files being captured in my bags such as .ds_store, thumbs.db, and *.shs. Since these aren't essential files needed to understand the content of the bag, I'd rather not keep them around. Even worse, because some are generated systematically, my bags can become invalid when a thumbs.db or .ds-store sneaks in.

Is there a list of system files like thumbs.db that I can use to recursively search and delete before bagging?

How do I ensure that my bags aren't invalidated as they sit on a portable hard drive? (My planned Linux-based NAS is still a few months out.)

Nick Krabbenhoeft

Comments

Answer by peshkira

It is definitely not a complete solution, but one approach could be to check the example .gitignore files here: https://github.com/github/gitignore/tree/master/Global

You can skim through some of them (which seem relevant to your problem) and compile yourself a list with common files and extensions for the different operating systems. It could be a good start to filter the undesired files.

Comments

Answer by Andy Jackson

Once you have stripped these files out, and made a clean bag, I recommend making the whole bag read-only. This will prevent the OS from dropping these hidden files back in again.

Comments