Zombse

The Zombie Stack Exchanges That Just Won't Die


Which tools are most effective for identifying the format of ebook (mobi, epub) files?

Which tools are you using for identification of epub or mobi formats?

For my private library I save every book I've bought (either epub or mobi), removing the DRM. I would also like to keep detailed metadata about these files.

Fido, the tool I prefer, has some issues. Both JHOVE and FITS recognize these files only as bitstreams. The only tool that seems to work is epubcheck, and it handles only epub files.

raffaele messuti


Answer by Andy Jackson

During last year's file format ID hack, the British Library team came up with Apache Tika signatures for some eBook formats (you can find them in this magic file). Although they were written for Tika, they should be easy to port to Fido.
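To give a sense of what such a signature checks, here is a minimal Python sketch. It is not the British Library's Tika magic, only the leading bytes each container is known for: a Mobipocket file is a Palm database whose type and creator fields at offset 60 read BOOKMOBI, and an EPUB is a ZIP archive whose first entry is an uncompressed file named mimetype containing application/epub+zip. The function name sniff_ebook_format is made up for this example.

```python
EPUB_MIMETYPE = b"application/epub+zip"


def sniff_ebook_format(path):
    """Return 'epub', 'mobi', or None, judging only by the leading bytes.

    Hypothetical helper for illustration; not part of Tika, Fido, or file.
    """
    with open(path, "rb") as fh:
        head = fh.read(68)

    # Palm database header: 4-byte type + 4-byte creator at offsets 60..67.
    if head[60:68] == b"BOOKMOBI":
        return "mobi"

    # ZIP local file header is 30 bytes, then the entry name, then its data.
    # Assumes the first entry is "mimetype", stored uncompressed with no
    # extra field, which is how EPUB containers are expected to be built.
    if (
        head[:4] == b"PK\x03\x04"
        and head[30:38] == b"mimetype"
        and head[38:58] == EPUB_MIMETYPE
    ):
        return "epub"

    return None
```

A check like this identifies the container only. An EPUB whose mimetype entry is compressed or not first would come back as None, which is roughly the case where a signature-based tool falls back to reporting a plain ZIP.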


Answer by Christian Pietsch

I tend to use the command line. Nearly every modern system other than Windows ships with the file tool, which can be used like this:

$ file *.epub *.mobi
Natural Language Processing with Python - Steven Bird.epub: EPUB ebook data
pg8086.mobi:                                                Mobipocket E-book "Down_and_Out-_Magic_Kingdom"

Here, I used file to identify the format of all files that have the .epub or .mobi file extension, but I could have used the asterisk alone to identify all non-hidden files in the current directory. So in this little experiment, file successfully identified the two e-book formats, and for the mobi(pocket) format, it was able to extract the title (or a short form of it).
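The short title in that output most likely comes from the Palm database header that every Mobipocket file starts with: its first 32 bytes hold a null-terminated database name. A minimal Python sketch of reading that field follows. The function name palmdb_name is made up for this example, and decoding the name as Latin-1 is an assumption, since the header does not declare an encoding.

```python
def palmdb_name(path):
    """Return the Palm database name of a Mobipocket file, or None.

    Hypothetical helper for illustration; reads only the fixed-size header.
    """
    with open(path, "rb") as fh:
        header = fh.read(68)

    # Type and creator fields at offsets 60..67 mark a Mobipocket book.
    if len(header) < 68 or header[60:68] != b"BOOKMOBI":
        return None

    # The name occupies bytes 0..31 and ends at the first null byte.
    raw_name = header[:32].split(b"\x00", 1)[0]

    # Assumption: Latin-1, which maps every byte and so never fails.
    return raw_name.decode("latin-1")
```

Because the field is at most 31 characters plus a terminator, longer titles are cut or abbreviated there, which would explain why only a short form shows up. The full title is stored elsewhere in the file and needs a proper MOBI parser.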
