Zombse

The Zombie Stack Exchanges That Just Won't Die

View the Project on GitHub anjackson/zombse

How to detect invalid permutations of MARC fields, subfield, and indicator

I'm working on a web app that allows users to run checks on each MARC record in a MARC batch file, so they can quickly assess the quality of those records. The idea is when batches of records are received (like for shelf ready books), they can generate a report of records to inspect, based on likely indicators of poor record quality.

Some examples (from our cataloger) include:

Users need to be abel to create their own reports, so I'd like to create a form that helps users build queries by suggesting /validating valid permutations of Fields, subfields and indicators. It shouldn't suggest or allow users to create a check for 245\$q because that is (hopefully) not going to exist in any record.

The MARCXML schema doesn't validate to that level of detail. Can I find this data in a format I can use programmaticly?

bibliotechy

Comments

Answer by Simon Spero

Combinations that of presence or absence occur with very low frequency are almost always incorrect.

There have been some studies from UNT and OCLC; you can get more accurate results by on record type, tag/tag conditional frequencies, etc. Using something like Apriori style association-rule mining can be quite useful.

It is fairly easy to get a list of the theoretically valid tag/subfield combinations by parsing the LC pages (the format is quite rigid).

Many theoretically legal combinations are so vanishingly rare that they are almost certainly errors. Warning for combinations that have never been used in your collection has a fairly low false positive rate.

See e.g.:

Moen, W. E., Miksa, S. D., Eklund, A., Polyakov, S., and Snyder, G. (2006). MARC Content Designation Utilization Project - Frequency Counts for Books, Pamphlets, and Printed Sheets Records Created by Library of Congress. Technical report, Texas Center for Digital Knowledge, University of North Texas.

Smith-Yoshimura, K., Argus, C., Dickey, T. J., Naun, C. C., de Ortiz, L. R., and Taylor, H. (2010). Implications of MARC Tag Usage on Library Metadata Practices. Technical report, OCLC Research. Available at: http://www.oclc.org/research/publications/library/2010/2010-06.pdf

Comments

Answer by Bryan Baldus

You may be able to make use of the Perl modules MARC::Lint [1], MARC::Lintadditions [2], and MARC::Errorchecks [3], either by directly or by examining the code to develop the checks/validation you wish to perform.

[1] http://search.cpan.org/\~eijabb/MARC-Lint-1.45/

[2] home.comcast.net/\~eijabb/bryanmodules/MARC-Lintadditions-1.15.tar.gz

[3] http://search.cpan.org/\~eijabb/MARC-Errorchecks-1.16/

Please let me know if you have any questions on the modules.

Thank you

Comments