How valuable is it to talk about "non-MARC" data?

Does your institution talk about managing non-MARC data? If it does, what does it mean to you? Shouldn't we talk about this data in more concrete terms?

Ed Summers

marc

Comments

Joe: My group manages *no* MARC data, as we're not a library ... I'm probably the only one in my group that even knows what MARC is. Does that count?
Ed Summers: Do you talk about it as "non-MARC"? It sounds like you don't, for which you are lucky :-) Maybe this is just an oddity of my current workplace. Although a search appears to indicate otherwise...
Joe: no, because the others have no idea what MARC is. (they know who Marc is, though, as we share an office). Most of our metadata they refer to as 'FITS headers', (even when it's not in a FITS file) and it has some of the same problems as MARC (some fields being ambiguous as groups have interpreted it differently), and as there are 53 reserved keywords in the FITS standard, but you can add whatever you want, it gets really, really messy as different sets of keywords to describe the same/similar concepts.
Joe: Oh ... and your Google search is useless -- 'non-marc' will find any occurrence of 'non' with 'marc'. You need to wrap it in double quotes. (this will find occurrences of the phrase 'non marc' or 'non-marc')
Ed Summers: Thanks for fixing the "useless" search. So do you talk about non-FITS data then? Hopefully not.
Joe: FITS is a standard for scientific data, so 'FITS' == (data + metadata); 'FITS headers' == metadata; 'FITS data' would *probably* refer to the payload of the file (typically an image, stack of images, or a n-dimensional 'data cube' (which can be more than 3 dimensions)). But depending on the group of people, 'non-FITS' could be an insult (as in, 'we only care about FITS ... all the rest aren't useful'), but FITS is only widely used in imaging fields (astronomy, solar telescopes, medical imaging) those who deal in time-series data prefer CDF, NetCDF. The earth imaging folks use HDF or GeoTIFF.

Answer by dsalo

Phew. I think the answer to your question reflects organizational realities more than technological ones.

The libraries I've worked in had an entire department devoted to MARC cataloging. Anything else lived in the wainscoting -- the impoverished digital library, or the starved, staggering, scorned institutional repository. Whatever Those People Over There did, to a cataloger, was "non-MARC data," and you'd better believe it wasn't as important or as professional as MARC. Were Those People Over There even librarians? Some catalogers said not, frankly.

(There's a strain of rhetoric in the "whither libraries?" literature that bolsters this organizational dysfunction: it holds that MARC is one of librarianship's central mysteries, such that veering away from it means abandoning librarianship altogether. Michael Gorman and Martha Yee write in this mode, among others.)

The world is changing, of course. "Non-MARC data," even if we just consider it "library-created or library-held metadata in a form other than MARC," probably outweighs MARC data bit for bit by now. (This is a mildly unfair comparison, because MARC was designed for economy of storage, and XML was not. Even so.) Measuring by current-processing output, non-MARC data probably gives MARC a run for its money these days, considering how mainstream digitization has become, and how relatively simple and automatable it is to create much non-MARC data compared to MARC/AACR2/ISBD. (I grant you, PREMIS is a bear.)

But the linguistic markedness of MARC (see what I did there?) doesn't reflect production volume or holdings volume, much less a measure of impact on patrons or on the world. It reflects a MARC priesthood that still maintains considerable power over discourse within librarianship.

IMO, YMMV, etc.

Comments

Ed Summers: Thanks @dsalo. Now that you've said it I can see how *non-MARC* really says more about the organization and its view on librarianship than it does anything technical. There's an almost derogatory, look-down-your-nose flavor to calling data *non-MARC*, which kind of gets me riled up a bit :-)
dsalo: Oh, believe me, I completely understand! Been there, done that, been called "not a real librarian," whole nine yards.
Joe: I think 'non-MARC' is either a criticism of other standards (but they don't have the widespread adoption of MARC!), or a criticism of MARC (in the vein of 'anything would be better than MARC!')
ksclarke: Huh, I took "non-MARC" data as a positive. Like, we have all this MARC data we have to deal with and then we have all this non-MARC data which is so much easier to work with... :-)
ksclarke: Though I can see the we have all this MARC data and then we have all this other data (which isn't as "rich" as MARC), too, argument.

Answer by Joe Atzberger

Back when I was more active in them, the Koha and Evergreen communities routinely got questions about when those platforms were going to support "non-MARC" data. There were a few types of folks with these requests:

De facto catalogers who thought "MARC is too hard", by which they mostly meant "this cataloging interface sucks". They were rather correct. There should exist a cataloging interface that is intuitive enough to handle simple cases without the user knowing AACR or even that the system uses MARC internally (if it does). For example, something like Delicious Library 2.
Experienced catalogers or archivists who actually had specific formats in mind (e.g. DublinCore, EAD, RDA, etc.) or problems those formats are more tailored to address.
Early adopters of Koha who yearned for simpler times before MARC-centric code was introduced (pre-2.0, iirc).

The value of MARC is bibliographic data. MARC data is widely available for free (via z3950 or a web download), tools for manipulating it are widespread (often free), vendors can provide it, and any modern ILS can import it. No alternative format has comparable entrenchment.

The less bookcentric you are, the less you care about any of that. Arguably most libraries are finding themselves moving from more bookcentric to less, so this will keep coming up.

But to make any progress, yes, more concreteness is required. Crowdsourced tags are "non-MARC" data. LDIF is "non-MARC" data. EDI is "non-MARC data". OAI is "non-MARC" data. Etc., etc., etc....

Comments

Answer by Simon Spero

I would guess that there is more non-MARC data at LC place of work than there is MARC data, if nothing else, because of TIFF images of newspapers take up a heck of a lot more space even than MARC-XML.

Comments

Answer by Jenn Riley

In academic libraries today, "working with non-MARC data" is frequently a very different beast than "working with MARC data". Different underlying technologies are involved, as well as different approaches to creating, maintaining, and reusing that data. The former is often a shorthand or code for an entire skillset and approach to work by the individual performing that work. That's been the case for the last 15 years or so, but it is changing. (And a welcome change at that.) As more and more individuals become versed in more and more methods of creating and processing data, we'll lose all of that implied baggage. And on a slower pace, as the balance of MARC to non-MARC metadata evens out a bit, we'll have to find better language for these things. I look forward to that day.

Comments

Answer by trevormunoz

I agree with dsalo and Jenn Riley above but to address the last part of the OP's question---how we might talk about this in more concrete terms---there might be value in beginning to be more careful with our use of the terminology around MARC. "MARC" is a carrier/format, the content standards that govern what information gets conveyed around in MARC format are (usually) AACR2 /ISBD.

So, in the realm of other ways to talk about library data, we can talk about what information we record and how we serialize that data. Both are important discussions underway in library-land (as I know the OP and other commenters well know) and investing in the specifics of those discussions can be a helpful way to step over/around the unhelpful distinction between MARC and non-MARC.