Metadata and Metadata

Technicalities 42(6) Nov/Dec 2022: 12-15

Back in 1999, Marcia Lei Zeng published a foundational article explaining and demonstrating the promises of the rapidly developing field of metadata. [1] Zeng compared three metadata standards: library cataloging (combined AACR2/USMARC), an early iteration of the Dublin Core, and the Visual Resources Association Core Categories for Visual Resources, version 2.0. The latter two were published in 1997, almost ancient by today’s standards. Zeng compared them data element by data element, examining each standard’s provision for controlling the specific attribute of the object being described. She chose items from a historical fashion collection for the test, which steered away from AACR2’s focus on bibliographic objects. The claim made for AACR2 was that it was applicable to any kind of form, a hubris that was inversely matched by its brevity and lack of specificity for “three-dimensional artifacts.” The strength of AACR2 was set aside for its applicability to “nonbook objects” (a term still widely bandied about by library catalogers but used by almost no one else), which evened the contest with relatively lightweight metadata standards.

I used that Zeng article for years as either a required or supplemental reading for an introductory course on knowledge organization. It accomplished a couple of things. It began in territory relatively familiar to students, that is, library cataloging, and expanded to a consideration that there might be other standards one could use to describe resources. Zeng thus defanged the scariest part of metadata, which was a bunch of new vocabulary to describe things in a digital environment. By starting with familiar standards and generic descriptive operations, Zeng transformed the new and unfamiliar into a relatively simple extension of what students already knew.

But she also brought in some new wrinkles, especially whether objects in one collection using a particular standard could be integrated with similar objects in a different collection that were described by a different standard. The challenges of metadata crosswalks (a term she did not have yet at hand) and of federated collections showed the promises and perils of metadata applications. At that point we still thought the limitations and challenges were primarily technical, a matter of getting the standards right. General questions about representation, the limits of language, and the plurality of both documents and users (all issues in the “politics of metadata”) had not yet risen to the fore.

Almost a quarter of a century later, Zeng is still at it: she and her returning co-author Jian Qin have published the 3rd edition of their textbook Metadata. She still occupies a central position in metadata research and instruction, and this book (along with the first two editions) solidifies her place in the latter. The word that was used in review after review of the previous editions was “comprehensive,” and at 613 pages plus an accompanying website, this book is definitely that. The strength of a good textbook is that it consolidates a field’s terminology and phenomena and organizes the conversation around them, and this textbook does that. Most textbooks are known more for establishing or reinforcing convention and less for their invention; this book is not a major exception to the rule, but it is more inventive than most.

Like the previous edition, this book is organized in five parts:

I: Fundamentals of Metadata
II: Metadata Vocabulary Building Blocks
III: Metadata Services
IV: Metadata Outlook in Research
V: Metadata Standards

The first two sections will be generally familiar to those who know cataloging specifically or metadata generally. There is a general tension in finding the right level of abstraction, not only in finding a common language to describe the variety of metadata standards and their purposes, but also in finding the level of language that is right for beginning students while still providing insight for advanced readers. The focus in the first two sections is probably on students, and yet there is still a lack of clarity; the authors are really ambitious here. The first 50 pages (only the first third of the first section) introduce Dublin Core, VRA Core 3.0 and 4.0, the concept of ontologies, Resource Description Framework, metadata descriptions, syntax and semantics, and item- vs. collection-level descriptions. If I have a bit of unwelcome advice (and almost all advice is), it would be to slow down and give a sense of narrative arc, so we know why we’re talking about something rather than just what we are talking about. Svenonius’ 2000 classic [2] is famous for providing that arcing framework; it is nearly impossible to find a through line in Glushko’s 2013 attempt [3] at a counterpart to Svenonius. My sense is that those who love metadata will be fine; the general student will struggle at times to put the pieces together into a whole.

One result of their framing is that some topics get picked up to discuss the aspect that fits in the framing, with the remainder of the topic to be addressed later. One example of this is the FAIR (Findable, Accessible, Interoperable and Reusable) principles. FAIR has ten entries in the index. It appears under 1.6 Principles, 6.5 Ensuring Optimal Metadata Discovery and Increasing Findability, 7.4 Metadata Quality Measurement Indicators: CCCD, and 9.5 Metadata and Data-Driven X, amongst other places. The scatter would be fine if there were some central place where FAIR was properly introduced. Unfortunately, too many important topics are handled this way, with a confused overarching structure given priority over an encapsulating treatment of each topic. For newbies looking for a simpler introduction, Pomerantz [4] is one of those slender little introductory volumes that offers a quick explanation.

Impasse #1: The Widening Gyre

And yet it is hard to fault the authors for this. The organization of the book is the result of two impasses that face not just the authors but the entire field. The first is the simple proliferation of metadata standards, and their diversification. Although the focus here is on libraries and other forms of cultural heritage, the number and scope of standards pose a challenge for authors of textbooks in this area: streamline and omit, or blaze a trail through an increasing thicket? TEI? HTML? SGML? XML? Most of these take up a lot of room in Glushko’s book; here, fortunately, there isn’t as much. Zeng and Qin present quite a bit on XML schemas; there is relatively brief treatment of embedding metadata within HTML.

To my eye, most of the new content in this edition is related to the expanded treatment of linked data systems, especially a section in Metadata Services titled “Metadata as Linked Data.” There is a lot of promise in this section, and yet it’s still relatively underdeveloped. Why, for example, are the Getty Vocabularies treated as a linked metadata system versus other kinds of vocabularies? Because they can be linked as URIs? Is this the distinguishing feature of linked systems? For the reader with a good foundation in cataloging, I would recommend a dedicated text on linked data, using this volume as a reference for the many metadata systems discussed. The center may not hold.

Impasse #2: A Delayed Reconsideration

And that leads to the second impasse. Since the introduction of computing generally and the development of the internet and retrieval technologies, we have not had a major rethink about the purposes and forms of cataloging. This dilemma dates at least to the development of MARC, which was created to facilitate the printing and distribution of cataloging provided by the Library of Congress. It did not provide an answer to a question unasked at the time: “How can the computer improve the retrieval of bibliographic objects?” I don’t want this to be a presentist critique of early computational work done in the 1960s; the groundwork for basic computational techniques was quite bare at the time. For example, there was no standard alphabet used in computer systems that came close to the range of glyphs encountered in even the smallest libraries, much less one that could serve as the basis of a standard for the global exchange of bibliographic data.

But even with the later editions of AACR and the introduction of RDA, we have not had a good reconceptualization of metadata, which is amazing given subsequent developments in computation. Take a simple example. The representation of title pages through transcription has been a staple of cataloging theory for decades, and is rooted in rare book cataloging where we take note of fonts and line breaks in titles. We have simplified the rules over the years, but transcription remains an essential element of our work. If the title page is a thumbprint-styled shorthand for bibliographic identity, why don’t we take a mugshot of it? Let’s photograph the title page, perhaps the verso, and annotate it appropriately. For catalogers and users who are trying to verify if the copy they are holding is the thing they are looking for in the catalog, such images would be efficient and definitive. Title page sanctity might still be a principle, but we can accomplish it through other means. Maybe we don’t need title page sanctity; maybe we don’t need transcription. What is it that we are trying to do?

The varying scopes of application for different metadata standards, the ways they work and don’t work together… those might have provoked questions of what we are doing. But the sense is that each metadata standard (or better, various combinations of metadata standards) went off to its own corner, in libraries, in museums, with graphic materials, or in commercial proprietary systems. Each niche developed its own metadata solutions, and each solution has a niche. Some new application areas might have had options and alternatives, but mostly there wasn’t a whole lot of competition between the standards. We would be crazy to think Dublin Core represents a good solid way to do book cataloging in libraries. RDA and MARC, whatever their issues, are the clear option, and there are no other good alternatives.

Connecting to Linked Data

In this narrative, linked data plays both the villain and the hero. Linked data is one of those applications that stretches the center, perhaps distorting it beyond recognition. But perhaps it gives us the opportunity to rethink the entire enterprise. Linked data appears under the guise of technology, but really it is a way of encoding data with more sophisticated descriptive and retrieval semantics. “Show me books by Italian authors working in American universities.” “Show me medical studies featuring drug treatments for neuralgia.” “Show me the most highly cited papers published by the Modern Language Association.” What are the kinds of retrieval we would like to do? “Show me all the English language translations of novels by Balzac, by year of translation.” What kinds of things do we want to cluster, or to link? “How many books do we have by Canadian women vs. Canadian men?” Library users could audit our collections. Heck, librarians could audit their own collections.
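To make the idea of richer retrieval semantics a little more concrete, here is a minimal sketch in Python of how the Balzac question might be answered once descriptions are stored as linked statements. Everything in it is an assumption made for illustration: the identifiers (novel_a, trans_1), the predicate names, and the years are placeholders, not real bibliographic data, and a production system would more likely use RDF and a query language such as SPARQL than hand-written loops.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One linked-data statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: object


# Hypothetical statements; identifiers and years are placeholders,
# not real publication data.
TRIPLES = [
    Triple("novel_a", "author", "balzac"),
    Triple("novel_b", "author", "balzac"),
    Triple("novel_c", "author", "someone_else"),
    Triple("trans_1", "translation_of", "novel_a"),
    Triple("trans_1", "language", "en"),
    Triple("trans_1", "year", 1990),
    Triple("trans_2", "translation_of", "novel_b"),
    Triple("trans_2", "language", "en"),
    Triple("trans_2", "year", 1975),
    Triple("trans_3", "translation_of", "novel_a"),
    Triple("trans_3", "language", "de"),
    Triple("trans_3", "year", 1980),
    Triple("trans_4", "translation_of", "novel_c"),
    Triple("trans_4", "language", "en"),
    Triple("trans_4", "year", 1960),
]


def objects(subject, predicate):
    """Return every object stated for a given subject and predicate."""
    return [t.obj for t in TRIPLES
            if t.subject == subject and t.predicate == predicate]


def subjects(predicate, obj):
    """Return every subject that has the given predicate and object."""
    return [t.subject for t in TRIPLES
            if t.predicate == predicate and t.obj == obj]


def english_translations_by(author):
    """Follow author -> work -> translation links, keep English ones,
    and return (year, translation, work) tuples sorted by year."""
    results = []
    for work in subjects("author", author):
        for translation in subjects("translation_of", work):
            if "en" in objects(translation, "language"):
                for year in objects(translation, "year"):
                    results.append((year, translation, work))
    return sorted(results)


if __name__ == "__main__":
    for year, translation, work in english_translations_by("balzac"):
        print(year, translation, work)
```

The point of the sketch is that the question is answered by following links between statements (author to work, work to translation, translation to language and year) rather than by matching strings in a single record, which is the shift in descriptive and retrieval semantics at issue here.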

Linked data could provide one of those potential ruptures where we get to invent the future. As we cast about for models, we might consider both Wikipedia and Wikidata as examples of new types of public engagement. I think that is my most significant disappointment with an otherwise strong textbook. Wikidata in particular has grown to be one of the more significant experiments in linked data. Its approach to metadata is not to predefine a metadata element set but, more flexibly, to allow users to develop elements on the fly. Wikidata is widely used as one of the sources for the quick data overlays that appear in Wikipedia sidebars and in Google results, such as dates of birth and death for people. Wikidata is also becoming widely used as the place one can go to reconcile multiple identifiers for various objects, including names of people and organizations in various authority databases. [5]

But best of all, it enables new kinds of interactions with users. For example, the Wikidata entry for The Rolling Stones is used in multiple Wikipedias. It is maintained by a group of volunteer enthusiasts. Although it is now more than a decade old, Reagle’s book is an excellent review of these efforts. [6] Instead of cataloging book by book as they arrive in libraries across the nation and world, hoping that we get coherent displays in our various shared and local catalogs, we could have small semi-organized teams who organize the materials of Dickens, or Adrienne Rich, with rich faceted displays based on linked data. Of course disputes would arise, just as they do in Wikipedia. But wouldn’t that be more informative than just cataloging books? Right now, basically we wait for the Library of Congress to catalog a book; failing that, the first librarian to catalog the book wins. Is that a good way to allocate expertise? I would rather have a team of Albanian language enthusiasts try to organize the materials that spark their passion, especially if that team involved assistance or even supervision from a trained librarian. I’ll volunteer to organize the music by Hüsker Dü, at least the independently produced albums. We need more ennobling projects in the common interest in this world.

No fault to Metadata for the shortcomings in the discourse on metadata. Zeng and Qin have provided us with the basic examples and vocabulary so that we can have a conversation. They have documented the general state of the art, and given us some shape of the future through their treatment of metadata research and applications. I think the way forward is shifting away from viewing metadata as a kaleidoscopic array of technical possibilities in computational systems and instead toward developing new possibilities in retrieval and descriptive semantics. A good grounding in the technologies is the beginning of that conversation, and Metadata provides that grounding. The rest is up to us.

Works Cited

1.       Zeng, Marcia Lei. “Metadata elements for object description and representation: A case report from a digitized historical fashion collection project.” Journal of the American Society for Information Science 50, no. 13 (1999): 1193-1208. DOI: 10.1002/(sici)1097-4571(1999)50:13<1193::aid-asi5>3.0.co;2-c

2.       Svenonius, Elaine. The Intellectual Foundation of Information Organization. MIT Press, 2000.

3.       Glushko, Robert J. The Discipline of Organizing: Core Concepts Edition. MIT Press, 2013.

4.       Pomerantz, Jeffrey. Metadata. MIT Press, 2015.

5.       Van Veen, Theo. “Wikidata: From ‘an’ Identifier to ‘the’ Identifier.” Information Technology and Libraries 38, no. 2 (June 17, 2019): 72–81. DOI: 10.6017/ital.v38i2.10886.

6.       Reagle, Joseph M. Good Faith Collaboration: The Culture of Wikipedia. MIT Press, 2010.

GREGORY H LEAZER

Inordinate Maps of Knowledge from the Bibliographers Guild
