Gregory H. Leazer

New Kinds of Queries

Technicalities 43(2) March/April 2023 : 9-12

We librarians and library users have a well-developed sense of what to expect when it comes to catalog use. We know what kinds of queries we can enter, and all but the most naive users have a fairly clear sense of what the retrieval results will look like. This is not to claim that our displays, records, or record ordering make much sense to most users, but everyone expects some basic features: you can ask for a book by author or title, and you will get records back. Topic searching is a little more difficult. That displays are supposed to group things by work is almost impossible to discern. Much of the detail in bibliographic records is occult. The source of error is in poor user understanding of the system rather than in errors in metadata or poor software design.

Most users, no doubt, have developed this sense of expectation through years of experience, first as children and later as adults. That they take advantage of only a portion of the intended service is a constant focus of this column. But I want to turn from making better use of our service toward the prospect of developing new services.

Catalog Functions, Defined

User services provided by the catalog have changed little in the North American context since Cutter published his objects in 1876. In the intervening years, new clarity and emphasis were placed on the concept of the work in the development of cataloging codes, especially in the efforts of Seymour Lubetzky and in the generally more formulaic work of the International Conference on Cataloguing Principles held in Paris in 1961. The Functional Requirements for Bibliographic Records (FRBR) study group, convened by the International Federation of Library Associations and Institutions (IFLA), whose final report appeared in 1998, began with an examination of catalog functions, creating a conflict that caused Elaine Svenonius to withdraw from the group. The FRBR objectives are “generic user tasks”: 1

•         to find entities that correspond to the user’s stated search criteria (i.e., to locate either a single entity or a set of entities in a file or database as the result of a search using an attribute or relationship of the entity);

•         to identify an entity (i.e., to confirm that the entity described corresponds to the entity sought, or to distinguish between two or more entities with similar characteristics);

•         to select an entity that is appropriate to the user’s needs (i.e., to choose an entity that meets the user’s requirements with respect to content, physical format, etc., or to reject an entity as being inappropriate to the user’s needs);

•         to acquire or obtain access to the entity described (i.e., to acquire an entity through purchase, loan, etc., or to access an entity electronically through an online connection to a remote computer).

It is not clear why Svenonius felt compelled to quit the FRBR team. In 2000, after the FRBR report appeared, she published The Intellectual Foundation of Information Organization, in which she critiques and revises the FRBR functions.2 She replaces the “Find” objective with a longer “Locate” function that “restores the finding-collocation distinction”: 3

1.       To locate entities in a file or database as the result of a search using attributes or relationships of the entities:

1a. To find a singular entity, that is, a document (finding objective).

1b. To locate sets of entities representing:

•         All documents belonging to the same work
•         All documents belonging to the same edition
•         All documents by a given author
•         All documents on a given subject
•         All documents defined by other criteria.
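To make the finding-collocation distinction concrete, here is a minimal Python sketch. It assumes a toy catalog of hypothetical `Document` records, each carrying a work identifier, an edition identifier, an author, and subjects; none of these field names come from Svenonius or from any cataloging standard. `find_document` stands in for objective 1a (one document), and the hypothetical `collocate` helper gathers the kinds of sets listed under 1b.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Document:
    # Hypothetical, simplified record; the field names are illustrative only.
    doc_id: str
    title: str
    author: str
    work_id: str      # identifies the work the document embodies
    edition_id: str   # identifies the edition the document belongs to
    subjects: Tuple[str, ...] = ()


def find_document(catalog: Iterable[Document], author: str, title: str) -> Optional[Document]:
    """Finding objective (1a): return one document matching an author/title query."""
    for doc in catalog:
        if doc.author == author and doc.title == title:
            return doc
    return None


def collocate(catalog: Iterable[Document], predicate: Callable[[Document], bool]) -> List[Document]:
    """Collocating objective (1b): return every document that shares a property."""
    return [doc for doc in catalog if predicate(doc)]


catalog = [
    Document("d1", "Hamlet", "Shakespeare, William", "w1", "w1-e1", ("Tragedy",)),
    Document("d2", "Hamlet", "Shakespeare, William", "w1", "w1-e2", ("Tragedy",)),
    Document("d3", "Hamlet", "Shakespeare, William", "w1", "w1-e2", ("Tragedy",)),
    Document("d4", "Macbeth", "Shakespeare, William", "w2", "w2-e1", ("Tragedy",)),
]

single = find_document(catalog, "Shakespeare, William", "Macbeth")          # one document
same_work = collocate(catalog, lambda d: d.work_id == "w1")                  # a work set
same_edition = collocate(catalog, lambda d: d.edition_id == "w1-e2")         # an edition set
by_author = collocate(catalog, lambda d: d.author == "Shakespeare, William")  # an author set
on_subject = collocate(catalog, lambda d: "Tragedy" in d.subjects)           # a subject set
```

In this toy catalog, asking `find_document` for Hamlet returns only the first matching record (`d1`), while collocating on work `w1` returns all three Hamlet documents; the difference between those two answers is what the locate objective keeps distinct.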

Furthermore, she appends a new “navigation” function to the objectives: 4

•         To navigate a bibliographic database (that is, to find works related to a given work by generalization, association, or aggregation; to find attributes related by equivalence, association, and hierarchy).

The strength of Svenonius’ statement is that it states more explicitly the kinds of entities that can be discovered, either by searching on attributes or by navigating relationships. Those entities are documents, but also sets of documents that represent a work, an edition, an author-set, or a subject-set. Svenonius is always concerned about sets, which is both the strength and the weakness of her approach. Curiously, neither FRBR nor Svenonius states which attributes may be used as search criteria; the last clear statement on attributes came in the 1961 Paris Principles, which, like Cutter, state that the user should be able to search by author, title, their combination, or “a suitable substitute for the title.”5 Svenonius does include two chapters on attributes of works and documents (chapters 6 and 7), but these are primarily descriptive attributes and do not appear to be the kind of attributes that users enter as the basis of a search query, as stated in her locate objective.

IFLA Responds, and Mostly Agrees

IFLA, for its purposes, decided to pull the basic search functionality out into a separate document, the IFLA Library Reference Model (LRM). That makes some sense, even if the statement, in its most recent form, emphasizes continuity with previous statements on catalog function, though it calls them “user tasks” rather than “catalog functions.” It does grapple with Svenonius’ navigate function by adding an explore task in its place: 6

The explore task is the most open-ended of the user tasks. The user may be browsing, relating one resource to another, making unexpected connections, or getting familiar with the resources available for future use. The explore task acknowledges the importance of serendipity in information seeking.

Despite the revision, the authors do not seem particularly comfortable with the explore task. Perhaps that is because the other functions (find, identify, select, and obtain) really are catalog functions, as is navigate. Explore is less a function or task, and perhaps more a goal or objective. We will need to sort this out at a future time.

Even so, it is hard to spot the area of conflict between Svenonius and the FRBR authors, or among any of the various formulations, because none of them departs significantly from previous statements on catalog functionality, especially at the level of typical user experience and purpose. The profile of catalog functionality is pretty much the same from iteration to iteration. The revisions clarify the work retrieval function, and the point that we are sometimes retrieving individual items and other times sets of items. But these are all refinements; one can imagine that nothing really has to change in the catalog as a result of these statements. I do think that the navigate function (and, to a lesser extent, explore) entails a catalog design that uses various bibliographic relationships and associations, expressed perhaps as hyperlinks within the bibliographic record. That is the only new catalog design element that does not appear in Cutter, at least explicitly.
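The navigate function can be pictured as typed links between records. The sketch below is a hypothetical illustration, not a model taken from FRBR, the LRM, or Svenonius: relationships are stored as (source, relationship, target) triples with made-up labels, and the hypothetical `navigate` function does a breadth-first walk that follows only the relationship types a reader chooses, up to a hop limit.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical relationship data as (source, relationship, target) triples.
# The labels are illustrative, not a controlled vocabulary from any standard.
TRIPLES = [
    ("Hamlet", "has derivative", "Rosencrantz and Guildenstern Are Dead"),
    ("Hamlet", "contained in", "Complete Works of Shakespeare"),
    ("Rosencrantz and Guildenstern Are Dead", "has derivative",
     "Rosencrantz and Guildenstern Are Dead (film)"),
    ("Macbeth", "contained in", "Complete Works of Shakespeare"),
]

Links = Dict[str, List[Tuple[str, str]]]


def build_links(triples: Iterable[Tuple[str, str, str]]) -> Links:
    """Index outgoing links by source, the way a record might carry them as hyperlinks."""
    links: Links = {}
    for source, rel, target in triples:
        links.setdefault(source, []).append((rel, target))
    return links


def navigate(links: Links, start: str, allowed: Set[str],
             max_hops: int = 2) -> List[Tuple[str, str, int]]:
    """Breadth-first walk from a work, following only the allowed relationship types.

    Returns (target, relationship, hops) for each newly reached entity.
    """
    seen = {start}
    queue = deque([(start, 0)])
    reached = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not follow links beyond the hop limit
        for rel, target in links.get(node, []):
            if rel in allowed and target not in seen:
                seen.add(target)
                reached.append((target, rel, hops + 1))
                queue.append((target, hops + 1))
    return reached


links = build_links(TRIPLES)
derivatives = navigate(links, "Hamlet", {"has derivative"})
containers = navigate(links, "Hamlet", {"contained in"})
```

The walk follows links only in the direction recorded. A catalog built this way would probably also want the inverse links (for example, from an anthology to the works it contains) so that a reader arriving at the complete works could move on to Macbeth.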

So why the fuss? I guess that is the whole point: we as catalogers fuss over minor design elements, but we fail to think expansively about what we really want the catalog to do, described as user interactions and outcomes. One of those new functions would be to support new kinds of queries.

What I Learned at the Datathon

A number of museums have recently engaged in a series of data explorations with the public by providing data about their collections to interested users and convening them in a public meeting. The National Gallery of Art (NGA) in Washington, D.C., for example, held a datathon in October 2019.7 The data was used to answer questions like “could images be grouped by their content to form meaningful groupings?” and “what kinds of art were acquired under the leadership of the various NGA directors?” Various teams wrote their own questions and provided analyses and illustrations to answer them.

There was certainly an ad hoc element to the questions, as groups had access to a broad array of collection data and then asked what kinds of findings the data could support. But some of the participants had a clear idea of significant questions that could also be supported by the kind of data we collect in our catalogs but do not typically support querying. The questions they asked would make sense, with some translation, in the context of our libraries, for example:

•         what are the gender or race distributions of authors in various subcollections (for example, poetry, artists’ books, sociology)?

•         what types of materials (books, journals, digital resources) are collected over time?

•         are there any fields or materials (for example, memoirs) that are initially promulgated by women, that are subsequently dominated by men?

•         what are the geographic sources of publication for various scientific fields?
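Questions like these are aggregations rather than look-ups. As a minimal sketch, assume catalog data had already been flattened into rows with hypothetical `type`, `year`, `subcollection`, and `author_gender` fields; every value below is invented for illustration. The question about material types over time then reduces to counting per decade, and the distribution questions reduce to computing shares within a subcollection.

```python
from collections import Counter, defaultdict
from typing import Dict, List

# Hypothetical flattened catalog rows; all fields and values are made up.
ROWS = [
    {"type": "book", "year": 1952, "subcollection": "poetry", "author_gender": "female"},
    {"type": "book", "year": 1958, "subcollection": "poetry", "author_gender": "male"},
    {"type": "journal", "year": 1961, "subcollection": "sociology", "author_gender": "male"},
    {"type": "digital resource", "year": 2004, "subcollection": "sociology", "author_gender": None},
    {"type": "book", "year": 2007, "subcollection": "poetry", "author_gender": "female"},
]


def types_by_decade(rows: List[dict]) -> Dict[int, Dict[str, int]]:
    """Count material types per decade of acquisition or publication year."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for row in rows:
        decade = row["year"] // 10 * 10
        counts[decade][row["type"]] += 1
    return {decade: dict(c) for decade, c in sorted(counts.items())}


def distribution(rows: List[dict], attribute: str, subcollection: str) -> Dict[str, float]:
    """Share of each value of `attribute` within one subcollection.

    Missing values are reported as 'unrecorded' rather than guessed.
    """
    values = [row.get(attribute) or "unrecorded"
              for row in rows if row["subcollection"] == subcollection]
    if not values:
        return {}
    return {value: n / len(values) for value, n in Counter(values).items()}


print(types_by_decade(ROWS))
print(distribution(ROWS, "author_gender", "poetry"))
```

Reporting missing values as "unrecorded" keeps gaps in the metadata visible in the results, which matters when the point of the exercise is to audit collections for bias.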

Questions like these could reveal interesting patterns in cultural production overall, and also reveal biases in individual libraries’ collection practices. Answering these questions in the library would provide authoritative evidence of cultural predilections, in part because libraries have collected evidence on millions of books and other resources distributed not only over years, but over decades and centuries. An example of these kinds of analyses is presented in “Diversity of Artists in Major U.S. Museums,”8 which is closely associated with the NGA datathon.

What Would It Take?

The data to answer these questions are largely in our catalogs, or at least could be, with simple additions to our metadata programs. However, the hard part will be, once again, to challenge librarians to consider the uses of such data as motivation for providing it. This is why I was so disappointed with the recent decision of the Program for Cooperative Cataloging’s Policy Committee to endorse the “Revised Report on Recording Gender in Personal Name Authority Records.”9 I understand the objections to sharing what may in some cases be private information regarding a person’s gender identity, and the admonition not to guess at an individual’s gender. I also understand that people feel quite exhausted by debating this policy, and that we are not likely to revisit it any time soon. But the policy limits our ability to conduct these kinds of comprehensive bias audits of our culture and our collecting institutions with high-quality data.

Answering these kinds of questions would also require a change in technology. As I have said, the data are there (or could be there) to support these queries, but our technology largely does not support them. The MARC format is surprisingly resistant to use in open software settings: it is generally a difficult task to download 100,000 records on artists’ books and put that data into a format that can be used by statistical software. And our online catalogs simply do not integrate authority data with bibliographic data even in routine use, much less provide a system that would channel thousands of author names into an authority file and look up the data associated with each of those names. Finally, we could integrate catalog data with other kinds of system data, such as anonymized circulation data, or with data drawn from other open systems like Wikipedia or Wikidata. Such efforts would require opening up application programming interfaces (APIs) in our catalogs.
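The first two obstacles, getting MARC data into a table and joining it with authority data, can be sketched with the Python standard library. This is a simplified illustration under stated assumptions: each record is a hypothetical dictionary of `{tag: {subfield: value}}` rather than real MARC (which repeats fields and subfields, and which a library such as pymarc would normally parse), and only a few tags are used: bibliographic 100 $a (personal name), 245 $a (title), and 264 $a and $c (place and date of publication), plus authority 100 $a (authorized heading) and 375 $a (gender). Matching on normalized name strings is a stand-in; a production join would match on authority identifiers such as the $0 subfield. The names and titles are placeholders.

```python
import csv
import io
from typing import Dict, List, Optional

# Hypothetical, simplified MARC-like records: {tag: {subfield_code: value}}.
Record = Dict[str, Dict[str, str]]

bib_records: List[Record] = [
    {"100": {"a": "Doe, Jane"}, "245": {"a": "Artist book one"}, "264": {"a": "Chicago", "c": "1998"}},
    {"100": {"a": "Roe, Richard"}, "245": {"a": "Artist book two"}, "264": {"a": "London", "c": "2003"}},
    {"100": {"a": "Poe, Pat"}, "245": {"a": "Artist book three"}, "264": {"a": "Toronto", "c": "2011"}},
]
auth_records: List[Record] = [
    {"100": {"a": "Doe, Jane"}, "375": {"a": "female"}},
    {"100": {"a": "Roe, Richard"}},  # authority record with no gender recorded
]


def subfield(record: Record, tag: str, code: str) -> Optional[str]:
    """Return one subfield value, or None if the field or subfield is absent."""
    return record.get(tag, {}).get(code)


def normalize(name: Optional[str]) -> Optional[str]:
    """Crude heading normalization for matching (a stand-in for identifier matching)."""
    return " ".join(name.lower().replace(".", "").split()) if name else None


def join_to_rows(bibs: List[Record], auths: List[Record]) -> List[dict]:
    """Flatten bibliographic records and attach attributes from the authority file."""
    authority_index = {normalize(subfield(a, "100", "a")): a for a in auths}
    rows = []
    for bib in bibs:
        name = subfield(bib, "100", "a")
        auth = authority_index.get(normalize(name), {})
        rows.append({
            "author": name,
            "title": subfield(bib, "245", "a"),
            "place": subfield(bib, "264", "a"),
            "year": subfield(bib, "264", "c"),
            "author_gender": subfield(auth, "375", "a"),
            "authority_match": bool(auth),
        })
    return rows


def to_csv(rows: List[dict]) -> str:
    """Serialize rows as CSV text that statistical software can read."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = join_to_rows(bib_records, auth_records)
print(to_csv(rows))
```

The `authority_match` column keeps names that found no authority record visible instead of silently dropping them, and missing values are written as empty CSV cells rather than filled in.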

There has been relatively little development in catalog services since Cutter’s objects were originally published nearly a century and a half ago. The digitization of bibliographic data has largely been limited by the restricted functionality of the USMARC format, which has proven to be a barrier to access and to public use of bibliographic data beyond the immediate tasks and functions of the most basic catalog-based retrieval services. It is time to expose our metadata in support of new kinds of queries.

Works Cited

1.       IFLA Study Group on the Functional Requirements for Bibliographic Records. Functional Requirements for Bibliographic Records: Final Report. Munich: K.G. Saur Verlag, 1998, p. 82. https://doi.org/10.1515/9783110962451.

2.       Svenonius, Elaine. The Intellectual Foundation of Information Organization. Cambridge, MA: MIT Press, 2000. https://doi.org/10.7551/mitpress/3828.001.0001.

3.       Svenonius, p. 18.

4.       Svenonius, p. 20.

5.       International Federation of Library Associations. Report: Proceedings of the International Conference on Cataloguing Principles, Paris, 9th-18th October, 1961. (London: International Federation of Library Associations, 1962), p. 91.

6.       IFLA Functional Requirements for Bibliographic Records (FRBR) Review Group. IFLA Library Reference Model: A Conceptual Model for Bibliographic Information. (International Federation of Library Associations and Institutions, 2018), p. 16. https://repository.ifla.org/handle/123456789/40.

7.       "National Gallery of Art Collaborates with Researchers to Analyze Permanent Collection Data." National Gallery of Art, 2019, accessed Jan. 15, 2023, https://www.nga.gov/press/2019/datathon.html.

8.       Topaz, C. M., B. Klingenberg, D. Turek, B. Heggeseth, P. E. Harris, J. C. Blackwood, C. O. Chavoya, S. Nelson, and K. M. Murphy. "Diversity of Artists in Major U.S. Museums." PLOS ONE 14, no. 3 (March 20, 2019). https://doi.org/10.1371/journal.pone.0212852.

9.       PCC Ad Hoc Task Group on Recording Gender in Personal Name Authority Records. Revised Report on Recording Gender in Personal Name Authority Records. (Program for Cooperative Cataloging, 2022). https://www.loc.gov/aba/pcc/documents/gender-in-NARs-revised-report.pdf.

Inordinate Maps of Knowledge from the Bibliographers Guild