The Information Retrieval Continuum


Technicalities 41(3) May/June 2021: 9-13

Welcome back to the third installment of this column in which various thoughts regarding the nature of knowledge and society and the means by which we connect those domains are dumped into a blender. We have not yet reached a creamy blend of those ingredients, so let us delve into the matter again. The first two entries in this column dealt with the pluralization of knowledge and the pluralization of the user, respectively; that is, not the issue of knowledge but the issue of knowledges, and not of the user but of users.

Our basic observation is not simply that we should consider, for example, that the user space is made up of multiple individuals, though the library has been a pretty quiet place in the time of COVID with no social groups. It is rather that there are different kinds of knowledge and users, a notion that is perhaps not so difficult in a field that divides knowledge into domains. A cursory glance at the topmost organization of the Dewey Decimal Classification (the Tens and Hundreds) shows that we do not simply conceptualize the various subjects of knowledge as a single procession of concepts, but that we group them into kinds and categories that reflect disciplines and potentially other kinds of groupings. And we group users too, into children and adult service populations, or the undergraduate (vs. research) library on college campuses. This is the plurality theory: that there are different kinds of users, or different kinds of knowledge. An element of that theory is that different kinds of people have different information needs.

Three Tenets: Plurality, Political Economy and Political Pragmatics

Where we have plurality, we have at least the potential for politics as well. The political economy theory states that different groups receive different forms of treatment, service or support, often in ways that are not equitable. This theory says that in the distribution of goods and services, some will get more than their share, and some less. If you are thinking “I hate microfilm and the library really burdens my community by only providing microfilm access to the journals that we read”, well, you are getting it. I am sure that microfilm is more strongly associated with some bodies of knowledge than others, and that some user communities are more burdened by it than others, and that is a problem, particularly when it is more than some random effect.

But a theory of political economy must contend with a recurring homology in social practice, whether that occurs in the realm of information services, in the structure of knowledge, in access to public education, in the delivery of healthcare, or in access to housing or occupations. That homology is what Raymond Williams[1] called “intention”, and, to paraphrase, failing to see it is failing to recognize a basic fact of our social reality. The concept of intention says that we see the same basic structure, the same winners and losers, in our various social forms and apparatuses. Our services and knowledge itself are built around a majoritarian core, occupying a central position, from which minoritized positions are differentiated and inferred. “Majoritarian” and “occupying a central position” should not be understood simply in terms of which group is the biggest, but more broadly in terms of social and cultural power. The economy of our attention, our design, is that we develop services for majority populations, and then perhaps consider the needs of others. And when we do consider the needs of others, our understanding is less complete, and services are apportioned according to a model of diminishing returns. That majoritarian and foregrounded interest comes with many names, but, according to various theories of social justice, the focus of cultural interest is in whiteness and other constructs of gender, sexuality, and distributions of global population. As an example, in our municipal library services, undoubtedly some neighborhoods will have better access to better libraries, others will have poor library services, and some will be marked by the absence of libraries entirely. Winners and losers. And where different kinds of people live in different neighborhoods, which seems to be a fundamental feature of American life[2], then we know where the best hospitals, the best schools, and the best libraries are. I do not like to admit it, but as a kid we called those the “good” neighborhoods, as opposed to the “bad” ones.

And finally, one more question is worth consideration, even though it is not typically what we mean by the “politics of metadata” or other similar “politics of …” constructions: what are the pragmatic political ramifications in our electoral politics and our culture at large? Funding for public schools, for example, has become politicized in the sense that one political party generally supports increased per-pupil funding, while another supports the diminishment of such funding and/or its diversion to charter and private schools. I sense hesitancy amongst librarians to go there, but at the national level one can discern clear party tendencies amongst Democrats and Republicans that are for or against federal funding of public library services. Of course that is reductive, and if this were the focus of the essay today, I would walk that comment back somewhat, but also provide the evidence for it. Eisenhower was an important initial supporter; Clinton gave money but constrained, in important ways, the definition of library services. We can at least ask whether the way we advantage certain user populations and certain segments of knowledge serves one political position over another. Hold that thought.

Information Retrieval

So let us turn to that object of our tender professional affections, the instrument of our institutional mission and aspirations. No, it is not knowledge itself or our user communities. The name of this newsletter is Technicalities and its readership is technical service librarians, though interlopers are welcome. Our prior articles have addressed users and knowledge, but positioned between them is the domain of knowledge organization, and its supreme achievement is the object of the catalog. Oh! What mighty labors produced you!

The catalog sits in an intermediary position between users (ideally, of all kinds) and knowledge (ideally, of all kinds), and we can discuss those “ideally” clauses another time as we think about who and what is left out on a practical level. The catalog, Janus-like, structures knowledge in various ways so that users can access it. And vice-versa, it structures users in various ways so that knowledge can fit it. There is a deeper explanation as to what it means to “structure users” that I think I addressed with Patrick Keilty in 2018.[3] But it can also be explained via Ranganathan’s first and second laws:

1.       Books are for use.
2.       Every reader his or her book.[4]

I like this idea that the librarian, via the catalog, rhetorically asks of each book “who should read you?” And also “with which other books do you belong?” But ultimately the catalog is simply a device that intermediates in media res between knowledge and users.

Following the launch of Google Search in 1998, there appeared a series of articles by leading cataloging authorities about the status of the library catalog.[5] This body of research also built on a series of articles that rhetorically positioned the online catalog as a disappointment[6] and hard to use.[7] Experts were concerned about the usability of the catalog, and the appearance of Google made the catalog appear drab and dull, not just less efficient. Despite the historical prominence of the online catalog, for example, as the basis for sharing very large datasets, for database elements with unfixed field lengths, and even for pioneering the use of alphabetical text in databases, the online catalog was hard to use effectively. And like perusing the Sears catalog, it was simply a list of things you could not have immediately in many online spaces. Google was hot and new: simple to use, and the actual document was just one click away! No more bothersome getting off of your chair and walking across campus to the library!

But first think of what the library catalog does well. While the user experience was always a factor (e.g., “the convenience of the user”), it was and remains foremost an endeavor dedicated to the thoroughness and accuracy of its data. Ooops! I mean metadata. Spare, clean, complete and precise, and based on design objectives of sober ambition like identification and “describe the basic features….” A basic, free, universal service, of global ambition and accomplishment! Seriously, it is a remarkable achievement.

On the other hand, Google has never been about the accuracy of its data. Nor about the thoroughness of its service (all the web pages it can find… SCOFF! Oh, and a lot of journal articles, and almost maybe all the books Google once scanned, but hey! We put all those articles on the web and we collected and cataloged the books that they scanned, and frankly, that was a lot more work). Have you used Google Scholar? Nice searchability, rankings, scope and convenience, but the metadata is terrible. When adding a citation to a manuscript, I often download citation metadata from Google Scholar and paste it into the document. And when I have a complete draft, I spend hours fixing Google’s metadata. Incomplete, inaccurate, inconsistent styling… seriously, why do I bother downloading from them in the first place?

But Google was never about the accuracy of its metadata; it was about delivering a kind of experience where you could search, rank, explore, and search again all in one session. It always found something no matter what you entered. And it fixed your typos. No wonder librarians, looking at the catalog, grew fond of the Principle of Least Effort. Searching the online catalog was a drag, man: all those no hits, and it is so easy to misspell “Tolkien” and not even be aware of it… it is not improbable, in the freshman’s eye, that this new huge academic library does not have fun books by “Tolkein.” Here’s a new principle: the catalog should never respond with “no results.”
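To make that principle concrete, here is a minimal sketch of a catalog lookup that falls back to the nearest heading when an exact match fails. It assumes a toy author index invented for illustration (AUTHOR_INDEX) and a hypothetical function name (search_author); it uses the approximate string matcher from Python's standard difflib module, and it is not a description of how any actual catalog or search engine handles misspellings.

```python
from difflib import get_close_matches

# Hypothetical toy author index, invented for illustration only.
# A real catalog would draw on its authority file.
AUTHOR_INDEX = {
    "tolkien": ["The Hobbit", "The Fellowship of the Ring"],
    "le guin": ["A Wizard of Earthsea"],
    "morrison": ["Beloved"],
}


def search_author(query, cutoff=0.8):
    """Return (matched_heading, titles), using the closest heading on a miss."""
    key = query.strip().lower()
    if key in AUTHOR_INDEX:
        return key, AUTHOR_INDEX[key]
    # No exact hit: try the nearest heading instead of answering "no results".
    close = get_close_matches(key, list(AUTHOR_INDEX), n=1, cutoff=cutoff)
    if close:
        return close[0], AUTHOR_INDEX[close[0]]
    # Nothing similar enough; a fuller design might offer browsing here.
    return None, []


heading, titles = search_author("Tolkein")
print(heading, titles)
```

The cutoff of 0.8 is an arbitrary choice for this sketch. Set it too low and the freshman is handed books by the wrong author; set it too high and we are back to “no results.”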

So I think of information retrieval systems as a continuum, with metadata accuracy and standardization on one end, and an engaging experience with auxiliary services like auto-completion on the other end. One is brown rice and steamed unsalted broccoli, and the other is a Happy Meal. Of course these things are not in strict trade-off; we could maybe have healthy Happy Meals. But the larger point is that it appears we are in a trade-off situation, with other search and retrieval systems falling along the spectrum. Web of Science: great metadata, lousy coverage and search interface. Facebook’s or Instagram’s news and story matching: endlessly fascinating, and appealing, with terrible searching and dumb metadata.

Those Tenets, Again

A spectrum of information retrieval systems: it sounds almost like I am trying to claim that the various systems of information retrieval exist in a pluralized space, and I am. They vary by kind and nature. And I think they also exist in a realm of political economy. In the popular imagination, when one thinks of a search engine or retrieval system, thought goes to Google as the definitive mechanism, and Google forms our expectations about how a search engine should behave. This is a contentious issue. For decades, the performance metrics par excellence of retrieval systems have been the relevancy-based measures of precision and recall. But for the various platforms, from YouTube to Twitter, the metrics are media-based and include audience size, time on the platform (or channel, in old new media parlance), and ultimately advertising revenue. “The United States is a nation adrift as truth is drowned in a sea of irrelevance” Huxley once said, and we could amend it by adding “spectacle” to the claim. Our plurality of systems is unfortunately judged by the criteria of success defined by the dominant player in the game. The criteria are not defined by a wide-scale civic interest in truth and accuracy; rather, they are defined by corporate media interests. Speaking of search engines, Introna and Nissenbaum state “private ownership of communications technology [is] the single most important and consistent historical policy position that influenced the course of telecommunications development.”[8] That’s politics.
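For readers who want the relevancy-based measures named above pinned down, here is a small sketch of the standard definitions: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. The function name and the sample document identifiers are assumptions made up for the example.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query.

    retrieved: identifiers the system returned
    relevant:  identifiers a judge deemed relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items that were actually returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: four documents returned, three judged relevant, two overlap.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
print(p, r)
```

Note what these measures require: someone has to judge relevance. Audience size and time on the platform require no such judgment, which is part of why they are so easy to optimize.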

Introna and Nissenbaum also state that any ranking system “dictate[s] systematic prominence for some sites, [and] dictat[es] systematic invisibility for others.”[9] I have cited Noble[10] in this regard before: in her example of searching for “black girls” on Google, the system responded primarily with citations that represented one kind of literature: pornographic representations of the query subject. Other kinds of knowledge pertaining to the topic were disregarded, such as health information, or poetry by Black girls. That structuring and hierarchicalization, to the extent that one body of literature occludes another, is the very definition of political economy: some users or some kinds of knowledge matter more than the others, even to their exclusion. The lack of ranking in library systems, for which they are often faulted, is, in this perspective, a strength, though it comes as a burden to users. Library systems retrieve everything that is a match, and require users to sort out the results. Simple match, no popularity-based metric that is intended to keep your eyes on the site. No progressively alarming “you might be interested in these additional books” like “Meghan Markle reveals her deepest secrets!”
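The contrast between simple match and popularity-based ranking can be sketched in a few lines. Everything here is invented for illustration: the records, the click counts, and the function names are hypothetical, and neither function describes how any particular library system or search engine is actually implemented.

```python
# Hypothetical records: (title, subject keywords, click count). All values invented.
RECORDS = [
    ("Celebrity gossip roundup", {"health", "celebrity"}, 9500),
    ("Community health handbook", {"health"}, 120),
    ("Poems on health and healing", {"health", "poetry"}, 45),
]


def simple_match(term):
    """Library-style retrieval: every matching record, in title order."""
    return sorted(title for title, subjects, _ in RECORDS if term in subjects)


def popularity_ranked(term, top_n=1):
    """Engagement-style retrieval: matches sorted by clicks, cut off at top_n."""
    hits = [(clicks, title) for title, subjects, clicks in RECORDS if term in subjects]
    return [title for _, title in sorted(hits, reverse=True)[:top_n]]


print(simple_match("health"))       # all three records; the user sorts them out
print(popularity_ranked("health"))  # only the most-clicked record survives the cutoff
```

In the first function nothing is hidden and the user carries the burden of sorting. In the second, the handbook and the poems are still in the database, but for anyone who reads only the top of the list they might as well not exist.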

And this issue, as current newsreaders are aware, has had an effect in the realm of practical party politics. While liberals can complain about the concentration of media ownership, the privatization of searching, and market-oriented metrics for the assessment of these systems, conservatives have also voiced complaints. Google News, for example, has been accused of the preferential retrieval of news sources deemed hostile to the conservative agenda. Of course I believe the story is more complicated than that, but that is the basic complaint.

But on this, people across the political spectrum can agree: information retrieval is political, and is having practical political effects.

Works Cited

1.       Raymond Williams, “Base and Superstructure in Marxist Cultural Theory,” New Left Review, no. 82 (November 1973): 3-16, p. 7.

2.       George Lipsitz, How Racism Takes Place (Philadelphia: Temple University Press, 2011) and Bill Bishop, The Big Sort: Why the Clustering of Like-Minded America Is Tearing Us Apart (New York: Houghton Mifflin Harcourt, 2009).

3.       Patrick Keilty and Gregory Leazer, “Feeling Documents: Toward a Phenomenology of Information Seeking,” Journal of Documentation (2018).

4.       S.R. Ranganathan, The Five Laws of Library Science (Madras, London: The Madras Library Association, 1931).

5.       See, for example, Karl V. Fast and D. Grant Campbell, “‘I Still Like Google’: University Student Perceptions of Searching OPACs and the Web,” Proceedings of the American Society for Information Science and Technology 41, no. 1 (2004): 138-46; Karen Markey, “The Online Library Catalog: Paradise Lost and Paradise Regained?” D-Lib Magazine 13, no. 1/2 (2007): 17-30; and Deanna B. Marcum, “The Future of Cataloging,” Library Resources & Technical Services 50, no. 1 (January 2006): 5-9.

6.       Nicholson Baker, “Discards,” New Yorker 70, no. 7 (1994): 64-86.

7.       Christine L. Borgman, “Why Are Online Catalogs Hard to Use? Lessons Learned from Information‐Retrieval Studies,” Journal of the American Society for Information Science 37, no. 6 (1986): 387-400 and Christine L. Borgman, “Why Are Online Catalogs Still Hard to Use?” Journal of the American Society for Information Science 47, no. 7 (1996): 493-503.

8.       Lucas D. Introna and Helen Nissenbaum, “Shaping the Web: Why the Politics of Search Engines Matters,” The Information Society 16, no. 3 (2000): 169-85, p. 170 (emphasis original).

9.       Introna and Nissenbaum, “Shaping the Web,” p. 171.

10.    Safiya Umoja Noble, Algorithms of Oppression: How Search Engines Reinforce Racism (New York: NYU Press, 2018).