Principles of Authority Control?
Technicalities 43(3) May/June 2023: 1, 9-12
I have written several times on the topic of authority control in this column, mostly in the tone of Marlowe in Heart of Darkness, as “a mysterious arrangement of merciless logic for a futile purpose.” Or perhaps quoting Kurtz is more direct, though lacking specificity: “The horror! The horror!”
I have pointed out, following the lead of many others, particularly Sanford Berman and Hope Olson, that using majoritarian language can lead to bias and systems of exclusion. I also credit Hope Olson with the idea that we should interrogate some of the basic commitments or assumptions enshrined in our authority control practices. The critique all traces back to a crucial decision, made in the 19th century, to designate one name for an individual, organization, or topic as “authoritative” and relegate all alternatives to secondary status. Merciless logic. And then to dump those alternatives into an authority file that is so poorly integrated into the catalog that users cannot have access to those names. Futile purpose.
Rereading that paragraph, I think many people, within our profession and without, will consider that the main problem with authority control is our failure to update archaic terms and names, which over time have become offensive, especially names for various kinds of peoples. I have at various times in this column argued for the need to expand our thinking beyond the “nomenclature” problem. But rather than inveigh once again on this topic, I want to try a fresh angle. I want to come up with a set of principles for the construction of naming systems within knowledge organization and beyond, so that we can consider whether we are exercising our professional judgement and responsibilities as protectors and guarantors of public knowledge and intellectual liberties, and whether we are developing services that serve as models of respect for persons, including an inferred right to privacy.
Such a set of principles ought to be firmly grounded in ethics. Like much of the best work on the conceptualization of information and information services, those principles and ethics evolve out of a continuing body of professional library practice, unfolding for over a century in the United States and even longer elsewhere. Those practices have been strongly grounded in concepts of intellectual freedom and guaranteed by publicly supported organizations at the municipal, state, and national levels. More recently our frameworks have grown to draw more explicitly upon concepts of equity and justice; they are rooted in conceptualizations of information as culture and knowledge, and in a continuing commitment to literacy, even as that concept has evolved and expanded.
But professional librarianship is just one source of inspiration for our set of principles. At UCLA, the Department of Information Studies draws together threads from various sources: archival work, special collections traditions, data studies, the digital humanities and social sciences, along with what used to be called “information science.” We can look to these other information fields to complement our own.
The fields of data studies and artificial intelligence have in recent years become areas of active investigation for ethicists because of a series of confounding and morally ambiguous actions in social media, the commercialization of user data, and exposures of consumer and citizen data. The networked nature of data and information work, involving not just the members of an organizational team but also coordination across multiple agencies, has also been an area of contemplation. How do you assign responsibility, whether credit or fault, when actions are taken in general coordination (such as establishing and following a standard) but the individuals have never met each other? In large-scale systems, the individuals might not even be aware of the actions taken by others, or of which organizations have contributed to products.
Much of the discussion in data ethics will not be applicable to the kinds of information service we contemplate in libraries. Questions about the division of profits, for example, do not typically arise amongst non-profit organizations. But legal liabilities do. I wish we also shared more of a collective set of responsibilities for providing high-quality professional cataloging services.
What The Artificial Intelligence Crowd Has To Say
Floridi and Cowls1 have done us all a solid favor by creating a unified set of principles for artificial intelligence (AI) work. Building on Floridi’s earlier work on data ethics,2, 3 the pair compiled a list of five principles from a meta-analysis of AI ethics stakeholders. Those principles are: [p. 6]
1. Beneficence: “The prominence of beneficence firmly underlines the central importance of promoting the well-being of people and the planet with AI.”4
2. Non-Maleficence: This principle includes preventing infringements on personal privacy.
3. Autonomy: This principle is about the power to direct action, and the balance of decision-making between humans and machines.
4. Justice: “AI should promote justice” and “reduce the risk of bias.”5
5. Explicability: The actions of AI systems should be interpretable and transparent, and those designing AI systems should be held accountable for “the good or harm [the AI] is actually doing to society.”6
How might we apply these principles to our work in authority control and to the problems with our current system designs?
Beneficence
A well-designed authority system would undoubtedly benefit society by enabling effective and engaging retrieval systems. Organization, such as placing texts into meaningful sets, is the backbone of a retrieval system, and authority control is the principal means of creating that organization. The catalog, however, has lost its status as a foremost tool for retrieval not because commercial search systems are so much better, but because catalogs are so poor. Consider personal name searching on Google, for example. Effective set-making has three components: the meaningful assignment of items into sets, the arrangement of objects within a set, and the relationships amongst the sets. An author (etc.) search on Google often identifies texts by the person but cannot distinguish texts by a person from those about a person. It does a poor job of distinguishing among multiple names for a person, or of representing the relationships between any one person and other people. Authority control, properly implemented, could be vastly superior to the services provided by Google. But perhaps effective authority control is unrelated to public beneficence? We have operated for decades under the assumption that good authority control contributes to good catalogs, and good catalogs contribute to public benefit. But we have been relatively unconcerned about the inverse: whether poor authority control contributes to poor catalogs, and therefore fails to contribute to well-being.
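To make those three components of set-making concrete, here is a minimal sketch in Python of the kind of identity-clustered structure the paragraph above describes. Every class, field, and identifier here is hypothetical; this is not any existing catalog’s schema, only an illustration of the by/about distinction that a keyword engine cannot reliably make.

```python
# Hypothetical sketch of identity-clustered set-making: assignment of items
# to sets, arrangement within a set, and the "by"/"about" distinction.
from dataclasses import dataclass, field

@dataclass
class Identity:
    id: str          # opaque designator; no name is "preferred"
    names: set[str]  # all known names for the entity, treated as equals

@dataclass
class Item:
    title: str
    by: set[str] = field(default_factory=set)     # creator identity ids
    about: set[str] = field(default_factory=set)  # subject identity ids

def works_by(items: list[Item], ident: Identity) -> list[Item]:
    """Set assignment: items whose creator resolves to this identity,
    arranged (here, alphabetically by title) within the set."""
    return sorted((i for i in items if ident.id in i.by), key=lambda i: i.title)

def works_about(items: list[Item], ident: Identity) -> list[Item]:
    """The 'about' set, kept explicitly distinct from the 'by' set."""
    return sorted((i for i in items if ident.id in i.about), key=lambda i: i.title)

twain = Identity("p001", {"Mark Twain", "Samuel Clemens"})
items = [
    Item("Life on the Mississippi", by={"p001"}),
    Item("Mr. Clemens and Mark Twain", about={"p001"}),
]
assert [i.title for i in works_by(items, twain)] == ["Life on the Mississippi"]
```

The point is only that the name cluster and the by/about relationships are explicit data rather than inferences from keyword co-occurrence, which is what makes the resulting sets meaningful.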
Non-Maleficence
The principle of non-maleficence is violated when our systems can be used to actively harm people. We have contemplated one example of this possibility as we explored recording personal characteristics such as gender and deadnames for living individuals who choose to keep that information private. That concern has led us to ban the collection of this information for all living individuals, out of a concern for the potential harm it could do. I do not quite understand the justification for the universal prohibition, unless the identification of gender contributes to a larger system of gender discrimination; but I think that is contradicted by the need, for the purpose of representation generally, to recognize the contributions of historically marginalized and oppressed peoples.
Non-maleficence is also the justification we use for addressing the aforementioned nomenclature problem. People ought to be addressed by non-pejorative, unimposed names for themselves. And if different communities of users have different names for the same phenomenon, designating one name as “authorized” and the others as “unauthorized” promotes the language of one group over the others. And if our syndetic structure is poorly implemented, then communities who use an unauthorized term are excluded from the full use of the catalog, as the sketch below illustrates.
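A toy illustration of those stakes, with invented terms and cluster identifiers: in a well-implemented syndetic structure, every community’s term resolves to the same set, and when a cross-reference is missing, the community that uses that word is simply locked out.

```python
# Hypothetical variant-term index: all terms resolve to one topic cluster.
syndetic_index = {
    "cookery": "T42",  # the historically "authorized" term
    "cooking": "T42",  # cross-reference from the everyday term
}

def lookup(term: str) -> str | None:
    """Resolve any known variant to its cluster id, or None if unknown."""
    return syndetic_index.get(term.strip().lower())

assert lookup("Cooking") == lookup("cookery")  # both communities reach the same set
# Delete the "cooking" entry and lookup("cooking") returns None: the user
# who searches in everyday language finds nothing, though the set exists.
```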
We might also consider the use of authority data in commercialized or legal spaces. Could our authority control designs and data be used, for example, to persistently identify people across surveillance systems? The possibility seems low, as we control primarily public figures who participate in the world of print through authorship and publication. But such a concern indicates the need for ongoing vigilance about derivative uses of our designs and data, and about our roles as stewards of public data and privacy.
Autonomy
Autonomy in AI settings refers to the balance and control of various human and machine processes. While authority systems do not exert autonomous decision-making, we often find ourselves yielding our individual human judgement to a system logic of “this is how we do things.” The merciless logic that confronts us as professionals is that incrementally, over time, we have ceded control of catalogs and authority control to online systems that are poorly designed. We behave as if our responsibility were dedicated solely to well-crafted metadata and ended somewhere prior to the actual implementation and use of that data. As a result, we have moved into catalog systems that use our data poorly. To a lesser degree we have also conceded control over metadata design per se. We have struggled to implement Resource Description and Access, much less a more substantial revision that centers the work and replaces MARC with a more flexible and open data standard. Cataloging has confronted bureaucratic lethargy over its own operations in the past; such problems motivated the work of Andrew Osborn7 and Seymour Lubetzky.8 Autonomy could also require higher levels of professional training for users who create and enter metadata into our shared systems.
Autonomy might also refer to the possibility of opening authority systems to wider scope and participation, and the subsequent need to balance decision-making and the roles of various participants. If we envision, for example, an open and expanded system like that used by Wikipedia, then I think we need to create and maintain an open governance model that centers librarians and their professional responsibilities as guarantors of user privacy and the accuracy of our data. One concern for such a governing body would be to guarantee that data recorded about living and recently deceased people is accurate and included only with their permission. To date, we have not created such an effective governance model for authority data; we have conceded much of the responsibility to the Library of Congress, which continues to operate with a myopic institutional concern rather than a dedication to the profession at large, and which has not balanced its decision-making apparatus in alignment with an open and participatory professional model.
Justice
Much of our current work, particularly with the nomenclature problem, is centered on justice. A theory of justice such as the one that derives from Rawls9 operates on the premise that society is made up of different segments and that special action must be undertaken to assist the least advantaged members of society. Instead, our naming systems have promulgated and reinforced social hierarchies by conferring disadvantage on marginal members of society, and they require high levels of information literacy to use effectively. We can navigate around these issues, as I have argued in the past, by developing systems that treat all names for an individual or organization as comparable and use an arbitrary designator as the official entry for those entities (see the sketch below). An appropriately designed naming system could promote diversity and human solidarity, rather than reinforcing linguistic hierarchies such as favoring, for example, poorly justified scientific and academic jargon in topical subject names.
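As a sketch of that design, and nothing more: the official entry below is an arbitrary, meaningless key, and every name is an equally ranked label tagged by community. All identifiers, field names, and example labels are invented for illustration.

```python
# Hypothetical "arbitrary designator" entity: the official entry is an
# opaque key, so no community's label outranks any other.
import uuid

def new_entity(labels: dict[str, str]) -> dict:
    """labels maps a community or language tag to that community's name."""
    return {
        "id": uuid.uuid4().hex,  # meaningless on purpose; carries no hierarchy
        "labels": [{"community": tag, "name": name}
                   for tag, name in labels.items()],
    }

entity = new_entity({"en-academic": "Felidae", "en-everyday": "Cats"})
# Display logic, not the record itself, chooses which label a given user sees.
```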
Explicability
Finally, explicability is a concept that should be more fundamentally integrated into our professional responsibilities. We need to make our decisions, from the minute data provided in individual authority records to overall system design and use, more transparent. Floridi and Cowls indicate that our systems need to be both intelligible and accountable.10 I have often looked at authority records and wondered by what logic we decided to prefer one term over another, or why we established multiple bibliographic identities for an individual who, to my naive eye, was using a pen name in ways that appeared inconsistent. As we record our decisions in authority records, we need to provide the evidence we used to arrive at them, in ways that exceed the “source found/not found” notes that appear too rarely in our authority records (a sketch of what that could look like follows). We have instead built a confounding system, wrapped in bureaucratic red tape, that makes it difficult to revise authority records or to incrementally improve our systems. Instead of system improvement, much of our recent work is aimed at demonstrating the brokenness of our systems in order to garner the attention needed to actually make changes.
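What might richer evidence look like inside a record? A minimal sketch, with every field name and value invented, of a decision log that captures the sources consulted, the rule applied, and who is accountable:

```python
# Hypothetical decision log for an authority record, going beyond a bare
# "source found/not found" note toward intelligibility and accountability.
decision = {
    "entity_id": "p00314",  # invented identifier
    "action": "treated pen name as the same bibliographic identity",
    "evidence": [
        {"source": "title page, 1998 novel", "bearing": "pen name in use"},
        {"source": "publisher interview, 2001", "bearing": "author confirms identity"},
    ],
    "rule_applied": "local policy 4.2 (hypothetical)",
    "decided_by": "cataloger:ab12",
    "decided_on": "2023-04-01",
}
```

Recording decisions this way would let a later cataloger, or a curious user, reconstruct why the record says what it says, and would make revision a matter of evidence rather than bureaucracy.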
There is more to ask here. In a future column, we will need to consider further how these principles can be translated into actual practice. And we need an additional assessment of whether these principles are sufficient and complete. I am less concerned that these principles originate with the AI community, but I do worry that they assume the work is being done for profit and are aimed at balancing financial motives against social consequences. Our principles ought to clearly favor public benefit over private profit. That is one principle of librarianship that we thankfully have not compromised over the years.
Works Cited
1. Floridi, Luciano, and Josh Cowls. “A Unified Framework of Five Principles for AI in Society.” Harvard Data Science Review 1, no. 1 (2019).
2. Floridi, Luciano, and Mariarosaria Taddeo. “What Is Data Ethics?” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 374, no. 2083 (Dec 28 2016): 20160360. doi:10.1098/rsta.2016.0360.
3. Floridi, Luciano. The Ethics of Information. Oxford University Press, 2013. doi:10.1093/acprof:oso/9780199641321.001.0001.
4. Floridi and Cowls, “A Unified Framework…”, p. 6.
5. Ibid., 7.
6. Ibid., 8.
7. Osborn, Andrew D. “The Crisis in Cataloging.” Library Quarterly 11, no. 4 (1941): 393-411. doi:10.1086/615055.
8. Lubetzky, Seymour. “Capital Punishment for Catalogers?” Library Quarterly 10, no. 3 (1940): 350-60. doi:10.1086/614781.
9. Rawls, John. A Theory of Justice. Cambridge, Mass.: Belknap Press, 1971. doi:10.4159/9780674042605.
10. Floridi and Cowls, “A Unified Framework…”, p. 8.