Gregory H. Leazer

What José Martínez Ruiz Taught Me

Technicalities 42(4) July/Aug 2022: 8-11

“How many books does UCLA own by José Martínez Ruiz?”

I have been asking that question for over 25 years in a class I teach at UCLA.

We all know that a catalog should show what is available by a given author. It has been an explicit design objective for catalogs for two hundred years, and we can cite all the giants on the principle, from Panizzi to Cutter to Lubetzky to the Paris Principles to Svenonius to the IFLA Statement of International Cataloguing Principles (ICP). A principle of more recent invention, of significance here, is that authors who work under a pseudonym should have their works entered under that pseudonym. For authors who maintain separate identities for different contexts of use (for example, their works of mystery fiction versus their works of academic discourse), we create two or more separate identities. So the principle is not entirely straightforward, and if we are going to throw in those wrinkles, then we should make sure users understand what is happening.

Very few students ever get the right answer to my question about José Martínez Ruiz, and I confess to them that what I consider the right answer is really just my guess too, because who can ever really know the answer to a recall question? I am agnostic about answering things like “how many eggs are there in the refrigerator?” Questions like “how many books are there by Martínez Ruiz in this twelve-million-book collection” ought to provoke so much anxiety that the words “right answer” should never be uttered.

While we are counting things, let us begin with a simpler but still confounding question. The University of California has ten campuses, so how many online catalogs does the UC system have? You might think optimistically the answer is one (one for the entire UC system), and cynically the answer could be ten (one for each campus). Until recently, the answer was eleven: one for each campus, plus one for the University of California Office of the President (UCOP). UCOP is sometimes referred to as the “eleventh campus” and sometimes as “UC Oakland,” but it has no students and no library collection. It did maintain an eleventh online catalog called “Melvyl,” named after that awful man, but sanity finally prevailed and now we have one catalog for the entire system, called “UC Library Search.” Things are definitely getting saner.

But the battle is not proceeding equally well on all fronts, and authority control is one troubling area. Given that this is the mechanism by which we show what is available by a given author, we should make sure all is working well. It must be for this reason that this system is called UC Library Search and not UC Library Find.

So What is the Answer?

UC Library Search has a single text box on its web interface, like the web advertising company Google that we librarians sometimes mistakenly think of as a competitor. Entering “Jose Martinez Ruiz” into that box returns over 150,000 results (row A in the table below). There is an “advanced search” option where I can limit the query to author/creator, and I get over 12,000 hits (B). Clearly these are not the answers, but can we pause on this for a second? These kinds of results are so routine now that we accept them. But they are not acceptable. As Nicholson Baker said¹ over a quarter of a century ago, if you entered a valid query into a card catalog, that is, you looked up our author’s name with the right entry element, you found books by that author. If you used the wrong name, you saw a reference to the “correct” name. In the online catalog here we have some valid hits towards the top of our list, but some bad ones too. One common reply we gave to Baker at that time was that the technology was still maturing. But at this point that excuse has expired.

On the sidebar we have the option to limit our results. One of the options there is by author/creator. If you expand that list, about thirty names down, you can find “Martínez Ruiz, José” with 39 hits (C). (N.B.: the names without diacritics are how I have entered them as queries. Names with diacritics are as they appear in system displays or in the bibliographic items.) It would be nice to have these listed by number of hits rather than alphabetically, or at least to have the option. I have done searches where that scroll goes on forever. You can also get there by finding one of the records for a book by him, and then clicking on the underlined and linked name, which also gets you 39 hits. About half of the class this year used one of these techniques and said that was the answer.

But that answer would be wrong, and there is no clear indication that there could be another answer. If you read the bibliographic records, though, towards the top of the results there is, for example, a citation for “Don Juan : La Andalucía trágica / por José Martínez Ruíz, Azorín.” Also on the sidebar is an entry for Azorín with 26 hits (D). Maybe Azorín is a pseudonym? Not very many students make the connection. Filter by that name, click on a citation like “Con Cervantes / Azorín [i.e. José Martínez Ruiz]” and you might notice that the “author/contributor” data element now reads “Azorín, 1873-1967”. Click on that and you get 403 hits (E). That is the same as an Author/creator exact match search.

Except that the filter on the sidebar has not reset; I have essentially searched for “author/creator exact match = Azorín” and filtered the author/creator to “Azorín” (F), which seems redundant. But if I remove that filter I jump from 403 hits to 6,031 hits (G). I do not get it. If I filter for format = books, I get 730 hits (H). If I had time, I would subtract these 403 books from the 730 books, to see if the remaining 327 books were by Azorín. What if I cut and paste what looks like the full access point and give that an exact author/creator search (I)? I get 550 hits, which is really close to what I got last year when searching UCLA’s old catalog, especially when I filter to books, 535 (J).

So:

Search  Query                                                  Total hits
(A)     Plain search = “Jose Martinez Ruiz”                    > 150,000
(B)     Author/creator contains = “Jose Martinez Ruiz”         12,174
(C)     Author/creator contains = “Jose Martinez Ruiz”         39
        AND filter = “Martínez Ruiz, José”
(D)     Author/creator contains = “Jose Martinez Ruiz”         26
        AND filter = “Azorín”
(E)     Click on “Azorín, 1873-1967” as an access point        403
        in a bibliographic record
(F)     Exact match Author/creator = “Azorin”                  403
        AND filter = “Azorín”
(G)     Exact match Author/creator = “Azorín”                  6,031
(H)     Exact match Author/creator = “Azorín”                  730
        AND filter = “books”
(I)     Exact match Author/creator = “Azorín, 1873-1967”       550
(J)     Exact match Author/creator = “Azorín, 1873-1967”       535
        AND filter = “books”

I would really like to know what is going on with H minus J, and also with G minus I, but who has the time? G minus I, for example, is the set of things with the creator “Azorín” but without “Azorín, 1873-1967”. Are all of those books by Azorín? Some of them perhaps?
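For readers who want the subtractions spelled out, here is a minimal sketch in Python using only the hit counts reported in rows (E) through (J). The function name count_difference is my own, and the arithmetic rests on a loud assumption: subtracting one hit count from another tells you how many records are "left over" only if the smaller result set sits wholly inside the larger one, which the catalog gives no way to confirm.

```python
# Hit counts copied from rows (E) through (J) of the table above.
hits = {
    "E": 403,
    "F": 403,
    "G": 6031,
    "H": 730,
    "I": 550,
    "J": 535,
}


def count_difference(hits, larger, smaller):
    """Return hits[larger] - hits[smaller].

    ASSUMPTION: this equals the number of records found only by the larger
    search if, and only if, every record in the smaller result set is also
    in the larger one. The catalog does not tell us whether that holds.
    """
    return hits[larger] - hits[smaller]


# The three subtractions discussed in the text.
for larger, smaller in [("H", "F"), ("H", "J"), ("G", "I")]:
    diff = count_difference(hits, larger, smaller)
    print(f"{larger} minus {smaller}: {diff:,}")
# prints:
# H minus F: 327
# H minus J: 195
# G minus I: 5,481
```

The numbers are easy; what they mean is not. Answering "are those books by Azorín?" would take the record lists themselves, not just the counts.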

Who is Azorín?

But just to be clear, Azorín is indeed a pseudonym. We inferred it from a bibliographic record we retrieved when searching for José Martínez Ruiz. There was no explicit reference in the catalog.

There are explicit reciprocal references from one name to the other in each of the LC authority records. He wrote using both his real name and a series of pseudonyms, eventually settling on “Azorín”, but only after publishing some significant works. It is not clear to me whether he maintained consistent identities under each name; it seems he did not. The LC authority record seems to concede this with the 667 note “Author primarily known under pseudonym; consider this form as the ‘basic heading’ when assigning subject entry–CPSO, May 23, 2000.” Browsing through both (C) and (J), it looks to me that books filed under “Martínez Ruiz, José, 1873-1967” should have been filed under “Azorín, 1873-1967”.

To sum up, we have split the set of books by the author most commonly but inconsistently known as “Azorín, 1873-1967”, filed some under “Martínez Ruiz, José”, and made no mention of his other pseudonyms. The sorting between the two categories does not have any discernible logic, but things that are clearly labeled “Azorín” are placed under “Martínez Ruiz”. That could be right, but without explanation it will confound readers. Then all the explicit coding of data in the authority records is dumped out the window, as the catalog makes no explicit reference from one access point to the other. The reader must infer it, and in my experience about 5-10% of MLIS students make the inference. The MLIS students are excellent, and by comparison to other students, such as undergrads, they are experts in searching catalogs. So the problem is not with the students; it is with the catalog. I feel like I personally have a lot of experience with these systems and I cannot say with confidence how many books we have by this author. (J) seems reasonable, if I add the things not indexed under his name but included in something like (C). I would say we could do a recall and precision analysis, but the path we took to arrive at “Azorín, 1873-1967” in the catalog seems so convoluted and unlikely that doing that idealized analysis would be to engage in an assessment based on a really unlikely scenario.
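For reference, the idealized analysis mentioned above comes down to two ratios. The sketch below uses the standard definitions of precision and recall; the function name and the record identifiers are hypothetical placeholders of my own, not data from UC Library Search. Note that the calculation needs the full set of relevant records as an input, which is exactly the thing we have just admitted we cannot know.

```python
def precision_and_recall(retrieved, relevant):
    """Compute (precision, recall) for a single search.

    retrieved: set of record identifiers the search returned
    relevant:  set of record identifiers that really are by the author
               (in practice unknown for a large collection)
    """
    true_hits = retrieved & relevant
    # Precision: what share of the retrieved records are relevant.
    precision = len(true_hits) / len(retrieved) if retrieved else 0.0
    # Recall: what share of the relevant records were retrieved.
    recall = len(true_hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy data for illustration only; NOT drawn from any catalog.
relevant = {"r1", "r2", "r3", "r4"}
retrieved = {"r1", "r2", "x9"}

precision, recall = precision_and_recall(retrieved, relevant)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```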

To understand the relationship between “Azorín” and “Martínez Ruiz” one really needs to consult a source other than the catalog. As in previous examples used in this column, Wikipedia seems to be that source. Searching either name will get you the same entry for him. Under the subheading “Publications” there is a link “Main article: List of works by José Martínez Ruiz” and clicking it gives you an entry that includes a long list of titles by him:

Published under the pseudonym Azorín, unless otherwise indicated:

Cándido (1893). La crítica literaria en España. Valencia: Imprenta de Francisco Vives Mora.
Cándido (1893). Moratín. Madrid: Librería de Fernando Fé.
Arhimán (1894). Buscapiés. Madrid: Librería de Fernando Fé.
Ruiz (1895). Anarquistas literarios. Madrid: Librería de Fernando Fé.
...
Ruiz (1900). El alma castellana (1600–1800). Madrid: Librería Internacional.
Diario de un enfermo. Madrid: Librería de Ricardo Fé. 1901.
La fuerza del amor. Madrid: La España Editorial. 1901.
La Voluntad. Barcelona: Henrich y Cía. Editores. 1902.

Search a name, either one, one click, and you have what looks like a definitive list of publications in chronological order. Some are even linked to Internet Archive sources since these are in the public domain. OK, the catalog is supposed to do some other things too that Wikipedia does not do. But I would trade those away if the catalog would do the things that it is supposed to do and Wikipedia does well.

Three Thoughts

I believe there are three consequences to this prolonged example. First, doing an analysis that mirrors what users have to go through to find answers to easy questions like “what books are available by a given person?” is detailed, speculative and tiresome. There are lots of pathways through the catalog, none particularly definitive, none that provide much assistance in wayfinding, and there is no clear point of conclusion to that activity that provides anything like a recognizably definitive answer.

Second, it is totally dispiriting to think about the authority records I created in my time as a catalog librarian. And it is even worse to think about the time I have spent with students on authority control. The concepts are valid, but the technology is not there. The promise of shared, sane, distributed, usable and used authority data just seems like a plan deferred.

Third, I think we need to re-orient ourselves. We have all been trained very thoroughly on data, and continue to train our students on a cataloging paradigm that is oriented very strongly toward data. We have done this to the neglect of thinking about the kinds of experiences we want our users to have, where those experiences include genuine acts of learning and discovery, perhaps with the side-effect that users and catalogers both discover the provisional nature of data systems, including bibliographic data. Instead of new statements such as IFLA’s Statement of International Cataloguing Principles, our time might be better spent on developing a series of user scenarios: for example, what is supposed to happen when a user enters the name of a person into the catalog? We have a lot of bibliographic data from which we can infer some ideas about what is possible and what we would like to make possible in our catalogs. Instead of standard data, we should give some thought to the standard services that we would like to provide, and which we have deferred for too long. I suggest that when a user enters a person’s name, we provide them with a series of identities along with biographical and bibliographical information that describes and distinguishes them, like on Wikipedia. From there we link to coherent displays of works. I know a lot of this is already in our conversations about cataloging, but it is time we start making actual plans for ourselves and for the benefit of our users, and stop deferring on promises.
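To make that user scenario concrete, here is a minimal sketch in Python of what "enter a name, get back the linked identities" could look like. Everything structural here is my own assumption for illustration: the IDENTITIES dictionary, its keys, and the functions fold and identities_for are hypothetical and describe no existing catalog or authority file. Only the two headings are taken from the records discussed above.

```python
import unicodedata


def fold(name):
    """Lowercase a name and strip diacritics, so 'Azorin' matches 'Azorín'."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


# Hypothetical, hand-built authority data (illustrative identifiers only).
# Each identity carries reciprocal "see also" links to the other.
IDENTITIES = {
    "id-1": {"heading": "Azorín, 1873-1967", "see_also": ["id-2"]},
    "id-2": {"heading": "Martínez Ruiz, José, 1873-1967", "see_also": ["id-1"]},
}


def identities_for(query):
    """Return the headings that match the query, plus the identities they point to."""
    words = fold(query).replace(",", " ").split()
    if not words:
        return []
    found = []
    for ident_id, record in IDENTITIES.items():
        heading_words = fold(record["heading"]).replace(",", " ").split()
        # A heading matches when every query word appears in it, in any order.
        if all(word in heading_words for word in words):
            # Follow the reciprocal references so both identities are shown.
            for linked_id in [ident_id] + record["see_also"]:
                heading = IDENTITIES[linked_id]["heading"]
                if heading not in found:
                    found.append(heading)
    return found


print(identities_for("Jose Martinez Ruiz"))
print(identities_for("Azorin"))
```

Under these assumptions, either query returns both headings, with the matched one first. The point of the sketch is only that the reciprocal references are followed on the user's behalf rather than left for the user to infer.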

Works Cited

1 Baker, Nicholson. “Discards.” The New Yorker April 4, 1994: 64-86.

Inordinate Maps of Knowledge from the Bibliographers Guild
