October 24, 2003

Quarter Life Crisis's review -- part I -- diacritical comments

I was warned that starting a blog while managing a project might be a bit demanding -- and i haven't been as attentive to this as i have been to the project itself. So it goes.

Someone just sent me the link to Sven-S. Porst's blog. It's the first substantial critique of the system I've seen outside our feedback mail, so i'll start addressing some of the points brought up there.

Regrettably, our system is session-bound. We will likely restructure the session handling and system URLs once we know where the pilot system is headed, but so much of the query handling lives in temporary relational database tables that links to result lists and the like may not be supported anytime soon. The most critical issue is getting session-free links to editions. So, unfortunately, the carefully linked examples in Quarter Life Crisis point into sessions that no longer exist, and no one can see them. My regrets.

Umlauts. Oh, ack. This is very easy to get muddled unless you force all your users onto the same platforms and the same clients. On my work PC running some version of Mozilla (not Firebird), I'm able to cut and paste umlauts and Chinese characters into a search. I know i've tested using "Küng" and the Chinese character for Mao. I also know that, for some reason, i can't cut and paste those Chinese characters on my Mac running OS X 10.2 and Mozilla Firebird, even though they display beautifully. I just now tested using "Küng" -- *sigh* -- "no results." Off to feedback with this problem.
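The post doesn't diagnose the "Küng" failure, so the following is only a guess at one common class of cause: the query and the index may represent the same text differently. Unicode allows "ü" either as one precomposed character or as "u" plus a combining diaeresis, and a server that reads UTF-8 bytes as Latin-1 turns "Küng" into mojibake. This Python sketch shows both effects; `search_key` is a hypothetical helper that normalizes a query to NFC before lookup, not RedLightGreen's actual code.

```python
import unicodedata

# Two spellings of "Küng" that look identical on screen
composed = "K\u00fcng"     # precomposed U+00FC LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "Ku\u0308ng"  # "u" followed by U+0308 COMBINING DIAERESIS


def search_key(text: str) -> str:
    """Hypothetical query normalizer: compose characters (NFC), then casefold."""
    return unicodedata.normalize("NFC", text).casefold()


# As raw strings the two forms differ, so a naive exact-match lookup fails
print(composed == decomposed)                          # False
# After normalization both produce the same key
print(search_key(composed) == search_key(decomposed))  # True

# A second trap: UTF-8 bytes decoded as if they were Latin-1
utf8_bytes = composed.encode("utf-8")
print(utf8_bytes)                    # b'K\xc3\xbcng'
print(utf8_bytes.decode("latin-1"))  # KÃ¼ng
```

Normalizing both the indexed data and the incoming query to the same form is the usual defense; which form a given browser sends on paste varies by platform.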

Merged Records. Also known as our FRBR-like title clustering. This is flawed, and it is going to be very hard to improve without losing many of its benefits altogether. Porst's examples point to bad or sloppy cataloging (my guess, without holding the book in my hand). I'm trying to replicate Porst's search -- I believe it was "do carmo manfredo differential." I note these two results, which differ only in the spelling of the author's middle name:

2. Differential Geometry Of Curves And Surfaces, by Manfredo Perdig Ao Do Carmo
7. Differential Geometry Of Curves And Surfaces, by Manfredo Perdigao Do Carmo

If i go to RLG's Eureka and search on ti=Differential Geometry Of Curves And Surfaces and pn=Carmo, i find three records. All three REALLY are the same edition: "Englewood Cliffs, N.J. : Prentice-Hall, c1976."

The largest cluster of records (and do go read about RLG's clustering at this point) gives a dimension of 24 cm and displays the tilde-a as "~a." I will not digress into a lecture about the age of the union catalog and some of these records, or about the fact that they're in EBCDIC with some character codings developed in-house because Unicode did not exist yet. See here instead. Suffice it to say that, over a quarter century, catalogers have struggled to key in non-ASCII data, with varied success.

So the largest cluster has 39 holdings. The next largest has 2. I am struggling to see what the difference is. I remain mystified about the purpose of LCCNs (despite innumerable lectures), but the LCCNs differ between the two records and, yup, the LCCN is used to distinguish between editions. Someone dropped the 0 in LCCN: 75-22094.

The third cluster has only 1 holding. Here the name is spelled with an "ã" and the dimension of the book is given as 23 cm. The spelling of the name does not affect RLG's edition clustering, but the difference in the physical description kept this record from clustering with the 39. For title clustering, however, the correctly encoded tilde-a was normalized to a plain "a", whereas the incorrectly keyed "~a" in the largest cluster was normalized to a space followed by "a". Thus the two appear to have different authors ("Perdig Ao" versus "Perdigao") and did not form the same title cluster.
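To make the two levels of clustering concrete, here is a minimal Python sketch under loudly stated assumptions: `normalize_name`, `work_key`, and `edition_key` are hypothetical helpers, and their rules are a simplification chosen to reproduce the behavior described above, not RLG's actual algorithm. A real "ã" decomposes into "a" plus a combining tilde that can be dropped, while a spacing "~" is just punctuation and becomes a word break. The LCCN value is a placeholder, since the post doesn't say which LCCN each record carries.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Fold a name or title to a comparison key (hypothetical rules)."""
    # NFD splits a precomposed "ã" into "a" + U+0303 COMBINING TILDE
    decomposed = unicodedata.normalize("NFD", name)
    # Dropping combining marks leaves the base letter "a"
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # A spacing "~" is not a combining mark, so it becomes a word break here
    cleaned = "".join(ch if ch.isalnum() else " " for ch in stripped)
    return " ".join(cleaned.lower().split())


def work_key(author: str, title: str) -> tuple[str, str]:
    """Hypothetical title-cluster key: normalized author plus normalized title."""
    return normalize_name(author), normalize_name(title)


def edition_key(lccn: str, dimensions: str) -> tuple[str, str]:
    """Hypothetical edition-cluster key: LCCN and physical size, not the name."""
    return lccn.strip(), dimensions.strip()


title = "Differential geometry of curves and surfaces"
legacy = work_key("Carmo, Manfredo Perdig~ao do", title)  # "~a" keyed as two characters
proper = work_key("Carmo, Manfredo Perdigão do", title)   # a real "ã"

print(legacy[0])         # carmo manfredo perdig ao do
print(proper[0])         # carmo manfredo perdigao do
print(legacy == proper)  # False: two "authors", two title clusters

# Same (placeholder) LCCN, different physical description: separate edition clusters
print(edition_key("placeholder-lccn", "24 cm") == edition_key("placeholder-lccn", "23 cm"))  # False
```

The sketch shows why fixing either problem alone isn't enough: a corrected "~a" would merge the title clusters, but the 24 cm / 23 cm discrepancy would still keep the edition records apart.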

Well, it's nice to fob all that off on poor cataloging, bad data, GIGO. But there are places where we can improve the title clustering even with good data; the first result of a search on Inferno points that out.

As for the German translation being "missing": it's not in our Union Catalog. Unfortunately, even if it were, it would likely not cluster with either of those titles, because it is unlikely a cataloger would have established a uniform title for the work. I am NOT a cataloger, but i believe that the uniform title for this book should be the original Portuguese title. (Caveat later.) Thus every cataloged edition should have the title-page title (in English or German) in the 245 field, and the Portuguese title in the uniform title field. However, catalogers have to make judgment calls about effort, and i am not surprised that the uniform title has not been used for a mathematical text. It is not used, in general. I do regret that the German translation isn't present. As for the original Portuguese: the note on the English edition says it is a translation "of a book and a set of notes, both published originally in Portuguese," so i wonder whether the published book and the separate(?) notes are among the five Portuguese works RedLightGreen displays for this search. (And if the English translation is of two separately and previously published works, the rules for uniform titles get a little out of control. And i won't even begin on 7XX added entries.)
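The uniform-title argument can be sketched in a few lines of Python. This is a hypothetical, heavily simplified model: records are plain dicts keyed by MARC tag, and `work_title` prefers a uniform title (240 when there is a 1XX main entry, 130 when the work is entered under title) before falling back to the 245 title proper. The Portuguese and German titles are placeholders, because the post doesn't quote them.

```python
def work_title(record: dict[str, str]) -> str:
    """Pick the title used for work clustering (hypothetical, simplified MARC)."""
    # Prefer a uniform title when a cataloger supplied one
    for tag in ("240", "130"):
        if record.get(tag):
            return record[tag]
    # Otherwise fall back to the title proper in 245
    return record["245"]


# Placeholder titles: the post does not give the Portuguese or German titles
PORTUGUESE = "<original Portuguese title>"
english = {"100": "Carmo, Manfredo Perdigão do",
           "245": "Differential geometry of curves and surfaces"}
german = {"100": "Carmo, Manfredo Perdigão do",
          "245": "<German title-page title>"}

# Without uniform titles the translations share no work title
print(work_title(english) == work_title(german))  # False

# With the same uniform title on both records, they would cluster together
english_u = {**english, "240": PORTUGUESE}
german_u = {**german, "240": PORTUGUESE}
print(work_title(english_u) == work_title(german_u))  # True
```

The catch, as noted above, is that this only helps if catalogers actually supply the uniform title on every edition, and the two-source translation makes even the choice of uniform title unclear.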

Posted by judielaine at October 24, 2003 06:15 PM
Comments

Thanks for your detailed reply.

Sessions: Unfortunately, I didn't notice that.

Umlauts: Once you go UTF-8, things should work pretty well in most modern web browsers. I had the impression that this is mostly a server-side problem.
Actually, I couldn't resist trying now, and searching for 魚 (fish) was successful in Safari, Camino (a Mozilla-derived browser), and Opera on the Mac. The search even succeeded in IE/Mac, but the text that reappeared in the search field was garbled there.

The rest sounds like more about cataloguing than I really wanted to know. Quite complex.

I didn't want to claim this is easy. I just found the do Carmo example quite interesting, as I have seen the English, Portuguese and German editions in libraries. (In fact, RLG seems to list a Spanish edition as well.) However, the German edition, while claiming to be the same book, actually lacks the last chapter.

It seems next to impossible to know all this information for every book and then present it adequately. Still, it would be very handy to have.

Posted by: ssp at October 25, 2003 11:29 AM