October 24, 2003

Quarter Life Crisis's review -- part II -- LC Subject Headings

In Quarter Life Crisis's section Refining, Prost writes:

The catalogue also supports refining searches, which is a very good idea and can be very helpful. Unfortunately the Subjects provided seem very random. When searching for Vector Bundles Why can I choose Vector Bundles – Congresses but it’s not related to Vector Bundles? Why can’t I simply limit my search results to a broad area like Mathematics, say? For example in this search, all the top results are maths related. Still the top Subjects to come up are completely unrelated.

The "Refine Search" section is based primarily on the LC Subject headings. It's tempting to believe these are hierarchical. They're not. They're *linear*. They are designed for *DRAWERS*. ANALOG. Little cards.

(deep breath -- sorry for the rant)

We spent MONTHS trying to use Recommind's automated classification to create something more intuitive. We tried fragments of the subject headings, we tried using Dewey and LC Class Numbers. (Designed for collocating *books* on *shelves.* Physical objects. One to one relations. More ANALOG.) It didn't really work. We had problems with granularity, problems with antique classifiers (where books on contemporary Iraqi politics have Dewey class numbers that put them "under" Archaeology -- which makes sense if you want your books on Mesopotamia and Ancient Persia to be near Iran and Iraq), and problems generating labels.

In the end, we do a rapid analysis of all the subject headings associated with the first hundred works that are returned. Catalogers gave subject headings of Vector Bundles – Congresses with the understanding that they'd be right behind the books on subject cards for Vector Bundles. They knew that the researcher intent on discovery would keep flipping back through the cards. No point in creating two different cards for the user to flip by, was there? And there was certainly no point in labeling this book "Mathematics" -- only books that address the whole broad subject area would get a subject heading like that.

...rambling about an example use and some bugs and forthcoming fixes...

I don't know much about vector bundles, so i'll switch to something i do: halo nuclei. The disambiguation problem is clear to me in this case. There are books about the structures of galaxies -- halo galaxies and the nuclei -- the thick centers -- of galaxies. Not my interest. I want halo nuclei -- stable nuclei that are believed to have a density distribution that falls off much less rapidly than the lighter isotopes of the same element. (Usually the next lighter nuclide is not stable.) Many of these elements are important in nucleosynthesis, though (which occurs in stars).

These "related" subjects, then, are the frequency-ranked subjects assigned to the titles in the result set. If i'm looking for halo nuclei in the context of nucleosynthesis, i can limit by astrophysics and then check the refined list of subjects. I notice a bug i thought we'd gotten rid of -- astrophysics remains as a "refine search by" subject even though i've already limited by it. I check all the remaining subjects to see if i can limit on nucleosynthesis -- i can.
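The frequency ranking described above can be sketched roughly as follows. This is only an illustration, not the catalogue's actual code: the record layout (a dict with a "subjects" list), the function name related_subjects, and the exclude parameter are all assumptions made for the example. The exclude parameter shows one way an already-applied limit such as astrophysics could be kept out of the refined list.

```python
from collections import Counter

def related_subjects(results, sample_size=100, exclude=()):
    """Rank subject headings by how many of the first `sample_size` records carry them.

    `results` is a hypothetical list of dicts, each with a "subjects" list.
    `exclude` is a hypothetical way to drop limits the user already applied.
    """
    counts = Counter()
    for record in results[:sample_size]:
        # Count each heading once per record, even if it is listed twice.
        counts.update(set(record["subjects"]) - set(exclude))
    # Most frequent first; ties broken alphabetically so the order is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [subject for subject, _ in ranked]

# Made-up records standing in for a result set.
results = [
    {"title": "A", "subjects": ["Astrophysics", "Nucleosynthesis"]},
    {"title": "B", "subjects": ["Astrophysics", "Galaxies"]},
    {"title": "C", "subjects": ["Nuclear structure"]},
]

print(related_subjects(results))
print(related_subjects(results[:2], exclude=["Astrophysics"]))
```

The second call mimics the refined list after limiting by astrophysics: only the two matching records remain, and the applied limit no longer shows up as a choice.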

We're planning on changing the "show more subjects" listing to an alphabetic list. The current list puts the most frequently occurring at the top. Without numbers, it's hard to understand the ordering. [We don't display any numbers because we do the analysis across the first one hundred results -- so if twenty-five of the first one hundred results have "astrophysics" as a subject, i might get forty results when i select that limit.]
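To make the mismatch between the sampled count and the number of results after limiting concrete, here is a small sketch using the same figures as above. The data is fabricated purely for illustration, and the function name and record layout are assumptions rather than anything from the catalogue itself.

```python
def sample_vs_full_count(results, subject, sample_size=100):
    """Return (count within the first `sample_size` results, count across all results)."""
    in_sample = sum(subject in r["subjects"] for r in results[:sample_size])
    in_full = sum(subject in r["subjects"] for r in results)
    return in_sample, in_full

# Made-up result set of 160 records: 25 of the first 100 carry "Astrophysics",
# and another 15 sit beyond the sampled window.
results = (
    [{"subjects": ["Astrophysics"]}] * 25
    + [{"subjects": ["Nuclear structure"]}] * 75
    + [{"subjects": ["Astrophysics"]}] * 15
    + [{"subjects": ["Galaxies"]}] * 45
)

in_sample, in_full = sample_vs_full_count(results, "Astrophysics")
print(in_sample, in_full)
```

A count shown next to the subject would say 25, while selecting the limit would return 40 records, which is why showing the sampled number could mislead.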

Posted by judielaine at October 24, 2003 10:14 PM