January 15, 2004

Sorting by Date vs Limit to Recent

I was just asked if there was a way to sort the results by date:

No, there is no way to sort by date. We are about to release a function on the advanced search to limit results to works with recent publications. As this interface is not designed to support known item searching, limiting to works with recent publication seems to be the main need for undergraduates to discover the key academic resources for their topic.

One of our design choices to support "performance" is to return only the "most relevant" (to the keyword) results, up to a fixed number of editions, which collapse into a varying number of title clusters. I fear that sorting by date might create a sense that the result set was complete.

To construct an example, i'll search on "george fox speller," because i know there was a speller written by this founder of the Quaker sect that isn't exactly kept in print. I find four of our title clusters for this speller. Careful examination will reveal the slight differences in the titles that prevented all fourteen editions from clustering together in one title cluster.

1. Instructions For Right Spelling And Plain Directions For Reading And Writing True English With Several Delightful Things Very Useful And Necessary Both For Young And Old To Read And Learn, by George Fox
6 editions published between 1673 and 1743 in English.

2. Instructions For Rightspelling And Plain Directions For Reading And Writing True English With Several Delightful Things Very Useful And Necessary Both For Young And Old To Read And Learn, by George Fox
5 editions published between 1683 and 1702 in English.

3. Instructions For Right Spelling And Plain Directions For Reading And Writing True English With Several Other Things Very Useful And Necessary Both For Young And Old To Read And Learn, by George Fox
2 editions published in 1769 in English.

4. Instructions For The Right Spelling And Plain Directions For Reading And Writing True English With Several Delightful Things Very Useful And Necessary Both For Young And Old To Read And Lear, by George Fox
1 edition published in 1743 in English.
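
To make the clustering behavior concrete, here is a minimal sketch of keying editions on a normalized title. The normalization (lowercase, strip punctuation, collapse spaces) and the name title_key are assumptions for illustration, not RedLightGreen's actual clustering rules; the titles and edition counts are the four listed above.

```python
import re
from collections import defaultdict

def title_key(title: str) -> str:
    """Hypothetical clustering key: lowercase, drop punctuation, collapse spaces."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", title.lower())
    return " ".join(cleaned.split())

# The four titles share most of their wording, so build them from common parts.
MID = " And Plain Directions For Reading And Writing True English With Several "
TAIL = " Very Useful And Necessary Both For Young And Old To Read And "

editions_by_title = [
    ("Instructions For Right Spelling" + MID + "Delightful Things" + TAIL + "Learn", 6),
    ("Instructions For Rightspelling" + MID + "Delightful Things" + TAIL + "Learn", 5),
    ("Instructions For Right Spelling" + MID + "Other Things" + TAIL + "Learn", 2),
    ("Instructions For The Right Spelling" + MID + "Delightful Things" + TAIL + "Lear", 1),
]

# Editions land in the same cluster only when their keys match exactly.
clusters = defaultdict(int)
for title, edition_count in editions_by_title:
    clusters[title_key(title)] += edition_count

print(len(clusters), "clusters,", sum(clusters.values()), "editions")
```

Under an exact-match key like this one, a single changed word ("Rightspelling", "Other", an added "The", a truncated "Lear") is enough to split the fourteen editions into four clusters.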

Now, if i search on "George Fox" and limit to the author George Fox by selecting the author name on the left-hand side, i have a list of 144 titles, and only one of the speller titles shows up. We're happy with a result like this because, assuming our use case of a non-specialist who wants to learn more about this founder of Quakerism, the first ten results here are excellent, except for an opera by someone else named George Fox.

I can't prevent[*] anyone from inferring that they've discovered all the titles by George Fox (and, in fact, there is at least one instance of this title) but functions like sorting by date and such are best handled by Eureka. There we've made design choices which support the needs of the serious scholar.

[*] Maybe a little yellow caution icon right next to the number of results when we believe there are relevant results not yet returned. I wonder if i can talk folks into adding that. The help text could simply say, "This search was so general that we were not able to retrieve all of the results. Try your search again using the + to require a specific word or quotes to ..." Or maybe only obsessive people who go to the end of a result list need to see this -- so it's at the end of results that are too large....
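
The caution-icon idea could be sketched roughly as follows. This assumes the only signal available is whether the number of editions retrieved hit the fixed cap; the function names, the "(!)" marker standing in for the icon, and the cap of 500 used in the usage lines are all hypothetical.

```python
def needs_caution(editions_retrieved: int, edition_cap: int) -> bool:
    """Assumption: hitting the cap means relevant results may have been left out."""
    return editions_retrieved >= edition_cap

def result_count_label(title_count: int, editions_retrieved: int, edition_cap: int) -> str:
    """Build the result-count text, appending a marker when the cap was reached."""
    label = f"{title_count} titles"
    if needs_caution(editions_retrieved, edition_cap):
        label += " (!)"  # stand-in for the little yellow caution icon
    return label

# Usage with an illustrative cap of 500 editions (not the real figure).
print(result_count_label(144, 500, 500))
print(result_count_label(4, 14, 500))
```

The same check could instead be run only when rendering the last page of results, which matches the "only obsessive people" variant.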

Honestly, users don't seem to go much past the first page....

Posted by judielaine at 03:48 PM | Comments (0) | TrackBack

January 13, 2004

RLG in EContentMag.com

RedLightGreen gets a brief mention by EContentMag.com when they name RLG as one of the top 100 EContent companies. Regrettably, the "Union Catalog on the Web" concept name has been shortened in press blurbs to "Union Catalog" project, so RedLightGreen becomes synonymous with the Union Catalog. No.

I do like how they said, "(note that eponymous acronym)," about RedLightGreen.

Posted by judielaine at 08:06 AM | Comments (0) | TrackBack

January 12, 2004

ISBN database

This free ISBN database is interesting. We've observed that in our user testing a number of students recognize the ISBN as the "right" way to search for a particular book -- it's their known item search key. (We're not really set up for known item search, but we'll probably tune RedLightGreen to do known item searching using ISBNs only.)
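
Routing a query to a known item search first requires recognizing that it is an ISBN. A minimal sketch using the standard ISBN-10 check digit (weights 10 down to 1, sum divisible by 11) might look like this; the function name is_isbn10 and the idea of testing the raw query string this way are assumptions, not a description of how RedLightGreen does it.

```python
def is_isbn10(query: str) -> bool:
    """Return True if the query looks like a valid ISBN-10 (hyphens and spaces allowed)."""
    s = query.replace("-", "").replace(" ", "").upper()
    if len(s) != 10:
        return False
    body, check = s[:9], s[9]
    if not all(c in "0123456789" for c in body):
        return False
    if check not in "0123456789X":
        return False
    # Weighted sum: first digit times 10, second times 9, ... check digit times 1.
    total = sum((10 - i) * int(c) for i, c in enumerate(body))
    total += 10 if check == "X" else int(check)
    return total % 11 == 0

print(is_isbn10("0-306-40615-2"))
print(is_isbn10("george fox speller"))
```

A query that passes the check could go to an exact ISBN lookup, and everything else would fall through to the usual keyword search.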

I was at a meeting on Friday that discussed something we couldn't quite label: ONYX for libraries might be one way to express it, bibliographic record enrichment might be another. The LC has some "publisher descriptions" of new books up. It would be lovely if these could be retrieved in XML form by LCCN or ISBN. Both a project like this ISBN catalog and one like RedLightGreen could benefit.

I continue to ponder, though, whether there's a way to leverage this information up to the work-like level. Maybe there's some cut-off: if there are fewer than N (where N is like 10) editions (or RLG clusters), assume tables of contents and such are useful across all editions. If there are more than N, point to those specific editions for detail.
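
The cut-off idea is simple enough to sketch. The names below are hypothetical, the default of 10 is just the "like 10" guess from the paragraph above, and the handling of exactly N editions (treated here as edition-level) is an assumption, since the post leaves it open.

```python
EDITION_CUTOFF = 10  # illustrative value only; the right N is an open question

def enrichment_scope(edition_count: int, cutoff: int = EDITION_CUTOFF) -> str:
    """Decide whether enrichment (tables of contents etc.) is shown per work or per edition."""
    if edition_count < cutoff:
        # Few editions: assume the enrichment applies across all of them.
        return "work"
    # Many editions: point to the specific edition the enrichment came from.
    return "edition"

print(enrichment_scope(6))
print(enrichment_scope(144))
```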

Posted by judielaine at 11:41 AM | Comments (0) | TrackBack

January 05, 2004

Quiet

We're down until 9 January, loading the 2003 records into the system. This has happened with many, many hitches already. In mid-December, we loaded data into our DB2 database during our overnight outage periods (and ran over into the morning at times). That did not go well, and it appears that some of the data is scrambled. If one accesses a row in some memory range, the database shuts down. We have to fix that, as well as load data into the Recommind database. Under "normal" operating conditions, we should be able to add data while the system is live. Unfortunately, we're not under normal conditions yet.

We intend to be up in time for folks to view the system at the RLG booth at ALA.

Posted by judielaine at 02:19 PM | Comments (0) | TrackBack