December 14, 2004

Google Print: Library

With today's announcement by Google, "Google checks out Library Books," Jim Michalko has a response, "Welcome the elephant to the room," that addresses how

Since April 2003 RLG has had a cooperative agreement with Google in order to explore the impact of making library materials and bibliographic research information more accessible and visible to the general population of interested students, faculty, and information seekers through deeper connections with the range of Google services. We have discussions in progress regarding the new opportunities presented by the expansion of Google Print that we hope will add to these services' usefulness to the academy.

There's also a sentence that addresses my almost complete focus since April:

RLG 2.0: RLG's massive reengineering process to move all of its applications and databases off the mainframe moves into high gear. RLIN cataloging tools are substantially updated and released as RLIN21™, in a nod to MARC 21. The new hardware and software environment is installed. DB2 designs and processes, successfully explored while creating RedLightGreen, get seriously tested and expanded in the course of creating the new databases for experienced researchers and technical processing staff.

Rather disheartening to see it in just so many words. But so it goes. Once we get this new DB2 version of the Union Catalog done, we will begin redoing RedLightGreen's database to improve speed and facilitate more elegant connections between editions, tweaks to ranking, all sorts of things i'd like to tweak!

Posted by judielaine at 12:47 PM | Comments (0)

March 03, 2004

Our latest official news

                                                                      
RLG's RedLightGreen is now mid-way through its pilot year. I want to let
you know about recent improvements we have made and new information available
to you for communicating about this project with your community of users.

As you know, RedLightGreen launched as a live system in September 2003 at    
www.redlightgreen.com, and is moving into its second semester of testing.  
More members will be joining the effort to investigate future directions  
and promote awareness of the resource. Current pilot partners are Columbia
University, New York University, Princeton University, Swarthmore College,
and the University of Minnesota.
   
The Andrew W. Mellon Foundation has awarded RLG additional funding to continue  
development and to ensure availability of the service at no cost to users  
through September 2004.                                            
  
Promotional texts and graphics are now freely available to any library for 
promoting RedLightGreen on its campus.  We invite you to create links to 
RedLightGreen and use these materials in any way you'd like for reaching 
the widest possible audience.  Just click the Information for Librarians 
link within RedLightGreen.                                                  
  
Other news about the system:                                                
                                                                             
    RedLightGreen now has an additional 1.5 million records, providing even  
    more material for users to "search, find, and get."                      
                                                                             
    The interface has been redesigned with a new color palette to give it a  
    more updated look and feel.                                              
                                                                             
    The new "Your List" feature enables students to create a citation list   
    that is persistent across sessions.                                      
                                                                             
    This spring we will be conducting studies with registered users from     
    partner campuses to prioritize continued improvements based on what      
    they most value and desire.                                              
                                                                             
                                                                             
We are always interested in hearing about your experience using RedLightGreen.
Let us know via the feedback link within the system, or contact RLG program 
officer Merrilee.Proffitt  at xxxxx.     

Posted by judielaine at 12:18 PM | Comments (0) | TrackBack

February 04, 2004

Updates

Did i mention we promoted a new interface, with a new "cool" feature called "Your List?" The "Your List" feature is why we prompted for a default citation format -- something that didn't make sense in the registration flow before this. The "Your List" is something other folks had asked for -- we wish we had it in place at launch.

The colors are brighter -- "younger" -- too.

And we updated the data right before ALA. As RLG migrates the full Union Catalog to a new infrastructure, our ability to add new records into RedLightGreen in a sensible manner is hampered by a cascade of dependencies and priorities. As soon as the Union Catalog is in its new home, i hope we'll be able to make a more sensible connection to the continually updated records. Until then, we'll have to make do with these batch loads.

Did i mention i had the flu?

Posted by judielaine at 10:38 AM | Comments (3) | TrackBack

From RLG's Executive Briefing

*** Andrew W. Mellon Foundation Supports Further Work on RedLightGreen

In December RLG received an appropriation from the Andrew W. Mellon Foundation for continued support of the RedLightGreen project. The Foundation's award enables us to embark on improving and promoting RedLightGreen to an audience of institutions and individuals that can make this service self-sustaining. User studies at the current pilot partner institutions will inform the kinds of service extensions and business models that offer the most value to RLG members, customers, and sponsors.

This significant grant is the third awarded by the Foundation for the RedLightGreen concept and execution. The first, in June 2001, supported planning stages; the second, in April 2002, enabled most of the development for the pilot service launch.

The pilot site has been in use since last September, and on January 27 we implemented a number of enhancements that reflect what we've learned from users over the first four months: see http://redlightgreen.com. The upcoming February "RLG Focus" will contain an article on "what we've learned since launch."

Posted by judielaine at 10:10 AM | TrackBack

December 16, 2003

LSI explanation

This article -- http://javelina.cet.middlebury.edu/lsa/out/cover_page.htm -- is about LSI, the same technique used in Recommind's software. We use LSI in conjunction with our structured data to help link together the structured, rigid language of the cataloging to the terms used in titles and notes (and responsibility statements).

Posted by judielaine at 05:44 PM | Comments (0) | TrackBack

October 03, 2003

User Studies

I should point out Günter Waibel's article Letting Users Show the Way in RLG . I personally would never refer to ranking algorithms with the phrase, "setting in motion a higher intelligence," but otherwise the article does a great job describing how early user tests with paper wireframes helped us sort out features.

Posted by judielaine at 07:00 PM | Comments (0) | TrackBack

September 24, 2003

Launch

Just a brief note to mention we've launched. http://www.redlightgreen.com

The performance isn't as snappy as we want, and we've got a list of possible problems and solutions: IO saturation on the fiber channel connecting the DB2 database and the RAID storage, garbage collection in the Java virtual machine that runs the search engine, some improvements in the design of some of the data tables.... In the meantime we may put an interstitial "Please wait" message in.

Just a favor -- if you register, please note in your "Major" or field of study "tunabreath" or "blogger" or "information science." We've got a pretty strict privacy policy in place, so this is one of the few items that will be recorded. It'd just be nice to know.

Thanks!

Posted by judielaine at 11:48 AM | Comments (0) | TrackBack

September 08, 2003

Crunch

The DB2 tuning week pointed out that we're just asking way too much. I think of the way we're presenting data as doing, all at once, the three or four searches an expert might do when using Eureka as a discovery tool. Of course, an expert knows which way to narrow their search -- we have to narrow in multiple dimensions, so it's thirty or forty searches all at once.

We can normalize all the subjects to improve performance. Currently we do group by's and unique's on textually normalized subject strings (diacritics, case, most punctuation stripped away). We'll likely assign each unique subject string a key and replace the subject string by the key in the index table. This seems really stupid when you're aware how many unique subjects exist. However, the data manipulations of long ints will be much faster than the manipulations of the excessively long strings we allow for subjects.
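To make the idea concrete, here is a minimal Python sketch of that kind of keying. It is not our DB2 code. The names `normalize_subject` and `SubjectKeys` are made up for illustration, and it strips all ASCII punctuation, where the real normalization strips only most of it.

```python
import string
import unicodedata


def normalize_subject(subject: str) -> str:
    """Textually normalize a subject string (illustrative rules only)."""
    # Decompose accented characters, then drop the combining marks (diacritics).
    decomposed = unicodedata.normalize("NFKD", subject)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case so "GOD" and "god" compare equal.
    lowered = no_marks.casefold()
    # Assumption: treat every ASCII punctuation character as a space.
    no_punct = "".join(" " if ch in string.punctuation else ch for ch in lowered)
    # Collapse runs of whitespace.
    return " ".join(no_punct.split())


class SubjectKeys:
    """Hypothetical lookup that hands out one integer key per unique subject."""

    def __init__(self) -> None:
        self._keys: dict[str, int] = {}

    def key_for(self, subject: str) -> int:
        normalized = normalize_subject(subject)
        # Reuse the existing key, or assign the next integer to a new subject.
        return self._keys.setdefault(normalized, len(self._keys) + 1)


keys = SubjectKeys()
# Index rows then carry (work id, subject key) instead of the long string.
index_rows = [
    (101, keys.key_for("God -- Proof")),
    (102, keys.key_for("GOD--proof.")),
    (103, keys.key_for("Large type books")),
]
```

The grouping and uniqueness work then runs over the integer column, and the long strings are only looked up again when a subject has to be displayed.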

But to build that -- how long? So, we might launch without much subject analysis at all. No disambiguation. No representative subject for each work. (I am annoyed by how novels often get "Large Print Edition" subjects as they rarely have subjects so the large print edition's subject floats to the top.)

I'm not happy about dropping these features -- they seem necessary for the engine to be useful. Yet, for launch....

Today is training on the Recommind MindServer.

Posted by judielaine at 07:35 AM | Comments (0) | TrackBack

September 02, 2003

Sanity and Chocolate

We've spent the past month upgrading to DB2 version 8. Version 8 of DB2 didn't talk to the DB2 version 7 client, despite many assertions from IBM that "the client and database are one version forward and back compatible." Oh -- you've gone to 64-bit version 8 -- no, that won't work. Then we had to upgrade the client, but the client wasn't compatible with the version of the WebSphere application server we were running. We needed to upgrade that.

Then there was a set of poorly installed patches to Solaris which filled up a root drive, bringing down our development system.

Last week Recommind delivered their database, but we were having a hard time keeping the full database up with version 8. I think we're finally beginning to start testing the Recommind engine plus our full database of books.

I have occasional bouts where i forget to breathe. I was looking for Hans Küng's Does God Exist?. (It came to mind because i am going to alter our disintegrating edition à la Tom Phillips.) "Kung god" -- amusing results, not what i'm looking for. "Hans Kung" -- look at all the works with Hans Küng as author, but can't find my title. "Does god exist" -- It seemed that we only had one work with the title "Does God Exist?" when there should have been -- and i go examine Eureka -- ten or eleven.

Ah-ha! I'm searching the test database, with only 5% of the book records. So, back to the full database. "kung god" -- this time, i get the note that there are 285 results, but none are displayed. "kung god exist" turns up


1.Existiert Gott, by Hans Kung
6 editions published between 1979 and 1991 in 2 languages.
Primary Subject: God
[score=1.0, rank count=88, order=5.47733681447821]

Well, yay.

But before i can think about the frustrating implication of non-English language uniform titles for American undergraduates, i have to search on my name, so my SQL expert can try to figure out why "kung god" never displays a result.

"judith" turns up many results about horses, particularly books i remember from my childhood, like Misty of Chincoteague. Since i grew up with horses, this seems oddly appropriate, but a little too close to mind-reading to be believed. I expect it's part of a caching problem we've seen all day.

The invisible results from "kung god" have to do with the CJK aggregator existing in our "work" data -- something that is used internal to RLG to facilitate CJK searching.

Meanwhile, another search on "judith" turns up The Books of Common Prayer, Confessions of a saint, Cicero on the nature of gods, and "Kusumanjali, by Udayanacarya" in the first five results.

I could get a swelled head over this!

Posted by judielaine at 06:40 PM | Comments (0) | TrackBack

August 11, 2003

From RLG's executive summary


*** UNIVERSITY OF MINNESOTA JOINS REDLIGHTGREEN PARTNERS ***

RLG's Mellon-funded project code-named "RedLightGreen" is moving forward to pilot use. The entire RLG Union Catalog (except for the serials and archival-and-mixed-materials files) has been converted to RedLightGreen format and loaded into the database that will be at the heart of the system.

The University of Minnesota will join Swarthmore, NYU, and Columbia in testing how their undergraduates will react to a freely available and custom-tailored version of RLG's union catalog. The RedLightGreen database and discovery system will be promoted on the partner-testers' campuses during the upcoming academic semester.

For more information on this project, which has great potential to transform online catalog use, see http://www.rlg.org/redlightgreen/ and contact Merrilee dot Proffitt at notes dot rlg dot org.

Posted by judielaine at 03:48 PM | Comments (0) | TrackBack

August 04, 2003

Total number of "editions"

39,468,186 "manifestations" or "editions" will be in the RedLightGreen total data set. A little over three fourths of these are from RLG's Books file and will comprise the initial data set at launch. Other formats will slowly be added to the Recommind MindServer in evening/over-night loads.

The scare quotes are there because each record truly represents an RLG cluster. For purposes of copy catalogers, RLG's clustering algorithm is extremely conservative. The preference is for a record not to cluster if there is a slight but possibly relevant difference. For an average user, this produces some odd "repeats," even beyond the "repeats" that different editions cause. ("Repeats" was the term used by a student test subject confronted by listings for different editions.)

Posted by judielaine at 03:30 PM | Comments (0) | TrackBack

July 20, 2003

Welcome to Tuna Breath

I've been leading the project known as RedLightGreen since I arrived at RLG in April of 2001. We started off with internal brainstorming sessions, discussing what may be of interest in the Union Catalog to a web audience who did not have access to a library that contributes records. These sessions also introduced me to my co-workers and were the beginning of my education in cataloging.

Since then we've continued on this "marmite" project -- we have bibliographic data, how can we package it to satisfy a need that doesn't undermine the current uses of the data? We recognized that we were all book-loving, data-loving folk -- how representative were our interests, anyway? We'd love to drill down deep into the data, twist projections of it around, come up with a project like mapping the publication of books against a timeline to watch the spread of the press around the globe. That might be a novelty for a large number of folks and fascinating to a few academics, but it wouldn't help us towards creating a site that would become self-sustainable past the grant period.

We've developed RedLightGreen to test the waters of self-sustainability. When we launch at the end of the summer, we will begin observing the use of the system in as much detail as possible in the hopes of improving it and coming to a better understanding of the information-seeking behavior of undergraduates.

I'll be writing about the data and analysis here, as well as other design choices as I confront them in the final implementation of the project.

I hope you enjoy reading.

Posted by judielaine at 09:36 AM | Comments (0)