December 07, 2004

Problems

We've entered a period of instability that we're trying to understand. The problem is at the interface between our application and the MindServer, and it's intermittent.

Intermittent errors are the worst.

Posted by judielaine at 06:53 AM

May 04, 2004

Root Drive Meltdown

Our product manager just sent this to our partners:

You may have noticed that RedLightGreen is down, and might be wondering what the prognosis is.

Suffice it to say, it's a nontrivial problem that involves hardware. Hardware is being replaced as quickly as possible, but it's very likely that RedLightGreen will be down through tomorrow (Wednesday). The home page for RedLightGreen will reflect our best guess as to status.

Regrettably, we hadn't gotten around to having a hot spare root drive on this particular machine (RedLightGreen, as a pilot project, is a little lower in priority than our subscription services).

This outage also brings up one of the problems with bookmarking our system. Links to "[www.]RedLightGreen.com" will point to our outage page right now, but any links that have servlet path elements will break. We have some code to fix that, and we're testing it now....

Posted by judielaine at 05:24 PM

January 05, 2004

Quiet

We're down until 9 January, loading the 2003 records into the system. The load has already hit many, many hitches. In mid-December, we loaded data into our DB2 database during our overnight outage periods (and ran over into the morning at times). That did not go well, and it appears that some of the data is scrambled: if anyone accesses a row in a certain memory range, the database shuts down. We have to fix that, as well as load data into the Recommind database. Under "normal" operating conditions, we should be able to add data while the system is live. Unfortunately, we're not under normal conditions yet.

We intend to be up in time for folks to view the system at the RLG booth at ALA.

Posted by judielaine at 02:19 PM

October 31, 2003

Upgrade & Promotion

We've added a number of bug fixes and a few labeling revisions to RedLightGreen this evening. During the "overnight" outage, we're also fixing the most perplexing bug -- the one where you can't search on a single word unless it can be separated into a root and suffix(es).

We're going to be promoting RedLightGreen at our partner institutions next week -- i expect we'll begin to see real undergraduate usage.

We are also still suffering from the DB2 JDBC bug that crashes WebSphere -- waiting for a DB2 fixpack on that....

Happy Halloween!

Posted by judielaine at 06:07 PM

October 28, 2003

Unicode client woes

So, first: the impression that we can't handle Unicode rankles. RLG has been involved in the development of Unicode over the last decade, and before that with other character encoding schemes. I have much respect for my colleagues here, and i hate to embarrass them.

So, we do have a known bug: when you see a "Related Subject" or "Related Author" with non-ASCII characters in it, the JavaScript creating the link chokes on the Unicode characters. We're working on a fix for that. Other than that, this system is all UTF-8. That isn't to say it's perfect, but there are also interpretation issues on the client side -- and i think they're even worse than the simple rendering variations across browsers.

Last week, i could not cut and paste 毛 澤東 on my Mac. The characters displayed just fine in Firebird, but cutting and pasting them didn't work. This worked just fine on my Win 2000 box at work (Mozilla 1.0.2). I upgraded my Mac to 10.3 over the weekend. I can now cut and paste those characters into the RedLightGreen search box and have a perfectly happy search.

So, now to reconsider the diacritics. I spoke with our local Unicode expert, Joan Aliprand (the first-named author of The Unicode Standard, Version 4.0; RLG was involved with the Unicode Consortium in 1994). Joan pointed out that there are (at least) two ways to express many characters with diacritics -- the character may be a composite of two codes (a base letter followed by a combining diacritic), or it can be a single, distinct code for the whole character. We chose the two-code composite form for our translation.

We chose to configure the MindServer to be completely character set and language agnostic -- we have too many languages and character sets in our data to lock them down. The MindServer, then, will match UTF-8 text character by character -- it isn't smart enough to decompose or recompose characters with diacritics.
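To make "character by character" concrete, here is a minimal sketch in Python using only the standard `unicodedata` module; it is not MindServer code. It spells "Küng" two ways: with the single code for "ü" (U+00FC) and with "u" followed by U+0308 COMBINING DIAERESIS. The names `naive_match` and `normalized_match` are hypothetical: the first stands in for an engine that compares code points without normalizing, the second shows the comparison after both strings are put into the same Unicode normalization form (NFC or NFD).

```python
import unicodedata

# "Küng" spelled two ways; both render identically on screen.
precomposed = "K\u00fcng"  # single code point U+00FC for the u-umlaut
decomposed = "Ku\u0308ng"  # "u" followed by U+0308 COMBINING DIAERESIS


def naive_match(query: str, indexed: str) -> bool:
    """Hypothetical engine that compares code point by code point."""
    return query == indexed


def normalized_match(query: str, indexed: str, form: str = "NFC") -> bool:
    """Compare only after converting both strings to the same normalization form."""
    return unicodedata.normalize(form, query) == unicodedata.normalize(form, indexed)


if __name__ == "__main__":
    print(len(precomposed), len(decomposed))                 # 4 5
    print(naive_match(precomposed, decomposed))              # False
    print(normalized_match(precomposed, decomposed))         # True
    print(normalized_match(precomposed, decomposed, "NFD"))  # True
```

Which form gets chosen matters less than applying the same one to both the index and the query; a character-by-character matcher sees two different strings either way.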

When Porst tried his Jänich search, it's possible that the umlaut-a was a single character (although, when i search "janich," i find that Wissenschaftstheorie als Wissenschaftskritik has a German record where Jürgen has an umlaut-u but Janich has no diacritics). My diacritic sample search is for Hans Küng.

On my 10.2 Mac i had lousy results, which contradicted my testing on my Win 2000 box. Today i can repeat the search. First, because i'm resisting learning how to type anything but what's on my American keyboard, i search for "kung." When i view the details, i see "Ku[]ng", where the [] is the box character used to indicate "yeah, that's a proper character, but i can't display it." (It displays just fine on the Win 2000 box.) It's clear that this is a two-code composite character -- the combining diacritic is the character that's not displaying. I paste that into the search box, and the u-umlaut displays ("Küng"). *sigh* The search works just fine.

Now, i want to demonstrate the frustration caused by the two different Unicode representations. I suspect that the "Küng" i searched for earlier had a single u-umlaut character. I cut and paste it from the blog -- voilà, no results. I can replicate this pattern on the PC as well.

So, what to do? RLG's practice, developed over the past twenty-some years, was to normalize Latin character set text strings in indices, removing diacritics. (Our ugly, punctuation-stripped title-cluster labels demonstrate the normalization that was in use.) RedLightGreen built upon that normalization effort, using both the original encodings and the normalized indexes to create the MindServer record indexes. Should we index again, adding a duplicate access point with the diacritics represented the other way ("disk is cheap")? Should we expect the underlying tools (DB2, MindServer) to recognize that the two-code and single-character forms of a diacritic are synonyms?
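The two options above can be sketched with Python's standard `unicodedata` module. This is a minimal illustration under stated assumptions, not how DB2 or MindServer behave: `strip_diacritics`, `duplicate_access_points`, `search_duplicated` and `search_normalized` are hypothetical names, and the stripping function only roughly approximates RLG's older normalized indexes (which also removed punctuation).

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Rough stand-in for an older RLG-style key: decompose, drop combining marks, lowercase.

    Letters such as "ø" or "ł" have no canonical decomposition, so they pass through as-is.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def duplicate_access_points(heading: str) -> set:
    """Option 1 ("disk is cheap"): index the heading in both representations plus a stripped key."""
    return {
        unicodedata.normalize("NFC", heading),  # single-character diacritics
        unicodedata.normalize("NFD", heading),  # base letter + combining diacritic
        strip_diacritics(heading),              # diacritics removed
    }


def search_duplicated(query: str, heading: str) -> bool:
    """The raw query, exactly as pasted, only has to hit one of the stored forms."""
    return query in duplicate_access_points(heading)


def search_normalized(query: str, heading: str) -> bool:
    """Option 2: the tools treat both forms as synonyms by normalizing index and query alike."""
    return unicodedata.normalize("NFC", query) == unicodedata.normalize("NFC", heading)


if __name__ == "__main__":
    heading = "Ku\u0308ng"  # two-code form, as in our translation
    for query in ("K\u00fcng", "Ku\u0308ng", "kung"):
        print(repr(query), search_duplicated(query, heading), search_normalized(query, heading))
```

Neither NFC nor NFD removes diacritics, so `search_normalized("kung", ...)` fails; matching a plain-keyboard query like "kung" still depends on a stripped key such as the one the older RLG indexes provided.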

As we migrate the whole Union Catalog, we'll be figuring it out -- and maybe it's already in a spec somewhere. Eventually, it will percolate into RedLightGreen. Until then, we have a frustrating situation with respect to diacritics.

Posted by judielaine at 11:37 AM

September 27, 2003

Egg everywhere

The current Recommind engine, when it has as much data in memory as this one has, has a little problem with garbage collection. The simple model, garbage collecting frequently, essentially creates tiny outages every minute or so. Last night we restarted the engine in order to switch over to garbage collecting once a night. Recommind's new version will obviate the problem, but for the next few weeks we're choosing the overnight outage over the frequent micro-outages.

From top: Memory: 24G real, 370M free, 28G swap in use, 40G swap free

So, we also ran a backup of the DB2 database last night. The previous backup, it turns out, was significantly smaller. That may be because we haven't deleted some of the test tables we made for determining whether clustered indexing would improve performance. I doubt we've had *that* many folks register.

I am aware of this size differential because last night's backup failed, hanging the system from 3 am to 5:45 am. That's when my spouse tapped me and said, "Did you get up when your 3 am alarm went off?" The system then hung for another handful of minutes as i tracked down the phone number of the DBA. He made more space and began the backup all over again. I would have liked to postpone until midnight tonight, but apparently we would have had to block all registrations all day at best. The full backup took about three hours and created two parts, each 199.98 GB in size.

So we're back up now.

I spent the time looking at the resolved addresses of the denied parties. We had sent out a message to all the RLG Member representatives yesterday at 4 pm. I had a nice sense of how global the membership is as i watched denials from Oxford to Hawaii.

If anyone sees Murphy, please tell him to throw the book at someone else for a while.

Posted by judielaine at 11:11 AM