The full RedLightGreen data set will contain 39,468,186 "manifestations" or "editions." A little over three-fourths of these come from RLG's Books file and will make up the initial data set at launch. Other formats will be added gradually to the Recommind MindServer in evening and overnight loads.
The scare quotes are there because each record actually represents an RLG cluster. To serve copy catalogers, RLG's clustering algorithm is extremely conservative: it prefers to leave a record unclustered if there is a slight but possibly relevant difference. For an average user, this produces some odd "repeats," even beyond the "repeats" that different editions cause. ("Repeats" was the term used by a student test subject confronted with listings for different editions.)
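To make the effect of conservative clustering concrete, here is a minimal sketch in Python. It is not RLG's actual algorithm, whose matching rules are not described here. The field names, the normalization step, and the sample records are all assumptions made up for illustration. The idea is simply that two records share a cluster only when every compared field agrees, so any small difference keeps them apart.

```python
from collections import defaultdict

# Hypothetical match fields; the real RLG criteria are not given in this post.
MATCH_FIELDS = ("title", "author", "publisher", "year", "pagination")


def normalize(value):
    """Lowercase and collapse whitespace so purely cosmetic differences still match."""
    return " ".join(str(value).lower().split())


def cluster_conservatively(records):
    """Group records only when every compared field agrees after normalization."""
    clusters = defaultdict(list)
    for record in records:
        key = tuple(normalize(record.get(field, "")) for field in MATCH_FIELDS)
        clusters[key].append(record)
    return list(clusters.values())


# Invented sample records: the second differs from the first only in spacing
# and capitalization, the third differs in its pagination statement.
records = [
    {"title": "Moby Dick", "author": "Melville, Herman",
     "publisher": "Penguin", "year": 1992, "pagination": "625 p."},
    {"title": "Moby  dick", "author": "Melville, Herman",
     "publisher": "Penguin", "year": 1992, "pagination": "625 p."},
    {"title": "Moby Dick", "author": "Melville, Herman",
     "publisher": "Penguin", "year": 1992, "pagination": "xx, 625 p."},
]

clusters = cluster_conservatively(records)
for cluster in clusters:
    print(len(cluster), cluster[0]["title"], cluster[0]["pagination"])
```

In this sketch the first two records fall into one cluster, while the third stays on its own because its pagination differs. A cataloger may care about that difference, but a searcher would likely see the third listing as one of the "repeats."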
Posted by judielaine at August 4, 2003 03:30 PM