Posts Tagged ‘database’

Open data and cost recovery

Sunday, September 30th, 2007

Peter Brantley’s post about what the Library of Congress charges for data interests me not only because of my own work but also because of its similarities to the parcel data “brouhaha” here in California. I’ve posted before about the court case to determine whether the digital descriptions of the parcel boundaries* are subject to the California public records law. Santa Clara County has charged a great sum for the data in the past, but if the data falls under the public records law, the records should be “provided to anyone requesting them for no more than the cost of duplication.” Santa Clara is appealing the decision on Homeland Security grounds.

* Parcels are the land units on which property taxes are assessed.

In looking for an update to the appeals story, i found this 2007-05-07 article from before the decision. It notes that Santa Clara County hired an outside consultant to examine the cost, and that after the study officials noted the fees might be dropped from $250,000 to $22,000. At least the $22,000 is the same order of magnitude as the copyright registration database. On the other hand, my spouse got the parcel data (for noncommercial use) for San Mateo County (immediately north of Santa Clara on the San Francisco peninsula) for a buck. (The cost of the CD on which it was distributed.)

Brantley reports that the copyright renewals database is congressionally mandated to be made available “at a charge of production and distribution cost plus 10%,” and reports that the cost is “$55,125 to obtain the retrospective online database, and $31,500 for a current-year subscription that must be annually renewed, for an entry cost of $86,625.” Assuming “production” describes not running the Copyright Office but producing the distribution copy, it’s difficult to understand how data distribution (not a live database serving hundreds of concurrent users, but a collection of records in MARC format) could run $50,000 in the digital age. However, the LOC notes a “recent cost savings,” so perhaps the new prices will be reduced by an order of magnitude or two.
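For the curious, here is a quick sketch of the arithmetic behind that $50,000 figure. It assumes the quoted prices are exactly cost plus 10%, which is my reading of the mandate rather than anything the LOC has itemized; the function and variable names are just for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: the quoted prices are exactly "cost plus 10%".
MARKUP = Decimal("0.10")


def implied_cost(price):
    """Back out the production-and-distribution cost from a marked-up price."""
    cost = Decimal(price) / (1 + MARKUP)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


retrospective = Decimal("55125")  # retrospective online database
subscription = Decimal("31500")   # current-year subscription

entry_cost = retrospective + subscription
print(entry_cost)                   # 86625
print(implied_cost(retrospective))  # 50113.64
print(implied_cost(subscription))   # 28636.36
```

Under that assumption, the retrospective file alone implies a bit over $50,000 in production and distribution cost before the markup.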

The restriction Christine encountered of “for noncommercial use only” did stir up my memory of a different federal agency, the National Weather Service, and a noncompete clause proposed in 2005 by Senator Santorum. [Information Week overview, and opinions from the WeatherUnderground's Director of Meteorology in April, May, and June of 2006.] I wonder whether, had the amended law gone into effect, these forecasts would have been available under a FOI request.
Freedom of Information, copyright registration, public data, cost of data

Cooperative Cataloging: from music to location

Tuesday, July 17th, 2007

I remember when CDDB came out. It was great, even if a bit empty, and it simply felt good to find a CD that hadn’t been identified before and transcribe some data. (Note that one wasn’t ripping CDs at this point, just playing the CD in the SGI Irix box at the lab.*) At some point when i wasn’t paying attention, the cooperative community database seemed to become corporate as Gracenote. I’m not sure if there’s a lesson there or not. I suspect that in using licensed iTunes i’m passing some fee back to Gracenote, and that’s probably a good thing, because managing large union catalogs does take a great deal of resources. *cough, cough* The result of this build-up and development is that when a friend sent me an mp3 with just a title this morning, i was able to open it in iTunes and suddenly have all the metadata available to me. Networked reference databases and metadata rock, and in the (commercial) music world, it just happens.

I was reminded of that initial “wow” feeling of the CDDB (predating IMDB and newcomer NNDB) when reading about Andrew Turner’s Flickr Zone Tagr building on the Zone tag database (err, not ZTDB, thanks). Like TagMap this is another use of the flickr tags of geolocated (hence zone) photos. Andrew’s implementation has that same flavor as the early CDDB — cooperative cataloging of photos instead of CD tracks. Will i be able to take an emailed photo someday, view it, and have a background network query match buildings and skylines to a global 4D place model, extract a location and time range, pull on the descendant of the flickr zone tag database and get probable labels? “Oh, this was taken in Daytona in 1977 — that little one must be my cousin.”

Well, maybe not the 4D model part, maybe not the 1977 match, not for a while. But give us the cameras with the GPS coordinates embedded in the EXIF data….
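As a small illustration of what those embedded coordinates look like: EXIF stores latitude and longitude as degrees, minutes, and seconds, each a rational number, plus a reference letter (N/S or E/W). A minimal sketch of turning that into decimal degrees follows; the function name and the sample values (roughly Daytona Beach) are made up for the example, not read from a real photo.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert EXIF-style (degrees, minutes, seconds) rationals to decimal degrees.

    dms is three (numerator, denominator) pairs; ref is "N", "S", "E", or "W".
    """
    degrees, minutes, seconds = (Fraction(n, d) for n, d in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative in decimal notation.
    if ref in ("S", "W"):
        value = -value
    return float(value)


# Hypothetical tag values, roughly Daytona Beach.
lat = dms_to_decimal(((29, 1), (12, 1), (36, 1)), "N")
lon = dms_to_decimal(((81, 1), (1, 1), (12, 1)), "W")
print(lat, lon)
```

A decimal pair like this is the form most mapping and geotag lookups expect.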

* Never fear, scientists ask for payback for the time spent goofing off in the lab. Now that all the music CDs are labeled, you can start on labeling galaxies. No word on whether this project is meshing up with previously mentioned automatic astronomy photo metadata generator astrometry.net.

data entry, 4d cities, astronometry.net, astrometry.net, digital libraries, image models, spatial models, neogeography, geocoding, tagmap, flickr, data entry, authority files