Posts Tagged ‘JCDL’

Friday morning sessions

Friday, June 22nd, 2007

Agreeing to Disagree: Search Engines and their Public Interfaces, presented by Frank McCown with Michael L. Nelson
This paper described an experiment to see whether the results returned by the search APIs offered by Google, MSN, and Yahoo differed from those given by their web interfaces. This five-month study, which is available online, has some fascinating results. Broad conclusions include that the MSN search API produces the most stable results and those most similar to its web interface; that the indexes behind the APIs may be smaller than those behind the web interfaces but are just as fresh; and that the top-ranked results differ substantially between the API and the web user interface (WUI) for Google, but much less so for Yahoo. Note also that Google’s backlink results seem quite stale, while MSN’s and Yahoo’s are fresher.
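To make “stable and similar” a bit more concrete, here is a minimal Python sketch of two ways one might compare an API’s top-k results with the web interface’s results for the same query: set overlap (Jaccard) and Kendall tau over the URLs both lists share. These are common rank-comparison measures offered as an assumption; I’m not claiming they are exactly the metrics McCown and Nelson used. The function names and sample URLs are hypothetical, and each list is assumed to contain no duplicate URLs.

```python
from itertools import combinations

def overlap(a, b):
    """Jaccard overlap of two top-k result lists (1.0 = identical sets)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def kendall_tau_shared(a, b):
    """Kendall tau over the URLs that appear in both rankings.

    1.0 means the shared URLs are in the same relative order in both lists;
    -1.0 means their order is completely reversed.
    """
    in_b = set(b)
    shared = [url for url in a if url in in_b]
    if len(shared) < 2:
        return 1.0  # fewer than two shared URLs: no pairs to compare
    pos_a = {url: i for i, url in enumerate(a)}
    pos_b = {url: i for i, url in enumerate(b)}
    concordant = discordant = 0
    for x, y in combinations(shared, 2):
        # A pair is concordant if both lists order x and y the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical top-4 results for one query from an API and from the WUI.
api = ["a.example", "b.example", "c.example", "d.example"]
wui = ["b.example", "a.example", "c.example", "e.example"]
print(overlap(api, wui))             # 3 shared of 5 distinct URLs: 0.6
print(kendall_tau_shared(api, wui))  # one of three shared pairs is swapped
```

Overlap captures whether the same pages show up at all; Kendall tau captures whether the pages both lists agree on are ranked in the same order, which is closer to the “big changes in the top ranked” observation.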

A good question was how the researcher justified violating the terms of service by screen scraping the web interface. Frank described how he had tried to get permission but could not reach anyone who could give (or refuse) it. He described trying to do no harm, keeping his usage below the limits allowed by the API, and so on. The larger question, the very closed nature of these search engines (indeed, the research was driven by the lack of information about the resource) set against their increasing impact in the academy, was not directly addressed.

Static Reformulation: A User Study of Static Hypertext for Query-Based Reformulation, presented by Michael Huggett, co-author Joel Lanir
An interesting experiment on how to most effectively follow the “scent of information,” comparing keyword search to browsing a cluster of computationally similar (keyword-based) documents. It was a very controlled and carefully constructed experiment. The broad takeaway for me was the distinction between tasks: when browsing is more effective for retrieving articles and when searching is. The trick here, it seems, is in constructing the browse of similar items. I’ve never found Google’s “similar pages” to support that; in bibliographic data, classes and subject headings are manually constructed to provide that browse network. It’d be interesting to see the study repeated using that body of “similarity.” (I note that the Recommind engine essentially built collections of similar bibliographic records, and those fed into the result sets.)
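These notes don’t record how the paper computed “similar” documents, so the following is only a sketch under an assumption: treat each document as a bag of keywords and build the browse list of similar items by cosine similarity of keyword counts. The names `cosine` and `similar_documents` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two keyword-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similar_documents(docs, doc_id, n=3):
    """Return the n documents whose keywords are closest to doc_id's.

    docs maps a document id to its list of keywords. Ties are broken by id
    so the browse list is stable from one request to the next.
    """
    vectors = {d: Counter(keywords) for d, keywords in docs.items()}
    target = vectors[doc_id]
    scored = [(cosine(target, v), d) for d, v in vectors.items() if d != doc_id]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [d for _, d in scored[:n]]

# Hypothetical keyword sets for four articles.
docs = {
    "d1": ["search", "query", "reformulation"],
    "d2": ["query", "reformulation", "hypertext"],
    "d3": ["opac", "catalog", "ajax"],
    "d4": ["search", "query", "ranking"],
}
print(similar_documents(docs, "d1", n=2))
```

Note the contrast with a library catalog: here the “browse network” falls out of shared vocabulary, whereas classes and subject headings are assigned by catalogers.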

Later in the afternoon there was “Effects of Structure and Interaction Style on Distinct Search Tasks,” presented by Robert Capra, co-authors Gary Marchionini, Jung Sun Oh, Fred Stutzman and Yan Zhang.

This compared the hand-crafted, high-information-density front page of the Bureau of Labor Statistics (I suspect that Edward Tufte would approve) with two faceted browse interfaces. The test included a couple of different tasks but didn’t find a clear improvement from one type of interface over the other. Admittedly, though, all three conditions were browse-only. Users noted that they missed having a search interface and were frustrated without it.

A Rich OPAC User Interface with AJAX, Jesse Prabawa Gozali and Min-Yen Kan, presented by Min-Yen Kan
My work with RedLightGreen deepened my admiration for this presentation of an OPAC interface built with AJAX on a MySQL or Lucene backend, intended to replace an Innovative interface (for discovery, presumably). The goals were to provide a way to compare the details of different results and to change the sort order dynamically. A particularly elegant feature lets users see whether a particular item appears in different search results (there are tabs for the search history). The interface can be reviewed at http://opac.comp.nus.edu.sg. I ponder scalability, of course, and the difference between RedLightGreen’s Google-like ranking and the traditional sort choices.

The Large-Scale Collections session was rather engaging for me. (Papers: “A New Generation of Textual Corpora: Mining Corpora from Very Large Collections,” Gordon Stewart, Gregory Crane** and Alison Babeu; “Subject Metadata Enrichment using Statistical Topic Models,” David Newman**, Kat Hagedorn, Chaitanya Chemudugunta and Padhraic Smyth; and “Organizing the OCA: Learning faceted subjects from a library of digital books,” David Mimno** and Andrew McCallum; ** marks the presenter.) Greg Crane (of the Perseus Digital Library), in a similar vein to his talk in the panel on Thursday, spoke about the very rich tools needed for scholarly work with textual corpora. I reflect on a conversation I had with a colleague in RLG Programs on Monday before I left, about what tools are needed on top of these large collections of text (Google, Open Content Alliance, Gutenberg). Greg has a long wish list! Of particular interest to me are the needs for comparing and managing editions, but Greg also brings up the idea that books should talk to other books. (I’ve heard this before; does it go back to Vannevar Bush? Or to Greg at a previous JCDL?) A book should link out to a concordance, and a phrase that is referred to in other works should link to those references (ah, for trackbacks to Shakespeare). Greg frames this in the great humanities question of “how do we understand human expression” and notes that Western culture has perhaps done a good job of understanding Western culture, but not broader global human expression. (There was a hint yesterday that issues of “Homeland Security” might be better addressed through greater cultural literacy.)
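Greg’s idea that a phrase quoted in other works should link to those works can be illustrated, very crudely, with shared word n-grams. The toy Python sketch below indexes every four-word phrase in each work and keeps the phrases that occur in more than one. The function names and mini-corpus are hypothetical, and this is a long way from the tooling Perseus would actually need, which would have to cope with spelling variants, multiple editions, and translation.

```python
import re
from collections import defaultdict

def ngrams(text, n):
    """Lowercased word n-grams of a text, joined with single spaces."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def shared_phrases(works, n=4):
    """Map each n-word phrase found in more than one work to those works' titles."""
    index = defaultdict(set)
    for title, text in works.items():
        for gram in ngrams(text, n):
            index[gram].add(title)
    return {gram: titles for gram, titles in index.items() if len(titles) > 1}

# Hypothetical mini-corpus: a source text and a later work that quotes it.
works = {
    "Hamlet": "To be, or not to be, that is the question",
    "Essay": "As the prince asked, to be or not to be is the question we face",
}
for phrase, titles in sorted(shared_phrases(works).items()):
    print(phrase, "->", sorted(titles))
```

Each shared phrase is a candidate link (a “trackback”) from the quoting work to its source; deciding which direction the link runs would take dating and edition metadata this sketch ignores.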

The next two talks dealt with statistical assignment of items to topics or classifications, a process similar to the latent semantic indexing (LSI) we applied to the union catalog with Recommind. Labeling the topics still seems to be manual (which isn’t surprising, but one can always hope for miracles). What was particularly interesting was that, to support parallelizing the process, David Mimno, in “Organizing the OCA: Learning faceted subjects from a library of digital books,” described applying the classification page by page (generating a within-book classification) and then classifying those pages across a much larger corpus. That page-by-page classification, moreover, assigns each individual word to a class. In a sense, each word on the page is classified and, in turn, disambiguated from other meanings of the same term. There’s clearly a springboard here to Greg Crane’s wish list.
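For readers who haven’t seen a topic model’s per-word assignments, here is a minimal collapsed Gibbs sampler for latent Dirichlet allocation (LDA), one common kind of statistical topic model, run on toy “pages.” This is an illustration under assumptions: it is not LSI (what we used with Recommind), it is not necessarily the exact model either paper used, and it leaves out Mimno’s page-level parallelization and faceting entirely. The function name, parameters, and sample pages are hypothetical. Every token gets its own topic, which is what allows the same word (here, “bank”) to land in different topics in different contexts; with this little data, though, nothing guarantees a clean split.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents (here, pages), each a list of word tokens.
    Returns (z, nkw): z[d][i] is the topic of the i-th token of doc d, and
    nkw[k][w] counts how often word w is assigned to topic k.
    """
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})  # vocabulary size
    K = num_topics
    ndk = [[0] * K for _ in docs]               # tokens in doc d with topic k
    nkw = [defaultdict(int) for _ in range(K)]  # word w assigned to topic k
    nk = [0] * K                                # all tokens with topic k

    # Start from a random topic for every token.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take this token out of the counts...
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # ...and resample its topic from the full conditional:
                # (doc likes topic t) * (topic t likes word w).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return z, nkw

# Hypothetical pages; "bank" appears in both a river and a finance context.
pages = [
    ["river", "bank", "water", "fish"],
    ["money", "bank", "loan", "interest"],
    ["water", "river", "boat"],
    ["loan", "money", "credit"],
]
z, topic_words = lda_gibbs(pages, num_topics=2, iterations=100)
for page, topics in zip(pages, z):
    print(list(zip(page, topics)))
```

The topics come out as unlabeled numbers; attaching a human-readable name to each one is exactly the step that, as noted above, still seems to be manual.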

JCDL, JCDL2007, search, latent semantic indexing, LSI, machine categorization, OCA, Open Content Alliance

Publish or Perish — but publish how?

Thursday, June 21st, 2007

Keynote: “Sorting and Classifying the Open Access Issues for Digital Libraries: Issues Technical, Economic, Philosophic, and Principled,” John Willinsky, University of British Columbia (but, as announced, very soon to be at Stanford)

John Willinsky’s talk, an acknowledged case of “preaching to the converted,” was lovely and inspiring. It was not just his enthusiasm for the mission of libraries, access and preservation, but also his faith in the results of democratic access and his faith that humanity will, in general, do the right thing. He’s involved with the Public Knowledge Project.

“We have not yet begun to plumb the depth of public interest in research”

Why are open journals and open data important? JW offers three rights that open access supports, the principles on which we should rest our support.

(1) The Right to Know, a human right included in the Universal Declaration of Human Rights. Publishing openly online makes information available to everyone. He describes a discussion with policy researchers in Ottawa: what resources had they used in the past? They’d call faculty members they had known, which JW referred to as research by cronyism. (And I suggest there’s evidence that journalists essentially rely on their own networks of people to call.) Now, though, the Ottawa policy researchers search online and find the open access research. Building on a poster from last night, he refers to the “fingerprints” of ideas (as opposed to citation “impact”) and notes, as motivation for researchers, that if they wish to leave their fingerprints on the future, they improve their chances by publishing openly.

(2) The Right to Participate: more journals filling out the spectrum of authority and audience open the possibility for more participants in scholarly discourse. It’s hard for me to pull out particular issues here; I am such a convert to the value and process of open participation. I’m influenced by everything from my thoughts about the participation of women in physics and the value open software development has brought me, to the education and pleasure of reading openly published blogs, novels, graphic novels, articles, movies…. I do what I can to give back in that economy.

I can’t say JW added much to my understanding here, but there was a tangential comment about the use of the open journal technology: he hopes that the efficiencies such a tool brings will support editors in nurturing authors, going beyond the right to participate to improving the quality of that participation.

Personally, I feel a little nudge here: should I be participating in scholarly discourse? (An appeal in my mind for more time, more energy, a longer life….)

(3) Academic Freedom: there are several aspects here. Part of it is the mechanics of the open journal software, which keeps workflow records. JW never quite spells it out, but I believe the unstated point is that those records let one challenge a journal on biased acceptances and rejections. The story JW tells is of a Canadian medical journal and a chain of events: the journal published a somewhat critical examination of how pharmacists handled Plan B prescriptions, the professional association of pharmacists protested, and the medical association fired the editors. JW asks: who would have thought that one would need academic freedom protected from an academic society?

I thought of that in the evening at the very sparsely attended community meeting. (I’ve been trained by attending Meeting for Business as a Friend, I suppose.) Someone spoke up about the proceedings: they should be open. Oh, ACM and IEEE would never allow it. Well, ACM (IEEE?) authors can publish papers on their own sites. So, tell us where your paper is and we can link to it from the meeting website.

Not so: we *are* the ACM (or IEEE). We need to carry Willinsky’s call back to our professional societies. We need to adopt the model he offered, where libraries support journals rather than purchase them, and have our societies work to transition to a different publishing model.

I suppose I should write a letter.

Related: Directory of Open Access Journals and Public Library of Science.
JCDL, JCDL2007, open access, open publishing, open journals