Web2.0 Data Entry Evening

I’ve been interrupted in going through my notes from the May and June events by a nasty cold. Last night, feeling better but not great, i explored a few beta services i hadn’t done much more than glance at this spring. First, there was Spock, which bills itself as improving people searches. It appears to use a crawl of some social networking sites and Wikipedia as a seed, and then allows users to tag, add relevant websites, and describe relationships. Know a great deal about some historical figure? You can check the tags made by the “Spock Robot,” vote to make some stronger, add others, add relationships. As best as i can tell, the relationships aren’t automatically reciprocal: if i indicate George Fox is related to Margaret Fell, Margaret Fell isn’t automatically related to George Fox.

This site, which had sent me email to remind me of its existence, reminded me of Metaweb’s FreeBase as i was tagging my friend Kurt Bollacker. Freebase goes beyond people and has a richer set of controlled attributes, and relationships are reciprocal (although it seems to take some moments to propagate). It seems seeded by a wide variety of structured data sources, including Wikipedia.
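To make the one-way versus reciprocal difference concrete, here is a minimal Python sketch. The `RelationshipGraph` class and its `reciprocal` flag are made up for illustration; nothing here reflects how Spock or Freebase actually store relationships.

```python
from collections import defaultdict


class RelationshipGraph:
    """Toy store of person-to-person links (hypothetical, not either site's data model)."""

    def __init__(self, reciprocal):
        self.reciprocal = reciprocal
        self.edges = defaultdict(set)

    def relate(self, a, b):
        """Record that a is related to b; also record the reverse link if reciprocal."""
        self.edges[a].add(b)
        if self.reciprocal:
            self.edges[b].add(a)

    def related_to(self, person):
        """Return the sorted names this person is linked to (empty list if none)."""
        return sorted(self.edges.get(person, ()))


# One-way links, as the relationships in Spock appeared to behave.
one_way = RelationshipGraph(reciprocal=False)
one_way.relate("George Fox", "Margaret Fell")

# Reciprocal links, as the relationships in Freebase appeared to behave.
two_way = RelationshipGraph(reciprocal=True)
two_way.relate("George Fox", "Margaret Fell")

print(one_way.related_to("Margaret Fell"))  # []
print(two_way.related_to("Margaret Fell"))  # ['George Fox']
```

The reciprocal version simply writes the link twice, once in each direction, which would be one plausible reason for a short delay before the reverse link shows up.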

To see the comparative richness: George Fox at FreeBase and George Fox at Spock. When i checked FreeBase, it said George Fox was the author of the screenplay Earthquake. While i haven’t read George Fox’s Journal, i know enough to doubt it was any inspiration for a 1970s disaster movie. I’ve removed that link. (To be responsible, I’ve clicked through to the movie, created a new person of type “film writer” named George Fox, and removed the “type” of “film writer” from George Fox, born 1624.)

It’s worth comparing these to George Fox at OCLC Identities. The search for George Fox there doesn’t seem to turn up the filmwriter, but does make me wish i could quickly click next to some of the results and say, “These are all the same person.” Spock doesn’t seem to allow that (Kurt has three entries) whereas it seems Freebase assumes a little too much (all George Foxes are the same). And i suppose another new effort in people search worth mentioning is the NNDB, with no George Foxes, but some fascinating relationships and attributes.

One notable function of these web 2.0 systems like Spock and FreeBase is how they try to control data entry. When i was selecting a relationship in Spock, I typed “Co” and “co worker” and “co worker and friend” appeared. That some control is still needed (and it seems FreeBase has the mechanisms in place to get problems like this fixed) became clear later, when i typed one more letter and was presented with “coworker.” Another place where letting users fix the “authority file” to disambiguate among terms would be useful is the venue list at Upcoming.
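The “co worker” versus “coworker” split is the sort of thing a little normalization can surface. Here is a rough Python sketch, assuming relationship labels are plain strings; `normalize_label`, `group_variants`, and `suggest` are hypothetical helpers, not anything Spock or FreeBase exposes.

```python
import re
from collections import defaultdict


def normalize_label(label):
    """Lowercase and drop spaces, hyphens, and underscores so near-duplicates compare equal."""
    return re.sub(r"[\s\-_]+", "", label.lower())


def group_variants(labels):
    """Return only the groups of labels that collapse to the same normalized form."""
    groups = defaultdict(list)
    for label in labels:
        groups[normalize_label(label)].append(label)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


def suggest(prefix, labels):
    """Return labels whose normalized form starts with the normalized prefix."""
    key = normalize_label(prefix)
    return [label for label in labels if normalize_label(label).startswith(key)]


labels = ["co worker", "co worker and friend", "coworker", "friend"]

print(group_variants(labels))  # {'coworker': ['co worker', 'coworker']}
print(suggest("Co", labels))   # ['co worker', 'co worker and friend', 'coworker']
```

A person would still need to decide which spelling becomes the preferred one; the sketch only flags candidates for that kind of authority-file cleanup.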

I don’t think you could pay me to do data entry, but there i was, spending the evening putting in alternate names for Stevens Creek. If you find this interesting, let me know, and I’ll pass on an invite to Spock and FreeBase, while supplies last.

Spock, people search, Kurt Bollacker, Metaweb, Freebase, web2.0, oclc identities, identities, disambiguation, authority files, nndb, data entry


3 Responses to “Web2.0 Data Entry Evening”

  1. Brendan Says:

    Though I don’t know the details of the heuristics used, I believe the data folks at freebase run scripts that crawl the wikipedia data. So, the script might crawl a list of films and notice that the film “Earthquake” has a screenwriter “George Fox” listed. Since there is only one “George Fox” that the script knows of, it takes a guess and connects the two. The heuristic failed, in this case. I don’t know the stats, but I think these kinds of scripts succeed quite a bit more than they fail and the heuristics will only get better.

    What you did, manually correct the data by adding a new “George Fox” and removing the connection between the film and the “other” “George Fox” is the antidote to the limitations of heuristics. That’s why the open, collaborative dimension of freebase.com is so important.

    So, thanks to your little bit of effort, as a developer I can now ask freebase:

    [{
    "id":null,
    "name":"George Fox",
    "type":"/people/deceased_person"
    }]

    which, in the Metaweb Query Language, asks “Give me the unique id of ‘George Fox’, the dead one”

    or

    [{
    "film":[],
    "id":null,
    "name":"George Fox",
    "type":"/film/writer"
    }]

    “I mean the screen writer, and show me what films he wrote”

    response:

    [{
    "film":["Earthquake"],
    "id":"#9202a8c04000641f80000000054b626e",
    "name":"George Fox",
    "type":"/film/writer"
    }]

    Thanks!

  2. Dale Fox Says:

    George Fox the screenwriter was my uncle, and he did work on the Earthquake screenplay as well as many others including the Godfather and Jeremiah Johnson.

  3. Arthur Axelman Says:

    Dale Fox, I’m afraid you have misinformation about your uncle’s credits. I was the agent for Oscar-winning screenwriter Edward Anhalt, and he and original writer John Milius were the only writers on Jeremiah Johnson. As for Godfather, it is well documented that Puzo and Coppola were the only writers. George Fox was a non-screenwriter who was brought in by producer Jennings Lang to adapt Puzo’s draft of Earthquake to make it less expensive and shorter. He was hired because he was uncredited as a screenwriter and therefore very cheap. Uncle George Fox however did write a story that has never been made and we were in touch several years ago to try to get it produced. I would like to know what happened to him. Contact me at namlexa!aol.com
