July 24, 2003

Securing Privacy

I've been working on drafting our policy for keeping users' behavior private and secure while allowing us to learn a great deal from their clickstream. Assuming we have a moderately large number of registered users, we can go a long way toward ensuring privacy, particularly for users who don't stand out from the crowd or who don't volunteer any information about themselves.

The last group is obviously fairly anonymous -- but is it because they're lazy or because they value their personal information?

The Apache log is both a weakness and a strength. We'll keep a standard Apache log, which will capture IP addresses along with timestamps. We'll run that through a standard log analysis package, looking for initial referrers and observing the IPs in aggregate, and then destroy the logs. This is a strength, in that we can offload IP address logging to Apache rather than the application. While the log exists, though, the timestamp and IP address linkage is present, so an IP could be correlated with the clickstream logs.
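A minimal Python sketch of that summarize-then-destroy step, under stated assumptions: it assumes Apache's standard "combined" log format, treats any referrer whose host differs from our own as an initial referrer, and aggregates IPv4 addresses to /24 subnets. The function names, the /24 granularity, and the `own_host` parameter are hypothetical choices for illustration, not our actual configuration or the analysis package we'll use. The timestamp is matched but never kept, so the summary cannot be joined back to the clickstream.

```python
import os
import re
from collections import Counter
from urllib.parse import urlparse

# Apache "combined" LogFormat:
# host ident user [time] "request" status bytes "Referer" "User-Agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def summarize(lines, own_host):
    """Count external referrer hosts and requests per IPv4 /24 subnet.

    Hypothetical helper: the timestamp is matched but never stored, so
    nothing returned here links an address to a moment in the clickstream.
    """
    own_host = own_host.lower()
    referrers = Counter()
    subnets = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip malformed lines
        parts = m.group("ip").split(".")
        # Aggregate IPv4 addresses to their /24; anything else shares one bucket.
        subnet = ".".join(parts[:3]) + ".0/24" if len(parts) == 4 else "other"
        subnets[subnet] += 1
        ref = m.group("referer")
        if ref and ref != "-":
            host = urlparse(ref).hostname
            if host and host != own_host:
                referrers[host] += 1  # assumed to be an initial (external) referrer
    return referrers, subnets

def summarize_and_destroy(path, own_host):
    """Summarize a log file, then delete it (hypothetical helper)."""
    with open(path, encoding="utf-8", errors="replace") as f:
        result = summarize(f, own_host)
    os.remove(path)
    return result
```

For example, `summarize_and_destroy("access_log", "our.site.example")` (both arguments made up) returns only aggregate counts and leaves no file behind. Rotated or backed-up copies of the log would still need the same treatment.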

But those users who do stand out from the crowd -- the one registered user with a 32789 ZIP code (picking on Winter Park, FL, because it's far from our pilot institutions, yet I know someone who will try the system from there), the smart aleck who gives a major of hemp basket smoking -- they've lost that anonymity. As I struggle to decide whether we should record only the first three ZIP digits, possibly crippling an analysis of how far someone will range for a book, I wonder to what lengths anyone else goes to truly secure my privacy. "We're not going to track you as an individual, just as part of the demographics and average use," many privacy policies read, but nothing in that promise means the data isn't there, were it to be requested.
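To make the ZIP trade-off concrete, here is a rough Python sketch that truncates ZIP codes to a prefix and reports every prefix shared by fewer than k registered users, which is the idea behind k-anonymity. The threshold k=5, the function names, and the sample ZIP codes are all made up for illustration; this is not a policy we have settled on.

```python
from collections import Counter

def generalize_zip(zip_code, digits=3):
    """Keep only the leading `digits` characters of a US ZIP code, e.g. '32789' -> '327'."""
    return zip_code.strip()[:digits]

def small_groups(zip_codes, digits=3, k=5):
    """Map each generalized ZIP prefix shared by fewer than k users to its user count.

    k (the minimum crowd size) is an illustrative assumption, not a policy value.
    """
    counts = Counter(generalize_zip(z, digits) for z in zip_codes)
    return {prefix: n for prefix, n in counts.items() if n < k}

# Hypothetical registered-user ZIP codes, made up for this example.
users = ["32789", "48104", "48105", "48109", "48103", "48104"]
print(small_groups(users, digits=5))  # every full ZIP is shared by fewer than 5 users
print(small_groups(users, digits=3))  # {'327': 1}
```

With full ZIP codes every sample user is flagged; with three digits the five made-up 481xx users merge into one crowd, but the lone 327 user still stands out, which is exactly the Winter Park problem.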

Posted by judielaine at July 24, 2003 06:29 PM