August 09, 2006

AOL Data Real-World Logs Experiment Yields New York Times Privacy Proof

"A Face Is Exposed for AOL Searcher No. 4417749" is the New York Times' proof of concept of privacy invasion from search data:

Ms. Arnold, who agreed to discuss her searches with a reporter, said she was shocked to hear that AOL had saved and published three months' worth of them. "My goodness, it's my whole personal life," she said. "I had no idea somebody was looking over my shoulder."

You can just see the upper levels of the policy and punditry elite digesting this concept as it becomes valid for them. There's a teachable moment happening right before our eyes, where conventional wisdom is being changed. Concerns about the implications of data retention, search logs, privacy invasion, etc., are suddenly moving from the outer reaches of polite society (i.e., civil-libertarians) to being respectable issues of the day.

For unique material which is not being said dozens of times over by other people, I'll point out that Daniel Brandt at GoogleWatch has been making this case for years now, and even running "Scroogle", an anonymizing search proxy. This supports my points about activism - without media support, without a certain level of insiderness, you will talk forever about an issue and make little or no progress.

By Seth Finkelstein | posted in google | on August 09, 2006 11:00 AM (Infothought permalink)

Seth Finkelstein's Infothought blog (Wikipedia, Google, censorware, and an inside view of net-politics)

Comments

Why is everyone so ignorant of simple search proxies? There have to be 20 out there.

Here's one that is easy and free: http://www.blackboxsearch.com

Posted by: Dude at August 10, 2006 12:59 AM

How will searching behind a proxy stop this? The issue isn't the searcher's IP address, which wasn't included in the data released; it's the private nature of the searches that makes someone trackable.

Posted by: Graham at August 11, 2006 12:23 AM

In the AOL case, the issue is that for each discrete user, their searches were grouped together under a single ID number over a three-month period. Over three months, the chances that you've entered traceable search terms are fairly good. A proxy would have prevented AOL from knowing that any two searches came from the same person.

Posted by: Daniel Brandt at August 12, 2006 12:35 PM
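The linking argument in the comment above can be sketched in a few lines of Python. This is only an illustration: the ID numbers and queries are invented (nothing here comes from the AOL release), and `via_proxy` is a hypothetical model of a shared proxy that many people use and that strips anything identifying the individual searcher.

```python
from collections import defaultdict

# Hypothetical log rows: (user_id, query). The IDs and queries are invented
# for illustration and are not taken from the AOL release.
LOG = [
    (101, "landscapers in exampletown"),
    (202, "cheap flights"),
    (101, "homes sold in example county"),
    (101, "people named smith"),
    (202, "weather tomorrow"),
]


def group_by_user(rows):
    """Collect every query under the ID that issued it."""
    profiles = defaultdict(list)
    for user_id, query in rows:
        profiles[user_id].append(query)
    return dict(profiles)


def via_proxy(rows, proxy_id=0):
    """Model a shared proxy: the search engine sees one source for all queries."""
    return [(proxy_id, query) for _, query in rows]


# Without a proxy, ID 101 accumulates a small profile of related searches.
profiles = group_by_user(LOG)

# Through the (assumed) shared proxy, all queries pool under a single source,
# so the log no longer says which searches belong to the same person.
proxied = group_by_user(via_proxy(LOG))
```

Note the limit of the sketch: it models only the grouping. A single query that names its author is still revealing on its own, which is the point of the earlier comment about the private nature of the searches.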

I just noticed today that scripts and CSS are being served from www.google.com as well as from other subdomains, on other random sites such as blogs. So now if you want to prevent Google from tracking you across the web, it is no longer sufficient to refuse Google cookies and black-hole pagead2, Google Analytics, and such. You can either filter www.google.com as well, and not be able to get to the main page, or not filter it and let Google get hits from your IP on various other sites.

So thanks for mentioning Scroogle - that is one possible solution. I consider it unsatisfactory, though, and will have to find or write some sort of surfing accessory that can discriminate and filter files from 3rd party sites (Mozilla browsers claim to be able to avoid 3rd party images but they really don't).

Posted by: Steve at August 12, 2006 12:37 PM
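The kind of discrimination asked for in the comment above comes down to comparing the site of the page being viewed with the site a file is fetched from. Here is a minimal Python sketch of that rule. The function names are made up for illustration, the example paths are hypothetical, and the "last two labels" notion of a site is a deliberate simplification (it gets domains like example.co.uk wrong; a real tool would more likely consult the Public Suffix List).

```python
from urllib.parse import urlsplit


def site_of(url):
    """Naive 'site' of a URL: the last two labels of its hostname.

    Assumption for illustration only; this is wrong for suffixes such as
    co.uk, where a real tool would need a public-suffix lookup.
    """
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def is_third_party(page_url, resource_url):
    """True when the resource comes from a different site than the page."""
    return site_of(page_url) != site_of(resource_url)


def should_block(page_url, resource_url, blocked_sites):
    """Block a listed site only when it is loaded as a third party."""
    return (
        is_third_party(page_url, resource_url)
        and site_of(resource_url) in blocked_sites
    )


blocked = {"google.com"}

# A file from www.google.com embedded in some blog: blocked.
on_blog = should_block(
    "http://blog.example.org/post", "http://www.google.com/script.js", blocked
)

# The same file while visiting www.google.com directly: allowed.
on_google = should_block(
    "http://www.google.com/", "http://www.google.com/script.js", blocked
)
```

With this rule the main page stays reachable while the same host is refused when it shows up inside other sites, which avoids the either/or choice described in the comment.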