March 03, 2004

Free porn, Google, spam, Internet censorship, and the Supreme Court

[Yes, this post really seriously concerns *all* the topics listed; it's truly that much of a _tour de force_]

The Supreme Court just heard arguments on another Internet censorship law, "COPA" (Ashcroft v. ACLU, 03-218). The Boston Globe reported:

Ordinarily, US Solicitor General Theodore B. Olson prepares for an appearance before the Supreme Court by acting out his argument before a pretend court. This time, for a case about the Internet, he added a new twist: searching online for free porn.

At his home last weekend, Olson told the justices yesterday, he typed in those two words in a search engine, and found that "there were 6,230,000 sites available."

The top lawyer who represents the Bush administration before the Supreme Court said the search's results illustrate how pornography on websites "is increasing enormously every day," a central point in his argument for saving an antipornography law that was enacted six years ago but has yet to go into effect.

Now, let's do something often unrewarded in this world - think. What search did he do exactly? It seems to be the following search in Google:

That now gives me "about 6,320,000" results, which is close enough; the total number returned often varies a bit.

Now, what that search means is roughly the number of pages containing the words "free" and "porn" anywhere in the entire page (or in links with those words). This blog entry will qualify as one of those results as soon as it is indexed. I don't think this blog entry is proof of how pornography on websites "is increasing enormously every day", much less of the need for an Internet censorship law.
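To make that concrete, here is a minimal sketch in Python of what a two-word match count measures. The tiny page collection and the function name `count_pages_with_all_words` are invented for illustration. This is not Google's actual matching, which also considers link text and only estimates the total.

```python
import re

def count_pages_with_all_words(pages, words):
    """Count pages whose text contains every word in `words` (case-insensitive)."""
    wanted = {w.lower() for w in words}
    count = 0
    for text in pages.values():
        # Split the page into lowercase word tokens, ignoring punctuation.
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        if wanted <= tokens:
            count += 1
    return count

# Hypothetical pages, made up for this example.
pages = {
    "sex-site": "free porn pictures here",
    "blog-entry": "The Solicitor General searched for free porn and cited the hit count.",
    "news-story": "Court hears arguments on a law restricting porn; admission is free.",
    "recipe": "Free recipes for soup.",
}

matches = count_pages_with_all_words(pages, ["free", "porn"])
print(matches)  # 3 of the 4 pages match, but only one of them is a sex site
```

The count goes up whenever anyone merely writes about the topic, which is why it cannot stand in for "sites available".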

I've written about the problems of Google and stupid journalism tricks before. But, sigh, nobody reads me, so this won't get reported. Anyway, the story gets even better.

I started digging down into the results to see if I could find some non-sex-site mentions before the Google 1000 results display limit (Yes, Mr. Olson, there are more than 1000 sites devoted to sex in the world, that's true). Google's display stopped in the high 800's! That is, displayed at the bottom was:

In order to show you the most relevant results, we have omitted some entries very similar to the 876 already displayed.
If you like, you can repeat the search with the omitted results included.

The number varies, but it's been under 900.

Joke: Hear ye! Hear ye! Instead of "6,230,000 sites available", there are really fewer than 900 unique ones! At least, according to Google.

Now, this is the Google display crash from bugs in the Google spam filtering. Google has cleaned up their index so the crash is not happening on the first screen of results. But it's still in their results display code. Usually, people don't see the bug in practice, since the crash has now been pushed very far down in the sequence of results.

But here I had a reason to go looking out as far as I could, and ran into the crash in a bona fide real-world situation. And not a trivial query either, but one with profound implications for Censorship Of The Internet.

[Update 3/4: Michael Masnick brings to my attention that what I thought was the old Google spam crash is now reduced to duplicate-removal processing on the 1000 results display limit - the point is still that I can use fallacious superficial search "logic" to assert there's less than 900 sites, because Google "says" so. But the technical reason is not quite what I wrote originally]
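A rough sketch of that fallacy in Python, under loud assumptions: the numbers, the `RESULT_CAP` constant, and the "similarity key" are all made up, and this is not how Google actually removes duplicates. It only illustrates that the count left after de-duplicating a capped list says nothing about the size of the full index.

```python
RESULT_CAP = 1000  # display limit, as described in the post

def displayed_after_dedup(ranked_results, cap=RESULT_CAP):
    """Keep the top `cap` results, then drop entries whose similarity key was already shown."""
    seen = set()
    shown = []
    for url, similarity_key in ranked_results[:cap]:
        if similarity_key not in seen:
            seen.add(similarity_key)
            shown.append(url)
    return shown

# Hypothetical index: 6,000 matching pages, where every 8th page is a
# near-duplicate of the others in that group (they share one similarity key).
total_matches = 6000
ranked = [
    (f"page-{i}", "dup" if i % 8 == 0 else f"key-{i}")
    for i in range(total_matches)
]

shown = displayed_after_dedup(ranked)
print(len(shown), "displayed out of", total_matches, "matches")
```

The displayed figure depends on the cap and on how aggressive the duplicate filter is, not on how many pages matched, so it is no better as evidence than the big number was.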

Humor: If the evidence from a Google search was good enough to be used to justify censorship when it said "6.2 million", why isn't it good enough to justify no censorship if on further investigation it says less than 900? That is, if you thought it was valid before, with a big number, why isn't it valid now, with a small number? (garbage in, garbage out)

Look at me, I'm a journalist (or grandstanding lawyer) - Google says there's practically no porn on the net!

By Seth Finkelstein | posted in google , infothought , legal , spam | on March 03, 2004 09:52 AM (Infothought permalink) | Followups
Seth Finkelstein's Infothought blog (Wikipedia, Google, censorware, and an inside view of net-politics)


Brilliant! Hard to argue with your logic, Seth - although we know that logic has not sat well with US lawmakers on this issue before, nor, worryingly, with the Supremes, if CIPA was anything to go by.

Posted by: Scott at March 3, 2004 11:00 AM

Seth, your search returned more "sites available" than did Olson's, so "jokes" aside, I don't know what you're crowing about.

Posted by: Jack Stephens at March 4, 2004 01:56 AM

An overwhelming majority of those six million pages are automatically generated to increase some porn site's PageRank. Many such sites use scripts that generate a near-infinite number of pages for spiders to find. This does not mean that any human has ever visited the pages, nor that they actually contain free porn.

Posted by: Ilari Sani at March 4, 2004 04:10 AM

Google limits all the search results they'll show to the top 1000 only. They claim that they're not trying to give you a list of all results, but the most relevant - and if it's not in the top 1,000, then it's not relevant. So, while the rest of the points you make are good, that last point doesn't hold up.

Posted by: Mike at March 4, 2004 05:29 AM

This post might get a lot of traffic. And it might annoy some site listed at the top Google page for those search terms by pushing it off by tomorrow.

Mr. Olson's statement is quite obviously wrong already because Google indexes pages, not websites.

He might still be right about the assertion that online porn is increasing. However, to qualify as evidence for that point he would need to give comparisons to the number of pages found in earlier searches.

Posted by: Karl-Friedrich Lenz at March 4, 2004 09:08 AM

I have not seen that Olson mentioned Google by name in his testimony. Did I miss this? For Seth to argue that "very similar" in Google is the equivalent of "non-existent" truly is a joke.

Posted by: Jack Stephens at March 4, 2004 11:59 AM

Mike: Correction made, thank you.

Karl-Friedrich: So far, around 200 readers. Fifty+ from a mention on
That's indeed a lot of traffic for me. Sigh. Why do I bother?

Posted by: Seth Finkelstein at March 4, 2004 01:54 PM

why dont u customise the layout :/ the default one sucks

Posted by: yam at March 4, 2004 05:03 PM

The thing that your post fails to cover is that there are other search phrases for which some of the other 6,000,000 pages will show up.

I still think that using one search on a single search engine database as a basis for censorship law is one of the dumbest things I have ever read.

free porn free porn free what?

Posted by: aaron wall at March 8, 2004 04:12 AM