November 28, 2007

Debunking linkfest - Long Tail, Infringements, Comcast, etc.

A collection of proof as to why being right is no match for being popular:

A Critical Reader's Guide to The Long Tail (Tom Slee)

Debunks the "Long Tail". Never confuse someone making a bit of money off you with you making a bit of money.

Adam Curtis: The TV elite has lost the plot | The Register (Andrew Orlowski)

Debunks, well, a lot of stuff. Not all of which I agree with, but thought-provoking all the same.

Fringe Infringements (Jon Garfunkel)

Debunks some claims in a legal paper that's making the rounds. The problem is that if you don't come up with some sort of attention-grabber, you won't be heard.

DOCSIS vs. BitTorrent (Richard Bennett)

Debunks some of the technical matter about the Comcast network throttling blogstorm.

Old Media May Not Be Dead, But Traffic From It Is (Jeneane Sessum)

Debunks certain mistaken readership ideas.

By Seth Finkelstein | posted in cyberblather | on November 28, 2007 05:22 PM | (Infothought permalink) | Comments (2)
November 25, 2007

Aftermath of Conservapedia / Homosexual page statistics prank

Here are the readership statistics for my debunking of the Conservapedia Homosexuality story. From my site logs:

Distinct IPs - 4199

boingboing.net update - 2316, plus a few hundred from various BB syndicated feeds
unknown - 504
crookedtimber.org blog comment - 389
scienceblogs.com ("pharyngula") blog comment - 126
stumbleupon.com - 89
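
For anyone wanting to produce a similar breakdown, here is a minimal sketch of counting distinct IPs per referring site. It assumes logs in the Apache "combined" format, which the post does not state; the regular expression and the function name `distinct_ips_by_referrer` are illustrative, not the tooling actually used for the numbers above.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Assumption: Apache "combined" log format. Adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)


def distinct_ips_by_referrer(lines):
    """Return {referring host: number of distinct client IPs}.

    Requests with an empty, "-" or unparseable referer are grouped as "unknown".
    """
    ips_by_host = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not look like combined-format entries
        referer = match.group("referer")
        host = None
        if referer not in ("", "-"):
            host = urlparse(referer).hostname
        ips_by_host[host or "unknown"].add(match.group("ip"))
    return {host: len(ips) for host, ips in ips_by_host.items()}
```

Counting distinct IPs rather than raw hits keeps one reader who reloads the page from being counted several times, though it still undercounts readers behind a shared address.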

It speaks for itself. There's nothing more for me to say that I haven't already said too many times :-(.

Interestingly, the Conservapedia people seem very unconcerned with the prank. It's a good reminder of how there can be very little overlap between groups. While bloggers were hyping and hyperventilating over it, the site's main discussion page is not full of drama. One amusing comment about a jump in (real) site traffic, directly under a short section about the statistics page issue, has:

Looking at the [site traffic rank] spike, I'm wondering if there was any specific coverage of Conservapedia that brought all the traffic here - has anyone seen or heard of anything of the like? Or did we just have a very lucky day? Well, all the hard work put into Conservapedia was more than mere "luck", of course. :)

Why, yes, there has been some specific coverage recently which would drive traffic, I think it's something newfangled called "flogs".

By Seth Finkelstein | posted in statistics | on November 25, 2007 11:59 PM | (Infothought permalink) | Comments (9)
November 21, 2007

Conservapedia, Homosexuality, and pranked statistics

The "Conservapedia" Homosexuality statistics story being echoed, where the right-wing Wikipedia-style site "Conservapedia" allegedly has nine out of its top ten most popular pages being against homosexuality, cannot be correct. That is, whether by accident or design, the alleged statistics don't pass the sanity test (I know, I know ...). The site's "Most viewed pages" statistics supposedly are:

1. Main Page [1,902,822]
2. Homosexuality [1,542,919]
3. Homosexuality and Hepatitis [516,745]
4. Homosexuality and Promiscuity [420,172]
5. Homosexuality and Parasites [387,876]
6. Homosexuality and Domestic Violence [351,556]
7. Gay Bowel Syndrome [343,273]
8. Homosexuality and Gonorrhea [331,090]
9. Homosexuality and Mental Health [277,119]
10. Homosexuality and Syphilis [265,042]

Except this makes no sense. While the "Homosexuality" page itself might be highly ranked, the "Homosexuality and Hepatitis" page is short and has been in existence only since October 17. There's no way something like that would be a legitimate third-most popular page, even for raving homophobes.
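
The sanity test can be made concrete with a little arithmetic. This is only a rough sketch: the creation date and view counts come from the figures above, while the observation date is assumed to be the date of this post.

```python
from datetime import date

created = date(2007, 10, 17)    # page creation date given above
observed = date(2007, 11, 21)   # assumption: counts were read on the date of this post

hepatitis_views = 516_745       # "Homosexuality and Hepatitis"
main_page_views = 1_902_822     # "Main Page"

days_live = (observed - created).days
views_per_day = hepatitis_views / days_live
share_of_main_page = hepatitis_views / main_page_views

print(f"Days live: {days_live}")
print(f"Implied views per day: {views_per_day:,.0f}")
print(f"Share of Main Page's total views: {share_of_main_page:.0%}")
```

Under those assumptions, a short page about five weeks old would have had to average well over ten thousand views a day, and to have collected more than a quarter of the Main Page's entire view count.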

And the top ten doesn't have "Bible"? Or "Jesus Christ"? [update - better: any other controversial topic?]. Those are supposedly less popular than "Gay Bowel Syndrome"?? That's ridiculous (I know, I know ...). Either a spider has run amok or someone is deliberately inflating the pageviews.

Of course, this post will have near-zero effect on a story "too good to check". Let's hear it for the self-deluding nature of the bogosphere and the futility of trying to be heard :-(.

By Seth Finkelstein | posted in wikipedia | on November 21, 2007 09:28 AM | (Infothought permalink) | Comments (16)
November 17, 2007

Google penalties for link-selling, and A-listers vs. Z-listers

The latest Google slapping of paid links has generated an intriguing aspect of "class struggle": the intermediaries for Z-listers complain it's unfair to penalize those blogs for selling links while not penalizing A-lister blogs which have sponsor "thank you" posts with links, which is essentially also paid link selling.

While neither side of that battle cares what I have to say (and it's probably not the best idea for me to get between them), it's an interesting question - what's the difference between paying for posts, and posts with links to an A-lister blog's sponsors? Perhaps surprisingly, I actually do see a difference. While the thank-you link posts are by no means completely pure, there's a lesser level of search gaming there than with the individual placements for paid links. As a minor detail, typically having several links on a page dilutes the PageRank being sold. It is indeed some selling of PageRank, but not as much as a post which is devoted to a specific advertiser.
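
The dilution point can be illustrated with the textbook PageRank formulation, in which a page's rank is split evenly among its outbound links. This is a sketch only: Google's production ranking is not public, the damping value of 0.85 is the conventional one from the original paper, and `rank_passed_per_link` is a hypothetical helper.

```python
DAMPING = 0.85  # conventional damping factor; an assumption, not Google's actual setting


def rank_passed_per_link(page_rank, outbound_links, damping=DAMPING):
    """Rank contributed to each link target in the simplified textbook model.

    Every outbound link on a page receives an equal share of the page's rank.
    """
    if outbound_links < 1:
        raise ValueError("a page must have at least one outbound link to pass rank")
    return damping * page_rank / outbound_links


# A post devoted to one advertiser versus a thank-you post listing ten sponsors.
dedicated_post = rank_passed_per_link(1.0, 1)
thank_you_post = rank_passed_per_link(1.0, 10)
print(dedicated_post / thank_you_post)
```

In this model each sponsor in a ten-link thank-you post receives a tenth of what a single-advertiser post would pass, which is the sense in which several links on a page dilute what is being sold.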

But much more importantly, it's not just the PageRank being sold, but also the sale of keywords in the links. That is:

"We'd like to thank our sponsor, BigCo [link]" is one thing, but

"We'd like to thank our sponsor, BigCo [link], which sells uPods [link], Niagra [link], and mome hortages [link]" would be quite another.

Now, you can push this if the company is named "Buy Niagra", but in general, the difference works in practice. Companies ranking higher for their own name is not a big problem, and while the extra bit of PageRank to distribute over their site is indeed ill-gotten gains, it pales in comparison to the keyword link issues.

Besides, nothing stops Google from going after the A-listers selling PageRank at some future date, after they've worked out the bugs (which seem to be substantial) from handling the Z-listers' keyword-selling.

By Seth Finkelstein | posted in google | on November 17, 2007 11:50 PM | (Infothought permalink) | Comments (8)
November 14, 2007

My _Guardian_ column on Yahoo / China / journalist Shi Tao / network data

http://www.guardian.co.uk/technology/2007/nov/15/comment
Do you know who's using your data?

"As search engines and social networks collect more and more user data for business purposes, governments will find that data more and more useful for their investigatory purposes"

I didn't have a good title in mind myself, so this time I'm fine with the one they created :-)

I rather like this phrase I coined for the column:

"The price of total personalization is total surveillance."

By Seth Finkelstein | posted in press | on November 14, 2007 07:30 PM | (Infothought permalink) | Comments (12)
November 12, 2007

Revealed: Wikia (Wikipedia-Model) search bought "Grub" crawler for $50K

[Scoop! News - not an echo!]

A while back, Wikia search, the much-hyped search effort by Jimmy Wales to do a for-profit search engine via a Wikipedia-like model (i.e. free labor from the public), acquired the "Grub" crawler. In terms of following the money, from the Securities and Exchange Commission company filing for the previous owner (LookSmart), it's been disclosed that Wikia bought the "Grub" crawler for $50K:

Grub

The URL for Grub and certain source code were sold for $50,000 on July 12, 2007. A gain on sale of assets of $50,000 was recorded in the three months ended September 30, 2007, which is included in other operating income (loss), net.

Compare Grub's purchase by LookSmart at $1.3 million:

Grub, Inc.

In January 2003, the Company acquired intellectual property rights from Grub, Inc. and an individual for total consideration of $1.3 million, consisting of $0.6 million cash payment, including $12,000 of direct costs and the issuance of 217,000 shares of LookSmart common stock valued at $0.7 million.

[Hat tip to Gary Price for that last bit of info.]

Now, the $50K price might be connected to the Wikia / LookSmart ad deal, so it may not be the full story in itself. But there's definitely a money lesson in here somewhere, even if it's not clear what it is.

By Seth Finkelstein | posted in wikia-search | on November 12, 2007 03:59 PM | (Infothought permalink)
November 08, 2007

Google New Pagerank In != Pagerank Out Changes And Google's Statements

Regarding Google's recent PageRank shake-up, where I conjectured that PageRank In != PageRank Out, I realized that an article a few weeks ago from Danny Sullivan (Official: Selling Paid Links Can Hurt Your PageRank Or Rankings On Google) had actually reported this effect from Google itself. I'd read the post at the time. But the implications weren't clear in the way they now make sense in retrospect (my habit of discounting oracular Googlese led me astray). Quoting the article, my emphasis:

More and more, I've been seeing people wondering if they've lost traffic on Google because they were detected to be selling paid links. However, Google's generally never penalized sites for link selling. If spotted, in most cases all Google would do is prevent links from a site or pages in a site from passing PageRank. Now that's changing. If you sell links, Google might indeed penalize your site plus drop the PageRank score that shows for it.

Note penalize is not the same as dropping the PageRank score that shows for it. So Google can drop the PageRank score that shows for it, WITHOUT penalizing the rankings of the site.

So I pinged Google, and they confirmed that PageRank scores are being lowered for some sites that sell links.

In addition, Google said that some sites that are selling links may indeed end up being dropped from its search engine or have penalties attached to prevent them from ranking well. [... snip]

By using PageRank decreases (something Google first experimented with in the SearchKing case in 2002), Google can hurt the perceived value of buying links from a particular site without harming core relevancy.

So "without harming core relevancy" apparently means what I've thought of as PageRank-In != PageRank-Out.

The market for paid links just got a whole lot more complicated :-).

By Seth Finkelstein | posted in google | on November 08, 2007 06:48 PM | (Infothought permalink) | Comments (12)
November 06, 2007

Debunking - Congressman Adrian Smith NOT blocking blogspot bloggers

If anyone wants to follow yet another fear-and-loathing blog story, the so-called Congressman Adrian Smith and blogspot blocking is a case study. Quick explanation: The hosting provider for (some) House Congressional websites blocked access from referers from blogspot.com, because of a spam issue. One blogger noticed this with one congressman - and we were off with the narrative that the Congressman must fear the awesome power of bloggers, rather than it being some sort of a technical glitch. Echoing and echoing, because in the bogosphere you GET ATTENTION!!! by repeating such a story, versus risking personal attack by saying it's false.
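
For readers unfamiliar with how referer-based blocking works, here is a minimal sketch. The hosting provider's actual configuration is not known; the blocklist contents and the function `is_blocked_referer` are hypothetical, chosen only to show that such a filter keys on the Referer header of the request and not on who the visitor is.

```python
from urllib.parse import urlparse

# Hypothetical blocklist: one domain, covering all of its subdomains.
BLOCKED_REFERER_DOMAINS = {"blogspot.com"}


def is_blocked_referer(referer_header):
    """Return True if the request's Referer header points at a blocked domain."""
    if not referer_header:
        return False  # direct visits carry no Referer and pass through
    host = urlparse(referer_header).hostname
    if host is None:
        return False  # unparseable header: this sketch lets it through
    host = host.lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in BLOCKED_REFERER_DOMAINS
    )
```

A filter like this turns away anyone who clicks through from any blogspot.com page, spammer or not, while the same person typing the address directly gets in. That fits a blunt anti-spam rule better than a targeted block on bloggers.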

So now the unfortunate hosting company is running around chasing the various echoes, posting a technical explanation, which is both 1) not read, since it's down in the comments which are viewed by a small fraction of readers, and 2) being treated presumptively as a cover-up.

Remember, it's all "conversation" (link omitted for self-preservation), and the mainstream media gets things wrong too ...

By Seth Finkelstein | posted in cyberblather | on November 06, 2007 09:59 AM | (Infothought permalink) | Comments (5)
November 02, 2007

The TimesSelect Reader (Jon Garfunkel) - NYT paywall vs. Google and bloggers

The TimesSelect Reader is Jon Garfunkel's "8 parts and 21,000 words" examination of the New York Times having a premium, for-pay, service and

... whether the Times lost influence, or audience, or money over the last two years. Many entries in the blogs have been long on speculation and short on data. We have tried to fill in the data gaps here.

Readers here might particularly want to examine the section on TimesSelect, SEO, and Google:

Google may need the Times, but the Times is starting to rely on Google even more. Marshall Simmonds told me that 25% of the traffic to nytimes.com comes from all search engines.

Note also TimesSelect & Foreign Correspondence

Perhaps Friedman is more popular because he often tells us what we want to hear. Kristof tells us what we don't want to hear.

It's all an immense amount of work, deserving of extensive attention (which, not coming from an A-lister, it won't get - in fact, even if it did come from an A-lister, it probably wouldn't be read, though it'd be talked about).

By Seth Finkelstein | posted in google | on November 02, 2007 11:59 PM | (Infothought permalink) | Comments (1)