November 17, 2008

Daniel Brandt (Scroogle, Google Watch) on Google ranking anomalies

[Below is a guest post from Daniel Brandt, who shares his experiences and speculations. His views are of course his own and not necessarily mine, but I do believe them worth hearing.]

There is definitely some sort of filtering going on in Google's rankings for certain keywords. It took 18 months for any of the pages on my wikipedia-watch.org site to rank better than 200 deep or so for any combination of keywords from those pages. During this time, Yahoo and Live.com were ranking the same pages well for the same terms.

When I test terms on Google, I test with a multi-threaded private tool that checks more than 30 Google data centers on different Class Cs, and shows the rank up to 100 on each one. I can see changes kicking in and out as they propagate across these data centers. The transitions can take several days in normal cases, as when a new or modified page is incorporated into the results.
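For readers wondering what such a checker might look like, here is a rough Python sketch. The data-center addresses, the query format, and the regex-based result extraction are all illustrative assumptions, not details of the actual private tool described above.

    import re
    import urllib.parse
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    # Placeholder data-center IPs; the real tool's list is not public.
    DATACENTER_IPS = ["64.233.161.104", "66.102.7.99"]
    TARGET = "wikipedia-watch.org"
    QUERY = "wikipedia"

    def rank_on_datacenter(ip, query=QUERY, target=TARGET, depth=100):
        """Return the 1-based rank of `target` in the first `depth` results, or None."""
        url = "http://%s/search?q=%s&num=%d" % (ip, urllib.parse.quote(query), depth)
        html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
        # Naive extraction of result URLs; a real tool would parse the page properly.
        links = re.findall(r'href="(https?://[^"]+)"', html)
        for i, link in enumerate(links[:depth], start=1):
            if target in link:
                return i
        return None

    # Query all data centers in parallel and report where the target ranks on each.
    with ThreadPoolExecutor(max_workers=10) as pool:
        for ip, rank in zip(DATACENTER_IPS, pool.map(rank_on_datacenter, DATACENTER_IPS)):
            print(ip, rank if rank is not None else ">100")

Running something like this repeatedly is what lets you watch a change "kick in and out" as it propagates from one data center to the next.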

Wikipedia-watch.org has been online as a website for 36 months now. During the first half of that period, no pages ranked higher than 200 deep or so, even if you used two fairly uncommon words from a page to search for it (this is documented at wikipedia-watch.org/goohate.html). During the second half of that period, after a transition that took about four months to settle in, the deeper pages ranked okay and were on a par with Yahoo and Live. But there was still one glaring exception to this rule: the search for the single word "wikipedia" failed to turn up the home page in the first 100 results almost all of the time during this second period.

When it did show up, it always ranked within the top 15. When it didn't show up, it was always deeper than 100. There was never anything in between, and I've been watching this curiosity for the last six months now. For the first five of these months, it might kick in for a few hours on all data centers, and then disappear. This happened several times. Twice it kicked in for a few days, and then disappeared from the top 100 again. During the last 30 days, it has been in the results about half of the total time, for several days each time, and then gone again for days. It's always one or the other -- in the top 15 or not even in the top 100. Meanwhile, the deep pages have ranked okay for the last 18 months, and have been stable this entire time.

This behavior is something I'm seeing only for the home page, and only on Google, not on Yahoo or Live. It happens almost exclusively when the word "wikipedia" is the solitary search term, or when this one word is combined with another term that's also on that page. If you add a third term, my home page begins to rank reasonably well, presumably because the search is now specific enough to override the filtering. By the way, this home page has a PageRank of 5, and Yahoo counts 3,500 external backlinks to that home page (there's a counting tool at microsoft-watch.org/cgi-bin/ranking.htm). You cannot use Google to count backlinks, because for years now Google has been deliberately suppressing this information.

I should also add here that for three years running, another site of mine, Scroogle.org, had a tool that compared the top 100 Google results for a search with the top 100 Yahoo results for that same search. This may come as a surprise to some, but the divergence was consistently 80 percent for all searches. In other words, only 20 out of 100 links showed up on both Yahoo and Google for any search, and the other 80 on each engine were unique in their top 100. The overall quality of the results was about even for each engine. To put this another way, there's a lot of wiggle room for a particular engine to vary the top results, and still look like they're providing the most relevant links.
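To make that overlap arithmetic concrete, here is a toy Python sketch of the kind of comparison described above; the sample URL lists are made up, contrived only to reproduce the 20-in-100 overlap figure, and are not Scroogle data.

    def divergence(top_a, top_b):
        """Return (shared_count, divergence_percent) for two top-N URL lists."""
        shared = set(top_a) & set(top_b)
        overlap_pct = 100.0 * len(shared) / max(len(top_a), 1)
        return len(shared), 100.0 - overlap_pct

    # Made-up lists reproducing the 20-in-100 overlap described above.
    google = ["http://example.com/g%d" % i for i in range(80)] + \
             ["http://example.com/both%d" % i for i in range(20)]
    yahoo  = ["http://example.com/y%d" % i for i in range(80)] + \
             ["http://example.com/both%d" % i for i in range(20)]

    shared, div = divergence(google, yahoo)
    print("%d shared links, %.0f%% divergence" % (shared, div))  # 20 shared links, 80% divergence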

To make this long story shorter, I believe that there is some sort of backend filter that affects which top results are shown by Google. This actually makes some sense, since most searchers never go beyond the first page of results (at 10 links per page). This means Google's reputation and ad revenue depend heavily on the utility of that first page. A filter that favors recency is one component of this, because Google jacks up recent forum and blog posts (and increasingly even news posts). Everyone expects this by now. Static sites such as wikipedia-watch.org must compete in this sort of environment.

In addition to the recency factor, I think there is filter weighting based on what I call "newbie searches." A newbie search is grandpa or grandma searching for single words such as "wikipedia" or "email" that normally return millions of results, which of course is useless to the searcher. Such searches are stupid to begin with, but Google must cater to stupidity in order to push ads, since ad revenue is 99 percent of total revenue. There might even be some sort of rotational weighting for newbie searches.

And call me a tin-foil hatter if you must, but I also believe that "hand jobs" are involved in tweaking this filter. In other words, there is a political dimension to it as well. Regrettably, I cannot prove this. We need more transparency from Google, and we need it now, before the situation becomes even more suspicious.

By Seth Finkelstein | posted in google | on November 17, 2008 04:23 PM (Infothought permalink)

Comments

I guess you can add me to the Reynolds Wrap brigade. We are also seeing things that suggest "hand jobs" in how Google approaches certain areas of their search algorithm. To wit, have a look at how [bastard] behaves in searches with various levels of filtering.

Posted by: Tony Comstock at November 17, 2008 06:37 PM

I think this is absolutely correct (at least the parts we can verify, and even the speculations are probably true as well), but the problem is that most people are not yet in a frame of mind where they can even consider that there's a problem with Google.

Too many people think that if some product or service is the industry leader, then it *must* be giving people what they want or need the most. And similarly, that if sites are showing up top-ranked in Google, it must be because Google is serving its users efficiently, and therefore those sites must be what users are most likely to want. This idea that "the marketplace works" is an article of faith among libertarians, but a lot of people who do not call themselves libertarians still seem to believe it to some degree.

I spend most of my Slashdot articles trying, as a general rule, to warn people about possible "market failures" -- meaning not stock market crashes, but places where the marketplace will not deliver the best result. When AOL was going to charge senders half a penny per message in order to bypass their spam filters, lots of defenders like Esther Dyson said that, hey, if it's a bad idea, the marketplace will let it fail automatically. It took a huge PR effort by groups like the EFF to get people to realize there was a real problem here: that market forces wouldn't solve the problem on their own, because users wouldn't know about the messages they were missing.

Hammer it home wherever possible: the marketplace does not *necessarily* deliver the optimal result. Under certain conditions, it will. When those conditions are not met, it might not. You're doing great at making that point in this particular situation. Maybe if enough people hear it enough times, they'll have more healthy skepticism about the results that the marketplace delivers.

Posted by: Bennett Haselton at November 17, 2008 06:56 PM

Google rankings are fascinating to watch.

Does anyone actually believe that the majority of humans searching for soap are looking for the computer protocol?

If you look at both the Wikipedia and the Google side of SOAP over the last few weeks, I believe there is some manual tinkering going on at Google to 'juice up' the computer protocol.

Posted by: Duk at November 17, 2008 09:01 PM

Seth, if this is spamming your blog, please feel free to remove the comment. However, I've just thrown together a little YouTube video that demonstrates how my little wiki site (Google PageRank 4/10 for the home page) has managed to grossly manipulate Google (and Yahoo) search results, with no undue effort on our part. It's to our site users' advantage, so I am definitely not complaining. But, still, it is a queer phenomenon to see a tiny, four-sentence page about the Industrial Age appear #1 on a Yahoo search that returns over 20 million results. Here is the video:

http://www.youtube.com/watch?v=7DRjLm9990w

Posted by: Gregory Kohs at November 18, 2008 09:16 AM

I think people are worried about the wrong things.

1) it is Google's site and they can do what they want -- unless you would edit your site(s) to suit Wikipedia and others with an interest whenever they don't like what you show
2) if you really think they are an effective monopoly in need of restraint, you should go to the Federal Trade Commission
http://www.ftc.gov/
and get them regulated as a monopoly
3) you have demonstrated evidence of manual adjustment, but not indicated the harm done
4) do others (Yahoo et al.) do anything similar?

Yahoo dodgy practice 1: down here in Australia, Yahoo has tied up with one of the 3 big TV channels. They don't list a website for the "lead" shows; they say go to Yahoo and search for it. No manual tweaking?

It may be a fight worth fighting. But for what -- an open admission by Google that, yes, they manually edit results for "commercial reasons" and keep a blacklist of stuff they don't think people want? What outcome do you want: a published blacklist? A published algorithm/code?

If you believe they are abusing "monopoly" power - address it that way.

"We need more transparency from Google, and we need it now, before the situation becomes even more suspicious. "
A major corporation abusing the power of a system for profit and suppressing dissent is hardly news.

Be more concerned that many people are too stupid to do anything other than google it. Unfortunately fixes for that are hard to come by.

Posted by: tqft at November 20, 2008 01:03 AM

I don't think google will roll over and die on this.

http://www.theregister.co.uk/2008/11/20/the_google_monopoly/

"We canceled the deal with about one hour to go before a lawsuit was going to be filed against our deal," Schmidt said. "We concluded after a lot of soul-searching that it was not in our best interest to go through a lengthy and costly trial which we believe we ultimately would have won."

"It's not that an antitrust suit would suddenly clarify things for advertisers. The nature of the system is that it's far too complex for Joe The Advertiser to grasp - even with algorithms in hand. But if Google were forced to open its virtual books, we would know whether it's limiting impressions for advertisers low on the totem pole, creating a kind of artificial scarcity where the prices are cheap. And we would know just how much the Mountain Viewers can juice revenues with a few algorithmic tweaks."

Posted by: tqft at November 20, 2008 09:06 PM

"The nature of the system is that it's far too complex for Joe The Advertiser to grasp even with algorithms in hand"

I'm having trouble grasping why the algorithm treats [penis] and [clitoris] so differently. Under strict filtering there are 30,000,000 results for [penis] and zero for [clitoris].

Sounds like more than a handjob to me -- more like digital FGM!

Posted by: Tony Comstock at November 21, 2008 03:41 PM

For Gmail users enjoying some of the Gmail features, what alternatives are there to switch to from Gmail that offer similar features without the problematic aspects of Gmail?

There didn't appear to be any alternatives recommended at
http://www.gmail-is-too-creepy.com/

Posted by: thezzak at November 24, 2008 02:17 AM