November 17, 2008
Daniel Brandt (Scroogle, Google Watch) on Google ranking anomalies
[Below is a guest post from Daniel Brandt, who gives his experiences and speculations below. His views
are of course his own and not necessarily my own, but I do believe
them worth hearing]
There is definitely some sort of filtering going on in Google's
rankings for certain keywords. It took 18 months for any of the pages
on my wikipedia-watch.org site to rank better than 200 deep or so for
any combination of keywords from those pages. During this time, Yahoo
and Live.com were ranking the same pages well for the same terms.
When I test terms on Google, I test with a multi-threaded private tool
that checks more than 30 Google data centers on different Class Cs,
and shows the rank up to 100 on each one. I can see changes kicking in
and out as they propagate across these data centers. The transitions
can take several days in normal cases, as when a new or modified page
is incorporated into the results.
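A checker of the kind described above could look something like the following sketch. This is hypothetical: the actual HTTP fetching of each data center's top-100 list is stubbed out with static lists, and the IP addresses are purely illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def rank_of(url, results):
    """Return the 1-based rank of url in a top-100 list, or None if absent."""
    try:
        return results.index(url) + 1
    except ValueError:
        return None

def check_datacenters(url, datacenter_results):
    """Check the rank on every data center in parallel.

    Threads are the natural choice here because the real work
    (one HTTP query per data center IP) would be I/O-bound.
    """
    with ThreadPoolExecutor(max_workers=8) as pool:
        ranks = list(pool.map(lambda results: rank_of(url, results),
                              datacenter_results.values()))
    return dict(zip(datacenter_results.keys(), ranks))

# Two simulated data centers mid-propagation: the second one has not
# yet picked up the page at all, the "in or out" pattern Brandt describes.
dcs = {
    "64.233.161.0": ["wikipedia.org", "wikipedia-watch.org"],
    "66.102.7.0":   ["wikipedia.org"],   # change not propagated yet
}
print(check_datacenters("wikipedia-watch.org", dcs))
# {'64.233.161.0': 2, '66.102.7.0': None}
```

Running this repeatedly over days and diffing the per-datacenter ranks is enough to watch a change "kick in and out" as it propagates.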
Wikipedia-watch.org has been a website now for 36 months. During the
first half of that period, no pages ranked higher than 200 deep or so,
even if you used two fairly uncommon words from that page to search
for it (this is documented at
wikipedia-watch.org/goohate.html). During the second half of that
period, after a transition that took about four months to settle, the
deeper pages ranked okay, and were on a par with Yahoo and
Live. But there was still one glaring exception to this rule: the
search for the single word "wikipedia" failed to turn up the home page
in the first 100 results almost all of the time during this second
half. When it did show up, it always ranked within the top 15. When it
didn't show up, it was always greater than 100. There was never
anything in between, and I've been watching this curiosity for the
last six months now. For the first five of these months, it might kick
in for a few hours on all data centers, and then disappear. This
happened several times. Twice it kicked in for a few days, and then
disappeared from the top 100 again. During the last 30 days, it has
been in about half of the total time, for several days each time, and
then disappeared again for days. It's always one or the other -- in
the top 15 or not even in the top 100. Meanwhile, the deep pages have
ranked okay for the last 18 months, and have been stable this entire time.
This behavior is something I'm seeing only for the home page, and only
on Google but not on Yahoo or Live. It happens almost exclusively when
the word "wikipedia" is the solitary search term, or maybe this one
word and another term that's also on that page. If you add a third
term, my home page begins ranking reasonably well, presumably
because the search is now specific enough to override the
filtering. By the way, this home page has a PageRank of 5 and Yahoo
counts 3,500 external backlinks to that home page (there's a counting
tool at microsoft-watch.org/cgi-bin/ranking.htm). You cannot use
Google to count backlinks, because for years now, Google has been
deliberately suppressing this information.
I should also add here that for three years running, another site of
mine, Scroogle.org, had a tool that compared the top 100 Google
results for a search with the top 100 Yahoo results for that same
search. This may come as a surprise to some, but the divergence was
consistently 80 percent for all searches. In other words, only 20 out
of 100 links showed up on both Yahoo and Google for any search, and
the other 80 on each engine were unique in their top 100. The overall
quality of the results was about even for each engine. To put this
another way, there's a lot of wiggle room for a particular engine to
vary the top results, and still look like they're providing the most
relevant results.
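The overlap measurement that Scroogle tool performed reduces to a set intersection over the two engines' top-100 lists. A minimal sketch, with the engine queries omitted and toy data standing in for real results:

```python
def top100_overlap(results_a, results_b):
    """Percentage of URLs that appear in both engines' top 100."""
    shared = set(results_a[:100]) & set(results_b[:100])
    return 100 * len(shared) / 100   # equals len(shared) when both lists are full

# Toy illustration of the figure Brandt reports: 20 shared URLs out of
# 100 gives 20% overlap, i.e. 80% divergence between the two engines.
google = [f"shared{i}" for i in range(20)] + [f"g{i}" for i in range(80)]
yahoo  = [f"shared{i}" for i in range(20)] + [f"y{i}" for i in range(80)]
print(top100_overlap(google, yahoo))  # 20.0
```

Note that this counts exact-URL matches only; a real comparison tool would first normalize URLs (scheme, trailing slashes, www prefixes) so that trivially different links to the same page aren't counted as divergence.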
To make this long story shorter, I believe that there is some sort of
backend filter that affects which top results are shown by
Google. This actually makes some sense, since most searchers never go
beyond the first page of results (at 10 links per page). This means
Google's reputation and ad revenue depend heavily on the utility of
that first page. A filter that favors recency is one component of
this, because Google jacks up recent forum and blog posts (and
increasingly even news posts). Everyone expects this by now. Static
sites such as wikipedia-watch.org must compete in this sort of
environment.
In addition to the recency factor, I think there is filter weighting
based on what I call "newbie searches." A newbie search is grandpa or
grandma searching for single words such as "wikipedia" or "email" that
normally return millions of results, which of course is useless to the
searcher. Such searches are stupid to begin with, but Google must
cater to stupidity in order to push ads, since ad revenue is 99
percent of total revenue. There might even be some sort of rotational
weighting for newbie searches.
And call me a tin-foil hatter if you must, but I also believe that
"hand jobs" are involved in tweaking this filter. In other words,
there is a political dimension to it as well. Regrettably, I cannot
prove this. We need more transparency from Google, and we need it now,
before the situation becomes even more suspicious.
By Seth Finkelstein | posted in google on November 17, 2008 04:23 PM
I guess you can add me to the Reynolds Wrap brigade. We are also seeing things that suggest "hand jobs" in how Google approaches certain areas of their search algorithm. To wit, have a look at how [bastard] behaves in searches with various levels of filtering.
I think this is absolutely correct (at least the parts we can verify, and even the speculations are probably true as well) but the problem is that most people are not yet in a frame of mind where they can even consider that there's a problem with Google.
Too many people think that if some product or service is the industry leader, then it *must* be giving people what they want or need the most. And similarly, that if sites are showing up top-ranked in Google, then it must be because Google is serving its users efficiently, and therefore those sites must be what users are most likely to want. This idea that "the marketplace works" is an article of faith among libertarians, but a lot of people who do not call themselves libertarians still seem to believe it to some degree.
I spend most of my Slashdot articles trying, as a general rule, to warn people about possible "market failures", meaning not stock market crashes, but places where the marketplace will not deliver the best result. When AOL was going to charge senders half a penny per message in order to bypass their spam filters, lots of defenders like Esther Dyson said that, hey, if it's a bad idea, the marketplace will let it fail automatically. It took a huge PR effort by groups like the EFF to get people to realize there was a real problem here, that market forces wouldn't solve the problem on their own because users wouldn't know about the messages they were missing.
Hammer it home wherever possible: the marketplace does not *necessarily* deliver the optimal result. Under certain conditions, it will. When those conditions are not met, it might not. You're doing great at making that point in this particular situation. Maybe if enough people hear it enough times, they'll have more healthy skepticism about the results that the marketplace delivers.
Google rankings are fascinating to watch.
Does anyone actually believe that the majority of humans searching for soap are looking for the computer protocol?
If you look at both the wikipedia and the Google side of SOAP over the last few weeks, I believe that there is some manual tinkering going on at Google to 'juice up' the computer protocol.
Seth, if this is spamming your blog, please feel free to remove the comment. However, I've just thrown together a little YouTube video that demonstrates how my little wiki site (Google PageRank 4/10 for the home page) has managed to grossly manipulate Google (and Yahoo) search results, with no undue effort on our part. It's to our site users' advantage, so I am definitely not complaining. But, still, it is a queer phenomenon to see a tiny, four-sentence page about the Industrial Age appear #1 on a Yahoo search that returns over 20 million results. Here is the video:
I think people are worried about the wrong things.
1) it is google's site and they can do what they want; or would you edit your site(s) to suit wikipedia and others with an interest, whenever they don't like what you show?
2) if you really think they are an effective monopoly in need of restraint, you should go to the Federal Trade Commission and get them regulated as a monopoly
3) you have demonstrated evidence of manual adjustment but not indicated the harm done
4) do others (yahoo et al.) do anything similar?
Yahoo dodgy practice 1: down here in Australia, yahoo has tied up with one of the 3 big tv channels. They don't list a website for the "lead" shows, they say go to yahoo and search for it. No manual tweaking?
It may be a fight worth fighting. But for what? An open admission by google that yes, they manually edit results for "commercial reasons", that they have a blacklist for stuff they don't think people want? What outcome do you want: a published blacklist? A published algorithm/code?
If you believe they are abusing "monopoly" power - address it that way.
"We need more transparency from Google, and we need it now, before the situation becomes even more suspicious. "
A major corporation abusing the power of a system for profit and suppressing dissent is hardly news.
Be more concerned that many people are too stupid to do anything other than google it. Unfortunately fixes for that are hard to come by.
I don't think google will roll over and die on this.
"We canceled the deal with about one hour to go before a lawsuit was going to be filed against our deal," Schmidt said. "We concluded after a lot of soul-searching that it was not in our best interest to go through a lengthy and costly trial which we believe we ultimately would have won."
"It's not that an antitrust suit would suddenly clarify things for advertisers. The nature of the system is that it's far too complex for Joe The Advertiser to grasp - even with algorithms in hand. But if Google were forced to open its virtual books, we would know whether it's limiting impressions for advertisers low on the totem pole, creating a kind of artificial scarcity where the prices are cheap. And we would know just how much the Mountain Viewers can juice revenues with a few algorithmic tweaks."
"The nature of the system is that it's far too complex for Joe The Advertiser to grasp even with algorithms in hand"
I'm having trouble grasping why the algorithm treats [penis] and [clitoris] so differently. Under strict filtering there are 30,000,000 results for [penis] and zero for [clitoris].
Sounds like more than a handjob to me -- more like digital FGM!
For gmail users enjoying some of the gmail features: what alternatives are there to switch to from gmail that offer similar features without the problematical aspects of gmail? There didn't appear to be any alternatives recommended at