A huge hornet's nest has been stirred up with the posting of some private emails about supposed plans by a group, "isra-pedia", to use various tactics on Wikipedia to favor pro-Israel viewpoints in various disputes. There's alleged leaked group mail (the host website is untrustworthy, but Wikipedia administrative discussion provides some evidence that the group mail is authentic). My favorite part:
Every time you see a Hamas person makes an outragous statements (like Jews came from apes or kill the jews) you write a small article about that peroson (google his name to find more ) and bring the quote from memri.
why doing all that ?
because google is wikipedia friend - 3 days after you created the article google the person's name again and voila your article will be the #1 in google for that name.
It's by no means news that Wikipedia's Google rank can be used to go after people. But it's nice to have it stated so bluntly and with such obvious intent.
Now, the plans outlined seem to have been more somebody's idea of a good manipulation scheme than anything they were actually able to do. But maybe this is merely a case of amateurs who couldn't pull it off, and got caught.
Expelled Exposed is a rebuttal to an anti-evolution film. There's what I'll call a "Google-lobbying" campaign around the search rankings:
We need to get the NCSE's [National Center for Science Education's] counter-site to the hideous little propaganda film, Expelled, to rank higher in the search engines. The way to do this is for lots and lots of you to link to the Expelled Exposed site with the word Expelled.
Note I'd say this isn't a "Google-bomb", since the target site wants the high ranking itself. And Google's algorithmic changes to defuse the bombs aren't applicable here, since the words appear extensively on the site.
On the other hand, I don't know if there'll be enough interest to have much impact, unless it becomes a cause célèbre. We'll see.
If it mattered, I'd write about other stuff, but I saw this post by Jeneane Sessum on Google ranking for a post about hamsters:
... because you are a blogger of some renown, Google makes sure your free hamsters post comes up on the FIRST PAGE of google search results for the term Free Hamsters, and that the image of your free hamster babies (who are now long since gone, as Google's memory long outlives a hamster's puny 2-3 year lifespan) will remain forever in the number one spot for Google image results ...
No I am not kidding you. A near seven-year blogging legacy, and the most traction I've gotten on any one post [...] is my baby hamster post.
Though I suspect that over the whole English-speaking world, many more people are interested in hamsters than in anything having to do with "Web 2.0"/blog-marketing/etc. :-). It sort of puts it all in perspective ...
But that ranking is actually an interesting result. At #7 for [free hamsters], #2 for ["free hamsters"]. And the page itself has very few links. Somewhere in Google's mind, it thinks this is somehow very relevant to free hamsters. More so than many pet stores which naively might be thought to dominate such a search. Very strange.
Under-echoed Google items which have crossed my screen:
"Mr. Google's Guidebook" - A long post by Tom Slee explaining in literary-story style some of the problems with herd-mentality aspects of using link-popularity. Sites which are popular then become more popular, leading to an entrenched dominance of early winners (by the way, Google in particular and search experts in general do know about this issue, and try to add in some other factors, but that leads to other problems, etc.).
Competing books: What Would Google Do? (answer: index them and sell little ads on search) - Siva Vaidhyanathan notes that, including him, there are four books coming out on (my phrasing) the Google-and-society book bandwagon.
"Google's riches rely on ads, algorithms, and worldwide confusion" - Cade Metz has an extensive irreverent piece on these topics.
The Externalities of Search 2.0: The Emerging Privacy Threats when the Drive for the Perfect Search Engine meets Web 2.0 - Michael Zimmer, "... this paper argues that the drive for Search 2.0 necessarily requires the widespread monitoring and aggregation of a users’ online personal and intellectual activities, bringing with it particular externalities, such as threats to informational privacy while online." (as I've put it: "The price of total personalization is total surveillance."). It's part of the special issue of First Monday: Critical Perspectives on Web 2.0, all of which is probably of interest. And yes, I love the title of Søren Mørk Petersen's article there: Loser Generated Content: From Participation to Exploitation
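To illustrate the "popular sites become more popular" dynamic mentioned in the Tom Slee item above, here is a minimal sketch of a rich-get-richer process. It is a toy model under stated assumptions, not a description of how any search engine works: the function name, the number of sites, and the rule that each new link picks its target in proportion to existing inbound links are all hypothetical choices for illustration.

```python
import random


def simulate_link_growth(n_sites=10, new_links=1000, seed=0):
    """Toy rich-get-richer model: new links favor already-linked sites.

    All parameters are illustrative assumptions, not measured values.
    """
    rng = random.Random(seed)
    inbound = [1] * n_sites  # every site starts with one inbound link
    for _ in range(new_links):
        # A new link chooses its target with probability proportional
        # to the number of inbound links that target already has.
        target = rng.choices(range(n_sites), weights=inbound, k=1)[0]
        inbound[target] += 1
    return inbound


counts = simulate_link_growth()
print(sorted(counts, reverse=True))
```

Running this with different seeds tends to show a few sites pulling far ahead of the rest even though all started equal, which is the entrenchment effect in miniature.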
Tom Slee on the [Barbie] Google results and how they've changed now:
... this search is basically owned by Mattel. Clicking the top link takes you to a pink page with "Think Pink" written in the middle of it, and the majority of the sites feature pink prominently.
No more defining the cultural symbols of our day for you, nine-year-old girl! Quit the self-aware political discourse and get back to dressing that doll in gender-appropriate colours (as selected for you by Mattel).
In other words, people who point to Google results as some sort of mass-mind or harbinger of popular will, often neglect (or wilfully ignore) that there's quite an industry around them. And that industry interacts with other senses of industry, bringing us back to where we were before in terms of corporate control of media.
Shelley Powers on the spurious story claiming Google hijacks errors page
What really surprised me about this story, though, is that if people are so quick to accuse Google of 'evil' behavior in an innocuous situations like this, why was the idea of Google helping to bail out Yahoo to keep the latter out of the hands of Microsoft seen as a "good" thing? I would think a search engine monopoly in the hands of Google would be potentially more evil than Google providing useful features for default 404 error handling.
This environment is confusingly inconsistent at times.
It's a bit like how Libertarians will argue that the government is intrinsically incompetent and corrupt, but can be trusted with nuclear weapons which might literally destroy civilization as we know it. Or perhaps, in general, little things that people can see make for far better attention-getting articles than big abstract problems which are hard to conceptualize.
Also, connecting back to the "AutoLink" incident a while ago, I think there's a theme of "Don't Touch My Stuff!". You can take over the world, but don't touch the stuff. Which is actually a pretty common reaction.
http://www.guardian.co.uk/technology/2008/jan/24/searchengines.wikipedia
Even search engines have an axe to grind
"Wikia Search tries to draw on the fear and doubt stemming from the dominance of Google"
I've tried to pack a lot into this column, everything from the $50K price for the "Grub" crawler to pointing out how the politics of search can be used for free labor. I also bent over backwards not to even seem to be using the column to retaliate against Jimmy Wales's conduct, and he ends up only being mentioned in specific for identification (sadly, as far as I've ever seen, it's never done me any good to be morally better than my attackers in terms of not abusing power, but I think I read too many comic books as a kid with Good triumphing over Evil - it doesn't work that way in real life).
[User Generated Content! Let's call this a guest-post, taken from the comments in the DEBUNKING "Google Hijacked" - The Sky, err, The Internet, Is NOT Falling! thread. Note the views and opinions expressed below are those of the writer, not me, though I am broadly in agreement on many points]
Brett Glass here; you may remember me as a long time columnist for magazines such as InfoWorld, BYTE, and PC World. I'm now (among other things) running an ISP, and think that people should think about what Rogers [ISP in Canada] is doing from an ISP's perspective. I've posted some of the text below to the comment sections of a few other blogs, but want to post it here too because it's relevant.
Network neutrality means not using one's control of the pipe to disadvantage competitive content or service providers. For example, if you're a cable company that offers VoIP, network neutrality means not blocking customers' use of other VoIP providers.
Network neutrality does NOT mean that a provider can't "frame" pages (as do many providers -- especially those like Juno which provide inexpensive or free service) or send them informative messages via their browser.
Let's step back and take a dispassionate look at what Rogers is really doing here. They need to get a message to a customer. Like any experienced ISP, they know that there's a good chance that e-mail won't be read in a timely way, if at all. (We, as an ISP, find that our customers constantly change their addresses -- often after revealing them online and exposing them to spammers -- without any notice, and often let the mailboxes that we give them fill up, unread, until they exceed their quotas and no more can be received.) The Windows Message Service once worked to send users messages, but only ran on Windows and is now routinely blocked because it's become an avenue for pop-up spam. Snail mail? Expensive and slow... and the whole point of the Internet is to do things faster and more efficiently than that. Give users a special program to display messages from the ISP? Users have too many things running in the background, cluttering their computers, already -- so no one could blame them if they didn't install it. (Also, many users won't install an application for fear of viruses, and alternative operating systems likely would not run the software.) Display a different page than the user requested? Perhaps, but that certainly comes much closer to "hijacking" than what Rogers is doing. Display a message in the user's browser window (where we know he or she is looking) along with the Web page, and let the user "dismiss" it as soon as it's noticed? Excellent idea. A wonderful, simple, unobtrusive, and (IMHO) elegant solution to the problem.
Now comes Lauren Weinstein -- known for drawing attention to himself by sensationalizing tempests in a teapot -- who has never run an ISP but seems to like to dictate what they do. Lauren claims that the sky will fall if ISPs use this nearly ideal way of communicating with their customers.
Contrary to the claims of Mr. Weinstein's "network neutrality squad" (who have expanded the definition of "network neutrality" to mean "ISPs not doing anything which we, as unappointed regulators, do not approve"), this means of communication does not violate copyrights. Why? First of all, the message from the ISP appears entirely above, and separate from, the content of the page in the browser window. It's not much different than displaying it in a different pane (which, by the way, the browser might also be able to do -- but this is better because it's less obtrusive and unlikely to fail for the lack of Javascript or distort the page below). The display can't be considered a derivative work, because no human is adding his own creative expression to someone else's creation. A machine -- which can't create copyrighted works or derivative ones -- is simply putting a message above the page in the same browser window.
It isn't defacement, because the original page appears exactly as it was intended -- just farther down in the window. And it isn't "hijacking," because the user is still getting the page he or she requested.
What's more, there's no way that it can be said to be "non-neutral." The proxy which inserts the message into the window doesn't know or care what content lies below. The screen capture in Weinstein's blog showed Google, but it just as easily could have been Yahoo!, or MySpace, or Slashdot. For the same reason, it can't be said to be an invasion of privacy, because the software isn't looking at the content of the page above which it is inserting the message.
In short, to complain that this practice is somehow injurious to the author of the original page is akin to an author complaining that his book has been injured by being displayed in a shop window along with another book by someone he didn't like. (Sorry, sir, but the merchant is allowed to do that.)
Nor is what Rogers is doing a violation of an ISP's "common carrier" obligations (even if they were considered to be common carriers, which under US law, at any rate, they are not). Common carriers have been injecting notices into communications streams since time immemorial ("Please deposit 50 cents for the next 3 minutes"). And television stations have been superimposing images on program content at least since the early 1960s, when (I'm dating myself here) Sandy Becker's "Max the burglar" dashed across the screen during kids' cartoon shows and the first caller to report his presence won a prize. (The game was called "Catch Max.") And in the US, Federal law -- in particular, Section 230 of the Communications Decency Act -- protects ISPs from liability for content they retransmit whether or not they are considered to be common carriers. They do not lose this protection if there happens to be other content from a different source in the same window on the user's PC.
There are sure to be some folks -- perhaps people who are frustrated with their ISPs for other reasons -- who will take this as an opportunity to lash out at ISPs. But most customers, I think, will recognize this as a good and sensible way for a company to contact its customers. Our small ISP is looking into it. In fact, because the issue is being raised, we're adding authorization to do it to our Terms of Service, so that users will be put on notice that they might receive a message through their browsers one day. I suppose it's possible that a customer might dislike this mode of communication and go elsewhere, but I suspect that most of them will appreciate it. In the meantime, let's just say "no" to regulation of the Internet.
[I wrote this for a mailing list, before the story started spreading all over the usual places. I didn't even get through there]
Regarding Lauren Weinstein's post on "Google Hijacked -- Major ISP to Intercept and Modify Web Pages"
This is apparently not quite the danger it may appear at first glance.
The product at issue, PerfTech, seems to have been around AND USED for a while, for example:
http://www.codeamber.org/news/PR020205_2230_code_amber_perftech_press.html
Code Amber Utilizes PerfTech to Reach ISP Customers
February 2, 2005
"Code Amber (http://www.codeamber.org) and Wide Open West (WOW!) Internet and Cable last week delivered an Indiana Amber Alert to customers in the neighboring state of Ohio, enabled by a product deployed in WOW!'s network that allows the Internet provider to deliver bulletins directly to the screens of its browsing subscribers."
A look at http://www.perftech.com/press.html shows this is hardly a stealth application - they tout advertising-insertion as a *feature*, for subsidized ISP services.
Also, http://www.perftech.com/images/Press_Rls_5_26.pdf is one file with an example using *Google* ... dated March 26, *2004*.
Now, it strikes me as a very obnoxious product. But I'm so tired of the "The Sky, err, The Internet, Is Falling!" paranoia every time an ISP or telco does something, anything, that can be twisted into service for the buzzwords of Net-you-know-what.
Again, can't we be better than that?
Echo: Not dead yet: the newspaper in the days of digital anarchy by Bill Keller, executive editor, New York Times. Key passage (my emphasis):
Google News and Wikipedia don't have bureaux in Baghdad, or anywhere else. With a few exceptions, they do not, in the cold terminology of the 21st-century media business, create content. Wikipedia's policy actually forbids original material; it is a great mash-up of secondary sources. Wikipedia and Google aggregate information from, well, from us. From the Times, from the Guardian, and from a lot of less dependable sources. They can pool reporting from hundreds of news outlets but what if there aren't hundreds of news outlets? Or what if many of them are simply unreliable? And how would you know? Here's an experiment you can perform at home: If you are inclined to trust Google as your source for news, Google yourself.
He's been getting raked over the coals for not making nice with the web evangelists who want to sell data-mining the audience to his company. The point he's trying to make is that aggregation isn't magic, and garbage-in, garbage-out. But sadly, in the bogosphere, nobody (with a large following) wants to hear.
The latest Google slapping of paid links has generated an intriguing aspect of "class struggle", as the intermediaries for Z-listers complain it's unfair to penalize those blogs for selling links, while not penalizing A-lister blogs which have sponsor "thank you" posts with links, essentially also paid link selling.
While neither side of that battle cares what I have to say (and it's probably not the best idea for me to get between them), it's an interesting question - what's the difference between paying for posts, and posts with links to an A-lister blog's sponsors? Perhaps surprisingly, I actually do see a difference. While the thank-you link posts are by no means completely pure, there's a lesser level of search gaming there than the individual placements for paid links. As a minor detail, typically having several links on a page dilutes the PageRank being sold. It is indeed some selling of PageRank, but not as much as a post which is devoted to a specific advertiser.
But much more importantly, it's not just the PageRank being sold, but also the sale of keywords in the links. That is:
"We'd like to thank our sponsor, BigCo [link]" is one thing, but
"We'd like to thank our sponsor, BigCo [link], which sells uPods [link], Niagra [link], and mome hortages [link]" would be quite another.
Now, you can push this if the company is named "Buy Niagra", but in general, the difference works in practice. Companies ranking higher for their own name is not a big problem, and while the extra bit of PageRank to distribute over their site is indeed ill-gotten gains, it pales in comparison to the keyword link issues.
Besides, nothing stops Google from going after the A-listers selling PageRank at some future date, after they've worked out the bugs (which seem to be substantial) from handling the Z-listers' keyword-selling.
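To make the dilution point above concrete (several links on one page splitting the PageRank being sold), here is a minimal sketch using the textbook PageRank formulation, in which a page's score is divided evenly among its outgoing links. The damping factor of 0.85, the function name, and the example numbers are illustrative assumptions. Google's production ranking is not public and surely differs.

```python
def rank_passed_per_link(page_rank, outgoing_links, damping=0.85):
    """Score each outgoing link transmits in the textbook PageRank model.

    Hypothetical helper for illustration; not Google's actual computation.
    """
    if outgoing_links < 1:
        raise ValueError("page must have at least one outgoing link")
    return damping * page_rank / outgoing_links


# Same (made-up) page score, different numbers of links on the page.
dedicated_post = rank_passed_per_link(1.0, 1)  # post devoted to one advertiser
thank_you_post = rank_passed_per_link(1.0, 5)  # sponsor roundup with five links

print(dedicated_post, thank_you_post)
```

In this simplified model, each sponsor in a five-link thank-you post receives one fifth of what a single advertiser gets from a dedicated post on a page with the same score.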
Regarding Google's recent PageRank shake-up, where I conjectured that Pagerank In != Pagerank Out, I realized that an article a few weeks ago from Danny Sullivan (Official: Selling Paid Links Can Hurt Your PageRank Or Rankings On Google) had actually reported this effect from Google itself. I'd read the post at the time. But the implications weren't clear in the way they now make sense in retrospect (my habit of discounting oracular Googlese led me astray). Quoting the article, my emphasis:
More and more, I've been seeing people wondering if they've lost traffic on Google because they were detected to be selling paid links. However, Google's generally never penalized sites for link selling. If spotted, in most cases all Google would do is prevent links from a site or pages in a site from passing PageRank. Now that's changing. If you sell links, Google might indeed penalize your site plus drop the PageRank score that shows for it.
Note penalize is not the same as dropping the PageRank score that shows for it. So Google can drop the PageRank score that shows for it, WITHOUT penalizing the rankings of the site.
So I pinged Google, and they confirmed that PageRank scores are being lowered for some sites that sell links.
In addition, Google said that some sites that are selling links may indeed end up being dropped from its search engine or have penalties attached to prevent them from ranking well. [... snip]
By using PageRank decreases (something Google first experimented with in the SearchKing case in 2002), Google can hurt the perceived value of buying links from a particular site without harming core relevancy.
So "without harming core relevancy" apparently means what I've thought of as PageRank-In != PageRank-Out.
The market for paid links just got a whole lot more complicated :-).
The TimesSelect Reader is Jon Garfunkel's "8 parts and 21,000 words" examination of the New York Times having a premium, for-pay, service and
... whether the Times lost influence, or audience, or money over the last two years. Many entries in the blogs have been long on speculation and short on data. We have tried to fill in the data gaps here.
Readers here might particularly want to examine the section on TimesSelect, SEO, and Google:
Google may need the Times, but the Times is starting to rely on Google even more. Marshall Simmonds told me that 25% of the traffic to nytimes.com comes from all search engines.
Note also TimesSelect & Foreign Correspondence
Perhaps Friedman is more popular because he often tells us what we want to hear. Kristof tells us what we don't want to hear.
It's all an immense amount of work, deserving of extensive attention (which, not coming from an A-lister, it won't get - in fact, even if it did come from an A-lister, it probably wouldn't be read, though it'd be talked about).
I sat out the Great Google PageRank Massacre Of October 2007 last week, where several sites, including some high-ranking blogs, saw their PageRank displayed as dramatically lower than usual (the best example was the front page of YouTube supposedly going down to a score of 3/10, a level which can usually easily be achieved by a minor blog - that was an amusing proof that at least some changes were not due to Google hand-editing results). I thought I'd wait for the data to settle before examining it. What was so interesting during the initial part of the uproar was The Silence Of The Googlers (i.e. the people who work for Google). Not a peep, and that spoke loudly.
Also significant, nobody seemed to reliably report any ill-effects from the change. Given that blogs were affected, there was of course plenty of noise, but nothing major.
Then the oracle of 'plex spoke, saying:
The partial update to visible PageRank that went out a few days ago was primarily regarding PageRank selling and the forward links of sites. So paid links that pass PageRank would affect our opinion of a site.
Going forward, I expect that Google will be looking at additional sites that appear to be buying or selling PageRank.
I speculate that Google has now formalized what they've been doing crudely before, and separated the quantities of PageRank-for-ranking and PageRank-for-transmitting. Before, if a site had a high "in" PageRank, that meant the site had a temptation to sell it. Now, a site's "out" PageRank may be minimal, no matter what the incoming linkage. As a bonus, displaying the "out" PageRank will make the displayed data even more confusing.
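The conjecture above can be sketched in code. This is purely an illustration of the speculation, not a claim about Google's actual mechanism: it is a textbook power-iteration PageRank with one added, hypothetical `out_weight` multiplier that scales how much score a page transmits, while leaving the score it receives untouched. The tiny link graph and the site names are made up.

```python
def pagerank(links, out_weight=None, damping=0.85, iterations=50):
    """Simplified PageRank with a per-page transmit multiplier.

    links: dict mapping page -> list of pages it links to.
    out_weight: hypothetical dict mapping page -> factor in [0, 1] applied
        to the score that page passes along its links (default 1.0).
    Pages with no outgoing links pass nothing in this sketch.
    """
    out_weight = out_weight or {}
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if not targets:
                continue
            passed = damping * rank[page] * out_weight.get(page, 1.0)
            for target in targets:
                new_rank[target] += passed / len(targets)
        rank = new_rank
    return rank


# Made-up graph: three fans link to a seller, which links to a buyer.
graph = {"fan1": ["seller"], "fan2": ["seller"], "fan3": ["seller"],
         "seller": ["buyer"]}

normal = pagerank(graph)
penalized = pagerank(graph, out_weight={"seller": 0.0})

print(normal["seller"], penalized["seller"])  # "in" score: unchanged
print(normal["buyer"], penalized["buyer"])    # what the buyer receives: reduced
```

In this toy graph the seller's own score is identical in both runs, while the buyer gains nothing from the penalized seller's link. That is the "PageRank-In != PageRank-Out" idea in its simplest form.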
Danny Sullivan - Official: Selling Paid Links Can Hurt Your PageRank Or Rankings On Google
"If you sell links, Google might indeed penalize your site plus drop the PageRank score that shows for it."
I've long defended the basic accuracy of the statement "Google doesn't hand-edit results". Now, that statement obviously can't be true in the most extreme sense, otherwise they couldn't ever throw out spammers. And certainly they'll country-blacklist illegal sites. But I've been against making a reductio ad absurdum interpretation of such a statement, and then knocking down a strawman. That's not useful.
There were also lesser spam penalties. Arguably, that was merely caught up in an algorithmic sweep. But now (my emphasis):
Google stressed, by the way, that the current set of PageRank decreases is not assigned completely automatically; the majority of these decreases happened after a human review. That should help prevent false matches from happening so easily
I don't want to create false incentives, and human review is good of course. Yet I can't help thinking that we've now crossed a line here. Perhaps with the best of intentions, for the most worthy of reasons. But still, we're now on the other side of some divide.
Now, there really is someone sitting in a room thinking along the lines of: "Hmm, the algorithm says you have Pagerank 9, but looking at your site, you're using your pagerank-powers for link-profit, so let's turn it down a few notches, perhaps to Pagerank 7, so it's not quite as attractive. If in the future you prove to be a more moral vessel of our power, we may restore you to full strength."
That's a change. Good or bad, it's different from what's been the case before.
Echo: www.googlizationofeverything.com
... we should ask some hard questions about how Google is not only "creatively destroying" established players in various markets, but is also altering the very ways we see our world and ourselves.
For those who make Google a god, recall the quote "[A computer is] like an Old Testament god, with a lot of rules and no mercy. -- Joseph Campbell". Google has many algorithmic rules, and I've seen too many people begging for mercy.
ObPunditry: Google calls for web privacy laws.
Search site Google has called on governments and business to agree [to] a basic set of global privacy rules.
In other news, foxes have called on farmers to agree to a basic set of henhouse privacy rules. They propose to standardize on "APEC principles" (Association of Poultry Eating Carnivores).
Anyway, there's no point in me rewriting what others have said better:
Google and new, international privacy rules
Franco Frattini, European Union Justice Commissioner, has set off a minor blogstorm from the following censorship proposal:
BRUSSELS (Reuters) - Internet searches for bomb-making instructions should be blocked across the European Union, the bloc's top security official said on Monday.
Internet providers should also prevent access to any site giving instructions on how to make a bomb, EU Justice and Security Commissioner Franco Frattini said in an interview.
"I do intend to carry out a clear exploring exercise with the private sector ... on how it is possible to use technology to prevent people from using or searching dangerous words like bomb, kill, genocide or terrorism," Frattini told Reuters.
Putting aside the phrasing silliness (I know, it's like blogger catnip, ha-ha-he's-so-dumb), he has been making the same noises for a while:
Voice Of America News August 2006
"But I think it is very important, for example, to explore further possibilities of blocking websites that concretely incite to commit terrorist actions or for example providing of the diffusion of expertise or knowledge about bomb making," said Frattini.
However, it turns out that an organization http://SpyBlog.org.uk has compiled a VERY LONG Q-and-A about such censorship proposals
Below is the first part of our letter to Franco Frattini, and the preliminary, general answer, by Jonathan Faull the Director General for Justice, Freedom and Security of the European Commission.
See subsequent blog postings for Questions and Answers numbers 1 to 17
I doubt my audience needs me to say anything more about the battles of censoring the Internet, and he certainly doesn't care what I think reading ...
Allen Kraus, a focus of the NY Times' Google-power article, already has a web page, as pointed out by Jon Garfunkel in his piece:
Jack Shafer of Slate (Page Rank 6/10) tells Mr. Kraus to get a web page. But the man has a web page (which I linked to as my random act of charity for the day). It's just that nobody else linked to it [Ed. note: the back links feature of Google and Yahoo is well-known to be highly inexact, er, wrong, with Yahoo being slightly better, so the link is there for dramatic effect]. And as such, his page, and his company (ImplexHealth), have a PageRank of 0/10. ...
["My readers know more than I do" :-)]
So here's another link for it.
I suppose the web propagandists, I mean, evangelists, could object that they said to start begging A-listers for links, I mean, blogging - not just have a web page. But I think the above point is powerful evidence about the scamminess of that idea, if any more was needed.
"When Bad News Follows You" is today's must-pundit article for SEO (Search Engine Optimization), about the power of top Google results to affect people's lives, even if it's misinformation (h/t RoughType).
Rather than rehash what everyone else is saying, I'll try to provide some value-adds:
I like Oliver Widder's cartoon:
Welcome to our SEO seminar - "The Truth Is On The First Page"
I normally don't like to talk about Bennett Haselton's writings due to conflict of interest issues, but this seems far enough away from any potential contention so I'll note his amusing site detailing his dispute over a New York Times article about him: PublicEditorMyAss.com
The New York Times Web site has been hosting an article since May 2000 claiming that I was fired from Microsoft in January of that year. I complained several times that this was wrong -- I wasn't fired, I quit in good standing (and, for the record, voluntarily, not some "quit now or you'll be fired" deal) -- and I showed the NYT editors a copy of my personnel file from Microsoft which has "Term. type: Voluntary" and "Term reason: Resignation" printed on it, but the paper has still not corrected the article. ...
... I also told them that recently one of my employers found the article by Googling my name and thought I had lied about my employment history, and I only dodged that bullet because my employer looked up my Microsoft reference and determined that I was telling the truth.
And sadly, I've seen many marketers pushing the response "Start a blog!". I have the impulse to tell (off) these hucksters that ordinary people do not want to get on the blog-evangelism gatekeeper-begging attention-mongering digital-sharecropping rat-race. They have lives instead, and want to live them without (free) laboring endlessly to be manipulated and sold for the benefit of pyramid-schemers. But my saying that wouldn't be heard, so it wouldn't do any good :-(.
I made some notes as I went through the "Federal Search Commission?" paper, and since I've already given an overview of my thoughts, I decided to post these for whatever value they have in terms of the specifics of the argument, and where I believe it doesn't work. Again, basically, I sympathize with the examination of the concentration of media power. But the claims as to why it's not like other media power simply don't seem to me to be valid.
The first dimension involves an important preliminary question: what exactly is the relevant speech in relation to which search engines assert first amendment rights?
This: "If you're looking for pages about "widgets", the most relevant page is this, the second most relevant page is that, the third, etc".
When, however, the frame of reference is the supposed speech embodied in rankings the claim that regulation of search results violates the first amendment becomes highly precarious. It is highly questionable that search results constitute the kind of speech recognized to be within the ambit of the first amendment by either existing doctrine or any of the common normative theories in the field. While having an undeniable expressive element, the prevailing character of such speech is performative rather than propositional.
Regrets, I don't buy it. I don't see a way you can claim "Vote for X" is "propositional" while "The most relevant page for X is Y" is "performative". This part in the reasoning seems flawed: "To use the terminology of Robert Post, the speech of search engines as embodied in rankings is not a form of social interaction that realizes first amendment values."
That claim is problematic in a very deep sense, because if search engine rankings embody social values, then they're a form of social interaction in the relevant sense. The argument can't have it both ways, that they're expressions of the algorithm-writer's bias and prejudice for the sake of criticizing them, but not social interaction when it comes to regulation.
After all, one could say everything from tabloid newspapers to book publishing is not social interaction, in that they're monologue or pontification, not a town hall meeting.
In short, extending the compelled speech rule to cover the mere observations on relevance implied in search engine rankings seems to take the doctrine to domains where it was never meant to go.
But the problem here is taking that view in the opposite direction, to wit:
The evaluation of the value of bonds which was found to be an "opinion" in that case, while not the strongest case of an expression subject to a dialogical relationship, still has some potentially-dialogical features. Listeners can agree or disagree with the evaluation, criticize or support it, and make arguments for or against it. Search engine rankings, by contrast, are not perceived by users as an expression with which they can interact in ways characteristic of what we usually refer to as an "opinion."
Again, this just doesn't seem correct to me. Generally we have as little ability to dialog with a statement like "Standard and Poors rated this bond as junk" as "Google blacklisted this site as spam". In both cases, the mechanism used to determine the result is proprietary, and the institution offers it on a take-it-or-leave-it basis.
As in the case of the compelled-speech rule, recognizing the incidental and limited form of "opinions" implicit in search results -- i.e. opinions about relevance to users -- might cause the doctrine to spin out of control.
Right, right, got it. This idea is seen (in the reverse) a lot in net-ranting. You can't convert every statement into protected speech by the magic of prepending "It's my opinion that ...", and so it's an opinion, which is protected speech, ha-ha-ha gotcha. Calling every statement an opinion isn't a get-out-of-regulation-free card. Understood. However, trying to turn it around in the other direction is just as bad, in that there's a problem playing off the many senses of the word "opinion". A search engine result is more like a judicial "opinion", which doesn't map exactly to the most common use of the word either.
The Google does not need me to save it, and I certainly know how its results can be gamed. But I also don't think it can be so readily categorized as somehow apart from standard journalism.
There's an interesting legal discussion concerning the paper "Federal Search Commission? Access, Fairness and Accountability in the Law of Search" by Frank Pasquale and Oren Bracha.
I find myself torn, as I'm very politically sympathetic to the issues raised by the authors. As they recognize, this is really about mass media and information gatekeeping in a democratic society. There's a whole genre of these types of paper. But they usually boil down to saying roughly the same basic things in a very elaborate way:
-1. An informed populace is important for a democratic society
0. The First Amendment forbids government regulation of political speech
1. These mass media institutions concentrate enormous political power in a few corporations, giving these businesses huge megaphones, without any effective reply by the citizenry
2. But the courts have ruled that under the First Amendment, at least for newspapers, that's just fine (e.g. the "Tornillo" case).
3. This institution is not like newspapers, because [fill in the blank].
The magic is in item #3, and sadly, I've yet to see one of these papers where I found the reasoning convincing there. The writer's problem (generically, not this paper in specific) is that they can't make it a general media analysis, since then they would be both on the wrong side of existing law, and would immediately lay themselves open to intense attack as censors. So they're forced to try to find some hairsplit, some key feature that they can claim gets them out from under that trap (myself, I think the intellectually consistent liberal solution is saying that corporations aren't persons, but that's a whole different topic).
Now, the above task isn't entirely impossible. For TV and radio, it's "spectrum scarcity" and "pervasiveness", which supported the Fairness Doctrine, to counteract practical monopolization. However, that regulation has been gone for a long time, and any proposal to restore it brings instant oppositional targeting by professional propagandists. The only relevant TV/radio material regulation still in force - and even increased in some ways in modern times! - is the prohibition on sex and cursing (which tells you something ...).
But the authors' specific attempts to find a hairsplit for search engines (my paraphrase here) - secret algorithms, or overblown marketing claims, or Google-is-God perceptions, or defining it as not discussion among citizens - just seem to me to be playing to the discomfort that some liberal-arts types have with anything involving technology. If computer programs are covered by copyright (something that was not so evident years ago), then search engine rankings are "opinions". Arguments otherwise are easy to shoot down.
I'd suggest putting the advocacy energy into some sort of "Right Of Reply" argument - that might even be possible, though it's still very much bucking the trend.
Sean Daly sent me a notice about his Groklaw posting Update on Copiepresse v. Google. This is the case where Google News in Belgium was sued by newspapers over copyright violations. Along with analysis
... here's the official English translation of the ruling in Copiepresse vs. Google. I have linked to cited jurisprudence and essays where possible (the Belgian documents are in French, the European documents are in English and other languages).
I don't agree with many of the positions taken by the person who introduces the article, but it's definitely yeoman's service to acquire and post the original sources.
So Google News has comments (for small values of comments), and it is incumbent upon everyone to comment.
Of course, this gives Google a huge amount of power in picking and choosing who will be allowed to comment. They state:
We'll be trying out a mechanism for publishing comments from a special subset of readers: those people or organizations who were actual participants in the story in question
Essentially, they're taking their function as an automatic aggregator, and adding some human ORIGINAL REPORTING in follow-up. Very minimal original reporting, but they are in effect generating their own follow-up reaction articles from the original aggregated articles.
And wow, does this create some perverse incentives that can lead to unintended consequences. I can think of one obvious result off the top of my head:
1) Get mentioned in a popular article for doing something outrageous
Then either
2) Google gives you a platform to say whatever you want
Or then
2) Scream GOOGLE IS CENSORING!!! as loud as you can, and watch the fireworks.
I'm sure there's plenty of devious schemes hatching in the minds of flacks. This is going to draw a huge amount of attention. And that draws people to manipulate it.
Echo: http://lauren.vortex.com/archive/000266.html
Greetings. As part of my continuing research and an upcoming white paper focusing on policy and related technical issues associated with search engines and their impacts, I'd very much appreciate any examples of relevant specific situations, concerns, and any other positive or negative experiences with search engine operations and support personnel, with a particular emphasis on (but not limited to) the following categories: [read the post above]
"The Googlization of Everything" is a new book in the works by Siva Vaidhyanathan. I'm going to get a jump by echoing it before the crowd (any resemblance between this post and Google manipulation is purely ironic ...).
[n.b. note the picture in the first link - "Snared in the Web 2.0 ... "User-generated content" is just another name for massive corporate data collection, mining, and profiling"]
Per The Institute for the Future of the Book's fellow announcement:
Siva is one of just a handful of writers to have leveled a consistent and coherent critique of Google's expansionist policies, arguing not from the usual kneejerk copyright conservatism that has dominated the debate but from a broader cultural and historical perspective: what does it mean for one company to control so much of the world's knowledge?
As I keep saying, there's a shift, but it's from one set of gatekeepers to another set of gatekeepers.
Or, as put in a talk note
His premise was that we've come to talk about Google in theological terms, and that the Google folks themselves encourage this through their familiar "don't be evil"-type approach to their public communications. He thinks their stated aim to eventually provide universal access to all information is basically cynical at worst, unrealizable at best.
More talk elaboration:
Siva concludes his talk with a plea against technofundamentalism - the Google logic that you can always fix the problem by tweaking and innovating. This is also a plea against the myth of technological neutrality. Google is not neutral, he says, and politics are built into the black boxes of their search engines. Finally, this is a plea for Critical Information Studies - a nice start to the conference, then.
Shorter: You can't fix a social problem with a technological solution?
Links for the underheard, in a futile gesture to whip the Long Tail.
Did you hear? Google will lower, to two years, the expiration time of its universal spying device, I mean, cookie. I'll just link to Michael Zimmer on Google cookie expiration:
My hunch is that the brilliant data-mining minds at Google recognize that if someone hasn't searched on Google in two years, their past history probably isn't a good indicator of their current needs. So, if linking to two-year-old data isn't all that valuable, they might as well just dump the cookie altogether. It doesn't harm their data-mining needs - and it's good PR.
[See also "More of Peter Fleischer Misleading on Google Data Retention" - he said it, I didn't.]
From the everybody talks about Wikipedia taking over Google results but finally someone did something about it department:
Will Critchlow: Search Google without wikipedia - a Firefox search plugin
Here at Distilled, it's something that came up in conversation a few times, so we decided to do something about it - we have created a Firefox search plugin that enables you to search Google without getting wikipedia results
[See also the CustomizeGoogle solution]
Humor: Lauren Weinstein - "I Am the Very Model of a Modern Major Googler"
And if you're really good it seems to us that you at least possess,
The skill to quote from memory full source of the Linux OS.
[Rumor has it that this line is only a slight exaggeration of what they expect]
Echo: http://lists.grok.org.uk/pipermail/full-disclosure/2007-July/064625.html
Youtube.com requires account creation and login before allowing visitors to view videos flagged by users as inappropriate.
Sample flagged video: http://www.youtube.com/watch?v=[video_id]
"This video or group may contain content that is inappropriate for some users, as flagged by YouTube's user community.
To view this video or group, please verify you are 18 or older by logging in or signing up.".....alternatively, download the video directly from Google video
http://cache.googlevideo.com/get_video?video_id=[video_id]
[h/t Google Blogoscoped forum]
I've said it before, cache is the biggest threat to censorware.
I stayed out of the blogstorm of a few days ago regarding Google [Health Advertising Blog] Criticizes Moore's "Sicko" - given the number of ultrahigh-attention sites echoing the story, anything I'd say would either be futile or (personally) dangerous.
In the aftermath, I've seen some suggestions that Google is violating its own policy by permitting critical ads to be run against a search on "Sicko", e.g.:
Sicko short on truth
Moore's movie profers a deadly Rx.
In the smart new business magazine
www.American.com
Checking Google's ad content policy, the relevant passage seems to be:
Ad text advocating against any organization or person (public, private, or protected) is not permitted. Stating disagreement with or campaigning against a candidate for public office, a political party or public administration is generally permissible.
The letter of the policy doesn't say anything either way about a movie. But the spirit seems to be that "campaigning" is allowed, so they could argue it encompasses general political speech.
Frankly, I think using Google ads in a controversial political issue is just a bad idea. The following is not an implicit encouragement, but since the idea is utterly obvious, I don't think there's any reason to refrain from mentioning it - buying a political Google ad is an invitation for some militants to click them, solely to cost the advertiser money. Maybe Google doesn't care, since they'd make money too off such "protests" (on the other hand, dealing with the claims of click fraud can't be fun).
Philipp Lenssen asked Google about data restrictions, and received a statement concerning "We restrict access internally in a number of ways. [details]".
I left a comment in part:
There's never going to be an official answer which says "Security? What security? We believe in open sourcing our business records. We don't take any precautions, anyone whatsoever can traipse through them at will".
It's important to understand that there's a difference between privacy, and business confidential data. Google's logs fall under both regimes. In many instances, the same incentives apply. But what happens when there's a difference? This is the argument I keep having with some of Google's defenders - the Google Search Subpoena case was NOT a privacy case. Google's objections were mainly about business confidential data, which they then "spun" as privacy. Posturing about the extensive procedures Google takes to protect its business records is not wrong, but it's not about privacy either.
We don't know about what happens in serious privacy challenges. There's no way to independently check on Google's statements.
To understand the difference, consider the AT&T wiretapping case
"The Electronic Frontier Foundation (EFF) filed a class-action lawsuit against AT&T on January 31, 2006, accusing the telecom giant of violating the law and the privacy of its customers by collaborating with the National Security Agency (NSA) in its massive, illegal program to wiretap and data-mine Americans' communications."
AT&T surely could have a spokesflacker say all sorts of things about how seriously they protect customer privacy. Without some independent checks, taking such statements on faith is not warranted (pun unintended but still relevant)
http://technology.guardian.co.uk/opinion/story/0,,2107262,00.html
"The task is to prise out any abuses from behind the wall of corporate secrecy. Otherwise, we could end up with an unholy alliance between corporations and governments."
[I hate to do this to Michael Gorman, but I'm not above a little link-baiting myself.]
In the Britannica Blog Link-Bait party, Gorman said:
"If you can't Google it, it doesn't exist" is a common saying of Jimmy Wales and his ilk - a remark that gives shallowness a bad name. It does, however, illustrate neatly a state of mind that has turned away from learning and scholarship and swallowed -- hook, line, and sinker -- every banal piece of digital hype. There are intellectual treasures of all kinds in libraries and archives throughout the world that are not available on Google, and, because of the defects of all search engines using free-text searching, would not be retrievable using Google even if every last word in them were digitized. Mr. Wales may place no importance on anything other than information in digital form, but we owe more than that to the young. There is a life beyond the search engine -- a life of richness and nuance undreamed of in Mr. Wales's philosophy -- and all teachers at all levels of education must insist that their students use primary sources and authoritative secondary sources in their papers and studies, regardless whether these sources are digitized. Further, they should emphasize the acquisition of research and critical thinking skills applied to the human record in all its variety.
Unfortunately, before we even get to the Googling, Michael Gorman fell down here on the critical thinking skills. While he certainly can't be expected to be a Jimmy Wales worshipper, hanging on the pronouncements of the guru of work-for-free, it's pretty easy to know that Wales doesn't believe something so strawmannish as the impression given above. If anything, his general line could be attacked as being much more slick, that this stuff is bad for you if you use it to the exclusion of everything else, but you shouldn't do that (and implicitly, if you do, it's your fault, don't go blaming the wonderful wisdom of crowds for steering you wrong, you should have checked anyway).
Anyway, Michael Gorman put a correction in the comments of the thread:
I have heard from Mr. (Jimmy) Wales himself, that he not only has not written "If you can't Google it, it doesn't exist" but also that this quotation is directly opposite to his actual views. I had read the quotation attributed to him in the New Yorker article by Stacy Schiff (July 31 2006) - "Wales, in his public speeches, cites the Google test: ``If it isn't on Google, it doesn't exist''" - and had not seen the attribution disputed. However, I was remiss in not checking further before I published this essay. I apologize to Mr. Wales unreservedly and wish, not for the first time, that the saying "A lie is half way around the world before the truth has its boots on" was not so spot on.
Which started the inevitable blog mockery
The best part of this whole stupid Gorman thing yet: in a blog post on shoddy research, he misquotes Jimmy Wales based on a printed source. And has to apologize. The irony! The laughs! The sheer idiocy of this whole exercise!
I did not "misquote" Mr. Wales. I read that he had said those words in public speeches in the New Yorker article. It's probably counter to the snide ethic of blogs, but I chose to accept his statement that, despite the unrefuted statement in the New Yorker, he had not said and did not believe those words.
Now comes the problem of who do you believe? One thread commenter:
Actually, Gorman cites the New Yorker article accurately, and the New Yorker does its homework and fact-checking and interviewed Wales extensively for the piece. Funny, Wales waits one year to complain about being misquoted? waits until he's on the hot seat and being criticized in this forum? ...but he had no problem with this quote when it merely was contained in the puff-ball New Yorker piece (that also contained the Essjay lies to boot)? Hmmm... .And this reflects badly on Gorman? How convenient for Wales to remember he never said this... .(Gorman is actually being gracious and letting Jimmy off the hook! I doubt I would if I were Gorman.)
Part of the problem is provenance. The bulk of Wikipedia's content originates not in the stacks but on the Web, which offers up everything from breaking news, spin, and gossip to proof that the moon landings never took place. Glaring errors jostle quiet omissions. Wales, in his public speeches, cites the Google test: "If it isn't on Google, it doesn't exist." This position poses another difficulty: on Wikipedia, the present takes precedent over the past.
Sing: Which side are you on?
Well, it turns out this can be determined by ... THE GOOGLE. It's a little more difficult than is apparent, since it seems the reporter tightened the quote. There's no independent reference for "If it isn't on Google, it doesn't exist". What you have to search for is "it probably doesn't exist". And then one finds speech transcripts such as:
"But there are other cases where it's borderline. Where you might say, I'm not sure if this is a hoax, if this is real, is this not real, and the example here was a film called Twisted Issues, an obscure underground punk film from 1988. The funny thing is, I gave a talk just two days ago at the University of Florida, and the next day somebody wrote me and said, "Do you know I played on the soundtrack for Twisted Issues." I said, wow really, go ahead and edit the article, really, so anyway, so the first person says it's supposedly an underground punk film, but it miserably fails the Google test. So what's the Google test. You look something up in Google, and if you can't find it, then it probably doesn't exist. It's -- this is not a foolproof test, but it's pretty good. Right? There are still a few things on the planet that are not in Google. But it's pretty good. And so it fails the Google test, and it doesn't have any listing, so a couple people say, "delete, delete." And then somebody says "Hey wait wait wait wait, I found something. It's in the Film Threat Video Guide to 20 Underground Films You Must See. So maybe it has some notability. Next person down says, complete it. Next person says, it's a real movie, it's in IMDB, keep keep." So at the end of a discussion like this, this would have been kept. In fact it was kept, and the article's still there."
Verdict: From the full section above, I think Jimmy Wales is being taken out of context. He's clearly talking about a narrow circumstance of determining whether something is a hoax or not. And note in the debate Wales uses as an example, a print reference book is actually being cited as evidence.
It's all in how you use the Google, and think critically.
Echo: Search Engine Dispute Notifications: Request For Comments
Increasingly, cases are appearing of individuals and organizations being defamed or otherwise personally damaged -- lives sometimes utterly disrupted -- by purpose-built, falsified Web pages, frequently located in distant jurisdictions. ...
Question: Would it make sense for search engines, only in carefully limited, delineated, and serious situations, to provide on some search results a "Disputed Page" link to information explaining the dispute in detail, as an available middle ground between complete non-action and total page take downs?
In my view, it's a brave thought, but it won't happen. We've got to start thinking of search engines as media companies, because that's what they are (I don't claim this insight to be original - lots of people point it out in regard to their advertising business model). The search results are their content, and they do a very standard business model of selling targeted ads around that content.
This then gets into the issue of speech and libel law for Internet service businesses, which is a very complicated topic. Can an algorithm output be libel, even if the human values which go into it don't contemplate the specific libel at issue? Good luck arguing that against Google's money and lawyer-buddies ...
The Privacy International "Race To The Bottom" Report touched off the expected punditry party:
Why Google?
We are aware that the decision to place Google at the bottom of the ranking is likely to be controversial, but throughout our research we have found numerous deficiencies and hostilities in Google's approach to privacy that go well beyond those of other organizations. While a number of companies share some of these negative elements, none comes close to achieving status as an endemic threat to privacy. This is in part due to the diversity and specificity of Google's product range and the ability of the company to share extracted data between these tools, and in part it is due to Google's market dominance and the sheer size of its user base.
I feel like someone should just set up some sort of system where one or two bloggers can be picked as the champion-of-battle of the inevitable reaction. As in, if you think Google is a poor misunderstood maligned gentle giant, go to Matt Cutts' Why I disagree with Privacy International. On the other hand, if you believe Google is an enormous corporation subject to all the negative aspects that come with being a huge business which has a deep interest in collecting personal data, read Shelley Powers On Privacy Redux. Danny Sullivan and Donna Bogatin can be the respective seconds.
Given that there's far more people saying things, than things to say, I'll leave it at that.
Echo: "We Googled You"
Hathaway Jones's CEO has found a promising candidate to open the company's flagship store in Shanghai. Should a revelation on the Internet disqualify her now?
In brief: Managers are asked what they would do about hiring a job candidate where a Google search discloses some problematic college activism (h/t many-2-many). It's pretty interesting to read the responses ("I routinely Google people I'm going to interview or be interviewed by.").
I know what the typical Net evangelist would say, that we should all be forgiving, and get used to living in a goldfish-bowl. While that's one common sentiment, note it won't be the evangelist who suffers if they're wrong. It's far more interesting to see some of the negative thoughts of people who actually make such decisions.
A Google algorithmic quirk which spelling "corrected" searches like e.g. [he invents] to [she invents] recently got some attention, and Google has apparently now rolled out a fix for this problem.
I didn't chase after it at the time, since it seemed obviously an issue of statistical difference, and plenty of informed people were explaining that result to those who saw it as deliberate sexism. So I didn't see the need for me to say it too. There can be a long discussion of structural sexism, and the effects of the default English pronoun being "he", etc, but I had no special expertise to weigh in on the matter.
But the fix that Google has made is interesting for what it reveals about how their algorithm actually functions. As Philipp Lenssen said in the above:
(Note: no matter what Google tells you, algorithms are always influenced by those who design, write & test them)
So Google seems to have changed the way "she" is handled in their spelling suggestions.
But it turns out, from seeing what behavior remains, that Google does not do the obvious sort of simple correction algorithm one might initially think. That is, a search for ["she inveent"] still gets a suggestion of
Did you mean: "he invent".
Why is this significant?
Because "she" is a common English word, "inveent" is not a common English word, and the naive correction of "inveent" to "invent" should yield a suggestion of "she invent". But it seems to be doing some sort of statistical best-match for the phrase as a whole.
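To make the distinction concrete, here's a toy sketch of the two approaches. This is only my illustration, not Google's actual method, which isn't public. The word and phrase counts are invented for the example, and the function names (`naive_correct`, `phrase_correct`) are hypothetical. The naive version fixes each unknown word by itself and leaves known words alone; the phrase version scores whole candidate phrases, so a more frequent phrase can pull a correctly-spelled word along with it.

```python
from itertools import product

# Hypothetical toy counts, invented purely for illustration.
WORD_COUNTS = {"he": 1000, "she": 600, "invent": 50, "invents": 40}
PHRASE_COUNTS = {("he", "invent"): 30, ("she", "invent"): 5}

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one delete, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    replaces = {a + c + b[1:] for a, b in splits if b for c in LETTERS}
    inserts = {a + c + b for a, b in splits for c in LETTERS}
    return deletes | replaces | inserts


def candidates(word):
    """The word as typed, plus any known words within one edit of it."""
    return {w for w in edits1(word) if w in WORD_COUNTS} | {word}


def naive_correct(query):
    """Correct word by word; a word already in the dictionary is kept."""
    fixed = []
    for word in query.split():
        if word in WORD_COUNTS:
            fixed.append(word)
            continue
        known = [w for w in candidates(word) if w in WORD_COUNTS]
        fixed.append(max(known, key=WORD_COUNTS.get) if known else word)
    return " ".join(fixed)


def phrase_correct(query):
    """Pick the candidate phrase with the highest whole-phrase count."""
    options = product(*(candidates(w) for w in query.split()))
    return " ".join(max(options, key=lambda p: PHRASE_COUNTS.get(p, 0)))


print(naive_correct("she inveent"))
print(phrase_correct("she inveent"))
```

With these made-up counts, the naive version keeps "she", while the phrase version swaps it for "he" simply because that phrase is more common in the toy data. No pronoun rule is written anywhere in the code; the skew comes entirely from the counts.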
I suppose this is not surprising, even expected, in retrospect. But it shows it's harder than it might appear to remove all aspects of structural bias (which is not to trivialize addressing an obvious case).
Semi-digression: Google seems to special-case swear-words. A search of ["fcck you"] does NOT return the obvious correction! One rule seems to be that if the swear-word doesn't appear in the original search, it won't be suggested.
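A rule like that could be as small as the following sketch. Again, this is a guess at the shape of the rule from outside behavior, not anything Google has documented; the blocklist contents and the name `allow_suggestion` are placeholders I made up.

```python
# Placeholder blocklist; whatever list is really used is not public.
BLOCKED = {"darn"}


def allow_suggestion(query, suggestion):
    """Reject a suggestion that introduces a blocked word the user did not type."""
    typed = set(query.lower().split())
    introduced = set(suggestion.lower().split()) - typed
    return not (introduced & BLOCKED)


print(allow_suggestion("daarn you", "darn you"))
print(allow_suggestion("darn yuo", "darn you"))
```

The first call is rejected because the blocked word would be introduced by the correction; the second is allowed because the user already typed it.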
Echo: "The Social, Political, Economic, and Cultural Dimensions of Search Engines"
The newest issue of the Journal of Computer-Mediated Communication, JCMC 12(3), is a double issue. It features a special theme section on the social and cultural implications of search engines, guest edited by Eszter Hargittai, and a special theme section on CMC and religion from cross-cultural perspectives, guest edited by Charles Ess and colleagues in Japan. The 18 articles brought together on these two diverse themes have in common that they inform and enlighten.
Search engines are some of the most popular destinations on the Web - understandably so, given the vast amounts of information available to users and the need for help in sifting through online content. While the results of significant technical achievements, search engines are also embedded in social processes and institutions that influence how they function and how they are used. ...
[Disclosure: A few of the papers cite me in the references]
Echoes:
Michael Zimmer: Google's Unsatisfying Explanation for Retaining User Search Data
In sum, I applaud Google for trying to be more transparent about why it collects user data and what it does with it, but they still keep much in the dark.
[compare Why does Google retain data? Because nonexistent laws tell it to]
Google's official statement about logs
Note: "In developing this policy, we spoke with various privacy advocates, regulators and others about how long they think the period should be."
Observe the rhetorical set-up, of taking a middle ground between zero and infinity. Somebody is sure to say "never keep logs". Somebody is sure to say "keep logs forever, some investigation might find them useful". By doing whatever they felt like doing in the first place, they are compromising between the two "extremes".
George W. Bush: A Failure Once Again, According To Google, by Danny Sullivan at Searchengineland.com, points out that a Google search for "failure" (not "miserable failure") currently has a George Bush page as the top result, due to the page having the word "failure" in it. That happened because the http://www.whitehouse.gov/president/ page has "Latest Headlines", which then had this part of http://www.whitehouse.gov/news/releases/2007/04/20070403.html
"President Bush Makes Remarks on the Emergency Supplemental President Bush on Tuesday said, "In a time of war, it's irresponsible for the... Democratic leadership in Congress to delay for months on end while our troops in combat are waiting for the funds. The bottom line is this: Congress's failure ..."
And so this shows the new Google defusing algorithm uses words on a page to determine in part what's a Google bomb.
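As a rough sketch of what such a rule might look like (inferred only from the observed behavior, with invented names and data, and certainly far simpler than whatever Google really does): inbound anchor text for a term is counted only when the term also appears in the target page's own text.

```python
import re


def words(text):
    """Lowercased set of words in a piece of text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def anchor_votes(page_text, inbound_anchors, term):
    """Count inbound links whose anchor text contains term, but only
    when the term also appears on the target page itself."""
    term = term.lower()
    if term not in words(page_text):
        return 0  # off-page-only anchor text is ignored as a likely bomb
    return sum(term in words(anchor) for anchor in inbound_anchors)


anchors = ["miserable failure", "failure", "White House"]
print(anchor_votes("Biography of the President", anchors, "failure"))
print(anchor_votes("Congress's failure to fund the troops", anchors, "failure"))
```

Under this rule the links stop counting while the word is absent from the page, and start counting again the moment a headline happens to put the word there, which matches the "failure" result described above.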
Notably, in the comments, "RedCardinal" said: "Well I think we can safely dispel any theories about this being a handjob now."
While nobody who studies Google seriously thought that they hand-edited these problematic results, Google's secrecy breeds superstition, so it's worth placing extra emphasis on the evidence that the changes were not done by a simple blacklist, but were indeed an algorithmic change.
Note this should not be taken to assume that no search engine has ever hand-edited a problematic result! But the number of algorithmic quirks vastly outweighs the rare examples, due to sheer complexity.
KinderStart v. Google, a lawsuit challenging Google's ranking algorithms, has been dismissed - hard and with sanctions against the KinderStart lawyer (h/t Eric Goldman). That last part, the "with sanctions", is a very significant part here. Essentially KinderStart's lawyer went so far out of bounds on some issue that the court imposed a punishment.
From a quick read of the judge's reasoning, it seems he really didn't like the charges of paid placement, and of political and religious discrimination in Google's search rankings. Google critics take note.
I know some people were rooting for KinderStart because they tried (unjustly, in my view) to position themselves as a focus of the fear of Google's power. But being the enemy of your enemy doesn't make them right.
[See also earlier post on previous dismissal here, Kinderstart vs Google lawsuit dismissed, and ranking on ranking]
Echo: Rick Skrenta of Topix, about worries regarding How Search-Engine Rules Cause Sites to Go Missing:
To say that a content site should not rely on search engine traffic -- most of which comes from Google -- is naive. The web is 10 billion pages now, with a single point of entry. That's the web the way works. If you want to have a web business, you have to acknowledge this reality. ...
Sometimes retailers get hosed because the city decides to re-pave the street their business is on. The street is infrastructure. Like it or not, Google is infrastructure on the net now. They're the source of all the foot traffic. The three words in retail are "location, location, location." The three words online are "search engine optimization." It means the same thing.
The point I want to make in echoing that is twofold: it's another proof (if any were needed) that the monopoly effect is quite real, and further, that effect has substantial implications way beyond web business, for what gets heard in society in general. This is repetitive, but it's worth emphasizing from the monetary angle to establish the reality.
A writer at Language Log, a group linguistics site, just wrote a post motivated by the "Jew" search. This is the controversy, well-known in search circles, where the anti-Semitic site "Jew Watch" used to come up as the first result in a Google search for the word "Jew".
The post's an interesting window into what someone thinks when seeing the disclaimer Google displays for that search, yet not knowing at first the history of the controversy. He asks the obvious question about why Google displays a disclaimer for that term, but not for, say, harsh slurs and racial epithets ("Meanwhile, other words that have uses as offensive epithets, or are used ONLY as offensive epithets, get no warning from Google.").
The answer to that is the disclaimer was prompted by bad publicity in the specific case, not linguistic offensiveness.
There's one small error in the post: the statement "And Google HAS meddled with the search results to some extent; the site's self-description" refers to the fact that the results display an Open Directory description rather than the site's own description. But that's not a change which was done to tone down the results for that site.
Via Eric Goldman, the "Google v. Copiepresse, No. 06/10.928/C" Google News case decision is available, though it's in French. Perhaps someone can translate it, for the joy and happiness and civic virtue thereof. Anyway, he has some interesting commentary:
1) As I've said before, I think Google treads a lot closer to copyright's boundaries than it publicly admits. Naturally, in public, it takes the advocacy position that its offerings are clearly within copyright law, but this is hard to distinguish from cheap rhetoric. Instead, I think it's fair to say that Google pushes the edge with a lot of its services. Therefore, it should not be surprising that, given enough data points, some judges will conclude that Google has gone too far.
In addition, there's an English excerpt of the case's earlier, September, decision in a post back then, at SEO by the SEA.
I should also have mentioned earlier some actual reporting by Danny Sullivan at SearchEngineLand.
I have a column in The Guardian about the issue of companies basically dealing in blog PageRank-selling and link-buying (the most well-known being PayPerPost, but I don't single it out, it's just one of many).
Key point: A-listers are being disintermediated in terms of being gatekeepers for advertisers, the agency has re-intermediation, and if a page gets to the top search result from purchasing attention, almost nobody who sees that top search ranking will even know about the blogger ethics debate.
And it's not about "conversation".
Since Google-killers are in the news today, for something original, let me note that despite the hype, the search project based on a Wikipedia model of user data-mining (whatever the thing is being called these days, I think the preceding phrase is clearer), has yet to even have a development machine installed. The project's mailing list has had lots of discussion about possible approaches, but no action.
The God-King of Wikipedia says:
No firm decisions have been made. We have the test servers scheduled for install on Friday, and then I want to turn people loose on them to start playing around and testing. We need to start talking about how that should happen and who wants to be directly involved.
I had the thought that finding good developers to work for free is different from finding wannabe literary types to work for free as copyeditors. But on reflection, I realized it won't be a problem here. There's plenty of programmers who would pay for a shot at being the guy who killed the fastest gun in the West, err, Google.
Like this: RSS
Searching for Ways to Move Up in Google
A year ago the RSS Advisory Board moved to its own domain, losing all Google juice associated with its old site. Because the search term RSS is enormously popular, we've found it difficult to attract search traffic and build a decent Google pagerank. It took nearly a year to crack the top 100 for that term on Google; we're currently up to the 80s.
I'm actually dubious they can get to the top ten in Google. Especially given that the old site has the Harvard name behind it (which works for search engines too, via "trust" algorithms ...). Just one interesting little example about how social power gets replicated in search power.
Interestingly, Yahoo gets this "right" in terms of a search on [RSS] giving rssboard.org's specification page the #3 spot. I suspect that's due to their similarity algorithm picking rssboard.org as the site to display rather than Harvard (which has the #3 spot on Google).
Note the implications here: It's a lot harder to establish a newer project if Google keeps sending people to the old one.
There's a "Back Off National Pork Board" controversy, where the National Pork Board is using a trademark claim to threaten a lawsuit against a breastfeeding activist for a T-shirt with the slogan "The Other White Milk."
But this post isn't about that.
Rather, in passing, in the SearchEngineLand article National Pork Board Goes After Breastfeeding Search Marketer, when discussing an earlier Google-Bomb article, it's noted that the post on SearchEngineLand.com about "miserable failure" DOESN'T SHOW UP (in the top 100 items) for a Google search on the terms [miserable failure].
Now, that's interesting (Danny, you've got to scream "I'M BEING CENSORED!", and get some A-list bloggers to theorize about how Google is suppressing you so as not to let out the secrets of Google bombing. Or maybe because comments in the article on how to re-ignite Google bombs are considered dangerous. Or Homeland Security had Google remove it because it was talking about bombs. Something like that ...). It's around #46 in Yahoo for [miserable failure], so some of the difference is legitimate outranking. But still, there's a divergence.
The article is in the Google index, since it comes up as #1 for the searches [Google Kills] and [Other Google Bombs]. Even #1 for [Bush Miserable] and #2 for [Failure Search].
But it's around #450 for [Google Bombs]. #390 for ["Google Bombs"].
I conclude [Miserable Failure] is in a general class of searches (like [Google Bombs]) where Google is doing something different from e.g. [Google Kills], and perhaps weighing age/trust more strongly. There's no reason all searches have to go through the exact same algorithm; we know that they don't. It's a coincidence this was noticed for "miserable failure" specifically.
Learn something new every day ...
SearchEngineLand reports Google defusing Google-Bombs, with a case study of "miserable failure". Google has made an algorithm change "that minimizes the impact of many Googlebombs."
Let the reverse-engineering begin!
Just as a speculation, and not tested much, here's my guess at the algorithm, *something like*:
IF the links to the page contain [BOMB] and
0) There are a lot of links with anchortext [BOMB]
1) [BOMB] does not appear on the page or metadata
2) [BOMB] is the most common anchortext in links to the page
3) There are "very few" links of the form [BOMB otherwords]
THEN ignore all links with [BOMB]
This would preserve the ranking of pages talking about it, since they'll have the words on the page, even in the title.
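To make the guess concrete, here is a minimal Python sketch of the rule above. It is only an illustration of my speculation, not Google's actual code: the function name, the thresholds (`min_links`, `max_mixed_fraction`), and the idea of passing in plain lists of anchor text are all assumptions made up for the example.

```python
from collections import Counter


def should_ignore_bomb_links(bomb, anchor_texts, page_text,
                             min_links=100, max_mixed_fraction=0.05):
    """Return True if links with anchor text exactly `bomb` should be ignored.

    bomb         -- the suspected bomb phrase, e.g. "miserable failure"
    anchor_texts -- anchor text of every inbound link to the page
    page_text    -- the page's visible text plus metadata
    The two thresholds are hypothetical stand-ins for "a lot" and "very few".
    """
    bomb = bomb.lower().strip()
    anchors = [a.lower().strip() for a in anchor_texts]
    counts = Counter(anchors)
    exact = counts[bomb]

    # IF the links to the page contain [BOMB] ...
    if exact == 0:
        return False

    # 0) There are a lot of links with anchortext [BOMB]
    lots_of_links = exact >= min_links
    # 1) [BOMB] does not appear on the page or metadata
    absent_from_page = bomb not in page_text.lower()
    # 2) [BOMB] is the most common anchortext in links to the page
    most_common = exact == max(counts.values())
    # 3) There are "very few" links of the form [BOMB otherwords]
    mixed = sum(1 for a in anchors if bomb in a and a != bomb)
    few_mixed = mixed <= max_mixed_fraction * exact

    # THEN ignore all links with [BOMB]
    return lots_of_links and absent_from_page and most_common and few_mixed


# A bombed biography page: many identical links, phrase absent from the page.
links = ["miserable failure"] * 200 + ["White House biography"] * 20
print(should_ignore_bomb_links("miserable failure", links,
                               "Biography of the President"))
```

Under this sketch, a page that actually discusses the phrase fails condition 1 and keeps its links, and a pile of links mixing the phrase with other words trips condition 3, which is what makes the rule testable.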
We can test this by adding lots of links with both the expected text and [BOMB]:
George Bush: "Miserable Failure"
I didn't think of the following in time. But Martin Luther King day would have been a perfect opportunity for publicizing the efforts to fix all the links people have mistakenly made to the hate-site (thinking it was a legitimate site). From Natasha Robinson: (h/t Google Blogoscoped forum):
Good to see that MTV's Rock the Vote site took my complaint to heart and removed the link. They also wrote an apology on their blog about the link: http://blog.rockthevote.com/2007/01/last-night-rock-vote-made-mistake.html
The part that disturbs me about the apology is that the webmaster simply used ranking in Google as a means to find an authoritative site.
My emphasis in the below:
Last night, Rock the Vote made a mistake. In honor of Rev. Martin Luther King's birthday, we created a tribute from the RtV front page, as we have done every year for quite some time. To identify the external link, our webmaster searched Google and chose one of the top results, a website that, at a quick glance, appears to be a tribute to Dr. King with speeches, photos and a special emphasis on the holiday (martinlutherking.org - but don't go there). But appearances (and, apparently, popular results on Google) are deceptive. The website is a racist site that disrespects Dr. King and insults all of us who cherish his advocacy for justice. On behalf of RtV, I would like to extend our deepest apologies for this mistake. The link was immediately corrected.
Remember, hate groups can do search engine optimization and marketing too!
The Martin Luther King Google anti-hate campaign seems to be working.
By the way, people should be reminded, despite right-wing attempts to claim otherwise, Martin Luther King supported affirmative action.
The PBS MediaShift blog has an article discussing the Google / Sex Blogs incident. More interesting than the main article (summary: Bug. Fixed, they say. Google has lots of power.) is the controversy that erupted because of a decision not to link to the sex blogs quoted and discussed in the story.
A number of people have asked why there aren't links to the sex blogs mentioned in this post. If Google had been blocking the blogs, then there would have been links included. But because anyone can easily find the blogs through a search on Google, PBS.org felt it was not necessary to include the links here and risk offending some readers who might not expect to find links to explicit sites on PBS.org.
I ask that you as MediaShift readers please leave comments below explaining what you think the link policy should be here and elsewhere on mainstream media sites and blogs. Should we link to explicit material and how should that be handled? Should we include a warning before the links? Which links are OK and which are not? Your thoughts would be appreciated and I hope to return to this issue in a more in-depth way on the blog. PBS editors, who are involved in this issue, tell me they are very much open to your suggestions.
Now, as a statement of fact, "search on Google" is a cop-out here. Most of the time people don't even click on links right in front of them, much less do a search. And given that the article itself discusses the power that Google has over people being able to find sites, it's very ironic to be deferring to it after a long column about the consequences of a glitch.
Moreover, in the thread, people are pointing out examples where PBS.org did link to sexually explicit sites in other cases.
Look, if you didn't want to take the flak from right-wingers, that's understandable. Maybe not laudable, but comprehensible. Otherwise, I'd say standard web practice is unequivocal on the issue, that readers should be immediately referred to the sites discussed. I don't see any reason to override that convention - for sites that are trolling for traffic and manufacturing controversy, maybe you don't want to give it to them (but any real guilty parties in this case wouldn't care about a link from PBS.org). So put a warning about content next to the link if you must (though I think context makes it very clear). But otherwise, well, I'm now left with a lot of sympathy for why the sex bloggers tended to think Google was deliberately removing them.
[I could not resist a chance to use that title]
I spent some time trying to figure out what caused the recent sexblog kerfuffle. I noticed affected sites all seemed to link to commercial erotic sites (for example Comstock Films?).
My speculation as to what happened is that Google's anti-spam algorithm got set a little too aggressively in terms of what sites are considered porn-spam. The twist is that this didn't hit the affected sexbloggers directly, but indirectly, as they then got hit by a linking-to-spam penalty. That is, it's not that they were marked as spam themselves, but rather that they were suddenly seen as closely associated with porn spam.
Such an indirect change wouldn't necessarily affect all blogs which link to the spam-false-positive commercial erotic sites. It's just one factor, and other factors could override any penalty. The actual calculation involved could be very complex. No way to prove this, just a theory.
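As a toy illustration of that theory (and nothing more), here is a short Python sketch of an indirect penalty. Everything in it is hypothetical: the function name, the `penalty_weight` value, the example domains, and the assumption that the penalty scales with the fraction of a blog's outbound links that point at flagged sites.

```python
def adjusted_score(base_score, outlinks, flagged_sites, penalty_weight=0.5):
    """Scale a site's score down by how much of its linking goes to flagged sites.

    base_score     -- the site's score from all other factors (hypothetical)
    outlinks       -- domains the site links to
    flagged_sites  -- domains the spam filter has marked, rightly or wrongly
    penalty_weight -- made-up strength of the linking-to-spam penalty
    """
    if not outlinks:
        return base_score
    flagged = sum(1 for site in outlinks if site in flagged_sites)
    spam_association = flagged / len(outlinks)
    return base_score * (1.0 - penalty_weight * spam_association)


# The filter wrongly flags one commercial erotic site (a false positive).
flagged = {"erotic-films.example"}

# Two blogs link to it; the one with more varied links is penalized less.
small_blog = adjusted_score(10.0, ["erotic-films.example", "friend.example"], flagged)
big_blog = adjusted_score(10.0, ["erotic-films.example"] + ["news.example"] * 9, flagged)
print(small_blog, big_blog)
```

The blogs themselves are never marked as spam here; they only lose score by association, and the loss differs from blog to blog, which would fit the pattern of some sites dropping and others not.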
It's an amusing thought that somewhere deep in the innards of Google's anti-spam algorithm, there might be an honest-to-Potter-Stewart (I-know-it-when-I-see-it) line between "pornography" and "erotica".
Regarding Valleywag.com's original article, which seems to have done a certain amount of poisoning the well:
Some word Violet [Blue] wrote probably triggered a Google ban, inadvertently, but the search engine's rules are opaque, as is the procedure for an appeal against deletion.
Never eat at a place called "Mom's", never play cards with a man named "Doc", and don't take search engine analysis from a site called "Valleywag". There's far more to Google's criteria than simple word counting.
"SEO superstition" strikes again:
Valleywag.com:
Chronicle writer disappears in porn clampdown
The personal blog of San Francisco's Violet Blue, a sex writer published in the San Francisco Chronicle and Valleywag's sister site, has been removed from the Google index, along with several other adult sites. Tiny Nibbles, which runs a well-known annual list of the year's sexiest geeks, does not show in Google's search results, even if filters are turned off. Other sites affected include ErosBlog, a sex news site, and Comstock Films, which makes adult movies of real-life couples. The content's all legal, and naughty, rather than degrading. Some word Violet wrote probably triggered a Google ban, inadvertently, but the search engine's rules are opaque, as is the procedure for an appeal against deletion. You think there are other search engines, so that's okay? There are no other search engines.
IT'S A BUG! The sites haven't been removed from the index. If you go further into the results, the sites are still there. They apparently got "sandboxed" [update: "marked as spam-like"] for some unknown reason, so they're showing up much lower than normal. Almost exactly 30 spots, in fact.
Google doesn't hate you. Really.
A good person to contact about these things is Matt Cutts.
[Sigh ... *why* *bother*? Who is going to hear me in the face of the sensationalism?]
[Update: Some of the sites are back, the Google people know about the issue. Again, this is really about spam false-positives, not censorship].
The Google SOAP API, a system for getting Google search results in a way programmers can easily use them, is no longer being supported by Google (non-techies: SOAP is a protocol, like Java is a programming language), in favor of, essentially, a web ad box (aka "AJAX API"). The system hasn't been working well for a while now, and it looks like the plug is being pulled on it.
The basic meaning of this is that Google is telling independent search developers to get lost, in favor of billboard displayers.
Everybody talks about search-as-a-service, but few people want to do something about it. I suspect this is one of those projects where the cost to run it exceeds what people will really pay for it. I've had ideas of my own in this direction, but the economics are daunting.
Anyway, in the ensuing discussion, there's been relatively little attention paid to the projects to reverse-engineer Google's "web ad box". This mention may be useless in terms of dissemination, but I'll do it anyway:
Cracking Google AJAX Search API
Written by Matthew Wilkinson
Monday 18 December 2006 20:20:09
Recently, Google disabled the use of it's Google Search SOAP API. It now recommends that you use the Google AJAX Search API, which displays a search box on your website, much like a widget. This of course denies developers the means by which to fetch Google Search results and use them in their website. However, me and Martin Porcheron over at MPWEBWIZARD, decided to crack this new API to get some search results out of it.
There's also a screen-scraping EvilAPI (via Google Blogoscoped).
Memo to any Yahoo corporate readers: I assume you already know this, but there's a golden opportunity to grab some of the "cool" from Google here. Set up a compatible server, so anyone who has a Google SOAP API application can switch over to using Yahoo just by switching servers. Yes, it's a lot of server work for no direct revenue, and Yahoo already has a search API, and Google may make threatening legal noises. But you'll rarely have a better opportunity to grab mindshare from developers than now: "Google doesn't want you - but we do!".
The Google Employee Stock Options coverage has been a case study in uncritical thinking. I know, what else is new, but I'll say it anyway.
About the best other criticism I've found is an excellent post on SearchViews, doing time-value calculations, about how the plan dramatically shortens the remaining life of the option when the employee sells it.
Initially hailed as an innovative HR strategy, then called "good for investors", the option plan has received so much praise that Internet Outsider asks, "If anyone has figured out the drawbacks of Google's new transferable option plan, please weigh in, because at first glance it looks like a win all around." Though numerous 'draw backs' have been suggested, including "an employee rush for exits", "shareholder dilution" and "arrogance", I'm surprised that no one has pointed out the most important nugget from plan's fine print: [detailed calculation]
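To see why a shortened life matters, here is a rough Python sketch using the standard textbook Black-Scholes formula for a call option. This is not the SearchViews calculation, and every number in it (stock price, strike, interest rate, volatility, and the two remaining lives) is a made-up assumption for illustration only.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def call_value(spot, strike, years, rate, vol):
    """Black-Scholes value of a European call on a non-dividend-paying stock."""
    if years <= 0:
        # At expiry only the intrinsic value is left.
        return max(spot - strike, 0.0)
    sigma_root_t = vol * math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / sigma_root_t
    d2 = d1 - sigma_root_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


# Hypothetical at-the-money option: same stock, same strike, different life.
long_life = call_value(spot=500, strike=500, years=8, rate=0.05, vol=0.30)
short_life = call_value(spot=500, strike=500, years=2, rate=0.05, vol=0.30)
print(f"8 years left: {long_life:.2f}")
print(f"2 years left: {short_life:.2f}")
```

With everything else held equal, the option with less time left is worth less, so anything that cuts the remaining life at the moment of sale takes a bite out of what the employee is selling.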
But it's almost all been echoing of Google's announcement, or confusion over what the "transfer"/sale system does - and what it does not do. For example, there is no innovation here in determining the value of a Google stock price option. There's already a big public market in trading such options. The auction is basically just to determine who is the low bidder for handling the employee option transaction, given there are some weird constraints in the process. Which brings me to one simple example, in discussing the program, where what should be grist for serious reporting has apparently passed unnoticed:
Institutional buyers, who will be invited by Google to participate, will not be able to resell the employee stock options.
No offense meant to any reporter, but what in the world does this sentence MEAN? That is, it should be a big red flag that something strange is going on. Options on a stock are bought and sold all the time. How is the institutional buyer intended to distinguish between "the employee stock options" and "the stock options bought from yesterday's sheep-shearing"?
And this connects to the earlier issue of why not just let employees sell their stock options on the open market? After thinking about it for a while, I *suspect* this has to do with the connection between the options and the underlying stock, maybe that if employee options were released into the open market, they would have to be covered by the company issuing stock (or something similar). But if they're just "transferred" to an institution, they still exist in accounting format as options, so certain negative effects (from Google's point of view) are avoided.
Wouldn't you like to know what this is all about? I would. I'm sure th