February 21, 2009

Wikia Search / Yahoo results confusion example

I recently noted the operational change of Wikia Search to using Yahoo, speculating:
"And one wouldn't want people who go to Wikia Search currently to think the results are proof of anything other than that Yahoo has a program to allow others to use its search system. It would be pretty easy to get a misimpression along the way."

It turns out at least one person got exactly that sort of misimpression:

"... I really think this Wikia search has the ability to beat Google in some key areas. I've already discovered a few searches that Wikia Search beats Google on, and I figured I'd write one down - dreamhost wiki."

But it wasn't "Wikia Search", it was Yahoo's search.

I wonder if any reporters will be similarly fooled, and write even more Google-killer hype articles. Certainly I don't see any "Powered by Yahoo" identifier on Wikia Search now. For a project that touts "Transparency" as a main goal, that's rather ironic.

I checked the Yahoo! BOSS (Build your Own Search Service) details, and they don't require attribution, so what Wikia is doing is permitted. Still, given all the free publicity Wikia garnered, along with the storyline of killing Google (a complicated matter, something of a media invention), the fact that it is silently so much a rebadged Yahoo! Search seems like something which, morally, should be more evident. The phrase "legal, but sleazy and unethical" comes to mind.

Disclaimer: I'm a member of the "negative people and FUD mongers".

Posted by Seth Finkelstein at 11:59 PM | Comments (3)

December 20, 2006

Google "SOAP API" / "AJAX API" - replacement projects, and a Yahoo opportunity

The Google SOAP API, a system for getting Google search results in a way programmers can easily use them, is no longer being supported by Google (non-techies: SOAP is a protocol, like Java is a programming language), in favor of, essentially, a web ad box (aka "AJAX API"). The system hasn't been working well for a while now, and it looks like the plug is being pulled on it.

The basic meaning of this is that Google is telling independent search developers to get lost, in favor of billboard displayers.

Everybody talks about search-as-a-service, but few people want to do something about it. I suspect this is one of those projects where the cost to run it exceeds what people will really pay for it. I've had ideas of my own in this direction, but the economics are daunting.

Anyway, in the ensuing discussion, there's been relatively little attention paid to the projects to reverse-engineer Google's "web ad box". This mention may be useless in terms of dissemination, but I'll do it anyway:

Cracking Google AJAX Search API

Written by Matthew Wilkinson
Monday 18 December 2006 20:20:09

Recently, Google disabled the use of it's Google Search SOAP API. It now recommends that you use the Google AJAX Search API, which displays a search box on your website, much like a widget. This of course denies developers the means by which to fetch Google Search results and use them in their website. However, me and Martin Porcheron over at MPWEBWIZARD, decided to crack this new API to get some search results out of it.

There's also a screen-scraping EvilAPI (via Google Blogoscoped).

Memo to any Yahoo corporate readers: I assume you already know this, but there's a golden opportunity to grab some of the "cool" from Google here. Set up a compatible server, so anyone who has a Google SOAP API application can switch over to using Yahoo just by switching servers. Yes, it's a lot of server work for no direct revenue, and Yahoo already has a search API, and Google may make threatening legal noises. But you'll rarely have a better opportunity to grab mindshare from developers than now: "Google doesn't want you - but we do!".

Posted by Seth Finkelstein at 11:59 PM

May 10, 2006

Yahoo Italy Keyword Search Problem - Censorship Or Bug?

Yahoo Italy has been denying results for certain search keywords, as reported by Jacopo Gonzales and echoed by the Google blogs (Inside Google, Google Blogoscoped, SearchEnginewatch.com, SEW Forum).

To summarize what's known, including some of my research:

1) A few affected words have been found: "shit", "shithead", "preteen"

2) The pattern-matching is tight - searching [shit] will be denied, but [Shit], [sHit], [shIt] and [shiT] are all fine, as well as [shit shit]

3) It's very easy to see the problem at a low level. Searching with a denied keyword generates an HTTP 302 redirect response to the Yahoo directory, whereas anything else gives a normal HTTP 200 OK response. That is:

http://it.search.yahoo.com/search?p=shit

Gives a low-level HTTP response of:

Location: http://it.search.yahoo.com/search/dir?p=shit

(which is a redirection to the directory)
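
For anyone who wants to repeat the low-level check, here is a minimal Python sketch of it. The function names (fetch_raw, classify_response) are my own, and the test for "/search/dir" in the Location header is an assumption drawn only from the single redirect shown above. The endpoint may well behave differently by the time you read this, so treat it as an illustration of the method, not a guaranteed reproduction.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import quote_plus


def fetch_raw(keyword: str, host: str = "it.search.yahoo.com") -> Tuple[int, Optional[str]]:
    """Issue one GET for the keyword and return (status, Location header).

    http.client does not follow redirects, so a 302 is seen as-is.
    """
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("GET", "/search?p=" + quote_plus(keyword))
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def classify_response(status: int, location: Optional[str]) -> str:
    """Label a raw response as 'denied', 'normal', or 'other'.

    Assumption: a denied keyword is a 302 whose Location points at the
    directory search ("/search/dir"), as in the example above.
    """
    if status == 302 and location is not None and "/search/dir" in location:
        return "denied"
    if status == 200:
        return "normal"
    return "other"
```

Calling classify_response(*fetch_raw("shit")) would make one live request. The classification step itself needs no network and can be checked against the header quoted above.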

Someone might want to spin through wordlists to find other words (I'll pass). Though I've found [shits] and [shitting] are affected too, as well as, err, the Nabokov character (this post has enough strange keywords!)
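
If someone does want to spin through a wordlist, the loop is simple. The sketch below is hypothetical: single_upper_variants and find_denied are names I made up, and is_denied stands for whatever probe you supply (for instance, one that looks for the 302 redirect to the directory). It also generates the one-capital-letter variants used in the case test above.

```python
from typing import Callable, Iterable, List


def single_upper_variants(word: str) -> List[str]:
    """Return copies of word with exactly one character uppercased at a time."""
    return [word[:i] + word[i].upper() + word[i + 1:] for i in range(len(word))]


def find_denied(words: Iterable[str], is_denied: Callable[[str], bool]) -> List[str]:
    """Return the queries for which the supplied probe reports a denial.

    is_denied is a caller-provided function (hypothetical here); this
    helper performs no network access on its own.
    """
    return [w for w in words if is_denied(w)]


# Stand-in probe that mimics the exact-match behavior observed above.
# It is a simulation for illustration, not Yahoo's actual logic.
def simulated_probe(query: str) -> bool:
    return query in {"shit", "shithead", "preteen"}


candidates = ["shit"] + single_upper_variants("shit") + ["shit shit"]
denied = find_denied(candidates, simulated_probe)
```

With the simulated probe, only the exact lowercase query comes back as denied, which matches the tight pattern-matching noted in point 2.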

All in all, while some people are wondering if this is a censorship issue, it looks at least partly like a bug to me. Some wordlist has gotten misplaced - "shit" is much too mild a word to be a censorship target here.

Posted by Seth Finkelstein at 11:59 PM | Comments (3)