January 11, 2008

Why Wikia Search Is Not Like Wikipedia Growth

I've been mulling over the it'll-get-better argument for Wikia Search, which has struck me as problematic:

When he first announced the idea, Wales said the search engine would improve over time, much like the other project he [co]founded, Wikipedia, has.

It's seemed to me that, though it's expected PR, there's something subtly wrong with that comparison, and I believe I've figured out the error.

To wit: For a search engine, a certain level of quality has to be reached for everything before it's usable for anything.

That's stated in a concise way - by "everything" I mean not "every result", but rather that a lot of things *all* have to reach at least a mediocre level: indexing, server response, ranking algorithm, anti-spam, etc. And if any single one of those factors isn't at least passable, the whole search engine is unusable in practice. Roughly, you won't even have a few good results that people in a topic area can use until all the basics are working.

Moreover, it's pretty unrewarding to work on improving internals for free. Where someone might write one Wikipedia article for the joy and happiness of having written it, it's much more difficult to get a volunteer to work for months on a ranking algorithm (note I don't claim it's impossible - students doing a class project or a thesis, or people trying to gain experience, are potential sources of unpaid development - but the free labor pool is much smaller).

So I believe the argument is wrong, in that there's a barrier of functionality, which can't be climbed incrementally with small contributions. Which again is not to say it's completely insurmountable. But that would seem to require an extensive amount of concentrated skilled development.

But then, I am one of the "negative people".

By Seth Finkelstein | posted in wikia-search | on January 11, 2008 11:58 PM

Seth Finkelstein's Infothought blog (Wikipedia, Google, censorware, and an inside view of net-politics)

Comments

Seth, once again you are right on. But I also think the really key thing will be the fact that Wikipedia is a NON-PROFIT project, therefore duping more slave laborers to contribute countless hours of time and passion. Wikia Search is FOR-PROFIT, with every dollar of revenue going to pay Jimbo and his minions, and to pay back Amazon's $10 million investment mistake. This "mission" is not so easily translated into slave laborers contributing countless hours of time and passion.

Is DMOZ.org a success? Wikia Search will likely be no more successful than that search project. Actually, I'm of the belief that Wikia Search will max out at about Webcrawler.com's traffic stats by the end of 2009.

Posted by: Gregory Kohs at January 12, 2008 08:55 AM

There is another barrier, based on what I see at alpha level as a user who has not signed up: there appear to be no discoverable mechanisms to improve what I see. The mini-article is simply a spam-fest; there appear to be no other tools.

That is not an alpha; alphas have functionality, even if broken. Nor is it a proof of concept, as there is no way to prove, even for a limited subset, that you can influence results.

Aside from some childish warnings, there is no obvious presentation of how this system intends to do battle with all the nasty people. Looking at Wikipedia, 90% of its administrative effort seems to go into defending in the trenches, but at least there is a measurable product at the other end: every proud contributor has their articles to show for their efforts. For WikiaSearch, what can you say? "Hey, look at those beautifully ranked search results for 'one born every minute'"? I don't think that, outside a techy core, there is going to be a wide group of participants, unless barnstars become really valuable.

Posted by: dogbiscuit at January 12, 2008 09:48 AM

It occurs to me that Wikia Search offers too much opportunity and temptation to game the system.

Wikipedia, which has quite a hodgepodge of rules, regulations, and referees, is already an MMORPG with a fair amount of system gaming going on.

Wikia Search appears to have an even more foolishly designed regulatory structure.

Posted by: Moulton at January 12, 2008 10:20 AM

The problems with Wikia Search are almost too numerous to mention.

1) There is no privacy policy, which is a violation of California law. The "Privacy" link is only for logged-in users who construct profiles. It allows them to restrict certain portions of their Facebook-like profile to those on their "Friends" list. This is not a privacy policy. What about everyone else? They get a two-year cookie as soon as they arrive at search.wikia.com that appears to have a unique ID in it. Does this mean that Wikia is tracking your search terms? Do they record your IP address also, possibly for future geotargeted advertising revenue? They aren't telling us.

2) There is no information for webmasters. From the looks of the search results, Wikia Search has been scraping other engines, and has yet to start any web crawling of their own. When they do start crawling, how do webmasters tell Wikia's crawler to go away through the robots.txt protocol? They aren't telling us.

3) The Mini wikipedia pages that anyone can start and anyone can edit are a potential invasion of privacy or a potential source of defamation for the subjects of such pages. Most subjects aren't keen on the idea of checking their page every day to make sure it hasn't been vandalized. Can individuals and corporations opt out of having such a page started about them? They aren't telling us.

4) Will the user profiles and/or the Mini pages be available to other search engine crawlers, and end up getting indexed on other engines? If not officially available, then are they available to rogue scrapers? What sort of security is in place to restrict access to this information? They aren't telling us.

5) The search results, by any and all conceivable measures, are next to worthless. Do they expect hordes of teenagers to descend on Wikia and do the ranking for them? They aren't telling us.

I call on Wikia, Inc. to close down Wikia Search until such time that they are able to address these issues. In its present condition, Wikia Search amounts to little more than a honey trap for children of all ages.

Posted by: Daniel Brandt at January 12, 2008 12:35 PM

One simple explanation is that one can write pretty much ANYTHING for a new Wikipedia article and it's better than nothing. That doesn't work with software: the code must be very good before it can even begin to challenge the Google proposition.

Posted by: Sergey at January 12, 2008 03:23 PM

The motivation for people to contribute to Wikipedia, at least for those that don't have 1000s of edits, is some vague responsibility to the information being presented, not free culture, the sum of human knowledge, or making the internet not suck.

I don't think there is a comparable "vague responsibility" for helping a search engine deliver good results or line the pockets of people profiting from your free labor.

There is zero motivation to contribute to Wikia Search, outside of total boredom.

Posted by: BobbyB at January 12, 2008 05:17 PM

Wikia Search is the sort of idea that would sound good only to certain people who have no IT background: people like Jimmy Wales. Jimbo is clearly in over his head, and the scorn shown toward WS by tech bloggers is hardly surprising.

I say this as someone who like Jimbo has no IT background; like him, I am an "end user". However, unlike Jimbo, I have not surrounded myself with a cadre of sycophants to act as my echo chamber, and as an unpaid army to shoot down "trolls" and "FUD-mongers" (i.e., critics and anyone else that might mention that the emperor is naked). Accordingly, I can clearly see what he cannot or will not: that WS is a crap product based upon a deeply flawed concept, and that it never will be significantly improved without a complete revamp after scrapping the original concept.

Jimbo appears to be incapable of admitting to such fundamental error, and thus WS is doomed. Unlike the case of Wikipedia, Jimbo's one claim to fame, he probably really is "The Sole Founder" of WS. That is, he is the "idea man" here, instead of just the financial backer, as he initially was for WP. This one is truly his baby, which goes a long way toward explaining his increasing brittleness and intolerance of criticism. He has been counting on the for-profit WS to transform him from a millionaire to a billionaire; something the non-profit WP cannot do.

Jimbo has in essence staked his reputation in a high-stakes game of poker; a game he is bound to lose. He has deluded himself into thinking that he is the guru of "Web 3.0", when in fact he is a mendacious, credit-stealing egomaniac.

Posted by: Cedric at January 12, 2008 07:27 PM

Seth,

I'm wondering what you think of the *general* idea: improving a search by using a combination of code and people -- doesn't sound like a bad idea to me... This is what he's trying to do, isn't it? And he appears to think he can borrow the code part from others (as long as Google etc. don't mind, I don't see a problem there either).

As to the people part, I'm pretty sure I could come up with a better links list for "craigslist criticism" than Google (which appears to do little more than provide the links to the pages where the phrase "craigslist criticism" is mentioned).

Would I just hand it over to a for-profit? Why would I? Now, if it were a non-profit that I trusted to stay that way for the long run... I'd be glad to help (within reason).

Delia

Posted by: Delia at January 12, 2008 11:14 PM

Cedric is spot on. Wikipedia worked because it was not mainly a programming problem, but building search engines requires some solid IT skills that are hard to buy even when you have money (just look at Microsoft).

Posted by: Anonymous at January 13, 2008 01:02 PM

Seth,
I think that the wisdom-of-crowds aspect that built up Wikipedia may not translate easily to WS. With search engines there is only one rule that matters: whoever has the best results wins.

It is not a question of which search engine has the best code or which search engine has the biggest index. The users want the results and they want *relevant* results quickly. If the users don't get those relevant results, then they will not use that search engine.

The current index is supposed to be an alpha index, and the WS people have said it is junk. However, I think it is a classic search-engine-wannabe index. Nutch has a facility to import the DMOZ.org dump as a seed list for a crawl. So a lot of search engine wannabes throw this (and a few other URLs) into Nutch and crawl away. This does not result in a clean index.

Building a good index and keeping it clean of spam and other rubbish is a very tough and time-consuming business. Search engine operators probably spend more time cleaning their indices than detecting new sites. The detection of new sites by following links is automated anyway. Google and the other major search engines have people employed on the search quality side of things. I don't know if WS has anyone with a strong search quality or even search engine background employed.

However, search quality is one possible area where the wisdom of crowds could be harnessed. Concentrated skilled development is needed, but there is also a requirement for a critical mass of user feedback that helps the quality to improve. And that requires users. So in some respects, it is a Catch-22 situation.

The solution is knowing when to launch an index that is almost good enough. And that requires search engine developer expertise when making the decision. Good search engine developers are far rarer than people who want to contribute to Wikipedia.

A fundamental difference between search engines and directories is that search engines work from the top down, classifying sites, while directories work from the bottom up, often with the contributors or operators classifying sites. A search engine relies on algorithms to classify and produce results. A directory relies mainly on people. This difference isn't so much a barrier of functionality as a barrier of scalability. Search engine developers think in terms of automation, whereas directory operators seem to think in terms of editors. Using a search engine is like calling directory enquiries for a phone number, whereas using a directory is like leafing through the phone directory to find the number. Maybe WS can bridge this fundamental difference in outlook, but it can only do so if it gets a critical mass of users.

It should be interesting to see how WS progresses. Perhaps its second index will be better.

Posted by: John McCormac at January 14, 2008 01:58 AM

Your "negative" schtick has grown on me lots in 2007. Rock on in 2008!

Posted by: hugh macleod at January 20, 2008 11:04 AM