July 31, 2007

Updated "Grub" crawler code available, more Wikia Search(ploitation)

Uncommon echo: updated "grub" crawler code available, though the source is in very raw form.

Kudos to Michael Zimmer's detailed questions about privacy, and attempts at getting answers.

He said it, not me: Wikia's Outrageous Exploitation of the Human Race:

Google, Microsoft, AOL, AltaVista, Yahoo! and thousands more have something in common, can you guess what it is? That's right! They all pay for their crawlers, power bill, servers, and everything else! So why does Wales think he gets to exploit billions of computers around the web for no reason other than to make him money?

Because that's the Web 2.0 way: "Community" for the digital-sharecroppers, cash for the A-listers.

By Seth Finkelstein | posted in wikia-search | on July 31, 2007 11:59 PM | (Infothought permalink) | Comments (1)
July 27, 2007

"Grub" crawler and Wikia Search, and working for free

News: Search Wikia Takes Steps To Crawl; Acquires Grub

"Wikia, Inc., the for-profit company developing the open source search engine Search Wikia, has acquired Grub, a distributed crawler platform, from LookSmart."

Now, let's follow the money:

Specifics of the deal were not revealed, though it is part of a larger advertising deal between Wikia and LookSmart which was announced last week.

Under the deal, LookSmart will provide text and display ads in Wikia's freely hosted wiki communities, and eventually on the Search Wikia site, Wales said. Ads will be sold by Wikia on either a cost-per-click (CPC) or cost-per-thousand impressions (CPM) model. Inventory not sold by Wikia will be back-filled by ads from LookSmart's distributed ad network.

So ... the ad-provider will give some old searching technology (I downloaded the Linux version, and it seemed to be from December 2002). The legions of free citizen-workers-for-no-money will immediately improve this, again, donating their skilled labor. The company will run ads on the system the free workers build. That's called democracy?

Bonus Link: Beware The Online Collective - Jaron Lanier

The Web 2.0 notion is that an entrepreneur comes up with some scheme that attracts huge numbers of people to participate in an activity online - like the video sharing on YouTube, for instance. Then you can "monetize" at an astronomical level by offering a way to bring ads or online purchasing to people in your gigantic crowd of participants. What is amazing about this idea is that the people are the value - and they also pay for the value they provide instead of being paid for it. For instance, when you buy something that is advertized, part of the price goes to the ads - but in the new online world, you yourself were the bait for the ad you saw. The whole cycle is remarkably efficient and concentrates giant fortunes faster than any other business scheme in history.

By Seth Finkelstein | posted in wikia-search | on July 27, 2007 04:16 PM | (Infothought permalink) | Comments (13)
July 26, 2007

Commerce Committee Net Censorware Hearing - looks like posturing

I suppose I'll write a few words on the censorware story: "US Senators call for universal filtering"

The measures they are calling for include directing the Federal Communications Commission to identify industry practices "that can limit the transmission of child pornography" and requiring the Federal Trade Commission to form a working group to identify blocking and filtering technologies in use and "identify, what, if anything could be done to improve the process and better enable parents to proactively protect their children online."

I looked at the hearing's testimony and various statements, but actually didn't see much beyond a lot of speechifying. If they're calling for a report here and a working group there, all about "industry practices", I'd say it looks like more generic posturing than a serious attempt to get a new law on the table.

By Seth Finkelstein | posted in censorware | on July 26, 2007 06:41 PM | (Infothought permalink) | Comments (2)
July 24, 2007

Britannica Blog, Google PageRank, and Cites & Insights 8/2007

Walt Crawford's Cites & Insights publication issue 7:9 (August 2007) is available now, with a long article On Authority, Worth and Linkbaiting discussing the (my phrasing) Britannica Blog "link-bait" party. I want to give a slight correction to one Google aspect:

I'm focusing on the blog because of something Seth Finkelstein (and, I believe, others) have suggested: That the controversy over Michael Gorman's posts is, at least to some extent, linkbaiting -- behavior designed to increase the number of inbound links to Britannica blog, increasing its visibility on search engines. If that's true, it seems to be working: Google shows a PageRank of 7 in early July 2007, a level that usually takes a while to reach.

In fact, the time delay there is too short for a PageRank increase to show up in the public reports - the data Google usually displays for easy public consumption is typically a few months old. The Britannica blog has a PageRank 7 mostly because it's linked off the main Britannica page (which is PageRank 8) as well as article pages and similar.
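To illustrate why one link from a heavily linked main page can matter more than a burst of fresh attention, here is a minimal sketch of the simplified, published PageRank formula computed by power iteration. It is not Google's production algorithm, and the 0-10 toolbar figure is a coarse public scale rather than these raw scores. The page names and the tiny link graph are hypothetical, made up purely for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page name to the list of pages it links to.
    Returns a dict of page name -> score; the scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share ("random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score everywhere.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "main" is linked from many sites and links to
# its own "blog"; "other_blog" only has one link from an obscure page.
links = {
    "main": ["blog", "article"],
    "blog": ["main"],
    "article": ["main"],
    "obscure": ["other_blog"],
    "other_blog": ["obscure"],
}
for i in range(20):
    links[f"site{i}"] = ["main"]

ranks = pagerank(links)
for name in ("main", "blog", "other_blog"):
    print(f"{name}: {ranks[name]:.3f}")
```

In this toy graph the blog's score comes almost entirely from the single link off the well-linked main page, which is the same effect described above.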

In fact, the Britannica organization seems notably SEO-aware and marketing-conscious. For example, they've previously sent out a press release about "Michael Feldman blogs at Britannica site". Anyway:

Was this genuine controversy or incited controversy? ...
I will give Gorman himself the benefit of the doubt and not presume that he was setting out to incite controversy for the sake of controversy. I'm not inclined to be so generous regarding Britannica -- and, frankly, I wonder why the firm is so anxious to have a hot blog.

Well, I can't speak for them, but there are many obvious answers - e.g. to be part of the pontification (NOT "conversation" - A-listers speak down from on-high, to the audience), for the personal publicity (intellectuals are hardly immune from ego), for the product publicity (Encyclopedia Britannica is a commercial product, remember), for the general awareness and promotion that comes with high Google placement, and so on.

It's actually not a bad blog on its own terms, a bit like an upscale liberal-arts type magazine. But that's not going to draw readers like taking a stick to the web-evangelist hornet's-nest will.

They do seem to read at least some blogger reactions, or so it's said :-)

# tpanelas Says: July 23rd, 2007 at 10:17 pm


Yes, we read you. You have a lot of fans at Britannica. I hope this doesn't unnerve you.


By Seth Finkelstein | posted in cyberblather | on July 24, 2007 10:32 AM | (Infothought permalink) | Comments (1)
July 21, 2007

Nigerian One-Laptop-Per-Child Porn-Browsing and Censorware

Thanks to everyone who mailed me about the "Nigerian pupils browse porn" story:

Nigerian schoolchildren who received laptops from a U.S. aid organisation have used them to explore pornographic sites on the Internet, ... [snip]

A representative of the One Laptop Per Child aid group was quoted as saying that the computers, part of a pilot scheme, would now be fitted with filters.

I wasn't sure if there was any point in my saying anything about this story, since it looked like it was going to be extensively echoed, so what would be the point in my adding my marginal voice to the chorus?

I think the "One Laptop Per Child News" (an independent site) had the best commentary about the utterly mundane situation that sex draws interest: "To focus on it this much means that the reporter really wanted a headline grabbing story or is against the project on a personal level."

I could note that adding censorware to the machines seems problematic. If these are open-source, then removing the censorware will be trivial. And who is going to control what goes on the blacklist? Etc. But it's not like anyone is going to press those issues (oh, I could try playing "citizen journalist", but there's nothing I could do if my phone calls weren't returned, or I was routed to a flack who just kept repeating a PR line).

Frankly, the One Laptop Per Child idea has always struck me as very ill-considered, a bad combination of techno-utopianism and paternalistic colonialism. But I'll get in trouble if I say what I really feel.

By Seth Finkelstein | posted in censorware | on July 21, 2007 04:45 PM | (Infothought permalink) | Comments (4)
July 18, 2007

"The Googlization of Everything" - Siva Vaidhyanathan

"The Googlization of Everything" is a new book in the works by Siva Vaidhyanathan. I'm going to get a jump by echoing it before the crowd (any resemblance between this post and Google manipulation is purely ironic ...).
[n.b. note the picture in the first link - "Snared in the Web 2.0 ... "User-generated content" is just another name for massive corporate data collection, mining, and profiling"]

Per The Institute for the Future of the Book's fellow announcement:

Siva is one of just a handful of writers to have leveled a consistent and coherent critique of Google's expansionist policies, arguing not from the usual kneejerk copyright conservatism that has dominated the debate but from a broader cultural and historical perspective: what does it mean for one company to control so much of the world's knowledge?

As I keep saying, there's a shift, but it's from one set of gatekeepers to another set of gatekeepers.

Or, as put in a talk note:

His premise was that we've come to talk about Google in theological terms, and that the Google folks themselves encourage this through their familiar "don't be evil"-type approach to their public communications. He thinks their stated aim to eventually provide universal access to all information is basically cynical at worst, unrealizable at best.

More talk elaboration:

Siva concludes his talk with a plea against technofundamentalism - the Google logic that you can always fix the problem by tweaking and innovating. This is also a plea against the myth of technological neutrality. Google is not neutral, he says, and politics are built into the black boxes of their search engines. Finally, this is a plea for Critical Information Studies - a nice start to the conference, then.

Shorter: You can't fix a social problem with a technological solution?

By Seth Finkelstein | posted in google | on July 18, 2007 11:59 PM | (Infothought permalink)
July 16, 2007

Google Roundup: Cookies, Getting Rid of Wikipedia (Results), Song

Links for the underheard, in a futile gesture to whip the Long Tail.

Did you hear? Google will lower, to two years, the expiration time of its universal spying device, I mean, cookie. I'll just link to Michael Zimmer on Google cookie expiration:

My hunch is that the brilliant data-mining minds at Google recognize that if someone hasn't searched on Google in two years, their past history probably isn't a good indicator of their current needs. So, if linking to two-year-old data isn't all that valuable, they might as well just dump the cookie altogether. It doesn't harm their data-mining needs - and it's good PR.

[See also "More of Peter Fleischer Misleading on Google Data Retention" - he said it, I didn't.]

From the everybody talks about Wikipedia taking over Google results but finally someone did something about it department:

Will Critchlow: Search Google without wikipedia - a Firefox search plugin

Here at Distilled, it's something that came up in conversation a few times, so we decided to do something about it - we have created a Firefox search plugin that enables you to search Google without getting wikipedia results

[See also the CustomizeGoogle solution]

Humor: Lauren Weinstein - "I Am the Very Model of a Modern Major Googler"

And if you're really good it seems to us that you at least possess,
The skill to quote from memory full source of the Linux OS.

[Rumor has it that this line is only a slight exaggeration of what they expect]

By Seth Finkelstein | posted in google , wikipedia | on July 16, 2007 11:59 PM | (Infothought permalink) | Comments (2)
July 13, 2007

Google Video Cache Bypasses YouTube Age Verification

Echo: http://lists.grok.org.uk/pipermail/full-disclosure/2007-July/064625.html

Youtube.com requires account creation and login before allowing visitors to view videos flagged by users as inappropriate.

Sample flagged video: http://www.youtube.com/watch?v=[video_id]
"This video or group may contain content that is inappropriate for some users, as flagged by YouTube's user community.
To view this video or group, please verify you are 18 or older by logging in or signing up."

.....alternatively, download the video directly from Google video


[h/t Google Blogoscoped forum]

I've said it before, cache is the biggest threat to censorware.

By Seth Finkelstein | posted in censorware , google | on July 13, 2007 11:59 PM | (Infothought permalink)
July 11, 2007

My _Guardian_ column on the Britannica Blog Linkbaiting, err, "Web 2.0" Forum


Has Britannica co-opted blogging or has it been corrupted by it?

"If you can't beat 'em, join 'em. That's what the venerable Encyclopedia Britannica apparently decided to do ..."

By Seth Finkelstein | posted in cyberblather | on July 11, 2007 07:47 PM | (Infothought permalink) | Comments (2)
July 09, 2007

Wikipedia, "Lava Lamp", Trademark threats

Background: The Wikipedia article on "Lava Lamp" disappeared for two weeks, apparently due to some legal bluster about the words being trademarked. The Register ran a story about it - Brit fumes over Wikipedia, lava lamps:

Barberio acknowledges there will be cases where OTRS [complaint-handling] volunteers would be justified in keeping a complaint secret. If a person claims they're being libeled by a Wikipedia article, for instance, it stands to reason their identity shouldn't be divulged. But this was far from the case with the lava lamp article. Wales insisted that the reason for suppressing the article was posted to its "talk" page, but there doesn't seem to be a link between those discussions and the OTRS action.

Value-add: Wales posted to a Wikipedia mailing list:

Of course, in this case, the entire complaint is right there on the talk page for anyone to see, so it is pretty hard to see how much MORE of an explanation could be given.

I told The Register this quite plainly, which they admit:
"Wales insisted that the reason for suppressing the article was posted to its "talk" page, but there doesn't seem to be a link between those discussions and the OTRS action."

That's total bullshit of course. I can tell you, having seen the OTRS ticket, and talked to the person who did the blanking, that there is an EXACT link between those discussions and the OTRS action. Not that the Register ever cared to report things fairly.

While he's technically correct, I think there's a problem with severe underestimation of the amount of effort required for a non-insider to figure out the connections in Wikipedia. I wouldn't get worked up over the specifics of this incident myself, filing it under "silly lawyer tricks". But it is an interesting little example of the problems of fathoming even minor disputes.

That is, instead of a straightforward "This article temporarily gone because of legal dispute over trademark issues", there's some verbiage about "Open Ticket Request System ticket # 2007052310014607." You think, what in the world is that? And it seems mere mortals can't see it anyway. Then you get pointed to the "talk" page, which has a huge amount of trivial discussion to plow through, before you get to the reason buried down in the page. I pity anyone trying to make sense of it. Especially someone who doesn't have a practiced eye in reading those sorts of pages so as to know what's chatter, and what's a legal issue serious enough to cause the article to be removed pending resolution.

The lesson, as I see it, is another proof that Wikipedia is a badly-run bureaucracy. But I'm talking to the crickets again.

By Seth Finkelstein | posted in wikipedia | on July 09, 2007 11:47 PM | (Infothought permalink) | Comments (7)
July 05, 2007

Anti-"Sicko" Google Search Ads and Google Policy

I stayed out of the blogstorm of a few days ago regarding Google [Health Advertising Blog] Criticizes Moore's "Sicko" - given the number of ultrahigh-attention sites echoing the story, anything I'd say would either be futile or (personally) dangerous.

In the aftermath, I've seen some suggestions that Google is violating its own policy by permitting critical ads to be run against a search on "Sicko", e.g.:

Sicko short on truth
Moore's movie profers a deadly Rx.
In the smart new business magazine

Checking Google's ad content policy, the relevant passage seems to be:

Ad text advocating against any organization or person (public, private, or protected) is not permitted. Stating disagreement with or campaigning against a candidate for public office, a political party or public administration is generally permissible.

The letter of the policy doesn't say anything either way about a movie. But the spirit seems to be that "campaigning" is allowed, so they could argue it encompasses general political speech.

Frankly, I think using Google ads in a controversial political issue is just a bad idea. The following is not an implicit encouragement, but since the idea is utterly obvious, I don't think there's any reason to refrain from mentioning it - buying a political Google ad is an invitation for some militants to click on it, solely to cost the advertiser money. Maybe Google doesn't care, since they'd make money too off such "protests" (on the other hand, dealing with the claims of click fraud can't be fun).

By Seth Finkelstein | posted in google | on July 05, 2007 12:34 AM | (Infothought permalink) | Comments (2)
July 04, 2007

Read Kent Newsome's "Declaration of Blogging Independence"

Echo: Declaration of Blogging Independence

They have refused to respond to conversational overtures, the most wholesome and necessary for the public good.

They have ignored posts of immediate and pressing importance, unless emailed till their Attention should be obtained; and when so emailed, they have utterly neglected to reply. ....

"Conversation" without representation is tyranny!

By Seth Finkelstein | posted in cyberblather | on July 04, 2007 11:52 PM | (Infothought permalink)