April 30, 2006

The Definition Of "A Slow News Day" In The Tech Bogosphere

tech.memeorandum.com algorithm flaw

Proof of the flawed judgment of machine algorithm "news" :-).

By Seth Finkelstein | posted in mba | on April 30, 2006 02:58 PM | (Infothought permalink)
April 29, 2006

Warren Kremer Paino v. Lance Dutson, and Google keyword matching

Warren Kremer Paino v. Lance Dutson is a lawsuit by an advertising agency against the writer of the blog Maine Web Report. (source: MBA, Boston Globe).

Some key issues of the dispute appear to revolve around actions of the Maine Office of Tourism, and its Pay-Per-Click (PPC) Google advertising campaign. Lance Dutson has been criticizing this campaign on various grounds, and agency Warren Kremer Paino Advertising has sued him for "copyright infringement ... defamation and trade libel/injurious falsehood".

Here's one aspect of the case I've dug through. When a search is done for words such as [Camden Maine Bad Lawyers], the Google advertising display algorithm might match on the keywords "Camden Maine", and display the ad for that. This would not mean that the person who was buying the keywords had any particular interest in targeting "Bad Lawyers". Or the algorithm might match on the words "Bad Lawyers", which would not imply that the buyer had any interest in "Camden, Maine". There are some very broad choices as to the extent of matching which can be made by the ad-buyer. This is the background to Lance Dutson's post:

Maine Office of Tourism Corners Smut Market

Well the ads aren't down, wishful thinking on my part.

But it appears the MOT is diversifying it's target audience, maybe to make sure more good folks come to see our state. These are screenshots from Google this morning:

Then he displayed screenshots of Google searches for [camden maine child pornography], [camden maine escort], [camden maine xxx], [camden maine swingers]. These matched the "camden maine" keywords, and hence had ads for the Maine Office of Tourism ("MOT").
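To make the matching idea concrete, here is a toy sketch of how an ad keyword could be compared against a search query under different match settings. This is emphatically not Google's actual algorithm: the function names, the three match types as modeled here, and the word-by-word comparison are all simplifying assumptions for illustration.

```python
def normalize(text):
    """Lowercase a query or keyword phrase and split it into words."""
    return text.lower().split()


def matches(query, keyword, match_type="broad"):
    """Toy model of ad keyword matching (hypothetical, not Google's code).

    exact  - the query is exactly the keyword phrase
    phrase - the keyword words appear in the query, adjacent and in order
    broad  - every keyword word appears somewhere in the query
    """
    q = normalize(query)
    k = normalize(keyword)
    if match_type == "exact":
        return q == k
    if match_type == "phrase":
        n = len(k)
        return any(q[i:i + n] == k for i in range(len(q) - n + 1))
    if match_type == "broad":
        return all(word in q for word in k)
    raise ValueError("unknown match type: " + match_type)


# The advertiser only bought "camden maine"; the extra words come from the searcher.
print(matches("Camden Maine Bad Lawyers", "camden maine", "broad"))  # True
print(matches("Camden Maine Bad Lawyers", "camden maine", "exact"))  # False
```

In this sketch, the broader the setting, the more queries trigger the ad, whatever other words the searcher happened to type.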

In a later comment (April 28) to the post, he explained:

You are completely correct, these ads were a result of broad matching. That's what I'm trying to illustrate, the folly of broad matching, because the ads end up in stupid places, like I've shown here.

However, the Maine Office of Tourism seems to have taken that post as a literal accusation that they were intentionally advertising to pedophile tourists. From the lawsuit:

11. Dutson also claimed, falsely, that WKPA expended state tourism funds for the purpose of returning internet search results for non-tourism activity, such as pornography and pedophilia.

I am not a lawyer, so I won't comment on the legal merits of such a charge. Though socially, given the relative power of the parties involved, it strikes me as an extreme overreaction.

By Seth Finkelstein | posted in google , mba | on April 29, 2006 09:39 AM | (Infothought permalink)

Media Bloggers Association member

Many months ago, I applied to be a member of the Media Bloggers Association. There have been some delays in processing applications, but the organization looks like it's becoming more active now. Friday, I was informed "Your Application to the Media Bloggers Association has been approved".

In a strange planetary alignment, I had specified on my application that "my grasp of the internals of how Google works might be a solid asset given the ongoing connections between Google, country-specific censorship, and the impact of search on journalism.". And a big project being done right now is defending a lawsuit against a blogger, Maine Web Report, where apparently major aspects of the legal dispute involve Google advertisements and associated Google algorithms. I assume this wasn't cause and effect. But rather, it might become an amusing small example of right place, right time.

So, once more to the blog, as I write about this case's Google aspects, and see how that works out.

Disclosures: Part of the membership letter reads: "As a member you are expected to read these ALERTS and give serious consideration to blogging about the alert (we don't REQUIRE members to post but this IS a key part of being a member)". Nobody's offered me any money, or even said anything to me personally about the case. Writing about Google, however, tends to be positive to me in many ways (and Google doesn't sue!).

By Seth Finkelstein | posted in mba | on April 29, 2006 09:09 AM | (Infothought permalink)
April 27, 2006

Having Splogs (Spam Blogs) Boost Your Technorati Rank

When I wrote the Google logo chocolate poker chip post, I knew the keywords might attract spammers (I can tell which of my blog posts are popular in search engines, because they're the ones which become targets for spam). But a side-effect of attracting spammers seems to be attracting splogs (spam blogs). Roughly, these are blogs which exist to try to fool search engines, and often scrape other blogs for content. And Technorati, which ranks blogs by the number of other blogs linking to them, can be fooled by these spam blogs.

So my post ended up being linked to, by some of these spam blogs. Which counted towards my Technorati "Authority". Which is another sad commentary on the concept (spammers are not notable for being great judges of worth).
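A minimal sketch of why a raw link count is so easy to inflate, assuming a ranking that simply counts distinct blogs linking to a target. The `authority` function, the blog names, and the splog list are all hypothetical; I have no knowledge of how Technorati actually computes its number.

```python
def authority(links, target, known_splogs=frozenset()):
    """Count distinct blogs linking to `target`, ignoring any known splogs.

    `links` is an iterable of (source_blog, destination_blog) pairs.
    Self-links are not counted.
    """
    sources = {src for src, dst in links if dst == target and src != target}
    return len(sources - set(known_splogs))


# Hypothetical link data: one real blog and three scraper blogs link to "myblog".
links = [
    ("realblog", "myblog"),
    ("splog-poker-1", "myblog"),
    ("splog-poker-2", "myblog"),
    ("splog-poker-3", "myblog"),
    ("splog-poker-1", "myblog"),  # a repeat link from the same blog counts once
]
splogs = {"splog-poker-1", "splog-poker-2", "splog-poker-3"}

print(authority(links, "myblog"))          # 4
print(authority(links, "myblog", splogs))  # 1
```

Without a spam list, the count treats a scraper exactly like a thoughtful reader, which is the whole problem.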

A non-spam blogger probably couldn't push this too much without being caught. But, for these limited purposes, it's an intriguing bit of judo that if you can't get A-listers to link to you, mindless spammers seem to work just as well.

By Seth Finkelstein | posted in spam | on April 27, 2006 06:00 PM | (Infothought permalink) | Comments (5)
April 24, 2006

Chocolate Poker Chips, the Google Logo, and Search Relevance

A Google Blogoscoped post about Google chocolate poker chips caught my attention. The description says:

We love chocolate, and occasionally we're known to play a game of poker or two. Why not combine the two and offer this fun-but-odd treat, the Google Milk Chocolate Poker Chip. Sold individually.

But that's some very expensive chocolate!
At 75 cents for each "Chocolate Poker Chip", those are priced like Google's stock (it's a complete reverse of "this item not packaged for individual retail sale").

Now, I wondered just how much the logo is costing. I've seen that sort of chocolate novelty before, unbranded. It turns out the very same basic chocolate coin, without the logo, can be had for around 19 cents ($65/345 chips). And probably even cheaper at a discount store.
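For anyone who wants to check the arithmetic, here it is worked out, using only the prices quoted in this post (the variable names are mine):

```python
bulk_price_dollars = 65   # price quoted for the unbranded lot
bulk_chip_count = 345     # chips in that lot
google_price_cents = 75   # price of one Google-branded chip

# Per-chip cost of the unbranded coin, in cents.
unbranded_cents = bulk_price_dollars * 100 / bulk_chip_count

# How many times more the branded chip costs.
markup = google_price_cents / unbranded_cents

print(round(unbranded_cents, 1))  # 18.8
print(round(markup, 1))           # 4.0
```

So the branded chip runs about four times the price of the plain one.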

However, those chips wouldn't have the "Google" logo on them. So you're paying a lot for the designer label. And that's where things get interesting from another angle. I tried to search for how much it would cost to put a corporate logo on a chocolate poker chip (wonders of the Net). However, the resulting Google search was not a pretty sight. All the "poker" spammers reduced the Google search results to a very bad hand indeed. A Yahoo search seemed a little better, but not by much.

Clusty won the results relevance battle royally. There was still a huge load of spam, but searching ["chocolate poker chip"], and selecting the "Milk Chocolate Poker Chip" had a desired result on the first page. Item #8 pointed to an internal page of a company called A La Carte, and flipping through their catalog quickly showed logo prices for decorated poker chips.

So between the base price for the customization, multiple colors on the logo label, and whatever volume deal Google might have, it seems that Google wasn't ripping off people on the price of a chocolate chip.

But again, anyone buying them is paying a lot to have that brand on a piece of chocolate. And it's not even good chocolate.

By Seth Finkelstein | posted in google | on April 24, 2006 10:13 AM | (Infothought permalink) | Comments (3)
April 20, 2006

Post1000versary, and skepticism related to growth statistics

Milestone: Post #1000. Is that half a working year in total? :-(

There was recently yet another spate of articles on blog statistics. I remain skeptical of the precise numbers, given that nobody else can examine them, as unverified reports are often wrong. But the interest is a good reason to reflect on what such growth statistics mean (especially since the press eats up the hype, and it'll be echoed many times).

While it's unarguable that there's growth, I think there are some questions as to where the growth is going. My conjecture is that it's going first to increasing numbers of young people chatting with friends (e.g. MySpace), then to generally popular pundits, then a little to local A-list BigHeads, and last of all to the Z-listers. So a doubling of the total size of the bogosphere doesn't necessarily translate into a doubling for the average blog-writer. It's tricky to establish this, though, because there's definitely an increase in automated retrievals of pages, and that *will* affect everyone to some extent.

It's very important to examine raw data with care. For example, I get some hundreds of image retrievals a day from various piggybackers using my site bandwidth to display icons, something which I haven't bothered about since it's relatively trivial. But if I mistakenly believed that it meant anything, I'm sure it would contribute to an impressive but meaningless number of hits (as in, "I get blah-blah unique IP addresses visiting my site per day").
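As a rough illustration of the difference, here is a sketch that counts unique IP addresses two ways: naively, and after dropping image requests whose referer is some other site. The record layout (IP, path, referer), the host name `example.org`, and the sample data are all made up for the example; real server logs need real parsing, and the substring check on the referer is deliberately crude.

```python
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".ico")


def is_hotlinked_image(path, referer, own_host):
    """True if this looks like an image embedded by a page on another site."""
    is_image = path.lower().endswith(IMAGE_SUFFIXES)
    off_site = bool(referer) and own_host not in referer
    return is_image and off_site


def unique_visitors(records, own_host, skip_hotlinks=False):
    """Count distinct IPs in (ip, path, referer) records."""
    ips = set()
    for ip, path, referer in records:
        if skip_hotlinks and is_hotlinked_image(path, referer, own_host):
            continue
        ips.add(ip)
    return len(ips)


# Hypothetical records: one actual reader, two piggybacked icon fetches.
records = [
    ("192.0.2.1", "/blog/index.html", ""),
    ("192.0.2.2", "/icons/logo.gif", "http://forum.example.com/thread/1"),
    ("192.0.2.3", "/icons/logo.gif", "http://forum.example.com/thread/2"),
    ("192.0.2.1", "/icons/logo.gif", "http://example.org/blog/index.html"),
]

print(unique_visitors(records, "example.org"))                      # 3
print(unique_visitors(records, "example.org", skip_hotlinks=True))  # 1
```

Same log, same day, and the "unique visitors" figure differs by a factor of three depending on what gets counted.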

There are also numbers which do not mean what you might think they mean. One aggregator-maintainer said there were around 200 subscribers to headlines from my blog. But when I checked against my own log files, it seemed that traffic from there was only one or two readers per day. The number was true. It just didn't mean what it sounded like it meant, what would be easy to believe it meant. Note this wasn't a read-by-feed issue. Rather, ~ 200 headline subscribers translated into one or two real readers.

I decided to look at some subscriber statistics compared to about six months ago:

Aggravator               November 08, 2005   April 20, 2006
Bloglines (.RDF feed):   189                 216
Bloglines (.XML feed):   39                  39
Rojo:                    27                  35
Newsgator:               13                  30
Livejournal:             11                  11

So, on that measure, there's been a roughly 20% increase in six months. Or, in absolute terms, a whole whopping *52* subscribers (am I A-list yet?). Not that I turn anyone down ... but it does present a different perspective than breathless bubbleness.
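The totals behind those figures, computed from the subscriber counts listed above (a small sketch; the dictionary layout is just my own bookkeeping):

```python
# (November 08, 2005, April 20, 2006) subscriber counts per aggregator.
subscribers = {
    "Bloglines (.RDF feed)": (189, 216),
    "Bloglines (.XML feed)": (39, 39),
    "Rojo": (27, 35),
    "Newsgator": (13, 30),
    "Livejournal": (11, 11),
}

before = sum(nov for nov, _ in subscribers.values())
after = sum(apr for _, apr in subscribers.values())
gain = after - before
percent = 100 * gain / before

print(before, after, gain)  # 279 331 52
print(round(percent, 1))    # 18.6
```

That is 18.6%, which I'm rounding to "roughly 20%".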

By Seth Finkelstein | posted in cyberblather , statistics | on April 20, 2006 11:59 PM | (Infothought permalink) | Comments (3)
April 13, 2006

EFF: "Unintended Consequences: Seven Years under the DMCA"

EFF has released a new DMCA report:

Unintended Consequences: Seven Years under the DMCA

I'm mentioned:

Censorware Research Obstructed

Seth Finkelstein conducts research on "censorware" software (i.e., programs that block websites that contain objectionable material), documenting flaws in such software. Finkelstein's research, for example, revealed that censorware vendor N2H2 blocked a variety of legitimate websites, evidence that assisted the ACLU in challenging a law requiring the use of web filtering software by federally-funded public libraries.

N2H2 claimed that the DMCA should block researchers like Finkelstein from examining it. Finkelstein was ultimately forced to seek a DMCA exemption from the Librarian of Congress, who granted the exemption in both the 2000 and 2003 triennial rulemakings. The exemption, however, has not been a complete remedy, since it is limited to the act of circumvention, and does not permit researchers to create or distribute tools to facilitate research.

By Seth Finkelstein | posted in censorware , dmca | on April 13, 2006 12:03 PM | (Infothought permalink) | Comments (2)
April 10, 2006

Cites & Insights Spring 2006 - "Blogs, Google and [Prawn]"

Cites & Insights 6:6, Spring 2006, Walt Crawford's publication, is out. I haven't written as much about these as I wish I could. But this issue is chock-full of material to motivate me (such as several nice mentions of things I've written). For example, regarding blogs (my links):

It's probably important to say at this point that Seth Finkelstein and Jon Garfunkel are, as far as I can tell, right about what they call "gatekeepers"--within any given field, a relatively small number of bloggers commands most of the attention and, to some extent, dominates the topics under discussion. For relatively small fields, that may not be an awful situation: It's not too difficult to break into the top hundred library related blogs (or even the top fifty). But, as Finkelstein notes, that's little solace if the fields you're interested in aren't narrow fields--if you're interested in politics or the like. There, things seem to be getting worse: The chances of a single amateur to be heard aren't zero, but they're no better than in traditional media.

Elsewhere, there's a section with the obscure title of "Discovering Books" (subtitle - "The OCA/GBS Saga Continues"). Hidden away in the middle of this section is a Google Book Search (that's the "GBS") discussion compilation, including an argument with some dude named Siva [Vaidhyanathan] (since apologized for, and clarified, for hopefully less acrimony). Perhaps idiosyncratically, I found much of the section oddly disheartening. As I read through it, I spotted (what I considered to be) many significant flaws in several quoted assertions. But there's no point, or even negative incentive, to my detailing that, because (almost) nobody would hear me, and many are far more famous commentators than me. The Google Book Search debate is full of "advocacy", which makes it very difficult to sort out *accuracy*.

By Seth Finkelstein | posted in copyblight , cyberblather | on April 10, 2006 08:09 AM | (Infothought permalink) | Comments (1)
April 04, 2006

Censorware in Australia, "YesterDMCA", DMCA and censorware work

Collected noteworthy items, on censorware/DMCA and my past work.

Electronic Frontiers Australia (no relationship to US EFF) has a report out opposing a proposal by an Australian political party to require mandatory ISP censorware, if that party gets into power.


My work is cited in the middle of the report (sometimes it seems that I'm more cited in Australia than in my own country!).

Last week, the DMCA Rulemaking on Exemptions from Prohibition on Circumvention of Technological Measures had a hearing on censorware, as part of that process. As a very small milestone in my quitting activism, let the record show I did not testify. The experiment was run, the measurement's been done, the bad guys won :-(.

Speaking of the DMCA, a doggerel take-off song I wrote a while back, "YesterDMCA", has been recorded and posted to the web, by Quentin Smith. For the brave of heart:

"YesterDMCA" - audio

And touchingly, Domoni at templeofme.com wrote many kind words about my censorware work (concluding: "While I was an administrator I fought censorware locally. Seth fought it globally. I know what I'm about to say isn't enough. It's all I have to give. Seth, you have my respect. Thanks."). Thank you.

By Seth Finkelstein | posted in censorware , dmca | on April 04, 2006 02:26 PM | (Infothought permalink)