March 23, 2004

More on "Belle de Jour" as fake blog

Checking other "Belle de Jour" articles, I found one which argued for skepticism based on the "Gender Genie", an algorithm that allegedly determines whether a text's author is male or female. Comments there pointed out that the statistics are unimpressive.

So I tried testing the infamous book review, the passage of text (by a female author) which supposedly formed the basis of the recent identity hunt.

In the results below, there's a caveat "(NOTE: The genie works best on texts of more than 500 words.)". All book reviews were submitted under the "nonfiction" category.

Words: 256

Female Score: 74
Male Score: 346

The Gender Genie thinks the author of this passage is: male!

Amusingly, when I clicked on feedback submission ("Am I right? The author of this passage is actually ..."), the results were:

That is one butch chick.

According to Koppel and Argamon, the algorithm should predict the gender of the author approximately 80% of the time.
Accuracy Results
Am I right?
yes 129165 (63.72%)
no 73542 (36.28%)

Note coin-flipping will be right 50% of the time. So 80% is interesting, but not all that amazing. And 63%, for this implementation, seems only a slight improvement on the coin-flipping algorithm.
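
As a quick check of the arithmetic, here is a minimal Python sketch that recomputes the percentage from the feedback counts quoted above and compares it with the 50% coin-flip baseline and the roughly 80% figure attributed to Koppel and Argamon. The variable names are illustrative, and the counts are self-selected user feedback, not a controlled test.

```python
# Feedback tallies quoted above from the Gender Genie "Am I right?" results.
yes_votes = 129165
no_votes = 73542

total = yes_votes + no_votes
accuracy = yes_votes / total   # fraction of feedback saying the guess was right
coin_flip = 0.5                # baseline: random guess between two labels
claimed = 0.80                 # approximate figure attributed to Koppel and Argamon

print(f"total feedback: {total}")
print(f"reported accuracy: {accuracy:.2%}")
print(f"points above coin flip: {(accuracy - coin_flip) * 100:.2f}")
print(f"points below claimed 80%: {(claimed - accuracy) * 100:.2f}")
```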

Testing a second review:

Words: 143
Female Score: 172
Male Score: 192
The Gender Genie thinks the author of this passage is: male!

Testing a third review:

Words: 261
Female Score: 337
Male Score: 280

The Gender Genie thinks the author of this passage is: female!

One out of three is bad (though granted, these are small samples, all under the 500-word threshold).
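
The tally for the three reviews can be reproduced with a short Python sketch. It assumes the Genie simply reports whichever score is higher; that is an inference from the outputs shown in this post, not a documented rule. The function name `genie_verdict` and the data layout are hypothetical.

```python
def genie_verdict(female_score, male_score):
    """Assumed decision rule (inferred, not documented): the higher score wins."""
    return "female" if female_score > male_score else "male"

# (word count, female score, male score) for the three book reviews tested above.
reviews = [
    (256, 74, 346),
    (143, 172, 192),
    (261, 337, 280),
]
actual = "female"  # the post treats all three reviews as female-authored

verdicts = [genie_verdict(f, m) for _, f, m in reviews]
correct = sum(v == actual for v in verdicts)
short = sum(words < 500 for words, _, _ in reviews)  # below the Genie's stated caveat

print(f"verdicts: {verdicts}")
print(f"correct: {correct} of {len(reviews)}")
print(f"samples under 500 words: {short} of {len(reviews)}")
```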

So, now testing the "Belle de Jour" first month archive:

Considered as category "fiction" or "nonfiction":

Words: 1785
Female Score: 2138
Male Score: 1936

The Gender Genie thinks the author of this passage is: female!

Considered as category "blog entry" (apparently different keywords):

Words: 1785
Female Score: 2326
Male Score: 3384

The Gender Genie thinks the author of this passage is: male!

I can't see these results as worth much at all.

By Seth Finkelstein | posted in journo | on March 23, 2004 11:30 PM (Infothought permalink)
Seth Finkelstein's Infothought blog (Wikipedia, Google, censorware, and an inside view of net-politics)

Comments

"Note coin-flipping will be right 50% of the time."

Unfortunately, this isn't true.


Posted by: Martin at March 25, 2004 11:07 AM

Sorry to burst your bubble but my blog just came up as "male" and I KNOW I'm not male!

Could be the algorhythm is sexist? Maybe an educated, well-read female is interpreted as male?

Potentially interesting tool in literary forensics, but needs debugging.

Posted by: Diann at April 7, 2004 08:11 PM

Aha, checked out the comments link where other women said the same thing. Never mind!

Oh yes, and this educated, well-read female actually spells it: algorithm!

Posted by: Diann at April 7, 2004 08:14 PM