Checking other "Belle de Jour" articles, I found one which argued for skepticism based on the "Gender Genie", an algorithm which allegedly determines male or female authorship. Comments pointed out the statistics are unimpressive.
So I tried testing the infamous book review, the passage of text (by a female author) which supposedly formed the basis of the recent identity hunt.
In the results below, there's a caveat "(NOTE: The genie works best on texts of more than 500 words.)". All book reviews were submitted under the "nonfiction" category.
Words: 256
Female Score: 74
Male Score: 346
The Gender Genie thinks the author of this passage is: male!
Amusingly, when I clicked on the feedback submission ("Am I right? The author of this passage is actually ..."), the results were:
That is one butch chick.
According to Koppel and Argamon, the algorithm should predict the gender of the author approximately 80% of the time.
Accuracy Results
Am I right?
yes 129165 (63.72%)
no 73542 (36.28%)
Note coin-flipping will be right 50% of the time. So 80% is interesting, but not all that amazing. And 63.72%, for this implementation, seems only a slight improvement on the coin-flipping algorithm.
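To make the comparison concrete, here is a minimal sketch of the arithmetic, using only the feedback counts quoted above. The variable names are mine, and the 50% baseline assumes a coin flip between two equally likely answers; the tally is also self-reported user feedback, not a controlled test.

```python
# Feedback tally shown on the Gender Genie results page (quoted above).
yes_votes = 129165   # users who said the guess was right
no_votes = 73542     # users who said the guess was wrong

total = yes_votes + no_votes
observed = yes_votes / total

coin_flip = 0.50     # assumed baseline: guessing between two equally likely answers
claimed = 0.80       # the "approximately 80%" figure attributed to Koppel and Argamon

print(f"observed accuracy:         {observed:.2%}")
print(f"gain over coin flip:       {(observed - coin_flip) * 100:.2f} points")
print(f"shortfall from 80% claim:  {(claimed - observed) * 100:.2f} points")
```

On these numbers the site is about 13.72 points better than a coin flip and about 16.28 points short of the claimed figure.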
Testing a second review:
Words: 143
Female Score: 172
Male Score: 192
The Gender Genie thinks the author of this passage is: male!
Testing a third review:
Words: 261
Female Score: 337
Male Score: 280
The Gender Genie thinks the author of this passage is: female!
One out of three is bad (though granted, these are small samples, all well under the suggested 500 words).
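The tally can be reproduced with a short sketch. It rests on two assumptions that are not stated by the Gender Genie itself: that the verdict simply goes to the larger of the two scores (which is consistent with every result shown here), and that all three reviews are by female authors (implied by counting one out of three as correct). The function name `predict` is hypothetical.

```python
def predict(female_score, male_score):
    """Assumed decision rule: the larger score wins."""
    return "female" if female_score > male_score else "male"

# (word count, female score, male score) for the three reviews tested above.
reviews = [
    (256, 74, 346),
    (143, 172, 192),
    (261, 337, 280),
]

ASSUMED_AUTHOR = "female"   # assumption: all three reviews have female authors

correct = 0
for words, female_score, male_score in reviews:
    guess = predict(female_score, male_score)
    correct += guess == ASSUMED_AUTHOR
    print(f"{words} words -> {guess}")

print(f"{correct} of {len(reviews)} correct")
```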
So, now testing the "Belle de Jour" first month archive:
Considered as category "fiction" or "nonfiction":
Words: 1785
Female Score: 2138
Male Score: 1936
The Gender Genie thinks the author of this passage is: female!
Considered as category "blog entry" (apparently different keywords):
Words: 1785
Female Score: 2326
Male Score: 3384
The Gender Genie thinks the author of this passage is: male!
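A quick sketch of how far apart the two runs are, using the scores above. Expressing the gap as a share of the combined score is my own choice of measure, not something the Gender Genie reports.

```python
# Same 1785-word text, scored under two category settings: (female score, male score).
runs = {
    "fiction or nonfiction": (2138, 1936),
    "blog entry": (2326, 3384),
}

results = {}
for category, (female_score, male_score) in runs.items():
    verdict = "female" if female_score > male_score else "male"
    # Gap between the two scores as a fraction of their sum (illustrative measure only).
    margin = abs(female_score - male_score) / (female_score + male_score)
    results[category] = (verdict, margin)
    print(f"{category}: {verdict}, margin {margin:.1%}")
```

The verdict flips between categories, and the "blog entry" margin (about 18.5%) is several times larger than the "fiction or nonfiction" margin (about 5.0%), in the opposite direction.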
I can't see these results as worth much at all.
By Seth Finkelstein | posted in journo | on March 23, 2004 11:30 PM (Infothought permalink)
"Note coin-flipping will be right 50% of the time."
Unfortunately, this isn't true.
Sorry to burst your bubble but my blog just came up as "male" and I KNOW I'm not male!
Could be the algorhythm is sexist? Maybe an educated, well-read female is interpreted as male?
Potentially interesting tool in literary forensics, but needs debugging.
Aha, checked out the comments link where other women said the same thing. Nevermind!
Oh yes, and this educated, well-read female actually spells it: algorithm!