September 22, 2002

SpamAssassin and Crypto-Gram

Seth David Schoen has noted:

Schneier's Crypto-Gram is getting flagged as spam by Razor. The reason is that some spam-detecting software will try to automatically detect spam and then automatically report it. So somebody's SpamAssassin mistakenly concludes that a copy of Crypto-Gram is spam and reports it to Razor, and this happens a few times over; now everyone who uses Razor will automatically be advised that Razor considers Crypto-Gram to be spam!

I've been looking at SpamAssassin, and it does indeed flag the latest Crypto-Gram Newsletter as spam at the default threshold. Here are the tests being triggered (as reported by SpamAssassin) and the text that triggers each one (not reported by SpamAssassin, but easy to find with a little investigation, since it's open source). This is from version 2.31:

SPAM: DOUBLE_CAPSWORD (1.1 points) BODY: A word in all caps repeated on the line

"Boolean functions of AES, which could possibly be used to break AES. But"
"called BES that treats each AES byte as an 8-byte vector. BES operates on"
"A new company, PGP Corp., has purchased PGP from Network Associates."
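
The rule's actual pattern isn't quoted here, so what follows is only a guess at the kind of test involved. It is a hypothetical Python regex (not SpamAssassin's own) that fires when an all-caps word of two or more letters shows up twice on one line. It suggests why ordinary acronyms in a crypto newsletter would trip such a test.

```python
import re

# Hypothetical approximation, NOT the pattern shipped with SpamAssassin:
# an all-caps word (2+ letters) that appears again later on the same line.
DOUBLE_CAPS = re.compile(r"\b([A-Z]{2,})\b.*\b\1\b")

lines = [
    "Boolean functions of AES, which could possibly be used to break AES. But",
    "called BES that treats each AES byte as an 8-byte vector. BES operates on",
    "A new company, PGP Corp., has purchased PGP from Network Associates.",
]


def repeated_caps_word(line):
    """Return the first all-caps word that repeats on the line, or None."""
    match = DOUBLE_CAPS.search(line)
    return match.group(1) if match else None


for line in lines:
    print(repeated_caps_word(line), "<-", line)
```

Under this assumed pattern, the repeated words are AES, BES and PGP: names of ciphers and products, not shouting.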

SPAM: PORN_10 (0.6 points) BODY: Uses words and phrases which indicate porn

"by pedophiles, child pornographers, cultists, occultists, drug pushers and"

SPAM: ONE_HUNDRED_PC_FREE (3.4 points) BODY: No such thing as a free lunch

"There's a new Twofish C library, written by Niels Ferguson. The main differences with existing code available is that this one is fully portable, easy to integrate, well documented, and contains extensive self-tests. And it's 100% free."

SPAM: PORN_3 (0.5 points) Uses words and phrases which indicate porn

(?i-xsm:\bporn) : "by pedophiles, child pornographers, cultists, occultists, drug pushers and"
(?i-xsm:\bsex+) : "with the sexual words you'd expect -- I won't print them because too many"
(?i-xsm:\blive) : "complex machinery. Their primary duty is to protect the lives and"
(?i-xsm:\baction) : "any criminal or civil action for disabling, interfering with,"
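
To see why these lines match, here is a small Python sketch that runs the four patterns above against the four quoted lines. The Perl wrapper (?i-xsm:...) is translated to Python's re.IGNORECASE flag. The function name is mine, and how SpamAssassin combines the individual matches into the PORN_3 score is not shown here, so the sketch doesn't assume anything about that.

```python
import re

# The four patterns listed above, rewritten for Python's re module.
# Perl's (?i-xsm:...) means "case-insensitive only", i.e. re.IGNORECASE.
PATTERNS = [r"\bporn", r"\bsex+", r"\blive", r"\baction"]

LINES = [
    "by pedophiles, child pornographers, cultists, occultists, drug pushers and",
    "with the sexual words you'd expect -- I won't print them because too many",
    "complex machinery. Their primary duty is to protect the lives and",
    "any criminal or civil action for disabling, interfering with,",
]


def matched_text(pattern, line):
    """Return the exact substring the pattern matched, or None."""
    match = re.search(pattern, line, re.IGNORECASE)
    return match.group(0) if match else None


# Pair each pattern with the line quoted next to it above.
hits = [matched_text(p, l) for p, l in zip(PATTERNS, LINES)]
for pattern, hit in zip(PATTERNS, hits):
    print(pattern, "matched", repr(hit))
```

Each pattern anchors only the start of a word (\b at the front, nothing at the end), so "lives" and "action" count as hits just as much as "pornographers" does.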

So, 5.6 points in total, more than the 5 needed ... SPAM (at default levels)
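
The arithmetic can be checked with a few lines of Python. The scores are the ones SpamAssassin reported above. The threshold constant and function names are just labels for this sketch, and treating "more than 5" as a strict comparison is my reading of the default, not something taken from the SpamAssassin source.

```python
# Scores as reported by SpamAssassin 2.31 in the output above.
scores = {
    "DOUBLE_CAPSWORD": 1.1,
    "PORN_10": 0.6,
    "ONE_HUNDRED_PC_FREE": 3.4,
    "PORN_3": 0.5,
}

THRESHOLD = 5.0  # the default "5 points" level mentioned above


def total_score(hits):
    """Sum the rule scores, rounded to one decimal like the report."""
    return round(sum(hits.values()), 1)


def is_spam(hits, threshold=THRESHOLD):
    """Assumed decision rule: spam when the total exceeds the threshold."""
    return total_score(hits) > threshold


print(total_score(scores), is_spam(scores))

# Same message without the "100% free" hit.
without_free = {k: v for k, v in scores.items() if k != "ONE_HUNDRED_PC_FREE"}
print(total_score(without_free), is_spam(without_free))
```

The single phrase "100% free" is worth 3.4 of the points. Without it, the other three tests add up to only 2.2, well under the threshold.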

This is not good.

By Seth Finkelstein | posted in spam | on September 22, 2002 06:40 PM (Infothought permalink)
