SmartFilter - I've Got A Little List

An anticensorware investigation by Seth Finkelstein

Abstract: This report examines some details as to how the censorware product SmartFilter blacklists sites. The mathematical impossibility of human review of the blacklist is demonstrated, along with empirical evidence validating this criticism. Possible programming-related reasons are put forth as to why some items are blacklisted. The general problem of investigation of censorware blacklists, in view of personal constraints and legal topics such as the Digital Millennium Copyright Act (DMCA), is extensively discussed.

Note: The author of this paper, Seth Finkelstein, is the former chief programmer for the Censorware Project, and proud to be an honored member of Peacefire. This work, however, is independent and not associated with either organization.

Introduction


This is why I believe that the right role for Congress to play is to encourage the development of software filters that prevent my child and others from being harmed in the first place.

Recall that the basic technology we're talking about here is the computer -- the most flexible, programmable, "intelligent" technology we build and market.

-- Mike Godwin, famous Internet civil-liberties lawyer, 1995 Congressional testimony


Critics of censorware face a daunting task in exposing the flaws inherent in such programs. There's a great mystique attached to computers. They're too often thought of as mysterious entities, with inner workings which are beyond comprehension.

But in fact, the operation of computer programs often follows very simple-minded rules. This will be demonstrated in examination of SmartFilter, a censorware product of the company Secure Computing (censorware is software designed and optimized for use by an authority to prevent another person from sending or receiving information).

I've Got A Little List


As some day it may happen that a victim must be found,
I've got a little list--I've got a little list
Of society offenders who might well be underground,
And who never would be missed--who never would be missed!

-- They'll None Of 'Em Be Missed, sung by Ko-Ko, the Lord High Executioner, in The Mikado


Typically, censorware contains an enormous blacklist of forbidden sites. SmartFilter claims that it "provides the industry's most comprehensive and proven database of URLs available today, containing more than 11 million non-business related URLs." (SmartFilter[tm] Frequently Asked Questions) In fact, I believe the number to be approximately 350,000 items, unless they somehow count an entry blacklisting one website as all URLs on that website (but this is a discussion for another time). In any case, it's important to understand the implications of such a huge number.

Consider that if you viewed a page per minute, that would be 60 pages in an hour, and so 60 * 8 = 480 pages in an eight-hour workday. Let's call it 500 pages per work-day. Thus in a full 200-day work-year, doing nothing but looking at such pages (all day, every day) you could only look at 500 * 200 = 100,000 items. Again, in a full work-year, a person could not even come close to examining the whole blacklist. Now extend the problem to consider what sites to add to the blacklist.

And yet, the product literature for SmartFilter states:
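The arithmetic above can be checked with a short sketch. It uses only the figures already given (500 pages per work-day, 200 work-days per year, my estimate of 350,000 entries, and the company's claimed 11 million URLs); the function name is my own, and the one-page-per-minute rate is the same generous assumption as in the text.

```python
# Back-of-the-envelope check of the review-time arithmetic.
# All rates are the assumptions stated in the text, not measured values.

PAGES_PER_WORK_DAY = 500      # rounded up from 60 * 8 = 480
WORK_DAYS_PER_YEAR = 200


def person_years_to_review(entries: int) -> float:
    """Person-years needed to look at every entry once, at one page per minute."""
    pages_per_year = PAGES_PER_WORK_DAY * WORK_DAYS_PER_YEAR  # 100,000
    return entries / pages_per_year


if __name__ == "__main__":
    for label, entries in [("estimated blacklist size", 350_000),
                           ("claimed URL count", 11_000_000)]:
        years = person_years_to_review(entries)
        print(f"{label}: {entries:,} entries -> {years:.1f} person-years")
```

Under these assumptions, a single pass over the estimated 350,000 entries takes 3.5 person-years, and a single pass over the claimed 11 million URLs takes 110 person-years, before any time is spent evaluating new candidate sites.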

Secure Computing uses automated tools to continually search the Internet for new sites and pages that meet the content criteria for the 27 Control List categories. Candidate sites are then added to the Control List, after being viewed and approved by our Control List technicians.

These claims are mathematically impossible. But no matter how many times it's debunked, in how many ways, the myth persists ("Math is hard - Barbie").

Mother rapers, father stabbers, littering, and creating a nuisance


... Group W's where they put you if you may not be moral enough to join the army after committing your special crime, and there was all kinds of mean nasty ugly looking people on the bench there. Mother rapers. Father stabbers. Father rapers! ... He said, "What were you arrested for, kid?" And I said, "Littering." And they all moved away from me on the bench there, and the hairy eyeball and all kinds of mean nasty things, till I said, "And creating a nuisance."

-- from the song Alice's Restaurant, by Arlo Guthrie


And I said: "What were you blacklisted for, site?" For example, Secure Computing defines a category Extreme or Obscene:
Child Pornography: Excessive Violence / Mutilation

The Extreme category includes URLs that may fall into other categories, but push the limits of acceptability because of their particularly graphic nature. These URLs are typically extremely violent, gory, or horrific in nature and may be related to sex, bodily functions, obscenity, or perverse activities. Sites include:

You can just hear the song (... Mother rapers. Father stabbers. Father rapers! ...). Child Pornographers! And let there be no misunderstanding, there are all kinds of mean nasty ugly looking websites on the blacklist there.

But also the equivalent of littering. And creating a nuisance.

The following results are derived from examining sites blacklisted in at least the Extreme or Obscene SmartFilter blacklist category, but in fact excluding the Sex Related category. That is, the blacklist categories for examples below would include Extreme or Obscene, possibly other categories, but not Sex Related. These constraints were chosen to limit the number of reasons a site might be blacklisted, but still address issues of material possibly being banned by law in schools or libraries.

Perhaps the best example is the web site http://www.affirmation.org - "Affirmation: Gay & Lesbian Mormons" ("Affirmation serves the needs of gays, lesbians, bisexual LDS and their supportive family and friends through social and educational activities"). What could SmartFilter have against gay and lesbian Mormons? What could those Mormons have done, so extreme, to get themselves blacklisted as child pornographers or nearly so?

While the true answer may never be known, a look at the keywords used in the page provides a possible solution. They begin "Gay, Mormon, LDS, Lesbian, Bisexual, Transexual, Samesex, Church of Jesus Christ of Latter-day Saints, ...". Now computers are dumb. They do not understand context. SmartFilter's "automated tools" likely pick up on strings such as Lesbian, Bisexual, Transexual, Samesex; and on the front page is repeated "Affirmation serves the needs of gays, lesbians, bisexual LDS and their supportive family and friends through social and educational activities". So it's in the blacklist, evidently automated, the program having determined that such words mean it will push the limits of acceptability. It is highly unlikely that any "Control List technician" viewed and approved such categorization.
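To make the guess above concrete, here is a minimal sketch of context-free keyword matching. It is NOT SmartFilter's actual algorithm, which is not public; the trigger-word list and function name are hypothetical, chosen only from the words discussed in this report. It shows how a program that merely looks for strings would flag a page regardless of what the page is about.

```python
# Hypothetical illustration of naive keyword matching.
# The trigger words are an assumption for demonstration purposes only;
# the real product's rules are unknown.

HYPOTHETICAL_TRIGGER_WORDS = {
    "lesbian", "bisexual", "transexual", "samesex", "extreme",
}


def matched_keywords(page_text: str,
                     trigger_words=HYPOTHETICAL_TRIGGER_WORDS) -> list:
    """Return the trigger words that occur anywhere in the text (case-insensitive).

    This is plain substring matching: no context, no understanding of meaning.
    """
    lowered = page_text.lower()
    return sorted(word for word in trigger_words if word in lowered)


if __name__ == "__main__":
    samples = {
        "affirmation.org keywords":
            "Gay, Mormon, LDS, Lesbian, Bisexual, Transexual, Samesex",
        "climbing site":
            "Words and Images of Climbing from around the world - Extreme Rock",
        "control page":
            "A page about gardening",
    }
    for name, text in samples.items():
        hits = matched_keywords(text)
        print(f"{name}: flagged={bool(hits)} matches={hits}")
```

A support group and a rock-climbing magazine are both flagged by such a matcher, while nothing in the code can tell either of them apart from the material the category is nominally aimed at.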

In fact, Extreme itself seems to be a very bad word to SmartFilter. Possibly that word itself is the reason for the blacklisting of sites such as:

I couldn't see a good reason for why SmartFilter blacklisted http://www.climbingmedia.com - "Words and Images of Climbing from around the world". But the references to Extreme Rock seem as good an explanation as anything.

My guess as to why the Mad Monk travel writers http://www.monk.com were blacklisted, is because they wrote about the "Museum of Death" as one place they visited. These aren't isolated instances. Just for example, the same simple automated keyword matching is likely the reason for the blacklisting of:

Or maybe I'm completely wrong in my theory, and some "Control List technician" just doesn't like punk rock or heavy metal or hiphop or rap or ... Because similarly blacklisted are web sites such as

What do they have against the comic-book series Savage Dragon? (http://www.savagedragon.com)

I see a lot they could have against http://www.tao.ca (Tao Toronto - "... a wobbly-unionized worker co-op based in toronto" ). But my guess is that the fatal section was the portion about: creating radical solutions to the "gay rights movement".

They don't like certain artists either, blacklisting here for example http://www.giger.com - "GIGER.COM - H.R. Giger's official US Site"

Now, I'll grant that http://www.annoy.com isn't Disney. But they aren't anywhere close to obscene or child pornography either.

I was almost tempted to deem SmartFilter correct in blacklisting http://www.jerryspringer.com - "The Official Jerry Springer Show Website". But let's be fair.

Keywords galore


The simple believeth every word: but the prudent man looketh well to his going.

-- Proverbs 14:15


Moving to more general categories for a moment, contemplate this selected list of newsgroups blacklisted in the categories noted, and consider how SmartFilter likely came up with the reason:

Note how broad a brush is used in these blacklistings.

The Slippery Slope Becomes Category Creep


And the lady from the provinces, who dresses like a guy,
And who "doesn't think she dances, but would rather like to try";
And that singular anomaly, the lady novelist--
I don't think she'd be missed--I'm sure she'd not be missed!
-- less well-known verses of They'll None Of 'Em Be Missed, sung by Ko-Ko, the Lord High Executioner, in The Mikado


Needless to say, one could say much more about the social assumptions behind this verse. While no one has yet raised objection to dispatching the cross-dressing lady, Ian Bradley tells us that [e]ven within Gilbert's lifetime there ceased to be anything either singular or anomalous about the lady novelist ...

-- from "Law as Performance", J.M. Balkin and Sanford Levinson


Censorware blacklists provide one of the best validations ever seen, regarding the slippery slope theory of censorship. Consider the preceding examination of the Extreme or Obscene category. The words certainly sound scary. Extreme ... Obscene ... Child Pornography ... Excessive Violence ... Mutilation. Only upon very careful and precise reading does one realize that the category definition is akin to Mother rapers ... Father stabbers ... and creating a nuisance. They have mixed in very severe and legally-meaningful First Amendment terms such as Obscene and Child Pornography, with vague and broad phrases such as push the limits of acceptability and may be related to sex, bodily functions, obscenity, or perverse activities. This allows them to start the electronic book-burning with a claim of Constitutional justification. But then it reaches everything from Jerry Springer to punk rockers to difficult artists. Or even gay and lesbian Mormons.

The paragraph above might be readily dismissed as hyperbolic were the evidence not present, and the scenario not being seen right now. Consider these excerpts from the Chatham-Effingham-Liberty Library Computer & Internet Policy:

Internet access is filtered by means of software* which attempts to block access to obscene and sexually explicit materials by comparing access requests to a list of specifically prohibited sites. The list of prohibited sites is maintained by the publisher of the filter software, and it is updated frequently. Filtering is used in support of the library's prohibition against using library resources to display obscene materials. ...

The CEL library uses SmartFilter, by Secure Computing, as its filtering software. All public workstations use this software to halt access to Internet sites which fall into SmartFilter's Extreme or Sex control list categories. For further explanation of the software and control lists, please consult the software publisher's web site.

Some other libraries around the country are adopting similar policies. And there is currently Congressional legislation under discussion which would encourage the development of more extensive application of censorware in public schools and libraries, by linking funding to the use of censorware in such settings. In America, the government could not directly ban any sites and newsgroups discussed above. But if a system of privatized censorship is used, the sky (or the bottom of the slippery slope) is the limit. Then state action provides a strong general incentive to employ censorware. Yet the actual banning decisions are hidden away, made by secret, unaccountable, private blacklisters.

But even the tedious and unrewarding process of exposing the content of secret blacklists involves legal and other difficulties for activists, as will be discussed in the next section.

The Problem Of Censorware Investigation And Verification


Jon Johansen: I'm 16 now, I was 15 when it happened ... and the encryption code wasn't in fact written by me, but written by the German member. There seems to be a bit of confusion about that part.

LinuxWorld: The other two people that you had worked with to make the player are remaining anonymous -- is that right?

Jon Johansen: Yes, that is correct.
...
LinuxWorld: Do you know why they want to remain anonymous?

Jon Johansen: They are both a lot older than me, and they are employed. So I guess they just didn't want the publicity, and they were perhaps afraid of getting fired.

-- LinuxWorld interview with Jon Johansen, of DeCSS/DVD fame


Originally, this report was envisioned to be much more technical in nature, discussing extensive details of SmartFilter. And therein lies a tale.

When censorware blacklists are created, sometimes the results are so ludicrous and absurd as to defy belief. Verification is a critical part of the investigation process. Often, the censorware manufacturers obviously do not want to admit to the embarrassing performance of their product. After the Peacefire "Blind Ballots" expose, the company in that case was caught making inaccurate claims that "sites mentioned in the study were not blocked".

But working with a reporter to validate results can be dangerous. There's a significant risk that the reporter will take the work of the investigator, and present it as the reporter's own efforts. This risk is not mere speculation. Reporter Declan McCullagh was publicly called to account for this conduct by activist Bennett Haselton, in a Peacefire press release - Wired News reporter responds to plagiarism charges (full disclosure - I've worked extensively at various times with both Bennett Haselton and Declan McCullagh).

I've had many bad experiences of my own, regarding reporters. There's a saying, "Never argue with a man who buys ink by the barrel". The only reason the above event became known is that Peacefire has a gallon or two of ink of its own, and Bennett Haselton was willing to use that. But, myself, I have barely a quill-full of ink for my potential defense. I'm a programmer by trade, not a journalist or a full-time activist, and that puts me at a vast disadvantage for gladiatorial combat in the political arena.

My programmer's solution was to write some software which would allow anyone to easily check many URLs against SmartFilter's hidden blacklist. I was thinking of an "Open-source" investigation and verification. Perhaps some sort of cool collaborative Net project. After all, there's hundreds of thousands (or maybe millions) of items on such blacklists. And it's a time-consuming, labor-intensive, task to examine even a few hundred entries in detail (this again points out the mathematical impossibility of the entire blacklist "being viewed and approved by our Control List technicians"). So why not solve a few problems at once, and provide anyone (be they lawyer, programmer, journalist, or activist) with good tools to investigate the blacklist?

Yes, SmartFilter has a form on their website called SmartFilterWhere which nominally allows checking of their blacklist. But it will only take three URLs at a time. Moreover, the company can always change the blacklist file without notice. It's not very useful in conducting an extensive investigation, or to verify results which the company has an incentive to quickly change or deny.
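A rough sketch of what the three-at-a-time limit means in practice follows. It performs no network access and does not model the SmartFilterWhere form itself; the helper names are hypothetical, and the URLs are simply those mentioned in this report. It only counts how many separate form submissions a given number of URLs would require.

```python
# Counting form submissions under a three-URLs-per-submission limit.
# No requests are made; this only illustrates the batching arithmetic.

FORM_LIMIT = 3  # URLs accepted per submission, as described in the text


def batches(urls, size=FORM_LIMIT):
    """Yield successive groups of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def submissions_needed(url_count: int, size: int = FORM_LIMIT) -> int:
    """Number of form submissions needed to check `url_count` URLs."""
    return (url_count + size - 1) // size  # ceiling division


EXAMPLE_URLS = [
    "http://www.affirmation.org",
    "http://www.climbingmedia.com",
    "http://www.monk.com",
    "http://www.savagedragon.com",
    "http://www.tao.ca",
    "http://www.giger.com",
    "http://www.annoy.com",
    "http://www.jerryspringer.com",
]

if __name__ == "__main__":
    for number, group in enumerate(batches(EXAMPLE_URLS), start=1):
        print(f"submission {number}: {group}")
    print("submissions for 350,000 entries:", submissions_needed(350_000))
```

Even the eight sites named in this report take three submissions, and checking a list on the order of 350,000 entries this way would take well over a hundred thousand.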

So, one would think a SmartFilter blacklisting-investigating program was a great idea. Unfortunately, there are extensive legal risks associated with this type of technical work. Programmers have been subject to lawsuits for reverse-engineering censorware information. The most famous case is that of Matthew Skala and Eddy Jansson, discussed in the Cyber Patrol break FAQ. The most severe legal peril comes from a recent law known as the Digital Millennium Copyright Act (DMCA).

Note that, contrary to some myth and misunderstanding, the DMCA is NOT the only legal risk affecting anticensorware programming. However, the DMCA (a topic in itself) has the most severe and extreme implications. A while ago, I retired from volunteering as chief programmer for Censorware Project due to the skyrocketing legal risks of such work, and the lack of desperately needed support and defense (yet another topic in itself). It looked like the DMCA was creating far too many powerful grounds to sue anticensorware investigators, at least from my point of view.

Recently, as part of a rulemaking process, the Library of Congress determined investigating censorware blacklists to be one of two specific exemptions granted regarding one part of the DMCA ("circumvention"). My submission to the Library of Congress concerning my anticensorware work, and fears of legal consequences, was one of the reasons cited in the ruling as justification for the censorware exemption. I believe the effect of this exemption was to provide anticensorware work with a sort of legitimacy even greater than the Loudoun court victory against censorware in public libraries (incidentally, I project-managed and did most of the technical work for the group effort which developed a great deal of the anticensorware evidence for that victory).

BUT, there's a catch involving the DMCA exemption for censorware. As with all things law and legal, this is a fiendishly complicated subject. Does the exemption only technically apply to the actual process of investigating a censorware blacklist (i.e., "circumvention" itself, provision "1201(a)(1)")? There's another part of the DMCA ("1201(a)(2)") which deals with prohibitions against "manufacture, import, offer to the public, provide, or otherwise traffic in any technology, product, service, device, component, or part thereof, that -" roughly, are for circumvention. Yet a third section contains a virtually identical prohibition concerning a technological measure that effectively protects a right of a copyright owner ("1201(b)(1)"). Whether the anticensorware circumvention exemption is extremely narrow, or implies some broader protection for making tools to aid in investigating censorware, is the subject of possible future civil-liberties litigation.

At this point, I don't think I'm at unacceptable levels of legal danger in talking about the results of anticensorware investigations. But if not for the specific Library of Congress-granted censorware exemption regarding circumvention, prosecution under the DMCA anti-circumvention provision would be a major worry of mine. However, distributing software to aid in investigating censorware blacklists remains very problematic. It might be a DMCA violation, not of the circumvention per se section (the censorware exemption seems to take care of that), but rather considered manufacturing or trafficking. It's unclear if the censorware exemption extends to these provisions.

After serious legal consultation, I regrettably decided to refrain from making public my SmartFilter technical details, and associated software. This is subject to further legal discussions and investigation of the topic. Right now, because of the Digital Millennium Copyright Act (DMCA), there won't be an "Open-Source" SmartFilter investigation.

Personally, it comes down to the fact that I'm 36, not 16. I'm a professional computer programmer with a consulting business (I hate being called a "hacker"; the people who apply such a term rarely mean it as a compliment, but rather to insinuate something illicit or unsavory). I don't have a good track record of generating legal and press support for myself. There's a mistaken idea that organizations such as the ACLU have cadres of top attorneys waiting around in the background, ready to move in like a SWAT team at the first hint of a civil-liberties case. Would that it were so. At best, there's chronically overworked people doing what they can. At worst, there's very nasty and vicious politics which will unhesitatingly sacrifice activists who are in the wrong place at the wrong time.

If there's sufficient backing and legal support, I may release future, more technical, SmartFilter material. Otherwise, I'll have to offer my apologies, and the above explanations as to why the validation and verification is not all it could be.

Conclusion


I really hate this damned machine
I wish that they would sell it.
It never does quite what I want
But only what I tell it.

-- A Programmer's Lament


Censorware is a classic attempt to provide a technological solution to a social problem. And consistent with past history, thorough analysis often deflates the extravagant promises of the peddlers. Almost all censorware blacklists are far too big to be effectively human-reviewed. Yet, even while government imposition of censorware becomes a major factor, independent investigation of the Snake Oil claims of the censorware companies is still too fraught with legal peril.


Version 1.0, December 7 2000

See also: "SmartFilter's Greatest Evils" (censorware & privacy/anonymity)


Mail comments to: Seth Finkelstein <sethf@sethf.com>
