Searching Through The Great Firewall Of China

An anticensorware investigation by Seth Finkelstein

December 2002

Abstract: This report describes a simple technique which can be used with some search engines to bypass censorware bans on searching for forbidden words. Particular emphasis is placed on the situation of the Great Firewall Of China.

Introduction

Censorware, more euphemistically called "filtering", is most widely known for banning the reading of sites with forbidden content. However, while such site bans are a primary aspect of censorware, another often-seen type of control is to forbid the use of search engines. This capability received much press attention when the Chinese government banned search engines such as Google and Altavista (see Edelman and Zittrain, Replacement of Google with Alternative Search Systems in China). But, contrary perhaps to some impressions, China was not doing anything unprecedented or innovative here in terms of censorware. The ability to ban search engines simply for being search engines is in fact a standard feature of many censorware programs.

The reasons for regarding search engines with suspicion should be evident. They may point the way to sites with prohibited content that have not yet been put on the censorware blacklist. Some, such as Google, also keep a cache of web pages, which represents a loophole in censorware control: a banned page may still be readable from the search engine's copy.

When total search-engine bans are not feasible (in a political, not technological, sense), the control may be reduced to denying the ability to seek out information based on certain terms. For example, in China, trying to do a search for the word "falun" (from the forbidden "Falun Gong") may be banned. See the discussion in Edelman and Zittrain, Empirical Analysis of Internet Filtering in China, or Amnesty International, State Control Of The Internet In China.

This report shows a simple technique to bypass such searching prohibitions, using an undocumented ability of certain search engines (unfortunately, not Google).

Theory and Practice

Almost all requests to search engines are made with the HTTP method known as "GET". This method puts the request data in the URL itself, as a "query string" following a "?" character. When censorware prohibits searching for certain words, it typically checks the parameters in the URL, since that is how a search request is ordinarily made.
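The following Python sketch illustrates the point. It builds the URL that a "GET" search form would request and then applies the kind of URL-only keyword check described above. The endpoint http://search.example.com/search, the parameter name "q", and the one-word blocklist are hypothetical stand-ins; real search engines and real censorware differ in their details.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint, parameter name, and blocklist, for illustration only.
SEARCH_URL = "http://search.example.com/search"
BANNED_WORDS = {"falun"}

def build_get_url(query: str) -> str:
    """Return the URL a GET form submission would request: the data is in the query string."""
    return SEARCH_URL + "?" + urlencode({"q": query})

def url_is_blocked(url: str) -> bool:
    """Naive censorware-style check: look for banned words in the URL's query parameters."""
    params = parse_qs(urlsplit(url).query)
    text = " ".join(v for values in params.values() for v in values).lower()
    return any(word in text for word in BANNED_WORDS)

url = build_get_url("falun gong")
print(url)                  # http://search.example.com/search?q=falun+gong
print(url_is_blocked(url))  # True
```

Because the search terms are part of the URL, a filter only has to decode the query string to find them.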

However, there is another method for sending such data to a server, known as "POST". Data sent by this method is not in the URL, but is transmitted in the body of the request (which a server-side CGI program reads as its "standard input"). The message body is the same channel by which documents are normally uploaded and downloaded, so it is typically not checked by censorware (though there are exceptions).
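As a sketch of the contrast, assuming the same hypothetical endpoint, "q" parameter, and blocklist as above, the code below uses Python's standard urllib to build (without sending) a "POST" request for the same search. The search terms end up in the request body, and the URL that a URL-checking filter would inspect carries no query at all.

```python
from urllib.parse import urlencode, urlsplit
from urllib.request import Request

# Hypothetical endpoint, parameter name, and blocklist, for illustration only.
SEARCH_URL = "http://search.example.com/search"
BANNED_WORDS = {"falun"}

def build_post_request(query: str) -> Request:
    """Build, but do not send, a form-style POST request carrying the query in its body."""
    body = urlencode({"q": query}).encode("ascii")
    # Supplying data= makes urllib use the POST method for this request.
    return Request(SEARCH_URL, data=body,
                   headers={"Content-Type": "application/x-www-form-urlencoded"})

req = build_post_request("falun gong")
print(req.get_method())                    # POST
print(req.full_url)                        # http://search.example.com/search
print(repr(urlsplit(req.full_url).query))  # ''
print(req.data)                            # b'q=falun+gong'

# A filter that only looks at the URL finds nothing to block.
print(any(w in req.full_url.lower() for w in BANNED_WORDS))  # False
```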

It turns out that though many search engine entry forms send their search data via the "GET" method, they can easily be converted to send that data via the "POST" method.

The procedure to do this conversion is a simple modification of the HTML used. Although it requires a little familiarity with HTML editing, it is straightforward:

1) Go to the search engine page, say http://www.alltheweb.com/advanced
2) Save the page, as HTML, to a file. Edit this file as follows.
3) Look for the HTML tag "<head>". On the line after this tag, add a new line starting with <base href=", followed by the URL of the search engine page, and ending with ">. Here, that line would be:
<base href="http://www.alltheweb.com/advanced">
This makes relative links in the saved copy, such as the form's action address, point back to the search engine rather than to your local disk.
4) Look for an HTML tag that starts with the characters <form
a) If that tag contains a method="GET" string (the capitalization may vary, e.g. method="get"), change it to method="POST".
b) If there is no method= string in that tag, add method="POST" right after the <form portion, so that the tag starts out:
<form method="POST"
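For readers comfortable with a scripting language, steps 3 and 4 can also be automated. The Python function below is a rough sketch rather than a robust HTML parser: it relies on regular expressions, so unusual markup (for instance a ">" inside an attribute value) may defeat it. The function name is hypothetical.

```python
import re

def convert_form_to_post(html: str, page_url: str) -> str:
    """Sketch of steps 3 and 4: add <base href> after <head>, and make every form use POST."""
    # Step 3: insert <base href="..."> right after the first <head> tag (any capitalization).
    # A function replacement avoids backslash-escape surprises in the URL.
    html = re.sub(r"(<head\b[^>]*>)",
                  lambda m: m.group(1) + '\n<base href="' + page_url + '">',
                  html, count=1, flags=re.IGNORECASE)

    def fix_form(m: re.Match) -> str:
        tag = m.group(0)
        # Step 4a: an existing method attribute (quoted or not) becomes method="POST".
        new_tag, count = re.subn(r"""\bmethod\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""",
                                 'method="POST"', tag, flags=re.IGNORECASE)
        if count:
            return new_tag
        # Step 4b: no method attribute, so add one right after "<form" (5 characters).
        return tag[:5] + ' method="POST"' + tag[5:]

    # Rewrite each opening <form ...> tag, even if it spans several lines.
    return re.sub(r"<form\b[^>]*>", fix_form, html, flags=re.IGNORECASE)
```

To apply it, read the file saved in step 2, pass its text together with the original page URL (here http://www.alltheweb.com/advanced), and write the result to a new file to load in the browser.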

Load this new file in the browser. Any search typed into this form should now be sent in the body of a "POST" request rather than in the URL, bypassing the prohibitions of many censorware programs.

This change in method doesn't carry through to any search results screens. That is, once the results are returned, clicking the link on that page for the next screen of results still uses the "GET" method, and so runs afoul of the censorware search prohibitions. The results page would have to be saved and edited again, per the procedure above. But if the number of search results per page is set high, there should be little need for such repetition.
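When the next-page control is an ordinary link rather than a form, one way to apply the same idea is to turn the link's URL into a small form that sends the same parameters by "POST". The Python sketch below does this with the standard library only, assuming the link is already an absolute URL. The function name and the example URL with its "q" and "o" parameters are hypothetical, and the search engine must still accept "POST" for this to work.

```python
from html import escape
from urllib.parse import parse_qsl, urlsplit, urlunsplit

def link_to_post_form(url: str) -> str:
    """Turn a GET link into an HTML form that submits the same parameters via POST."""
    parts = urlsplit(url)
    # The form's action is the link without its query string.
    action = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # Each query parameter becomes a hidden field, so it travels in the POST body.
    fields = "\n".join(
        f'<input type="hidden" name="{escape(name)}" value="{escape(value)}">'
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
    )
    return (f'<form method="POST" action="{escape(action)}">\n'
            f"{fields}\n"
            '<input type="submit" value="Next results">\n'
            "</form>")

print(link_to_post_form("http://search.example.com/search?q=falun+gong&o=10"))
```

Saving the printed form to a file and opening it, as with the edited search page, would request the next screen of results without the search terms appearing in the URL.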

Limitations

Unfortunately, this procedure does not work with Google, because Google will not accept a "POST" method request. The same idea of sending the search data in the request body rather than in the URL could probably be applied to Google's own API, but using that is much more complicated than the changes described here.

In addition, Chinese users might find it a good idea to turn off automatic image loading in their browsers. Inline advertisement images sometimes send data back in their image URLs, such as the search terms, which can trigger keyword-based prohibitions. Turning off image loading is typically an option in a browser's Preferences menu. In the Mozilla browser, it is under "Privacy & Security" or "Advanced", then "Images". The "Image Acceptance Policy" setting should be at least "Accept images that come from the originating server only", and possibly "Do not load any images".

Examples

The examples below demonstrate the critical elements of the modified HTML. Complete copies of the search interfaces are not given, in order to avoid any copyright problems, and only minimal HTML is used. Nonetheless, the examples below are fully functional code and illustrate the key changes needed.

alltheweb

This is a simple, stripped-down example for the search engine alltheweb.com (alltheweb example source code). Anything typed in the search box will be sent to alltheweb.com via the "POST" method.

altavista

Similarly, the following example will do a simple "POST" method search for altavista.com (altavista example source code).

Conclusion

This technique will not work forever. But for now, it represents a persistent hole in The Great Firewall Of China.

Mail comments to: Seth Finkelstein <sethf@sethf.com>

For future information: subscribe to Seth Finkelstein's Infothought list or read the Infothought blog

(if you subscribed a few months ago, please resubscribe due to a crash)

See more of Seth Finkelstein's Censorware Investigations