Update 10/28: The White House says it's merely a design issue, from
http://www.2600.com/news/view/article/1803
Per: http://www.bway.net/~keith/whrobots/whresp.html
[(10/27) Very preliminary, but posted since it's breaking news - for background, see http://www.bway.net/~keith/whrobots/ ]
I've been analyzing the robots.txt file, precisely because the directories are so strange. I have a theory about what happened, but it's so jaw-dropping that I'm hesitant to rush it into a formal report/release. In short:
There's no conspiracy.
There's a real-life instance of the joke genre which runs
"I thought you said ..."
For example, here's one of the jokes: After a California earthquake, Dan Quayle is sent to visit the most damaged site. But he never arrives there. Finally, he's found in Florida. He says, shocked, "Go to the EPIcenter? I thought you said ..." [EPCOT Center]
The joke here? Someone said:
"Don't have the search engines looking at the Iraq documents index"
And that was heard as:
"Don't have the search engines looking at every 'index' with Iraq"
Really!
The evidence for this is that the robots.txt file (copy as of Oct 27 15:10 EST; it seems to have changed since, so grab it from the Google cache while it lasts) had lines for
Disallow: /disk2/www/htdocs/infocus/iraq
Disallow: /disk2/www/htdocs/infocus/iraq/news/infocus/iraq
These are the only "iraq" lines with no matching "text" line at all; every other "iraq" entry follows the same paired pattern. They're obviously special in some way. And they look like they point to a searchable index.
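The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it reads "matching pattern" as meaning that a Disallow path ending in "/iraq" has a sibling path ending in "/text" under the same parent directory, and the "/example/..." lines in the sample are hypothetical stand-ins, not entries from the real file. Only the two "/disk2/..." lines are taken from the copy quoted above.

```python
def disallow_paths(robots_txt):
    """Collect the path from every 'Disallow:' line, ignoring comments."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip().rstrip("/")
            if path:
                paths.append(path)
    return paths


def unmatched_iraq_lines(robots_txt):
    """Return '/iraq' paths that have no sibling '/text' path.

    Assumption: a 'match' means the same parent directory also has
    a Disallow entry ending in '/text'.
    """
    paths = disallow_paths(robots_txt)
    text_parents = {p.rsplit("/", 1)[0] for p in paths if p.endswith("/text")}
    return [
        p for p in paths
        if p.endswith("/iraq") and p.rsplit("/", 1)[0] not in text_parents
    ]


# The /example/... entries are hypothetical; the /disk2/... entries are
# the two lines quoted in this article.
SAMPLE = """\
User-agent: *
Disallow: /example/one/text
Disallow: /example/one/iraq
Disallow: /example/two/text
Disallow: /example/two/iraq
Disallow: /disk2/www/htdocs/infocus/iraq
Disallow: /disk2/www/htdocs/infocus/iraq/news/infocus/iraq
"""

if __name__ == "__main__":
    for path in unmatched_iraq_lines(SAMPLE):
        print(path)
```

Run against a saved copy of the real file, a check like this would list whichever "iraq" entries break the pairing; whether it singles out exactly the two lines above depends on the pairing assumption holding for the rest of the file.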
Then there's the fact that some people confuse three things: directories, the function of the file "index.html", and the "Index of <directory name>" listing that some servers display for a bare directory.
So ... "Iraq index" ... "Index of <directory name>" ... Oooops!
Never attribute to malice that which can be explained by stupidity.
This is hard to believe. But it fits!
Update: See also this comment posted in Dan Gillmor's eJournal discussion:
At the Internet Archive, we just recently (before this speculation erupted) got word from the White House webmaster that they wanted us to do an extensive crawl of their site. See my blog entry for more details:
http://gojomo.blogspot.com/#106732065514107786
Their robots.txt is weird and suboptimal, no doubt, but given that I just saw them express a genuine desire to be crawled and archived a few days ago, the weirdness should have a completely innocuous explanation.
Posted by: Gordon Mohr on October 27, 2003 10:16 PM
This project was not supported by anyone. If anyone is providing financial support for such projects, the author would dearly like to know.
Version 0.5 October 28 2003
See also: Domain Investigations
See more of Seth Finkelstein's Censorware Investigations