May 01, 2005

PDF redacted text extraction works on Giuliana Sgrena Report

As is being widely covered, a report detailing the Giuliana Sgrena shooting incident was released in PDF form with "redactions" which could be removed by using common PDF tools to obtain the underlying text.

Again, what happens here is that the creator starts with a text document, then draws an image over the text. So, on the screen, it looks like the text has been blacked-out. But in terms of the PDF, there's the text, and then an image layered on top of it. Any tool which extracts text, such as cut-and-paste, or text driver, will ignore the redacting image. So, instant unredacted document.
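To make the layering concrete, here is a toy sketch in Python. It is not a real PDF parser: the `TextRun` and `BlackBox` classes, the coordinates, and the sample sentence are all made up for illustration, and a real PDF content stream is far messier. It only shows why an extractor that collects text objects never notices the bar drawn on top.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    """A piece of text placed at (x, y). Hypothetical stand-in for a PDF text object."""
    x: int
    y: int
    text: str


@dataclass
class BlackBox:
    """An opaque rectangle drawn on the page. Hypothetical stand-in for the redaction image."""
    x: int
    y: int
    width: int
    height: int

    def covers(self, run):
        # True when the run's origin point lies inside this rectangle.
        return (self.x <= run.x < self.x + self.width
                and self.y <= run.y < self.y + self.height)


def visible_text(page):
    """Roughly what a viewer shows: text under a later-drawn box is hidden."""
    parts = []
    for i, obj in enumerate(page):
        if not isinstance(obj, TextRun):
            continue
        later_boxes = (o for o in page[i + 1:] if isinstance(o, BlackBox))
        hidden = any(box.covers(obj) for box in later_boxes)
        parts.append("[black bar]" if hidden else obj.text)
    return " ".join(parts)


def extract_text(page):
    """What a text extractor does: collect text objects and ignore everything else."""
    return " ".join(o.text for o in page if isinstance(o, TextRun))


# Objects in drawing order: the box is drawn last, so it sits on top.
page = [
    TextRun(0, 0, "The source was"),
    TextRun(100, 0, "AGENT X"),
    TextRun(200, 0, "at the time."),
    BlackBox(95, -5, 100, 20),
]

print(visible_text(page))   # the on-screen view, with the bar
print(extract_text(page))   # the extracted text, bar ignored
```

The first line printed has the bar in place of the covered run; the second has the full sentence, because the box was never part of the text to begin with.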

While not quite the oldest trick in the book, this has been known for years, having exposed some spies and some confidential memos. I sometimes think curious people try it on every applicable document. My first reaction when hearing of this case was "Wow, that old unredaction trick actually worked on something nowadays?"

I wouldn't normally echo such an extensively reported item. But it gives me an opportunity to tell an extended activism-story about how well-known the redaction trick is (this involves two other people, one of whom I'm fairly certain doesn't read my blog, the other who might but will be less concerned - so I've changed a detail or two). Let's say there was a certain report which was of intense interest to several people, and was available in a redacted form. So, what do I do the first time I get a copy of it? Look at it microscopically, and try to un-redact it ("Hmm ... `My consulting rates have been $[black bar] an hour' ... is that two figures under the bar, or three? Looks like three ..."). I load it into a PDF viewer, and see the standard black redaction bars which indicate an image layer, and try cutting and pasting around it. And I get text back, but the text is dashes. Whaa??? It's text. Is this some new security feature for redaction? I dig into the raw PDF structure (I can do that). I dump the data which goes into making text, going down into a low level, and it's still dashes. Suddenly, it dawns on me what the writer has done. He's overwritten the redacted words with text dashes, and then put the black image bar on top of them! I toss an unseen salute to the writer, who does know his stuff.
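What that writer did can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not his actual procedure: the function name `redact_text_layer`, the list-of-strings model of the text layer, and the figure "999" are all invented here. The point is only that the secret is replaced in the text itself before any bar is drawn, so there is nothing left underneath to extract.

```python
def redact_text_layer(runs, secrets):
    """Return a copy of the text runs with each secret overwritten by dashes.

    `runs` is a list of strings standing in for the document's text layer
    (a hypothetical model, not a real PDF structure).
    """
    cleaned = []
    for run in runs:
        for secret in secrets:
            # Overwrite the secret with the same number of dashes.
            run = run.replace(secret, "-" * len(secret))
        cleaned.append(run)
    return cleaned


# Made-up example text; "999" is a placeholder, not a figure from any real report.
runs = ["My consulting rates have been $", "999", " an hour"]
safe_runs = redact_text_layer(runs, ["999"])

# Only now would the black bar be drawn over the dashes for appearance.
print("".join(safe_runs))
```

Cutting and pasting from a document treated this way gives back dashes, as in the story. Note that same-length dashes still reveal how long the hidden text was, so a fixed-width mask may be the safer choice where that matters.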

Months later, another activist says to me "Seth, have you seen so-and-so's report? It would be really cool if you could apply the PDF extraction procedure to it." Way ahead of you [name redacted], way ahead of you. Both of us. Now how about you actually read a relevant court case before giving advice about the risk of people getting sued.

I suppose I should end by saying I wouldn't want to be in the shoes of the person who created that Sgrena Report PDF file.

By Seth Finkelstein | posted in security | on May 01, 2005 11:59 PM (Infothought permalink)
Seth Finkelstein's Infothought blog (Wikipedia, Google, censorware, and an inside view of net-politics)


I think I'm probably the second person Seth is referring to ('another activist says to me "Seth, have you seen so-and-so's report?"' etc.).

At the time of the conversation he was referring to (about 3 years ago), Seth was already talking about how nothing he does gets press-coverage (yes, this has been a running theme for years!). What I told Seth was that some of his reports probably get passed over because they restated the obvious, the classic example being a report on how Bess blocking software blocks sites like the Anonymizer and other sites which allow you to reload a URL indirectly. Well, of course it does. Furthermore, the report included a lot of literary quotes and dogmatic statements like "censorware is about control, not about protection", and I argued that when your report doesn't convey any surprising information to begin with, the other stuff detracts from it even more, because it comes across as an attempt to substitute for any actual new information.

Seth and I were brainstorming by email for an idea for a new report that *would* actually get some coverage, and the PDF redaction was just one of many, many ideas tossed around. My point was not about that idea in particular, just that you do have to brainstorm to come up with a new angle. Re-stating the obvious, like "Censorware blocks the Anonymizer", is not going to work.

It's not an exact science, but given the fact that reports put out by me and some other people have gotten more coverage than some of Seth's reports, this means there *is* something about those reports that causes them to propagate further. I don't pay off journalists to write about them.

(At one time I made the off-handed remark that I had a "press list" of about 300 email addresses collected from journalists who had written to us over the years, and Seth has since claimed that the reason I got more press was because I had more "press-reach". But what I kept telling Seth after that was that the vast majority of those email addresses were for reporters with hardly any "reach" themselves, and only 20 or so of those addresses belonged to reporters who had provided meaningful coverage of anything I'd ever sent out -- and almost all of those are reporters that Seth has interacted with as well at one point. In other words I have **no more press reach than he does**.)

Basically, if you want to get press coverage and you are *not* famous enough that something you say will be news just because you said it, you need to have something surprising to say.

Posted by: Bennett Haselton at May 2, 2005 06:45 PM

I didn't think the context of the PDF exchange was all that relevant to the point of the story here, of how the un-redaction procedure is common knowledge among many people. Moreover, I was attempting to avoid giving any identifying personal detail.

Though yes, some slight criticism of the advice was intended. Press-reach is not only sheer numbers, but also "name recognition", and connections. I believe you severely underestimate the impact of those factors, and the general issue has been much explored in my postings.

"... given the fact that reports put out by me and some other people have gotten more coverage than some of Seth's reports, this means there *is* something about those reports that causes them to propagate further."

No. It could also mean that there's something about the "press-reach" of the people putting them out. This seems so obviously true in the case of Harvard that to deny it would be absurd. I would also argue it is true of someone who has a Slashdot editor helping them, versus someone who has a Slashdot editor *hurting* them.

I don't think it's a good idea to air all the dirty laundry in public, and there will be no winners from it. But you seem insistent.

Posted by: Seth Finkelstein at May 3, 2005 01:31 AM

But I just explained exactly why I don't think that I have any "greater press-reach". Most of the reporters' email addresses that I have that actually matter in terms of press-reach (about 20 total: a couple of people at Wired, C-Net, MSNBC, AP, etc.) are people that you have had contact with, and you could always write to them yourself as a follow-up to a story they've written. Getting a reporter's permission to add them to your own email distribution list is pretty easy.

I maintain that if I had written a report saying that Bess blocks sites which allow people to circumvent it, that would not have gotten any press either.

Posted by: Bennett Haselton at May 3, 2005 02:38 PM

You did indeed explain what you think; and I explained why I think what you think is mistaken. What then? Once more, if it were only a matter of email addresses, then there would be no difference between something coming from Harvard, or from a no-name place. In the real world, things aren't so simple. I repeat, you underestimate name-recognition as a factor, and further you incorrectly believe I have more than I do. Moreover, the extensive attacks on me by various people have mattered, which is an aspect that many are loath to acknowledge because of the moral implications.

It borders on the ludicrous to deny this factor.

We can go around this, as we have. However, I regard the relative lack of coverage I received for the DMCA exemption win as thoroughly dispositive, and it relegates this argument in my mind to the level of evolution vs. creationism.

I maintain you are wrong, that if you had written a report about a secret censorware category which could not be disabled, blacklisting anonymizers, privacy, more, it would have been on the front page of Slashdot at the very least.

Myself, I do not see the benefit of your publicly diminishing my research. But if you are determined, there is nothing I can do but respond.

Posted by: Seth Finkelstein at May 3, 2005 04:35 PM

OK, so at least it appears we've moved past the "You have a press list of 300 journalists" thing.

As for name-recognition, there's no objective way to measure it, so no way to test your statement that you think I have more than you do, or that I over-estimate yours.

But, both of us started with zero, and the way you build up name recognition can have to do with the kind of information you put out, and how it's presented.

Similarly, in the case of the Slashdot editor who is on speaking terms with me but not with you, well, he started out not knowing either one of us, and after a few years he couldn't stand communicating with you any more. That didn't just happen by itself.

I'm not saying the rules of the game are fair (certainly I wish that every new report from anybody would be judged on merit and not on any name recognition), but that they do apply the same to both of us. (To pick your favorite analogy, it is not like dating, where some people's looks give them an unbeatable advantage over others. Ben Edelman had that advantage, working from Harvard, but neither of us did.)

My theories are that the explanation for the lack of press-attention for your reports is that they contain information that is not surprising, and that the souring of many of your relationships is due to excessive complaining (again, 80% of your current blog posts containing some gripe against the world!) about incidents that an objective observer would look at differently.

Now, if you disagree with those explanations, there might be other ones. But any explanation has to take into account the fact that we both started in the same place, and if we're getting different results all these years later, it would have to be due to either the nature of the reports we put out, or something that we did differently over that time period, or both.

Posted by: Bennett Haselton at May 3, 2005 08:17 PM

We have now started into a game I call "Seth, IT'S ALL YOUR FAULT".

The way this game works, is that every item of objective evidence is answered by a personal attack of the form "Seth, IT'S ALL YOUR FAULT".

A Slashdot editor helps you, but not me? - "Seth, IT'S ALL YOUR FAULT".

I win a DMCA exemption, but poor press coverage? - "Seth, IT'S ALL YOUR FAULT".

No publicity for being an expert-witness in an Internet censorship case? - "Seth, IT'S ALL YOUR FAULT".

This game can be played endlessly. Every explanation attributed to a character flaw of mine. Very convenient for the critic.

Do you grasp that the structure of the argument reduces to either my work must be unworthy, or I must personally be unworthy (as in "not surprising" or "excessive complaining")? Do you see how this has no falsifiability, and in fact is exceedingly mean-spirited at heart? (note the dating analogy was my attempt to put things in terms you could relate to, as to looks - my own favorite is the cruelty of the rich who believe the poor don't succeed due to unwillingness to work or bad attitude, so at best they should be lectured on their failings - an approach you are coming uncomfortably close to mimicking, take heed!)

No, it didn't happen by accident. Much worse. It happened because I did so much work under such legal risk, while being extensively attacked, yet (willingly, but never again) letting others take the credit.

I direct you to the following statement (my emphasis):

"That was only the beginning; Seth did tireless and brilliant work
after that to determine what censorware products really blocked. Seth
is one of the heroes of Internet free speech; one of those rare people
who do the work despite the fact that they know they will receive no

In fact, someone wrote:

"From 1995 onward, at a time when Seth could have been using his
technical and programming skills to take home enough money to sleep on
a mattress stuffed with dollar bills, he was instead donating his time
to an important project -- and anonymously, for that matter, receiving
no credit for his work

And directly:

"especially considering that he had to remain anonymous while others ... (not to mention Peacefire) took the bulk of the credit."

Yes, exactly: "But any explanation has to take into account the fact that we both started in the same place, and if we're getting different results all these years later, it would have to be due to either the nature of the reports we put out, or something that we did differently over that time period, or both."

Like: "he had to remain anonymous while others ... (not to mention Peacefire) took the bulk of the credit."

That's something we did differently. And, in retrospect, I regret it in the extreme.

Bennett, frankly, on a moral level, you should be ashamed of yourself. You benefited enormously in part from work I did. I got extensive attacks and the life-long enmity of a powerful lawyer. You got publicity and reputation. That was a crucial difference.

That's the answer, in short - a big difference is that for a long time, I got no credit, and you got more credit than, bluntly, you deserved.

You will never, ever, incorporate this into your worldview, because it's far easier to play the fault-finding game than to admit you benefited at my cost.

The next stage in the game, is to say that my position is more evidence of my flaws, because I will not agree to your self-serving view of history.

Once again, the reason this is at the level of evolution vs. creationism, is that there exists no reasonable, practical, conditions under which you will ever admit being in the wrong here.

Posted by: Seth Finkelstein at May 3, 2005 09:05 PM

[Note to other readers - yes, I am bitter over it, I freely admit it. However, if you had spent years laboring anonymously, while others got much of the credit, well, wouldn't you be a little unhappy with such a person if they then started writing publicly that your subsequent work was unworthy and deserved its obscurity, and attributing your lack of future success to character flaws?]

Posted by: Seth Finkelstein at May 3, 2005 09:20 PM

To address some points one at a time:

1) All of Peacefire's work which resulted in any significant press coverage was the result of work I had done, not of taking credit for anything of yours. For the list-decryptors that we published, I worked out the decryptions myself (there was one instance where I vaguely remember someone relayed a *hint* from you about how to break *one* of the encrypted lists, but that hardly constitutes "the bulk of our work"), and most of our press resulted from activities other than the list-decryptors, such as calling attention to lists of blocked sites, or new angles like the Bait And Switch experiment. And of course there is the Circumventor software on our site, which you could have written just as easily (in fact it probably would have been easier for you).

Where is it that you think that I "benefited enormously in part from work that you did"?

2) I had to endure some legal risk as a consequence of what I was doing, having been threatened with lawsuits by CYBERsitter, N2H2, I-Gear, and Cyber Patrol. There is no reason to think that the people who helped me (James Tyre and the ACLU) would not have helped you in the same situation.

3) A lot of your problems cannot possibly be attributed to the fact that you worked anonymously for a while. In the case of the Slashdot editor who can't stand talking to you, it's ludicrous to think that has anything to do with the fact that you worked behind the scenes on censorware decryption.

Posted by: Bennett Haselton at May 4, 2005 01:38 PM

1) "there was one instance where I vaguely remember someone relayed a *hint*"

Wrong. Simply, factually, wrong. Here is the quote (my emphasis):

"Seth does not mind at all that it is his crack (in October 1997) of I-Gear that is the basis of Bennett's program. Bennett got it through me from my AF, just as we've all had the benefit of Seth's handiwork.)

Also by James S. Tyre:

"Recently, Bennett Haselton of Peacefire went public with reports on
and decoders for X-Stop and I-Gear ... Again, those reports were based on Seth's work."


You sneer, minimize, trivialize, and then turn around and do personal attacks to cover it. It is an extremely unattractive aspect of your own personality.

2) Also completely wrong. Jim is out of town now. When he gets back, I will ask his permission if I can tell you even more along these lines.

3) Wrong again. In fact, working anonymously on decryption was a very harmful factor - people couldn't understand why I was so upset, so got a very distorted image of my concerns.

Every item you've listed has been wrong. But you won't admit being wrong. Because you can't. You won't admit any aspect, no reasonable evidence is sufficient, since sanctimony is cheap and easy.

Recursively, I have repeatedly told you no good will come of this. And you won't admit being wrong about that either.

Posted by: Seth Finkelstein at May 4, 2005 05:39 PM

The tip about how to decrypt I-Gear was the hint I was referring to. (Well, it was such a trivial "encryption" that the "tip" basically *was* the algorithm.)

I broke the encryption for X-Stop independently of you, that was my own work. (Your decryptor was probably easier to use when it came along, but I had decoded it as well.) When Jim cited that as an example of others getting credit for your work, in writing your nomination for a Pioneer Award, I either didn't notice or didn't care because I agreed with him that you deserved credit for your work in general. But, I did that on my own.

And you ignored my main point, which is that a large part of our work that did generate press, had nothing to do with decryption at all, such as the Bait And Switch report, the Amnesty Intercepted report, and the Circumventor software, all things that you could have done if you'd wanted to, and none of which any reasonable person could have believed would carry any "legal risk".

And I still say it makes no sense to say that your anonymous decryption work is the reason that you're not on speaking terms with a Slashdot editor. Even if some people didn't understand what you were complaining about back when your decryption work was secret, that did become public and you got the Pioneer Award for it, and it was only after that point that many people broke off contact with you anyway.

Posted by: Bennett Haselton at May 4, 2005 08:11 PM

Sigh. Wrong again. In fact, after I won the Pioneer Award, I made a concerted effort to do some bygones-be-bygones. But too much damage - from the tensions when I worked anonymously - had been done by then. This timeline isn't a matter of interpretation. You're simply saying something which is incorrect.

How many times are we going to play this game? How many times are you going to rewrite history in some self-serving way, have me correct you with the sources, and then, instead of admitting error, shift around and cast about for something else, a how-about-this, some bit of humanity on my part, so you can triumphantly proclaim the game's goal: "Seth, IT'S ALL YOUR FAULT"?

The propositions seem proven to reasonable standards of evidence - that you benefited enormously from work I did anonymously (that you have certainly done other work is beside the point), and on the other side, it harmed me. To repeat, ad nauseam - you cannot grant this. You are not asking a question. You are trying to excuse yourself, because sermonizing gives a sense of moral righteousness.

I would assert that the sheer number of factually inaccurate statements you have made - with great confidence, note! - should cause you to reexamine your thinking here. But sadly, I know it won't :-(.

Posted by: Seth Finkelstein at May 4, 2005 08:46 PM