Dowbrigade has sad comments on the difficulty of making fair use of the Shorenstein Center report "Big Media" Meets the "Bloggers" (link credit: Dave Winer):
The weird thing is the extent to which the authors have gone to make sure this milestone article in the academic history of the Blogosphere is unbloggable. Excerpts or selections of the text cannot be saved, or copied and pasted. The document cannot be converted to another format or saved as anything else. ... The selection below were typed out by the Dowbrigade, letter by letter.
It takes a very twisted view for a court to believe things like this do not impinge on fair use rights ...
The encryption used here is well-known, and trivially within my technical ability to decrypt. But given what happened to the last guy who programmed about PDF files and decryption (the name Dmitry Sklyarov might ring a bell), I'll let someone else take the risk of an unquestioned DMCA 1201(a)(2) violation.
Instead, I'll note a very simple way to get usable text from the restricted file. Observe that printing is allowed. Now, one does not have to get fancy with OCR or images. Simply do a version of the "analog hole". The document can be printed. The printing process has the ability to print to a file. Use that option. That is, print the document to a file instead of directly to a printer. This produces a file in a different format.
There's a "Do not remove this tag under penalty of (DMCA) law" bit of code in that file, which handles the security for usage restrictions. HOWEVER, the text of the document itself is in the clear here! All that's needed is to make it more usable. So extract the whole text chunk from any line in the file where the line starts with a left parenthesis or ends with a right parenthesis (no text chunk has a segment with more than two lines)
That is, cough, I meant to say,
perl -n -e 'print $1 if (/^\(([^)]+)/ || /([^)]+)\)$/);' < shorenstein.ps
[I think I'm allowed to write the English statement, but in peril with the Perl statement, at least under current court precedents]
All done. You now have a file of text which, though not all that pretty in formatting, is quite amenable to cut-and-paste.
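For readers who find the one-liner hard to parse, here is a minimal sketch of the same rule in Python. The function name extract_text and the SAMPLE lines are hypothetical illustrations, not taken from the actual print file; the two patterns are copied from the Perl statement above.

```python
import re

# Same two patterns as the Perl one-liner:
# text after a leading "(", or text before a trailing ")".
STARTS_WITH_PAREN = re.compile(r'^\(([^)]+)')
ENDS_WITH_PAREN = re.compile(r'([^)]+)\)$')


def extract_text(lines):
    """Concatenate the text chunks found on lines that start with "("
    or end with ")". No separators are added, as in the Perl version."""
    chunks = []
    for line in lines:
        line = line.rstrip("\n")
        match = STARTS_WITH_PAREN.search(line) or ENDS_WITH_PAREN.search(line)
        if match:
            chunks.append(match.group(1))
    return "".join(chunks)


# Hypothetical sample, loosely shaped like a PostScript print file.
SAMPLE = [
    "%!PS-Adobe-3.0\n",
    "/F1 12 Tf\n",
    "(Big Media meets ) show\n",
    "(the blog\n",   # a text chunk split across two lines
    "gers)\n",
    "showpage\n",
]

if __name__ == "__main__":
    print(extract_text(SAMPLE))
```

As with the Perl version, the output is one unformatted run of text, which is enough for cut-and-paste.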
Does even this post violate the DMCA? Is it trafficking in "technology" that "is marketed by that person ... for use in circumventing a technological measure that effectively controls access to a work protected under this title."?
You guys at Harvard will defend me, right? Right? Right? ...
Disclaimer: No encryptions were broken in the making of this post.
[UPDATE: I found a simpler, better procedure (all of the following are standard Linux programs). Use the program xpdf to generate the PostScript print file. This program obeys the usage restrictions itself, but does NOT insert the usage restriction code in the generated print output.
Then use ps2pdf13 to generate a PDF file from the print file (the default 1.2 version didn't work well, 1.3 works better).
This new PDF file is not usage restricted!
Then run pdftotext over this new file ... and presto, a pretty text version!
I'm really worried now ...
]
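The three steps in the update can be sketched as a small Python wrapper. This is only a sketch under assumptions: it assumes the print file is produced with pdftops (the command-line converter shipped with the xpdf suite), that the PDF-1.3 conversion is done with Ghostscript's ps2pdf13 wrapper, and that all three programs are on the PATH. The function names and the output file names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_commands(original_pdf):
    """Return the three command lines for the steps in the update.
    Output names are derived from the input name (hypothetical choice)."""
    src = Path(original_pdf)
    ps_file = src.with_suffix(".ps")
    new_pdf = src.with_name(src.stem + "-reprinted.pdf")
    txt_file = src.with_suffix(".txt")
    return [
        # Step 1: generate the PostScript print file (assumes xpdf's pdftops).
        ["pdftops", str(src), str(ps_file)],
        # Step 2: turn the print file back into a PDF 1.3 file.
        ["ps2pdf13", str(ps_file), str(new_pdf)],
        # Step 3: extract the text from the new PDF file.
        ["pdftotext", str(new_pdf), str(txt_file)],
    ]


def run_pipeline(original_pdf):
    """Run each step in order, stopping at the first failure."""
    for command in build_commands(original_pdf):
        subprocess.run(command, check=True)
```

Calling build_commands("shorenstein.pdf") only lists the commands; run_pipeline actually executes them and needs the programs installed.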
That reminds me of Alex Halderman and his description of breaking the SunnComm encryption scheme by pressing the shift key. Just like Halderman, you are only discussing a way of working around an encryption scheme. That is not providing a "technology, product, service, device, component, or part thereof," as required by the DMCA.
And just as in Halderman's case, the workaround shows that the technology was not "effective" in the first place.
Pack-journalism to the contrary, the Halderman DMCA issue was not about the shift key. It was about descriptions of the internals of the program.
The problem is whether speech is "technology" in terms of the DMCA. Because if code is technology, and code is speech, can speech be technology?
For the DMCA, "effective" doesn't mean "unbreakable"; that's been established repeatedly.
If we assume those magic 8 lines do provide effective protection, then xpdf is probably violating the DMCA by leaving them off.
See DMCA 1201(c)(3), discussed in the comments of the next message: there's no DMCA "broadcast flag".
Besides, there is a minor bug in the phrasing of the threat-tag. It says you can't remove the tag, but doesn't say you can't remove the mattress? Is there a difference? Perhaps they meant you can't detach the mattress and the tag from each other, but that's not what they said. And that's not what the PDF tag says either. Just open it up, cut and paste, and leave the tag in the file like it says. Or is that too simple?