Re: [Lynx-dev] Displaying a pdf live on the Fly?


From: Mouse
Date: Tue, 4 Jun 2019 06:57:02 -0400 (EDT)

>>>> Well, lynx said it may be a binary, see it anyway?  It was a mess.
>> Yes.  Most PDFs in my experience have most of their data compressed,
>> so they are "binary junk" when looked at with tools that don't
>> understand PDF structure and the compression method(s) in question.
> zless may be a better alternative since it does compressed data.

Not of much use here.  PDFs are not simply text files which have had a
general-purpose compression tool applied to them; they have internal
structure, and _some_ of the content gets compressed.
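One way to see why a gzip-oriented tool has nothing to work with: a
gzip file starts with the two bytes 1f 8b, while a PDF starts with the
text "%PDF-".  The check below is only a sketch; looks_gzipped is a
made-up helper name, not part of any tool mentioned in this thread.

```python
import gzip

def looks_gzipped(data: bytes) -> bool:
    """Return True if data starts with the gzip magic bytes 1f 8b."""
    return data[:2] == b"\x1f\x8b"

# A real gzip member passes the check; a PDF header does not.
print(looks_gzipped(gzip.compress(b"some text")))
print(looks_gzipped(b"%PDF-1.6\n"))
```

The whole file fails the check even though parts of its interior are
compressed.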

One PDF I have, for example, begins

%PDF-1.6
%âãÏÓ
5191 0 obj
<</Filter/FlateDecode/First 939/Length 3647/N 93/Type/ObjStm>>stream

after which the "binary junk" begins.  A few KB later (3647 bytes, I
expect), I see

endstream
endobj
5192 0 obj
<</Filter/FlateDecode/First 909/Length 4329/N 93/Type/ObjStm>>stream

and it's back to binary compressed data.
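The dictionary between << and >> in the excerpts above is itself plain
text, and it says how to treat the bytes that follow: /Filter names the
compression method and /Length gives the byte count of the stream.
(/Type/ObjStm marks an object stream; as I understand the spec, /N is
the number of objects packed inside it and /First is the offset of the
first one in the decompressed data.)  A rough sketch of pulling those
values out, assuming a flat dictionary like the ones shown;
parse_flat_dict is a hypothetical helper, not a real PDF parser:

```python
import re

def parse_flat_dict(header: bytes) -> dict:
    """Parse a flat PDF dictionary such as
    <</Filter/FlateDecode/Length 3647>> into a Python dict.

    Assumption: values are only names (/Foo) or integers.  Nested
    dictionaries, arrays, strings, and indirect references such as
    "/Length 12 0 R" are not handled.
    """
    m = re.search(rb"<<(.*?)>>", header, re.S)
    if m is None:
        raise ValueError("no dictionary found")
    result = {}
    pairs = re.findall(rb"/([A-Za-z0-9]+)\s*(/[A-Za-z0-9]+|\d+)", m.group(1))
    for key, value in pairs:
        if value.startswith(b"/"):
            # Name value: drop the leading slash.
            result[key.decode("ascii")] = value[1:].decode("ascii")
        else:
            result[key.decode("ascii")] = int(value)
    return result

info = parse_flat_dict(
    b"<</Filter/FlateDecode/First 939/Length 3647/N 93/Type/ObjStm>>stream"
)
print(info["Filter"], info["Length"])  # prints: FlateDecode 3647
```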

Other PDFs have more plaintext before the compressed data begins;
another one I checked has some sixty or seventy lines of plain text
before going into compressed data.

I don't recall enough details to know whether FlateDecode's compression
algorithm is close enough to any of the general-purpose compression
tools like gzip or compress to be of use, but even if it is, you would
at a minimum have to pick apart the PDF structure enough to extract the
compressed portion.  And, of course, FlateDecode is not the only
compression algorithm PDFs can use.
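For what it is worth, FlateDecode data is in the zlib format (deflate
with a small zlib header), which is the same underlying algorithm gzip
uses but with different framing, so gzip and zless will not accept it
as-is.  Python's zlib module does.  The following is only a naive
sketch of the "pick apart the structure" step: it scans for stream
headers with a regular expression, trusts a literal /Length, and skips
anything it cannot inflate.  flate_streams is a hypothetical name, and
this is not a substitute for a real PDF library.

```python
import re
import zlib

# A dictionary, optional whitespace, the "stream" keyword, then a newline.
STREAM_HEADER = re.compile(rb"<<(.*?)>>\s*stream\r?\n", re.S)

def flate_streams(data: bytes):
    """Yield the decompressed bytes of each FlateDecode stream in data.

    Assumptions: /Length is a literal integer (not an indirect
    reference), FlateDecode is the only filter applied, and no
    predictor or encryption is in use.  Streams that violate these
    assumptions fail to inflate and are skipped.
    """
    for m in STREAM_HEADER.finditer(data):
        header = m.group(1)
        if b"/FlateDecode" not in header:
            continue
        lengths = re.findall(rb"/Length\s+(\d+)", header)
        if not lengths:
            continue
        start = m.end()
        raw = data[start:start + int(lengths[-1])]
        try:
            yield zlib.decompress(raw)
        except zlib.error:
            continue

# Usage on a synthetic, PDF-like fragment (not a valid PDF).
payload = b"text that was hidden inside the stream\n"
packed = zlib.compress(payload)
fragment = (
    b"%PDF-1.6\n1 0 obj\n<</Filter/FlateDecode/Length "
    + str(len(packed)).encode("ascii")
    + b">>stream\n" + packed + b"\nendstream\nendobj\n"
)
for chunk in flate_streams(fragment):
    print(chunk)
```

Even when this works, what comes out is PDF page-description operators
or packed objects, not readable prose.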

For full details, of course, read the PDF spec.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                address@hidden
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


