Re: Problems with .PDFPIC caused by pdfinfo
From: Heinz-Jürgen Oertel
Subject: Re: Problems with .PDFPIC caused by pdfinfo
Date: Tue, 21 Sep 2021 14:34:14 +0200
On Tuesday, 21 September 2021, 13:51:12 CEST, Heinz-Jürgen Oertel wrote:
> On Monday, 20 September 2021, 21:39:49 CEST, Keith Marshall wrote:
> > On 20/09/2021 19:22, Dave Kemper wrote:
> > > Hi Heinz-Jürgen,
> > >
> > > Thanks for debugging and submitting a fix for this problem!
> >
> > Except that it's not really the most appropriate solution; that was
> > proposed four years ago...
> >
> > > In general, when proposing changes to the groff code base, it's best
> > > to open a bug report ...
> >
> > ...and Bertrand opened a (belated) ticket:
> > https://savannah.gnu.org/bugs/index.php?55107
> >
> > which has shown no activity since; (so even open tickets aren't immune
> > to fading into obscurity).
> >
> > > Regarding the specific change you've proposed, there may be some
> > > resistance to using a grep option that's not part of the POSIX
> > > standard for the command. I'm not sure how widely implemented -a is,
> > > or what equivalent solution might be more portable.
> >
> > I would go even further ... groff should *not* be calling out to
> > external tools, such as grep — much less pdfinfo — from within core
> > code, in a manner which requires use of unsafe mode, *especially* when
> > core code to achieve the required functionality has been awaiting
> > integration for a number of years!
>
> For me too, a solution that avoids calling external tools would be better,
> even though calling them is part of the Linux/Unix philosophy.
> As you found out, this small problem has existed for years. I don't
> remember exactly, but I already looked at the pdfinfo code and was not able
> to correct it. It seems to affect only the pdfinfo used in openSUSE.
>
> POSIX grep
> https://www.unix.com/man-page/posix/1P/grep/
> does not have the -a or --text option.
>
> Regards
> Heinz
I did some more research. The result: the culprit is not "pdfinfo" but
ImageMagick's "convert".
I mostly use JPG files converted to PDF by "convert".
Take the example file "Selz.pdf":
% pdfinfo Selz.pdf | hexdump -xc
0000000 6954 6c74 3a65 2020 2020 2020 2020 2020
0000000 T i t l e :
0000010 5300 6500 6c00 7a00 0000 410a 7475 6f68
0000010 \0 S \0 e \0 l \0 z \0 \0 \n A u t h o
0000020 3a72 2020 2020 2020 2020 6820 7474 7370
0000020 r : h t t p s
...
As one can see, the title in the pdfinfo output already contains \0 chars.
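To make the effect of those bytes concrete, here is a minimal Python sketch. The sample bytes are transcribed from the hexdump above; the helper name strip_nuls is only illustrative. This is not groff's code and not a proposed patch, just a way to show that the "Title:" line becomes plain text once the \0 bytes are dropped.

```python
# First bytes of the pdfinfo output, transcribed from the hexdump above.
sample = (
    b"Title:" + b" " * 10 + b"\x00S\x00e\x00l\x00z\x00\x00\n"
    + b"Author:" + b" " * 9 + b"https"
)


def strip_nuls(data: bytes) -> bytes:
    """Return data with every NUL byte removed (hypothetical helper)."""
    return data.replace(b"\x00", b"")


# NUL bytes are what make a tool treat the stream as binary.
has_nul = b"\x00" in sample

cleaned = strip_nuls(sample)
first_line = cleaned.split(b"\n")[0].decode("ascii")

print(has_nul)
print(first_line)
```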
Looking at the PDF:
/Title <00530065006C007A0000>
/CreationDate (D:20210914095154)
/ModDate (D:20210914095154)
/Author (https://imagemagick.org)
/Producer (https://imagemagick.org)
As you can see, the \0 chars are already in the PDF itself: the /Title hex
string has a 00 byte in front of every character.
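The /Title hex string can be decoded by hand to check this. The following Python sketch uses only the value shown above. Reading the bytes as UTF-16BE without a byte order mark is my assumption about what "convert" wrote, not something the file states.

```python
# Hex string copied from the /Title entry shown above.
title_hex = "00530065006C007A0000"

raw = bytes.fromhex(title_hex)

# Count the NUL bytes that end up in the pdfinfo output.
nul_count = raw.count(0)

# Assumption: UTF-16BE without a byte order mark, followed by a
# trailing NUL character, which is stripped here.
decoded = raw.decode("utf-16-be").rstrip("\x00")

print(raw)
print(nul_count)
print(decoded)
```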
What can I do?
Regards
Heinz