Re: subtle bug in generating PDF outlines with luatex


From: Gavin Smith
Subject: Re: subtle bug in generating PDF outlines with luatex
Date: Wed, 8 Jan 2025 20:35:16 +0000

On Wed, Jan 08, 2025 at 04:39:58AM +0000, Werner LEMBERG wrote:
> > So 0x20 is translated as '\\040', i.e., 4 bytes instead of ' ',
> > 1 byte, and strcmp correctly sorts 4 bytes and this is the bug:
> > luatex has to unescape these strings before the sorting.  [...]
> >
> > is it necessary to escape the space 0x20?  Perhaps no, so in the
> > lua function it could be c<0x20 instead of c<=0x20.  This could be
> > an enhancement (and probably it masks the bug [...])
> 
> So: can this be fixed on the side of `texinfo.tex` at least for the
> space character, circumventing the luatex bug for the most common
> case?

I have minimal understanding of lua and luatex.  I've edited texinfo.tex
in the obvious way:

diff --git a/doc/texinfo.tex b/doc/texinfo.tex
index bef52a95ec..c31e0c50dd 100644
--- a/doc/texinfo.tex
+++ b/doc/texinfo.tex
@@ -1017,11 +1017,21 @@ where each line of input produces a line of output.}
   \endgroup
   \def\pdfescapestrutfsixteen#1{\directlua{UTF16oct('\luaescapestring{#1}')}}
   % Escape PDF strings without converting
+  % Use \(, \), \\ escapes for (, ), \.
   \begingroup
     \directlua{
       function PDFescstr(str)
         for c in string.bytes(str) do
-          if c <= 0x20 or c >= 0x80 or c == 0x28 or c == 0x29 or c == 0x5c then
+          if c == 0x28 then
+            tex.sprint(-2,
+              string.format(string.char(0x5c) .. string.char(0x28)))
+          elseif c == 0x29 then
+            tex.sprint(-2,
+              string.format(string.char(0x5c) .. string.char(0x29)))
+          elseif c == 0x5c then
+            tex.sprint(-2,
+              string.format(string.char(0x5c) .. string.char(0x5c)))
+          elseif c < 0x20 or c >= 0x80 then
             tex.sprint(-2,
               string.format(string.char(0x5c) .. string.char(0x25) .. '03o',
                             c))

I checked that PDF outlines work as expected with this change.
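
For anyone who wants to check the escaping rule without running TeX, here
is a minimal Python sketch of what the patched condition chain does.  It
is only a model, not the texinfo.tex code: the name pdf_escape_bytes is
made up for illustration, and it assumes that bytes matching none of the
conditions are passed through unchanged (that branch is not visible in
the hunk above).

```python
def pdf_escape_bytes(data: bytes) -> bytes:
    """Model of the patched PDFescstr condition chain (illustrative only)."""
    out = bytearray()
    for c in data:
        if c in (0x28, 0x29, 0x5C):
            # '(', ')' and '\' get a backslash prefix: \(, \), \\
            out += b"\\" + bytes([c])
        elif c < 0x20 or c >= 0x80:
            # control and non-ASCII bytes become three-digit octal escapes
            out += ("\\%03o" % c).encode("ascii")
        else:
            # assumption: everything else, now including space, is unchanged
            out.append(c)
    return bytes(out)


if __name__ == "__main__":
    for name in (b"abc def", b"a(a", b"a)a", b"a\\a"):
        print(pdf_escape_bytes(name).decode("ascii"))
```

With the node names from the test file further down, this model gives
"abc def", "a\(a", "a\)a" and "a\\a", the same forms that show up in
the /Names array.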

Is there any testing you can do of this to confirm it works as expected
before I commit it?

With a short input file:

\input texinfo

@contents

@node abc def
@chapter abc

@node a(a
@chapter One

@node a)a
@chapter Two

@node a\a
@chapter b\b

@bye


I ran "luatex test2.texi", and then "qpdf --stream-data=uncompress
test2.pdf - | less".  Searching for the string "abc" then leads to
the part of the file:



<< /Count 4 /First 24 0 R /Last 33 0 R /Type /Outlines >>
<< /Limits [ (-1) (abc def) ] /Names [ (-1) 12 0 R (1) 37 0 R (2) 40 0 R (3) 43 
0 R (4) 46 0 R (a\(a) 20 0 R (a\)a) 21 0 R (a\\a) 22 0 R (abc def) 19 0 R ] >>
<< /Dests 56 0 R >>
<< /Names 57 0 R /Outlines 55 0 R /PageLabels << /Nums [ 0 << /P (T-) /S /D >> 
0 << /S /r >> 1 << /S /D >> ] >> /PageMode /UseOutlines /Pages 17 0 R /Type 
/Catalog >>
endstream
endobj
59 0 obj


(I'm not too familiar with the PDF format so have copied some more context.)

You can see the escaping in the strings "(a\(a)", "(a\)a)" and "(a\\a)".

Will luatex sort these escapes correctly?
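
One way to reason about this is to compare byte-wise ordering before and
after escaping.  The Python sketch below assumes, as the quoted analysis
says, that luatex orders the names by comparing the escaped bytes with
strcmp; Python's comparison of bytes objects is used as a stand-in for
that.  The function names and the node name "a0a" are hypothetical and
chosen only for illustration.

```python
def escape(name: bytes) -> bytes:
    """Model of the patched escaping: \\(, \\), \\\\ and octal for the rest."""
    out = bytearray()
    for c in name:
        if c in (0x28, 0x29, 0x5C):
            out += b"\\" + bytes([c])
        elif c < 0x20 or c >= 0x80:
            out += ("\\%03o" % c).encode("ascii")
        else:
            out.append(c)
    return bytes(out)


def order_changes(names) -> bool:
    """True if sorting by escaped bytes differs from sorting by raw bytes."""
    return sorted(names) != sorted(names, key=escape)


# Node names from the test file: the order is the same either way.
from_test_file = [b"abc def", b"a(a", b"a)a", b"a\\a"]

# Hypothetical pair: '(' (0x28) sorts before '0' (0x30), but the escaped
# form starts with '\' (0x5c), which sorts after '0'.
hypothetical = [b"a(a", b"a0a"]

if __name__ == "__main__":
    print(order_changes(from_test_file))
    print(order_changes(hypothetical))
```

Under those assumptions the names in the test file come out in the same
order with or without escaping, so that file would not expose a sorting
problem, while a pair such as "a(a" and "a0a" would sort differently
once the parenthesis is escaped.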

> > Anyway, the pdf spec says that it's possible to use these escape
> > sequences:
> >
> > ```
> > \n LINE FEED (0Ah)  (LF)
> > \r CARRIAGE RETURN (0Dh)  (CR)
> > \t HORIZONTAL TAB (09h) (HT)
> > \b BACKSPACE (08h) (BS)
> > \f FORM FEED (0Ch) (FF)
> > \( LEFT PARENTHESIS (28h)
> > \) RIGHT PARENTHESIS (29h)
> > \\ REVERSE SOLIDUS (5Ch) (Backslash)
> > ```

I expect we wouldn't need to worry about special escapes for \n, \r, \t,
\b or \f.


