groff
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Fwd: [groff] 28/28: [pdf]: Implement linear bookmark tag search.


From: G. Branden Robinson
Subject: Fwd: [groff] 28/28: [pdf]: Implement linear bookmark tag search.
Date: Mon, 4 Mar 2024 06:52:42 -0600

Hi folks,

The following change could probably use some additional eyeballs.  It's
one of the things Deri and I disagreed about.

First, I'd like to know if it horrifies anyone, or if anyone has
concerns about the performance hit.  I'm willing to pay a little bit of
performance if I can buy robustness with it, and I think that's what
this does.  But that's me.

The other thing to ask is of Peter: assuming you are among the
non-horrified, would you like me to prepare a patch to om.tmac to
migrate it to this new `pdf:lookup` macro?

If that is done (regardless of who does it), I can also chop out the `ie
d PRINTSTYLE` branches from pdf.tmac shown below.

It's becoming clear to me how to add hyperlink support to the ms, me,
and mm packages.

Regards,
Branden

----- Forwarded message from "G. Branden Robinson" 
<g.branden.robinson@gmail.com> -----

Date: Mon,  4 Mar 2024 07:42:40 -0500 (EST)
From: "G. Branden Robinson" <g.branden.robinson@gmail.com>
To: groff-commit@gnu.org
Subject: [groff] 28/28: [pdf]: Implement linear bookmark tag search.

gbranden pushed a commit to branch master
in repository groff.

commit cd9fde325fd889f673f9603b43e1204b1fde42fc
Author: G. Branden Robinson <g.branden.robinson@gmail.com>
AuthorDate: Mon Mar 4 05:24:40 2024 -0600

    [pdf]: Implement linear bookmark tag search.
    
    * tmac/pdf.tmac (pdf:lookup): Given an argument, search defined bookmark
      tags for a match and return one (if found) in the string
      `pdf:lookup-result`, which is defined but empty if there is no match.
      Previously (and, actually, still--see below), lookups were O(1)
      because strings named `pdf:look($TAG_NAME)` were defined.  The speed
      was great but unfortunately, in practical use, tags often got *roff
      escape sequences stuck into them, which drew diagnostic messages from
      the formatter and could defeat the matches.
    
      (pdfbookmark, pdf*href-M): Use the new mechanism to record a bookmark
      tag if `PRINTSTYLE` (a mom(7) macro) is _not_ defined, so as to not
      regress documents using that package.  Store the tag text in the
      string `pdf:bm\\n[pdf:bm.nr].tag`; every PDF bookmark in a groff
      document gets a serial number already.
    
      (pdf*href): Use the new mechanism to call `pdf:lookup` and locate a
      match for the desired tag.
    
    * doc/GMPfront.t.in (an*cln): Delete this macro.  It iterated through a
      string, scrubbing it of `\%` escape sequences.  Much more exotic
      things could be placed in bookmark tags; we were fortunate that man
      page cross references tend to stick to ASCII, plus `\%` to suppress
      hyphenation.  But that's only suitable for man page references; if we
      want taggable (sub)section headings, like groff_mmse(7)'s "Se ocksÄ",
      we need strings, not just valid groff identifiers.
    
      (an*bookmark): Populate `an*page-ref-nm` without cleaning it first.
    
    So what's the performance hit?  I built the tree before this commit,
    then ran the following simple script and saved the output.  Applied the
    commit and did it again.
    
    target=doc/groff-man-pages.pdf
    
    date
    
    for n in $(seq 20)
    do
            rm -f $target
            time -v make $target
    done
    
    date
    
    This 381-page PDF's generation slowed from 5.99-6.71 seconds to
    7.58-8.49 seconds.
    
    Scraping out the elapsed time and flinging these to GNU datamash:
    
    $ datamash mean 1 sstdev 1 < 20times-before.data
    6.3565  0.22806451072587
    $ datamash mean 1 sstdev 1 < 20times-after.data
    8.0925  0.2556904995868
    
    ...for a mean performance hit of 27.3%.
    
    If this is intolerable, then the priority for resolving Savannah #62264
    could be increased.  Once we have properly iterable strings (and a
    strings.tmac "standard library") it will be straightforward for people
    to write (and contribute!) string-hashing algorithms in *roff.
---
 ChangeLog         | 32 ++++++++++++++++++++++++++++++++
 doc/GMPfront.t.in | 13 +------------
 tmac/pdf.tmac     | 48 ++++++++++++++++++++++++++++++++++++++++++------
 3 files changed, 75 insertions(+), 18 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index a58151907..f6e0587ea 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,35 @@
+2024-03-04  G. Branden Robinson <g.branden.robinson@gmail.com>
+
+       [pdf]: Implement linear bookmark tag search.
+
+       * tmac/pdf.tmac (pdf:lookup): Given an argument, search defined
+       bookmark tags for a match and return one (if found) in the
+       string `pdf:lookup-result`, which is defined but empty if there
+       is no match.  Previously (and, actually, still--see below),
+       lookups were O(1) because strings named `pdf:look($TAG_NAME)`
+       were defined.  The speed was great but unfortunately, in
+       practical use, tags often got *roff escape sequences stuck into
+       them, which drew diagnostic messages from the formatter and
+       could defeat the matches.
+       (pdfbookmark, pdf*href-M): Use the new mechanism to record a
+       bookmark tag if `PRINTSTYLE` (a mom(7) macro) is _not_ defined,
+       so as to not regress documents using that package.  Store the
+       tag text in the string `pdf:bm\\n[pdf:bm.nr].tag`; every PDF
+       bookmark in a groff document gets a serial number already.
+       (pdf*href): Use the new mechanism to call `pdf:lookup` and
+       locate a match for the desired tag.
+
+       * doc/GMPfront.t.in (an*cln): Delete this macro.  It iterated
+       through a string, scrubbing it of `\%` escape sequences.  Much
+       more exotic things could be placed in bookmark tags; we were
+       fortunate that man page cross references tend to stick to ASCII,
+       plus `\%` to suppress hyphenation.  But that's only suitable for
+       man page references; if we want taggable (sub)section headings,
+       like groff_mmse(7)'s "Se ocksÄ", we need strings, not just valid
+       groff identifiers.
+       (an*bookmark): Populate `an*page-ref-nm` without cleaning it
+       first.
+
 2024-03-04  G. Branden Robinson <g.branden.robinson@gmail.com>
 
        * doc/GMPfront.t.in: Resync `MR` replacement with its
diff --git a/doc/GMPfront.t.in b/doc/GMPfront.t.in
index 367190478..4390ea5f3 100644
--- a/doc/GMPfront.t.in
+++ b/doc/GMPfront.t.in
@@ -7,17 +7,6 @@
 .nr PDFOUTLINE.FOLDLEVEL 1
 .defcolor pdf:href.colour rgb 0.00 0.25 0.75
 .pdfinfo /Title "groff Collected Reference Pages"
-.de an*cln
-.  ds \\$1
-.  als an*cln:res \\$1
-.  shift
-.  ds an*cln:res \\$*\"
-.  ds an*cln:chr \\$*
-.  substring an*cln:chr 0 0
-.  if '\%'\\*[an*cln:chr]' \{\
-.    substring an*cln:res 1
-.  \}
-..
 .
 .de END
 ..
@@ -25,7 +14,7 @@
 .am reload-man END
 .de an*bookmark
 .  ie (\\\\$1=1) \{\
-.     an*cln an*page-ref-nm \\\\$2\"
+.     ds an*page-ref-nm \\\\$2\"
 .     pdfbookmark -T "\\\\*[an*page-ref-nm]" \\\\$1 \\\\$2
 .  \}
 .  el .pdfbookmark \\\\$1 \\\\$2
diff --git a/tmac/pdf.tmac b/tmac/pdf.tmac
index e26cd1cd0..3824e7c6d 100644
--- a/tmac/pdf.tmac
+++ b/tmac/pdf.tmac
@@ -191,6 +191,24 @@ am solely responsible for any bugs I may have introduced 
into this file.
 .\"
 .nr PDFOUTLINE.FOLDLEVEL 10000
 .\"
+.\" Search defined bookmarks for a tag matching $1.  This gets around
+.\" problems with *roff escape sequences embedded in identifiers (which
+.\" is not allowed by the language syntax), and the need to tediously
+.\" scrub the strings of them (and throw diagnostics in the user's face
+.\" if we don't do a good enough job) at the expense of an O(n) search
+.\" every time we reference a bookmark instead of an O(1) one.
+.\"
+.de pdf:lookup
+.nr pdf:index 0 1
+.ds pdf:lookup-result \" empty
+.while d pdf:bm\\n+[pdf:index].tag \{\
+.   if '\\$1'\\*[pdf:bm\\n[pdf:index].tag]' \{\
+.      ds pdf:lookup-result \\*[pdf:bm\\n[pdf:index].tag]\"
+.      break
+.      \}
+.   \}
+..
+.
 .\" The actual job of creating an outline reference
 .\" is performed by the "pdfbookmark" macro.
 .\"
@@ -267,8 +285,14 @@ am solely responsible for any bugs I may have introduced 
into this file.
 .      rm pdf:clean
 .      ev
 .      tr \[em]\[em]
-.      ds pdf:look(\\*[PDFBOOKMARK.NAME]) \\*[pdf:cleaned]
-.      if dPDF.EXPORT .tm .ds pdf:look(\\*[PDFBOOKMARK.NAME]) \\*[pdf:cleaned]
+.      ie d PRINTSTYLE \{\
+.         ds pdf:look(\\*[PDFBOOKMARK.NAME]) \\*[pdf:cleaned]
+.         if dPDF.EXPORT .tm .ds pdf:look(\\*[PDFBOOKMARK.NAME]) 
\\*[pdf:cleaned]
+.         \}
+.      el \{\
+.         ds pdf:bm\\n[pdf:bm.nr].tag \\*[PDFBOOKMARK.NAME]
+.         if d PDF.EXPORT .tm .ds pdf:bm\\n[pdf:bm.nr].tag 
\\*[PDFBOOKMARK.NAME]
+.         \}
 .      pdfmark /Dest /\\*[PDFBOOKMARK.NAME] /View [\\*[PDFBOOKMARK.VIEW]] /DEST
 .      nop \!x X ps:exec [/Dest /\\*[PDFBOOKMARK.NAME] /Title 
(\\*[pdf:cleaned]) /Level \\n[pdf:bm.lev] /OUT pdfmark
 .      pdf:href.options.clear
@@ -589,8 +613,14 @@ am solely responsible for any bugs I may have introduced 
into this file.
 .\"
 .ds PDFBOOKMARK.NAME "\\*[pdf:href-N]\\*[pdf:href-D]
 .pdf*href.set \\*[PDFBOOKMARK.NAME] \\$1
-.ds pdf:look(\\*[PDFBOOKMARK.NAME]) \\$*
-.if dPDF.EXPORT .tm .ds pdf:look(\\*[PDFBOOKMARK.NAME]) \\$*
+.ie d PRINTSTYLE \{\
+.   ds pdf:look(\\*[PDFBOOKMARK.NAME]) \\$*
+.   if dPDF.EXPORT .tm .ds pdf:look(\\*[PDFBOOKMARK.NAME]) \\$*
+.   \}
+.el \{\
+.   ds pdf:bm\\n[pdf:bm.nr].tag \\*[PDFBOOKMARK.NAME]
+.   if d PDF.EXPORT .tm .ds pdf:bm\\n[pdf:bm.nr].tag \\*[PDFBOOKMARK.NAME]
+.   \}
 .\"
 .\"
 .\" Irrespective of whether this marker is created, or not,
@@ -704,8 +734,14 @@ am solely responsible for any bugs I may have introduced 
into this file.
 .      ds PDFHREF.DESC \\$*
 .      \}
 .   el \{\
-.      ie dpdf:look(\\*[pdf:href-D]) .ds PDFHREF.DESC 
\\*[pdf:look(\\*[pdf:href-D])]
-.      el .ds PDFHREF.DESC Unknown
+.      ds PDFHREF.DESC Unknown
+.      ie d PRINTSTYLE \{\
+.         if dpdf:look(\\*[pdf:href-D]) .ds PDFHREF.DESC 
\\*[pdf:look(\\*[pdf:href-D])]
+.         \}
+.      el \{\
+.         pdf:lookup \\*[pdf:href-D]
+.         ie !'\\*[pdf:lookup-result]'' .ds PDFHREF.DESC 
\\*[\\*pdf:lookup-result]
+.         \}
 .      \}
 .   \" Apply border and colour specifications to the PDFMARK string
 .   \" definition, as required.

_______________________________________________
Groff-commit mailing list
Groff-commit@gnu.org
https://lists.gnu.org/mailman/listinfo/groff-commit

----- End forwarded message -----

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]