From: G. Branden Robinson
Subject: [bug #64484] [troff] \X escape sequence should read its argument in (something like) copy mode
Date: Mon, 26 Aug 2024 19:52:58 -0400 (EDT)

Update of bug #64484 (group groff):

                 Summary: [troff] \X escape sequence should read its argument
in copy mode => [troff] \X escape sequence should read its argument in
(something like) copy mode

    _______________________________________________________

Follow-up Comment #15:

Hi Deri,

[comment #14:]
> I am a bit concerned about this. pdf.tmac contains various .device commands,
> which, if replaced by \X stop it working properly.

An understandable concern; one of the reasons this feature is taking a while
to land is that I am trying to figure out what the true operational semantics
of these direct-grout-generating requests and escape sequences are.

Theoretically, there are four different ways to inject stuff into
device-independent output.  (And that's leaving aside the `cf` and `trf`
requests, so there are six.  At least.)


\X'this is a device control command'
.br
.device this is a device control command
.br
\!x X this is a device control command
.br
.output x X this is a device control command


Are all of these equivalent?  Not quite--there are subtleties involving line
breaks, and
[https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/node.cpp?h=1.23.0#n880
even deeper ones involving state transitions of drawing parameters].

In my opinion, a device control command per se shouldn't imply a change of any
drawing parameters.  If there is a need to record the fact that a device
control command "dirtied" the drawing position, font selection, color
configuration, and so on, there should be some separate mechanism of telling
the formatter that.
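
To make that separation concrete, here is a minimal sketch. It is not groff code: the `GroutWriter` class and its member names are hypothetical, and the "resync" line is only a placeholder for whatever font, position, and color commands a real formatter would re-emit. The point is only that emitting a device control command and invalidating the drawing state are two distinct operations.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model, not groff code: device control commands and
// state invalidation are two separate operations.
class GroutWriter {
public:
  // Emit "x X ..." and deliberately leave the drawing state alone.
  void device_control(const std::string &text) {
    lines_.push_back("x X " + text);
  }

  // Explicit, separate signal that position, font, or color may no
  // longer match what the formatter last emitted.
  void mark_state_dirty() { dirty_ = true; }

  // Emit a word; resynchronize first only if something marked the
  // state dirty.
  void put_word(const std::string &word) {
    if (dirty_) {
      // Placeholder comment line; a real formatter would re-emit its
      // font, position, and color commands here.
      lines_.push_back("# resync drawing parameters");
      dirty_ = false;
    }
    lines_.push_back("t" + word);
  }

  bool dirty() const { return dirty_; }
  const std::vector<std::string> &lines() const { return lines_; }

private:
  bool dirty_ = false;
  std::vector<std::string> lines_;
};
```

In this model, a macro package that knows its device command moved the drawing position would call `mark_state_dirty()` itself; a plain `device_control()` call never triggers a resync.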

This is causing me frustration right now with "sboxes.tmac"; maybe the `fl`
request should be given some kind of state-dirtying semantics.  (At present,
and historically, it doesn't do that.)  I won't push until I have it sorted
out; I'm trying to avoid asking you to change anything in any macro package.

> I have shown previously that gropdf is quite happy to receive groff nodes as
> 7 bit ascii i.e. the character "â" can be sent to gropdf as \[u00E2]
> (preconv) or \[^a] (groff special), both 7 bit clean. However, \X blocks both
> these uses with an error, which is a little confusing to a user because it
> refers to "special character '\^a'" even if \[u00E2] appears in the input to
> groff.

I agree that that's confusing.  You might be pleased to know of a change I
have pending.


diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 89f4518c1..d08fe5e4c 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -5829,6 +5829,7 @@ static node *do_device_control() // \X
   return new special_node(mac);
 }
 
+# if 0
 static void device_request()
 {
   if (!has_arg(true /* peek; we want to read in copy mode */)) {
@@ -5849,15 +5850,49 @@ static void device_request()
   }
   if (curdiv == topdiv && topdiv->before_first_page)
     topdiv->begin_page();
-  // Null characters can correspond to node types like vmotion_node that
-  // are unrepresentable in a device control command, and got scrubbed
-  // by `asciify`.
-  for (; c != '\0' && c != '\n' && c != EOF;
+  for (; c != '\n' && c != EOF;
        c = get_copy(0 /* nullptr */))
-    mac.append(c);
+    encode_character_for_device_output(&mac, c);
   curenv->add_node(new special_node(mac));
   tok.next();
 }
+#endif
+
+static void device_request()
+{
+  if (!has_arg()) {
+    warning(WARN_MISSING, "device control request expects arguments");
+    skip_line();
+    return;
+  }
+  macro mac;
+  while (tok.is_space() || tok.is_tab())
+    tok.next();
+  if ('"' == tok.ch())
+    tok.next();
+  for (;;) {
+    unsigned char c;
+    if (tok.is_newline() || tok.is_eof())
+      break;
+    if (tok.is_space())
+      c = ' ';
+    else if (tok.is_tab())
+      c = '\t';
+    else if (tok.is_leader())
+      c = '\001';
+    else if (tok.is_backspace())
+      c = '\b';
+    else
+      c = tok.ch();
+    //assert(c != 0); // XXX: a node?
+    encode_character_for_device_output(&mac, c);
+    tok.next();
+  }
+  if (curdiv == topdiv && topdiv->before_first_page)
+    topdiv->begin_page();
+  curenv->add_node(new special_node(mac));
+  skip_line();
+}
 
 static void device_macro_request()
 {


I've enhanced a
[https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/groff/tests/device-control-special-character-handling.sh
regression test I added in January] that attempts to ensure that processing of
various special character escape sequences comes through in device-independent
output unmolested.

Here's the test input:


input='.
.nf
\X#bogus1: esc \%to-do\[u1F63C]\\[u1F00]
-\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]\[`a]#
.device bogus1: req \%to-do\[u1F63C]\\[u1F00]
-\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]\[`a]
.ec @
@X#bogus2: esc @%to-do@[u1F63C]@@[u1F00]
-@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]@[`a]##
.device bogus2: req @%to-do@[u1F63C]@@[u1F00]
-@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]@[`a]
.'


...and the results.[1]


$ (cd build &&
../src/roff/groff/tests/device-control-special-character-handling.sh)
x X bogus1: esc to-do\[u1F00] -'"`^\~
x X bogus1: req @%to-do\[u1F63C]\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]\[`a]
x X bogus2: esc to-do\[u1F00] -'"`^\~
x X bogus2: req @%to-do@[u1F63C]@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]@[`a]
troff:<standard input>:2: error: special character 'u1F63C' cannot be used within a device control escape sequence
troff:<standard input>:2: error: special character '`a' cannot be used within a device control escape sequence
troff:<standard input>:5: error: special character 'u1F63C' cannot be used within a device control escape sequence
troff:<standard input>:5: error: special character '`a' cannot be used within a device control escape sequence
checking X escape sequence, default escape character
...FAILED
checking X escape sequence, alternate escape character
...FAILED
checking for errors on unsupported special character escapes


That doesn't look good, but when I add some code...


diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 229e7956e..041a455e7 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -5832,6 +5832,7 @@ static node *do_device_control() // \X
   return new special_node(mac);
 }
 
+# if 0
 static void device_request()
 {
   if (!has_arg(true /* peek; we want to read in copy mode */)) {
@@ -5852,15 +5853,49 @@ static void device_request()
   }
   if (curdiv == topdiv && topdiv->before_first_page)
     topdiv->begin_page();
-  // Null characters can correspond to node types like vmotion_node that
-  // are unrepresentable in a device control command, and got scrubbed
-  // by `asciify`.
-  for (; c != '\0' && c != '\n' && c != EOF;
+  for (; c != '\n' && c != EOF;
        c = get_copy(0 /* nullptr */))
-    mac.append(c);
+    encode_character_for_device_output(&mac, c);
   curenv->add_node(new special_node(mac));
   tok.next();
 }
+#endif
+
+static void device_request()
+{
+  if (!has_arg()) {
+    warning(WARN_MISSING, "device control request expects arguments");
+    skip_line();
+    return;
+  }
+  macro mac;
+  while (tok.is_space() || tok.is_tab())
+    tok.next();
+  if ('"' == tok.ch())
+    tok.next();
+  for (;;) {
+    unsigned char c;
+    if (tok.is_newline() || tok.is_eof())
+      break;
+    if (tok.is_space())
+      c = ' ';
+    else if (tok.is_tab())
+      c = '\t';
+    else if (tok.is_leader())
+      c = '\001';
+    else if (tok.is_backspace())
+      c = '\b';
+    else
+      c = tok.ch();
+    //assert(c != 0); // XXX: a node?
+    encode_character_for_device_output(&mac, c);
+    tok.next();
+  }
+  if (curdiv == topdiv && topdiv->before_first_page)
+    topdiv->begin_page();
+  curenv->add_node(new special_node(mac));
+  skip_line();
+}
 
 static void device_macro_request()
 {


...and with which the `BOXSTART` macro is unhappy (the page background turns
completely black), I get the following.


$ (cd build &&
../src/roff/groff/tests/device-control-special-character-handling.sh)
x X bogus1: esc to-do\[u1F63C]\[u1F00] -'"`^\~\[u00E0]
x X bogus1: req to-do\[u1F63C]\[u1F00] -'"`^\~\[u00E0]
x X bogus2: esc to-do\[u1F63C]\[u1F00] -'"`^\~\[u00E0]
x X bogus2: req to-do\[u1F63C]\[u1F00] -'"`^\~\[u00E0]

checking X escape sequence, default escape character
checking X escape sequence, alternate escape character
checking for errors on unsupported special character escapes


That's just miles better.
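
The mapping visible in that second run can be summarized with a small table. The sketch below is illustrative only: `render_special_for_device` is a hypothetical helper, not the `encode_character_for_device_output` function from the patch, and its table covers just the special characters that appear in the test input. Names it does not know are passed through in `\[...]` form, which matches what the output above shows for `u1F63C` and `u1F00`.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper (not groff's implementation): render a special
// character name the way the patched test output above shows it.
std::string render_special_for_device(const std::string &name) {
  static const std::map<std::string, std::string> table = {
    {"aq", "'"},          // apostrophe
    {"dq", "\""},         // double quote
    {"ga", "`"},          // grave accent
    {"ha", "^"},          // circumflex
    {"rs", "\\"},         // reverse solidus
    {"ti", "~"},          // tilde
    {"`a", "\\[u00E0]"},  // a with grave, as a Unicode escape
  };
  auto it = table.find(name);
  if (it != table.end())
    return it->second;
  // Anything else, including uXXXX names, stays in \[...] form.
  return "\\[" + name + "]";
}
```

Feeding it `aq`, `dq`, `ga`, `ha`, `rs`, `ti`, and `` `a `` in order reproduces the tail of each `x X` line above.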

> If I understand correctly, your plan is linked to the work you have been
> doing with filenames (bug #65108, comment 3), which outlines your parsing
> rules. The restriction in your rule 5d would prevent "strings" of characters
> in other languages to be used. Many file systems allow utf-8 in filenames:-

> -rw-r--r-- 1 derij derij         0 Aug 26 21:40  αβγ.greek


> Which would fail 5d, and would prevent pdf bookmarks in any language except
> basic latin or latin-1 supplement, if these rules are extended to .device.
> 
> Have I understood this correctly?

I'm not sure.  I don't think so.

Running my working copy with the above BOXSTART-breaking patch applied, I can do
this.


$ cat ATTIC/for-deri.man 
.TH for\-deri 1 2024-08-26 "a demo for Deri"
.SH Name
for\-deri \- a sample command
.SH Description
What program requires documentation?
.SH αβγ.greek
That was some Greek,
and it will end up in a device control command when we format this man
page for PDF.
$ ./build/test-groff -K utf8 -man -T pdf -Z ATTIC/for-deri.man | grep '^x X'
x X ps:exec [/Dest /for\-deri(1) /View [/FitH -26000 u] /DEST pdfmark
x X ps:exec [/Dest /for\-deri(1) /Title (for\-deri(1)) /Level 1 /OUT pdfmark
x X pdf: markrestart
x X ps:exec [/Dest /pdf:bm2 /View [/FitH -57000 u] /DEST pdfmark
x X ps:exec [/Dest /pdf:bm2 /Title (Name) /Level 2 /OUT pdfmark
x X devtag:.NH 1
x X devtag:.eo.h
x X ps:exec [/Dest /pdf:bm3 /View [/FitH -85800 u] /DEST pdfmark
x X ps:exec [/Dest /pdf:bm3 /Title (Description) /Level 2 /OUT pdfmark
x X devtag:.NH 1
x X devtag:.eo.h
x X ps:exec [/Dest /pdf:bm4 /View [/FitH -114600 u] /DEST pdfmark
x X ps:exec [/Dest /pdf:bm4 /Title (\[u03B1]\[u03B2]\[u03B3].greek) /Level 2 /OUT pdfmark
x X devtag:.NH 1
x X devtag:.eo.h
x X pdf: marksuspend


...so the Greek seems to show up fine.
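
For readers unfamiliar with the `\[uXXXX]` notation in that output, the following sketch shows the general idea of turning UTF-8 text into a 7-bit-clean form. It is an illustration under stated assumptions, not the code of preconv or troff: the function name `to_seven_bit` is made up, the input is assumed to be well-formed UTF-8, and no validation is attempted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Illustrative only: rewrite each non-ASCII UTF-8 sequence as a
// \[uXXXX] escape.  Assumes well-formed UTF-8 input.
std::string to_seven_bit(const std::string &utf8) {
  std::string out;
  for (std::size_t i = 0; i < utf8.size();) {
    unsigned char b = static_cast<unsigned char>(utf8[i]);
    if (b < 0x80) {                  // plain ASCII passes through
      out += static_cast<char>(b);
      ++i;
      continue;
    }
    // Count continuation bytes from the lead byte.
    int extra = (b >= 0xF0) ? 3 : (b >= 0xE0) ? 2 : 1;
    if (i + extra >= utf8.size())    // truncated sequence: stop
      break;
    unsigned long cp = b & (0x3F >> extra);
    for (int k = 1; k <= extra; ++k)
      cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
    char buf[16];
    std::snprintf(buf, sizeof buf, "\\[u%04lX]", cp);
    out += buf;
    i += extra + 1;
  }
  return out;
}
```

Applied to the UTF-8 bytes of the section heading in the man page above, this yields the same `\[u03B1]\[u03B2]\[u03B3].greek` text that appears in the bookmark title.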

Actually, the "grout" output is unchanged from what's on Savannah's HEAD right
now.  So, to that extent, my plans are to _not_ break what you're afraid I'm
going to break.

I think.

Does this illuminate things?

Despite this ticket's postponed status, it might end up fixed as part of
getting bug #63074 over the finish line.  But only as much of it as I need for
that purpose.  Time will tell if that's the whole enchilada for this ticket.

Regards,
Branden

[1] In case anyone's curious what older _groffs_ did with that...


$ (cd build &&
../src/roff/groff/tests/device-control-special-character-handling.sh)
GNU groff version 1.23.0
x X bogus1: esc to-do\[u1F00] -'"`^\~
x X bogus1: req @%to-do\[u1F63C]\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]\[`a]
x X bogus2: esc to-do@[u1F00] -'"`^\~
x X bogus2: req @%to-do@[u1F63C]@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]@[`a]

checking X escape sequence, default escape character
...FAILED
checking X escape sequence, alternate escape character
...FAILED
checking for errors on unsupported special character escapes

$ (cd build &&
../src/roff/groff/tests/device-control-special-character-handling.sh)
GNU groff version 1.22.4
x X bogus1: esc to-do\[u1F00] -
x X bogus1: req @%to-do\[u1F63C]\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]\[`a]
x X bogus2: esc to-do@[u1F00] -
x X bogus2: req @%to-do@[u1F63C]@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]@[`a]
troff: <standard input>:3: a special character is invalid within \X
troff: <standard input>:3: a special character is invalid within \X
troff: <standard input>:3: a special character is invalid within \X
troff: <standard input>:3: a special character is invalid within \X
troff: <standard input>:3: a special character is invalid within \X
troff: <standard input>:3: a special character is invalid within \X
troff: <standard input>:3: a special character is invalid within \X
troff: <standard input>:3: a special character is invalid within \X
troff: <standard input>:6: a special character is invalid within \X
troff: <standard input>:6: a special character is invalid within \X
troff: <standard input>:6: a special character is invalid within \X
troff: <standard input>:6: a special character is invalid within \X
troff: <standard input>:6: a special character is invalid within \X
troff: <standard input>:6: a special character is invalid within \X
troff: <standard input>:6: a special character is invalid within \X
troff: <standard input>:6: a special character is invalid within \X
checking X escape sequence, default escape character
...FAILED
checking X escape sequence, alternate escape character
...FAILED
checking for errors on unsupported special character escapes




    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?64484>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
