bug-groff
[bug #58962] Latin-1 NO-BREAK SPACE does not behave as documented


From: G. Branden Robinson
Subject: [bug #58962] Latin-1 NO-BREAK SPACE does not behave as documented
Date: Wed, 13 Apr 2022 21:42:31 -0400 (EDT)

Update of bug #58962 (project groff):

                  Status:                    None => In Progress            
             Assigned to:                    None => gbranden               

    _______________________________________________________

Follow-up Comment #5:

Hi Dave,

I believe I've cracked this.


$ xxd EXPERIMENTS/dave-58962.roff 
00000000: 2e69 6620 27a0 275c 7e27 202e 746d 2069  .if '.'\~' .tm i
00000010: 6e70 7574 2030 7841 3020 6d61 7463 6865  nput 0xA0 matche
00000020: 7320 5c5c 7e0a 2e69 6620 27ad 275c 2527  s \\~..if '.'\%'
00000030: 202e 746d 2069 6e70 7574 2030 7841 4420   .tm input 0xAD 
00000040: 6d61 7463 6865 7320 5c5c 250a            matches \\%.
$ ./build/troff -F ./build/font -F ./font -M ./build/tmac -M ./tmac \
    ./EXPERIMENTS/dave-58962.roff
input 0xA0 matches \~
input 0xAD matches \%
$ ./build/troff -F ./build/font -F ./font -T utf8 -M ./build/tmac \
    -M ./tmac ./EXPERIMENTS/dave-58962.roff
input 0xA0 matches \~
input 0xAD matches \%


(Not that the output device should really matter.)
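
For anyone wanting to reproduce the test, the 76-byte input file in the xxd dump above can be regenerated byte for byte. The following is a minimal sketch; the function names make_test_input and write_test_input are made up for illustration and are not part of groff.

```cpp
#include <fstream>
#include <string>

// Build the two-line test input from the xxd dump: each line compares a
// raw Latin-1 byte (0xA0 or 0xAD) against a groff escape (\~ or \%) and
// emits a diagnostic via .tm if they match.
// (Hypothetical helper; the name is an assumption.)
std::string make_test_input()
{
  std::string s;
  s += ".if '\xA0'\\~' .tm input 0xA0 matches \\\\~\n";
  s += ".if '\xAD'\\%' .tm input 0xAD matches \\\\%\n";
  return s;
}

// Write the bytes unmodified; binary mode avoids newline translation.
bool write_test_input(const std::string &path)
{
  std::ofstream out(path, std::ios::binary);
  out << make_test_input();
  return static_cast<bool>(out);
}
```

Calling write_test_input("dave-58962.roff") and running xxd on the result should give the same dump as above.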

It seems like these cases just weren't ever dealt with in the formatter's
input parser.  Maybe there was some dithering because the input encoding could
be either ISO or EBCDIC.

Here's the patch.


$ git diff
diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 36822033a..015c17a87 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -1,4 +1,4 @@
-/* Copyright (C) 1989-2020 Free Software Foundation, Inc.
+/* Copyright (C) 1989-2022 Free Software Foundation, Inc.
      Written by James Clark (jjc@jclark.com)
 
 This file is part of groff.
@@ -1743,6 +1743,29 @@ void token::next()
     int cc = input_stack::get(&n);
     if (cc != escape_char || escape_char == 0) {
     handle_normal_char:
+      // Handle no-break space and soft hyphen.
+      if (0x41 == 'A') { // ASCII/ISO 8859/Unicode
+       if (0xA0 == cc) {
+         type = TOKEN_STRETCHABLE_SPACE;
+         return;
+       }
+       else if (0xAD == cc) {
+         type = TOKEN_HYPHEN_INDICATOR;
+         return;
+       }
+      }
+      else if (0xC1 == 'A') { // code page 1047 (EBCDIC)
+       if (0x41 == cc) {
+         type = TOKEN_STRETCHABLE_SPACE;
+         return;
+       }
+       else if (0xCA == cc) {
+         type = TOKEN_HYPHEN_INDICATOR;
+         return;
+       }
+      }
+      else
+       fatal("unrecognized input character encoding");
       switch(cc) {
       case PUSH_GROFF_MODE:
        input_stack::save_compatible_flag(compatible_flag);
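
The byte-to-token mapping in the patch can also be sketched as a standalone, table-driven function, which makes both encoding branches testable on a single host. This is only an illustration under stated assumptions: special_kind, special_codes, classify, and host_codes are hypothetical names, not groff's token class or its TOKEN_* enumerators.

```cpp
// Hypothetical stand-ins for TOKEN_STRETCHABLE_SPACE and
// TOKEN_HYPHEN_INDICATOR; not groff's real token types.
enum class special_kind { none, stretchable_space, hyphen_indicator };

// Code points of NO-BREAK SPACE and SOFT HYPHEN in one encoding.
struct special_codes { unsigned char nbsp; unsigned char shy; };

constexpr special_codes iso_codes = { 0xA0, 0xAD };    // ISO 8859-1
constexpr special_codes cp1047_codes = { 0x41, 0xCA }; // EBCDIC cp1047

// Compile-time counterpart of the patch's fatal() fallback.
static_assert('A' == 0x41 || 'A' == 0xC1,
              "unrecognized input character encoding");

// Select the table from the execution character set, as the patch does.
constexpr special_codes host_codes()
{
  return ('A' == 0x41) ? iso_codes : cp1047_codes;
}

// Map an input byte to the special token it should produce, if any.
constexpr special_kind classify(int cc, special_codes codes)
{
  return cc == codes.nbsp ? special_kind::stretchable_space
       : cc == codes.shy ? special_kind::hyphen_indicator
       : special_kind::none;
}
```

Passing the table explicitly means the code page 1047 values can be checked on an ASCII machine, which the compile-time branch in the patch does not allow.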



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?58962>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



