bug-groff
[bug #62300] [preconv] does not handle U+00A0 (NBSP) as it should


From: G. Branden Robinson
Subject: [bug #62300] [preconv] does not handle U+00A0 (NBSP) as it should
Date: Tue, 12 Apr 2022 06:53:36 -0400 (EDT)

Follow-up Comment #2, bug #62300 (project groff):

Hi Bjarni,

[comment #1:]
> commit f47b7dd139525bf3b8b4fbe767c3a45816c8445a
> Author: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
> Date:   Sat Nov 17 15:59:09 2018 +0000
> 
>     The character \[u00A0] is not recognized
>     
>       The input character "no-break space" (' ', 0xA0) is mapped by "groff"
>     to '\~' (groff_char(7)), but only the character name '\[char160]' is
>     translated in the file "tmac/troffrc".

Yes.
     
>       The "preconv" translates the no-break space to the name '\[u00A0]'.

That was an error and is the subject of this ticket.
     
> diff --git a/tmac/troffrc b/tmac/troffrc
> index 1bd4aa8c9..8895a9a01 100644
> --- a/tmac/troffrc
> +++ b/tmac/troffrc
> @@ -33,10 +33,14 @@ troffrc!X100 troffrc!X100-12 troffrc!lj4 troff!lbp troffrc!html troffrc!pdf
>  .
>  .\" Test whether we work under EBCDIC and map the no-breakable space
>  .\" character accordingly.
> -.do ie '\[char97]'a' \
> +.do ie '\[char97]'a' \{\
>  .    do tr \[char160]\~
> -.el \
> +.    do tr \[u00A0]\~
> +.\}
> +.el \{\
>  .    do tr \[char65]\~
> +.    do tr \[u0041]\~
> +.\}
>  .
>  .\" Set the hyphenation language to 'us'.
>  .do hla us
> 

I'm not sure I agree with this patch.  It's preconv's job to produce valid
(GNU) troff _input_.  It was not doing so.

The input sequence '\[u00A0]' is _syntactically_ valid...but like '\[uFFFF]'
and '\[u0000]', it's not _meaningful_, and should be warned about.

Here is the patch I have pending.


diff --git a/src/preproc/preconv/preconv.cpp b/src/preproc/preconv/preconv.cpp
index 83feef8f7..b1027af17 100644
--- a/src/preproc/preconv/preconv.cpp
+++ b/src/preproc/preconv/preconv.cpp
@@ -404,9 +404,13 @@ unicode_entity(int u)
   if (u < 0x80)
     putchar(u);
   else {
-    // Handle soft hyphen specially -- it is an input character only,
-    // not a glyph.
-    if (u == 0xAD) {
+    // Handle no-break space and soft hyphen specially--they are input
+    // characters only, not glyphs.  See groff_char(7).
+    if (u == 0xA0) {
+      putchar('\\');
+      putchar('~');
+    }
+    else if (u == 0xAD) {
       putchar('\\');
       putchar('%');
     }
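
To make the intended mapping easy to check in isolation, here is a minimal
sketch of the same decision logic as a function that returns the troff input
instead of printing it.  The name entity_for() is hypothetical and does not
exist in preconv.  The '\[uXXXX]' fallback for every other code point is an
assumption based on the behavior described in comment #1, not something the
hunk above shows.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, not part of preconv: return the troff input that
// a Unicode code point would be converted to.
std::string entity_for(int u)
{
  // ASCII passes through unchanged.
  if (u < 0x80)
    return std::string(1, static_cast<char>(u));
  // No-break space and soft hyphen are input characters only, not
  // glyphs, so they map to the corresponding escape sequences.
  if (u == 0xA0)
    return "\\~";
  if (u == 0xAD)
    return "\\%";
  // Assumed fallback: a Unicode special character escape with at least
  // four uppercase hexadecimal digits.
  char buf[16];
  std::snprintf(buf, sizeof buf, "\\[u%04X]", static_cast<unsigned>(u));
  return buf;
}
```

With this sketch, U+00A0 yields the two characters '\~' rather than
'\[u00A0]', so no 'tr' request in troffrc is needed for preconv's output.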



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?62300>
