
[bug #62300] [preconv] does not handle U+00A0 (NBSP) as it should


From: G. Branden Robinson
Subject: [bug #62300] [preconv] does not handle U+00A0 (NBSP) as it should
Date: Mon, 11 Apr 2022 23:14:54 -0400 (EDT)

URL:
  <https://savannah.gnu.org/bugs/?62300>

                 Summary: [preconv] does not handle U+00A0 (NBSP) as it should
                 Project: GNU troff
            Submitted by: gbranden
            Submitted on: Tue 12 Apr 2022 03:14:52 AM UTC
                Category: Preprocessor preconv
                Severity: 3 - Normal
              Item Group: Incorrect behaviour
                  Status: In Progress
                 Privacy: Public
             Assigned to: gbranden
             Open/Closed: Open
         Discussion Lock: Any
         Planned Release: None

    _______________________________________________________

Details:

preconv handles the soft hyphen by translating it into an appropriate escape
sequence (\%), but does not do the same for the no-break space.  groff_char(7)
has long defined the semantics of both as input code points (for ISO
character encodings).


$ cat whaaa.man 
.TH ISO_8859-2 7 2014-10-02 "Linux" "Linux Programmer's Manual"
.TS
l l l c lp-1.
240     160     A0              NO-BREAK SPACE
255     173     AD      ­       SOFT HYPHEN
.TE
$ xxd whaaa.man
00000000: 2e54 4820 4953 4f5f 3838 3539 2d32 2037  .TH ISO_8859-2 7
00000010: 2032 3031 342d 3130 2d30 3220 224c 696e   2014-10-02 "Lin
00000020: 7578 2220 224c 696e 7578 2050 726f 6772  ux" "Linux Progr
00000030: 616d 6d65 7227 7320 4d61 6e75 616c 220a  ammer's Manual".
00000040: 2e54 530a 6c20 6c20 6c20 6320 6c70 2d31  .TS.l l l c lp-1
00000050: 2e0a 3234 3009 3136 3009 4130 09c2 a009  ..240.160.A0....
00000060: 4e4f 2d42 5245 414b 2053 5041 4345 0a32  NO-BREAK SPACE.2
00000070: 3535 0931 3733 0941 4409 c2ad 0953 4f46  55.173.AD....SOF
00000080: 5420 4859 5048 454e 0a2e 5445 0a         T HYPHEN..TE.
$ groff -t -kz -man whaaa.man # groff 1.22.4
troff: whaaa.man:4: warning: can't find special character 'u00A0'
$ ./build/test-groff -ww -t -kz -man whaaa.man # groff Git HEAD
troff:whaaa.man:4: warning: can't find special character 'u00A0'
$ preconv whaaa.man # groff 1.22.4 and Git HEAD
.lf 1 whaaa.man
.TH ISO_8859-2 7 2014-10-02 "Linux" "Linux Programmer's Manual"
.TS
l l l c lp-1.
240     160     A0      \[u00A0]        NO-BREAK SPACE
255     173     AD      \%      SOFT HYPHEN
.TE


preconv should instead write \~ to its output, as groff_char(7) documents,
even in groff 1.22.4.


       160    the ISO latin1 no‐break space is mapped to ‘\~’, the
              stretchable space character.

       173    the soft hyphen control character.  groff never uses
              this character for output (thus it is omitted in the
              table below); the input character 173 is mapped onto
              ‘\%’.
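
The mapping documented above can be sketched in a few lines.  This is an
illustrative Python sketch only, not preconv's actual code (preconv is written
in C++); the names SPECIAL_INPUT, encode_char, and encode_line are
hypothetical.  It assumes that every other non-ASCII code point is written as
a \[uXXXX] special character escape sequence, as in the preconv output shown
earlier.

```python
# Hypothetical sketch of the desired per-character translation.
# Not preconv's real implementation; all names here are made up.

# Input code points that groff_char(7) maps to dedicated escape sequences.
SPECIAL_INPUT = {
    0x00A0: r"\~",  # no-break space -> stretchable, unbreakable space
    0x00AD: r"\%",  # soft hyphen -> hyphenation control escape
}


def encode_char(ch: str) -> str:
    """Translate one input character into groff input text."""
    cp = ord(ch)
    if cp in SPECIAL_INPUT:
        return SPECIAL_INPUT[cp]
    if cp < 0x80:
        # ASCII passes through unchanged.
        return ch
    # Assumption: everything else becomes \[uXXXX] (at least 4 hex digits).
    return "\\[u%04X]" % cp


def encode_line(line: str) -> str:
    """Translate a whole line, one character at a time."""
    return "".join(encode_char(ch) for ch in line)
```

With such a mapping, the table row containing U+00A0 would come out with \~
in place of \[u00A0], while the soft hyphen row would keep its \%.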


The diagnostic itself is not the problem: many Unicode code points are not
valid groff input, and expressing them as special character escape sequences
does not change that fact.  The problem is that preconv does not apply the
documented remapping of U+00A0.

Working on this.




    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?62300>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



