From: G. Branden Robinson
Subject: [bug #66165] commit dcae60b0fb1ad3fa3314fdfdbecb973961a40410 has wrong assumption
Date: Thu, 5 Sep 2024 21:40:42 -0400 (EDT)
Follow-up Comment #6, bug #66165 (group groff):
[trying savane-based email reply to bug]
At 2024-09-05T19:30:38-0400, Deri James wrote:
> Follow-up Comment #4, bug #66165 (group groff):
>
> You seem to have drifted away from the point of this bug report which
> was specifically to point out this code change may cause problems:-
Thanks for quoting the specific piece of the commit that worries you.
> // We want to represent ordinary characters that normally map to
> // non-basic Latin code points in a way that is compatible with how
> // they're typeset, to avoid confusion when these characters are
> // used in ways that are ultimately visible, as in tag names for PDF
> // bookmarks, which can appear in a viewer's navigation pane.
> if ('\'' == c)
> mac->append_str("\\[u2019]");
> else if ('-' == c)
> mac->append_str("\\[u2010]");
> else if ('^' == c)
> mac->append_str("\\[u0302]");
> else if ('`' == c)
> mac->append_str("\\[u0300]");
> else if ('~' == c)
> mac->append_str("\\[u0303]");
> else if (c == escape_char)
> mac
>
> My reading of this change is that you are doing the following
> transformations before sending the text to output drivers:-
>
> ' U+0027 APOSTROPHE -> ’ U+2019 RIGHT SINGLE QUOTATION MARK
> - U+002D HYPHEN-MINUS -> ‐ U+2010 HYPHEN
> ^ U+005E CIRCUMFLEX ACCENT -> (ab̂c) U+0302 COMBINING CIRCUMFLEX ACCENT
> ` U+0060 GRAVE ACCENT -> (ab̀c) U+0300 COMBINING GRAVE ACCENT
> ~ U+007E TILDE -> (ab̃c) U+0303 COMBINING TILDE
>
> The combining versions are completely wrong,
I wondered a bit about that at the time.
> let's see what it looks like using them:-
> x X ps:exec [/Dest /pdf:bm1 /View [/FitH \[u2010]67000 u] /DEST pdfmark
> ^ causes error line 1490 gropdf
Yes, that looks dubious.
> x X ps:exec [/Dest /pdf:bm1 /Title (My\[u0303]Fave\[u0303]Pic) /Level 1 /OUT
> pdfmark
> ^ tilde changed to combining tilde
As does this.
> x font 36 TB
> f36
> s12000
> tMy~F
> ^ note use of asciitilde here
groff doesn't call anything "asciitilde", but I know what you mean,
U+007E.
> The first thing to notice is the very strange bookmark. Then wonder
> why, in the text of the document, the given asciitilde has become a "˜
> U+02DC SMALL TILDE". The answer is in the afmtodit.tables file:-
>
> "tilde", "02DC",
>
> And the TR font file which has:-
>
> ~ 333,638 2 126 tilde -- 02DC
> a~ "
>
> Which shows that the groff glyph names "~" (and \[a~]) are mapped to
> the postscript glyph named tilde and that has unicode \[u02DC]. So
> this is the unicode you should have used in this commit, and I suspect
> similar mistakes with the other combining glyphs:-
>
> ^ 333,674 2 94 circumflex -- 02C6
> a^ "
> ga 333,678 2 146 grave -- 0060
>
> So grave stays as \[u0060], so I'm not sure what you were thinking.
I was overapplying this table from groff_char(7):
┌───────────────────────────────────────────────────────────────────┐
│ Keycap   Appearance and meaning   Special character and meaning   │
├───────────────────────────────────────────────────────────────────┤
│ "        " neutral double quote   \[dq] neutral double quote      │
│ '        ’ closing single quote   \[aq] neutral apostrophe        │
│ -        ‐ hyphen                 \- or \[-] minus sign/Unix dash │
│ \        (escape character)       \e or \[rs] reverse solidus     │
│ ^        ˆ modifier circumflex    \[ha] circumflex/caret/“hat”    │
│ `        ‘ opening single quote   \(ga grave accent               │
│ ~        ˜ modifier tilde         \[ti] tilde                     │
└───────────────────────────────────────────────────────────────────┘
...forgetting that character names get remapped again on their way out
of troff into the grout commands.
And indeed the outcome is not always the same.
$ ~/groff-stable/bin/groff --version | head -n 1; \
  for dev in dvi html lbp lj4 pdf ps X75 X100 ascii latin1 utf8; do \
    printf "$dev "'foo~bar\n' | ~/groff-stable/bin/groff -T $dev -Z | grep '^[Ct]'; \
  done
GNU groff version 1.23.0
tdvi
tfo
to~bar
thtml
tfoo~bar
tlbp
tfoo~bar
tlj4
tf
too~bar
tpdf
tfoo~bar
tps
tfoo~bar
Cti
Cti
tascii
tfoo~bar
tlatin1
tfoo~bar
tutf8
tfoo~bar
(The X11 devices conceal their names because they don't use the 't'
command. But we can still see that they choose the special character
'ti'.)
My objective is to use the same glyphs in the bookmarks/metadata as
appear in the document text, so that people populating strings with
section headings and similar in documents will see the same thing in
both places.
It'd be nice to have automated tests for this, wouldn't it? ;-)
Regards,
Branden
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?66165>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/