bug-groff
[bug #57594] sync hyphenation pattern files with TeX versions


From: G. Branden Robinson
Subject: [bug #57594] sync hyphenation pattern files with TeX versions
Date: Tue, 20 Jul 2021 22:57:47 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0

Follow-up Comment #9, bug #57594 (project groff):

Hi, Dave!

[comment #8 comment #8:]
> [comment #7 comment #7:]
> > Does groff read pattern files in Latin-1 encoding?
> 
> Answering my own question: it must read not only Latin-1, but other members
> of the ISO 8859 family as well, since tmac/hyphen.den
> <http://git.savannah.gnu.org/cgit/groff.git/tree/tmac/hyphen.den> and
> tmac/hyphen.det
> <http://git.savannah.gnu.org/cgit/groff.git/tree/tmac/hyphen.det> are already
> in Latin-1 encoding, tmac/hyphen.cs
> <http://git.savannah.gnu.org/cgit/groff.git/tree/tmac/hyphen.cs> is in
> Latin-2, and tmac/hyphen.fr
> <http://git.savannah.gnu.org/cgit/groff.git/tree/tmac/hyphen.fr> is in
> Latin-9.

Yes.  If the hyphenation pattern files contain non-ASCII characters, the
localization file loads latin{1,2,9}.tmac as appropriate before invoking `hpf`
to load the pattern file.
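
To illustrate the ordering, here is a minimal sketch of what the relevant part
of a localization file might look like for French.  It is not copied from the
shipped fr.tmac; the exact set and order of requests there may differ, and the
`hla` line is included only as an assumption about typical setup.

```roff
.\" Sketch of a localization file (assumed layout, not the shipped fr.tmac).
.\" Load the character-set translations first, so that 8-bit bytes in the
.\" pattern file are mapped to special characters while it is being read.
.mso latin9.tmac
.
.\" Select the hyphenation language (assumed), then load its patterns.
.hla fr
.hpf hyphen.fr
```

The point is only that the `mso` of the character-set file comes before `hpf`.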

These latin*.tmac files are each a long set of .trin requests to map the
printing characters from the upper half of ISO 8859 to the correct groff
special character escape sequences.
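
For a sense of the shape of these files, the lines below are an illustrative
handful of such translations for Latin-1.  The three entries are chosen as
examples (code points 223, 228, and 233 are sharp s, a with dieresis, and e
with acute accent in ISO 8859-1); consult the shipped latin1.tmac for the
actual, complete list.

```roff
.\" Illustrative entries only; the real file covers the whole upper half.
.trin \[char223]\[ss]
.trin \[char228]\[:a]
.trin \[char233]\['e]
```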

In principle, the localization files should arguably clear these translations
after reading the pattern files.

In practice, it doesn't seem to matter too much, since it seems likely that
users of the relevant locale will either already be using the corresponding
input character set (so the mappings won't be wrong for input characters in
the _document_), or they'll be using UTF-8 and will already have run their
input through preconv(1), which will bust it down to ASCII and special
character escapes.
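
As a rough picture of that second case, the sketch below shows a UTF-8 input
line in a comment and, beneath it, approximately what the formatter sees once
preconv has run.  The sample sentence is made up, and any bookkeeping lines
preconv may add to its output are omitted.

```roff
.\" Document source as typed, in UTF-8:
.\"   Un café, s'il vous plaît.
.\" Approximately what reaches the formatter after preconv:
Un caf\[u00E9], s'il vous pla\[u00EE]t.
```

No byte above 127 survives, so the 8-bit translations never get a chance to
misfire on document text.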

If I had to solve the "in principle" problem, I'd change the character-set
tmac files to move the existing .trin requests into a macro named something
like "load-charset", add another macro "unload-charset" that undoes the .trin
requests, and have the localization files call these macros in appropriate
places.
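
A minimal sketch of that idea follows, using only three Latin-1 code points to
keep it short.  Nothing like this exists in groff today: the macro names are
just the ones floated above, and the sketch assumes that translating a
character to itself cancels an earlier translation, as it does with `tr`.

```roff
.\" Hypothetical restructuring of a character-set file such as latin1.tmac.
.de load-charset
.  trin \[char223]\[ss]
.  trin \[char228]\[:a]
.  trin \[char233]\['e]
..
.
.de unload-charset
.  \" Map each character back to itself (assumed to cancel the mapping).
.  trin \[char223]\[char223]
.  trin \[char228]\[char228]
.  trin \[char233]\[char233]
..
.
.\" Hypothetical use in a localization file.
.load-charset
.hpf hyphen.den
.unload-charset
```

A real version would need distinct macro names per character set, or some
other way to keep latin1, latin2, and latin9 from clobbering one another.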

This mechanism is a bit of a dead end because the TeX hyph-utf8 project has
migrated to UTF-8, and decoding of multibyte character sequences is far beyond
the .trin request's capabilities.

Regards,
Branden

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?57594>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



