Re: an observation and proposal about hyphenation codes
From: Werner LEMBERG
Subject: Re: an observation and proposal about hyphenation codes
Date: Thu, 01 Aug 2024 04:50:39 +0000 (UTC)
> A fact I found noteworthy about how GNU troff actually sets up
> hyphenation codes is that the equivalence classes it is designed to
> support _are almost never used_ beyond lettercase coalescence.[1]
Yes. As originally intended in TeX (which groff closely follows), the
`.hcode` mechanism is essentially used for 'downcasing'.
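To make the 'downcasing' role concrete, here is a minimal Python model. It is not groff's implementation: the table `HCODES` and the function `hyphenation_key` are hypothetical names. Each letter maps to a hyphenation code (uppercase letters to their lowercase counterparts), and a word is reduced to its code string before the patterns are consulted.

```python
import string

# Hypothetical hyphenation-code table: lowercase ASCII letters map to
# themselves and uppercase letters to their lowercase counterparts.
# Characters absent from the table have no hyphenation code.
HCODES = {c: c for c in string.ascii_lowercase}
HCODES.update({c.upper(): c for c in string.ascii_lowercase})


def hyphenation_key(word, hcodes=HCODES):
    """Return the string used for pattern lookup, or None if `word`
    contains a character without a hyphenation code (in this simplified
    model such a word is left unhyphenated)."""
    key = []
    for ch in word:
        code = hcodes.get(ch)
        if code is None:
            return None
        key.append(code)
    return "".join(key)


print(hyphenation_key("Hyphenation"))  # -> hyphenation
print(hyphenation_key("HYPHENATION"))  # -> hyphenation
print(hyphenation_key("na\u00efve"))   # -> None ('ï' has no code here)
```

Because both capitalizations produce the same key, a single set of lowercase patterns serves both.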
> [1] "Almost never". So what's an exception?
>
> tmac/ps.tmac:
>
> .fchar \[S ,] \o'S\[ac]'
> .hcode \[S ,]s
> .fchar \[s ,] \o's\[ac]'
> .hcode \[s ,]s
I can no longer remember why I mapped the two Romanian characters
to 's' for hyphenation. From today's point of view this looks like a
mistake; I should have documented my reasoning in the ChangeLog entry,
but I did not...
Commit c65ea0c8f, which introduced this, is from 2003, and at that
time it was common to use 'ş' (s with cedilla) and 'ș' (s with comma
below) interchangeably in Romanian, although only the latter is
correct today.
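The effect of such a mapping can be sketched in Python. This assumes that both the comma-below and the cedilla variants receive the code 's'; the quoted ps.tmac lines only cover the comma-below forms, and `HCODES` and `key` are hypothetical names. The two spellings are distinct Unicode strings but produce the same hyphenation key.

```python
import string
import unicodedata

# Assumed codes: ASCII letters downcase as usual, and both the comma-below
# and the cedilla forms of s/S collapse to plain 's'.
HCODES = {c: c.lower() for c in string.ascii_letters}
for ch in "\u0218\u0219\u015e\u015f":  # Ș ș Ş ş
    HCODES[ch] = "s"


def key(word):
    """Hyphenation lookup key; raises KeyError for characters without a code."""
    return "".join(HCODES[ch] for ch in word)


comma_form = "Bucure\u0219ti"    # s with comma below (correct today)
cedilla_form = "Bucure\u015fti"  # s with cedilla (common in 2003)

print(unicodedata.name("\u0219"))            # LATIN SMALL LETTER S WITH COMMA BELOW
print(unicodedata.name("\u015f"))            # LATIN SMALL LETTER S WITH CEDILLA
print(comma_form == cedilla_form)            # False
print(key(comma_form) == key(cedilla_form))  # True: both are 'bucuresti'
```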
> [...] You will observe that most languages declare hyphenation codes
> only for standard letters in their alphabets. For example, Czech
> omits the Polish letter ł, even though that letter is present in the
> ISO 8859-2 encoding that the localization file requires.
>
> Except German. German goes ahead and eats every letter in the
> Latin-1 supplement even though many of them are unknown in pure
> German orthography. (Any language can employ loan words, of
> course.)
[Wearing my hat as maintainer of the German hyphenation patterns.]
This corresponds to the setup in
https://repo.or.cz/wortliste.git/blob/HEAD:/daten/german.tr
used to generate the German hyphenation patterns. Many of those
letters are indeed used. Our wordlist at
https://repo.or.cz/wortliste.git/blob/HEAD:/wortliste
contains entries like
  Abbé;Ab-bé
  Œuvre;Œu-vre
  Señor;Se-ñor
  Strømfjord;Strøm=fjord
  Tête;Tête
  Zaïre;Za-ï-re
to name a few.
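The sample entries pair a word with its hyphenated form, separated by a semicolon. The following Python sketch extracts the break positions. It assumes only what these lines show: '-' marks an ordinary break, and '=' (as in `Strøm=fjord`) presumably marks a break at a compound boundary. The real wordlist may use further fields and markers, which this sketch ignores. `break_points` is a hypothetical helper, and the words are written with escapes so that Unicode normalization cannot shift the offsets.

```python
def break_points(entry):
    """Parse a 'word;hyphenated-form' line and return (word, positions),
    where each position is the number of characters of `word` before an
    allowed break. Assumes '-' and '=' are the only markers used."""
    word, hyphenated = entry.split(";", 1)
    positions = []
    offset = 0
    for ch in hyphenated:
        if ch in "-=":
            positions.append(offset)  # a break is allowed at this offset
        else:
            offset += 1
    # Sanity check: removing the markers must give back the word itself.
    if hyphenated.replace("-", "").replace("=", "") != word:
        raise ValueError(f"hyphenated form does not match {word!r}")
    return word, positions


# Abbé, Strømfjord, Tête, Zaïre
for line in ["Abb\u00e9;Ab-b\u00e9", "Str\u00f8mfjord;Str\u00f8m=fjord",
             "T\u00eate;T\u00eate", "Za\u00efre;Za-\u00ef-re"]:
    print(*break_points(line))
```

For the patterns to use these break points, every letter involved, including 'é', 'ø', and 'ï', needs a hyphenation code.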
> In GNU troff, hyphenation codes are _global_. They are not
> dependent on the hyphenation _language_ selected with the `hla`
> request, and which is a property of a GNU troff _environment_.
The reason is groff's age: it closely follows TeX's hyphenation
algorithm, which was designed before Unicode existed.
> [...] the document author must write some macro that reconfigures
> the hyphenation codes as needed when switching environments.
Yes. Again, this follows TeX, which can't change hyphenation code
values within a paragraph (i.e., the values in effect at the end of
the paragraph are used to hyphenate all of it). IIRC, even today only
LuaTeX has removed this restriction.
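The arrangement discussed here, with a global hyphenation-code table but a per-environment hyphenation language, can be modeled in a few lines of Python. This is a hypothetical sketch, not groff code. `LANGUAGE_HCODES`, `Environment`, `enter`, and `key` are invented names, and `enter` stands in for the macro a document author would have to write to reinstall the codes whenever the environment changes.

```python
import string
from dataclasses import dataclass

# Hypothetical per-language code assignments (tiny excerpts for illustration).
ASCII = {c: c.lower() for c in string.ascii_letters}
LANGUAGE_HCODES = {
    "en": dict(ASCII),
    "de": {**ASCII,
           "\u00e4": "\u00e4", "\u00c4": "\u00e4",   # ä Ä
           "\u00f6": "\u00f6", "\u00d6": "\u00f6",   # ö Ö
           "\u00fc": "\u00fc", "\u00dc": "\u00fc",   # ü Ü
           "\u00df": "\u00df"},                      # ß
}

# A single, global table, as hyphenation codes are in GNU troff.
hcodes = dict(LANGUAGE_HCODES["en"])


@dataclass
class Environment:
    name: str
    language: str  # per-environment, like the setting made with `hla`


def enter(env):
    """Hypothetical stand-in for a user macro: make the global table
    match the language of the environment being entered."""
    hcodes.clear()
    hcodes.update(LANGUAGE_HCODES[env.language])


def key(word):
    """Lookup key for `word`, or None if some character has no code."""
    if all(ch in hcodes for ch in word):
        return "".join(hcodes[ch] for ch in word)
    return None


word = "Gr\u00f6\u00dfe"  # 'Größe'
enter(Environment("body", "en"))
print(key(word))  # None: 'ö' and 'ß' have no code in the English table
enter(Environment("quote", "de"))
print(key(word))  # größe
```

Nothing ties `hcodes` to an environment, so without the explicit `enter` call the English table would remain in effect after switching to the German environment.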
> That seems like a problem to me--or it would, except that no one has
> complained about our hyphenation being bad for non-English
> languages.
The intersection of people who use groff, people who write
multilingual documents that need different scripts in a single
paragraph, and people who are typographically aware and pay attention
to bad or missing hyphenation is probably very small...
> However, in the meantime, meaning for groff 1.24, I propose to move
> `hcode` definitions to where they make more sense: the character set
> macro files "koi8-r.tmac", "latin1.tmac", "latin2.tmac", and
> "latin9.tmac". (If/when I do that, I'll need to update the
> "tmac/LOCALIZATION" file accordingly.)
Probably a good idea. The few cases where this has to be changed
(classic example: Turkish needs 'İ' mapped to 'i' and 'I' mapped to
'ı') can be overridden in a language-specific hyphenation setup.
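A Python sketch of that layering, assuming character-set defaults that a language-specific setup may override (all names are hypothetical): the Turkish entries replace the generic 'I' to 'i' mapping. Python's own `str.lower` shows why generic downcasing is not enough here. It maps 'I' to 'i', and it maps 'İ' to 'i' followed by U+0307 COMBINING DOT ABOVE.

```python
import string

# Hypothetical character-set defaults: plain downcasing of ASCII letters.
CHARSET_HCODES = {c: c.lower() for c in string.ascii_letters}

# Hypothetical Turkish overrides: dotted and dotless i are separate letters,
# so 'I' pairs with dotless 'ı' and dotted 'İ' pairs with 'i'.
TURKISH_OVERRIDES = {
    "I": "\u0131",       # I -> ı
    "\u0131": "\u0131",  # ı -> ı
    "\u0130": "i",       # İ -> i
}


def effective_hcodes(defaults, overrides):
    """Language-specific entries take precedence over character-set defaults."""
    return {**defaults, **overrides}


def key(word, hcodes):
    """Hyphenation lookup key; raises KeyError for characters without a code."""
    return "".join(hcodes[ch] for ch in word)


tr = effective_hcodes(CHARSET_HCODES, TURKISH_OVERRIDES)

print(key("IRMAK", tr))          # ırmak ('river')
print(key("\u0130STANBUL", tr))  # istanbul
print("IRMAK".lower())           # irmak: generic downcasing is wrong here
print(len("\u0130".lower()))     # 2: 'i' followed by U+0307
```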
Werner