From: Dave
Subject: [bug #59397] Assign default .hcode values to alphabetic characters in groff's default character set
Date: Wed, 31 Jul 2024 20:04:21 -0400 (EDT)
Follow-up Comment #5, bug #59397 (group groff):
[comment #3:]
> I think it might be an open question as to whether letters
> from outside the basic Latin alphabet _should_ necessarily be
> hyphenated like their basic Latin "base characters".
There's a little fuzz in any automated hyphenation system. When encountering
the string "project", groff can't know whether it's the noun, which is broken
"proj-ect", or the verb, which would be "pro-ject". An LLM could probably
figure it out, but short of integrating one of those into groff, it's just
going to make its best guess and rely on the user to override it if it's
wrong.
On the other hand, when a diacritic changes the syllabication, such as
"expose" vs "exposé", it will pretty much (I hedge, but can't think of any
exceptions) always do so by adding a syllable, and thus a potential break
point. The patterns, presumably, are set up for the unaccented form, meaning
groff will never use the additional break point offered by the accented form.
But that's fine: it's better to not break a word in an acceptable spot than to
break one in an unacceptable spot.
And anyway, those are the rarer cases. More commonly, the diacritics don't
change the break points at all: "coöperate", "doppelgänger", and "débâcle"
break in the same places whether or not they are written with them.
But in order for any of this to work, the adorned letters need hyphenation
codes, which they don't have by default, hence this ticket.
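For illustration only, the idea of hyphenating adorned letters like their
basic Latin base characters can be sketched outside groff. The Python below
is a hypothetical sketch, not groff's implementation: it assumes Unicode
canonical decomposition (NFD) is an acceptable way to find the base letter
of each Latin-1 letter, which is roughly the mapping a default .hcode
assignment would encode. The names `hcode_table` and `hyphenation_key` are
invented for this example.

```python
import unicodedata

# Latin-1 letters U+00C0..U+00FF (includes two non-letters, × and ÷,
# which are filtered out below because they have no ASCII base letter).
LATIN1_LETTERS = "".join(chr(c) for c in range(0xC0, 0x100))


def base_letter(ch: str) -> str:
    """Return the first code point of ch's canonical decomposition.

    For an accented letter such as 'é' this is the bare letter 'e';
    characters with no decomposition come back unchanged.
    """
    return unicodedata.normalize("NFD", ch)[0]


def hcode_table(letters: str) -> dict[str, str]:
    """Hypothetical default table: adorned letter -> lowercase base letter.

    Letters with no decomposition (e.g. 'æ', 'ø', 'ß') get no entry,
    so a real default set would need an explicit decision for them.
    """
    table = {}
    for ch in letters:
        base = base_letter(ch).lower()
        if base != ch and base.isascii() and base.isalpha():
            table[ch] = base
    return table


def hyphenation_key(word: str, table: dict[str, str]) -> str:
    """Spell word the way unaccented hyphenation patterns would see it."""
    return "".join(table.get(ch, ch.lower()) for ch in word)


if __name__ == "__main__":
    table = hcode_table(LATIN1_LETTERS)
    for word in ("exposé", "coöperate", "doppelgänger", "débâcle"):
        print(word, "->", hyphenation_key(word, table))
```

Under this assumption each example word reduces to its unaccented spelling,
so patterns written for "cooperate" or "debacle" would apply to the adorned
forms as well. As noted above for "exposé", the extra break point that the
accent introduces would still go unused.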
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?59397>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/