bug-groff

Re: minor hyphenation issue


From: Werner LEMBERG
Subject: Re: minor hyphenation issue
Date: Wed, 19 Apr 2017 08:29:31 +0200 (CEST)

> So, if I understand the situation correctly, groff gets its
> hyphenation information from TeX.

Rather, groff uses the hyphenation patterns created (mainly) for
TeX, and collected at CTAN.  There are plans to copy/move them to
unicode.org or a related organization.

> TeX isn't accommodating any English words with non-ASCII characters
> because of its hyphenation algorithm's limitations,

TeX itself has no such limitations (as patterns for other languages
like German show, you can easily have accented characters).
The fact is that no one stepped forward and took care of English
hyphenation patterns, which is quite sad IMHO.

> and Werner is reluctant to have groff accommodate them because of
> the maintenance complexity of modifying or augmenting the TeX rules.

I'm no longer maintaining groff, and my interest in English
hyphenation patterns is zero :-) – note that I'm one of the
maintainers of the German hyphenation patterns, and this is already
sufficient work...

> Can TeX's list of patterns be expanded to include letters with
> diacritics without breaking TeX's English hyphenation algorithm?

Yes.  Better, however, would be to generate the patterns anew from a
large list of English words.

> That is, if Latin-9 characters are included, will the algorithm
> simply ignore them, or fail?

The patterns ignore everything they don't recognize.
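
To make this concrete, here is a minimal Python sketch of Liang-style
pattern matching.  It is not TeX's or groff's code; the function names
and the tiny `TOY_PATTERNS` list are assumptions chosen for
illustration.  A character that appears in no pattern (such as `é`)
never matches anything, so it contributes no break points.

```python
import re

# Toy pattern set for illustration only; real pattern files are far larger.
TOY_PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at",
                "1tio", "2io", "o2n"]


def parse_pattern(pattern):
    """Split a pattern like 'hy3ph' into its letters and inter-letter weights."""
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)   # weight of the gap before letters[pos]
        else:
            pos += 1
    return letters, weights


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Return word with '-' at every gap whose highest matching weight is odd."""
    table = dict(parse_pattern(p) for p in patterns)
    text = "." + word.lower() + "."          # dots mark the word boundaries
    scores = [0] * (len(text) + 1)           # scores[i]: gap before text[i]
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            weights = table.get(text[start:end])
            if weights is None:
                continue                     # unknown substrings are ignored
            for offset, weight in enumerate(weights):
                idx = start + offset
                scores[idx] = max(scores[idx], weight)
    pieces, last = [], 0
    for k in range(left_min, len(word) - right_min + 1):
        if scores[k + 1] % 2 == 1:           # odd weight permits a break
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return "-".join(pieces)


print(hyphenate("hyphenation", TOY_PATTERNS))
print(hyphenate("résumé", TOY_PATTERNS))
```

With this toy set, the first call finds two break points, while the
second finds none, because no pattern mentions `é` or the letters
around it.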

> The user can, of course, use .hw to correctly break the occasional
> such word in predominantly ASCII English text.  However, it's far
> from intuitive that such accommodation is the user's responsibility,
> when all other hyphenation Just Works without the user having to
> think about it.  It would be nice if these sorts of words worked out
> of the box.

Yep.  Someone has to take care of that.  It's not rocket science, BTW.

> Side note: groff does, I observe, correctly break "öyster" (which is
> technically not even a real English word) but not "résumé" (which is
> not only a real word, but needs the accents to distinguish it from
> the unrelated word "resume").  I assume this is because no
> hyphenation point of öyster is adjacent to the non-ASCII letter.

groff isn't set up to handle non-ASCII stuff for hyphenation (i.e.,
proper `.hcode' entries are missing).  Consequently, such characters
are completely ignored, and groff uses the remaining part of the word
to find hyphenation points.  If you drop the `ö', groff then tries to
hyphenate `yster'.  In the word `résumé', the remaining part is `sum',
which can't be hyphenated.
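
A loose Python model of the effect described above, assuming that
every non-ASCII character is simply dropped and acts as a separator.
This is not groff's implementation, and `ascii_fragments` is a
hypothetical helper name.

```python
def ascii_fragments(word):
    """Return the runs of ASCII letters left after dropping other characters."""
    fragments, current = [], []
    for ch in word:
        if ord(ch) < 128 and ch.isalpha():
            current.append(ch)
        elif current:
            # A character without a usable hyphenation code ends the run.
            fragments.append("".join(current))
            current = []
    if current:
        fragments.append("".join(current))
    return fragments


for word in ("öyster", "résumé"):
    print(word, ascii_fragments(word))
```

In this model `öyster' leaves the single fragment `yster', while
`résumé' leaves only `r' and `sum', neither of which is long enough to
offer a break.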

Maybe you want to read Liang's dissertation:

  https://www.tug.org/docs/liang/liang-thesis.pdf


    Werner
