bug-groff
RE: minor hyphenation issue


From: Barbara Beeton
Subject: RE: minor hyphenation issue
Date: Wed, 19 Apr 2017 13:35:56 +0000

responding to werner's message ...

        From: address@hidden [mailto:address@hidden]
        Sent: Wednesday, April 19, 2017 2:30 AM
        To: address@hidden
        Cc: Barbara Beeton <address@hidden>; address@hidden
        Subject: Re: minor hyphenation issue

        [...]

        > TeX isn't accommodating any English words with non-ASCII characters
        > because of its hyphenation algorithm's limitations,

        There aren't such limitations by TeX itself (as patterns for other
        languages like German show, you can easily have accented characters).
        Fact is that no one stepped forward and took care of English
        hyphenation patterns, which is quite sad IMHO.

I believe that there *are* limitations for the version of
tex maintained by knuth, and that version is kept "pure" as a
dependable base, also out of consideration for the author.

to see what happens, I have performed this little test:

> tex
This is TeX, Version 3.14159265 (TeX Live 2016) (preloaded format=tex)
**\relax

*\showhyphens{oyster öyster}      

Underfull \hbox (badness 10000) detected at line 0
[] \tenrm oys-ter ys-ter

*\showhyphens{resume résumé}

Underfull \hbox (badness 10000) detected at line 0
[] \tenrm re-sume rsum

*! Interruption.
<*> 
    
? x
No pages of output.
Transcript written on texput.log.
>

notice that in the responses, all non-ascii letters are simply
missing.  to test whether this would also happen in a pattern
file (or the whole thing simply break) would require rebuilding
the format, which I haven't time to do just now.  however, from
the result of the above test, I think it won't give the hoped-for
result, namely it will not result in the ability to use any characters
above octal 177 successfully, and may render hyphenation itself
unreliable in basic tex.
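
to make "characters above octal 177" concrete, here is a minimal
python sketch that lists the bytes an accented letter turns into.
the message does not say which encoding the terminal used, so both
utf-8 and latin-1 are shown purely as assumptions; the helper name
high_bytes is invented for this illustration and is not part of tex.

```python
def high_bytes(text, encoding="utf-8"):
    """List each character that encodes to at least one byte above octal 177.

    Returns (character, [bytes written in octal]) pairs.
    The encoding is an assumption; the original test does not state it.
    """
    found = []
    for ch in text:
        raw = ch.encode(encoding)
        if any(b > 0o177 for b in raw):
            found.append((ch, [oct(b) for b in raw]))
    return found


print(high_bytes("öyster"))             # two bytes for the ö under utf-8
print(high_bytes("öyster", "latin-1"))  # one byte for the ö under latin-1
```

under either assumption the accented letter lands outside the 7-bit
range, while every other letter of the word stays inside it.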

        > and Werner is reluctant to have groff accommodate them because of
        > the maintenance complexity of modifying or augmenting the TeX rules.

basic tex is not going to be changed, for the reason given above.
and, addressing the next question, I believe that the tex patterns
cannot be augmented with non-ascii letters.  that is not to say that
a *second* list, say ushyphex8.tex, couldn't be created for use by
the "extended" versions of tex that accommodate the larger set
of letters.

        > Can TeX's list of patterns be expanded to include letters with
        > diacritics without breaking TeX's English hyphenation algorithm?

        Yes.  Better, however, would be to generate the patterns anew from a
        large list of English words.

this is absolutely true.  If I remember correctly, someone used
the list of words in ushyphex.tex to generate a few extra patterns,
which could be added manually to the original pattern file.  ("correction"
of just the suffix "-chester", which appears in many place names,
but seldom in ordinary words, would be quite helpful here in
preparing meeting announcements.)

getting permission to use one of the major dictionaries as a base
involves a lot of work, a secure development environment, and
signing over one's first-born.  the required files are *not* going
to be delivered via the internet.  maybe someone trusted and
indemnified could arrange to perform the work at the offices of
the dictionary's publisher.  more bureaucracy than technical skill
involved.

        > That is, if Latin-9 characters are included, will the algorithm
        > simply ignore them, or fail?

        The patterns ignore everything they don't recognize.

that's true, but whether there would be unwelcome side effects
from loading such patterns into "basic" tex is undetermined.
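
as a rough illustration of what "ignore" means at the pattern level,
here is a small python sketch of liang-style pattern matching.  the
two patterns are toy values chosen for this example, not entries
copied from the real pattern file, and the function names are
hypothetical.  the sketch works on python strings only; it does not
model tex's fonts, \lccode tables, or format loading, so it says
nothing about the possible side effects in basic tex.

```python
def parse_pattern(pat):
    """Split a pattern such as 's1t' into its letters and inter-letter levels."""
    letters = "".join(c for c in pat if not c.isdigit())
    levels = [0] * (len(letters) + 1)
    i = 0
    for c in pat:
        if c.isdigit():
            levels[i] = int(c)   # level sits in the gap before letter i
        else:
            i += 1
    return letters, levels


# Toy patterns for illustration only (assumed, not taken from any real file).
PATTERNS = dict(parse_pattern(p) for p in ["s1t", "e1s"])


def hyphenate(word, patterns=PATTERNS, left_min=2, right_min=3):
    """Insert '-' where the highest matching level in a gap is odd."""
    dotted = "." + word.lower() + "."
    scores = [0] * (len(dotted) + 1)      # scores[j]: level before dotted[j]
    for start in range(len(dotted)):
        for end in range(start + 1, len(dotted) + 1):
            levels = patterns.get(dotted[start:end])
            if levels:
                for k, lv in enumerate(levels):
                    scores[start + k] = max(scores[start + k], lv)
    pieces, last = [], 0
    for pos in range(left_min, len(word) - right_min + 1):
        if scores[pos + 1] % 2 == 1:      # odd level permits a break
            pieces.append(word[last:pos])
            last = pos
    pieces.append(word[last:])
    return "-".join(pieces)


for w in ["oyster", "öyster", "resume", "résumé"]:
    print(w, "->", hyphenate(w))
```

in this sketch a letter that appears in no pattern simply contributes
no matches: a break that depends only on known letters survives, and
a break that would need a pattern containing the unknown letter is
never found.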

        [...]

        Maybe you want to read Liang's dissertation:

          https://www.tug.org/docs/liang/liang-thesis.pdf

that's probably not a bad idea.  it's an interesting read.
                                        -- bb


