RE: minor hyphenation issue

From: Barbara Beeton
Subject: RE: minor hyphenation issue
Date: Wed, 24 May 2017 14:44:31 +0000

I wrote,

    > if I remember correctly, kuiken's patterns were generated using the
    > additions from the then-current exception list that I maintain, and
    > nothing more than that and the original list.

    The original word list still does exist?  Good to know.  IMHO, this
    word list should be extended with much more English words (including
    all the exception entries), then patgen should simply create new

unfortunately, I was wrong -- I've now gone back and read the
tugboat article by kuiken, and the patterns he developed are
*in addition* to the original liang patterns.  the article is at
at the time it was written, the number of patterns in the combined
set exceeded the standard hyphenation memory.  that is no longer
a consideration.  but the article doesn't explain exactly how the new
patterns were developed, so there may have been some manual
adaptation required.  if karl doesn't know, then a query to the
author would be in order.

    > unless I'm missing something important, I believe that, if words
    > with non-ascii letters are added to the base corpus, the original
    > patgen won't suffice;

    Fortunately, this is a wrong assumption.  The standard `patgen'
    program (current TeXLive version is 2.4) can support up to 256
    different characters, which is definitely sufficient for English.

thanks for the correction.

question for karl ... if some patterns with non-ascii values are
added, what might be the effect on a file that doesn't use
\usepackage[utf8]{inputenc} ?  or (possibly worse) what if a
file uses \usepackage[latin9]{inputenc} and the patterns depend
on utf8 encoding?  (I'm assuming that utf8 is the only sensible
encoding to use now for extended patterns.)
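
to make the encoding concern concrete, here is a minimal python sketch.
the word "café" is a hypothetical example, not taken from any pattern
file, and the sketch only compares raw bytes; it does not model how tex
or inputenc actually process input.  it shows that the same letter is
two bytes in utf8 and one byte in latin9, so text in one encoding will
not match byte-for-byte against the other.

```python
# Hypothetical word containing one non-ASCII letter (illustration only).
word = "café"

utf8_bytes = word.encode("utf-8")         # é becomes two bytes: 0xC3 0xA9
latin9_bytes = word.encode("iso8859-15")  # é becomes one byte: 0xE9

# Reading the UTF-8 bytes as if they were Latin-9 yields different
# characters, so a byte-level comparison with the word no longer matches.
misread = utf8_bytes.decode("iso8859-15")

print(utf8_bytes)
print(latin9_bytes)
print(misread)
```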

cheers.                                         -- bb
