bug-groff

[bug #57594] sync hyphenation pattern files with TeX versions


From: G. Branden Robinson
Subject: [bug #57594] sync hyphenation pattern files with TeX versions
Date: Sat, 10 Jul 2021 11:08:17 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0

Update of bug #57594 (project groff):

                Severity:             5 - Blocker => 3 - Normal             
                  Status:                    None => Postponed              
                 Summary: sync groff hyphenation-pattern files with upstream
TeX versions => sync hyphenation pattern files with TeX versions

    _______________________________________________________

Follow-up Comment #4:

I've carried this as far as I can for the moment.

Going further will require support for UTF-8 input in the hyphenation pattern
files, since that is the encoding the TeX hyph-utf8 project uses for Czech,
French, German, and Swedish.

We got lucky with English and Italian, whose pattern files are plain ASCII.
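
As a rough illustration of the ASCII/UTF-8 split, the sketch below reports
which bytes of a pattern fall outside the ASCII range.  The function name and
the two sample strings are made up for this example; they are not taken from
groff or from the hyph-utf8 files.

```python
def non_ascii_offsets(data: bytes) -> list[int]:
    """Return the offsets of all bytes that are not 7-bit ASCII."""
    return [offset for offset, byte in enumerate(data) if byte >= 0x80]


# Illustrative inputs only (assumptions, not quoted from any pattern file).
ascii_sample = b".ach4"
utf8_sample = "1\u00e4".encode("utf-8")  # digit 1 followed by a-umlaut

print(non_ascii_offsets(ascii_sample))  # no offsets: one byte per character
print(non_ascii_offsets(utf8_sample))   # the a-umlaut occupies two bytes
```

A file for which this list is empty can be read one byte per character; any
other file needs a real UTF-8 decoder.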

On the bright side, finding or writing a UTF-8 input parser for reading
hyphenation pattern files would, unless we're very unlucky, go a long way
toward equipping us to handle UTF-8 generally.  If we put the routines in
libgroff, several pieces of the GNU roff system can use them.  The big lift
for troff itself is going to be moving the definitions in
src/roff/troff/input.h to Unicode-safe code points, probably somewhere in the
Private Use Area, I reckon.
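
To give a sense of the size of such a parser, here is a minimal sketch of a
strict UTF-8 decoder, written in Python for brevity.  It is not libgroff code,
and the names `decode_utf8` and `is_private_use` are hypothetical.  The only
fact it relies on beyond the encoding rules is that the Basic Multilingual
Plane's Private Use Area spans U+E000 through U+F8FF; how troff would actually
use that range is not decided here.

```python
def decode_utf8(data: bytes) -> list[int]:
    """Decode UTF-8 bytes into Unicode code points.

    Raises ValueError on a bad lead or continuation byte, a truncated
    sequence, an overlong encoding, a surrogate, or a value above U+10FFFF.
    """
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte gives the sequence length and the smallest code
        # point that legitimately needs that many bytes.
        if lead < 0x80:
            cp, extra, minimum = lead, 0, 0
        elif 0xC0 <= lead < 0xE0:
            cp, extra, minimum = lead & 0x1F, 1, 0x80
        elif 0xE0 <= lead < 0xF0:
            cp, extra, minimum = lead & 0x0F, 2, 0x800
        elif 0xF0 <= lead < 0xF8:
            cp, extra, minimum = lead & 0x07, 3, 0x10000
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + extra >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, extra + 1):
            byte = data[i + k]
            if (byte & 0xC0) != 0x80:
                raise ValueError(f"invalid continuation byte at offset {i + k}")
            cp = (cp << 6) | (byte & 0x3F)
        if cp < minimum or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point at offset {i}")
        code_points.append(cp)
        i += extra + 1
    return code_points


def is_private_use(cp: int) -> bool:
    """True if cp lies in the BMP Private Use Area (U+E000..U+F8FF)."""
    return 0xE000 <= cp <= 0xF8FF
```

Rejecting overlong and surrogate encodings up front keeps every later
consumer of the code points from having to repeat those checks.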

Dropping the severity because we've done what we can for now, I think.


commit b2284ab01d2d87507f3bcbd7de2a081efb6528a6
Author: G. Branden Robinson <g.branden.robinson@gmail.com>
Date:   Sun Jul 11 00:50:27 2021 +1000

    Update English hyphenation patterns.
    
    * NEWS: Add item.
    * tmac/hyphen.en: Update file using `hyph-en-us.tex` patterns file from
      the TeX hyph-utf8 project.
    * tmac/hyphenex.en: Remove explicit hyphenations for words that no
      longer require them when using the new patterns.  Add one item scraped
      from an erratum comment in hyphen.en ("dem-o-crat").
    
    The new patterns likely _will_ change the automatic hyphenation break
    points of your English documents.  Here is a sample of affected words
    found within groff's own documentary corpus.
    
    OLD                     NEW
    ===                     ===
    ar‐range‐ment           arrange‐ment
    col‐umns                columns
    con‐struc‐ted           con‐structed
    cus‐tom‐ized            cus‐tomized
    def‐i‐ni‐tions          de‐f‐i‐n‐i‐tions
    der‐i‐va‐tions          de‐riva‐tions
    hy‐phen‐a‐tion          hy‐phen‐ation
    ma‐te‐rial              ma‐te‐r‐ial
    Mi‐cro‐soft             Mi‐crosoft
    pipe‐lines              pipelines
    post‐pro‐ces‐sors       post‐proces‐sors
    pro‐cessed              processed
    pro‐cesses              processes
    spa‐ces                 spaces
    Wer‐ner                 Werner
    
    Partially addresses <https://savannah.gnu.org/bugs/?57594>.
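
For readers wondering how a change of pattern file moves break points, the
following is a small sketch of TeX-style (Liang) pattern matching.  It is not
groff's implementation.  The nine patterns are the ones commonly quoted for
this word in descriptions of the algorithm, and the minimum fragment lengths
of 2 and 3 are TeX's defaults; whether groff with the updated file uses
exactly these values is an assumption.

```python
def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into ('hyph', [0, 0, 3, 0, 0])."""
    letters, values = [], [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)   # weight of the gap before the next letter
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Return word with '-' at each permitted break point."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."      # dots mark the word boundaries
    points = [0] * (len(padded) + 1)       # points[g]: gap before padded[g]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            values = table.get(padded[start:end])
            if values:
                for offset, value in enumerate(values):
                    gap = start + offset
                    points[gap] = max(points[gap], value)
    pieces, last = [], 0
    # An odd weight permits a break; an even weight forbids one.
    for k in range(left_min, len(word) - right_min + 1):
        if points[k + 1] % 2 == 1:
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return "-".join(pieces)


PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at",
            "1tio", "2io", "o2n"]
print(hyphenate("hyphenation", PATTERNS))  # prints "hy-phen-ation"
```

Because the highest weight at each gap wins, adding or removing a single
pattern can create or suppress a break, which is why updated pattern files
shift break points in existing documents.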


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?57594>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



