Re: Text hyphenation

From: David Kastrup
Subject: Re: Text hyphenation
Date: Sat, 01 Oct 2011 16:30:28 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux)

Arthur Reutenauer <address@hidden> writes:

> On Wed, Sep 28, 2011 at 05:31:31PM +0200, Werner LEMBERG wrote:
>> For hyphenation, there exists the libhyphen library (used e.g. in
>> OpenOffice).
>   It implements the original hyphenation algorithm of TeX with
> extensions (I don't know what those are), but it does not have the rules
> for hyphenating the different languages; these are maintained as a
> package for TeX Live and other distributions (I have been one of the
> maintainers for the past three years).  It contains hyphenation patterns
> for over 60 languages; the ones for "major" languages have been
> extensively tested by one generation of TeX users and are probably of
> very high quality: some of them actually come from major publishers or
> researchers of the language at hand (Oxford University Press for British
> English, for example).

The patterns are not intended to produce every valid hyphenation, only
a correct subset of them.  So they should work fine for actual
hyphenation, but less well for splitting words reliably into singing
syllables.  The latter is also problematic because in singing it more
often than not makes sense to move a consonant from the end of one
syllable to the onset of the next.  That is actually the manner in
which classical Latin and Greek tend to be hyphenated (the rule being,
more or less, that one hyphenates before the largest consonant group
that could occur at the start of a word).
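
To make the "largest consonant group" rule concrete, here is a rough
Python sketch.  Everything in it is an assumption made for
illustration: the function name split_maximal_onset is hypothetical,
the ONSETS set is a tiny hand-picked and incomplete list of clusters
that can begin a Latin word, and each run of vowels is treated as a
single syllable nucleus, which ignores diphthongs versus hiatus.

```python
# Toy illustration of "hyphenate before the largest consonant group
# that could occur at the start of a word".
VOWELS = set("aeiou")

# ASSUMPTION: hand-picked, incomplete set of word-initial clusters.
ONSETS = {
    "b", "c", "d", "f", "g", "l", "m", "n", "p", "r", "s", "t",
    "bl", "br", "cl", "cr", "dr", "fl", "fr", "gl", "gr",
    "pl", "pr", "sc", "sp", "st", "tr", "scr", "spl", "str",
}


def split_maximal_onset(word, onsets=ONSETS, vowels=VOWELS):
    """Split word so each medial cluster gives the next syllable
    the longest onset found in `onsets`."""
    lower = word.lower()
    n = len(lower)
    pieces, start, i = [], 0, 0
    # Skip any consonants before the first vowel.
    while i < n and lower[i] not in vowels:
        i += 1
    while i < n:
        # Consume one vowel run (treated as a single nucleus).
        while i < n and lower[i] in vowels:
            i += 1
        # Find the consonant cluster that follows it.
        j = i
        while j < n and lower[j] not in vowels:
            j += 1
        if j == n:
            break  # trailing consonants stay with the last syllable
        # Longest suffix of the cluster that may start a word.
        cut = j
        for k in range(i, j):
            if lower[k:j] in onsets:
                cut = k
                break
        pieces.append(word[start:cut])
        start = cut
        i = j
    pieces.append(word[start:])
    return pieces


print(split_maximal_onset("magister"))  # ['ma', 'gi', 'ster']
print(split_maximal_onset("omnis"))     # ['om', 'nis']
```

Note how "st" moves to the following syllable in "magister", while
"mn" is split in "omnis" because it is not in the assumed onset list.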

For the original American set, Knuth states

    as a result, the patterns of plain TeX guarantee complete
    hyphenation of the 700 or so most common words of English, as well
    as common technical words like al-go-rithm. These patterns find
    89.3% of the hyphens in Liang's dictionary as a whole, and they
    insert no hyphens that are not present.
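
For readers who have not seen how such patterns are applied, the
following is a minimal Python sketch of Liang-style pattern matching.
It is not the code of TeX or libhyphen.  The names parse_pattern and
hyphenate are hypothetical, and TOY_PATTERNS is a small list in the
style of the plain TeX patterns, assumed here for illustration rather
than read from any distributed pattern file.  The left and right
minimum fragment lengths of 2 and 3 are likewise assumed defaults.

```python
import re


def parse_pattern(pattern):
    """Split a pattern like 'hy3ph' into its letters ('hyph') and the
    digit before each letter position ([0, 0, 3, 0, 0])."""
    letters = re.sub(r"\d", "", pattern)
    digits = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            digits[pos] = int(ch)
        else:
            pos += 1
    return letters, digits


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Return the fragments of word allowed by the given patterns."""
    table = dict(parse_pattern(p) for p in patterns)
    text = "." + word.lower() + "."  # dots mark the word boundaries
    points = [0] * (len(text) + 1)
    # Every matching substring contributes its digits; keep the maximum.
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            digits = table.get(text[start:end])
            if digits:
                for k, d in enumerate(digits):
                    points[start + k] = max(points[start + k], d)
    # Odd values permit a break, even values forbid one.
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        if points[i + 1] % 2 == 1:  # +1 skips the leading dot
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces


# ASSUMPTION: toy pattern list for demonstration only.
TOY_PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at",
                "1na", "n2at", "1tio", "2io", "o2n"]

print(hyphenate("hyphenation", TOY_PATTERNS))  # ['hy', 'phen', 'ation']
print(hyphenate("algorithm", TOY_PATTERNS))    # ['algorithm']
```

The second call shows the "correct subset" behaviour: a word that no
pattern covers simply gets no hyphens, rather than wrong ones.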

David Kastrup
