From: Eli Zaretskii
Subject: Re: Word boundary (was: find-composition still depends on the composition property)
Date: Sun, 26 Oct 2008 21:32:09 +0200

> From: Kenichi Handa <address@hidden>
> CC: address@hidden, address@hidden
> Date: Sun, 26 Oct 2008 22:36:05 +0900
> In article <address@hidden>, Eli Zaretskii <address@hidden> writes:
> > Unless I'm missing something important, my reading of the UAX #29
> > (http://www.unicode.org/reports/tr29/tr29-13.html) is that almost all
> > scripts should _not_ have word breaks between letters and digits.  And
> > neither should we define a word break on script boundaries, in most
> > cases.
> Although it says "Do not break between most letters. ALetter
> x ALetter", ALetter doesn't include Han, Katakana, and
> Hiragana.

Yes, that's why I said "in most cases".

> And, it also has this note:
> Normally word breaking does not require breaking between
> different scripts. However, adding that capability may be
> useful in combination with other extensions of word
> segmentation. For example, ...

So maybe we should have a user option to enable that, but I think it
should be off by default.
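
To make the proposal concrete, here is a rough sketch in Python (not
Emacs code) of what such an option could look like.  The names
split_words and break_on_script are hypothetical, invented only for this
illustration.  Python's standard library does not expose the Unicode
Script property, so the sketch guesses the script from the first word of
the character name, which is only an approximation.  It also treats
every letter as if it were ALetter, so it does not model the special
handling of Han, Katakana, and Hiragana mentioned above.

```python
import unicodedata


def script_of(ch):
    """Guess a script label from the first word of the character name.

    Assumption: this is a stand-in for the real Unicode Script property,
    which the standard library does not provide.
    """
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def split_words(text, break_on_script=False):
    """Split text into runs of letters and decimal digits.

    Letters and digits are never separated from each other.  A break at a
    script boundary is made only when break_on_script is true, so the
    default keeps mixed-script runs together.
    """
    words = []
    current = ""
    current_script = None  # script of the last letter seen in this word
    for ch in text:
        cat = unicodedata.category(ch)
        is_letter = cat.startswith("L")
        if not (is_letter or cat == "Nd"):
            # Anything else (space, punctuation, ...) ends the current word.
            if current:
                words.append(current)
            current, current_script = "", None
            continue
        if is_letter:
            script = script_of(ch)
            # Digits are script-neutral, so only letters can trigger a break.
            if break_on_script and current_script not in (None, script):
                words.append(current)
                current = ""
            current_script = script
        current += ch
    if current:
        words.append(current)
    return words


print(split_words("abc123 αβγabc"))
print(split_words("abc123 αβγabc", break_on_script=True))
```

With the option off, "abc123" and "αβγabc" each stay one word; with it
on, only the Greek/Latin boundary produces an extra break, while the
letter/digit boundary still does not.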

