bug-coreutils

Re: Enhancement request to wc


From: Pádraig Brady
Subject: Re: Enhancement request to wc
Date: Sun, 8 Feb 2009 22:39:10 +0000
User-agent: Thunderbird 2.0.0.6 (X11/20071008)

Jim Meyering wrote:
> Neo Anderson <address@hidden> wrote:
>> I understand that, so I just want to ask if it is possible to add a new
>> option, e.g. -utf8, so that wc can count words made up of wide characters.
> 
> Do you already have an algorithmic definition of "word"
> that makes sense for the locale(s) you care about?
> If so, does it generalize to any other locales?

I looked very quickly into this.
I don't think there is a general algorithm for this.
For languages like Thai, Lao, Chinese or Japanese,
a dictionary lookup is required to determine word boundaries!

http://www.unicode.org/reports/tr29/#Word_Boundaries
http://lists.apple.com/archives/Carbon-dev/2006/Apr/msg00692.html
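
To illustrate the problem, here is a minimal Python sketch. It is not
wc's actual code: count_words_whitespace is a hypothetical helper that
only approximates "count runs of non-whitespace characters", and the
sample sentences are my own. The Chinese sentence contains several
dictionary words but no spaces, so a whitespace-based counter sees a
single word.

```python
def count_words_whitespace(text: str) -> int:
    """Count maximal runs of non-whitespace characters.

    This only approximates what `wc -w` does; it is an illustration,
    not the coreutils implementation.
    """
    # str.split() with no arguments splits on any run of whitespace
    # and discards empty strings.
    return len(text.split())


english = "the quick brown fox"
chinese = "我喜欢喝茶"  # roughly "I like to drink tea", written without spaces

print(count_words_whitespace(english))  # 4
print(count_words_whitespace(chinese))  # 1
```

Getting a meaningful count for the second string would need a
segmenter backed by a dictionary, which is well beyond splitting on
whitespace.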

Coincidentally I noticed Bruno checked some word boundary
stuff into gnulib today:
http://lists.gnu.org/archive/html/bug-gnulib/2009-02/msg00068.html

cheers,
Pádraig.



