Re: horrible utf-8 performace in wc

From: Bo Borgerson
Subject: Re: horrible utf-8 performace in wc
Date: Wed, 07 May 2008 09:50:29 -0400
User-agent: Thunderbird (X11/20080227)

Jim Meyering wrote:
> Bo Borgerson <address@hidden> wrote:
>> I may be misinterpreting your patch, but it seems to me that
>> decrementing count for zero-width characters could potentially lead to
>> confusion.  Not all zero-width characters are combining characters, right?
> It looks ok to me, since there's an unconditional increment
>                 chars++;
> about 25 lines above, so the decrement would just undo that.

Right, I guess my question is more about the semantics of `wc -m'.
Should stand-alone zero-width characters such as the zero-width space be
counted as characters?

The attached (UTF-8) file contains 3 characters according to HEAD, but
only two with the patch.
