coreutils

Re: speeding up `wc -m`


From: Pádraig Brady
Subject: Re: speeding up `wc -m`
Date: Sat, 23 Jun 2018 19:46:55 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0

On 21/05/18 19:26, Pádraig Brady wrote:
> Basically, the two problems that the profiling found were:
> 
>   * It is pointless to call locale_charset repeatedly, because the
>     locale won't change while 'wc' is running.
> 
>   * glibc has a slow mbrtowc() implementation for UTF-8 locales.
> 
> Both problems can be addressed with the "abstract factory" design pattern.

I'm going to apply my wchar-single module to gnulib, so that
the main bottleneck of calling locale_charset repeatedly
is removed from wcwidth() and mbrtowc() in a simple manner,
without the need for another API.
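
To illustrate the idea of not querying the charset repeatedly, here is a
minimal sketch, not the actual gnulib change. It assumes POSIX
nl_langinfo(CODESET) as a stand-in for gnulib's locale_charset(), and the
names charset_is_utf8_cached and charset_lookups are hypothetical. It relies
on the point quoted above that the locale won't change while the program runs.

```c
#include <langinfo.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical counter, only here to show how often the lookup runs.  */
int charset_lookups = 0;

/* Hypothetical helper: report whether the current locale's codeset is
   UTF-8.  The codeset is queried on the first call only; later calls
   reuse the cached answer.  This assumes the locale is not changed
   after the first call.  */
bool
charset_is_utf8_cached (void)
{
  static int cached = -1;       /* -1 means "not determined yet" */

  if (cached < 0)
    {
      const char *codeset = nl_langinfo (CODESET);
      charset_lookups++;
      cached = (strcmp (codeset, "UTF-8") == 0);
    }
  return cached != 0;
}
```

A per-character loop can then call charset_is_utf8_cached() freely; only the
first call pays for the lookup.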

It's very interesting that the system mbrtowc() implementations
don't appear to be optimized for UTF-8, being 4.5 and 2.3 times slower
than utf8_mbrtowc() on glibc and macOS respectively.
It would be useful to follow this up with the glibc folks at least,
so that everyone could benefit without any code changes.
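
For reference, a decoder specialized for UTF-8 can be quite small, which is
part of why a dedicated path can beat a generic converter. The following is
only a sketch under my own assumptions, not the utf8_mbrtowc() measured above:
the name utf8_decode_sketch and its interface are hypothetical, and unlike
mbrtowc() it keeps no mbstate_t, so an incomplete sequence is simply rejected.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence from the N bytes at S.
   On success store the code point in *OUT and return the number of
   bytes consumed (1 to 4).  Return 0 if the input is empty, incomplete
   or invalid (bad lead or continuation byte, overlong encoding,
   surrogate, or value above U+10FFFF).  */
size_t
utf8_decode_sketch (const unsigned char *s, size_t n, uint32_t *out)
{
  if (n == 0)
    return 0;

  unsigned char c = s[0];
  if (c < 0x80)                 /* ASCII fast path */
    {
      *out = c;
      return 1;
    }

  size_t len;
  uint32_t cp, min;
  if ((c & 0xE0) == 0xC0)
    { len = 2; cp = c & 0x1F; min = 0x80; }
  else if ((c & 0xF0) == 0xE0)
    { len = 3; cp = c & 0x0F; min = 0x800; }
  else if ((c & 0xF8) == 0xF0)
    { len = 4; cp = c & 0x07; min = 0x10000; }
  else
    return 0;                   /* stray continuation or invalid lead byte */

  if (n < len)
    return 0;                   /* incomplete sequence */

  for (size_t i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;               /* not a continuation byte */
      cp = (cp << 6) | (s[i] & 0x3F);
    }

  /* Reject overlong forms, surrogates and out-of-range values.  */
  if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return 0;

  *out = cp;
  return len;
}
```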

cheers,
Pádraig


