Re: speeding up `wc -m`
From: Pádraig Brady
Subject: Re: speeding up `wc -m`
Date: Mon, 21 May 2018 19:26:08 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0

On 21/05/18 11:00, Bruno Haible wrote:
> Hi Pádraig,
>
>> $ yes áááááááááááááááááááá | head -n100000 > mbc.txt
>> $ yes 12345678901234567890 | head -n100000 > num.txt
>>
>> ===== Before ====
>>
>> $ time src/wc -m < mbc.txt
>> 2100000
>> real 0m0.186s
>>
>> $ time src/wc -m < num.txt
>> 2100000
>> real 0m0.056s
>
> Here's my take on improving this. I'm attaching draft patches that have
> this effect on the timings:
>
> * On glibc:
>
>            num     mbc
> Before     0.056   0.152
> After      0.057   0.089
>            -------------
> Speedup    1.0     1.7
> factor
>
> * On macOS 10.13:
>
>            num     mbc
> Before     0.153   0.229
> After      0.042   0.112
>            -------------
> Speedup    3.6     2.0
> factor
>
> Basically, the two problems that the profiling found were:
>
> * It is pointless to call locale_charset repeatedly, because the
> locale won't change while 'wc' is running.
>
> * glibc has a slow mbrtowc() implementation for UTF-8 locales.
>
> Both problems can be addressed with the "abstract factory" design pattern.
> Namely, instead of using the generic 'wcwidth'/'mbrtowc' function each
> time, let the program produce an optimized 'wcwidth'/'mbrtowc' function
> [pointer] once, and then call this optimized function pointer repeatedly
> for each character.
>
> While at it, let me also do the same for the initialization of an mbstate_t,
> because on macOS the mbstate_t is 128 bytes long but only the first 12 bytes
> actually matter.
>
> This factory of function pointers side-steps the portability problems of
> 'locale_t'.
>
> Notes:
> - When you use these new gnulib modules, you are programming against an API
> that is very similar to POSIX, but not exactly POSIX.
> - The platform-specific #ifs have to be adjusted, with the help of configure
> tests.
> - mbrtowc-factory needs a unit test (for which I have a draft).

Wow, thanks for doing all that.

It's also worth noting that using wcwidth-factory etc.
will always use the replacement routines on UTF-8,
which would be a disadvantage in terms of code size and divergence.

Another disadvantage is the change in API, which is a bit awkward
and would need tweaking elsewhere in coreutils to take advantage
of the speedup.

The first issue, the repeated locale_charset() calls, could at least be
dealt with internally to gnulib with a simple gnulib-level config.
The attached is a proposed solution to the charset issue that would just
require depending on the wchar-single gnulib module to indicate that
locales don't change across calls.
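
For illustration only, the caching idea could look roughly like the sketch
below. It is not the attached patch: the names locale_is_utf8 and
charset_lookups are hypothetical, and POSIX nl_langinfo (CODESET) stands in
for gnulib's locale_charset() so that the snippet is self-contained. It
assumes the locale is set before the first call and never changes afterwards.

```c
#include <assert.h>
#include <langinfo.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical counter, only here to make the caching observable.  */
static unsigned int charset_lookups;

/* Hypothetical helper: return true if the locale's charset is UTF-8.
   The charset is looked up on the first call only; later calls reuse
   the cached answer.  Assumption: the locale does not change after
   the first call.  */
static bool
locale_is_utf8 (void)
{
  static int cached = -1;       /* -1 means "not determined yet" */

  if (cached < 0)
    {
      /* Stand-in for gnulib's locale_charset().  */
      const char *codeset = nl_langinfo (CODESET);
      assert (codeset != NULL);
      charset_lookups++;
      cached = (strcmp (codeset, "UTF-8") == 0);
    }
  return cached != 0;
}
```

However many characters the caller processes, the lookup then runs once
per process instead of once per character.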

Now if utf8_mbrtowc() is about 4.5 and 2.3 times faster than mbrtowc()
on glibc and macOS respectively, it's probably worthwhile to replace it
unconditionally.
Though it would be cool to do that behind the wchar-single setting,
which could set up the appropriate calls/pointers internally to rpl_mbrtowc()?
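
As a rough sketch of the "pick the implementation once, then call it through
a pointer" idea: the code below is not the API of the draft patches. The
names count_chars_fn, count_chars_generic, count_chars_utf8 and
make_count_chars are hypothetical, and the UTF-8 path simply assumes
well-formed input rather than reimplementing mbrtowc().

```c
#include <assert.h>
#include <langinfo.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical type: counts the characters in buf[0..len).  */
typedef size_t (*count_chars_fn) (const char *buf, size_t len);

/* Generic path: decode each character with mbrtowc().  */
static size_t
count_chars_generic (const char *buf, size_t len)
{
  mbstate_t state;
  size_t count = 0;

  assert (buf != NULL);
  memset (&state, 0, sizeof state);
  while (len > 0)
    {
      size_t n = mbrtowc (NULL, buf, len, &state);
      if (n == (size_t) -1 || n == (size_t) -2)
        {
          /* Invalid or truncated sequence: skip one byte without
             counting it, and start again from the initial state.  */
          n = 1;
          memset (&state, 0, sizeof state);
        }
      else
        {
          if (n == 0)
            n = 1;              /* a NUL character occupies one byte */
          count++;
        }
      buf += n;
      len -= n;
    }
  return count;
}

/* UTF-8 path: count every byte that is not a continuation byte
   (10xxxxxx).  Assumption: the input is well-formed UTF-8.  */
static size_t
count_chars_utf8 (const char *buf, size_t len)
{
  size_t count = 0;

  assert (buf != NULL);
  for (size_t i = 0; i < len; i++)
    if (((unsigned char) buf[i] & 0xC0) != 0x80)
      count++;
  return count;
}

/* Factory: inspect the locale once and hand back the function that
   suits it.  The caller keeps the pointer and reuses it.  */
static count_chars_fn
make_count_chars (void)
{
  const char *codeset = nl_langinfo (CODESET);
  return (strcmp (codeset, "UTF-8") == 0
          ? count_chars_utf8
          : count_chars_generic);
}
```

A caller would run make_count_chars () once after setlocale() and then call
the returned pointer for each buffer it reads, so the locale check is paid
once rather than per character.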

As for the reduced mbstate settings, it might not be worth
complicating the interface for this perf gain?

cheers,
Pádraig.

Attachment: gnulib-wchar-single.patch
Description: Text Data