[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: performance bug of `wc -m` on glibc systems
From: |
Bruno Haible |
Subject: |
Re: performance bug of `wc -m` on glibc systems |
Date: |
Sun, 20 May 2018 22:43:42 +0200 |
User-agent: |
KMail/5.1.3 (Linux/4.4.0-124-generic; KDE/5.18.0; x86_64; ; ) |
Pádraig Brady wrote:
> $ yes áááááááááááááááááááá | head -n100000 > mbc.txt
> $ yes 12345678901234567890 | head -n100000 > num.txt
>
> ===== Before ====
>
> $ time src/wc -Lm < mbc.txt
> 2100000 20
> real 0m0.186s
>
> $ time src/wc -m < mbc.txt
> 2100000
> real 0m0.186s
Profiling this on a glibc system, using the valgrind + kcachegrind
combination [1], I obtain the attached treemap.
$ valgrind --tool=callgrind src/wc -m < mbc.txt
$ kcachegrind callgrind.out.10379 # switch to callee map
My reading of this output is:
* The transformation of the byte strings to a character stream
takes more than 50% of the CPU time. I find this extraordinary
slow, given that what this parsing does is essentially to combine
2 adjacent bytes in a word, according to an algorithmic formula
(no table lookup).
What each step does:
- rpl_mbtowc: Implements the gnulib workaround against
"In the C or POSIX locales, this function can return @code{(size_t) -1}
and set @code{errno} to @code{EILSEQ}"
- mbrtowc: Adapt the invocation conventions of mbrtowc to the invocation
conventions of __gconv_transform_*.
- __gconv_transform_utf8_internal: This is the transformation UTF-8 ->
UCS-4.
It is slow because the __gconv_transform_* functions are optimized to do
long runs of characters all at once (i.e. for iconv() and mbstowcs).
However,
what 'wc' and the coreutils in general do is to convert one character, check
status and update application logic, then convert another character, check
status and update application logic again, etc.
There would be two ways to make 'wc' more performant:
- Change the 'wc' program to transform buffers at once. The problem here
is that it comes with degradation of error-handling logic (ability
to precisely diagnose errors and continue appropriately) and complicates
the code, making it less maintainable.
- Change glibc to add an optimized 'mbrtwoc' function for each encoding
(or at least for the most frequently used encodings: the built-in and
8-bit ones).
I estimate that this optimization would make 60% of the mbrtwoc times
go away, thus speeding up 'wc' by 30%.
* The other operations (print, space, and width, for each character)
are reasonably optimized in glibc.
Bruno
[1] http://www.pixelbeat.org/programming/profiling/
callgrind.out.10379.png
Description: application/kcachegrind
- Re: performance bug of `wc -m`, (continued)
- Re: performance bug of `wc -m`, Eric Fischer, 2018/05/16
- Re: performance bug of `wc -m`, Eric Fischer, 2018/05/16
- Re: performance bug of `wc -m`, Pádraig Brady, 2018/05/18
- Re: performance bug of `wc -m`, Pádraig Brady, 2018/05/18
- Re: performance bug of `wc -m`, Bernhard Voelker, 2018/05/18
- Re: performance bug of `wc -m`, Pádraig Brady, 2018/05/18
- Re: performance bug of `wc -m`, Eric Fischer, 2018/05/18
- Re: performance bug of `wc -m`, Eric Fischer, 2018/05/18
- Re: performance bug of `wc -m`, Pádraig Brady, 2018/05/18
- Re: performance bug of `wc -m`, L A Walsh, 2018/05/18
- Re: performance bug of `wc -m` on glibc systems,
Bruno Haible <=
- Re: performance bug of `wc -m` on glibc systems, Bruno Haible, 2018/05/21
- Re: performance bug of `wc -m` on simulated macOS, Bruno Haible, 2018/05/20
- Re: performance bug of `wc -m` on macOS, Bruno Haible, 2018/05/20
- Re: performance bug of `wc -m` on macOS, Pádraig Brady, 2018/05/20
- Re: performance bug of `wc -m` on macOS, Bruno Haible, 2018/05/21
- Re: performance bug of `wc -m` on macOS, Bruno Haible, 2018/05/21
- speeding up `wc -m`, Bruno Haible, 2018/05/21
- Re: speeding up `wc -m`, Pádraig Brady, 2018/05/21
- Re: performance bug of `wc -m`, Kaz Kylheku (Coreutils), 2018/05/17
- Re: performance bug of `wc -m`, Eric Fischer, 2018/05/17