Re: performance bug of `wc -m`
From: Philip Rowlands
Subject: Re: performance bug of `wc -m`
Date: Sun, 13 May 2018 23:05:11 +0100
On Sun, 13 May 2018, at 02:55, Peng Yu wrote:
> Hi,
>
> The following example shows that `wc -m` is even slower than the
> equivalent Python code. Can this performance bug be fixed?
I can reproduce the slow wc behaviour with UTF-8 enabled locales.
$ echo $LANG
en_GB.UTF-8
$ seq 1000000 | time -p wc -c
6888896
real 0.05
user 0.00
sys 0.02
$ seq 1000000 | time -p wc -m
6888896
real 0.60
user 0.58
sys 0.00
$ seq 1000000 | LANG=C time -p wc -m
6888896
real 0.05
user 0.00
sys 0.02
In the slow case, wc spends most of its time in iswprint / wcwidth /
iswspace. Perhaps wc could learn a faster method of counting UTF-8
(https://stackoverflow.com/a/7298149); this may be worthwhile as the trend
towards UTF-8 everywhere marches on.
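To illustrate what a faster method could look like, here is a minimal Python sketch of one common approach: count every byte that is not a UTF-8 continuation byte (bit pattern 10xxxxxx). It assumes the input is already valid UTF-8 and does no validation, so it is not a drop-in replacement for what wc does with invalid sequences. The function name count_utf8_chars is made up for this example, and I have not checked that this is exactly the method in the linked answer.

```python
def count_utf8_chars(data: bytes) -> int:
    """Count code points in data, ASSUMING it is valid UTF-8.

    Hypothetical helper for illustration only; this is not wc's code.
    """
    # In UTF-8, every byte of a multi-byte character after the first is a
    # continuation byte of the form 10xxxxxx. Counting the bytes that do
    # NOT match that pattern therefore counts one byte per character.
    return sum(1 for b in data if (b & 0xC0) != 0x80)


# Two of the characters here (ï and é) take two bytes each in UTF-8.
sample = "naïve café\n".encode("utf-8")
print(len(sample), count_utf8_chars(sample))
```

For pure ASCII input such as the seq output above, no byte is a continuation byte, so the character count equals the byte count, which matches wc -c and wc -m both reporting 6888896.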
Without more digging, I can't explain why Python's string decode('utf-8') is
better optimised for length calculations.
Cheers,
Phil