Re: performance bug of `wc -m`

coreutils

From:	Philip Rowlands
Subject:	Re: performance bug of `wc -m`
Date:	Sun, 13 May 2018 23:05:11 +0100

On Sun, 13 May 2018, at 02:55, Peng Yu wrote:
> Hi,
> 
> The following example shows that `wc -m` is even slower than the
> equivalent Python code. Can this performance bug be fixed?

I can reproduce the slow wc behaviour with UTF-8 enabled locales.

$ echo $LANG
en_GB.UTF-8

$ seq 1000000 | time -p wc -c
6888896
real 0.05
user 0.00
sys 0.02

$ seq 1000000 | time -p wc -m
6888896
real 0.60
user 0.58
sys 0.00

$ seq 1000000 | LANG=C time -p wc -m
6888896
real 0.05
user 0.00
sys 0.02

In the slow case, wc spends most of its time in iswprint / wcwidth / 
iswspace. Perhaps wc could learn a faster method of counting UTF-8 
characters (https://stackoverflow.com/a/7298149); this may be worthwhile 
as the trend towards UTF-8 everywhere marches on.
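
For illustration, here is a minimal sketch of the kind of counting loop I have in mind: every byte that is not a UTF-8 continuation byte (bit pattern 10xxxxxx) starts a new character, so counting those bytes gives the character count. The function name utf8_count_chars is made up for this example and is not anything in wc; the sketch also assumes the input is valid UTF-8 and does no locale checks, which a real patch would have to handle.

```c
#include <stddef.h>

/* Hypothetical helper, not part of coreutils.
   Count UTF-8 characters in buf[0..len) by counting every byte that is
   not a continuation byte (10xxxxxx).  Assumes valid UTF-8 input;
   malformed sequences are not detected.  */
size_t
utf8_count_chars (const unsigned char *buf, size_t len)
{
  size_t count = 0;
  for (size_t i = 0; i < len; i++)
    count += (buf[i] & 0xC0) != 0x80;   /* 1 for a lead or ASCII byte */
  return count;
}
```

For pure ASCII input such as the seq output above, this returns the same number as the byte count, and it never calls into the wide-character classification functions.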

I can't explain without more digging why Python's string decode('utf-8') is 
better optimised for length calculations.

Cheers,
Phil
