Re: performance bug of `wc -m` on macOS


From: Bruno Haible
Subject: Re: performance bug of `wc -m` on macOS
Date: Mon, 21 May 2018 12:15:07 +0200
User-agent: KMail/5.1.3 (Linux/4.4.0-124-generic; KDE/5.18.0; x86_64; ; )

Pádraig Brady wrote:
> system wcwidth is not implicated here.
> The slow down was attributed to locale_charset().
> At least this should be improved in the next coreutils release with:
> https://git.sv.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=214bf85

I have now been able to get a profile on macOS (with valgrind; note that
the latest release does not work yet, nor does the latest MacPorts build,
but the current valgrind git repo works more or less).

The profiler's output is:

===============================================================================
--------------------------------------------------------------------------------
Profile data file 'callgrind.out.55986' (creator: callgrind-3.14.0.GIT)
--------------------------------------------------------------------------------
I1 cache: 
D1 cache: 
LL cache: 
Timerange: Basic block 0 - 367572452
Trigger: Program termination
Profiled target:  src/wc -m (PID 55986, part 1)
Events recorded:  Ir
Events shown:     Ir
Event sort order: Ir
Thresholds:       99
Include dirs:     
User annotated:   
Auto-annotation:  off

--------------------------------------------------------------------------------
           Ir 
--------------------------------------------------------------------------------
1,517,818,871  PROGRAM TOTALS

--------------------------------------------------------------------------------
         Ir  file:function
--------------------------------------------------------------------------------
264,150,390  ???:_platform_strcmp [/usr/lib/system/libsystem_platform.dylib]
173,606,200  ???:_UTF8_mbrtowc [/usr/lib/system/libsystem_c.dylib]
132,210,556  ../src/wc.c:wc [src/wc]
124,000,000  ???:nl_langinfo_l [/usr/lib/system/libsystem_c.dylib]
106,000,000  ???:querylocale [/usr/lib/system/libsystem_c.dylib]
 88,000,000  ???:__maskrune [/usr/lib/system/libsystem_c.dylib]
 78,000,000  ../lib/localcharset.c:locale_charset [src/wc]
 71,403,400  ???:mbrtowc [/usr/lib/system/libsystem_c.dylib]
 66,000,000  ../lib/uniwidth/width.c:uc_width [src/wc]
 50,003,752  ???:_platform_strchr$VARIANT$Base [/usr/lib/system/libsystem_platform.dylib]
 46,200,000  ???:mbsinit [/usr/lib/system/libsystem_c.dylib]
 38,000,038  ???:uselocale [/usr/lib/system/libsystem_c.dylib]
 30,000,000  ???:nl_langinfo [/usr/lib/system/libsystem_c.dylib]
 26,000,000  ../lib/wcwidth.c:rpl_wcwidth [src/wc]
 24,400,862  ???:pthread_getspecific [/usr/lib/system/libsystem_pthread.dylib]
 24,000,000  ???:___mb_cur_max_l [/usr/lib/system/libsystem_c.dylib]
 22,000,000  ../lib/streq.h:rpl_wcwidth
 21,000,000  ???:_UTF8_mbsinit [/usr/lib/system/libsystem_c.dylib]
 20,000,000  /usr/include/_ctype.h:wc
 18,212,211  ???:__vsnprintf_chk [/usr/lib/system/libsystem_c.dylib]
 18,000,000  ../lib/nl_langinfo.c:rpl_nl_langinfo [src/wc]
 16,200,593  ???:rpl_wcwidth [src/wc]
 12,600,000  ../lib/mbchar.h:wc
 12,001,176  ???:os_unfair_lock_unlock [/usr/lib/dyld]
 10,000,055  ???:os_unfair_lock_lock [/usr/lib/dyld]
  8,000,000  ../lib/streq.h:uc_width
  4,596,013  ???:ImageLoader::trieWalk(unsigned char const*, unsigned char const*, char const*) [/usr/lib/dyld]
===============================================================================

This does not make perfect sense (no iswprint or iswspace calls are visible,
and ../lib/streq.h does not contain functions). But it still allows us to
dissect the time:


mbrtowc:
173,606,200  ???:_UTF8_mbrtowc [/usr/lib/system/libsystem_c.dylib]
 88,000,000  ???:__maskrune [/usr/lib/system/libsystem_c.dylib]
 71,403,400  ???:mbrtowc [/usr/lib/system/libsystem_c.dylib]
 46,200,000  ???:mbsinit [/usr/lib/system/libsystem_c.dylib]
 21,000,000  ???:_UTF8_mbsinit [/usr/lib/system/libsystem_c.dylib]
-----------
400,209,600 = 26%

rpl_wcwidth:
 26,000,000  ../lib/wcwidth.c:rpl_wcwidth [src/wc]
  locale_charset:
124,000,000  ???:nl_langinfo_l [/usr/lib/system/libsystem_c.dylib]
106,000,000  ???:querylocale [/usr/lib/system/libsystem_c.dylib]
 78,000,000  ../lib/localcharset.c:locale_charset [src/wc]
 38,000,038  ???:uselocale [/usr/lib/system/libsystem_c.dylib]
 30,000,000  ???:nl_langinfo [/usr/lib/system/libsystem_c.dylib]
 24,400,862  ???:pthread_getspecific [/usr/lib/system/libsystem_pthread.dylib]
 24,000,000  ???:___mb_cur_max_l [/usr/lib/system/libsystem_c.dylib]
 18,000,000  ../lib/nl_langinfo.c:rpl_nl_langinfo [src/wc]
-----------
442,400,900 = 29%
  uc_width:
 66,000,000  ../lib/uniwidth/width.c:uc_width [src/wc]
-----------
 66,000,000 = 4%
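
As a quick cross-check of the subtotals above, here is a minimal Python sketch
that re-adds the Ir counts per group and expresses each as a share of PROGRAM
TOTALS. The numbers are copied from the callgrind output; the names
PROGRAM_TOTAL, GROUPS, subtotal and share are made up for this illustration,
and the grouping is the one used in this mail, not something callgrind reports.

```python
# Instruction counts (Ir) copied from the callgrind output in this mail.
PROGRAM_TOTAL = 1_517_818_871

# Grouping as used in the breakdown above (an interpretation, not callgrind data).
GROUPS = {
    "mbrtowc": [173_606_200, 88_000_000, 71_403_400, 46_200_000, 21_000_000],
    "locale_charset": [124_000_000, 106_000_000, 78_000_000, 38_000_038,
                       30_000_000, 24_400_862, 24_000_000, 18_000_000],
    "uc_width": [66_000_000],
}


def subtotal(name):
    """Sum of the Ir counts listed for one group."""
    return sum(GROUPS[name])


def share(name):
    """Group subtotal as a percentage of PROGRAM_TOTAL, rounded to an integer."""
    return round(100 * subtotal(name) / PROGRAM_TOTAL)


for name in GROUPS:
    print(f"{name:15s} {subtotal(name):>12,d} = {share(name)}%")
```

Note that these are shares of executed instructions (Ir), which is only an
approximation of wall-clock time.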


So it is spending 26% in mbrtowc calls (as opposed to more than 50% with glibc).

And it is spending at least 29% in locale_charset, mostly due to nl_langinfo_l
and its associates. I'm saying "at least" because I don't know where to count
the many _platform_strcmp calls.
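
To put a rough bound on that "at least", the following Python sketch computes
the locale_charset share twice: once with only the entries counted above, and
once under the purely hypothetical assumption that every _platform_strcmp
instruction also belongs to locale_charset. The profile does not say who calls
strcmp, so the second figure is an upper limit for illustration, not a
measurement. The constant and function names are invented for this example.

```python
# Ir counts copied from the callgrind output in this mail.
PROGRAM_TOTAL = 1_517_818_871
LOCALE_CHARSET_KNOWN = 442_400_900  # subtotal attributed to locale_charset above
PLATFORM_STRCMP = 264_150_390       # caller unknown in this profile


def percent(ir):
    """Share of all executed instructions, in percent."""
    return 100 * ir / PROGRAM_TOTAL


# Lower bound: only what was clearly attributed to locale_charset.
lower = percent(LOCALE_CHARSET_KNOWN)

# Upper bound: ASSUMPTION that all strcmp calls come from locale_charset.
upper = percent(LOCALE_CHARSET_KNOWN + PLATFORM_STRCMP)

print(f"locale_charset share: between {lower:.1f}% and {upper:.1f}%")
```

Under that assumption the share would rise from about 29% to a little under
47%; the real value lies somewhere in between.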

And this is _after_ all the recent locale_charset optimizations.

Bruno



