Re: performance bug of `wc -m` on macOS
From: Bruno Haible
Subject: Re: performance bug of `wc -m` on macOS
Date: Mon, 21 May 2018 20:02:24 +0200
User-agent: KMail/5.1.3 (Linux/4.4.0-124-generic; KDE/5.18.0; x86_64; ; )
With the proposed function-pointer-factory changes, I'm seeing
this speedup on macOS systems:

            num     mbc
Before     0.153   0.229
After      0.042   0.112
           -------------
Speedup     3.6     2.0
factor
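
As a quick cross-check of the speedup row, the factors can be recomputed from the Before and After timings. This is only a sketch: the dictionary and function names are made up for illustration, and the numbers are copied from the table as reported.

```python
# Timings copied from the table above (units as reported in the mail).
before = {"num": 0.153, "mbc": 0.229}
after = {"num": 0.042, "mbc": 0.112}


def speedup(before_t, after_t):
    """Return how many times faster the 'after' run is than the 'before' run."""
    return before_t / after_t


# Round to one decimal, matching the precision used in the table.
factors = {name: round(speedup(before[name], after[name]), 1) for name in before}

for name, factor in factors.items():
    print(f"{name}: {factor}x")
```

Both factors agree with the table at one decimal of precision.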
The profiler's output now is:
===============================================================================
--------------------------------------------------------------------------------
Profile data file 'callgrind.out.64367' (creator: callgrind-3.14.0.GIT)
--------------------------------------------------------------------------------
I1 cache:
D1 cache:
LL cache:
Timerange: Basic block 0 - 158573914
Trigger: Program termination
Profiled target: src/wc -m (PID 64367, part 1)
Events recorded: Ir
Events shown: Ir
Event sort order: Ir
Thresholds: 99
Include dirs:
User annotated:
Auto-annotation: off
--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
546,413,341 PROGRAM TOTALS
--------------------------------------------------------------------------------
Ir file:function
--------------------------------------------------------------------------------
134,811,404 ../src/wc.c:wc [src/wc]
103,504,100 ../lib/mbrtowc-factory.c:utf8_mbrtowc [src/wc]
88,000,000 ???:__maskrune [/usr/lib/system/libsystem_c.dylib]
66,000,000 ../lib/uniwidth/width.c:uc_width [src/wc]
46,200,000 ???:mbsinit [/usr/lib/system/libsystem_c.dylib]
21,000,000 ???:_UTF8_mbsinit [/usr/lib/system/libsystem_c.dylib]
16,000,000 /usr/include/_ctype.h:wc
12,600,000 ../lib/mbchar.h:wc
12,200,674 ???:pthread_getspecific [/usr/lib/system/libsystem_pthread.dylib]
10,000,000 ../lib/wcwidth-factory.c:utf8_wcwidth [src/wc]
8,000,000 ../lib/streq.h:uc_width
6,112,126 ???:__vsnprintf_chk [/usr/lib/system/libsystem_c.dylib]
4,596,013 ???:ImageLoader::trieWalk(unsigned char const*, unsigned char const*, char const*) [/usr/lib/dyld]
4,000,498 ???:rpl_wcwidth [src/wc]
2,067,598 ???:ImageLoaderMachOCompressed::rebase(ImageLoader::LinkContext const&, unsigned long) [/usr/lib/dyld]
1,773,141 ???:ImageLoaderMachO::libPath(unsigned int) const [/usr/lib/dyld]
768,220 ???:ImageLoaderMachO::findExportedSymbol(char const*, bool, char const*, ImageLoader const**) const'2 [/usr/lib/dyld]
758,551 ???:_mapStrHash(_NXMapTable*, void const*) [/usr/lib/libobjc.A.dylib]
683,763 ???:ImageLoader::read_uleb128(unsigned char const*&, unsigned char const*) [/usr/lib/dyld]
579,216 ???:ImageLoaderMachOCompressed::libReExported(unsigned int) const [/usr/lib/dyld]
248,565 ???:ImageLoaderMachOCompressed::findShallowExportedSymbol(char const*, ImageLoader const**) const [/usr/lib/dyld]
204,809 ???:ImageLoaderMachOCompressed::eachBind(ImageLoader::LinkContext const&, unsigned long (ImageLoaderMachOCompressed::*)(ImageLoader::LinkContext const&, unsigned long, unsigned char, char const*, unsigned char, long, long, char const*, ImageLoaderMachOCompressed::LastLookup*, bool)) [/usr/lib/dyld]
203,803 ???:ImageLoaderMachO::findExportedSymbol(char const*, bool, char const*, ImageLoader const**) const [/usr/lib/dyld]
200,688 ???:strcmp [/usr/lib/system/libsystem_kernel.dylib]
179,635 ???:dyld::loadPhase5(char const*, char const*, dyld::LoadContext const&, unsigned int&, std::__1::vector<char const*, std::__1::allocator<char const*> >*) [/usr/lib/dyld]
173,763 ???:_NXMapMember(_NXMapTable*, void const*, void**) [/usr/lib/libobjc.A.dylib]
173,367 ???:_pthread_mutex_unlock_slow [/usr/lib/system/libsystem_pthread.dylib]
===============================================================================
Let's dissect the time, as before:

mbrtowc:
    103,504,100 ../lib/mbrtowc-factory.c:utf8_mbrtowc [src/wc]
     46,200,000 ???:mbsinit [/usr/lib/system/libsystem_c.dylib]
     21,000,000 ???:_UTF8_mbsinit [/usr/lib/system/libsystem_c.dylib]
    -----------
    170,704,100 = 31%

rpl_wcwidth:
  locale_charset: 0%
  uc_width:
     66,000,000 ../lib/uniwidth/width.c:uc_width [src/wc]
     10,000,000 ../lib/wcwidth-factory.c:utf8_wcwidth [src/wc]
      8,000,000 ../lib/streq.h:uc_width
      4,000,498 ???:rpl_wcwidth [src/wc]
    -----------
     88,000,498 = 16%
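
The subtotals and percentages in this breakdown can be reproduced from the Ir counts in the profile. The following is a minimal sketch; the grouping of lines into "mbrtowc" and "rpl_wcwidth" follows the dissection above, and the variable and function names are illustrative only.

```python
# Total instruction count from the "PROGRAM TOTALS" line of the profile.
TOTAL_IR = 546_413_341

# Ir counts grouped as in the dissection above.
groups = {
    "mbrtowc": [103_504_100, 46_200_000, 21_000_000],
    "rpl_wcwidth": [66_000_000, 10_000_000, 8_000_000, 4_000_498],
}


def share(counts, total=TOTAL_IR):
    """Return (subtotal, percentage of total rounded to a whole percent)."""
    subtotal = sum(counts)
    return subtotal, round(100 * subtotal / total)


summary = {name: share(counts) for name, counts in groups.items()}

for name, (subtotal, percent) in summary.items():
    print(f"{name}: {subtotal:,} = {percent}%")
```

Note that the 88,000,000 Ir attributed to __maskrune is not part of either group, so the two shares do not add up to the whole profile.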
No more time is spent in locale_charset!
And the macOS-compatible rewrite of UTF-8 mbrtowc is about 2.3 times faster
than the macOS implementation.
Bruno