bug#21700: new snapshot available: grep-2.21.78-7da30
From: Gary Johnson
Subject: bug#21700: new snapshot available: grep-2.21.78-7da30
Date: Thu, 22 Oct 2015 16:49:34 -0700
User-agent: Mutt/1.5.20 (2009-06-14)
On 2015-10-21, Jim Meyering wrote:
> On Wed, Oct 21, 2015 at 1:09 PM, Gary Johnson wrote:
> > On 2015-10-18, Jim Meyering wrote:
> >> > I built the snapshot on two systems, a fairly old one running Ubuntu
> >> > 10.04.4 and a newer one running an up-to-date Linux Mint 17.2.
> >> > 'make check' reported the same two failures on both:
> >> >
> >> > XFAIL: backref-alt
> >> > XFAIL: triple-backref
> >>
> >> Thanks for building and reporting.
> >> Each of those "XFAIL"s indicates an expected failure, so that is the
> >> expected test result, for now.
> >
> > OK, thanks.
> >
> > I also built the snapshot successfully on a Fedora 17 system that I
> > use for real work. I just ran a performance test, FWIW. I searched
> > recursively in our source hierarchy of 6044 regular files and 1102
> > directories for a simple string.
> >
> > time grep -Rin mystring src > /dev/null
> >
> > Here are the results, averaged over three trials each, not including
> > any slow times clearly due to updating caches.
> >
> >          2.12     2.21     2.21.78-7da30
> >          -----    -----    -------------
> >   real   18.0s    1.08s    2.36s
> >   user   17.8s    0.96s    2.24s
> >   sys    0.12s    0.11s    0.10s
> >
> > Version 2.12 was /bin/grep. The other two versions I built myself.
>
> Thank you for the timings. Next time, please include the following:
This is kind of long, so I'll summarize here. The relatively poor
performance I observed with grep-2.21.78 appears to have been due to
my having built it in an environment tainted with CFLAGS from the
build of another project. A clean build of grep-2.21.78 resulted in
performance only slightly worse than that of grep-2.21.
> - CPU type/speed
From lshw (probably more than you wanted):
*-cpu:0
description: CPU
product: Quad-Core Xeon 5xxx
vendor: Intel Corp.
physical id: 5
bus info: address@hidden
version: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz
slot: CPU0 PROCESSOR
size: 1596MHz
capacity: 2128MHz
width: 64 bits
clock: 505MHz
capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce
cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
ht tm pbe syscall nx rdtscp constant_tsc arch_perfmon pebs bts rep_good nopl
xtopology nonstop_tsc aperfmperf pni dtes64 monitor
ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm
tpr_shadow vnmi flexpriority ept vpid cpufreq
configuration: cores=4 enabledcores=4 threads=4
*-cache:0
description: L1 cache
physical id: 7
slot: L1 Cache
size: 256KiB
capacity: 256KiB
capabilities: burst internal write-through unified
*-cache:1
description: L2 cache
physical id: 8
slot: L2 Cache
size: 1MiB
capacity: 1MiB
capabilities: burst internal write-back unified
*-cache:2
description: L3 cache
physical id: 9
slot: L3 Cache
size: 4MiB
capacity: 4MiB
capabilities: burst internal write-back unified
*-cpu:1
description: CPU
product: Quad-Core Xeon 5xxx
vendor: Intel Corp.
physical id: 6
bus info: address@hidden
version: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz
slot: CPU1 PROCESSOR
size: 1596MHz
capacity: 2128MHz
width: 64 bits
clock: 505MHz
capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce
cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
ht tm pbe syscall nx rdtscp constant_tsc arch_perfmon pebs bts rep_good nopl
xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3
cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm tpr_shadow vnmi flexpriority
ept vpid cpufreq
configuration: cores=4 enabledcores=4 threads=4
*-cache:0
description: L1 cache
physical id: a
slot: L1 Cache
size: 256KiB
capacity: 256KiB
capabilities: burst internal write-through unified
*-cache:1
description: L2 cache
physical id: b
slot: L2 Cache
size: 1MiB
capacity: 1MiB
capabilities: burst internal write-back unified
*-cache:2
description: L3 cache
physical id: c
slot: L3 Cache
size: 4MiB
capacity: 4MiB
capabilities: burst internal write-back unified
> - file system type (and SSD or spinning rust)
Type: ext4
Size: 1.1 TB
Spinning rust
The file system resides on an LVM logical volume composed of two
physical volumes. One physical volume is on a Seagate ST3250318AS
and the other is on a Western Digital WDC WD1002FAEX-0. I didn't
build the system, so I don't know very much about this.
> - OS version
Fedora 17
Kernel: 3.3.4-5.fc17.x86_64
> - options with which you configured/built grep
Version 2.21:
./configure --prefix=$HOME/src/grep-2.21
make
Version 2.21.78-7da30:
./configure --prefix=$HOME/src/grep-2.21.78
make
gcc is:
gcc (GCC) 4.7.0 20120507 (Red Hat 4.7.0-5)
> - your current locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=C
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
> While you see a performance degradation going from 2.21 to the
> first 2.22 release candidate, I see the opposite trend, albeit barely
> measurable:
>
> Searching the following hierarchies, I see a consistent 1% improvement
> going from 2.21 to 2.22 on an Intel(R) Core(TM) i7-4770S CPU @ 3.10GHz.
> The files I searched were on an ext4 file system residing on an SSD
> (OCZ-VERTEX3).
> This system is using fedora rawhide.
>
> $ find [a-g]* -type f|wc -l
> 335065
> $ find [a-g]* -type d|wc -l
> 9667
> $ du -shc [a-g]*
> 25M autoconf
> 125M automake
> 129M bison
> 74M cppi
> 437M cu
> 103M diffutils
> 732M emacs
> 2.3G gcc
> 345M glibc
> 252M gnulib
> 187M grep
> 90M gzip
> 4.7G total
>
> Both grep binaries were compiled with gcc-6.0.something (built from git)
> using ./configure --enable-gcc-warnings && make
>
> Here are best-of-3 timings running this command:
>
> env LC_ALL=en_US.UTF-8 time grep -ri mystring [a-g]* > /dev/null
>
> grep-2.21: 8.05user 1.10system 0:09.17elapsed 99%CPU
> (0avgtext+0avgdata 32876maxresident)k
> 0inputs+0outputs (0major+9986minor)pagefaults 0swaps
>
> grep-2.22: 8.04user 1.04system 0:09.10elapsed 99%CPU
> (0avgtext+0avgdata 32940maxresident)k
> 0inputs+0outputs (0major+9988minor)pagefaults 0swaps
>
> It is critical to mention the locale you use.
> As you see above, I explicitly set LC_ALL=en_US.UTF-8.
> Note that when I switch to LC_ALL=C, it halves those times,
> although the ~1% win with 2.22 still remains.
>
> Would you please compile 2.21 yourself, too? Otherwise, the timing may
> be biased by the fact that distribution-provided binaries are often
> better optimized than those one gets when building from sources with
> the default options. If we can identify a modern system for which
> there is anywhere near a 2x performance regression, I would be very
> interested to learn more.
Version 2.21 is one I compiled myself. The distribution-provided
version is 2.12.
Your comments encouraged me to pay more attention to what I was
doing. I compared the config.log files from the grep-2.21 and
grep-2.21.78-7da30 directories and found that the environments and
results were slightly different. In particular, CFLAGS had been set
to "-g -DFEAT_CONCEAL" for a Vim build and was still in effect when I
built grep-2.21.78. Also, I had built grep-2.21 back in February
and couldn't be sure that nothing relevant had changed on the system
since then.
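A minimal sketch of how the CFLAGS recorded in each config.log could be
checked. It assumes the usual Autoconf layout, in which config.log lists
output variables on lines of the form CFLAGS='...'. The helper name
show_cflags and the directory names in the example are hypothetical.

```shell
# show_cflags is a hypothetical helper, not part of grep or Autoconf.
# It prints the CFLAGS line recorded in the given config.log, or a note
# if the file contains no such line.
show_cflags() {
    grep -E '^CFLAGS=' "$1" || echo "CFLAGS not recorded in $1"
}

# Example (directory names are assumptions):
#   show_cflags grep-2.21/config.log
#   show_cflags grep-2.21.78-7da30/config.log
```

If the two lines differ, the builds were not configured with the same
compiler flags, and their timings are not directly comparable.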
So I opened a new xterm window, created two new build directories
and untarred, configured and made both grep versions from scratch.
New measurements showed no difference between the two 2.21 builds,
but a significant improvement in the 2.21.78 times. Here are the
new results. The times of successive runs were very close, so I
just chose a representative example of each. In short, 2.21.78
appears _slightly_ slower than 2.21, but not enough (for me) to
worry about.
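Opening a fresh terminal is one way to get a clean environment. Another,
sketched here under stated assumptions, is to strip the compiler-flag
variables for just the configure and make commands. The wrapper name
without_build_flags and the prefix in the example are hypothetical, and
'env -u' is an extension found in GNU coreutils and the BSDs, not POSIX.

```shell
# without_build_flags is a hypothetical wrapper: it runs the given command
# with the common compiler-flag variables removed from its environment.
# The calling shell's own variables are left untouched.
without_build_flags() {
    env -u CFLAGS -u CPPFLAGS -u CXXFLAGS -u LDFLAGS "$@"
}

# Example, run from a freshly untarred source directory
# (the prefix is an assumption):
#   without_build_flags ./configure --prefix="$HOME/grep-2.21.78-new"
#   without_build_flags make
```

With the variables unset, configure falls back to its own default flags,
so the result does not depend on what an earlier build left exported.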
====================================================================
$ time ~/grep-2.21-new/bin/grep -ri mystring src > /dev/null
real 0m0.814s
user 0m0.725s
sys 0m0.081s
$ time LC_ALL=en_US.UTF-8 ~/grep-2.21-new/bin/grep -ri mystring src > /dev/null
real 0m0.817s
user 0m0.720s
sys 0m0.090s
$ time LC_ALL=C ~/grep-2.21-new/bin/grep -ri mystring src > /dev/null
real 0m0.350s
user 0m0.252s
sys 0m0.094s
====================================================================
$ time ~/grep-2.21.78-new/bin/grep -ri mystring src > /dev/null
real 0m0.849s
user 0m0.756s
sys 0m0.086s
$ time LC_ALL=en_US.UTF-8 ~/grep-2.21.78-new/bin/grep -ri mystring src > /dev/null
real 0m0.849s
user 0m0.751s
sys 0m0.090s
$ time LC_ALL=C ~/grep-2.21.78-new/bin/grep -ri mystring src > /dev/null
real 0m0.354s
user 0m0.267s
sys 0m0.082s
====================================================================
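To put a number on "slightly slower", the real times above can be
compared directly. This is a small arithmetic sketch using awk. The
helper name pct_slower is hypothetical, and the inputs are single
representative runs, so the results are only indicative.

```shell
# pct_slower is a hypothetical helper: it prints the percentage by which
# the second time exceeds the first, to one decimal place.
pct_slower() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.1f%%\n", (new / old - 1) * 100 }'
}

pct_slower 0.814 0.849   # real time, no LC_ALL override: 2.21 vs 2.21.78
pct_slower 0.350 0.354   # real time, LC_ALL=C:           2.21 vs 2.21.78
```

For these figures the two calls print 4.3% and 1.1% respectively,
compared with the more than 2x gap seen with the tainted build.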
I'm sorry for wasting your time on a wild goose chase. (But my new
grep works better now!)
Regards,
Gary