Re: [lmi] Horrible std::regex performance


From: Vadim Zeitlin
Subject: Re: [lmi] Horrible std::regex performance
Date: Fri, 15 Jul 2016 23:54:27 +0200

On Mon, 11 Jul 2016 22:18:23 +0000 Greg Chicares <address@hidden> wrote:

GC> The only one where speed matters is 'test_coding_rules.cpp', and there it
GC> does matter very much. Would it be easy for you to time that by measuring
GC> the 'check_concinnity' target with both regex implementations?

 Here are the results (all times in seconds):

Platform     Compiler   boost::regex        std::regex     PCRE
----------------------------------------------------------------------------
Linux         gcc 4.9            5.9              27.7      1.9
WINE          gcc 4.9            1.2              93.5
MSW           gcc 4.9            1.1              75.1
              msvc 14            9.9              32.9

Here boost::regex is the current version, std::regex is the version
replacing it with the std equivalent, and PCRE is an experimental version
using the PCRE library, which has a completely different API but which I
decided to check because it's supposed to be faster. I included the
benchmarks for the MSVC build just for completeness.

 As usual with benchmarking, the results are quite incomprehensible. The
Linux and MSW machines are not the same, but should be roughly similar in
performance. The WINE line shows that the MSW build of the program
outperforms the native Linux build by a factor of 5 on the _same_ machine,
even including WINE overhead. This is very strange and must indicate either
some really brilliant optimizations in the MSW version of boost::regex or
something stupid done in the POSIX version. But the fact that std::regex
built with gcc for MSW is almost 2 orders of magnitude slower than
boost::regex (roughly 70-80 times), when there is "only" a 3-5 times
difference for the Linux and MSVC-built MSW versions, is even more amazing.
I honestly have no idea how to explain this; FWIW, I just see that all
processes are CPU-bound. I don't know if it's worth taking the time to
profile this: I'd be very curious to do it, but I'm not sure it's going to
yield anything really useful.
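
 If someone does want to look closer before reaching for a real profiler, a
tiny timing harness might be a reasonable first step. The following is only
a sketch under my own assumptions: time_regex_search() is a hypothetical
helper, not something in lmi, and it times std::regex_search() over a list
of lines held in memory with std::chrono::steady_clock. The pattern is
compiled outside the timed region, so only matching is measured.

```cpp
#include <chrono>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of lmi): count how many lines match
// 'pattern' over all repetitions, and report the wall-clock time in
// seconds that the matching loop took.
std::pair<std::size_t, double> time_regex_search
    (std::string const&              pattern
    ,std::vector<std::string> const& lines
    ,int                             repetitions
    )
{
    // Compile once, outside the timed region.
    std::regex const re(pattern);

    auto const start = std::chrono::steady_clock::now();
    std::size_t matches = 0;
    for(int i = 0; i < repetitions; ++i)
        {
        for(auto const& line : lines)
            {
            if(std::regex_search(line, re))
                {
                ++matches;
                }
            }
        }
    auto const stop = std::chrono::steady_clock::now();

    std::chrono::duration<double> const elapsed = stop - start;
    return {matches, elapsed.count()};
}
```

Calling it with the same pattern and the same lines in builds made with
different compilers (and swapping in boost::regex in a copy of the function)
would at least show whether the difference comes from matching itself or
from something else, such as pattern compilation or file I/O.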

 I didn't bother benchmarking the PCRE version under MSW because
boost::regex performance is excellent there with MinGW anyhow. Even if PCRE
is 2-3 times faster than it there too, as it is under Linux, it probably
doesn't matter much because 1s or 0.3s doesn't make for a big difference.
So the only reason to prefer PCRE would be if you preferred its (C) API to
boost/std::regex, but you probably don't (see
http://www.pcre.org/original/doc/html/pcreapi.html if you're curious).


GC> Even if this makefile line:
GC>     @-$(TEST_CODING_RULES) *
GC> is a bottleneck, perhaps it could be parallelized.

 Before doing this at the C++ level, note that, without any changes to the
code, GNU parallel can be used to parallelize the execution at the shell
level, e.g.

        % git ls-files|fgrep -v /|parallel ./test_coding_rules > /dev/null

(fgrep is used to avoid checking the files in subdirectories as "*" above
doesn't do it and ">/dev/null" is used to avoid tons of summary output).
Doing this brings the time down to 2.1s for the current version and 9.0s
for the std::regex one under Linux. Unsurprisingly, it doesn't really help
with the MSW boost::regex version running under WINE, as it's already
blazingly fast and the extra process startup overhead only makes it slower.
It does help with the std::regex version under WINE, but it's still much
slower at ~30s. Unfortunately I don't see a comparable speedup when using
parallel under Cygwin: the best I can get is a 10% improvement, which is
not really that significant.
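
 For comparison, doing the same thing at the C++ level could look roughly
like the sketch below. It is not how test_coding_rules works today: the
function name and the 'check' callback are hypothetical stand-ins for
whatever per-file checking function would be used, and 'check' is assumed
to be safe to call from several threads at once. One task per hardware
thread is started with std::async, and each task takes the next unprocessed
file from a shared atomic index.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <thread>
#include <vector>

// Hypothetical sketch: apply 'check' to every file name using a small
// pool of asynchronous tasks, and return the number of files for which
// 'check' returned false.
std::size_t count_failures_in_parallel
    (std::vector<std::string> const&                files
    ,std::function<bool(std::string const&)> const& check
    )
{
    std::atomic<std::size_t> next{0};
    std::atomic<std::size_t> failures{0};

    // Each worker repeatedly claims the next index until none are left.
    auto const worker = [&]
        {
        for(std::size_t i = next++; i < files.size(); i = next++)
            {
            if(!check(files[i]))
                {
                ++failures;
                }
            }
        };

    // hardware_concurrency() may return 0, so use at least one task.
    unsigned int const n = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::future<void>> tasks;
    for(unsigned int i = 0; i < n; ++i)
        {
        tasks.push_back(std::async(std::launch::async, worker));
        }
    for(auto& t : tasks)
        {
        t.get(); // Wait for completion; rethrows if 'check' threw.
        }
    return failures.load();
}
```

Unlike the shell version, this avoids starting one process per file, which
is where I'd expect the overhead to come from when the individual checks
are already fast. Any output produced by 'check' would still need to be
serialized somehow.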


 To summarize, I have no idea how to explain the results that I obtained.
All I can say is that I double-checked them multiple times and there
doesn't seem to be anything obviously wrong with my testing/benchmarking.
There definitely is something very wrong with std::regex, which is much
slower than the already not very fast (at least under Linux) boost::regex.
However, by parallelizing the code using it, we should be able to achieve
"just" a factor 2 slowdown compared to boost::regex, at least under Linux.
Under MSW I'd have to do it first to have any idea about its performance:
the results above convincingly show that it's just impossible to predict.

 Please let me know if you'd like to:

0. Not touch anything and stay with Boost.Regex.
1. Switch to std::regex and parallelize test_coding_rules.
2. Switch to something else (PCRE, PCRE2, another fast regex library).

or something else. All I can say is that I don't think switching to
std::regex without doing the parallelization work is a good idea, because
replacing a roughly 1 second wait with one of more than a minute and a half
isn't very conducive to enhancing productivity.

 Thanks in advance,
VZ

