[lmi] Horrible std::regex performance


From: Vadim Zeitlin
Subject: [lmi] Horrible std::regex performance
Date: Sun, 10 Jul 2016 17:00:18 +0200

 Hello,

 I'm embarrassed to say that, after spending quite some time on replacing
boost::regex with std::regex in lmi, I realized that the performance of the
latter is absolutely horrendous relative to the former. The tests in
regex_test.cpp don't make much sense from the point of view of matching
multiline strings, because '.' never matches a newline in the default
ECMAScript syntax and always matches it in any of the POSIX regex syntaxes,
but their timings are still instructive. In my Windows 7 VM using g++ 4.9.1
I measured the following:

// Current version using boost::regex
  early 0:   4.374e-006 s =       4374 ns, mean of 2287 iterations
  early 1:   2.362e-005 s =      23622 ns, mean of 424 iterations
  early 2:   1.393e-005 s =      13933 ns, mean of 718 iterations
  early 3:   1.064e-005 s =      10639 ns, mean of 941 iterations

  late  0:   4.924e-005 s =      49241 ns, mean of 204 iterations
  late  1:   2.640e-004 s =     264000 ns, mean of 100 iterations
  late  2:   1.635e-004 s =     163459 ns, mean of 100 iterations
  late  3:   1.114e-004 s =     111366 ns, mean of 100 iterations

  never 0:   4.247e-005 s =      42468 ns, mean of 236 iterations
  never 1:   2.291e-004 s =     229130 ns, mean of 100 iterations
  never 2:   9.301e-005 s =      93008 ns, mean of 108 iterations
  never 3:   7.082e-005 s =      70825 ns, mean of 142 iterations

// New version using std::regex
  early 0:   4.121e-006 s =       4121 ns, mean of 2427 iterations
  early 1:   3.064e-004 s =     306402 ns, mean of 100 iterations
  early 2:   3.010e-004 s =     300982 ns, mean of 100 iterations
  early 3:   2.977e-004 s =     297691 ns, mean of 100 iterations

  late  0:   5.138e-005 s =      51380 ns, mean of 195 iterations
  late  1:   3.681e-003 s =    3680758 ns, mean of 100 iterations
  late  2:   3.612e-003 s =    3611652 ns, mean of 100 iterations
  late  3:   3.649e-003 s =    3649249 ns, mean of 100 iterations

  never 0:   4.249e-005 s =      42492 ns, mean of 236 iterations
  never 1:   3.617e-003 s =    3617446 ns, mean of 100 iterations
  never 2:   3.540e-003 s =    3540137 ns, mean of 100 iterations
  never 3:   3.559e-003 s =    3559489 ns, mean of 100 iterations

// New version using std::regex with std::regex::optimize flag
  early 0:   4.110e-006 s =       4110 ns, mean of 2434 iterations
  early 1:   3.098e-004 s =     309810 ns, mean of 100 iterations
  early 2:   3.055e-004 s =     305491 ns, mean of 100 iterations
  early 3:   3.040e-004 s =     303952 ns, mean of 100 iterations

  late  0:   5.047e-005 s =      50465 ns, mean of 199 iterations
  late  1:   3.677e-003 s =    3677227 ns, mean of 100 iterations
  late  2:   3.605e-003 s =    3605134 ns, mean of 100 iterations
  late  3:   3.608e-003 s =    3607741 ns, mean of 100 iterations

  never 0:   4.291e-005 s =      42906 ns, mean of 234 iterations
  never 1:   3.634e-003 s =    3634065 ns, mean of 100 iterations
  never 2:   3.543e-003 s =    3542562 ns, mean of 100 iterations
  never 3:   3.543e-003 s =    3542668 ns, mean of 100 iterations


 Of course, measuring inside a VM is not recommended, generally speaking,
but the difference here is so huge that it's definitely not specific to
using a VM and I obtain very similar results on a physical Linux machine.

 So there is a slowdown of more than 10 *times*, and the "optimize" flag
doesn't help at all (not unexpectedly, with such extremely simple regexes).
I really don't know what the libstdc++ developers were thinking and why they
couldn't adapt the existing Boost.Regex code, but in practice it's clear
that std::regex must not be used for anything remotely performance-sensitive
(notice that Boost.Regex was already known to be quite slow, e.g. PCRE is
significantly faster).
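
 For anyone wishing to reproduce this kind of comparison outside of lmi,
here is a minimal sketch of such a timing loop. It is not the code in
regex_test.cpp: make_text(), time_search() and the "timing" struct are
hypothetical names made up for this illustration, and the input text is
synthetic, so absolute numbers will differ from those above.

```cpp
#include <cassert>
#include <chrono>
#include <regex>
#include <string>

// Hypothetical result type: mean time per search and number of matches.
struct timing
{
    double mean_ns;
    int    matches;
};

// Hypothetical input: 'lines' filler lines followed by one line
// containing "needle", so that a search for it succeeds only "late".
std::string make_text(int lines)
{
    std::string text;
    for(int i = 0; i < lines; ++i)
        {
        text += "lorem ipsum dolor sit amet\n";
        }
    text += "needle\n";
    return text;
}

// Compile 'pattern' once with the given flags, then time repeated
// std::regex_search() calls over 'text'.
timing time_search
    (std::string const&     pattern
    ,std::regex::flag_type  flags
    ,std::string const&     text
    ,int                    iterations
    )
{
    assert(0 < iterations);
    std::regex const re(pattern, flags);

    using clock = std::chrono::steady_clock;
    int matches = 0;
    auto const start = clock::now();
    for(int i = 0; i < iterations; ++i)
        {
        if(std::regex_search(text, re))
            {
            ++matches;
            }
        }
    auto const stop = clock::now();

    std::chrono::duration<double, std::nano> const elapsed = stop - start;
    return timing{elapsed.count() / iterations, matches};
}
```

 Calling time_search() once with std::regex::ECMAScript and once with
std::regex::ECMAScript | std::regex::optimize on the same text and pattern,
and comparing the two mean_ns values, is one way to check whether the
"optimize" flag makes any difference for a given standard library.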

 I have to admit that I don't really know how to proceed from here. I can
finish my patches and submit them, but do we really want to apply them
considering the benchmark results above? Is the convenience of not having
to build Boost.Regex worth making regex matching ~15 times slower? Or is it
still worth finishing the patches even if they're not going to be applied,
just to keep them for the future when the libstdc++ implementation hopefully
becomes less awful? I could test std::regex performance with g++-6; should
I do this?

 Thanks in advance and sorry for failing to notice this (huge, IMO) problem
sooner,
VZ
