[lmi] Parallel blues


From: Vadim Zeitlin
Subject: [lmi] Parallel blues
Date: Mon, 18 Jul 2016 21:55:14 +0200

 Hello,

 So I tried to parallelize test_coding_rules. As expected, doing it turned
out to be very simple and it took me about 30 minutes to write the code (I
had also spent some time beforehand profiling the existing code to see if
it really spent its time where I thought it did and, for once, there were
no surprises, as 95% of the time was consumed by find_impl() inside
Boost.Regex).

 And then, and I can't even seriously claim that it was unexpected, even
though I definitely didn't expect this to happen, I ran into many problems.


 The first of them, already seen with the MSVC build, is that even though I
used a mutex to serialize the error messages logged to the standard output
and error streams, the relative order of these messages is not defined any
more, so test_coding_rules.sh is broken by this change. This could be fixed
by storing all error messages in a map indexed by the position of the
corresponding file on the command line, but I'd rather propose changing
test_coding_rules.sh to sort both the expected and the observed output
because I don't think it actually matters in which order it's given, and it
doesn't seem worth complicating the code just to generate it in the same
order. But this was just something I hadn't thought about, so it's another
thing to do, or maybe to decide not to do, and it's not too bad so far.
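
 To illustrate the first alternative mentioned above, here is a minimal
sketch of collecting the messages in a map indexed by the position of the
file on the command line. It is not the code from the branch: check_file()
is a hypothetical stand-in for the real per-file checks, and check_all()
and its parameters are names invented for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real per-file check: returns the
// diagnostics for one file, or an empty string if there are none.
std::string check_file(std::string const& name)
{
    return name.size() % 2 ? name + ": odd-length name\n" : std::string();
}

// Check all files using the given number of worker threads and return
// the messages keyed by the position of the file in the input vector.
std::map<std::size_t, std::string> check_all
    (std::vector<std::string> const& files
    ,unsigned int                    num_threads
    )
{
    assert(0 < num_threads);

    std::map<std::size_t, std::string> messages;
    std::mutex messages_mutex;
    std::atomic<std::size_t> next_index(0);

    auto worker = [&]()
    {
        for(;;)
        {
            // Each worker atomically claims the next unprocessed file.
            std::size_t const i = next_index++;
            if(files.size() <= i)
                return;
            std::string const msg = check_file(files[i]);
            if(!msg.empty())
            {
                // Only the insertion into the shared map is serialized.
                std::lock_guard<std::mutex> lock(messages_mutex);
                messages[i] = msg;
            }
        }
    };

    std::vector<std::thread> threads;
    for(unsigned int t = 0; t < num_threads; ++t)
        threads.emplace_back(worker);
    for(auto& t : threads)
        t.join();

    return messages;
}
```

 As std::map iterates in key order, writing the values of the returned map
to the error stream after all threads have been joined would produce the
messages in command line order, whichever thread finished first. Sorting
the output in test_coding_rules.sh instead avoids the need for any of this.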


 What was really bad is that I discovered that the compiler we use doesn't
support the C++11 thread library *at all*. It comes with the required
headers, such as <thread>, <mutex>, ... but they don't actually define
std::thread, std::mutex etc. I must admit that I somehow completely missed
this when choosing the version of MinGW to use some time ago, and now I feel
pretty stupid because, as you can find out, the version of MinGW using the
Win32 thread model indeed does not support C++11 threading (although, in my
defense, it's hardly advertised in big letters anywhere either).

 So to use the code I've written we would need to switch to the POSIX
threads version and also distribute libwinpthread-1.dll, the POSIX threads
emulation library on which it relies for its (inefficient, but better than
nothing) implementation of C++11 threading. Obviously, changing the
compiler -- again -- is not a step to be undertaken lightly, so I've looked
for possible alternatives. And, interestingly enough, I did find one at
https://github.com/meganz/mingw-std-threads. As explained there, this
provides a simple header-only implementation of C++11 threading reusing
some of the classes provided by libstdc++. This is clearly a hack but, at
least in my limited testing, it does work and if this is really going to be
integrated into libstdc++ (there are discussions about doing this...),
maybe this could be an acceptable temporary solution, especially as long as
threads are only used in test_coding_rules which is an internal utility
only. If you decide to use this solution, please copy the header files from
the repository above to /opt/lmi/local/include, so that they can be found
(see 
https://github.com/vadz/lmi/commit/4fb92ff4eeed9dcad344785acc6fc7a4b88ac7f0).
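
 As a rough sketch of how such a workaround can be wired in (not
necessarily the way the commit above does it), the extra headers could be
included only when the standard ones don't provide the threading classes.
This assumes that libstdc++ defines std::thread and std::mutex only when
its internal macro _GLIBCXX_HAS_GTHREADS is defined, and that the headers
from the repository are called mingw.mutex.h and mingw.thread.h and are on
the include path. The count_in_parallel() function is just a hypothetical
smoke test.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Assumption: _GLIBCXX_HAS_GTHREADS is left undefined by the Win32 thread
// model build of MinGW-w64, in which case the standard headers above don't
// define std::thread and std::mutex.
#if defined(__MINGW32__) && !defined(_GLIBCXX_HAS_GTHREADS)
    // Assumed names of the mingw-std-threads headers, expected to have
    // been copied to a directory such as /opt/lmi/local/include.
#   include "mingw.mutex.h"
#   include "mingw.thread.h"
#endif

// Hypothetical smoke test: increment a shared counter from several
// threads and return its final value.
int count_in_parallel(int num_threads, int increments_per_thread)
{
    assert(0 < num_threads);

    int counter = 0;
    std::mutex counter_mutex;

    std::vector<std::thread> threads;
    for(int t = 0; t < num_threads; ++t)
    {
        threads.emplace_back
            ([&]()
            {
                for(int i = 0; i < increments_per_thread; ++i)
                {
                    std::lock_guard<std::mutex> lock(counter_mutex);
                    ++counter;
                }
            });
    }
    for(auto& t : threads)
        t.join();

    return counter;
}
```

 With any other compiler or thread model the conditional block is skipped
and only the standard headers are used, so the same source can be built
everywhere.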

 For completeness, I'd also like to mention that the decision not to go
with TDM-GCC seems more and more regrettable in retrospect, as this compiler
does support C++11 threading out of the box (and even links libwinpthread-1
statically into its libstdc++, saving the bother of distributing it) and
just generally seems to be better thought through. Basically, whenever I am
banging my head against the wall crying "What could the MinGW-w64
developers possibly be thinking?", I discover that the TDM-GCC maintainer
has made a different, and better, choice, and that's a rather good sign,
isn't it?

 The main problem with TDM-GCC is, of course, that Debian doesn't provide a
cross-compiler based on it. And, at least in this case, it's really nice to
use Debian packages because they provide both Win32 and POSIX threading
versions of MinGW-w64 and switching from the former (default) to the latter
was as simple as

        # update-alternatives --set i686-w64-mingw32-g++ /usr/bin/i686-w64-mingw32-g++-posix

(of course, specifying the compiler explicitly on the configure command
line would have worked as well). But apart from this, if we do decide to
switch the compiler again, TDM-GCC really seems like a much more reasonable
alternative. Just compare the time it takes to find the POSIX/SJLJ build of
MinGW-w64 (if you don't want to waste time searching for it, let me
point you to
https://sourceforge.net/projects/mingw-w64/files/Toolchains%20targetting%20Win32/Personal%20Builds/mingw-builds/4.9.1/threads-posix/sjlj/i686-4.9.1-release-posix-sjlj-rt_v3-rev3.7z/download)
with the time it takes to select the SJLJ version (no need for POSIX here)
at http://tdm-gcc.tdragon.net/


 Anyhow, back to the unexpected problems, keeping in mind that I now use
the POSIX version of the compiler when cross-compiling and the
mingw-std-threads hack when using the official build system.

 The next discovery is that Boost.Regex is not thread-safe when compiled
with gcc. Maybe the way we use it (with static regex objects) is not
supported at all or maybe it's just a bug (the thread at
http://boost-users.boost.narkive.com/UFFZTr85/boost-1-33-1-boost-regex-constructor-is-not-threadsafe
seems to favour the latter hypothesis, but the link in it is dead now, so
I'm not sure). In any case, running the parallel version with it results in
assertion failures inside Boost.Regex code and wrong results. I didn't
bother debugging this or checking if the problem was fixed in later
Boost.Regex versions as the goal is to get rid of it anyhow, but this does
mean that I don't have benchmarks for it with gcc. It also might explain
why it is so much faster with gcc than with MSVC, as the latter is
thread-safe.
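
 For illustration only, here is one generic way of not sharing a regex
object between threads, assuming that the problem really comes from the
concurrent use of shared static objects (which I haven't confirmed). The
sketch uses std::regex instead of Boost.Regex to remain self-contained, it
was not tested against the failure described above, and
has_trailing_space() is a hypothetical check, not one taken from
test_coding_rules.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical check: returns true if the line ends with spaces or tabs.
bool has_trailing_space(std::string const& line)
{
    // Unlike a plain static object, a thread_local one is constructed
    // separately in each thread on its first call, so no regex object is
    // ever used by more than one thread.
    thread_local std::regex const re("[ \\t]+$");
    return std::regex_search(line, re);
}
```

 The price is compiling the regex once per thread rather than once per
process, which should be negligible when each thread checks many files.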

 I do have benchmarks for the other configurations, however, so let me
summarize them by completing the previously posted table (all times are in
seconds):

                        Original single threaded      New parallel version
                       --------------------------+--------------------------
Platform     Compiler   boost      std    PCRE   |  boost    std       PCRE
-------------------------------------------------+--------------------------
Linux         gcc 4.9     5.9      27.7    1.9   |           8.8        0.5
MSW           gcc 4.9     1.1      75.1          |          13.8        0.4
              msvc 14     9.9      32.9          |    1.5    6.1
-------------------------------------------------+--------------------------

The Linux machine has 8 logical cores and the MSW one has 12, but this
probably doesn't affect the results that much because, at least under
Linux, the degree of concurrency doesn't seem to matter beyond 4 threads;
see the benchmarks I did for std::regex varying the number of threads:

        Threads Time (s)
        ----------------
        1       27.7
        2       16.4
        3       11.5
        4        8.7
        5        8.8
        6        8.7
        7        8.8
        8        8.8
        9        8.8
        10       8.9
        11       8.8
        12       8.8
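
 To put numbers on the scaling, the speed-up and the parallel efficiency
can be computed from the times in the table above. The following is just a
small sketch of this arithmetic; the scaling struct and compute_scaling()
are names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct scaling
{
    double speedup;    // time with 1 thread divided by time with n threads
    double efficiency; // speedup divided by n
};

// times[i] is the time measured when using i + 1 threads.
std::vector<scaling> compute_scaling(std::vector<double> const& times)
{
    assert(!times.empty() && 0.0 < times.front());

    std::vector<scaling> result;
    for(std::size_t i = 0; i < times.size(); ++i)
    {
        double const s = times.front() / times[i];
        result.push_back({s, s / (i + 1)});
    }
    return result;
}

// Times in seconds for 1 to 12 threads, copied from the table above.
std::vector<double> const measured_times =
    {27.7, 16.4, 11.5, 8.7, 8.8, 8.7, 8.8, 8.8, 8.8, 8.9, 8.8, 8.8};
```

 With these figures the speed-up reaches about 3.2 with 4 threads and
remains between roughly 3.1 and 3.2 from 5 to 12 threads, so the efficiency
just keeps decreasing as more threads are added.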


 Conclusion: we get a slightly bigger speed-up from implementing
parallelism in the code than from using GNU parallel (remember that it ran
in 9.0s instead of 8.8s under Linux). The biggest advantage of doing it in
the code is, of course, that it works under MSW. The biggest problem is
that the code can't be compiled using the current compiler. The second
problem is that even after doing this, it still takes 13+ seconds with
std::regex under MSW, which is not as instantaneous as I'd like (and it now
uses all of the CPUs during this time, so if one or more of them are already
doing something, it could take even longer). Switching to PCRE would totally
avoid the second problem, as it would really be almost instantaneous then,
but wouldn't help with the first one, unfortunately, unless we decide that
~2s is not too long and that we can continue to use the current,
non-parallel, version.

 As usual, I'm afraid I have to conclude by admitting that I don't know
what to do next. If you have any preferences, please let me know, although
it's not urgent as I probably won't be able to do anything significant
until the end of the month. FWIW my current work (not including various
experiments) is in the parallel-coding-rules-test branch on GitHub, see

https://github.com/vadz/lmi/compare/parallel-coding-rules-test?expand=1

Of course, this is not ready for integration yet. And if you want to use it
at all, you need to merge/apply either the std::regex or the pcre
branch/patch with/on top of this one, as with Boost.Regex it will just crash
(when doing this merge/apply with std::regex, there will be a small conflict
at the beginning of the file, which should be resolved in the
straightforward way by keeping both the changes from this branch and those
from std::regex). And you also need to use the POSIX threading version of
MinGW or the workaround for the Win32 threading version described above.


 Please let me know if you have any questions and whether I can do anything
else here, thanks in advance,
VZ

