lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] Best way to integrate PCRE


From: Vadim Zeitlin
Subject: Re: [lmi] Best way to integrate PCRE
Date: Wed, 29 Sep 2021 01:28:44 +0200

On Tue, 28 Sep 2021 20:09:03 +0000 Greg Chicares <gchicares@sbcglobal.net> 
wrote:

GC> On 9/18/21 1:26 PM, Vadim Zeitlin wrote:
GC> [...]
GC> > My branch is called "use-pcre" and the corresponding
GC> > pull request is at https://github.com/let-me-illustrate/lmi/pull/190
GC> After working with it for a fairly short while, I have two comments.

 Thanks for looking at it!

GC> (1) (Trivial but important) The 'pcre_regex.hpp' header should contain
GC> 'LMI_HAS_PCRE' guards, without which 'make check_physical_closure'
GC> fails for MinGW.

 Sorry about this, I've missed it once again because I was lulled into the
false sense of security by the CI builds passing. I'll have to understand
why did this happen and, ideally, how to prevent it from happening again in
the future, but I'd rather do it tomorrow to avoid doing something hasty
I'll regret later today.


GC> They're trivial to add, e.g.:
GC> 
GC> --8<--
GC>  #include "config.hpp"
GC> +
GC> +#if defined LMI_HAS_PCRE
GC> 
GC>  /// Currently we always use 8-bit version of PCRE2 library and assume UTF-8
GC> -->8--
GC> 
GC> ...and then at the bottom...
GC> 
GC> --8<--
GC> + 
GC> +#endif // defined LMI_HAS_PCRE
GC>  
GC>  #endif // pcre_regex_hpp
GC> -->8--
GC> 
GC> but I think I should ask you to do it, because I tried, and you can
GC> read my tale of woe in this modification that I tried adding via
GC> 'git rebase --interactive' to the first commit:
GC> 
GC>     IMPORTANT MODIFICATION: This commit doesn't match the original merge
GC>     request. The 'pcre_regex.hpp' header has been modified, to guard its
GC>     substance with a macro. Without this modification, the important
GC>     'check_physical_closure' makefile target failed; that could break any
GC>     later 'git bisect'. It's possible that 'git bisect' could fail anyway,
GC>     as this modification had to be committed with '--no-verify'.
GC> 
GC> By the last sentence, I realized I had departed the true path.

 I'm sorry, maybe I'm even more tired than I thought (and, to be honest,
this convinced me to postpone looking into the physical closure check
tomorrow), but what exactly is wrong with this change and why did it have
to be committed with '--no-verify'?


GC> (2) If 'test_coding_rules' must be rebuilt (e.g. because 
'test_coding_rules.cpp'
GC> has been modified) in a i686-w64-mingw32 environment, this failure occurs:

[... error due to trying to link x86_64-pc-linux-gnu executable with
     i686-w64-mingw32 libraries snipped ...]

GC> Alternatively, though, wouldn't it be better just to do the following:
GC> 
GC>   make LMI_TRIPLET=x86_64-pc-linux-gnu $coefficiency check_concinnity
GC> 
GC> as discussed here:
GC>   https://lists.nongnu.org/archive/html/lmi/2021-09/msg00004.html
GC> with obligatory changes to lists of pastable commands, git hooks,
GC> and the other few places identified by 'git grep -l check_concinnity'?

 It looks wrong to me that "$(MAKE) LMI_TRIPLET=whatever target" in the
makefile doesn't work and I think it'd be worth fixing this, but I'll again
postpone actually doing it until tomorrow.

GC> BTW, I'm not seeing much speed improvement yet, in these
GC> 'make check_concinnity' timings (all measured with 'test_coding_rules'
GC> already built):
GC> 
GC>                 i686  x86_64
GC>   before PCRE:  3.93   3.41
GC>   after  PCRE:  3.24   3.12 [here, both use an ELF binary]

 The only explanation I see is that your CPU (it doesn't matter that there
are many of them for this test, as only one is used) is so fast that the
execution time is dominated by the IO, but this seems very suspicious to me
because your IO subsystem should be pretty fast too.

GC> but parallelism can increase speed, later, and it's well worthwhile
GC> to get rid of boost-1.33.1 once and for all.

 Yes, the initial goal wasn't to optimize anything, but to get rid of Boost
and the optimization only arose as a way to avoid unacceptable slowdown
when using std::regex.

 BTW, things really didn't improve for std::regex since the decision not
to use it was taken 5 (?) years ago, as now there is talk about deprecating
it entirely in the standard library, due to having too many problems not
only with its performance, but also its API (which I agree is pretty bad,
using option bits selecting the regex dialect to use is a horrible idea)
and poor Unicode support.

 Thanks again for looking at this and I should be able to write a more
constructive reply tomorrow,
VZ

Attachment: pgpmLgMwCLzwP.pgp
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]