lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] Best way to integrate PCRE


From: Vadim Zeitlin
Subject: Re: [lmi] Best way to integrate PCRE
Date: Sun, 5 Sep 2021 13:26:11 +0200

On Fri, 27 Aug 2021 16:21:44 +0000 Greg Chicares <gchicares@sbcglobal.net> 
wrote:

GC> On 8/21/21 6:37 PM, Vadim Zeitlin wrote:
GC> > On Wed, 28 Jul 2021 19:45:30 +0000 Greg Chicares 
<gchicares@sbcglobal.net> wrote:
GC> > 
GC> > GC> On 2021-07-28 18:45, Vadim Zeitlin wrote:
GC> > GC> > On Wed, 28 Jul 2021 16:10:34 +0000 Greg Chicares 
<gchicares@sbcglobal.net> wrote:
[...]
GC> I'm not as close to this as you are, so it's hard for me to say
GC> whether one strategy or another is better. But I can speak in
GC> terms of outcomes.

 Hello again,

 I'm sorry to return to this discussion, but after rereading this email
carefully, I've realized that I'm very confused by it because it seems to
directly contradict the "The solution is to withdraw msw support for those
things." phrase from your earlier (2021-07-28 16:10:34) reply, so I need
to understand whether we really can afford not building (a working version
of) test_coding_rules.exe under MSW or not.

GC> For instance:
GC> 
GC> > GC> We'd also need to change 'hooks/pre-commit', e.g. (untested):
GC> > 
GC> >  With my solution above this is not necessary any more as it would run and
GC> > just output an error under MSW.
GC> 
GC> Then I guess it might simply emit a message like:
GC>   "'test_coding_rules' not implemented for msw."

 Yes, exactly, except the exact message is

    test_coding_rules was compiled without PCRE2 support and can't be used.
    Please install PCRE2, define LMI_HAS_PCRE and rebuild it.

which should hopefully be a bit more helpful.

GC> That's okay with me if I deliberately try to run it under msw.
GC> But of course I wouldn't want that outcome otherwise. Let me
GC> try to say that more precisely:
GC> 
GC> (1) In this scenario:
GC> 
GC>   $ LMI_COMPILER=gcc ; LMI_TRIPLET=i686-w64-mingw32    ; . 
/opt/lmi/src/lmi/set_toolchain.sh
GC>   $ $make test_coding_rules.exe
GC> 
GC> if I get a useless binary that only says "not implemented",
GC> that's okay.

 Yes, this is what would happen with my current changes.

GC> (2) In a scenario where I may have done this several months ago:
GC> 
GC>   $ LMI_COMPILER=gcc ; LMI_TRIPLET=i686-w64-mingw32    ; . 
/opt/lmi/src/lmi/set_toolchain.sh
GC> 
GC> and then, having forgotten about that, I do this:
GC> 
GC>   $ make $coefficiency check_concinnity 2>&1 |less -S -N
GC> 
GC> then I want a concinnity check, not a "not implemented" message.

 This is rather surprising. Wouldn't this be misleading, as it could result
in a success even when there are coding rules violations? I would think
that you would want this command to fail rather than to pretend that it
succeeded when it didn't really perform all the checks.

GC> How could that be accomplished?

 The simplest is to just skip the invocation of $(TEST_CODING_RULES) in
check_concinnity target in GNUmakefile, so it's not at all hard to do, but
I'd appreciate if you could confirm that you actually want to skip the
coding rules check in this case, rather than give an error reminding you
that you're not using the LMI_TRIPLET you might be thinking you use
(because, if I understand correctly, the only difference between the
scenarios (1) and (2) is that in the first one you use this triplet
intentionally but in the second one you only use it because you had set it
before and forgot about setting it).

GC> IIRC, years ago we did that for $(XMLLINT).

 Sorry, but this situation doesn't seem similar at all: xmllint is a
standard system utility which (practically) never changes, while
test_coding_rules is lmi own utility which can, and does, change, so needs
to be rebuilt.

GC> As for binaries, if I type
GC>   $ grep
GC> I expect to get the operating system's 'grep', say, '/bin/grep';
GC> that's the kind of behavior I want to achieve--i.e., if I type
GC>   $ make $coefficiency check_concinnity
GC> then I just want a concinnity test: if it's identical for all
GC> architectures, that's okay, because its outcome shouldn't vary by
GC> architecture anyway.

 This is, of course, true, but if we can't build test_coding_rules.exe
for MSW, I'd rather not run concinnity test there at all than running only
a subset of it.

 IOW, it's precisely because the outcome of the test doesn't depend on the
platform and architecture that I think it should be ok to run it only under
Linux and give an error under MSW.

 To summarize, I think the choice is between:

0. Not changing anything in my current version, i.e. concinnity test does
   _not_ work under MSW because there is no working test_coding_rules.exe
   and you have to run it under Linux only. I thought this was acceptable
   from your previous messages, but now I'm not so sure any more.

1. Make concinnity test under MSW skip the coding rules test, but still
   check all the rest. I think this is what you're proposing right now,
   but I don't like this at all because I think it's really misleading.

2. Make concinnity test under MSW use some other test_coding_rules.exe
   that would come from somewhere else -- I only list this possibility
   because I think that you might actually be proposing this, rather than
   (1), but I don't see at all how could this work, because this program
   would need to be rebuilt whenever test_coding_rules.cpp changes, so I
   don't think it's a viable solution. If we really want to test coding
   rules under MSW, we need to build a functional version of the test
   there -- apparently contradicting the previous decision to not do it.

 Could you please confirm whether you actually mean (1) or (2)? And, in
either case, whether (0) might (still) be acceptable?


GC> >  Finally, the real reason for writing this email, is that I've realized I
GC> > forgot to ask another question: what do you think should be done with
GC> > regex_test.cpp?
GC> 
GC> Retain it, removing boost options with no standard parallels.

 At least here I've already done exactly that, so there is not much to
discuss.

GC> > And while it did make sense to compare Boost.Regex
GC> > performance with different combinations of "m" and "s" modifiers, such
GC> > modifiers are not supported by std::regex, so these tests don't make much
GC> > sense neither.
GC> 
GC> Apparently boost implemented "m" and "s" but std::regex did not.

 This is a bit more complicated because Boost uses "s" by default and
provides match_not_dot_newline flag, which is never used in lmi, to disable
this behaviour. I've decided to remain compatible with Boost in pcre::regex
by always using PCRE2_DOTALL flag, so that we don't have to change all the
regexes using "." to match anything, including "\n".

 But std::regex doesn't provide any control over "." atom behaviour when
using ECMAScript ("advanced", but also default) regex syntax and it never
matches line terminators (U+000D, U+000A, U+2029, or U+2028) when using it.

GC> Is it planned to add "m"?

 It's already supported since C++17.

GC> If so, then is contains_regex2()...
GC> 
GC>   /// Match a regex as with Perl's '-s'.
GC>   bool contains_regex2(std::string const& regex)
GC>   {return boost::regex_search(text, boost::regex("(?-s)" + regex));}
GC> 
GC> ...the only thing we should consider removing?

 We could remove it, but I've kept it, using POSIX regex syntax, where "."
always matches any character.

GC> > The only way to approximate these modifiers that I found was
GC> > to use basic regex syntax (as opposed to the ECMA script default
GC> > one), but it doesn't seem really useful to compare them as we're
GC> > almost certainly not going to use the basic syntax for anything.
GC> 
GC> I don't pretend to understand all of this, so let me ask: why wouldn't
GC> we ever use BREs?

 They're unwieldy (need even more backslashes) and strictly less powerful
than the default std::regex ECMAScript, so for me the question is: why
would we ever use them? Surely ideally we'd use a single regex syntax and
not ECMAScript, BREs and EREs (extended POSIX REs) all together.

GC> Is the "basic" syntax what's tested by contains_regex1(), which today says:
GC>     boost::regex const r(regex, boost::regex::sed);
GC> ?

 Yes, Boost defines

        static const syntax_option_type sed = basic;

GC> The comments preceding that function seem to explain why it's not
GC> necessarily a good idea. If that's exactly your point, then it would
GC> be helpful, for me personally, to retain that unit test, because I
GC> probably never will memorize the rationale--the test's purpose then is
GC> not to test std::regex's correctness, but rather to explore alternatives
GC> and document why some are not used.

 I did keep this test, but removed the long comment about boost regex
behaviour which doesn't seem useful if this class is not used any more.

GC> > OTOH it could be useful to compare std::regex performance with PCRE2,
GC> > but this would require adding tests for LMI_HAS_PCRE to this test as
GC> > well and I don't know if it's really worth it.
GC> 
GC> If that's not difficult, I think it would be most illuminating, and
GC> needing to add LMI_HAS_PCRE doesn't seem like a significant disadvantage.

 Thanks, I've added this now (but in a separate commit, to make reviewing
the changes easier if you do it commit-wise) and I don't think I have any
questions remaining about the test: even though most of the tests there
don't seem useful to me, personally, any longer, it's clear that you do see
value in keeping them and it's not a problem to do it.

 However, somewhat unexpectedly, I do have questions about the desired
concinnity check behaviour and it would be great if you could please answer
them. Or, perhaps, alternatively, if it's simpler for you, perhaps I could
submit the patch in the current state, with concinnity test not working
under MSW?

 Thanks in advance!
VZ

Attachment: pgpYMds5VyUVu.pgp
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]