bug-sed
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#24615: [PATCH] sed: handle the patterns which consist of ^ or $ manu


From: Assaf Gordon
Subject: bug#24615: [PATCH] sed: handle the patterns which consist of ^ or $ manually
Date: Thu, 20 Oct 2016 22:08:34 -0400

Hello Norihiro,

Thanks for the update, I hope to have the tests soon.

For ease of discussion, attached all the accumulated patches as one file.

I'd like to ask about the DFA improvements in general (for my understanding):
First, every regex (whether in a sed address or in s/// command)
is compiled by both the previous regex engine (re_compile_match from gnulib's 
regcomp.c)
and by the new DFA engine.
If the regex is simple anchor, the regex compilation is skipped (which is this 
optimization thread).

Then, every executed regex is tested as follow:
1. If begline/endline - match without running regex at all

2. If the regex search starts at the beginning of the buffer (i.e. not "s///g"),
   2.1. check with dfa/superset (if such superset exists).
        If no match is found - return "no-match".
   2.2. fall back to normal dfa.
        If no match is found - return "no-match".
   2.3. There one more case which returns a match:
           if (!regsize && (regex->flags & REG_NEWLINE) && !backref)
              return 1;
        I'm not sure which cases it covers?

3. If DFA did not reject the match,
   then the 'old' regex engine is executed with 're_search'
   (or the custom code for 're_search' for multiline+NUL)

Is this description correct ?

In otherwise, the optimization of DFA is meant to
cover the more common case of no-matching,
at the expense of slightly more work when there is a possible match?

Lastly,
Could you expand on dfa's 'superset' and 'isfast' ?
what are the cases which there is/no superset,
and when is a DFA regex fast?


Many thanks!
 - assaf

Attachment: sed-regexp-optimizations-2016-10-20.patch.gz
Description: GNU Zip compressed data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]