From: | Paolo Bonzini |
Subject: | bug#16481: dfa.c and Rational Range Interpretation |
Date: | Mon, 10 Feb 2014 23:13:42 +0100 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 |
On 10/02/2014 20:50, Paul Eggert wrote:
If so, then the above comment doesn't sound right. Without the patch, the DFA matcher mishandles expressions in some cases, as described in Bug#16481. For example, "grep -Xawk '[\[-\]]'" will cause dfa.c to try to compile the regular expression [[-]], which won't work regardless of whether --with-included-regex is being used.
Ok, so there is a real bug. But it is not immediately obvious what the problem is, and the bug has (AFAICS) no test case and no mention in the commit message. Without this, I am not sure that the fix should not be the one in this commit.
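To make the example concrete, here is a minimal sketch of what the range in that pattern denotes under RRI, assuming the awk-syntax expression is meant as the range from '[' to ']' and assuming an ASCII-compatible encoding. The helper in_range_rri is hypothetical and is not taken from dfa.c.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of dfa.c.  Under the Rational Range
   Interpretation (RRI), a range LO-HI matches exactly the characters
   whose code values lie between the two endpoints, inclusive.  */
bool
in_range_rri (unsigned char c, unsigned char lo, unsigned char hi)
{
  return lo <= c && c <= hi;
}
```

In ASCII, '[' is 0x5B, '\' is 0x5C and ']' is 0x5D, so in_range_rri (c, '[', ']') holds for exactly those three characters. A regression test for the bug could presumably check that all three match and that neighbours such as 'Z' and '^' do not.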
More generally, we already had the problem of subtle differences between dfa.c and full-regexp matching on platforms that do not observe RRI, because dfa.c already uses RRI in multibyte locales, regardless of whether the full matcher uses RRI.
It only does so if the fallback to regex is not requested (dfaexec invoked with backref = NULL). This is never the case for grep. In fact, as far as I know it is never the case, and I've been tempted many times to completely remove the mostly dead code dealing with multibyte ranges if backref = NULL.
The change causes non-"C" unibyte locales to behave consistently with multibyte locales, which in some sense is an improvement (though obviously not ideal; it'd be better if it was RRI everywhere).
It would be if glibc were fixed. For me, consistency with other GNU utilities---especially sed---trumps anything else, and this was the main point in fixing multibyte matching in GNU grep 2.6 and newer.
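As a rough illustration of the difference being discussed, the sketch below contrasts RRI with a collation-based reading of a range. Both function names are made up for this example, and the strcoll-based test is only an approximation of what a non-RRI matcher might do; it is not how glibc or dfa.c actually implements ranges.

```c
#include <stdbool.h>
#include <string.h>

/* RRI: membership is decided by code value alone, so the result is
   the same in every locale.  */
bool
in_range_codepoint (unsigned char c, unsigned char lo, unsigned char hi)
{
  return lo <= c && c <= hi;
}

/* Illustrative non-RRI reading (an assumption, not glibc's code):
   membership is decided by the collation order of the current
   LC_COLLATE locale, so the result can vary between locales.  */
bool
in_range_collation (unsigned char c, unsigned char lo, unsigned char hi)
{
  char cs[2] = { (char) c, '\0' };
  char ls[2] = { (char) lo, '\0' };
  char hs[2] = { (char) hi, '\0' };
  return strcoll (ls, cs) <= 0 && strcoll (cs, hs) <= 0;
}
```

In the "C" locale strcoll compares like strcmp, so the two functions agree. In other locales, once setlocale has been called, the collation-based version may accept characters that lie outside the code-value range, for instance some uppercase letters for a-z, which is the kind of inconsistency between matchers that RRI avoids.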
Non-"C" unibyte locales are dying out, so to some extent this is a minor issue. In practice most users these days won't notice or care about this change.
That's true.

Paolo