bug#47264: [PATCH] pcre: migrate to pcre2
From: Paul Eggert
Subject: bug#47264: [PATCH] pcre: migrate to pcre2
Date: Mon, 8 Nov 2021 11:53:47 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.2.1

On 11/8/21 01:47, Carlo Arenas wrote:
> On Sun, Nov 7, 2021 at 4:30 PM Paul Eggert <eggert@cs.ucla.edu> wrote:

> Let me know how to help otherwise.

The main thing from my point of view is that I'd like to know what those
other bugs are. I can split out the patch into pieces if I know what to
look for.

> I didn't want anyone hitting on those old PCRE2 bugs though with this
> first release, hence why the configure rule is there for now (even if
> I am likely going to remove it for the next version)

If it's a PCRE2 bug we can ask people to fix it in their PCRE2 library.
Possibly we should continue to support PCRE1 as a configure-time option;
that would assuage concerns about bugs in PCRE2. More work for us, though.

> \C is supported with -P in the PCRE version now though, is removing that ok?

I guess I don't see the harm of supporting \C; why disable it?

>> If memory serves grep currently takes care to not pass invalid UTF-8 in
>> the buffer or pattern. Does PCRE2_MATCH_INVALID_UTF make this work obsolete?
>
> not sure I understand what you mean

I guess I was thinking about an older grep version.

Currently grep compiles with PCRE_UTF8 and checks for PCRE_ERROR_BADUTF8
returns from pcre_exec, so it's relying on the count of bytes that this
pcre_exec returns in sub[0] before calling pcre_exec with
PCRE_NO_UTF8_CHECK. So, effectively it's using pcre_exec to check that a
buffer contains valid UTF-8.

I don't see how this works with the proposed patch. It uses sub[0] but I
don't see how it's set. What am I missing?

One more thing I just noticed: this test:

    if (PCRE2_ERROR_UTF8_ERR1 <= e || e < PCRE2_ERROR_UTF8_ERR21)

is logically equivalent to the following (which is clearer to me):

    if (! (PCRE2_ERROR_UTF8_ERR21 <= e && e < PCRE2_ERROR_UTF8_ERR1))

Shouldn't that be the following instead?

    if (! (PCRE2_ERROR_UTF8_ERR21 <= e && e <= PCRE2_ERROR_UTF8_ERR1))
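
To make the boundary case concrete, here is a minimal sketch that compares the three forms of the test over a range of error codes. It assumes the values in pcre2.h are PCRE2_ERROR_UTF8_ERR1 == -3 and PCRE2_ERROR_UTF8_ERR21 == -23 (worth double-checking against the installed header). The enum constants and function names are hypothetical stand-ins so the snippet builds without libpcre2; none of this is taken from the patch.

```c
#include <stdbool.h>

/* Local stand-ins for the pcre2.h error codes.  The numeric values are
   an assumption; check them against the installed pcre2.h.  */
enum { UTF8_ERR1 = -3, UTF8_ERR21 = -23 };

/* The test as written in the patch.  */
static bool
test_as_posted (int e)
{
  return UTF8_ERR1 <= e || e < UTF8_ERR21;
}

/* The same test after applying De Morgan's law.  */
static bool
test_rewritten (int e)
{
  return ! (UTF8_ERR21 <= e && e < UTF8_ERR1);
}

/* The suggested form, which treats UTF8_ERR1 itself as a UTF-8 error.  */
static bool
test_suggested (int e)
{
  return ! (UTF8_ERR21 <= e && e <= UTF8_ERR1);
}

/* Count the values in [lo, hi] on which the predicates F and G disagree.  */
static int
count_disagreements (bool (*f) (int), bool (*g) (int), int lo, int hi)
{
  int n = 0;
  for (int e = lo; e <= hi; e++)
    n += f (e) != g (e);
  return n;
}
```

Under those assumed values, test_as_posted and test_rewritten agree on every input, while test_rewritten and test_suggested differ only at e == UTF8_ERR1: the posted test puts that code outside the UTF-8 error range, and the suggested one puts it inside.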