
bug#47264: [PATCH] pcre: migrate to pcre2


From: Paul Eggert
Subject: bug#47264: [PATCH] pcre: migrate to pcre2
Date: Mon, 8 Nov 2021 11:53:47 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.2.1

On 11/8/21 01:47, Carlo Arenas wrote:
> On Sun, Nov 7, 2021 at 4:30 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
>
> Let me know how to help otherwise.

The main thing from my point of view is that I'd like to know what those other bugs are. I can split out the patch into pieces if I know what to look for.

> I didn't want anyone hitting on those old PCRE2 bugs though with this
> first release, hence why the configure rule is there for now (even if
> I am likely going to remove it for the next version)

If it's a PCRE2 bug we can ask people to fix it in their PCRE2 library.

Possibly we should continue to support PCRE1 as a configure-time option; that would assuage concerns about bugs in PCRE2. More work for us, though.

> \C is supported with -P in the PCRE version now though, is removing that ok?

I guess I don't see the harm of supporting \C; why disable it?

>> If memory serves grep currently takes care to not pass invalid UTF-8 in
>> the buffer or pattern. Does PCRE2_MATCH_INVALID_UTF make this work obsolete?
>
> not sure I understand what you mean

I guess I was thinking about an older grep version.

Currently grep compiles with PCRE_UTF8 and checks for PCRE_ERROR_BADUTF8 returns from pcre_exec, so it's relying on the count of bytes that this pcre_exec returns in sub[0] before calling pcre_exec with PCRE_NO_UTF8_CHECK. So, effectively it's using pcre_exec to check that a buffer contains valid UTF-8.
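
The control flow described above can be sketched roughly as follows. This is not grep's code and it does not call PCRE: valid_prefix_len is a hypothetical, simplified stand-in for the byte count that pcre_exec leaves in sub[0] when it fails with PCRE_ERROR_BADUTF8, and count_valid_runs is a hypothetical driver that only counts the valid stretches a caller could then match with PCRE_NO_UTF8_CHECK.

```c
#include <stddef.h>

/* Hypothetical stand-in for the offset reported in sub[0] on
   PCRE_ERROR_BADUTF8: the number of leading bytes of buf[0..len) that
   form well-formed UTF-8.  Simplified on purpose: it checks lead bytes,
   sequence length and continuation bytes, but not surrogates or
   overlong 3- and 4-byte forms.  */
size_t
valid_prefix_len (const unsigned char *buf, size_t len)
{
  size_t i = 0;
  while (i < len)
    {
      unsigned char c = buf[i];
      size_t n, k;

      if (c < 0x80)
        n = 1;                  /* ASCII */
      else if (0xC2 <= c && c <= 0xDF)
        n = 2;
      else if (0xE0 <= c && c <= 0xEF)
        n = 3;
      else if (0xF0 <= c && c <= 0xF4)
        n = 4;
      else
        break;                  /* stray continuation or invalid lead byte */

      if (len - i < n)
        break;                  /* sequence truncated by end of buffer */
      for (k = 1; k < n; k++)
        if ((buf[i + k] & 0xC0) != 0x80)
          break;
      if (k < n)
        break;                  /* missing continuation byte */
      i += n;
    }
  return i;
}

/* Walk the buffer the way the paragraph describes: find the valid
   prefix, hand it to the matcher with UTF-8 checking disabled (here
   just counted), then step over the single offending byte.  */
size_t
count_valid_runs (const unsigned char *buf, size_t len)
{
  size_t runs = 0, pos = 0;
  while (pos < len)
    {
      size_t valid = valid_prefix_len (buf + pos, len - pos);
      if (valid > 0)
        runs++;                 /* a real caller would match this stretch */
      pos += valid;
      if (pos < len)
        pos++;                  /* treat the bad byte as data that cannot match */
    }
  return runs;
}
```

The point of the sketch is only that the second, unchecked call is safe because the first call has already established how many bytes are valid.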

I don't see how this works with the proposed patch. It uses sub[0] but I don't see how it's set. What am I missing?

One more thing I just noticed: this test:

  if (PCRE2_ERROR_UTF8_ERR1 <= e || e < PCRE2_ERROR_UTF8_ERR21)

is logically equivalent to the following (which is clearer to me):

  if (! (PCRE2_ERROR_UTF8_ERR21 <= e && e < PCRE2_ERROR_UTF8_ERR1))

Shouldn't that be the following instead?

  if (! (PCRE2_ERROR_UTF8_ERR21 <= e && e <= PCRE2_ERROR_UTF8_ERR1))
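
To make the boundary case concrete, here is a small self-contained C sketch comparing the three tests. The function and constant names are hypothetical, and the values -3 and -23 are assumed to match what pcre2.h defines for PCRE2_ERROR_UTF8_ERR1 and PCRE2_ERROR_UTF8_ERR21; they are hard-coded so the sketch builds without libpcre2.

```c
#include <stdbool.h>

/* Assumed values of PCRE2_ERROR_UTF8_ERR1 and PCRE2_ERROR_UTF8_ERR21.
   Note that ERR21 is the smaller (more negative) of the two.  */
enum { UTF8_ERR1 = -3, UTF8_ERR21 = -23 };

/* The test as written in the patch.  */
bool
patch_test (int e)
{
  return UTF8_ERR1 <= e || e < UTF8_ERR21;
}

/* The same test rewritten via De Morgan's law.  */
bool
rewritten_test (int e)
{
  return ! (UTF8_ERR21 <= e && e < UTF8_ERR1);
}

/* The suggested test, which excludes the whole range ERR21..ERR1.  */
bool
suggested_test (int e)
{
  return ! (UTF8_ERR21 <= e && e <= UTF8_ERR1);
}

/* Count the values in [lo, hi] on which two tests disagree.  */
int
count_disagreements (bool (*f) (int), bool (*g) (int), int lo, int hi)
{
  int n = 0;
  for (int e = lo; e <= hi; e++)
    if (f (e) != g (e))
      n++;
  return n;
}
```

Under those assumed values, the first two tests agree everywhere, and the suggested test differs from them only at e equal to the ERR1 value, which the patch's test treats as outside the UTF-8 error range.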




