

bug#62983: workaround PCRE2 bug affecting at least \D and \W

From: Paul Eggert
Subject: bug#62983: workaround PCRE2 bug affecting at least \D and \W
Date: Fri, 21 Apr 2023 11:42:50 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0

On 2023-04-20 19:04, Carlo Marcelo Arenas Belón wrote:
> All versions of PCRE2 that include PCRE2_MATCH_INVALID_UTF had a bug on
> its JIT implementation that results in failure to match for the negative
> perl classes, and seems to be easier to replicate when the matching
> character is a multibyte one.

Unfortunately that is a little vague. I expect the issue is not limited to \D and \W, as there are other ways to specify negative Perl classes. And if the bug merely seems to be easier to replicate with multibyte characters, it sounds like we may have issues even when matching ASCII characters in a UTF-8 locale.

Furthermore, I'm leery of optimizing for PCRE2 10.42 and earlier. We should focus our optimization efforts on future PCRE2 versions and not worry about optimizing earlier versions, where optimizations complicate maintenance for a declining benefit and are likely to provoke bugs in older versions that, as time passes, will be harder to debug.

> Alternatively JIT could be disabled instead, but the option selected has
> less of an impact on performance.

Disabling JIT sounds better, as correctness trumps performance. Until the bug is fixed (or at least better-understood so that we have a workaround we can trust), how about the attached patch instead?

Attachment: 0001-grep-use-PCRE2-JIT-only-in-unibyte-locales.patch
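For illustration only, here is a minimal C sketch of the policy the patch title describes: enable PCRE2's JIT only in unibyte locales and otherwise rely on the interpreter. This is not the patch itself. The helper name use_pcre2_jit is hypothetical; the check relies only on the standard MB_CUR_MAX macro, which is 1 when the current locale is unibyte.

```c
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper (not from the patch): decide whether to enable
   PCRE2's JIT.  Assumption: the JIT is trusted only in unibyte locales,
   because the reported bug involves UTF-8 matching with negative Perl
   classes such as \D and \W.  */
static bool
use_pcre2_jit (void)
{
  /* MB_CUR_MAX is the maximum number of bytes per character in the
     current locale; 1 means a unibyte locale.  */
  return MB_CUR_MAX == 1;
}

/* Intended use (sketch, not compiled here):

     if (use_pcre2_jit ())
       pcre2_jit_compile (code, PCRE2_JIT_COMPLETE);

   When the JIT is not compiled, pcre2_match () falls back to the
   slower but unaffected interpreter.  */
```

The design choice is the one argued above: correctness over speed. Skipping the JIT in multibyte locales costs performance but avoids the faulty code path until the PCRE2 bug is better understood.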

