bug-grep

bug#47264: [PATCH v2] pcre: migrate to pcre2


From: Paul Eggert
Subject: bug#47264: [PATCH v2] pcre: migrate to pcre2
Date: Sun, 14 Nov 2021 12:45:29 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.2.1

On 11/9/21 02:58, Carlo Marcelo Arenas Belón wrote:
> Sadly, hadn't been able to generate a release,

Does this mean you're having trouble running 'make dist'? If so, what's the trouble?


> it seems to be ready for some broader testing, specially if the
> attached patch is applied on top of a 10.37 release (tested that way
> in OpenBSD i386)

OK, thanks, I installed it into the Savannah master copy of GNU grep, except that I didn't rename m4/pcre.m4 to m4/pcre2.m4, or rename the macros to use PCRE2. This made the change easier to audit. Revised patch 0001 attached.

Also, I followed up with several related patches (also attached as 0002-0012). Please take a look at them and let us know of any problems. In the attached patch "grep: prefer signed integers" I followed the usual grep approach of preferring signed to unsigned integers (e.g., idx_t to size_t) when either will do; this lets us debug better with -fsanitize=undefined to catch integer overflow.
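As an illustration of the signed-versus-unsigned point above, here is a minimal sketch. It is not taken from the attached patch: the typedef is a stand-in for Gnulib's idx_t, the function names are hypothetical, and __builtin_add_overflow is a GCC/Clang extension.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for Gnulib's idx_t: a signed type as wide as ptrdiff_t.
   (Assumption for this sketch; real code gets it from Gnulib.)  */
typedef ptrdiff_t idx_t;

/* Unsigned arithmetic is defined to wrap around, so a reversed range
   silently yields a huge value and no sanitizer complains.  */
size_t
span_unsigned (size_t start, size_t end)
{
  return end - start;
}

/* With a signed type the same mistake yields a negative value that is
   easy to test for, and a genuine overflow is undefined behavior that
   -fsanitize=undefined can report at run time.  */
idx_t
span_signed (idx_t start, idx_t end)
{
  return end - start;
}

/* Explicitly checked addition: store A + B in *SUM and return true,
   or return false if the mathematical result does not fit in idx_t.  */
bool
idx_add (idx_t a, idx_t b, idx_t *sum)
{
  return !__builtin_add_overflow (a, b, sum);
}
```

Here span_unsigned (3, 2) wraps to SIZE_MAX, whereas span_signed (3, 2) returns -1, which a caller or a debugger can spot immediately.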

One issue I discovered: PCRE2_EXTRA_MATCH_WORD (which is used by pcre2grep -w) is incompatible with 'grep -w'. For example, 'echo a%%a | grep -Pw %%' outputs nothing, whereas 'echo a%%a | pcre2grep -w %%' outputs 'a%%a'. I think the GNU grep behavior (which is the same as with 'grep -w', either on Linux or OpenBSD) is more intuitive here: do you happen to know why PCRE behaves the way it does? Is that worth a PCRE2 bug report? Anyway, the attached patches avoid using PCRE2_EXTRA_MATCH_WORD for that reason.
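The difference can be sketched without linking against PCRE2 at all. The code below is a simplified model, not the code from the patches: it checks a single candidate match [start, end) under the documented 'grep -w' rule (the match must be at a line edge or adjacent to a non-word character on each side) and under the "\b(?:...)\b" wrapping that the PCRE2 documentation describes for PCRE2_EXTRA_MATCH_WORD. Function names are hypothetical, and word characters are assumed to be ASCII letters, digits, and underscore.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Word constituents: letters, digits, underscore (C locale assumed).  */
static bool
is_word_char (char c)
{
  return c == '_' || isalnum ((unsigned char) c);
}

/* 'grep -w' rule: the match must start at the beginning of the line or
   after a non-word character, and must end at the end of the line or
   before a non-word character.  */
bool
grep_w_accepts (const char *line, size_t start, size_t end)
{
  size_t len = strlen (line);
  bool left_ok = start == 0 || !is_word_char (line[start - 1]);
  bool right_ok = end == len || !is_word_char (line[end]);
  return left_ok && right_ok;
}

/* \b holds at POS when exactly one of the two neighboring characters
   is a word character; positions outside the line count as non-word.  */
static bool
at_word_boundary (const char *line, size_t len, size_t pos)
{
  bool before = pos > 0 && is_word_char (line[pos - 1]);
  bool after = pos < len && is_word_char (line[pos]);
  return before != after;
}

/* Model of wrapping the pattern as \b(?:...)\b.  */
bool
backslash_b_accepts (const char *line, size_t start, size_t end)
{
  size_t len = strlen (line);
  return (at_word_boundary (line, len, start)
          && at_word_boundary (line, len, end));
}
```

For the line "a%%a" and the match "%%" at offsets 1 to 3, grep_w_accepts returns false because the match is preceded by the word character 'a', while backslash_b_accepts returns true because the transition between 'a' and '%' counts as a word boundary on both sides.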


> * no more version restrictions (should work with >~10.20)

I tested with 10.20 and found one more glitch (it doesn't have PCRE2_SIZE_MAX), which is fixed by the attached patch "grep: port to PCRE2 10.20".
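A common way to cope with a macro that older library headers lack is a guarded fallback definition. The sketch below shows that general technique and is not copied from the attached patch; it assumes PCRE2_SIZE is size_t, so that SIZE_MAX is a suitable substitute, and fits_pcre2_size is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Real code would include <pcre2.h> here, after defining
   PCRE2_CODE_UNIT_WIDTH.  It is left out so that this sketch
   compiles without the library installed.  */

/* Older PCRE2 headers may not define PCRE2_SIZE_MAX.
   Assumption: PCRE2_SIZE is size_t, so SIZE_MAX is its maximum.  */
#ifndef PCRE2_SIZE_MAX
# define PCRE2_SIZE_MAX SIZE_MAX
#endif

/* Hypothetical helper: return true if N can be passed as a PCRE2_SIZE.  */
bool
fits_pcre2_size (uintmax_t n)
{
  return n <= PCRE2_SIZE_MAX;
}
```

Because the definition is guarded by #ifndef, it has no effect when the installed header already provides the macro.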


> Pending:
> * what to do with the current support of \C (enabled for now)

Let's open another bug report for that; I'm still a bit fuzzy about what the pros and cons are.


> * merge of non critical bugfix in #51710[1]

I plan to follow up in that bug report.

Marking this bug as done. Thanks again for working on this.

Attachment: 0001-grep-migrate-to-pcre2.patch
Attachment: 0002-maint-minor-rewording-and-reindenting.patch
Attachment: 0003-grep-Don-t-limit-jitstack_max-to-INT_MAX.patch
Attachment: 0004-grep-improve-pcre2_get_error_message-comments.patch
Attachment: 0005-grep-speed-up-fix-bad-UTF8-check-with-P.patch
Attachment: 0006-grep-prefer-signed-integers.patch
Attachment: 0007-grep-use-PCRE2_EXTRA_MATCH_LINE.patch
Attachment: 0008-grep-simplify-JIT-setup.patch
Attachment: 0009-grep-improve-memory-exhaustion-checking-with-P.patch
Attachment: 0010-grep-use-ximalloc-not-xcalloc.patch
Attachment: 0011-grep-fix-minor-P-memory-leak.patch
Attachment: 0012-grep-port-to-PCRE2-10.20.patch

