bug-gawk

Re: Support for '.*?' meaning leftmost-shortest per latest POSIX ERE spec


From: Ed Morton
Subject: Re: Support for '.*?' meaning leftmost-shortest per latest POSIX ERE spec
Date: Tue, 20 Aug 2024 05:54:32 -0500
User-agent: Mozilla Thunderbird

I posted that question at bug-gnulib yesterday:

https://lists.gnu.org/archive/html/bug-gnulib/2024-08/msg00122.html

No response yet; I'll let you know if/when I hear anything.

    Ed.

On 8/20/2024 1:41 AM, arnold@skeeve.com wrote:
It adds considerable complexity into the regexp matchers. Doing this is
(way) beyond my capabilities.

Please ask the Gnulib guys about it (and let me know what they say).
As all of gawk, GNU grep and GNU sed use the routines from Gnulib,
this feature won't be available until they add it.

Arnold

Ed Morton <mortoneccc@comcast.net> wrote:

Thanks for the quick response, just curious - what makes it a bad
addition, is it extra complexity or worse performance or something else?

      Ed.

On 8/19/2024 8:26 AM, arnold@skeeve.com wrote:
Hi.

I am aware of it.  Support for this feature can't happen unless and until
GNU regex and GNU dfa, which are both part of Gnulib, support it.
So you might consider asking on the bug-gnulib list what their plans
are for it.

EVEN if those libraries support this feature, I may not add it;
I think this was a bad addition, and I'm quite certain that it's not on
the radar screen for almost any other version of awk.

Realistically, I wouldn't expect to see this appear any time soon.

Arnold

Ed Morton via "Bug reports only for gawk." <bug-gawk@gnu.org> wrote:

Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: cygwin
Compiler: gcc
Compilation CFLAGS: -ggdb -O2 -pipe -Wall -Werror=format-security
-Wp,-D_FORTIFY_SOURCE=2 -fstack-protector-strong
--param=ssp-buffer-size=4
-fdebug-prefix-map=/cygdrive/d/a/scallywag/gawk/gawk-5.3.0-1.x86_64/build=/usr/src/debug/gawk-5.3.0-1
-fdebug-prefix-map=/cygdrive/d/a/scallywag/gawk/gawk-5.3.0-1.x86_64/src/gawk-5.3.0=/usr/src/debug/gawk-5.3.0-1
-DNDEBUG
uname output: CYGWIN_NT-10.0-22631 TournaMart_2023 3.5.3-1.x86_64
2024-04-03 17:25 UTC x86_64 Cygwin
Machine Type: x86_64-pc-cygwin

Gawk Version: 5.3.0

Attestation 1:
       I have read
https://www.gnu.org/software/gawk/manual/html_node/Bugs.html.
       Yes

Attestation 2:
       I have not modified the sources before building gawk.
       True

Description:
       The latest POSIX ERE spec
(https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html#tag_09_04_06)
says:
       ----
       Each of the duplication symbols ('+', '*', '?', and intervals) can
be suffixed by the repetition modifier '?' (<question-mark>), in which
case matching behavior for that repetition shall be changed from the
leftmost longest possible match to the leftmost shortest possible match,
including the null match (see A.9 Regular Expressions ). For example,
the ERE ".*c" matches up to and including the last character ('c') in
the string "abc abc", whereas the ERE ".*?c" matches up to and including
the first character 'c', the third character in the string.
       ----
       Gawk doesn't do that (yet) but I assume you're already aware of it
and so this is probably more of a "do you plan to support it and, if so,
what's the current target release?" than a real bug report.

Repeat-By:

       $ echo 'abc abc' | awk '{sub(/b.*c/,"")} 1'
       a

       $ echo 'abc abc' | awk '{sub(/b.*?c/,"")} 1'
       a

       The output of the first command above is correct, but the second
command should output "a abc" per the new POSIX spec.
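
For the single-character terminator used in this example, the shortest match can be approximated in current awk implementations with a negated bracket expression. This is only a sketch of a workaround, not something proposed in the thread: it does not generalize to multi-character terminators, and the variable names `greedy` and `shortest` are illustrative.

```shell
# Greedy match, as in the first command of the report: removes "bc abc".
greedy=$(echo 'abc abc' | awk '{sub(/b.*c/,"")} 1')

# Workaround (an assumption, not part of the report): [^c]* cannot run past
# the first 'c', so the match ends where /b.*?c/ would end under the new spec.
shortest=$(echo 'abc abc' | awk '{sub(/b[^c]*c/,"")} 1')

printf '%s\n' "$greedy" "$shortest"
```

The second command removes only "bc", which is the result the report expects from /b.*?c/.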

