bug-gawk

From: arnold
Subject: Re: Support for '.*?' meaning leftmost-shortest per latest POSIX ERE spec
Date: Wed, 21 Aug 2024 05:34:21 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Thanks for the update.

Ed Morton <mortoneccc@comcast.net> wrote:

> Arnold, I got a couple of responses:
>
> Bruno Haible:
> > What Arnold said [1], holds for me as well: It's well beyond my capabilities.
> > All I could help with are a test suite and some configure tests.
>
> Paul Eggert:
> > I don't know of any plans. It'd be nontrivial to add the functionality.
> > We'd like to have it of course.
>
>
> On 8/20/2024 6:02 AM, arnold@skeeve.com wrote:
> > OK, much thanks.
> >
> > Ed Morton <mortoneccc@comcast.net> wrote:
> >
> >> I posted that question at bug-gnulib yesterday:
> >>
> >> https://lists.gnu.org/archive/html/bug-gnulib/2024-08/msg00122.html
> >>
> >> No response yet, I'll let you know if/when I hear anything.
> >>
> >>       Ed.
> >>
> >> On 8/20/2024 1:41 AM, arnold@skeeve.com wrote:
> >>> It adds considerable complexity into the regexp matchers. Doing this is
> >>> (way) beyond my capabilities.
> >>>
> >>> Please ask the Gnulib guys about it (and let me know what they say).
> >>> As all of gawk, GNU grep and GNU sed use the routines from Gnulib,
> >>> this feature won't be available until they add it.
> >>>
> >>> Arnold
> >>>
> >>> Ed Morton <mortoneccc@comcast.net> wrote:
> >>>
> >>>> Thanks for the quick response, just curious - what makes it a bad
> >>>> addition, is it extra complexity or worse performance or something else?
> >>>>
> >>>>        Ed.
> >>>>
> >>>> On 8/19/2024 8:26 AM, arnold@skeeve.com wrote:
> >>>>> Hi.
> >>>>>
> >>>>> I am aware of it.  Support for this feature can't happen unless and until
> >>>>> GNU regex and GNU dfa, which are both part of Gnulib, support it.
> >>>>> So you might consider asking on the bug-gnulib list what their plans
> >>>>> are for it.
> >>>>>
> >>>>> EVEN if those libraries support this feature, I may not add it;
> >>>>> I think this was a bad addition, and I'm quite certain that it's not on
> >>>>> the radar screen for almost any other version of awk.
> >>>>>
> >>>>> Realistically, I wouldn't expect to see this appear any time soon.
> >>>>>
> >>>>> Arnold
> >>>>>
> >>>>> Ed Morton via "Bug reports only for gawk." <bug-gawk@gnu.org> wrote:
> >>>>>
> >>>>>> Configuration Information [Automatically generated, do not change]:
> >>>>>> Machine: x86_64
> >>>>>> OS: cygwin
> >>>>>> Compiler: gcc
> >>>>>> Compilation CFLAGS: -ggdb -O2 -pipe -Wall -Werror=format-security
> >>>>>> -Wp,-D_FORTIFY_SOURCE=2 -fstack-protector-strong
> >>>>>> --param=ssp-buffer-size=4
> >>>>>> -fdebug-prefix-map=/cygdrive/d/a/scallywag/gawk/gawk-5.3.0-1.x86_64/build=/usr/src/debug/gawk-5.3.0-1
> >>>>>> -fdebug-prefix-map=/cygdrive/d/a/scallywag/gawk/gawk-5.3.0-1.x86_64/src/gawk-5.3.0=/usr/src/debug/gawk-5.3.0-1
> >>>>>> -DNDEBUG
> >>>>>> uname output: CYGWIN_NT-10.0-22631 TournaMart_2023 3.5.3-1.x86_64
> >>>>>> 2024-04-03 17:25 UTC x86_64 Cygwin
> >>>>>> Machine Type: x86_64-pc-cygwin
> >>>>>>
> >>>>>> Gawk Version: 5.3.0
> >>>>>>
> >>>>>> Attestation 1:
> >>>>>>         I have read
> >>>>>> https://www.gnu.org/software/gawk/manual/html_node/Bugs.html.
> >>>>>>         Yes
> >>>>>>
> >>>>>> Attestation 2:
> >>>>>>         I have not modified the sources before building gawk.
> >>>>>>         True
> >>>>>>
> >>>>>> Description:
> >>>>>>         The latest POSIX ERE spec
> >>>>>> (https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html#tag_09_04_06)
> >>>>>> says:
> >>>>>>         ----
> >>>>>>         Each of the duplication symbols ('+', '*', '?', and intervals)
> >>>>>> can be suffixed by the repetition modifier '?' (<question-mark>), in
> >>>>>> which case matching behavior for that repetition shall be changed from
> >>>>>> the leftmost longest possible match to the leftmost shortest possible
> >>>>>> match, including the null match (see A.9 Regular Expressions). For
> >>>>>> example, the ERE ".*c" matches up to and including the last character
> >>>>>> ('c') in the string "abc abc", whereas the ERE ".*?c" matches up to and
> >>>>>> including the first character 'c', the third character in the string.
> >>>>>>         ----
> >>>>>>         Gawk doesn't do that (yet) but I assume you're already aware of
> >>>>>> it and so this is probably more of a "do you plan to support it and,
> >>>>>> if so, what's the current target release?" than a real bug report.
> >>>>>>
> >>>>>> Repeat-By:
> >>>>>>
> >>>>>>         $ echo 'abc abc' | awk '{sub(/b.*c/,"")} 1'
> >>>>>>         a
> >>>>>>
> >>>>>>         $ echo 'abc abc' | awk '{sub(/b.*?c/,"")} 1'
> >>>>>>         a
> >>>>>>
> >>>>>>         "1" above is correct but "2" should output "a abc" per the new
> >>>>>> POSIX spec.
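
Since the thread concludes that the '?' repetition modifier is unlikely to
arrive soon, the second Repeat-By case can be approximated with existing ERE
syntax. The sketch below is not from the thread: the function names
strip_shortest_bracket and strip_shortest_scan are hypothetical, and both
hard-code the b.*?c pattern from the report. The first replaces ".*?" with a
negated bracket expression, which only works when the terminator is a single
character. The second assumes that the leftmost match starts at the same
position whether the repetition is longest or shortest, so it takes the start
from match() and then looks for the shortest prefix that still matches the
whole pattern.

```shell
# Hypothetical helpers, not part of gawk; both hard-code the b.*?c example.

# 1. Negated bracket expression: valid only for a single-character terminator.
strip_shortest_bracket() {
    awk '{ sub(/b[^c]*c/, "") } 1'
}

# 2. Scan: take the start of the leftmost-longest match, then grow a
#    substring from that start until it matches the anchored pattern.
strip_shortest_scan() {
    awk '{
        if (match($0, /b.*c/)) {
            start = RSTART
            maxlen = RLENGTH
            # If no shorter prefix matches, len ends at maxlen (the longest match).
            for (len = 1; len < maxlen; len++)
                if (substr($0, start, len) ~ /^b.*c$/)
                    break
            $0 = substr($0, 1, start - 1) substr($0, start + len)
        }
        print
    }'
}

echo 'abc abc' | strip_shortest_bracket   # prints "a abc"
echo 'abc abc' | strip_shortest_scan      # prints "a abc"
```

The scan variant is quadratic in the match length and only covers a pattern
with one shortest-match repetition, so it is an illustration of the requested
behavior rather than a general substitute for it.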


