Hi.
I am aware of it. Support for this feature can't happen unless and until
GNU regex and GNU dfa, which are both part of Gnulib, support it.
So you might consider asking on the bug-gnulib list what their plans
are for it.
EVEN if those libraries support this feature, I may not add it;
I think this was a bad addition, and I'm quite certain that it's not on
the radar screen for almost any other version of awk.
Realistically, I wouldn't expect to see this appear any time soon.
Arnold
Ed Morton via "Bug reports only for gawk." <bug-gawk@gnu.org> wrote:
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: cygwin
Compiler: gcc
Compilation CFLAGS: -ggdb -O2 -pipe -Wall -Werror=format-security
-Wp,-D_FORTIFY_SOURCE=2 -fstack-protector-strong
--param=ssp-buffer-size=4
-fdebug-prefix-map=/cygdrive/d/a/scallywag/gawk/gawk-5.3.0-1.x86_64/build=/usr/src/debug/gawk-5.3.0-1
-fdebug-prefix-map=/cygdrive/d/a/scallywag/gawk/gawk-5.3.0-1.x86_64/src/gawk-5.3.0=/usr/src/debug/gawk-5.3.0-1
-DNDEBUG
uname output: CYGWIN_NT-10.0-22631 TournaMart_2023 3.5.3-1.x86_64
2024-04-03 17:25 UTC x86_64 Cygwin
Machine Type: x86_64-pc-cygwin
Gawk Version: 5.3.0
Attestation 1:
I have read
https://www.gnu.org/software/gawk/manual/html_node/Bugs.html.
Yes
Attestation 2:
I have not modified the sources before building gawk.
True
Description:
The latest POSIX ERE spec
(https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html#tag_09_04_06)
says:
----
Each of the duplication symbols ('+', '*', '?', and intervals) can
be suffixed by the repetition modifier '?' (<question-mark>), in which
case matching behavior for that repetition shall be changed from the
leftmost longest possible match to the leftmost shortest possible match,
including the null match (see A.9 Regular Expressions ). For example,
the ERE ".*c" matches up to and including the last character ('c') in
the string "abc abc", whereas the ERE ".*?c" matches up to and including
the first character 'c', the third character in the string.
----
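For comparison, awk's match() shows where today's leftmost-longest matcher places a match, through RSTART (start position) and RLENGTH (match length). This is a minimal sketch: the shell function name longest_match is hypothetical, and it uses only POSIX awk features, so any conforming awk should print the same values.

```shell
#!/bin/sh
# Hypothetical helper: for each input line, print RSTART and RLENGTH for the
# leftmost-longest match of the ERE in $1 ("0 -1" when nothing matches).
longest_match() {
    awk -v re="$1" '{ match($0, re); print RSTART, RLENGTH }'
}

echo 'abc abc' | longest_match '.*c'   # prints: 1 7  (the whole string)
echo 'abc abc' | longest_match 'b.*c'  # prints: 2 6  ("bc abc")
```

Under the new spec text, ".*?c" would instead match at position 1 with length 3 ("abc").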
Gawk doesn't do that (yet), but I assume you're already aware of it,
so this is probably more of a "do you plan to support it and, if so,
what's the current target release?" question than a real bug report.
Repeat-By:
$ echo 'abc abc' | awk '{sub(/b.*c/,"")} 1'
a
$ echo 'abc abc' | awk '{sub(/b.*?c/,"")} 1'
a
The first command's output is correct, but the second should output
"a abc" per the new POSIX spec.
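Until (or unless) gawk supports the modifier, the expected result can be approximated in plain awk. This is a minimal sketch, assuming a hypothetical helper sub_shortest (not a gawk builtin): it takes the ERE written without the '?', uses match() to find the leftmost match, then tries lengths from 0 upward until the anchored regex matches the candidate substring. It applies "shortest" to the whole expression rather than to a single repetition, so it only agrees with the POSIX modifier for simple patterns such as b.*c.

```shell
#!/bin/sh
# Hypothetical helper (not part of gawk): approximate a leftmost-shortest
# substitution of the first match on each line, using only POSIX awk.
#   $1 = ERE written WITHOUT the '?' modifier, e.g. 'b.*c'
#   $2 = replacement text (inserted literally; '&' is not special here)
sub_shortest() {
    awk -v pat="$1" -v rep="$2" '
    function shortest_sub(s, p, r,    start, len) {
        # match() finds the leftmost match; RLENGTH is its longest length
        if (!match(s, p))
            return s
        start = RSTART
        # The longest match always satisfies the anchored regex, so this
        # loop stops at or before RLENGTH, at the shortest matching length.
        for (len = 0; len <= RLENGTH; len++)
            if (substr(s, start, len) ~ ("^(" p ")$"))
                break
        return substr(s, 1, start - 1) r substr(s, start + len)
    }
    { print shortest_sub($0, pat, rep) }'
}

echo 'abc abc' | sub_shortest 'b.*c' ''   # prints: a abc
echo 'abc abc' | sub_shortest '.*c' 'X'   # prints: X abc
```

The first call reproduces the output the second Repeat-By command should give under the new spec; the second mirrors the spec's own ".*?c" example.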