Support for '.*?' meaning leftmost-shortest per latest POSIX ERE spec
From: Ed Morton
Subject: Support for '.*?' meaning leftmost-shortest per latest POSIX ERE spec
Date: Mon, 19 Aug 2024 07:59:23 -0500
User-agent: Mozilla Thunderbird
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: cygwin
Compiler: gcc
Compilation CFLAGS: -ggdb -O2 -pipe -Wall -Werror=format-security
-Wp,-D_FORTIFY_SOURCE=2 -fstack-protector-strong
--param=ssp-buffer-size=4
-fdebug-prefix-map=/cygdrive/d/a/scallywag/gawk/gawk-5.3.0-1.x86_64/build=/usr/src/debug/gawk-5.3.0-1
-fdebug-prefix-map=/cygdrive/d/a/scallywag/gawk/gawk-5.3.0-1.x86_64/src/gawk-5.3.0=/usr/src/debug/gawk-5.3.0-1
-DNDEBUG
uname output: CYGWIN_NT-10.0-22631 TournaMart_2023 3.5.3-1.x86_64
2024-04-03 17:25 UTC x86_64 Cygwin
Machine Type: x86_64-pc-cygwin
Gawk Version: 5.3.0
Attestation 1:
I have read
https://www.gnu.org/software/gawk/manual/html_node/Bugs.html.
Yes
Attestation 2:
I have not modified the sources before building gawk.
True
Description:
The latest POSIX ERE spec
(https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html#tag_09_04_06)
says:
----
Each of the duplication symbols ('+', '*', '?', and intervals) can
be suffixed by the repetition modifier '?' (<question-mark>), in which
case matching behavior for that repetition shall be changed from the
leftmost longest possible match to the leftmost shortest possible match,
including the null match (see A.9 Regular Expressions). For example,
the ERE ".*c" matches up to and including the last character ('c') in
the string "abc abc", whereas the ERE ".*?c" matches up to and including
the first character 'c', the third character in the string.
----
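For illustration, here is a brute-force sketch in portable awk that prints the text an ERE matches under each rule. The helper name `match_text` and its `longest`/`shortest` argument are hypothetical, not part of gawk. The sketch applies one rule to the whole match rather than to individual repetitions, so it only models simple patterns like the spec's own example. It does not implement per-repetition `?`.

```sh
#!/bin/sh
# Hypothetical helper (not part of gawk): print the text that ERE $1 matches
# in each input line under one of two rules, selected by $2:
#   longest  - leftmost-longest, the rule awk applies today
#   shortest - leftmost-shortest, what the new '?' modifier asks for
# Brute force: try each start position left to right, then each length.
# re is passed with -v, so awk interprets backslash escapes in it.
match_text() {
    awk -v re="$1" -v mode="$2" '
    function leftmost(s, anchored, shortest,    i, len, n) {
        n = length(s)
        for (i = 1; i <= n + 1; i++) {
            if (shortest) {
                # grow the candidate from length 0 upward
                for (len = 0; i + len <= n + 1; len++)
                    if (substr(s, i, len) ~ anchored) return substr(s, i, len)
            } else {
                # shrink the candidate from the end of the line downward
                for (len = n + 1 - i; len >= 0; len--)
                    if (substr(s, i, len) ~ anchored) return substr(s, i, len)
            }
        }
        return ""   # no match (indistinguishable from an empty match here)
    }
    { print leftmost($0, "^(" re ")$", (mode == "shortest")) }'
}

echo 'abc abc' | match_text '.*c' longest    # prints: abc abc
echo 'abc abc' | match_text '.*c' shortest   # prints: abc
```

The two results correspond to the spec's ".*c" and ".*?c" cases on "abc abc".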
Gawk doesn't do that (yet), but I assume you're already aware of it, so
this is probably more of a "do you plan to support it and, if so, what's
the current target release?" question than a real bug report.
Repeat-By:
$ echo 'abc abc' | awk '{sub(/b.*c/,"")} 1'
a
$ echo 'abc abc' | awk '{sub(/b.*?c/,"")} 1'
a
The first command's output is correct, but the second should output
"a abc" per the new POSIX spec.
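Until the modifier is supported, the usual workaround for this particular case is a negated bracket expression. It works in current gawk and any POSIX awk, but only when the terminator is a single character. This is a common idiom, not a proposed fix.

```sh
#!/bin/sh
# [^c]* cannot consume a "c", so the only match starting at the leftmost
# "b" ends at the first "c": the same result b.*?c would give under the
# new spec, but only because the terminator is a single character.
echo 'abc abc' | awk '{sub(/b[^c]*c/,"")} 1'    # prints: a abc
```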