bug-gawk

Support for '.*?' meaning leftmost-shortest per latest POSIX ERE spec


From: Ed Morton
Subject: Support for '.*?' meaning leftmost-shortest per latest POSIX ERE spec
Date: Mon, 19 Aug 2024 07:59:23 -0500
User-agent: Mozilla Thunderbird

Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: cygwin
Compiler: gcc
Compilation CFLAGS: -ggdb -O2 -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fstack-protector-strong --param=ssp-buffer-size=4 -fdebug-prefix-map=/cygdrive/d/a/scallywag/gawk/gawk-5.3.0-1.x86_64/build=/usr/src/debug/gawk-5.3.0-1 -fdebug-prefix-map=/cygdrive/d/a/scallywag/gawk/gawk-5.3.0-1.x86_64/src/gawk-5.3.0=/usr/src/debug/gawk-5.3.0-1 -DNDEBUG
uname output: CYGWIN_NT-10.0-22631 TournaMart_2023 3.5.3-1.x86_64 2024-04-03 17:25 UTC x86_64 Cygwin
Machine Type: x86_64-pc-cygwin

Gawk Version: 5.3.0

Attestation 1:
    I have read https://www.gnu.org/software/gawk/manual/html_node/Bugs.html.
    Yes

Attestation 2:
    I have not modified the sources before building gawk.
    True

Description:
    The latest POSIX ERE spec (https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html#tag_09_04_06) says:
    ----
    Each of the duplication symbols ('+', '*', '?', and intervals) can be suffixed by the repetition modifier '?' (<question-mark>), in which case matching behavior for that repetition shall be changed from the leftmost longest possible match to the leftmost shortest possible match, including the null match (see A.9 Regular Expressions). For example, the ERE ".*c" matches up to and including the last character ('c') in the string "abc abc", whereas the ERE ".*?c" matches up to and including the first character 'c', the third character in the string.
    ----
    Gawk doesn't do that (yet), but I assume you're already aware of it, so this is probably more of a "do you plan to support it and, if so, what's the current target release?" question than a real bug report.

Repeat-By:

    $ echo 'abc abc' | awk '{sub(/b.*c/,"")} 1'
    a

    $ echo 'abc abc' | awk '{sub(/b.*?c/,"")} 1'
    a

    The output of the first command above is correct, but the second command should output "a abc" per the new POSIX spec, since /b.*?c/ should match only the first "bc".
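
For comparison, the leftmost-shortest result for this particular input can be approximated in current gawk with a negated bracket expression. This is only a sketch: it works here because the terminator is the single character 'c', it is not a general replacement for '.*?', and the function name strip_shortest is made up for illustration.

```shell
# Approximate the leftmost-shortest match of 'b.*?c' for a single-character
# terminator: 'b', then any run of non-'c' characters, then the first 'c'.
# strip_shortest is a hypothetical helper name, not part of gawk.
strip_shortest() {
    awk '{ sub(/b[^c]*c/, "") } 1'
}

echo 'abc abc' | strip_shortest
```

On the input from the report this prints "a abc", which is the output the second command is expected to produce once '.*?' is supported.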


