Re: Support for leftmost-shortest matches in EREs in POSIX-2024
From: Bruno Haible
Subject: Re: Support for leftmost-shortest matches in EREs in POSIX-2024
Date: Tue, 20 Aug 2024 16:39:31 +0200
Ed Morton wrote:
> I asked in the bug-gawk mailing list (see
> https://lists.gnu.org/archive/html/bug-gawk/2024-08/msg00026.html)
> if/when GNU awk would support the `.*?` ERE construct described in the
> latest POSIX spec
> (https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html#tag_09_04_06)
>
> as:
>
> ----
> Each of the duplication symbols ('+', '*', '?', and intervals) can be
> suffixed by the repetition modifier '?' (<question-mark>), in which case
> matching behavior for that repetition shall be changed from the leftmost
> longest possible match to the leftmost shortest possible match,
> including the null match (see A.9 Regular Expressions ). For example,
> the ERE ".*c" matches up to and including the last character ('c') in
> the string "abc abc", whereas the ERE ".*?c" matches up to and including
> the first character 'c', the third character in the string.
> ----
>
> and was told:
>
> ----
> Support for this feature can't happen unless and until GNU regex and GNU
> dfa, which are both part of Gnulib, support it. So you might consider
> asking on the bug-gnulib list what their plans are for it.
> ----
>
> so - what are the plans, if any, for supporting that functionality?
What Arnold said [1] holds for me as well: it's well beyond my capabilities.
All I could help with is a test suite and some configure tests.
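As a rough illustration of what such a configure-time probe might look like,
here is a minimal sketch that uses only the standard POSIX regcomp/regexec
interface. The helper names match_end and ere_has_lazy_repetition are
hypothetical, not existing Gnulib code. It relies on the "abc abc" example
from the POSIX text quoted above: ".*c" must end at offset 7, and ".*?c"
should end at offset 3 if the repetition modifier is implemented.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return the end offset of the first match of the
   ERE `pattern` in `subject`, or -1 if the pattern does not compile or
   does not match.  */
static long
match_end (const char *pattern, const char *subject)
{
  regex_t re;
  regmatch_t m[1];
  long end = -1;

  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  if (regexec (&re, subject, 1, m, 0) == 0)
    end = (long) m[0].rm_eo;
  regfree (&re);
  return end;
}

/* Hypothetical probe: return 1 if the system's ERE matcher appears to
   treat "*?" as a leftmost-shortest repetition, 0 otherwise.  */
static int
ere_has_lazy_repetition (void)
{
  const char *subject = "abc abc";

  /* Sanity check: the leftmost-longest match of ".*c" covers the whole
     string, so it ends at offset 7.  */
  if (match_end (".*c", subject) != 7)
    return 0;

  /* With the POSIX-2024 modifier, ".*?c" stops at the first 'c', which
     is the third character, so the match ends at offset 3.  */
  return match_end (".*?c", subject) == 3;
}
```

A main function that returns !ere_has_lazy_repetition () would turn this into
a run-time check. An implementation that parses ".*?" as an optional ".*"
would be expected to report offset 7 for both patterns and so fail the probe.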
Bruno
[1] https://lists.gnu.org/archive/html/bug-gawk/2024-08/msg00030.html