bug#37659: rx additions: anychar, unmatchable, unordered-or
From: Robert Pluim
Subject: bug#37659: rx additions: anychar, unmatchable, unordered-or
Date: Tue, 22 Oct 2019 17:27:48 +0200

>>>>> On Tue, 22 Oct 2019 17:14:08 +0200, Mattias Engdegård <mattiase@acm.org> said:

Mattias> 'regexp-opt' always generates a regexp preferring long matches.
Mattias> This is undocumented, but useful enough that I would be surprised
Mattias> if this property wasn't exploited (perhaps unknowingly) by
Mattias> callers. It's quite natural: given a set of strings, surely the
Mattias> caller want them all to be candidates for a match, even if there
Mattias> is no following anchoring pattern.

Mattias> Thus, instead of 'unordered-or', define the operator in terms of
Mattias> long matches: 'or-max' (working name) would work like 'or' but
Mattias> guarantee a longest match, and only permit strings and 'or-max'
Mattias> forms as arguments. Thus, the rx user gets all the benefits from
Mattias> 'regexp-opt' in a composable way, without a need to sort the
Mattias> strings or otherwise prepare them.

Mattias> (The old 'or' behaviour always used 'regexp-opt' when possible,
Mattias> which was very fragile: (or "a" "ab") would match "ab", but
Mattias> (or "a" "ab" digit) would just match "a". 'or-max' is robust,
Mattias> without surprises.)

Mattias> Of course, we should also guarantee the maximum-matching property
Mattias> of regexp-opt. This is just a matter of documentation (and test);
Mattias> it does not restrict optimisations as far as I can tell.

Mattias> Again, I'm open to suggestions about a better name than 'or-max'.

or-greedy?
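
For readers following the thread, the difference between ordered
alternation and the long-match preference of 'regexp-opt' described in
the quoted text can be sketched as below. This is an illustration
only: the regexps are written by hand rather than generated by any rx
form, and `my-first-match' is a hypothetical helper, not part of Emacs
or of the proposed change.

```elisp
;; Illustration only: `my-first-match' is a hypothetical helper.
(defun my-first-match (regexp string)
  "Return the text matched by REGEXP in STRING, or nil if no match."
  (and (string-match regexp string)
       (match-string 0 string)))

;; Hand-written ordered alternation: the first branch that matches
;; wins, so the shorter string is chosen.
(my-first-match "a\\|ab" "ab")                  ; => "a"

;; `regexp-opt' produces a regexp that prefers the longer string,
;; regardless of the order in which the strings are given.
(my-first-match (regexp-opt '("a" "ab")) "ab")  ; => "ab"
```

An operator with guaranteed longest-match semantics, whatever it ends
up being called, would presumably behave like the second call without
the caller having to sort the strings first.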