--- Begin Message ---
Subject: rx: (or ...) order unpredictable
Date: Sun, 24 Feb 2019 19:40:33 +0100
The rx (or ...) construct sometimes reorders its subexpressions, which makes
its semantics unpredictable. For example,
(rx (or "ab" "a") (or "a" "ab"))
=>
"\\(?:ab?\\)\\(?:ab?\\)"
The user reasonably expects (or e1 e2) to translate to E1\|E2, where each ei
translates to Ei, or to a semantic equivalent. Without this control, rx is
useless or dangerous for many purposes.
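To make the difference concrete, here is a minimal sketch. It assumes the usual behaviour of the Emacs regexp matcher, which tries alternatives from left to right and takes the first one that lets the match succeed. The helper name my-match-length is hypothetical and exists only for this illustration.

```elisp
;; Hypothetical helper: return the length of the match of REGEXP at the
;; very start of STRING, or nil if REGEXP does not match there.
(defun my-match-length (regexp string)
  (save-match-data
    (and (eq (string-match regexp string) 0)
         (match-end 0))))

;; What (or "a" "ab") is expected to mean: "a" is tried first and wins.
(my-match-length "\\(?:a\\|ab\\)" "ab")   ; => 1

;; What (or "ab" "a") is expected to mean: "ab" is tried first and wins.
(my-match-length "\\(?:ab\\|a\\)" "ab")   ; => 2

;; The regexp shown above for both forms: always the longer match.
(my-match-length "\\(?:ab?\\)" "ab")      ; => 2
```

So the two orderings are not interchangeable, yet both are translated to the same regexp.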
The reason for the reordering is the use of regexp-opt behind the scenes.
Whether rx is the place to do this kind of optimisation is a matter of opinion;
mine is that it belongs in the regexp engine, where other, more aggressive
optimisations (DFA, native-code generation, etc.) could be performed as well.
We could determine whether any string is a prefix of another. If not,
regexp-opt should be safe to call. Alternatively, this check could be done in
regexp-opt (activated by a flag). That would be my preferred short-term
solution.
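A rough sketch of such a check, done on the caller's side, might look like the following. The names my-prefix-free-p and my-or-strings-regexp are hypothetical; nothing here is existing rx or regexp-opt code, and the premise that a prefix-free list is safe to hand to regexp-opt is the assumption stated above, not something the sketch proves.

```elisp
(require 'cl-lib)

;; Hypothetical helper: non-nil if no string in STRINGS is a prefix of a
;; different string in STRINGS.  Duplicates are ignored.
(defun my-prefix-free-p (strings)
  (let ((strs (delete-dups (copy-sequence strings))))
    (cl-loop for a in strs
             never (cl-loop for b in strs
                            thereis (and (not (eq a b))
                                         (string-prefix-p a b))))))

;; Hypothetical wrapper: optimise only when the order cannot matter,
;; otherwise keep the alternatives in the order given.
;; Assumes STRINGS is non-empty.
(defun my-or-strings-regexp (strings)
  (if (my-prefix-free-p strings)
      (regexp-opt strings)
    (concat "\\(?:" (mapconcat #'regexp-quote strings "\\|") "\\)")))

(my-prefix-free-p '("foo" "bar"))      ; => t
(my-prefix-free-p '("ab" "a"))         ; => nil
(my-or-strings-regexp '("a" "ab"))     ; => "\\(?:a\\|ab\\)"
```

The quadratic loop is fine for the short lists typically written in an rx form; a sorted scan would do for longer ones.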
(Speaking of regexp-opt, it has another bug that does not affect rx: it returns
the empty string if given an empty list of strings. The correct return value is
a regexp that never matches anything. Fix it, document it, or turn it into an
error?)
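To show why the empty string is the wrong answer, here is a small sketch. The constant name my-unmatchable is hypothetical, and the pattern is just one possible regexp that can never match: it demands the start of the string again after an "a" has already been consumed.

```elisp
;; The empty regexp matches at position 0 of every string.
(string-match-p "" "anything")            ; => 0

;; Hypothetical constant: a regexp that cannot match any string.
(defconst my-unmatchable "\\`a\\`")

(string-match-p my-unmatchable "a")       ; => nil
(string-match-p my-unmatchable "")        ; => nil
```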
--- End Message ---
--- Begin Message ---
Subject: Re: bug#34641: rx: (or ...) order unpredictable
Date: Sat, 2 Mar 2019 15:37:26 +0100
On 2 March 2019 at 15:23, Eli Zaretskii <address@hidden> wrote:
>
> LGTM, thanks.
Thank you, pushed.
--- End Message ---