Re: Scan of regexps in Emacs (March 17)

emacs-devel

From:	Paul Eggert
Subject:	Re: Scan of regexps in Emacs (March 17)
Date:	Wed, 20 Mar 2019 15:01:51 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.5.3

On 3/19/19 7:20 PM, Stefan Monnier wrote:
> I wonder why the doc doesn't just say that `-` should be the last
> character and not mention the other possibilities which just make the
> rule unnecessarily complex.

'-' can also be the first character in a bracket expression; this is
pretty common and is standard. POSIX also says '-' can be the upper
bound of a range, which is a bit weird (but hey! it's standard).

I went through the documentation and tried to describe this mess
better, installing the attached patch into the emacs-26 branch. The
basic ideas are:

* The doc already says that regular expressions like "*foo" and "+foo"
are problematic (they're confusing, and POSIX says the behavior is
undefined) and should be avoided. REs like "[a-m-z]" and "[!-[:alpha:]]"
and "[[:alpha:]-~]" are problematic in the same way and also should be
avoided.

* The doc doesn't clearly say when the Emacs range behavior is an
extension to POSIX; saying this will help people know better when they
can export Emacs regular expressions to other programs.

* The doc is confused (and there's a comment about this) about what
happens when one end of a range is unibyte and the other is multibyte. I
added something saying that if one bound is a raw 8-bit byte then the
other should be a unibyte character (either ASCII, or a raw 8-bit byte).
I don't see any good way to specify the behavior when one bound is a raw
8-bit byte and the other bound is a multibyte character, in such a way
that it's a natural extension of the documented behavior, so the
documentation now recommends against that.

* We might as well go ahead and say that [b-a] matches nothing, as
enough code (ab)uses regexps in that way, and there is value in having a
simple regular expression that always fails to match. However, I expect
that we should say that users should avoid wilder examples like [~-!] so
that the trawler can catch them as typos.
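
To illustrate the portability point above, here is a small check against
Python's re module, chosen only as one example of the "other programs"
someone might export an Emacs regexp to. It is not part of the patch, and
the helper name compiles is made up for this sketch. It shows that a
leading or trailing '-' carries over as a literal hyphen, while a
reversed range such as [b-a] is rejected as an error instead of matching
nothing.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# A '-' that is first or last in the brackets is a literal hyphen here too.
leading_ok = re.fullmatch(r"[-a]", "-") is not None
trailing_ok = re.fullmatch(r"[a-]", "-") is not None

# A reversed range is an error in Python's re, not an always-failing match.
reversed_accepted = compiles(r"[b-a]")

print(leading_ok, trailing_ok, reversed_accepted)
```

So code that relies on [b-a] as a never-matching regexp would need a
different spelling before being moved to an engine like this one.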

These new recommendations ("should"s in the attached patch) will give
the trawler license to diagnose questionable REs like "[a-m-z]",
"[!-[:alpha:]]", "[~-!]", and (my favorite) "[\u00FF-\xFF]". There is no
change to actual Emacs behavior.
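
As a rough sketch of the kind of check these "should"s permit, the
following flags three of the patterns above. It is not the actual
trawler; the function names and messages are invented for illustration,
and it assumes it is handed only the text between the brackets, with no
leading '^' or literal ']'. The raw-byte case "[\u00FF-\xFF]" is left
out because a Python string cannot distinguish the two bounds the way
Emacs does.

```python
def tokenize(body: str) -> list[str]:
    """Split bracket contents into single characters and [:class:] items."""
    tokens = []
    i = 0
    while i < len(body):
        if body.startswith("[:", i):
            end = body.find(":]", i + 2)
            if end != -1:
                tokens.append(body[i:end + 2])
                i = end + 2
                continue
        tokens.append(body[i])
        i += 1
    return tokens


def questionable_ranges(body: str) -> list[str]:
    """Return a description of each questionable range found in body."""
    tokens = tokenize(body)
    problems = []
    i = 0
    while i < len(tokens):
        # A range is a hyphen with an item on each side of it.
        if i + 2 < len(tokens) and tokens[i + 1] == "-":
            lo, hi = tokens[i], tokens[i + 2]
            if len(lo) > 1 or len(hi) > 1:
                problems.append(f"character class as range bound: {lo}-{hi}")
            elif lo > hi:
                problems.append(f"reversed range: {lo}-{hi}")
            i += 3
            # A hyphen right after a range is fine only if it is the last item.
            if i < len(tokens) - 1 and tokens[i] == "-":
                problems.append(
                    f"hyphen after range {lo}-{hi} is neither first nor last")
                i += 1
            continue
        i += 1
    return problems


for body in ["a-m-z", "!-[:alpha:]", "[:alpha:]-~", "~-!", "a-z-", "!--"]:
    print(repr(body), questionable_ranges(body))
```

Reversed ranges are reported rather than rejected, so a caller could
still choose to let the deliberate [b-a] idiom through while treating
[~-!] as a likely typo.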

0001-Say-which-regexp-ranges-should-be-avoided.patch
Description: Text Data

Current Thread

Scan of regexps in Emacs (March 17), Mattias Engdegård, 2019/03/17
- Re: Scan of regexps in Emacs (March 17), Paul Eggert, 2019/03/18
  - Re: Scan of regexps in Emacs (March 17), Mattias Engdegård, 2019/03/19
    - Re: Scan of regexps in Emacs (March 17), Paul Eggert, 2019/03/19
    - Re: Scan of regexps in Emacs (March 17), Stefan Monnier, 2019/03/19
    - Re: Scan of regexps in Emacs (March 17), Paul Eggert <=
    - RE: Scan of regexps in Emacs (March 17), Drew Adams, 2019/03/20
    - Re: Scan of regexps in Emacs (March 17), Paul Eggert, 2019/03/20
    - Re: Scan of regexps in Emacs (March 17), Eli Zaretskii, 2019/03/20
    - RE: Scan of regexps in Emacs (March 17), Drew Adams, 2019/03/21
    - Re: Scan of regexps in Emacs (March 17), Eli Zaretskii, 2019/03/21
    - Re: Scan of regexps in Emacs (March 17), Stefan Monnier, 2019/03/20
    - Re: Scan of regexps in Emacs (March 17), Mattias Engdegård, 2019/03/21
    - Re: Scan of regexps in Emacs (March 17), Richard Stallman, 2019/03/20
    - Re: Scan of regexps in Emacs (March 17), Stephen Leake, 2019/03/22
    - Re: Scan of regexps in Emacs (March 17), Mattias Engdegård, 2019/03/22
