Re: Scan of regexps in Emacs (March 17)

emacs-devel

From:	Paul Eggert
Subject:	Re: Scan of regexps in Emacs (March 17)
Date:	Tue, 2 Apr 2019 00:33:28 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

Mattias Engdegård wrote:
> don't we also need a precise description of exactly how they are interpreted by
> the engine?

In other parts of Emacs, we are typically OK with specs that don't completely specify behavior. This gives us more freedom to make changes in the undocumented behavior later. I think it makes sense to do that here too, for regular expressions like "[z-a-m]" that most readers would find confusing.

> I'm with Stefan here; `-' should go last. Anything else is a gritty detail.

Stefan already changed the doc in master to say that. The attached patch tightens up the wording (and still says that "-" should go last).

> Documenting differences from POSIX regexps is useful. Do you prefer having
> those differences being spread out, or all concentrated into one section?

I don't have a strong preference. I wrote it concentrated originally, and that form seems to work well.

> These days, a user may be more familiar with the various PCRE dialects than
> traditional or extended POSIX. Should that be taken into account?

It might be helpful. However, PCRE is further away from Emacs regexps than POSIX is, and a comparison of PCRE and POSIX regexps is probably best put into a different section. It's not a section I'd like to write, to be honest; PCRE is pretty hairy.

> The terminology is a bit confusing. Is 'raw 8-bit byte' included in 'unibyte'?
> Is \x7f ever a raw 8-bit byte?
> I agree that [å-\xff], say, should be invalid but I've never seen such
> constructs.

After looking into it I realized that I don't really know the semantics here (the text I recently added there seems to be wrong, in some cases), and I have my doubts that anyone else knows the semantics either. The attached patch simply gets rid of that section, leaving the area undocumented. User beware!

> It already does, and some bugs were found that way. As a special case, it no
> longer complains about z-a because that is unlikely to be an accident and
> occurs in some code on purpose.

OK, then we should document z-a as the preferred syntax (best go with the flow...). Done in the attached patch.

> As an experiment, I added detection of 'chained' ranges like [a-m-z] to xr and
> found a handful in both Emacs and GNU ELPA, but none of them carried a freeload
> of bugs. Keeping that check didn't seem worthwhile; the regexps may be a bit
> odd-looking, but aren't wrong.

It depends on what one means by "wrong". If one wants to use the ranges in both Emacs and grep they are "wrong", so it's reasonable for the manual to recommend against them.

> a rule finding [X-Y] where Y=X+1 found one or two questionable cases in a sea
> of false positives (also in the attachment).

It might also help for the trawler to warn about [X-Z] where Z = X+2. [XYZ] is clearer and less error-prone than [X-Z]. I shoehorned that into the attached patch too.
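
For illustration only, the kinds of checks discussed in this message (two- and three-character ranges, and 'chained' ranges) could be sketched roughly as below. This is not the xr code and not part of the attached patch; the function name `check_bracket`, the warning texts, and the simplifications (no `[:class:]` handling, no special treatment of a leading `^` or `]`) are all assumptions made for the example. Reversed ranges such as z-a are deliberately left unflagged, in line with the behaviour described above.

```python
def check_bracket(body):
    """Return warnings for the text between '[' and ']' of a bracket expression.

    Hypothetical sketch: ignores character classes and leading '^' or ']'.
    """
    warnings = []
    i, n = 0, len(body)
    while i < n:
        # A range needs a start char, a '-', and an end char.
        if i + 2 < n and body[i + 1] == "-":
            lo, hi = body[i], body[i + 2]
            span = ord(hi) - ord(lo)
            if span in (1, 2):
                # [X-Y] with Y = X+1, or [X-Z] with Z = X+2: spell it out.
                chars = "".join(chr(c) for c in range(ord(lo), ord(hi) + 1))
                warnings.append(f"short range {lo}-{hi}; consider [{chars}]")
            i += 3
            # A '-' right after a range that is not the last character
            # makes a 'chained' range such as a-m-z.
            if i < n - 1 and body[i] == "-":
                warnings.append(f"chained range after {lo}-{hi}")
        else:
            i += 1
    return warnings


if __name__ == "__main__":
    for body in ["a-z", "X-Z", "a-m-z", "a-z-", "z-a"]:
        print(body, check_bracket(body))
```

A trailing "-" (as in "a-z-") is treated as a literal and produces no warning, which matches the advice that "-" should go last.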

0001-More-regexp-advice-and-clarifications.patch
Description: Text Data
