Re: regexp does not work as documented

From: Thomas Lord
Subject: Re: regexp does not work as documented
Date: Mon, 12 May 2008 08:55:14 -0700
User-agent: Thunderbird (X11/20060808)

Stefan Monnier wrote:
> That's what I do in lex.el.

Sounds nice.
Last bits of experience report, then:

If it isn't so already, it may be easy to make it so
that a choice of which DFA is being used, plus a choice of the
"current state" can be represented as lisp objects and cheaply
copied.  That gives the essence of "regular expression continuations".
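A minimal sketch of that idea, written in Python rather than Emacs Lisp purely for illustration. The names `Dfa`, `ReContinuation`, `start`, `step` and `accepting` are hypothetical and are not part of lex.el or any Emacs API. The point is only that the continuation is a small immutable value, so "copying" it is just keeping a reference.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Dfa:
    """Hypothetical compiled DFA: a missing transition means failure."""
    transitions: Dict[Tuple[int, str], int]
    start: int
    accepting: FrozenSet[int]


@dataclass(frozen=True)
class ReContinuation:
    """Which DFA is running plus its current state (None once it has failed)."""
    dfa: Dfa
    state: Optional[int]


def start(dfa: Dfa) -> ReContinuation:
    return ReContinuation(dfa, dfa.start)


def step(k: ReContinuation, ch: str) -> ReContinuation:
    """Return a new continuation; the old one stays valid and unchanged."""
    if k.state is None:
        return k
    return ReContinuation(k.dfa, k.dfa.transitions.get((k.state, ch)))


def accepting(k: ReContinuation) -> bool:
    return k.state in k.dfa.accepting


# DFA for the regexp ab*
AB_STAR = Dfa({(0, "a"): 1, (1, "b"): 1}, start=0, accepting=frozenset({1}))

saved = step(start(AB_STAR), "a")   # suspend after reading "a"
went_on = step(saved, "b")          # resume the suspension along one input
failed = step(saved, "x")           # resume the same suspension on another
```

Because `step` never mutates its argument, `saved` can be resumed any number of times, which is the property the paragraph above asks for.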

Handy features that shouldn't be difficult to add (if not present):

Let programmers specify "labels" for each NFA state and then,
for each DFA state, provide a list of all NFA labels that
correspond to that DFA state and/or a more general way to
"combine" NFA state labels to make the DFA label.  You can
wind up with many NFA states combined into a single DFA state,
of course, so a "combine" function might be important.
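As a rough Python illustration (not Emacs Lisp, and not how lex.el does it): assume the subset construction represents each DFA state as a set of NFA state ids. The function `dfa_label`, the label table and the priority rule below are all hypothetical.

```python
from functools import reduce


def dfa_label(dfa_state, nfa_labels, combine=None):
    """Label a DFA state from the labels of the NFA states it contains.

    dfa_state  -- set of NFA state ids (as produced by subset construction)
    nfa_labels -- dict mapping NFA state id -> programmer-supplied label
    combine    -- optional two-argument function that merges two labels
    """
    labels = [nfa_labels[s] for s in sorted(dfa_state)
              if nfa_labels.get(s) is not None]
    if combine is None:
        return labels              # plain list of all NFA labels
    if not labels:
        return None                # unlabeled DFA state
    return reduce(combine, labels)


# Example: a DFA state in which both a keyword rule and an identifier
# rule could accept; the combine function prefers the keyword.
NFA_LABELS = {3: "KEYWORD", 7: "IDENT"}
PRIORITY = {"KEYWORD": 0, "IDENT": 1}


def prefer_higher_priority(a, b):
    return a if PRIORITY[a] <= PRIORITY[b] else b


both = frozenset({1, 3, 7})
as_list = dfa_label(both, NFA_LABELS)
combined = dfa_label(both, NFA_LABELS, prefer_higher_priority)
```

Here `as_list` keeps every contributing label, while `combined` collapses them to a single one, which is the choice a lexer typically wants.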

Include scanning functions to:

~ advance the DFA at most N characters (or until failure)
~ advance the DFA to the next non-nil state label (or failure)

In both cases, give a way for lisp programs to get back not only
the label (or failure indication) but also the regular expression
continuation.
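The two scanning functions might look roughly like this in Python (used here only for illustration). Everything named below is an assumption: a continuation is a `(dfa, state)` pair, a failed DFA has state `None`, and both functions return the label, the new continuation and the position reached. On failure the returned position is just past the character that caused it.

```python
def _label(dfa, state):
    """Label of a state, or None for unlabeled or failed states."""
    return dfa["labels"].get(state) if state is not None else None


def advance_at_most(cont, text, pos, n):
    """Advance at most n characters, or until the DFA fails."""
    dfa, state = cont
    end = min(len(text), pos + n)
    while pos < end and state is not None:
        state = dfa["delta"].get((state, text[pos]))
        pos += 1
    return _label(dfa, state), (dfa, state), pos


def advance_to_label(cont, text, pos):
    """Advance until a state with a non-None label, failure, or end of text."""
    dfa, state = cont
    while pos < len(text) and state is not None:
        state = dfa["delta"].get((state, text[pos]))
        pos += 1
        if _label(dfa, state) is not None:
            break
    return _label(dfa, state), (dfa, state), pos


# Hypothetical DFA for [0-9]+ whose accepting state carries the label "INT".
DIGITS = {
    "start": 0,
    "delta": {(s, d): 1 for s in (0, 1) for d in "0123456789"},
    "labels": {1: "INT"},
}

begin = (DIGITS, DIGITS["start"])
two = advance_at_most(begin, "42+7", 0, 2)      # stops after "42"
overrun = advance_at_most(begin, "42+7", 0, 10)  # fails on "+"
first = advance_to_label(begin, "42+7", 0)       # stops after "4"
```

Since the continuation comes back to the caller, a scan that stopped after N characters can be picked up later from exactly where it left off.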

Those features are handy so that (for example) lisp programs can
hang a suspended regexp continuation on a buffer character as
a property, doing incremental "re-lexing" in application-specific
ways.
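One way to picture this, sketched in Python with a list standing in for per-character text properties. The toy DFA (it only tracks whether the scanner is inside a double-quoted string) and the names `lex_all` and `relex` are hypothetical, and the sketch assumes an edit that replaces text with text of the same length.

```python
def step(state, ch):
    """Toy DFA: state 0 is plain code, state 1 is inside a string literal."""
    return 1 - state if ch == '"' else state


def lex_all(text):
    """saved[i] is the DFA state suspended just before text[i]."""
    saved = [0]
    for ch in text:
        saved.append(step(saved[-1], ch))
    return saved


def relex(saved, text, edit_start, edit_end):
    """Re-lex after text[edit_start:edit_end] was replaced in place.

    Resumes from the state saved at edit_start and stops as soon as the
    new state agrees with the old saved state past the edited region,
    because a DFA must then behave exactly as before on the rest.
    Returns the number of characters rescanned.
    """
    state = saved[edit_start]
    pos = edit_start
    while pos < len(text):
        state = step(state, text[pos])
        pos += 1
        if pos >= edit_end and saved[pos] == state:
            break
        saved[pos] = state
    return pos - edit_start


old = 'a "b" c'
small_edit = 'a "x" c'            # change inside the string
saved_small = lex_all(old)
rescanned_small = relex(saved_small, small_edit, 3, 4)

big_edit = "a 'b\" c"             # opening quote replaced: states shift
saved_big = lex_all(old)
rescanned_big = relex(saved_big, big_edit, 2, 3)
```

The first edit is absorbed after one character; the second changes the state of everything that follows, so the rescan runs to the end of the text.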

The "advance to non-nil label" feature is useful for writing lisp
programs that *do not* need back-referencing or sub-exp locations
per se.

It is a bit more speculative but also consider functions to:

~ advance the state of a DFA based on characters provided
  in a function call rather than read from a buffer -- e.g., a
  buffer position should not have to be part of the state of a
  running DFA.

    (advance-dfa re-continuation chr) => re-continuation

Why that last one?  Because then you can probably use the same
DFA engine as the heart of a shift-reduce parser and (for languages
that admit such things) write an incremental parser.  (You'd be using
non-buffer-position DFAs to process token ids emitted by the lexer.)
You can also use such a feature for things like serial I/O protocols.
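A small Python sketch of the serial I/O case, again for illustration only. The function name `advance_dfa` mirrors the signature suggested above; the framing protocol (an STX byte, any payload, an ETX byte) and the table layout are made up for this example. The same loop would work with token ids from a lexer in place of bytes.

```python
STX, ETX = 0x02, 0x03   # frame delimiters of a made-up protocol

FRAME_DFA = {
    "start": "idle",
    "delta": {("idle", STX): "in_frame", ("in_frame", ETX): "done"},
    "default": {"in_frame": "in_frame"},   # any other byte is payload
}


def advance_dfa(cont, symbol):
    """Feed one symbol to a (dfa, state) continuation; no buffer involved.

    Returns a new continuation. A state of None means the DFA has failed.
    """
    dfa, state = cont
    if state is None:
        return cont
    nxt = dfa["delta"].get((state, symbol), dfa["default"].get(state))
    return (dfa, nxt)


# Bytes arrive in arbitrary chunks; the continuation carries the state
# across chunk boundaries.
cont = (FRAME_DFA, FRAME_DFA["start"])
states_after_chunk = []
for chunk in (b"\x02he", b"llo\x03"):
    for byte in chunk:            # iterating over bytes yields ints
        cont = advance_dfa(cont, byte)
    states_after_chunk.append(cont[1])
```

Nothing in the continuation refers to a buffer position, so the caller is free to obtain symbols from wherever it likes.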

Incremental parsers open the door to robust "syntax directed editing"
which I think could be an exciting direction for IDE features to take.
(Years ago, Thomas Reps and Tim Teitelbaum worked on the "Synthesizer
Generator" which I recall had features along these lines (their parser
guts were probably different from what I suggest).  As I (now vaguely)
recall there is a book that talks about their Emacs-based implementation.)

Bye.  Thanks.  And good luck!
