Re: "Font-lock is limited to text matching" is a myth

I very much second this! PEGs are the next step beyond regex, and I expect them to replace regex for many tasks across the whole field of text processing in the coming years.

As far as I know, there are currently two PEG implementations in elisp:

* http://www.emacswiki.org/cgi-bin/wiki/ParserCompiler (2008) by Mike Mattie.

* http://www.emacswiki.org/emacs/ParsingExpressionGrammars (2008) by Helmut Eller.

It'd be much better if PEG were integrated into elisp from the ground up, possibly implemented in C or built on other libraries for speed. I imagine that functions that take a regex could each have a PEG version.
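To make the idea of "a PEG version" of a regex function concrete, here is a minimal Python sketch of a PEG counterpart to a search function such as Python's `re.search` or elisp's `re-search-forward`. Everything in it is hypothetical: the combinator names (`lit`, `charset`, `seq`, `choice`, `star`) and `peg_search` are made up for illustration, and a pattern is assumed to be simply a function from (text, position) to the end of its match, or `None`.

```python
from typing import Callable, Optional, Tuple

# A pattern maps (text, pos) to the position just past its match, or None on failure.
Pattern = Callable[[str, int], Optional[int]]

def lit(s: str) -> Pattern:
    """Match the literal string s."""
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None

def charset(chars: str) -> Pattern:
    """Match any single character from chars."""
    return lambda text, pos: pos + 1 if pos < len(text) and text[pos] in chars else None

def seq(*ps: Pattern) -> Pattern:
    """Match each pattern in order; fail as soon as one fails."""
    def p(text: str, pos: int) -> Optional[int]:
        for q in ps:
            pos = q(text, pos)
            if pos is None:
                return None
        return pos
    return p

def choice(*ps: Pattern) -> Pattern:
    """Ordered choice: the first alternative that succeeds wins."""
    def p(text: str, pos: int) -> Optional[int]:
        for q in ps:
            end = q(text, pos)
            if end is not None:
                return end
        return None
    return p

def star(q: Pattern) -> Pattern:
    """Greedy repetition, zero or more times; stops on failure or an empty match."""
    def p(text: str, pos: int) -> Optional[int]:
        while True:
            end = q(text, pos)
            if end is None or end == pos:
                return pos
            pos = end
    return p

def peg_search(pattern: Pattern, text: str, start: int = 0) -> Optional[Tuple[int, int]]:
    """Like a regex search: (begin, end) of the first match at or after start, else None."""
    for begin in range(start, len(text) + 1):
        end = pattern(text, begin)
        if end is not None:
            return begin, end
    return None

# Hypothetical example: digits, optionally followed by "." and more digits.
digit = charset("0123456789")
number = seq(digit, star(digit),
             choice(seq(lit("."), digit, star(digit)), lit("")))  # lit("") means "or nothing"

print(peg_search(number, "x = 3.14;"))  # (4, 8)
```

A real elisp version would presumably work on buffer positions rather than strings, and would likely need a C core to compete with the existing regex engine.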

Xah

On Tue, Aug 11, 2009 at 11:43 PM, Miles Bader <address@hidden> wrote:

"Eric M. Ludlam" <address@hidden> writes:
> As far as how to define tables for a parsing system written in C, an
> old-school solution is to just use the flex/bison engines under the
> Emacs Lisp API. There are a lot of new parser generator systems
> though, and I don't really know what the best one might be.
>
> One of the hairier parts of the CEDET parser is the lexical analyzer.

Slightly off-topic, but I'm a huge fan of "LPeg" [1], which is a
pattern-matching library for Lua, based on Parsing Expression Grammars
(PEGs).

I've always wished for something like LPeg in elisp, and since Lua is at
heart quite lisp-like (despite the very different syntax), I think it
could work very well. Maybe it wouldn't be too hard to adapt LPeg's
core to elisp (it's licensed under the BSD license).

[There's a popular implementation technique for PEGs called "packrat
parsing", and many PEG libraries use it. However, packrat parsers
apparently have some serious problems in practice (chiefly memory use,
since they memoize the result of every rule at every input position), so
LPeg uses a different technique: it compiles patterns into programs for a
small backtracking virtual machine. See [2] for a discussion of this, and
of the LPeg implementation in detail.]
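For contrast with packrat memoization, here is a toy Python sketch of the "parsing machine" idea that [2] describes: a pattern is compiled into a short instruction list, and the machine keeps only a stack of pending alternatives instead of a memo table. The opcode names (`Char`, `Choice`, `Commit`, `Jump`, `Fail`, `End`) follow the paper's vocabulary, but the tuple encoding, the hand-compiled program, and the omission of `Call`/`Return` and captures are simplifying assumptions; this is not LPeg's code.

```python
from typing import List, Optional, Tuple

# An instruction is (opcode, argument); jump targets are indices into the program.
Instr = Tuple[str, object]

def run(program: List[Instr], text: str) -> Optional[int]:
    """Run the toy machine from position 0; return the match end, or None on failure."""
    pc, pos = 0, 0
    stack: List[Tuple[int, int]] = []   # pending alternatives: (label, saved pos)
    while True:
        op, arg = program[pc]
        if op == "Char" and pos < len(text) and text[pos] == arg:
            pc, pos = pc + 1, pos + 1
        elif op in ("Char", "Fail"):    # mismatch or explicit failure: backtrack
            if not stack:
                return None
            pc, pos = stack.pop()
        elif op == "Choice":            # save an alternative to resume on failure
            stack.append((arg, pos))
            pc += 1
        elif op == "Commit":            # alternative succeeded: discard the saved entry
            stack.pop()
            pc = arg
        elif op == "Jump":
            pc = arg
        elif op == "End":
            return pos
        else:
            raise ValueError(f"unknown opcode {op!r}")

# Hand-compiled program for the PEG  ("ab" / "ac")* "d"
PROGRAM: List[Instr] = [
    ("Choice", 8),   # label 0: loop entry; on failure, exit the loop at label 8
    ("Choice", 5),   # label 1: ordered choice; on failure, try "ac" at label 5
    ("Char", "a"),   # label 2
    ("Char", "b"),   # label 3
    ("Commit", 7),   # label 4: "ab" matched, drop the "ac" alternative
    ("Char", "a"),   # label 5
    ("Char", "c"),   # label 6
    ("Commit", 0),   # label 7: iteration done, drop the loop-exit entry and repeat
    ("Char", "d"),   # label 8
    ("End", None),   # label 9
]

print(run(PROGRAM, "abacd"))  # 5
```

Without a memo table, memory use is bounded by the depth of the backtrack stack rather than by the length of the input; the trade-off is that some text may be re-examined after backtracking.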

Some nice things about LPeg:

(1) It's very fast.

(2) It's very concise; for typical usage, it's essentially like
writing a parser in yacc or whatever.

(3) It makes it trivial to insert code and hooks at any point in the
parse; not just "actions", but code that can determine how the
parsing happens. This gives a _huge_ amount of flexibility.

(4) It's very easy to "think about", despite the flexibility and
presence of arbitrary code driving parsing, because it works kind
of like a recursive descent parser, operating greedily (but
provides mechanisms to do automatic backtracking when necessary).

(5) Because it's so fast and flexible, typical practice is to _not_
have a separate lexical analyzer, but just do lexical analysis in
the parser. This is easier and more convenient, and also makes it
easier to use parser information in lexical analysis (e.g., the
famous "typedef" vs. "id" issue in C parsers).

(6) It's very small -- the entire implementation (core engine and Lua
interface) is only 2000 lines of C.
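Points (3) to (5) can be illustrated with a small Python sketch of PEG-style combinators (Python 3.8+). This is not LPeg's API; the names `lit`, `char_if`, `seq`, `choice`, `star`, `guard`, the `TYPEDEFS` set, and the C-like `decl`/`expr` rules are all hypothetical, and `guard` only loosely imitates LPeg's match-time hooks. It shows greedy repetition that never backtracks, ordered choice that does, and a scannerless rule that consults parser state while recognizing identifiers.

```python
from typing import Callable, Optional, Set

# A pattern maps (text, pos) to the end position of its match, or None on failure.
Pattern = Callable[[str, int], Optional[int]]

def lit(s: str) -> Pattern:
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None

def char_if(pred: Callable[[str], bool]) -> Pattern:
    """Match one character satisfying pred."""
    return lambda text, pos: pos + 1 if pos < len(text) and pred(text[pos]) else None

def seq(*ps: Pattern) -> Pattern:
    def p(text: str, pos: int) -> Optional[int]:
        for q in ps:
            pos = q(text, pos)
            if pos is None:
                return None
        return pos
    return p

def choice(*ps: Pattern) -> Pattern:
    """Ordered choice: if an alternative fails, retry the next one at the same pos."""
    def p(text: str, pos: int) -> Optional[int]:
        for q in ps:
            end = q(text, pos)
            if end is not None:
                return end
        return None
    return p

def star(q: Pattern) -> Pattern:
    """Greedy repetition: consumes all it can and never gives any back."""
    def p(text: str, pos: int) -> Optional[int]:
        while (end := q(text, pos)) is not None and end != pos:
            pos = end
        return pos
    return p

def guard(test: Callable[[str], bool], q: Pattern) -> Pattern:
    """Match q, then let arbitrary code accept or reject the matched text."""
    def p(text: str, pos: int) -> Optional[int]:
        end = q(text, pos)
        return end if end is not None and test(text[pos:end]) else None
    return p

# (4) Greedy, no backtracking into a repetition: unlike the regex a*a, this fails.
assert seq(star(lit("a")), lit("a"))("aaa", 0) is None
# ...but ordered choice does backtrack: "ab" fails on "ac", so "a" is tried.
assert choice(lit("ab"), lit("a"))("ac", 0) == 1

# (5) Scannerless: identifiers are recognized inside the grammar, so a rule can
# consult parser state. TYPEDEFS is a hypothetical set that a real parser would
# update as it sees typedef declarations.
TYPEDEFS: Set[str] = {"size_t"}
ident = seq(char_if(lambda c: c.isalpha() or c == "_"),
            star(char_if(lambda c: c.isalnum() or c == "_")))
ws = star(lit(" "))
type_name = guard(lambda name: name in TYPEDEFS, ident)  # (3) code deciding the parse

decl = seq(type_name, ws, lit("*"), ws, ident, lit(";"))  # "T * x;" declares a pointer
expr = seq(ident, ws, lit("*"), ws, ident, lit(";"))      # "a * b;" multiplies
stmt = choice(decl, expr)

assert decl("size_t * n;", 0) == 11   # size_t is a known type: a declaration
assert decl("x * n;", 0) is None      # x is not a type...
assert expr("x * n;", 0) == 6         # ...so the same shape parses as an expression
assert stmt("x * n;", 0) == 6
```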

[The standard way to use LPeg in Lua relies on Lua's ability to easily
overload standard operators, giving them LPeg-specific meanings when
invoked on first-class "pattern" objects. That can't be done in elisp,
but I think a more lispy approach should be easy.]
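One possible lispy approach, sketched here in Python with tuples standing in for s-expressions, is to write patterns as plain data (much as elisp's `rx` macro does for regexps) and have a small engine interpret or compile them. The operator names (`seq`, `or`, `*`, `not-in`, `ref`) and the grammar below are hypothetical; in elisp the same pattern would presumably be a quoted list such as `(seq "(" (ref balanced) ")")`.

```python
from typing import Dict, Optional, Union

# A pattern is a literal string, or a tuple whose first element names the operator.
Pat = Union[str, tuple]

def match(pat: Pat, text: str, pos: int, rules: Dict[str, Pat]) -> Optional[int]:
    """Interpret an s-expression-style PEG; return the end position or None."""
    if isinstance(pat, str):                      # literal string
        return pos + len(pat) if text.startswith(pat, pos) else None
    op, *args = pat
    if op == "seq":                               # match all, in order
        for sub in args:
            pos = match(sub, text, pos, rules)
            if pos is None:
                return None
        return pos
    if op == "or":                                # ordered choice
        for sub in args:
            end = match(sub, text, pos, rules)
            if end is not None:
                return end
        return None
    if op == "*":                                 # greedy repetition
        while (end := match(args[0], text, pos, rules)) is not None and end != pos:
            pos = end
        return pos
    if op == "not-in":                            # any one char not in args[0]
        ok = pos < len(text) and text[pos] not in args[0]
        return pos + 1 if ok else None
    if op == "ref":                               # nonterminal: look up a named rule
        return match(rules[args[0]], text, pos, rules)
    raise ValueError(f"unknown operator {op!r}")

# Hypothetical grammar: text with balanced parentheses.
#   balanced <- ("(" balanced ")" / [^()])*
RULES: Dict[str, Pat] = {
    "balanced": ("*", ("or", ("seq", "(", ("ref", "balanced"), ")"),
                             ("not-in", "()"))),
}

def balanced(text: str) -> bool:
    """True if the 'balanced' rule matches the whole of text."""
    return match(("ref", "balanced"), text, 0, RULES) == len(text)

print(balanced("(a(b)c)"), balanced("(a"))  # True False
```

Because patterns are ordinary data, they can be built, inspected, and combined with normal list functions, which is roughly what operator overloading buys LPeg in Lua.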

[1] http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html

[2] http://www.inf.puc-rio.br/~roberto/docs/peg.pdf

-Miles


From:	Xah Lee
Subject:	Re: "Font-lock is limited to text matching" is a myth
Date:	Wed, 12 Aug 2009 04:28:51 -0700