emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Make peg.el a built-in library?


From: Augusto Stoffel
Subject: Re: Make peg.el a built-in library?
Date: Sun, 26 Sep 2021 12:59:03 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)

I think it would be really cool to have PEGs built into Emacs.  Things
like json.el could be simplified by at least (log10 2) orders of
magnitude with PEGs.  Whatever the use case of `rx' is, PEGs are
probably the "real" solution.

But I suspect this would only take traction with a fast and robust C
implementation like Lua's LPEG (see below for a reason).

On Fri, 27 Aug 2021 at 08:41, Helmut Eller <eller.helmut@gmail.com> wrote:

> On Thu, Aug 26 2021, Eric Abrahamsen wrote:

>
>> Whoo, I've been trying to get enough of a handle on the parsing actions
>> to write a documentation patch for them -- now I'm seeing what Helmut
>> meant by "semantically unintuitive".
>
> What I actually meant with "semantically unintuitive" are issues
> described in Roman Redziejowski's "Trying to understand PEG" paper[*].
> He writes:
>
>    The problem with limited backtracking is that by not trying hard it
>    may miss some inputs that it should accept.  A notorious example is
>    the rule A = aAa | aa that defines the set of strings of a’s of even
>    length. Implemented with limited backtracking, this rule accepts only
>    strings of length 2^n.

When I started to write PEGs intensively, I thought the limited
backtracking would be a problem.  It's not.  In fact, I find the
regexp-style backtracking great, but only for “quick and dirty” things
(e.g., those throw-away little programs one writes for grep or isearch).
But if are trying to write a more complex parser, aggressive
backtracking actually gets in the way.

The example above is kind of silly.  You can parse an even number of a's
with the rule A = aaA | ε.  This is still kind of bad, because (unless
peg.el is way fancier than I'm imagining), it consumes the call stack.
LPEG has a kind of “tail call optimization” that allows you to do this.

Obviously, the sane way to parse an even number of a's is the rule
(aa)*, aka (* "aa").  But there are many justifiable use-cases for the
tail call optimization.  For instance, given a pattern P, produce a new
pattern that looks ahead for the first match of P.  This would be
P | .P, or

    (or P (and (any) P))

in peg.el notation.  Is there a simple an efficient way to do this in
peg.el, that allows to skip over thousands of characters without a new
call stack entry for each one of them?



reply via email to

[Prev in Thread] Current Thread [Next in Thread]