Re: GSOC PEG project

From: Michael Lucy
Subject: Re: GSOC PEG project
Date: Sun, 11 Jul 2010 02:48:39 -0500

On Thu, Jul 8, 2010 at 11:21 AM, Andy Wingo <address@hidden> wrote:
> Hi Michael,
> On Tue 06 Jul 2010 00:59, Michael Lucy <address@hidden> writes:
>> (use-modules (ice-9 peg))
>> (peg-find "'b'+" "aabbcc")
>> --> (2 4 "bb")
> Humm, another thing to think about: (ice-9 regex) returns "match
> structures", which are really just vectors; have a look at them, and if
> it makes sense to mimic that interface, re-exporting those accessors
> somehow, please do.

So, there are a few places where the interfaces don't quite match up:

1. match:substring
Problem: It's perfectly legal to pass peg-match a parsing nonterminal
and have it give you a parse tree rather than a substring.
Potential solutions:
1.a. Just have match:substring return either the substring or the
parse tree.  The problem with this is that it may violate expectations.
1.b. Have match:substring collapse the parse tree into a string, and
have another function match:parse-tree that would return the parse
tree.  The problem with this is that the parse tree might have
discarded text, which would once again violate expectations.

2. match:count
Problem: the notion of a "parenthesized sub-expression" doesn't really
map cleanly onto PEGs.  This information isn't tracked while parsing
and wouldn't be very meaningful.
Potential solutions:
2.a. Ignore it (not that bad a solution).
2.b. Track that information.  I'd rather not do this because it would
slow down the parser and wouldn't be very useful.  Which brings us to:

3. submatch numbers
Problem: there's no notion of a "submatch" right now.  People should
be getting this information by building a parsing nonterminal and then
traversing the resulting parse-tree.  I'd rather not wire in a whole
separate system just to provide an alternative way of getting
information about what parts of an expression matched what (it would
also slow down parsing).
Potential solutions:
3.a. Ignore it (would violate expectations in a big way).
3.b. Wire it in (I'd really rather not do this).
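
For reference, here is a small sketch of the three (ice-9 regex)
features discussed above, using only accessors that module already
exports.  The pattern and input string are made up for illustration
(the input mirrors the peg-find example quoted earlier):

```scheme
(use-modules (ice-9 regex))

;; Two parenthesized sub-expressions, matched against "aabbcc".
(define m (string-match "(b+)(c+)" "aabbcc"))

(match:substring m)    ; whole match            --> "bbcc"
(match:substring m 1)  ; submatch number 1      --> "bb"
(match:substring m 2)  ; submatch number 2      --> "cc"
(match:start m)        ; start of whole match   --> 2
(match:end m)          ; end of whole match     --> 6
(match:count m)        ; whole match + 2 groups --> 3
```

The submatch-number argument and match:count both depend on
parenthesized groups, which is the part with no direct PEG analogue.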

So, there would be some gaps if I shimmied the match structure
interface onto PEGs.

The problem is that, while it would be useful to have a consistent
interface for matching both regexps and PEGs, they're different things
and naming the accessor functions the same things might lead people to
assume things that aren't true.

So, three potential paths from here:
1. Mimic the match structure interface as much as possible.
2. Have a similar but differently-named "peg-match structure"
interface that behaves mostly the same but has a few different
functions (I think naming them something slightly different would lead
to fewer people assuming they worked exactly the same as match
structures).
3. Just have a different interface.

I'm leaning toward (2); what do other people think?  I'd probably:
1. Not have a peg-match:count function at all.
2. Not have the functions take submatch numbers.
3. Have peg-match:substring return the actual substring.
4. Have another function peg-match:parse-tree that returns the parse tree.
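
A minimal sketch of what such a "peg-match structure" could look like,
assuming an SRFI-9 record.  Only peg-match:substring and
peg-match:parse-tree are names from the list above; the record layout,
the make-peg-match constructor, the other accessors, and the sample
parse tree are hypothetical and are not the actual (ice-9 peg)
implementation:

```scheme
(use-modules (srfi srfi-9))

;; Hypothetical record: the input string, the matched range, and the
;; parse tree produced by the parser.
(define-record-type <peg-match>
  (make-peg-match input start end parse-tree)
  peg-match?
  (input      peg-match:input)
  (start      peg-match:start)
  (end        peg-match:end)
  (parse-tree peg-match:parse-tree))

;; Always returns the text actually matched, even if the parse tree
;; discarded some of it.  Takes no submatch number.
(define (peg-match:substring m)
  (substring (peg-match:input m)
             (peg-match:start m)
             (peg-match:end m)))

;; Made-up example value; a real one would come from the parser.
(define example (make-peg-match "aabbcc" 2 4 '(bs "bb")))

(peg-match:substring example)   ; --> "bb"
(peg-match:parse-tree example)  ; --> (bs "bb")
```

Keeping the substring and the parse tree behind separate accessors
means neither one has to stand in for the other.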

>> And when I use it with --no-autocompile I don't get any errors:
>> What does this mean?
> Probably some eval-when tomfoolery. See "Eval When" in the manual.
> Cheers,
> Andy
> --
