help-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Non-greedy wildcard possible? (Long)


From: Magnus Lie Hetland
Subject: Re: Non-greedy wildcard possible? (Long)
Date: Fri, 21 May 2004 00:55:51 +0200
User-agent: Mutt/1.4i

Hans Aberg <address@hidden>:
>
[snip]
> >It's completely deterministic, actually.
> 
> A parser written in a non-deterministic style may be slow, even
> though the actual parsing is deterministic:

I'm sure -- but it's written in a deterministic, O(n) style. I only
use backtracking when necessary, in explicitly specified portions of
the grammar. That's not the issue. (Unless the non-deterministic style
you're talking about is something else?)

[snip]
> So then you must tweak your LL(1) parser in order to filter out legal
> tokens, because otherwise you would only need to feed the Bison parser with
> the same legal tokens. What exactly kind of tweaks that you use in the
> LL(1) setup do not work in the LALR(1) setup?

As long as I only use a lookahead of 1, there should be no problem.
Then the next token will always be the first occurrence of a legal
token in the input. So, as I said, as long as I can get an LR(1) (or
LALR(1)) parser to behave this way (either by implementing it myself
or by somehow tweaking Bison/Flex) there shouldn't be a problem.

> Specifically, if I go back to your first post:
> >The simple (but possibly messy, implementation-wise) solution would be
> >to analyse the grammar and create specific wildcard rules for each
> >plain text segment, allowing only the tokens that can't start any
> >legal rules. This would be sort of an LL(1) way if doing it, and it
> >would require some possibly quite hairy grammar analysis
> >(especially with empty rules).

Yeah -- this is if I want to create a CFG which includes the
wildcards, and use a standard parser out of the box.

> You clearly can't create wildcard rules with Bison,

Bison would work as well as any other parser if I managed to create a
standard CFG after adding the wildcards (as described above). If I
disallow empty rules and insist that each production is terminated by
a token/terminal, that shouldn't be a problem. Otherwise, it probably
wouldn't be too easy...

> but you say here you have found an LL(1) way to do it. What kind of
> tweak would you need in the LARL(1) parser?

(1) Only ask for one token of lookahead before deciding what to do (so
you don't risk asking for a token that "belongs" to a different
context/production) and (2) make the lexer return the first occurrence
of a legal token (as defined by the lookahead set).

I thought this would be easy with Flex/Bison, but it seems Bison
somehow asks for more than one lookahead token, which would foul it
up. (This is the "easy" solution -- the "hard" version of the problem
would be to have a full CFG-specified terminator expression, rather
than a single token. I've abandoned that, though.)

[snip]
> Because such tweaks are very difficult to do with LR parsers, and probably
> with other more generator parsing algorithms, I tend to think that your
> setup should really be
>   ATOX language description
>     --> semantic grammar (i.e., grammar plus action translation descriptions)
>     --> parser (via parser generator)
> That is, your intuitive ATOX matching rules should be made more alike
> grammar rules with actions via some more rigid mathematical analysis (which
> is probably a difficult problem).

Yeah, that's the approach I describe above, i.e. turning the
"wildcard" portions into CFG fragments -- basically

  some_wildcard: TOKEN1 | TOKEN2 | TOKEN3 ...

I.e. all the tokens that can't terminate the wildcard. If the wildcard
can end a rule or can be terminated by several possible productions,
or if there are empty productions, this isn't trivial, and the
transformed grammar will be too strict. I guess I could restrict the
format language in order to make this translation easier.

> In doing this, the original intuitive ATOX formats may have to be
> changed when they become "grammarized".

Yup.

>   Hans Aberg

-- 
Magnus Lie Hetland              "Wake up!"  - Rage Against The Machine
http://hetland.org              "Shut up!"  - Linkin Park




reply via email to

[Prev in Thread] Current Thread [Next in Thread]