[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Non-greedy wildcard possible? (Long)
From: |
Magnus Lie Hetland |
Subject: |
Re: Non-greedy wildcard possible? (Long) |
Date: |
Fri, 21 May 2004 00:55:51 +0200 |
User-agent: |
Mutt/1.4i |
Hans Aberg <address@hidden>:
>
[snip]
> >It's completely deterministic, actually.
>
> A parser written in a non-deterministic style may be slow, even
> though the actual parsing is deterministic:
I'm sure -- but it's written in a deterministic, O(n) style. I only
use backtracking when necessary, in explicitly specified portions of
the grammar. That's not the issue. (Unless the non-deterministic style
you're talking about is something else?)
[snip]
> So then you must tweak your LL(1) parser in order to filter out legal
> tokens, because otherwise you would only need to feed the Bison parser with
> the same legal tokens. What exactly kind of tweaks that you use in the
> LL(1) setup do not work in the LALR(1) setup?
As long as I only use a lookahead of 1, there should be no problem.
Then the next token will always be the first occurrence of a legal
token in the input. So, as I said, as long as I can get an LR(1) (or
LALR(1)) parser to behave this way (either by implementing it myself
or by somehow tweaking Bison/Flex) there shouldn't be a problem.
> Specifically, if I go back to your first post:
> >The simple (but possibly messy, implementation-wise) solution would be
> >to analyse the grammar and create specific wildcard rules for each
> >plain text segment, allowing only the tokens that can't start any
> >legal rules. This would be sort of an LL(1) way if doing it, and it
> >would require some possibly quite hairy grammar analysis
> >(especially with empty rules).
Yeah -- this is if I want to create a CFG which includes the
wildcards, and use a standard parser out of the box.
> You clearly can't create wildcard rules with Bison,
Bison would work as well as any other parser if I managed to create a
standard CFG after adding the wildcards (as described above). If I
disallow empty rules and insist that each production is terminated by
a token/terminal, that shouldn't be a problem. Otherwise, it probably
wouldn't be too easy...
> but you say here you have found an LL(1) way to do it. What kind of
> tweak would you need in the LARL(1) parser?
(1) Only ask for one token of lookahead before deciding what to do (so
you don't risk asking for a token that "belongs" to a different
context/production) and (2) make the lexer return the first occurrence
of a legal token (as defined by the lookahead set).
I thought this would be easy with Flex/Bison, but it seems Bison
somehow asks for more than one lookahead token, which would foul it
up. (This is the "easy" solution -- the "hard" version of the problem
would be to have a full CFG-specified terminator expression, rather
than a single token. I've abandoned that, though.)
[snip]
> Because such tweaks are very difficult to do with LR parsers, and probably
> with other more generator parsing algorithms, I tend to think that your
> setup should really be
> ATOX language description
> --> semantic grammar (i.e., grammar plus action translation descriptions)
> --> parser (via parser generator)
> That is, your intuitive ATOX matching rules should be made more alike
> grammar rules with actions via some more rigid mathematical analysis (which
> is probably a difficult problem).
Yeah, that's the approach I describe above, i.e. turning the
"wildcard" portions into CFG fragments -- basically
some_wildcard: TOKEN1 | TOKEN2 | TOKEN3 ...
I.e. all the tokens that can't terminate the wildcard. If the wildcard
can end a rule or can be terminated by several possible productions,
or if there are empty productions, this isn't trivial, and the
transformed grammar will be too strict. I guess I could restrict the
format language in order to make this translation easier.
> In doing this, the original intuitive ATOX formats may have to be
> changed when they become "grammarized".
Yup.
> Hans Aberg
--
Magnus Lie Hetland "Wake up!" - Rage Against The Machine
http://hetland.org "Shut up!" - Linkin Park
- Re: Non-greedy wildcard possible? (Long), (continued)
- Re: Non-greedy wildcard possible? (Long), Magnus Lie Hetland, 2004/05/16
- Re: Non-greedy wildcard possible? (Long), Hans Aberg, 2004/05/17
- Re: Non-greedy wildcard possible? (Long), Magnus Lie Hetland, 2004/05/17
- Re: Non-greedy wildcard possible? (Long), Hans Aberg, 2004/05/18
- Re: Non-greedy wildcard possible? (Long), Magnus Lie Hetland, 2004/05/18
- Re: Non-greedy wildcard possible? (Long), Hans Aberg, 2004/05/19
- Re: Non-greedy wildcard possible? (Long), Magnus Lie Hetland, 2004/05/19
- Re: Non-greedy wildcard possible? (Long), Hans Aberg, 2004/05/20
- Re: Non-greedy wildcard possible? (Long), Magnus Lie Hetland, 2004/05/20
- Re: Non-greedy wildcard possible? (Long), Hans Aberg, 2004/05/20
- Re: Non-greedy wildcard possible? (Long),
Magnus Lie Hetland <=
- Re: Non-greedy wildcard possible? (Long), Hans Aberg, 2004/05/23
- Re: Non-greedy wildcard possible? (Long), Frank Heckenbach, 2004/05/27
- Re: Non-greedy wildcard possible? (Long), Hans Aberg, 2004/05/20
- Re: Non-greedy wildcard possible? (Long), Frank Heckenbach, 2004/05/28
- Re: Non-greedy wildcard possible? (Long), Frank Heckenbach, 2004/05/26
- Re: Non-greedy wildcard possible? (Long), Hans Aberg, 2004/05/19
- Re: Non-greedy wildcard possible? (Long), Magnus Lie Hetland, 2004/05/19
- Re: Non-greedy wildcard possible? (Long), Hans Aberg, 2004/05/20
- Re: Non-greedy wildcard possible? (Long), Magnus Lie Hetland, 2004/05/20
- Re: Non-greedy wildcard possible? (Long), Hans Aberg, 2004/05/21