help-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Non-greedy wildcard possible? (Long)


From: Hans Aberg
Subject: Re: Non-greedy wildcard possible? (Long)
Date: Thu, 20 May 2004 23:44:32 +0200

At 18:40 +0200 2004/05/20, Magnus Lie Hetland wrote:
>> Why is your LL(1) parser slow?
>
>Because my implementation isn't really performance-oriented, I guess
>(and, partly, because it's implemented in Python). And because I
>ended up with lots of separate regexps to search for, among other
>things. So my original plan for optimization was to use Flex for the
>tokenization, and to build my LL(1) parser on top of that. Still a
>viable option, I think.
>
>> Is it because it is written in Python, or because it is
>> non-deterministic?
>
>It's completely deterministic, actually.

A parser written in a non-deterministic style may be slow, even though the
actual parsing is deterministic: The Mini-Prolog that comes with Hugs
http://haskell.org/hugs is slow, even when translated into C++. When I
altered the parser to use Flex/Bison, it suddenly became lightening fast.
So it seems that the slowness was entirely due to the parser written in a
non-deterministic style, even though the parsing in both cases where
deterministic.

>> And what problems appear when you translate it into a LALR(1)
>> grammar?
>
>The problem is that my parser is really not straight LL(1), but
>"LL(1)-like" in that it always uses the first occurrence of a legal
>lookahed token in the text. This is quite easy when I have full
>control over parser and lexer (and especially easy with the LL(1)
>parsing strategy). I suppose using LR(1) might work as well if I can
>make it work in this somewhat non-standard way.

So then you must tweak your LL(1) parser in order to filter out legal
tokens, because otherwise you would only need to feed the Bison parser with
the same legal tokens. What exactly kind of tweaks that you use in the
LL(1) setup do not work in the LALR(1) setup? Specifically, if I go back to
your first post:
>The simple (but possibly messy, implementation-wise) solution would be
>to analyse the grammar and create specific wildcard rules for each
>plain text segment, allowing only the tokens that can't start any
>legal rules. This would be sort of an LL(1) way if doing it, and it
>would require some possibly quite hairy grammar analysis
>(especially with empty rules).

You clearly can't create wildcard rules with Bison, but you say here you
have found an LL(1) way to do it. What kind of tweak would you need in the
LARL(1) parser?

>It seems that making
>Flex/Bison do this isn't altogether trivial (as has been demonstrated
>repeatedly in these discussions).

Because such tweaks are very difficult to do with LR parsers, and probably
with other more generator parsing algorithms, I tend to think that your
setup should really be
  ATOX language description
    --> semantic grammar (i.e., grammar plus action translation descriptions)
    --> parser (via parser generator)
That is, your intuitive ATOX matching rules should be made more alike
grammar rules with actions via some more rigid mathematical analysis (which
is probably a difficult problem). In doing this, the original intuitive
ATOX formats may have to be changed when they become "grammarized".

  Hans Aberg






reply via email to

[Prev in Thread] Current Thread [Next in Thread]