help-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Parsing a language with optional spaces


From: Akim Demaille
Subject: Re: Parsing a language with optional spaces
Date: Mon, 13 Jul 2020 07:56:57 +0200


> Le 12 juil. 2020 à 19:47, Christian Schoenebeck <schoenebeck@crudebyte.com> a 
> écrit :
> 
>> Additionally,the programmer must fully understand how
>> and when the parser handles tokens, otherwise subtle bugsmay be introduced.
> 
> I don't see what they exactly had in mind here for that to claim.

Bison is "lazy" wrt fetching tokens, in the sense that if it can decide
what the next action is without looking at the lookahead, it won't fetch
one.  Besides, when it can, it merges "groups of similar reductions" under
the label of "default reduction".

So on situations that are similar, the parser may have fetched the lookahead
before the reduction(s), or after.  Hence if you want tight coupling bw
the scanner and the parser, you need to be aware of this and know when
this will happen.

That's what lr.default-reduction controls.
(https://www.gnu.org/software/bison/manual/html_node/Default-Reductions.html)


> Note that the latter example would of course [require to] work differently 
> than the common greedy and context unaware scanner:

Yes, indeed.  That's something which Joel detailed in his PhD.  Note
though that PSLR targets deterministic parsing.  Bison would (first)
target that, not GLR.  But contributions are always welcome :)


> Besides being more easy for the developer to understand and write, and the 
> language definition being easier to read and maintain, it would also have 
> other benefits:
> 
> The scanner steps could be optimized as well: If scanners are unaware (like 
> they are today) of any grammatical context, they must assume that any pattern 
> may match at any time. Hence the lexical automat typically runs much longer, 
> on a much more complex state machine than it needs to be, as a context 
> unaware 
> scanner always must assume that any of the globally defined tokens might 
> match, even though in a specific grammatical context only e.g. 1 or 2 tokens 
> may actually be possible being matched.

You might be very right here, but I don't remember well enough of the
details.

> Akim, question regarding Bison's GLR parser ...
> https://www.gnu.org/software/bison/manual/html_node/Simple-GLR-Parsers.html :
> 
>> In general, a GLR parser can take quadratic or cubic worst-case time, and
>> the current Bison parser even takes exponential time and space for some
>> grammars. In practice, this rarely happens, and for many grammars it is
>> possible to prove that it cannot happen.
> 
> ... why isn't it O(n^3)? Or more specifically, which common GLR fork 
> optimization(s) are yet missing?

This is fuzzy in my memory :(  It might refer to what follows.

Parsers such as SGLR have stacks which are graphs, while Bison's is
always a tree.  All GLR parsers manage to share the past of concurrent
stacks, they don't have a set of separate stacks.  Bison's GLR stacks,
of course, share their common past, it's stored only once.  I believe
SGLR is also able to share the common "future", so their stacks are
no longer trees.  This allows SGLR to keep decent performances when parsing
ambiguous grammars and you want to fetch the resulting parse forest.


> And BTW:
> 
>> The GLR parsers require a compiler for ISO C89 or later.
> 
> Tough requirement! ;-)

Yes, we could get rid of that mention, indeed.  Besides, Valentin recently
pointed me to one very discreet place where glr.c actually had a declaration
after a statement, so it was actually C99.

I'll clean that up, thanks!


reply via email to

[Prev in Thread] Current Thread [Next in Thread]