[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Parsing a language with optional spaces
From: |
Christian Schoenebeck |
Subject: |
Re: Parsing a language with optional spaces |
Date: |
Sun, 12 Jul 2020 19:47:41 +0200 |
On Donnerstag, 9. Juli 2020 06:51:38 CEST Akim Demaille wrote:
> > Le 8 juil. 2020 à 22:14, Akim Demaille <akim@lrde.epita.fr> a écrit :
> >
> > That reminds me of a paper I read long ago, someone proposing
> > "heisentokens": having the scanner return multiple tokens concurrently,
> > for the different scanning options, branched into GLR to follow the
> > various options and let dead ends die (that would be a second time?).
>
> I couldn't find references to that for a good reason: it was referring
> to Schrödinger, not Heisenberg.
>
> https://webhome.cs.uvic.ca/~nigelh/Publications/Schrodinger.pdf
Yeah, almost what I had in mind. But where I disagree, quote:
> Lexical feedback
>
> The scanner can be made aware of context if the parser and scanner share
> some state. The parseruses this shared state to communicate context
> information to the scanner; this is referred to as lexicalfeedback
> [6,7].
>
> Lexical feedback has a number of problems in practice. It couples the
> scanner and parser tightly:not only do they share state, but the parser and
> scanner must operate in lock-step. The scanner cannot,for example, tokenize
> the entire input in a tight loop, or operate in a separate thread of
> execution,because at any moment the parser may direct it to change how it
> interprets the input.
Running scanner and parser in different threads is IMO quite exotic. And
multi-threading could still be handled on GLR parser side for the individual
branches that fork and pop up, if that multi-threading optimization would
really be necessary for an application.
> Additionally,the programmer must fully understand how
> and when the parser handles tokens, otherwise subtle bugsmay be introduced.
I don't see what they exactly had in mind here for that to claim. I would even
claim the opposite; their suggestion involves a much more developer-aware
approach than Lexical feedback (bidirectional communication between parser and
scanner) would require.
In their example, and using their suggested approach, a developer would need
to be aware that there is an ambiguity in the language for detecting the
character sequence 'if' as potentially two different tokens (IF or ID), so
(s)he would write:
if return schrodinger(IF, ID);
That's error prone. In this particular example it might be obvious, in other
scanarios it certainly isn't.
If both lexical patterns, and grammar rules are defined and handled together
'married' on parser side (like I actually had it in mind), the parser could
automatically take care of these ambiguities on behalf of the developer in a
very simple and intuitive way:
%token IF
%token THEN
%token ID
%glr-parser
/* token ID -> lexical RegEx patterns */
IF: if
THEN: then
ID: [a-zA-Z][a-zA-Z0-9_]*
/* grammar rules */
stmt: ifstmt | asgnstmt
ifstmt: IF expr THEN stmt
asgnstmt: ID '=' expr
expr: ID = ID
I find that pretty much developer friendly.
Note that the latter example would of course [require to] work differently
than the common greedy and context unaware scanner:
As a 'married' GLR parser would now have knowledge of both the vocabulary
(tokens and their raw character patterns) and the grammar rules, it could
handle the ambiguity between the 2 tokens ID and IF as 'Schrödinger' tokens
automatically for the developer: it would activate only those tokens/patterns
which would be able to produce the next possible grammar rules, and if a
context would allow both tokens, it would automatically process the 2 possible
tokens as that particular set of 'Schrödinger' tokens and eventually fork the
GLR state accordingly.
Besides being more easy for the developer to understand and write, and the
language definition being easier to read and maintain, it would also have
other benefits:
The scanner steps could be optimized as well: If scanners are unaware (like
they are today) of any grammatical context, they must assume that any pattern
may match at any time. Hence the lexical automat typically runs much longer,
on a much more complex state machine than it needs to be, as a context unaware
scanner always must assume that any of the globally defined tokens might
match, even though in a specific grammatical context only e.g. 1 or 2 tokens
may actually be possible being matched.
Akim, question regarding Bison's GLR parser ...
https://www.gnu.org/software/bison/manual/html_node/Simple-GLR-Parsers.html :
> In general, a GLR parser can take quadratic or cubic worst-case time, and
> the current Bison parser even takes exponential time and space for some
> grammars. In practice, this rarely happens, and for many grammars it is
> possible to prove that it cannot happen.
... why isn't it O(n^3)? Or more specifically, which common GLR fork
optimization(s) are yet missing?
And BTW:
> The GLR parsers require a compiler for ISO C89 or later.
Tough requirement! ;-)
Best regards,
Christian Schoenebeck
- Re: Parsing a language with optional spaces, (continued)
- Re: Parsing a language with optional spaces, John P. Hartmann, 2020/07/07
- Re: Parsing a language with optional spaces, uxio prego, 2020/07/07
- Re: Parsing a language with optional spaces, Akim Demaille, 2020/07/08
- Re: Parsing a language with optional spaces, Christian Schoenebeck, 2020/07/08
- Re: Parsing a language with optional spaces, Akim Demaille, 2020/07/08
- Re: Parsing a language with optional spaces, Christian Schoenebeck, 2020/07/08
- Re: Parsing a language with optional spaces, Daniele Nicolodi, 2020/07/08
- Re: Parsing a language with optional spaces, Akim Demaille, 2020/07/09
- Re: Parsing a language with optional spaces, Akim Demaille, 2020/07/09
- Re: Parsing a language with optional spaces, uxio prego, 2020/07/11
- Re: Parsing a language with optional spaces,
Christian Schoenebeck <=
- Re: Parsing a language with optional spaces, Akim Demaille, 2020/07/13
- Re: Parsing a language with optional spaces, Akim Demaille, 2020/07/14
- Re: Parsing a language with optional spaces, Christian Schoenebeck, 2020/07/14
- Re: Parsing a language with optional spaces, Akim Demaille, 2020/07/18
- Re: Parsing a language with optional spaces, Christian Schoenebeck, 2020/07/14
- Re: Parsing a language with optional spaces, Akim Demaille, 2020/07/18
- Re: Parsing a language with optional spaces, Christian Schoenebeck, 2020/07/20
- Re: Parsing a language with optional spaces, Akim Demaille, 2020/07/26
- Re: Parsing a language with optional spaces, John P. Hartmann, 2020/07/26
- Re: Which lexer do people use?, uxio prego, 2020/07/06