[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [RFA] GLR parsing
From: |
Akim Demaille |
Subject: |
Re: [RFA] GLR parsing |
Date: |
27 Jun 2002 13:33:49 +0200 |
User-agent: |
Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Honest Recruiter) |
| I finally received the necessary release from the University of
| California, and have started checking in changes for an experimental
| GLR parsing feature in Bison.
Yahoo!
| So far, I have added data/bison.glr and
Arg. How about glr.c instead? Or even glalr1.c? The other skeletons
will be renamed too, the current naming scheme is not satisfying.
| tests/cxx-type.at, which (being additions) have no effect on existing
| stuff.
That's great!
| I have appended a mongo patch for connecting things in, on
| which (by some access of caution) I would appreciate review before
| checkin.
I'll try myself on this exercise :)
All the documentation stuff is great.
| +This models a problematic part of the C++ grammar---the ambiguity between
| +certain declarations and statements. For example,
| address@hidden
| +T (x) = y+z;
| address@hidden example
You want @noindent here.
| +parses as either an @code{expr} or a @code{stmt}
| +(assuming that @samp{T} is recognized as a TYPENAME and @samp{x} as an ID).
| +Bison detects this as a reduce/reduce conflict between the rules
| address@hidden : ID} and @code{declarator : ID}, which it cannot resolve at
the
| +time it encounters @code{x} in the example above. The two @code{%dprec}
| +declarations, however, give precedence to interpreting the example as a
| address@hidden, which implies that @code{x} is a declarator. The parser
therefore
| +prints
Please, try to be within 72 columns.
| +Suppose that instead of resolving the ambiguity, you wanted to see all
| +the possibilities. For this purpose, we must @dfn{merge} the semantic
| +actions of the two possible parsers, rather than choosing one over the other.
| +To do so, you could change the declaration of @code{stmt} as follows:
| address@hidden
| +stmt : expr ';' %merge <stmtMerge>
| + | decl %merge <stmtMerge>
| + ;
| address@hidden example
@noindent
| +and define the @code{stmtMerge} function as:
| address@hidden
| +static YYSTYPE stmtMerge (YYSTYPE x0, YYSTYPE x1)
| address@hidden
| + printf ("<OR> ");
| + return "";
| address@hidden
| address@hidden example
@noindent
| +with an accompanying forward declaration
| +in the C declarations at the beginning of the file:
| address@hidden
| address@hidden
| + #define YYSTYPE const char*
| + static YYSTYPE stmtMerge (YYSTYPE x0, YYSTYPE x1);
| address@hidden
| address@hidden example
| +With these declarations, the resulting parser will parse the first example
| +as both an @code{expr} and a @code{decl}, and print
| address@hidden
| +"x" y z + T <init-declare> x T <cast> y z + = <OR>
| address@hidden example
| @node Locations Overview
| @section Locations
| @cindex location
| @@ -2913,7 +3091,7 @@ the location of the grouping (the result
| is an array holding locations of all right hand side elements of the rule
| being matched. The last one is the size of the right hand side rule.
|
| -By default, it is defined this way:
| +By default, it is defined this way for simple LALR(1) parsers:
|
| @example
| @group
| @@ -2925,6 +3103,19 @@ By default, it is defined this way:
| @end group
| @end example
|
| address@hidden
| +and like this for GLR parsers:
| +
| address@hidden
| address@hidden
| +#define YYLLOC_DEFAULT(Current, Rhs, N) \
| + Current.first_line = YYRHSLOC(Rhs,1).first_line; \
| + Current.first_column = YYRHSLOC(Rhs,1).first_column; \
| + Current.last_line = YYRHSLOC(Rhs,N).last_line; \
| + Current.last_column = YYRHSLOC(Rhs,N).last_column;
| address@hidden group
| address@hidden example
We should to unify this. We still have some freedom on locations, and
we probably want to escape from macros, and use some new %directives
instead.
| +It is possible to use a data structure for the GLR parsing tree that
| +permits the processing of any LALR(1) grammar in linear time (in the
| +size of the input), any unambiguous (not necessarily LALR(1)) grammar in
| +quadratic worst-case time, and any general (possibly ambiguous)
| +context-free grammar in cubic worst-case time. However, Bison currently
| +uses a simpler data structure that requires time proportional to the
| +length of the input times the maximum number of stacks required for any
| +prefix of the input. Thus, really ambiguous or non-deterministic
| +grammars can require exponential time and space to process. Such badly
| +behaving examples, however, are not generally of practical interest.
| +Usually, non-determinism in a grammar is local---the parser is ``in
| +doubt'' only for a few tokens at a time. Therefore, the current data
| +structure should generally be adequate. On LALR(1) portions of a
| +grammar, in particular, it is only slightly slower than with the default
| +Bison parser.
Hm, I thought you were using one such structure. I guess you're
saying the stacks are not maximally shared, right?
| +/*--------------------------------------.
| +| Free all merge-function definitions. |
| +`--------------------------------------*/
| +
| +void
| +free_merger_functions (void)
| +{
| + merger_list *L0;
| + if (! glr_parser)
| + return;
| + L0 = merge_functions;
| + while (L0 != NULL)
| + {
| + merger_list *L1 = L0->next;
| + XFREE (L0);
I think you mean free here, XFREE just checks if L0 != NULL before
freeing. I know there remains many XFREE where free is meant, but
that's because the code cleaning is not finished.
| + L0 = L1;
| + }
| +}
| +
| @@ -477,6 +557,7 @@ save_row (int state)
| short *sp;
| short *sp1;
| short *sp2;
| + unsigned int *sp3;
|
| count = 0;
| for (i = 0; i < ntokens; i++)
| @@ -488,12 +569,18 @@ save_row (int state)
|
| froms[state] = sp1 = sp = XCALLOC (short, count);
| tos[state] = sp2 = XCALLOC (short, count);
| + if (glr_parser)
| + conflict_tos[state] = sp3 = XCALLOC (unsigned int, count);
| + else
| + conflict_tos[state] = NULL;
Hm, don't we want something more ``typed''? These are state_number_t,
right? Even if it's only typedef'd to unsigned int, reading
state_number_t makes it much easier to follow.
| +/*--------------------------------------.
| +| Output the merge functions to OOUT. |
| +`--------------------------------------*/
s/OOUT/OUT/.
| +
| +void
| +merger_output (FILE *out)
| +{
| + int n;
| + merger_list* p;
| +
| + fputs ("m4_define([b4_mergers], \n[[", out);
| + for (n = 1, p = merge_functions; p != NULL; n += 1, p = p->next)
| + {
| + if (p->type[0] == '\0')
| + fprintf (out, " case %d: yyval = %s (*yy0, *yy1); break;\n",
| + n, p->name);
| + else
| + fprintf (out, " case %d: yyval.%s = %s (*yy0, *yy1); break;\n",
| + n, p->type, p->name);
| + }
| + fputs ("]])\n\n", out);
| +}
Maybe we can move this out from bison, and have M4 do?
| static void
| +output_conflicts (void)
| +{
| + /* GLR parsing slightly modifies yytable and yycheck
| + (and thus yypact) so that in states with unresolved conflicts,
| + the default reduction is not used in the conflicted entries, so
| + that there is a place to put a conflict pointer. This means that
| + yyconflp and yyconfl are nonsense for a non-GLR parser, so we
| + avoid accidents by not writing them out in that case. */
| + if (! glr_parser)
| + return;
| +
| + muscle_insert_unsigned_int_table ("conflp", conflict_table,
| + conflict_table[0], 1, high+1);
| + muscle_insert_unsigned_int_table ("confl", conflict_list,
| + conflict_list[0], 1, conflict_list_cnt);
| +
This is something else that ought to change in CVS Bison: there is no
reason to use obscure muscle names. `conflp' and `confl' are really
cryptic. I agree things like `stos' are very cryptic too, but they
will be changed to be more readable. New muscles should not continue
on the original names.
| Index: bison-1_5.26/src/conflicts.h
| --- bison-1_5.26/src/conflicts.h Sun, 26 May 2002 14:18:11 -0700 hilfingr
(glrbison/b/29_conflicts. 1.1.1.2 644)
| +++ 2.64(w)/src/conflicts.h Sun, 26 May 2002 15:13:10 -0700 hilfingr
(glrbison/b/29_conflicts. 1.1.1.3 644)
| @@ -24,6 +24,7 @@
|
| void conflicts_solve PARAMS ((void));
| void conflicts_print PARAMS ((void));
| +int count_total_conflicts PARAMS ((void));
I tried to stick within pseudo namespaces, i.e.,
conflicts_total_count, or conflicts_count_total if you prefer.
| void conflicts_output PARAMS ((FILE *out));
| void conflicts_free PARAMS ((void));
|
| Index: bison-1_5.26/NEWS
| --- bison-1_5.26/NEWS Tue, 18 Jun 2002 17:04:55 -0700 hilfingr
(glrbison/c/23_NEWS 1.1.1.3.1.1.1.1.1.1.1.1 644)
| +++ 2.64(w)/NEWS Tue, 18 Jun 2002 17:05:53 -0700 hilfingr (glrbison/c/23_NEWS
1.1.1.3.1.1.1.1.1.1.1.2 644)
| @@ -3,6 +3,14 @@ Bison News
|
| Changes in version 1.49b:
|
| +* GLR parsing
| + The declaration
| + %glr-parser
| + causes Bison to produce a Generalized LR (GLR) parser, capable of handling
| + almost any context-free grammar, ambiguous or not. The new declarations
| + %dprec and %merge on grammar rules allow parse-time resolution of
| + ambiguities.
Man, this deserves a `contributed by Paul Hilfinger'!
It looks great!
- [RFA] GLR parsing, Paul Hilfinger, 2002/06/26
- Re: [RFA] GLR parsing,
Akim Demaille <=