Re: [RFA] GLR parsing

bison-patches
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [RFA] GLR parsing

From:	Akim Demaille
Subject:	Re: [RFA] GLR parsing
Date:	27 Jun 2002 13:33:49 +0200
User-agent:	Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Honest Recruiter)
| I finally received the necessary release from the University of
| California, and have started checking in changes for an experimental
| GLR parsing feature in Bison. 

Yahoo!

| So far, I have added data/bison.glr and

Arg.  How about glr.c instead?  Or even glalr1.c?  The other skeletons
will be renamed too, the current naming scheme is not satisfying.

| tests/cxx-type.at, which (being additions) have no effect on existing
| stuff.

That's great!

| I have appended a mongo patch for connecting things in, on
| which (by some access of caution) I would appreciate review before
| checkin.

I'll try myself on this exercise :)
 

All the documentation stuff is great.


| +This models a problematic part of the C++ grammar---the ambiguity between
| +certain declarations and statements.  For example,
| address@hidden
| +T (x) = y+z;
| address@hidden example

You want @noindent here.

| +parses as either an @code{expr} or a @code{stmt}
| +(assuming that @samp{T} is recognized as a TYPENAME and @samp{x} as an ID).
| +Bison detects this as a reduce/reduce conflict between the rules
| address@hidden : ID} and @code{declarator : ID}, which it cannot resolve at 
the 
| +time it encounters @code{x} in the example above.  The two @code{%dprec} 
| +declarations, however, give precedence to interpreting the example as a 
| address@hidden, which implies that @code{x} is a declarator.  The parser 
therefore
| +prints

Please, try to be within 72 columns.


| +Suppose that instead of resolving the ambiguity, you wanted to see all
| +the possibilities.  For this purpose, we must @dfn{merge} the semantic 
| +actions of the two possible parsers, rather than choosing one over the other.
| +To do so, you could change the declaration of @code{stmt} as follows:
| address@hidden
| +stmt : expr ';'  %merge <stmtMerge>
| +     | decl      %merge <stmtMerge>
| +     ;
| address@hidden example

@noindent

| +and define the @code{stmtMerge} function as:
| address@hidden
| +static YYSTYPE stmtMerge (YYSTYPE x0, YYSTYPE x1)
| address@hidden
| +  printf ("<OR> ");
| +  return "";
| address@hidden
| address@hidden example

@noindent

| +with an accompanying forward declaration
| +in the C declarations at the beginning of the file:
| address@hidden
| address@hidden
| +  #define YYSTYPE const char*
| +  static YYSTYPE stmtMerge (YYSTYPE x0, YYSTYPE x1);
| address@hidden
| address@hidden example
| +With these declarations, the resulting parser will parse the first example
| +as both an @code{expr} and a @code{decl}, and print
| address@hidden
| +"x" y z + T <init-declare> x T <cast> y z + = <OR> 
| address@hidden example


|  @node Locations Overview
|  @section Locations
|  @cindex location
| @@ -2913,7 +3091,7 @@ the location of the grouping (the result
|  is an array holding locations of all right hand side elements of the rule
|  being matched. The last one is the size of the right hand side rule.
|  
| -By default, it is defined this way:
| +By default, it is defined this way for simple LALR(1) parsers:
|  
|  @example
|  @group
| @@ -2925,6 +3103,19 @@ By default, it is defined this way:
|  @end group
|  @end example
|  
| address@hidden
| +and like this for GLR parsers:
| +
| address@hidden
| address@hidden
| +#define YYLLOC_DEFAULT(Current, Rhs, N)          \
| +  Current.first_line   = YYRHSLOC(Rhs,1).first_line;      \
| +  Current.first_column = YYRHSLOC(Rhs,1).first_column;    \
| +  Current.last_line    = YYRHSLOC(Rhs,N).last_line;       \
| +  Current.last_column  = YYRHSLOC(Rhs,N).last_column;
| address@hidden group
| address@hidden example

We should to unify this.  We still have some freedom on locations, and
we probably want to escape from macros, and use some new %directives
instead.


| +It is possible to use a data structure for the GLR parsing tree that
| +permits the processing of any LALR(1) grammar in linear time (in the
| +size of the input), any unambiguous (not necessarily LALR(1)) grammar in
| +quadratic worst-case time, and any general (possibly ambiguous) 
| +context-free grammar in cubic worst-case time.  However, Bison currently
| +uses a simpler data structure that requires time proportional to the
| +length of the input times the maximum number of stacks required for any
| +prefix of the input.  Thus, really ambiguous or non-deterministic
| +grammars can require exponential time and space to process.  Such badly
| +behaving examples, however, are not generally of practical interest.
| +Usually, non-determinism in a grammar is local---the parser is ``in
| +doubt'' only for a few tokens at a time.  Therefore, the current data
| +structure should generally be adequate.  On LALR(1) portions of a
| +grammar, in particular, it is only slightly slower than with the default
| +Bison parser.

Hm, I thought you were using one such structure.  I guess you're
saying the stacks are not maximally shared, right?


| +/*--------------------------------------.
| +| Free all merge-function definitions.       |
| +`--------------------------------------*/
| +
| +void
| +free_merger_functions (void)
| +{
| +  merger_list *L0;
| +  if (! glr_parser)
| +    return;
| +  L0 = merge_functions;
| +  while (L0 != NULL)
| +    {
| +      merger_list *L1 = L0->next;
| +      XFREE (L0);

I think you mean free here, XFREE just checks if L0 != NULL before
freeing.  I know there remains many XFREE where free is meant, but
that's because the code cleaning is not finished.

| +      L0 = L1;
| +    }
| +}
| +


| @@ -477,6 +557,7 @@ save_row (int state)
|    short *sp;
|    short *sp1;
|    short *sp2;
| +  unsigned int *sp3;
|  
|    count = 0;
|    for (i = 0; i < ntokens; i++)
| @@ -488,12 +569,18 @@ save_row (int state)
|  
|    froms[state] = sp1 = sp = XCALLOC (short, count);
|    tos[state] = sp2 = XCALLOC (short, count);
| +  if (glr_parser)
| +    conflict_tos[state] = sp3 = XCALLOC (unsigned int, count);  
| +  else 
| +    conflict_tos[state] = NULL;

Hm, don't we want something more ``typed''?  These are state_number_t,
right?  Even if it's only typedef'd to unsigned int, reading
state_number_t makes it much easier to follow.


| +/*--------------------------------------.
| +| Output the merge functions to OOUT.   |
| +`--------------------------------------*/

s/OOUT/OUT/.

| +
| +void
| +merger_output (FILE *out)
| +{
| +  int n;
| +  merger_list* p;
| +
| +  fputs ("m4_define([b4_mergers], \n[[", out);
| +  for (n = 1, p = merge_functions; p != NULL; n += 1, p = p->next) 
| +    {
| +      if (p->type[0] == '\0') 
| +     fprintf (out, "  case %d: yyval = %s (*yy0, *yy1); break;\n",
| +              n, p->name);
| +      else
| +     fprintf (out, "  case %d: yyval.%s = %s (*yy0, *yy1); break;\n",
| +              n, p->type, p->name);
| +    }
| +  fputs ("]])\n\n", out);
| +}

Maybe we can move this out from bison, and have M4 do?



|  static void
| +output_conflicts (void)
| +{
| +  /* GLR parsing slightly modifies yytable and yycheck
| +     (and thus yypact) so that in states with unresolved conflicts,
| +     the default reduction is not used in the conflicted entries, so
| +     that there is a place to put a conflict pointer.  This means that
| +     yyconflp and yyconfl are nonsense for a non-GLR parser, so we
| +     avoid accidents by not writing them out in that case. */
| +  if (! glr_parser)
| +    return;
| +
| +  muscle_insert_unsigned_int_table ("conflp", conflict_table, 
| +                                 conflict_table[0], 1, high+1);
| +  muscle_insert_unsigned_int_table ("confl", conflict_list, 
| +                          conflict_list[0], 1, conflict_list_cnt);
| +

This is something else that ought to change in CVS Bison: there is no
reason to use obscure muscle names.  `conflp' and `confl' are really
cryptic.  I agree things like `stos' are very cryptic too, but they
will be changed to be more readable.  New muscles should not continue
on the original names.



| Index: bison-1_5.26/src/conflicts.h
| --- bison-1_5.26/src/conflicts.h Sun, 26 May 2002 14:18:11 -0700 hilfingr 
(glrbison/b/29_conflicts. 1.1.1.2 644)
| +++ 2.64(w)/src/conflicts.h Sun, 26 May 2002 15:13:10 -0700 hilfingr 
(glrbison/b/29_conflicts. 1.1.1.3 644)
| @@ -24,6 +24,7 @@
|  
|  void conflicts_solve PARAMS ((void));
|  void conflicts_print PARAMS ((void));
| +int count_total_conflicts PARAMS ((void));

I tried to stick within pseudo namespaces, i.e.,
conflicts_total_count, or conflicts_count_total if you prefer.

|  void conflicts_output PARAMS ((FILE *out));
|  void conflicts_free PARAMS ((void));
|  
| Index: bison-1_5.26/NEWS
| --- bison-1_5.26/NEWS Tue, 18 Jun 2002 17:04:55 -0700 hilfingr 
(glrbison/c/23_NEWS 1.1.1.3.1.1.1.1.1.1.1.1 644)
| +++ 2.64(w)/NEWS Tue, 18 Jun 2002 17:05:53 -0700 hilfingr (glrbison/c/23_NEWS 
1.1.1.3.1.1.1.1.1.1.1.2 644)
| @@ -3,6 +3,14 @@ Bison News
|  
|  Changes in version 1.49b:
|  
| +* GLR parsing
| +  The declaration
| +     %glr-parser
| +  causes Bison to produce a Generalized LR (GLR) parser, capable of handling
| +  almost any context-free grammar, ambiguous or not.  The new declarations
| +  %dprec and %merge on grammar rules allow parse-time resolution of 
| +  ambiguities.

Man, this deserves a `contributed by Paul Hilfinger'!


It looks great!
[Prev in Thread]
Current Thread
[Next in Thread]
[RFA] GLR parsing, Paul Hilfinger, 2002/06/26
- Re: [RFA] GLR parsing, Akim Demaille <=
  - Re: [RFA] GLR parsing, Akim Demaille, 2002/06/27
  - Re: [RFA] GLR parsing, Paul Hilfinger, 2002/06/27
    - Re: [RFA] GLR parsing, Akim Demaille, 2002/06/28
Prev by Date: [RFA] GLR parsing
Next by Date: Re: [RFA] GLR parsing
Previous by thread: [RFA] GLR parsing
Next by thread: Re: [RFA] GLR parsing
Index(es):
- Date
- Thread