
Re: [PATCH] Factor %FLAG at scan level.

From: Akim Demaille
Subject: Re: [PATCH] Factor %FLAG at scan level.
Date: Wed, 8 Apr 2009 14:39:07 +0200

On 8 Apr 2009, at 10:36, Joel E. Denny wrote:

On Mon, 6 Apr 2009, Akim Demaille wrote:

Some day, I would really like to find some means for the user to pass some yytext to yyerror_syntax error, so that we can have the genuine look-ahead
string reported in the error message.

I think that'll be solved when we finally implement %error-report to
replace yyerror.  That is, the user can then access anything that a
semantic action can access without dealing with a fluctuating yyerror
argument list.


I can see how users with existing grammar files with existing %printer's
would benefit if Bison automatically provided an operator<< for
symbol_type that invoked those %printer's.

Exactly.  It is also a very simple means to check a yylex.

Even though that approach is
not actually possible, the user still has an opportunity to share code
between the two by specifying that %printer invoke operator<< instead.
If we make the symbol_type object directly accessible in %printer, the
user can even write a single %printer that invokes operator<< for all
symbols, which are specified by <*> and <>.

This would be a significant change of interface: %printer and %destructor work on the semantic value only, after the dispatch on the type, while symbol_type is the triple (type, value, location). And this is a nice feature IMHO, as it is modular: you don't have to write a single %printer that handles the dispatching and concentrates all the code in one place. Rather, the code is scattered along the %type directives.

Yet a single %printer { debug_stream() << $$ } <*> works fine in most (C++) cases, so there is not so much clutter for the user. Sure, there remains clutter in the output code, where 'debug_stream() << $$' is repeated many, many times, but each $$ is actually different. We need this repetition; it cannot be factored (unless we investigate template-based answers, similar to make_SYMBOL after all).
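To illustrate the "template-based answers" idea, here is a minimal sketch in plain C++, not actual Bison output. The names print_value, print_symbol, value and kind are hypothetical stand-ins. The printing code is written once as a template, and the per-type repetition is reduced to one short call per case.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Written once; the compiler instantiates it for each value type.
// (Hypothetical helper, not part of any Bison skeleton.)
template <typename T>
void print_value (std::ostream& o, const T& v)
{
  o << v;
}

// Hypothetical semantic value with two alternatives, and its type tag.
struct value { int ival; std::string sval; };
enum kind { NUMBER, TEXT };

// The dispatch on the type still needs one case per type, but each
// case is now a single call to the shared template.
void print_symbol (std::ostream& o, kind k, const value& v)
{
  switch (k)
    {
    case NUMBER: print_value (o, v.ival); break;
    case TEXT:   print_value (o, v.sval); break;
    }
}

// Small self-check of the sketch.
void check_print_symbol ()
{
  value v;
  v.ival = 7;
  v.sval = "seven";
  std::ostringstream n, t;
  print_symbol (n, NUMBER, v);
  print_symbol (t, TEXT, v);
  assert (n.str () == "7");
  assert (t.str () == "seven");
}
```

The switch itself does not go away, since each alternative has a different static type; only the body of each case shrinks.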

BTW, I'm using the attached script in my production parser to factor case-clauses that are equal. I don't know if compilers do it by default, but I doubt they would all do it. It would be nice if Bison did it for us.

Attachment: fuse-switch
Description: Binary data
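The attached script itself is not reproduced here. As a rough illustration of what "factoring equal case-clauses" means, the following sketch (an assumption about the general technique, not the script's actual logic) groups labels whose body text is identical and emits each body once. The names case_clause and fuse_cases, and the bodies in the self-check, are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<int, std::string> case_clause;   // (label, body text)

// Emit the clauses so that labels sharing exactly the same body text
// are stacked in front of a single copy of that body.
std::string fuse_cases (const std::vector<case_clause>& clauses)
{
  std::vector<std::string> bodies;                  // in first-seen order
  std::map<std::string, std::vector<int> > labels;  // body -> its labels
  for (std::size_t i = 0; i < clauses.size (); ++i)
    {
      std::vector<int>& l = labels[clauses[i].second];
      if (l.empty ())
        bodies.push_back (clauses[i].second);
      l.push_back (clauses[i].first);
    }

  std::ostringstream out;
  for (std::size_t i = 0; i < bodies.size (); ++i)
    {
      const std::vector<int>& l = labels[bodies[i]];
      for (std::size_t j = 0; j < l.size (); ++j)
        out << "case " << l[j] << ":\n";
      out << "  " << bodies[i] << "\n  break;\n";
    }
  return out.str ();
}

// Small self-check: cases 3 and 5 share a body and get fused.
void check_fuse_cases ()
{
  std::vector<case_clause> c;
  c.push_back (case_clause (3, "print_int ();"));
  c.push_back (case_clause (4, "print_string ();"));
  c.push_back (case_clause (5, "print_int ();"));
  assert (fuse_cases (c)
          == "case 3:\ncase 5:\n  print_int ();\n  break;\n"
             "case 4:\n  print_string ();\n  break;\n");
}
```

Comparing body text only is safe when the text fully determines the generated code, which is the situation described above for clauses that are really equal.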

- should -Derror-verbose and -Derror_verbose be the same?

What if we just accept "%define error-verbose"?  I'd like to convert
api.push_pull and lr.keep_unreachable_states to use dashes as well.

Yes, I agree, that looks much nicer.  Will do (at some point :).

Maybe we should go ahead and rename api.push_pull and
lr.keep_unreachable_states in 2.5.

There's some ambiguity here: do you mean to make - and _ interchangeable, or simply to move to using - ? I first thought about normalizing _ into -, but after all I prefer a single name.
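For reference, the "normalizing _ into -" option would amount to something like the following sketch. normalize_variable is a hypothetical helper written for illustration, not a function that exists in Bison.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: map '_' to '-' so that both spellings of a
// %define variable name compare equal after normalization.
std::string normalize_variable (std::string name)
{
  std::replace (name.begin (), name.end (), '_', '-');
  return name;
}

// Small self-check showing the intended behaviour.
void check_normalize_variable ()
{
  assert (normalize_variable ("api.push_pull") == "api.push-pull");
  assert (normalize_variable ("lr.keep_unreachable_states")
          == "lr.keep-unreachable-states");
  assert (normalize_variable ("error-verbose") == "error-verbose");
}
```

The alternative preferred above, a single name, needs no such mapping: only the dashed spelling would be recognized, plus whatever old names are kept for compatibility.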

I'd also like to formally state that
we don't plan to remove the old names.


 api.pure (Boolean)
 api.push-pull (pull, push, both)
 lr.type (LALR, IELR, or canonical)
 lr.default-rules (full, consistent-states, accept)
 lr.keep-unreachable-states (Boolean)

In most cases, I use namespaces to make the purpose clearer. Ironically,
we didn't think of a good namespace for namespace.

Why not 'api.'?

These all describe different aspects of the run-time parser behavior.
Maybe they should be:


Good with me.

Is assert too general?  Will there ever be other kinds of parser
assertions that should be controlled independently?

I don't think we should go too far into the details. Maybe I should have put it into api.debug, but it incurs some speed penalty, whereas api.debug incurs none (or almost none) and can be left active in production parsers, while, IMHO, assert should not be.

BTW, I would prefer api.trace over api.debug for %debug. No big deal. And then, maybe %api.debug to be what I called %api.assert.

- %define lex_symbol
yylex returns a symbol_type instead of taking pointers to value/ location and
returning the type.
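To make the difference concrete, here is a small standalone sketch contrasting the two calling conventions. The types location, token_kind and symbol, and both functions, are simplified hypothetical stand-ins, not the code Bison generates.

```cpp
#include <cassert>

// Simplified stand-ins for the generated types (assumptions).
struct location { int line; int column; };
enum token_kind { END_OF_FILE = 0, NUMBER = 258 };

// Traditional pure interface: the type is the return value, while the
// semantic value and the location come back through pointers.
int yylex_pointers (int* value, location* loc)
{
  *value = 42;
  loc->line = 1;
  loc->column = 1;
  return NUMBER;
}

// lex_symbol style: a single object carries type, value and location.
struct symbol { token_kind kind; int value; location loc; };

symbol yylex_symbol ()
{
  symbol s = { NUMBER, 42, { 1, 1 } };
  return s;
}

// Small self-check: both conventions deliver the same information.
void check_yylex_styles ()
{
  int v = 0;
  location l = { 0, 0 };
  int k = yylex_pointers (&v, &l);
  symbol s = yylex_symbol ();
  assert (k == s.kind);
  assert (v == s.value);
  assert (l.line == s.loc.line && l.column == s.loc.column);
}
```

Since the second form has no out-parameters and no globals, it is pure by construction, which is why combining it with api.pure would be redundant.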

I would prefer:


I really am not happy with "lex-symbol", but if you can't do better, I certainly can't :)

This could not be used in conjunction with api.pure, right? I mean, it
implies purity, so specifying api.pure is illogical.

Yes, definitely.

- %define locations

This makes an api change and adds a new task to the parse, so I'm not sure
any specific namespace is appropriate.

So maybe it should be a namespace per se. People (I know some) would certainly appreciate more control over the types used here. It's already using %define filename_type to be told... the type of the filename field, and a hidden control to specify whether we want ctors or not (as it forbids them from being included in unions).

- %define variant (, and in my branch)
 Use variants instead of union.

Maybe use "variants" for consistency with "locations".  That fits the
phrase "use variants".  Is it appropriate to think of this as an api

Well, that depends on what you call "api": AFAIR it does not change anything in the yylex/yyparse API, but it does change the Bison API. For a start, %type <> expects genuine types, no longer type tags that refer to %union members. Here is examples/variant.yy:

%skeleton ""
%define assert
%define variant
%define lex_symbol

%code requires // *.hh
{
#include <list>
#include <string>
typedef std::list<std::string> strings_type;
}

%code // *.cc
{
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <iterator>
#include <sstream>

  // Prototype of the yylex function providing subsequent tokens.
  static yy::parser::symbol_type yylex ();

  // Printing a list of strings.
  // Koenig lookup will look into std, since that's an std::list.
  namespace std
  {
    std::ostream&
    operator<< (std::ostream& o, const strings_type& s)
    {
      std::copy (s.begin (), s.end (),
                 std::ostream_iterator<strings_type::value_type> (o, "\n"));
      return o;
    }
  }

  // Conversion to string.
  template <typename T>
    std::string
    string_cast (const T& t)
  {
    std::ostringstream o;
    o << t;
    return o.str ();
  }
}
%token <::std::string> TEXT;
%token <int> NUMBER;
%printer { debug_stream () << $$; }
   <int> <::std::string> <::std::list<std::string>>;
%token END_OF_FILE 0;

%type <::std::string> item;
%type <::std::list<std::string>> list;

%%

result:
  list  { std::cout << $1 << std::endl; }
;

list:
  /* nothing */ { /* Generates an empty string list */ }
| list item     { std::swap ($$, $1); $$.push_back ($2); }
;

item:
  TEXT          { std::swap ($$, $1); }
| NUMBER        { $$ = string_cast ($1); }
;
%%

// The yylex function providing subsequent tokens:
// TEXT         "I have three numbers for you:"
// NUMBER       1
// NUMBER       2
// NUMBER       3
// TEXT         " and that's all!"

static
yy::parser::symbol_type
yylex ()
{
  static int stage = -1;
  ++stage;
  yy::parser::location_type loc(0, stage + 1, stage + 1);
  switch (stage)
    {
    case 0:
      return yy::parser::make_TEXT ("I have three numbers for you.", loc);
    case 1:
    case 2:
    case 3:
      return yy::parser::make_NUMBER (stage, loc);
    case 4:
      return yy::parser::make_TEXT ("And that's all!", loc);
    default:
      return yy::parser::make_END_OF_FILE (loc);
    }
}

// Mandatory error function
void
yy::parser::error (const yy::parser::location_type& loc, const std::string& msg)
{
  std::cerr << loc << ": " << msg << std::endl;
}

int
main ()
{
  yy::parser p;
  p.set_debug_level (!!getenv ("YYDEBUG"));
  return p.parse ();
}
(Should the location be before the value in the make_SYMBOLs? It would probably look nicer.)

[Forking the end of the message to another message]
