help-flex

Re: why are locations dictated by bison?


From: Hans Aberg
Subject: Re: why are locations dictated by bison?
Date: Tue, 15 Jan 2002 19:53:17 +0100

At 16:25 +0100 2002/01/15, Akim Demaille wrote:
>| In addition to the YY_USER_INIT and YY_USER_ACTION, one has two macros
>| YY_LOCATION_BEGIN and YY_LOCATION_END executed at the same spots.
>
>I have nothing against this, but really, that's not what I call a
>feature...  But we agree, that's the only reasonable one.

The location values are just a semantic addition, for which some defaults
are provided for the convenience of the programmer in the cases where they
are usable.

>Oh, BTW, my code is wrong: you cannot use YY_USER_INIT safely, since
>if the same scanner is run onto another file, it will keep the old
>name.  I need to do that in the parser (I'm using a reentrant
>parser/scanner), in the axiom.

That's one reason for bringing up this topic: Flex can better keep track of
some such subtleties.

One other subtlety is that the Flex scanner can sometimes backtrack, and
any %locations feature should keep track of that: either by implementing a
semantically correct version, or by stating that it cannot handle it.

>How do you know if yylloc is a pointer (case reentrant), or directly
>the structure (case not reentrant).

My idea is that the synched Flex/Bison defaults provide some solution that
is as lightweight as possible, and which the programmers who decide to use
it must adhere to.

A fully re-entrant parser, I recall, is on the Flex development list.

I made a C++ re-entrant parser by tweaking the FlexLexer.h file. It is not
fully re-entrant, because Bison does not have any way of putting extra
stuff in the parser body.

Otherwise, I am not sure what you mean by the pointer/non-pointer
difference you are speaking about; perhaps I am thinking all C++: I use
  class FlexLexer {
  public:
    ...
    virtual int yylex(YYLEX_ARG) = 0;
    ...
  };
where YYLEX_ARG is set to a suitable argument. For the re-entrant parser, I use
  #include "pg_parser.tab.h"
  #define YYLEX_ARG YYSTYPE& yylval, YYLTYPE& yylloc
  #define YY_DECL int yyFlexLexer::yylex(YYLEX_ARG)
  #define yyFlexLexer pg_parserFlexLexer
  #include "FlexLexer.h"

The C++.bison (C++ simple.bison) skeleton file I have tweaked to:
#if YYPURE
# if YYLSP_NEEDED
#  ifdef YYLEX_PARAM
#   define YYLEX                yylex (yylval, yylloc, YYLEX_PARAM)
#  else
#   define YYLEX                yylex (yylval, yylloc)
#  endif
# else /* !YYLSP_NEEDED */
#  ifdef YYLEX_PARAM
#   define YYLEX                yylex (yylval, YYLEX_PARAM)
#  else
#   define YYLEX                yylex (yylval)
#  endif
# endif /* !YYLSP_NEEDED */
#else /* !YYPURE */
# define YYLEX                  yylex ()
#endif /* !YYPURE */

Thus the typing of yylex is by references:
  state_type yylex(YYSTYPE&, YYLTYPE&, ...);
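
To make the by-reference typing concrete, here is a minimal standalone
sketch, not the actual Flex/Bison generated code. Value and Location are
hypothetical stand-ins for YYSTYPE and YYLTYPE, ToyLexer is a hypothetical
hand-written scanner, and the convention that last_column is the column of
the last character (inclusive) is an assumption of this example. The point
is only that the caller owns the value and location objects, so the scanner
itself keeps no global yylval/yylloc:

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-ins for YYSTYPE and YYLTYPE.
struct Value { std::string text; };
struct Location {
    int first_line = 1, first_column = 1;
    int last_line = 1, last_column = 1;   // inclusive (assumption)
};

// Hypothetical toy scanner: splits input into whitespace-separated words.
class ToyLexer {
public:
    explicit ToyLexer(std::string input) : input_(std::move(input)) {}

    // Returns 0 at end of input, 1 for a word. The semantic value and the
    // location are written through the references supplied by the caller.
    int yylex(Value& yylval, Location& yylloc) {
        // Skip blanks and newlines, keeping line/column up to date.
        while (pos_ < input_.size() &&
               (input_[pos_] == ' ' || input_[pos_] == '\n')) {
            if (input_[pos_] == '\n') { ++line_; column_ = 1; }
            else { ++column_; }
            ++pos_;
        }
        if (pos_ == input_.size()) return 0;

        yylloc.first_line = line_;
        yylloc.first_column = column_;
        const std::size_t start = pos_;
        while (pos_ < input_.size() &&
               input_[pos_] != ' ' && input_[pos_] != '\n') {
            ++pos_;
            ++column_;
        }
        yylval.text = input_.substr(start, pos_ - start);
        yylloc.last_line = line_;
        yylloc.last_column = column_ - 1;
        return 1;
    }

private:
    std::string input_;
    std::size_t pos_ = 0;
    int line_ = 1;
    int column_ = 1;
};
```

Two ToyLexer objects can run side by side on different inputs, since all
state lives either in the object or in the caller's Value and Location.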

>  How do you know that the user is
>not coding locations in yylval?  That's what non Bison users do to
>track locations.

I am not sure what you mean here; please explain or give example.

>| This Flex default locations structure will be synched with the new Bison
>| locations default.
>
>Flex cannot make assumptions over yylloc, just as it cannot make
>assumptions over yylval.  If you break that, I guarantee the future
>maintainers of these tools will hate you.  Don't tie the others' hand.

The picture I have in my mind is:

It can make assumptions about the default structures. But by giving new
macro definitions, the user can choose what they want. Then Flex should
only supply access to the data chosen by %locations.

>Please, show me how you simplify a scanner, then we can talk.

I already simplified your example; right (see below)?

Also, I do use the %option lineno feature; so they already are "simplified".

Do you want me to send them to you, and you "simplify" them further by your
method?

>| Thus the simplification of your code would be to use %locations instead of
>| YY_USER_INIT and YY_USER_ACTION macros, and then you do not need to put in
>| locations actions into the rules, except to override the default.
>
>I don't follow you here.  Show me.

One would have to figure out how the default YYLTYPE should be designed.
Then the code you wrote would be the same, except that the yylloc stuff is
removed, replaced by %locations ...
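
As a rough illustration of the kind of default update that would otherwise
have to be written by hand in YY_USER_ACTION, here is a standalone sketch.
It is not an existing Flex feature: Location, Cursor and step are
hypothetical names, and the inclusive last_column convention is an
assumption of this example. The function takes the matched text (what
yytext and yyleng would provide) and a cursor marking the next unread
character:

```cpp
#include <cstddef>

// Hypothetical location record, with the same four fields as Bison's
// default YYLTYPE.
struct Location {
    int first_line, first_column;
    int last_line, last_column;   // position of the last character (assumption)
};

// Position of the next unread character.
struct Cursor {
    int line = 1;
    int column = 1;
};

// Compute the location of a match of `len` characters starting at the
// cursor, and advance the cursor past it. Newlines inside the match start
// a new line at column 1. An empty match yields a location at the cursor.
Location step(Cursor& cur, const char* text, std::size_t len) {
    Location loc{cur.line, cur.column, cur.line, cur.column};
    for (std::size_t i = 0; i < len; ++i) {
        loc.last_line = cur.line;
        loc.last_column = cur.column;
        if (text[i] == '\n') {
            ++cur.line;
            cur.column = 1;
        } else {
            ++cur.column;
        }
    }
    return loc;
}
```

If the match is later shortened (for example after yyless(n) in a rule
action), one way to stay correct under this scheme would be to keep a copy
of the cursor taken before the match and call step again with the shorter
length.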

The error stuff in your code, you would still have to write out yourself,
but that would be all in your particular example.

If one has special tokens, for example I use
%{
#define get_text(pos, red) yylval.text \
   = std::string(yytext + (pos), yyleng - ((pos) + (red)))
%}
to strip off the first pos and last red characters, then one should be able
to do the same for the locations data. (But I have not thought much about
that problem.)
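
One possible way to keep the location in step with such trimming is
sketched below, as a standalone example rather than as part of any scanner.
Location, trim_text and trim_location are hypothetical names. The sketch
assumes that the token does not span lines, that pos + red does not exceed
the token length, and that last_column is the column of the last character:

```cpp
#include <cstddef>
#include <string>

// Hypothetical location record for a single-line token.
struct Location {
    int first_line, first_column;
    int last_line, last_column;   // inclusive (assumption)
};

// Same trimming as get_text: drop the first `pos` and last `red` characters.
std::string trim_text(const std::string& text, std::size_t pos, std::size_t red) {
    return text.substr(pos, text.size() - (pos + red));
}

// Shrink the location by the same amounts, so that it covers only the
// characters that were kept.
Location trim_location(Location loc, int pos, int red) {
    loc.first_column += pos;
    loc.last_column -= red;
    return loc;
}
```

For a quoted string "abc" occupying columns 5 to 9, trimming one character
at each end gives the text abc and the columns 6 to 8.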

>| In view of that you already use std::operator new() and such C++ stuff, the
>| locations overhead is likely to be insignificant.
>
>As far as *I*'m concerned, I agree.

Those that disagree need not use this feature. But the idea is that it
should be as lightweight as possible.

The only problem here, I think, is to provide a char* to the beginning of
the line. I think this might be implemented in Flex by defining a maximum
line size (which could be something like 1 or 2 kB): if the buffer has less
than this at the start of the scan, it will switch to a new buffer. Then
one is guaranteed to be able to display at least 1 to 2 kB of characters in
an error message.
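
The display side of this idea can be sketched as follows. This is only an
illustration with hypothetical names (line_context, max_line); it does not
implement the buffer switching, and it assumes that the text is held in one
contiguous buffer and that offset is at most the buffer size. The function
looks back from the error offset to the start of the line, but never
further than max_line bytes:

```cpp
#include <cstddef>
#include <string>

// Return the text of the line containing `offset`, starting at most
// `max_line` bytes before `offset` and running to the end of that line.
std::string line_context(const std::string& buffer,
                         std::size_t offset,
                         std::size_t max_line) {
    // Walk back to the previous newline, bounded by max_line.
    std::size_t begin = offset;
    while (begin > 0 && offset - begin < max_line && buffer[begin - 1] != '\n') {
        --begin;
    }
    // Walk forward to the next newline or the end of the buffer.
    std::size_t end = offset;
    while (end < buffer.size() && buffer[end] != '\n') {
        ++end;
    }
    return buffer.substr(begin, end - begin);
}
```

With the bound in place, the scanner would only need to guarantee that the
last max_line bytes before the current position are still in the buffer.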

  Hans Aberg




