Re: why are locations dictated by bison?

From: Hans Aberg
Subject: Re: why are locations dictated by bison?
Date: Fri, 18 Jan 2002 21:21:39 +0100

At 13:58 -0500 2002/01/18, Bruce Lilly wrote:
>Stop! '\0' is used as a string terminator by convention; any
>program that wishes to handle ASCII NUL must use a different
>convention regardless of the programming language (and flex
>provides both yytext and yyleng).

I am well aware of this fact, but as a matter of practice, programs written
in C will often not be able to treat \0 as a special character, as it will
be interpreted as the end-of-string.

>and the following C++ program doesn't:
>#include <iostream>
>using namespace std;
>int main(int argc, char **argv) { /* Premature exasperation */
>        cout << "foo\0bar";
>        return 0;
>}

>so it's clearly not a simple matter of "C++ strings do
>not have that restriction".

This is not the C/C++ programming list, but "..." strings, which are not
the C++ std::string class I had in mind, are implemented as the full
string plus a \0; one then first has to convert it to a std::string, with
an explicit length, for it to work.
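
To make the distinction concrete, here is a minimal sketch (the names
kLiteral, c_length, full_string and truncated_string are made up for
illustration). It assumes only the standard library, and shows that the
array behind "foo\0bar" holds all seven characters, that C-style routines
stop at the first \0, and that std::string keeps the \0 only when it is
given the length explicitly:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// The literal "foo\0bar" is an array of 8 chars: 7 written out plus the
// implicit terminating '\0' added by the compiler.
constexpr char kLiteral[] = "foo\0bar";

// Length as seen by C-style routines, which stop at the first '\0'.
inline std::size_t c_length() { return std::strlen(kLiteral); }

// Build a std::string from the whole array. Passing the length explicitly
// keeps the embedded '\0' as an ordinary character.
inline std::string full_string() {
    return std::string(kLiteral, sizeof(kLiteral) - 1);
}

// Build a std::string from a bare const char*. This constructor also
// stops at the first '\0', so only "foo" survives.
inline std::string truncated_string() { return std::string(kLiteral); }
```

So writing full_string() to cout would emit all seven characters, whereas
cout << "foo\0bar" goes through the const char* overload and stops early.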

>> So it might be prudent to treat \0 as a full character in all
>> circumstances: Not doing so may cause people to overlook proper treatment
>> on this silly little detail.
>It's unclear what you mean by a "full" character...

Just that it does not make (the corresponding runtime array of) "foo\0bar"
appear as only "foo" by some routine (in this context, also see below).

>yylex has always been defined as a function that returns
>an integer value representing some lexical token or end
>of input (and the value zero has been reserved for that
>purpose). If a programmer has a need to return a token
>for the single character '\0', some integer value is
>assigned to a token and that value is returned and used
>by the parser.

The thing is that token codes 1..255 are reserved for characters; no other
token can ever have a value in that range, so it is possible for the Bison
parser (if Bison is upgraded) to issue a special error message "undefined
<character>". \0 is an exception to that, given that 0 is YYEOF.

One then need not define a special token for single characters, even
though it is of course well known that one can do so.
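
The \0 exception can be sketched as follows. This is not Flex output, only
a hand-written stand-in for yylex; ByteScanner, scan_all, kEndOfInput and
kTokNul are hypothetical names, and 258 is an arbitrary value above 255
chosen for illustration. Ordinary characters are returned as their own
codes, while \0 must be mapped to a named token because 0 is taken:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token codes for illustration only; a real Bison-generated
// header would define its own values.
enum : int {
    kEndOfInput = 0,   // the value the parser treats as end of input
    kTokNul     = 258  // named token standing in for the character '\0'
};

// Minimal scanner over a buffer with an explicit length, so an embedded
// '\0' is ordinary data rather than a terminator.
class ByteScanner {
public:
    explicit ByteScanner(std::string input) : input_(std::move(input)) {}

    // Returns kEndOfInput when the buffer is exhausted, kTokNul for a
    // '\0' byte, and the character code (1..255) for anything else.
    int next() {
        if (pos_ >= input_.size()) return kEndOfInput;
        unsigned char c = static_cast<unsigned char>(input_[pos_++]);
        return c == 0 ? kTokNul : static_cast<int>(c);
    }

private:
    std::string input_;
    std::size_t pos_ = 0;
};

// Collect every token code up to and including the end-of-input code.
inline std::vector<int> scan_all(const std::string& input) {
    ByteScanner scanner(input);
    std::vector<int> codes;
    for (;;) {
        int t = scanner.next();
        codes.push_back(t);
        if (t == kEndOfInput) break;
    }
    return codes;
}
```

With the three-byte input a, \0, b this yields the codes 'a', kTokNul, 'b'
and then kEndOfInput; only the \0 needed a token of its own.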

>> The idea with using the YYEOF macro would be prudent rather regardless of
>> its value, as one should normally use macros, instead of values.
>So, in your opinion, we should have yyparse() return
>instead of 0 if no parse errors are encountered?

The C/C++ standards now say that main() should return EXIT_SUCCESS or
EXIT_FAILURE (defined in <cstdlib>) because the traditional values may not
be, and in fact are not, used on some platforms. So following that scheme,
one could use YYEXIT_SUCCESS and YYEXIT_FAILURE, or merely YYSUCCESS and
YYFAILURE. (Under C++, a failure might throw an appropriate exception.)
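
A rough sketch of what that could look like. YYSUCCESS and YYFAILURE are
only the names proposed here, not macros that Bison defines, and
exit_status_from_parse is a made-up helper. The defaults are the
traditional yyparse() results, so nobody who leaves them alone would
notice a change:

```cpp
#include <cstdlib>

// Proposed names only (not defined by Bison). They default to the
// traditional yyparse() return values and can be overridden beforehand.
#ifndef YYSUCCESS
#define YYSUCCESS 0
#endif
#ifndef YYFAILURE
#define YYFAILURE 1
#endif

// Translate a parser result into a portable process exit status.
// Any result other than YYSUCCESS is reported as a failure.
inline int exit_status_from_parse(int parse_result) {
    return parse_result == YYSUCCESS ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

A main() would then end with return exit_status_from_parse(yyparse());
without mentioning any literal value.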

>And in the case of bison,
>instead of 2,

Under C++, I use a standard container class (like std::deque), which can
throw an exception (I am not an ABC programmer myself). But you might use
YYSTACK_OVERFLOW or something.

> etc., etc., etc.

That is what is appropriate with errors: They are classified, and macros or
language names are defined for them.

>On a more serious note,


> what applies to boilerplate code
>applies equally well to the grammar rulesets in the
>parser.  One should therefore use a token like NUL in
>those rulesets rather than hard-coding '\0' (which
>can't appear there anyway). So it doesn't matter what
>integer value is assigned to that token (NUL) and
>therefore no reason to go through all sorts of contortions
>to try to force a particular value (viz. zero).

The thing is that the values 1..255 are already reserved for characters, and
can be reached from within Bison by a '...' construct. But it does not work
with '\0'.

But one can write
  %token NUL "\0"
and then use "\0" instead of '\0' or NUL, I think.

>> would not cause any incompatibilities, but would admit those that want to
>> change it to do so, plus by using YYEOF explicitly, it would communicate to
>> the human reader that this is an end of parser condition.
>                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>No, an end of input condition, not end of parser (see, it
>doesn't help; it confused you already :-).  Because of
>yywrap(), parsing may continue.

How do you get yywrap() to be called once YYEOF has been returned by
yylex? When the Bison parser encounters YYEOF, or any value < 0, from the
yylex function, it treats that as the end state, thus not calling yylex
again.

There is a difference though: Perhaps Bison's YYEOF should be renamed to
YYEOP, as for "end-of-parse". Then YYEOF is used to indicate the
end-of-stream condition for a possible wrap, whereas YYEOP is used to
indicate the end state.
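
The two conditions can be told apart in a small sketch. This is not how
Flex is implemented; MultiStreamScanner, drain and kEndOfParse are
hypothetical names, and the streams are assumed not to contain \0. The
end of one stream is handled inside the scanner by a wrap step, and only
when no stream is left does the caller see the end state:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum : int { kEndOfParse = 0 };  // hypothetical: the only "end" the parser sees

// Scanner over several input streams. Assumes no stream contains '\0',
// since that byte would collide with kEndOfParse.
class MultiStreamScanner {
public:
    explicit MultiStreamScanner(std::vector<std::string> streams)
        : streams_(std::move(streams)) {}

    // Returns the next character code, or kEndOfParse when every stream
    // is exhausted. End-of-stream itself is never returned to the caller.
    int next() {
        for (;;) {
            if (current_ < streams_.size() &&
                pos_ < streams_[current_].size()) {
                return static_cast<unsigned char>(streams_[current_][pos_++]);
            }
            // End-of-stream: try to wrap to the next stream.
            if (!wrap()) return kEndOfParse;
        }
    }

private:
    // Returns true if another stream was opened. (Flex's yywrap() uses
    // the opposite convention: it returns 0 to continue scanning.)
    bool wrap() {
        if (current_ + 1 < streams_.size()) {
            ++current_;
            pos_ = 0;
            return true;
        }
        return false;
    }

    std::vector<std::string> streams_;
    std::size_t current_ = 0;
    std::size_t pos_ = 0;
};

// Read characters the way a parser loop would, stopping at end-of-parse.
inline std::string drain(std::vector<std::string> streams) {
    MultiStreamScanner scanner(std::move(streams));
    std::string seen;
    for (int t = scanner.next(); t != kEndOfParse; t = scanner.next()) {
        seen.push_back(static_cast<char>(t));
    }
    return seen;
}
```

Two streams "ab" and "cd" thus reach the parser as one token sequence
a b c d followed by a single end-of-parse.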

>  In order to do proper regression
>testing, the generator will have to be built and tested on
>multiple platforms, with multiple compilers using every
>significant variation of every macro.  Does the term
>"combinatorial explosion" sound familiar?

I am not sure what you mean: The macros would default to the old values, so
no one not altering them deliberately would notice any change. If one
alters them, one is on one's own.

It is more tricky if one at some point alters the YYEOF default from 0 to
-1. One way is to allow the old default in the compatibility mode. If one
is reasonably sure the change is OK, then I figure people will report after
the new release if it is not. :-)

Or how do you think that the current Flex/Bison were developed?

> Flex already
>has macros yyterminate and YY_NULL; why do you think yet
>another one is necessary?

So then you only have to change its name to YYEOF: Why give a macro a
name after its value and not its intended use? That sounds strange.

>... You seem to be suggesting that this should instead
>be something like: ...

Warning: I am not that good at reading extensive polemics. So I skip over
this stuff. :-)

  Hans Aberg
