bug-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 0/3] yacc: compute the best type for the state number


From: Kaz Kylheku
Subject: Re: [PATCH 0/3] yacc: compute the best type for the state number
Date: Tue, 01 Oct 2019 11:40:24 -0700
User-agent: Roundcube Webmail/0.9.2

On 2019-10-01 01:35, Paul Eggert wrote:
On 9/29/19 11:34 AM, Akim Demaille wrote:

As a matter of fact, we used two types:

- most arrays (such as state stack, and its correspondance in the LAC
  infrastructure) are using int16_t.  A few "temporary" variables also
  have this type.

- yystate, which is an intermediate variable, was an int.

Actually those arrays use int_fast16_t not int16_t, as C99/C11/C18
does not require support for int16_t. It could well be more efficient
for them to use int_least16_t instead, for better caching; see below.

I would make two typedefs: one for the storage type of Yacc
state values, and one for the "register" type for manipulating
them in local variables.

The latter of course, being capable of representing all the values of
the former. E.g. example definitions:

   typedef short yy_small_state_t;
   typedef int yy_state_t;

The special value -1 should be given a manifest constant, like
YY_BAD_STATE or whatever. This constant should have a value
which is representable in the small type yy_small_state_t,
not a value which changes when converted to yy_small_state_t.

In other words:

   YY_BAD_STATE == (yy_state_t) YY_BAD_STATE

This is currently not true of the value -1 for various combinations
of the two types. Obviously if both are signed, there isn't
any issue.

(Needless to say, states should be compared to this value exactly,
never using "state < 0"; the constant is not even required to
to be negative.)

In my proposal below, I fuse both cases to use a single type,
yy_state_num, which is the smallest type that contains the set of
states: an unsigned type.

In other GNU applications, we've been moving away from using unsigned
types due to their confusing behavior (particularly when comparing
signed vs unsigned), and because modern tools such as 'gcc

The unsigned type are indeed silly when arithmetic is involved
because they break basic arithmetic identities like. For instance,
given:

      a > b - c

we'd like to be free to move the c to the other side:

  a + c > b

if all three are signed types, with small-ish values clustered
around zero, then there is no overflow and the derivation holds.
If they are unsigned types, even if the values are small, the
derivation is not justified, because b - c may produce a huge value.
That's the crux; unsigned has an overflow cliff right at zero,
creating a pitfall for calculations involving trivial numbers.

That said, parser states are more of an enumeration. They are identifiers. We shouldn't be doing any math on them of this sort. Given a parser state S, there is nothing interesting about S+1; they are not related. When would
we ever multiply two yacc states, or find their difference and such, or
care whether one is greater than the other?

The unsigned types provide a greater range for the possible states,
without having to use negative values, so they are understandably an
attractive tool for combating the problem of running out of states.

I don't see a big problem with using them. This is all internal to
a skeleton; it's not an API being perpetrated on the user. That's why
there is so much latitude to fiddle with these types at parser generation
time in the first place. Just the skeleton code and any generated
bits have to get it right; the parser generator user isn't burdened
with anything.

Rearding indexing, it's difficult to keep unsigned types entirely
out of indexing calculations because an unsigned type is injected
into the mix whenever we invoke the sizeof operator.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]