bug-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 0/3] yacc: compute the best type for the state number


From: Akim Demaille
Subject: Re: [PATCH 0/3] yacc: compute the best type for the state number
Date: Wed, 2 Oct 2019 07:27:39 +0200

Hi Paul,

Thanks for the answer!  (And thanks to Kaz too)

> Le 1 oct. 2019 à 10:35, Paul Eggert <address@hidden> a écrit :
> 
> On 9/29/19 11:34 AM, Akim Demaille wrote:

> 
>> In my proposal below, I fuse both cases to use a single type,
>> yy_state_num, which is the smallest type that contains the set of
>> states: an unsigned type.

For the record, eventually I had changed my mind to use (i) int for the scalar 
variables, and (ii) for arrays of states, the smallest unsigned types of at 
most 16 bits, then an int.  Which Kaz named (i) yy_state_t and (ii) 
yy_small_state_t.  That's this patch:

https://lists.gnu.org/archive/html/bug-bison/2019-09/msg00026.html


> In other GNU applications, we've been moving away from using unsigned types 
> due to their confusing behavior

I fully subscribe to this view.  I think the trend started in C++ first, and 
followed in C.  I was unaware though that GNU projects followed this path.


> (particularly when comparing signed vs unsigned), and because modern tools 
> such as 'gcc -fsanitize=undefined' can check for signed integer overflow but 
> (obviously) not for unsigned integer overflow. In this newer style, it's OK 
> to use unsigned types for bit vectors and hash values since these typically 
> don't involve integer comparisons and integer overflow is typically harmless; 
> for indexes, though, unsigned types are so error-prone that they should be 
> avoided.

On this last sentence, though, I was a bit concerned, since I read this as 
"don't use unsigned type".  Actually, you still agree to use unsigned type when 
we need their last bit, otherwise prefer signed:

>  # b4_int_type(MIN, MAX)
>  # ---------------------
> -# Return the smallest int type able to handle numbers ranging from
> -# MIN to MAX (included).
> +# Return a narrow int type able to handle integers ranging from MIN
> +# to MAX (included) in portable C code.  Assume MIN and MAX fall in
> +# 'int' range.
>  m4_define([b4_int_type],
> -[m4_if(b4_ints_in($@,      [0],   [255]), [1], [unsigned char],
> -       b4_ints_in($@,   [-128],   [127]), [1], [signed char],
> +[m4_if(b4_ints_in($@,      [0],   [127]), [1], [char],
> +       b4_ints_in($@,   [-127],   [127]), [1], [signed char],
> +       b4_ints_in($@,      [0],   [255]), [1], [unsigned char],
>  
> +       b4_ints_in($@, [-32767], [32767]), [1], [short],
>         b4_ints_in($@,      [0], [65535]), [1], [unsigned short],
> -       b4_ints_in($@, [-32768], [32767]), [1], [short],
> -
> -       m4_eval([0 <= $1]),                [1], [unsigned],
>  
>                                                 [int])])

and I'm fine with this.  I don't understand why you shy away from -128 and 
-32768 though.  I would have understood 0..254 as a means to "reserve" -1, but 
I don't understand -127..127.

For the record, Bison's own parser has currently 166 states: its tables can fit 
in 8 bits, and it's a good thing that we do it.

> In the past, Bison has gotten away with using uint8_fast_t and uint16_fast_t 
> for storing indexes, because on machines with 32-bit int (which is the 
> minimum supported by GNU) these types typically promote to 'int' and so avoid 
> most of the signed-vs-unsigned confusion. But if Bison starts using 32-bit 
> unsigned types it won't have this luxury since these types won't promote to 
> 'int'.

I don't want to use 'unsigned int', I fully agree.  That was one of the changes 
in https://lists.gnu.org/archive/html/bug-bison/2019-09/msg00027.html.

> And anyway, Bison shouldn't assume that uint8_fast_t etc. promote to 'int', 
> because the C standard doesn't guarantee that.
> 
> So, I suggest using signed types for indexes in Bison templates; please see 
> the attached patch. I haven't checked for compatibility between this and your 
> proposed patches, but I assume that's straightforward.

I have installed my commits as published, and then yours.  For some reason git 
am failed, I had to use patch, but after a few conflict resolutions, I think I 
had your patch properly adjusted.

Let's start from this to finish the changes wrt state types.

Apparently like Kaz, I like to name my types *_t, so I'd be happy with 
yy_state_t (scalars) and yy_state_least_t/yy_small_state_t (arrays) instead of 
int and yy_state_num currently.  But we've had disagreements on this regard, 
yet I don't know where you stand today (given that here, we're kind of 
protected by yy_*: POSIX/ISO etc. would be really bad people to start using 
them.)

Cheers!


reply via email to

[Prev in Thread] Current Thread [Next in Thread]