
Re: RFC: enum instead of #define for tokens

From: Hans Aberg
Subject: Re: RFC: enum instead of #define for tokens
Date: Thu, 4 Apr 2002 19:06:44 +0200

At 12:11 +0200 2002/04/04, Akim Demaille wrote:
>>> At home, I'm working on moving the engine from using shorts
>>> everywhere as indices into arrays to using actual pointers.
>Paul> Doesn't this grow the table size by a factor of 4 on 64-bit
>Paul> hosts?  For typical parsers it wouldn't matter too much, but for
>Paul> parsers with large tables this could be a big hit.  In
>Paul> particular, I worry that it might hurt performance for
>Paul> dynamically linked modules, since the dynamic linker might have
>Paul> to relocate all those pointers individually when the module is
>Paul> loaded.  I recall that Ulrich Drepper went through the GNU C
>Paul> library recently, replacing many pointers with integers,
>Paul> precisely to improve performance this way.
>Oh man, don't tell me this :(  I view these changes as really needed
>for Bison.  There are so many different uses of short, that it becomes
>quite unreadable.  It also results in many bizarre indirections via
>arrays in several different places.

Are you speaking about Bison or its generated parser? As for Bison
itself, does speed really matter if it is just a constant factor of 4? It
is fast enough.
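
The factor of 4 in Paul's remark can be checked with a few lines of C. This is only a sketch: the table length and the function names are made up for illustration and are not taken from the Bison sources. On a typical 64-bit host a short is 2 bytes and a pointer is 8, which is where the 4 comes from; other hosts give other ratios.

```c
#include <stddef.h>

/* Hypothetical table length, chosen only for illustration. */
enum { N_ENTRIES = 1000 };

/* Index-based table: every entry is a short used as an array index. */
static size_t index_table_bytes(void)
{
  return N_ENTRIES * sizeof(short);
}

/* Pointer-based table: every entry is a real pointer. */
static size_t pointer_table_bytes(void)
{
  return N_ENTRIES * sizeof(void *);
}

/* How much larger the pointer-based table is on this host. */
static size_t growth_factor(void)
{
  return sizeof(void *) / sizeof(short);
}
```

The sketch covers only size. The relocation cost Paul mentions for dynamically linked modules is a separate matter and is not measured here.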

As for the generated parser, that might be needed if one should introduce
LR(1) with table compression.

But apart from that, the Bison sources surely need better typing. I
found them quite hard to read myself.

>Paul> I tend to agree, but I also think we're stuck with it, as POSIX
>Paul> requires it and it's extremely common practice.
>POSIX is probably not referring to Unicode anyway.  And IIRC, POSIX
>mandates 257 as first symbol number, so if we move to Unicode
>char-tokens, we are no longer POSIX compliant.  Well, that's my
>understanding, but I'm ready to be corrected.

Perhaps POSIX doesn't deal with Unicode. But somebody said that Linux uses
the full Unicode range of 2^21 or so code points.

Anyway, any Unicode support would probably be an option. Eventually,
though, one will migrate more and more towards Unicode, as it becomes
more ubiquitous.
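
To make the numbering conflict concrete, here is a small sketch in C. It assumes the POSIX yacc convention that named tokens are numbered from 257 upward, and that a Unicode character token would use its code point (at most 0x10FFFF) directly as its token number. The token and function names are hypothetical.

```c
/* POSIX yacc numbers named tokens from 257 upward. */
#define FIRST_NAMED_TOKEN 257L
/* Largest Unicode code point; it fits in 21 bits. */
#define UNICODE_MAX 0x10FFFFL

/* Hypothetical named tokens, numbered the POSIX way. */
enum token { TOK_NUM = 257, TOK_IDENT };

/* Nonzero if code point cp, used directly as a token number,
   falls in the range POSIX gives to named tokens. */
static int clashes_with_named_token(long cp)
{
  return cp >= FIRST_NAMED_TOKEN && cp <= UNICODE_MAX;
}

/* Smallest first named token number that leaves room for
   every Unicode code point as a character token. */
static long first_named_token_for_unicode(void)
{
  return UNICODE_MAX + 1L;
}
```

For example, U+0101 has the value 257, the same as TOK_NUM above, so named tokens would have to start at 0x110000 or later if every code point is to be usable as a token.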

>Paul> At best we can warn in the documentation that it doesn't work if
>Paul> you change encodings between the Bison run and the cc+runtime
>Paul> runs.
>Or we should find a means not to output the characters as
>shorts/integer, but as the characters themselves.

Is this cross-compilation problem common?

I mean, anyone changing between encodings would know about the problem,
so if you haven't heard much about it, people probably know how to work
around it. (The problem lies in part in the C/C++ standards, which do not
define a platform-independent way to refer to characters.)
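
The two ways of emitting a character token can be sketched as follows. This is an illustration with made-up macro and function names, not what Bison actually outputs. The first form freezes the ASCII value at generation time; the second leaves the value to the compiler that builds the parser, which is what Akim's suggestion amounts to.

```c
/* Emitted as a number: the value of 'a' in ASCII, fixed when the
   generator ran.  Hypothetical name. */
#define TOK_A_AS_NUMBER 97

/* Emitted as a character: the value comes from the character set
   of the compiler that builds the parser.  Hypothetical name. */
#define TOK_A_AS_CHAR 'a'

/* Does the scanner's character ch match the number form? */
static int number_form_matches(int ch)
{
  return ch == TOK_A_AS_NUMBER;
}

/* Does the scanner's character ch match the character form? */
static int char_form_matches(int ch)
{
  return ch == TOK_A_AS_CHAR;
}
```

On an ASCII host both forms agree. On an EBCDIC host 'a' is 129, so only the character form still matches what the scanner reads.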

A warning should suffice, until you start to hear from folks.

  Hans Aberg
