Re: [Tinycc-devel] parsing 0x1e+1 as 0x1e +1


From: Vincent Lefevre
Subject: Re: [Tinycc-devel] parsing 0x1e+1 as 0x1e +1
Date: Wed, 27 Apr 2016 17:50:22 +0200
User-agent: Mutt/1.6.0-6623-vl-r87826 (2016-04-14)

On 2016-04-27 18:21:11 +0300, Sergey Korshunoff wrote:
> > CompCert 2.4 outputs 0x10 0x1e (following its interpretation of 6.4.8)
> > though if the user expects a subtraction in both cases, he probably
> > expects that E yields the same value in both cases
> 
> pcc outputs 0x10 0x1e too.

Note that this is fixed in CompCert 2.5, as the authors now agree that
the previous interpretation was wrong.

> But tcc with first patch outputs 0x10 0x1d (as user expects)

This may be the most intuitive behavior, but it does not conform to the
ISO C standard. 0x1e-E is a single preprocessing token (Clause 6.4.8), more
precisely a pp-number, and "each preprocessing token is converted
into a token" (Clause 5.1.1.2). Since the token is invalid, one gets
an error. Initially, CompCert didn't take Clause 5.1.1.2 into account
here: the pp-number 0x1e-E was further parsed (*after* preprocessing)
into 3 tokens 0x1e, - and E; hence the result 0x1e.

One may find the rules for pp-number awkward, but they have been
designed on purpose. According to the C rationale:

  The notion of preprocessing numbers was introduced to simplify the
  description of preprocessing. It provides a means of talking about
  the tokenization of strings that look like numbers, or initial
  substrings of numbers, prior to their semantic interpretation. In
  the interests of keeping the description simple, occasional spurious
  forms are scanned as preprocessing numbers. For example, 0x123E+1 is
  a single token under the rules. The C89 Committee felt that it was
  better to tolerate such anomalies than burden the preprocessor with
  a more exact, and exacting, lexical specification. It felt that this
  anomaly was no worse than the principle under which the characters
  a+++++b are tokenized as a ++ ++ + b (an invalid expression), even
  though the tokenization a ++ + ++ b would yield a syntactically
  correct expression. In both cases, exercise of reasonable precaution
  in coding style avoids surprises.

and it is also important that things like 1m be seen as a single
preprocessing token (thus a pp-number) so that the following code
yields an identifier:

  #define mkident(s) s ## 1m
  int mkident(int) = 0;

-- 
Vincent Lefèvre <address@hidden> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


