bug-gnulib

From: Bruno Haible
Subject: Re: unions and aliasing (was: Re: 'signbit' patch to use 'copysign' if available)
Date: Tue, 10 Apr 2007 23:05:35 +0200
User-agent: KMail/1.5.4

Hello Paul,

> >> Normally the 'signbit' implementation relies on undefined behavior, as
> >> it accesses the "wrong" member of a union
> >
> > This is the point of a 'union'. The code is not casting pointers; it is
> > using 'union's for the purpose they were defined for.
> 
> Nevertheless, the behavior is undefined (see section 6.2.6 of C99),
> and optimizing compilers are free to substitute other behavior.  The
> code might work in the Autoconf test, but fail in other uses.  This
> sort of thing has caused real problems in the Linux kernel, and it's
> why the Linux folks compile with special GCC options.

In C99 6.2.6.1.(4) there is a definition of the term "object representation",
together with a guarantee that memcpy() can be used to extract it.
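
As a sketch of that memcpy() route (not the gnulib code): the helper name
signbit_via_memcpy is made up for illustration, and the code assumes that
'double' is a 64-bit IEEE 754 type, that uint64_t exists, and that integers
and doubles share the same byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not the gnulib implementation.
   Assumption: double is 64-bit IEEE 754, with the sign in the top bit,
   and has the same byte order as uint64_t.  */
static int
signbit_via_memcpy (double x)
{
  uint64_t bits;

  assert (sizeof bits == sizeof x);
  memcpy (&bits, &x, sizeof bits);   /* copy the object representation */
  return (int) (bits >> 63);         /* 1 if the sign bit is set, else 0 */
}
```

Compilers typically turn a fixed-size memcpy() like this into a plain
register move, so the cost compared to a union access is usually small.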

In C99 6.5.(7) there is text which explicitly allows this kind of code:

   union { long double x; int y[6]; } m;
   m.x = ...;
   return m.y[0];

In C99 6.7.2.1.(14) there is text that allows casting a pointer to a member
of the union to a pointer to the union itself, and vice versa. Hence:

   union { long double x; int y[6]; } m;

implies

   (void*) &m.x == (void*) &m == (void*) &m.y.
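
The chained equality above is shorthand; in C it has to be written as two
separate comparisons. A minimal sketch of such a check, where the union tag
ld_words and the function name are invented for illustration:

```c
/* Same shape as the union in the text; the tag name is hypothetical.  */
union ld_words { long double x; int y[6]; };

/* Returns 1 if the union and both of its members start at the same
   address, 0 otherwise.  Only addresses are taken; nothing is read.  */
static int
members_share_address (void)
{
  union ld_words m;
  void *whole = &m;
  void *first = &m.x;
  void *second = &m.y;

  return whole == first && whole == second;
}
```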

So I don't even buy that the standard is muddy in this area. Rather, it looks
as if some implementations (including GCC) may not implement 6.5.(7) item 5
("an aggregate or union type ...") correctly.

In practice I can see three indications that the use of unions in signbit*.c
is OK:

1) In http://www.open-std.org/jtc1/sc22/wg14/www/docs/summary.htm
   there is a single defect report about aliasing, and it contains the text:
     "The current situation requires more consideration, but general consensus
      seems to be;
      * Limit the use of pointers to union members,
      * Consensus for the visible alias rule exists,
      * The requirement of global knowledge is problematic,
      * Common understanding is that the union declaration must be visible in
        the translation unit."

2) This web page "Understanding Strict Aliasing"
     http://www.cellperformance.com/mike_acton/2006/06/understanding_strict_aliasing.html
   says:
     "Strictly speaking, reading a member of a union different from the one
      written to is undefined in ANSI/ISO C99 except in the special case of
      type-punning to a char*, similar to the example below: Casting to char*.
      However, it is an extremely common idiom and is well-supported by all
      major compilers. As a practical matter, reading and writing to any
      member of a union, in any order, is acceptable practice."

3) The gcc documentation is clear about it too:
    "Pay special attention to code like this:
          union a_union {
            int i;
            double d;
          };

          int f() {
            a_union t;
            t.d = 3.0;
            return t.i;
          }
     The practice of reading from a different union member than the one
     most recently written to (called "type-punning") is common.  Even
     with `-fstrict-aliasing', type-punning is allowed, provided the
     memory is accessed through the union type.  So, the code above
     will work as expected."

Do you have an indication that the problems faced in the Linux kernel
apply also to the case of references through the union variable, or only
when using pointers to the union members?

> I am worried about compilers that optimize away references to memory,
> as the C standard entitles them to.

I'm not worried about it, because if a compiler performs this optimization,
the unit test will catch it.
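
A rough sketch of what such a unit test could look like. This is not the
gnulib test; signbit_via_union is a hypothetical stand-in that finds the
sign bit at run time by comparing 1.0 and -1.0, where the real code uses
values determined at configure time. It assumes that 1.0 and -1.0 differ
only in the sign bit, as in IEEE 754.

```c
#include <assert.h>
#include <stddef.h>

#define NWORDS \
  ((sizeof (double) + sizeof (unsigned int) - 1) / sizeof (unsigned int))

typedef union { double value; unsigned int word[NWORDS]; } double_words;

/* Hypothetical union-based signbit: returns 1 if X is negative.  */
static int
signbit_via_union (double x)
{
  double_words plus, minus, arg;
  size_t i;

  plus.value = 1.0;
  minus.value = -1.0;
  arg.value = x;
  for (i = 0; i < NWORDS; i++)
    {
      /* The only bit that differs between 1.0 and -1.0 is the sign.  */
      unsigned int mask = plus.word[i] ^ minus.word[i];
      if (mask != 0)
        return (arg.word[i] & mask) != 0;
    }
  return 0;
}

/* Aborts if the union accesses were optimized into something else.  */
static void
test_signbit_via_union (void)
{
  volatile double zero = 0.0;  /* volatile: negate at run time */

  assert (!signbit_via_union (1.0));
  assert (signbit_via_union (-1.0));
  assert (!signbit_via_union (zero));
  assert (signbit_via_union (-zero));
}
```

Such a test only covers the compiler and the optimization level it is
built with, so it has to be compiled with the same options as the code it
protects.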

> Come to think of it, perhaps a better approach is to replace 'unsigned
> int' with 'unsigned char' in the implementation of signbit, of course
> redoing all the macros accordingly.  That would remove the objection
> of undefined behavior.

This can be a good fallback strategy. But given the evidence that the code
in place is fine (see above), and that 'unsigned int' accesses are an epsilon
more efficient than 'unsigned char' accesses, I prefer to challenge the future
:-)
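
For comparison, a sketch of the 'unsigned char' fallback. The names are
hypothetical, and the sign bit is again located at run time instead of
through the existing macros; the assumption is that 1.0 and -1.0 differ
only in the sign bit.

```c
#include <stddef.h>

typedef union
{
  double value;
  unsigned char byte[sizeof (double)];
} double_bytes;

/* Hypothetical byte-wise signbit: returns 1 if X is negative.
   Accesses through 'unsigned char' are permitted for any object.  */
static int
signbit_via_bytes (double x)
{
  double_bytes plus, minus, arg;
  size_t i;

  plus.value = 1.0;
  minus.value = -1.0;
  arg.value = x;
  for (i = 0; i < sizeof (double); i++)
    {
      /* Nonzero only in the byte that holds the sign bit.  */
      unsigned char mask = plus.byte[i] ^ minus.byte[i];
      if (mask != 0)
        return (arg.byte[i] & mask) != 0;
    }
  return 0;
}
```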

Bruno




