bug#8794: cons_to_long fixes; making 64-bit EMACS_INT the default

From: Eli Zaretskii
Subject: bug#8794: cons_to_long fixes; making 64-bit EMACS_INT the default
Date: Fri, 03 Jun 2011 22:43:55 +0300

> Date: Fri, 03 Jun 2011 10:53:55 -0700
> From: Paul Eggert <address@hidden>
> CC: address@hidden
>
>   #include <stdio.h>
>   #include <stdlib.h>
>   #include <string.h>
>
>   int
>   main (void)
>   {
>     int big = 536870913;
>     int *p = malloc (big * sizeof *p);
>     if (!p)
>       return 1;
>     memset (p, 0xef, big * sizeof *p);
>     printf ("%x %x\n", p[0], p[big - 1]);
>     return 0;
>   }
> On my RHEL 5.6 host, built as a 32-bit executable, this outputs:
>   $ gcc -m32 t.c
>   $ ./a.out
>   efefefef efefefef

How does this work on the machine code level?  Doesn't the code need
to load a pointer to p into a 32-bit register, in order to reference
the array?  On Windows, I see that the GCC-produced code does this:

  movl   $0x20000001,0xfffffffc(%ebp)
  mov    0xfffffffc(%ebp),%eax
  shl    $0x2,%eax

and then uses EAX to reference the array elements.

That last left shift by 2 bits will surely overflow for values of
`big' that are larger than 0x3fffffff (not 0x20000001, the value you
used).  So maybe 2GB is not the limit, but 4GB surely is.  You promise
much more.

> Perhaps you're thinking of pointer subtraction?  That often stops working on
> arrays larger than 2 GiB.  But this is easy to program around.

Well, then we need to program around that, _before_ we promise buffers
larger than 2GB on 32-bit hosts.  E.g., look how we address characters
in buffers:

  /* Address of beginning of buffer.  */
  #define BUF_BEG_ADDR(buf) ((buf)->text->beg)

  /* Return character code of multi-byte form at byte position POS in BUF.
     If POS doesn't point to the head of a valid multi-byte form, only the byte at
     POS is returned.  No range checking.  */

  #define BUF_FETCH_MULTIBYTE_CHAR(buf, pos)                            \
    (_fetch_multibyte_char_p                                            \
       = (((pos) >= BUF_GPT_BYTE (buf) ? BUF_GAP_SIZE (buf) : 0)        \
          + (pos) + BUF_BEG_ADDR (buf) - BEG_BYTE),                     \
     STRING_CHAR (_fetch_multibyte_char_p))

The pointer arithmetic will wrap around on 32-bit hosts here, because
a pointer is loaded into a 32-bit register before it's dereferenced.
Am I missing something?

> And anyway, even if we assume buffers and strings are all smaller
> than 2 GiB, an EMACS_INT wider than 32 bits is still needed for
> large buffers and strings, due to the tag bits.

I wasn't saying a 64-bit EMACS_INT wasn't an advantage.  It is.  But I
very much doubt that we could have buffers and strings larger than 4GB
on 32-bit hosts.  Your changes to the docs seem to promise much larger
buffers, which I don't think is feasible.

> > The *_MAX macros need limits.h, but I don't see it being included by
> > data.c.  Did I miss something?
> Those are OK because lisp.h includes inttypes.h.  INTMAX_MAX and
> UINTMAX_MAX are defined by inttypes.h (actually, stdint.h, but
> inttypes.h includes stdint.h).

What about ULONG_MAX in this patch to xselect.c:

> -      *data_ret = (unsigned char *) xmalloc (sizeof (long) + 1);
> -      (*data_ret) [sizeof (long)] = 0;
> -      (*(unsigned long **) data_ret) [0] = cons_to_long (obj);
> +      *data_ret = (unsigned char *) xmalloc (sizeof (unsigned long) + 1);
> +      (*data_ret) [sizeof (unsigned long)] = 0;
> +      (*(unsigned long **) data_ret) [0] = cons_to_unsigned (obj, ULONG_MAX);

?  There are also USHRT_MAX, LONG_MAX, CHAR_MAX, and SHRT_MAX there,
but I see no limits.h being included.  How did that compile for you?
