emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Proposal: immediate strings


From: Stefan Monnier
Subject: Re: Proposal: immediate strings
Date: Thu, 24 May 2012 01:17:31 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.1.50 (gnu/linux)

> +#if __GNUC__ > 1 /* Any GCC still in use should support this.  */
> +#define PACKED __attribute__((packed))
> +#else
> +#define PACKED
> +#endif

Let's not use "packed" structures: too much trouble.

> +      if (nbytes >= STRING_IMM_MAX)
> +     /* Impossible immediate string.  */

Actually, we can allocate all the pure strings as "immediate strings"
(within the limits of imm.nbytes, of course), in which case we'd have
strings with more bytes than STRING_IMM_MAX, so while this assertion may
have been useful for debugging, we'll want to get rid of it.

> +      if (nbytes < STRING_IMM_MAX)
> +     /* Impossible normal string.  */
> +     abort ();

Please use "eassert (nbytes >= STRING_IMM_MAX);" instead.

> +#ifdef GC_STRING_STATS
> +  total_imm_strings = total_dat_strings = 0;
> +  total_imm_bytes = total_dat_bytes = 0;
> +  total_imm_intervals = total_dat_intervals = 0;
> +#endif

Have you looked at some of those stats?  I'd be interested to know which
proportion of the strings are "immediate and using intervals",
v.s. "just slightly too big to be immediate, but not using intervals".

> +             /* Fill data with special pattern. Used by
> +                GC to find dead immediate strings.  */
> +             memset (s->u.imm.data, 0xff, STRING_IMM_MAX);

Is that really 100% reliable?  Why not use the `intervals' field with
a 0xdeadbeef pointer instead?

> +#define SDATA(string)                (XSTRING (string)->u.imm.immbit ? \
> +                              (XSTRING (string)->u.imm.data + 0) : \
> +                              (XSTRING (string)->u.dat.data + 0))

IIUC the "+ 0" are unneeded here, because using a "..?..:.." already
makes sure we have an rvalue.  Same for SCHARS.

> +   This assumes that sizeof (EMACS_INT) is equal to sizeof (void * ).  */
> +#define STRING_IMM_MAX (3 * sizeof (EMACS_INT) - 2)

Miles already pointed out that this is not a valid assumption.

>    do { if (EQ (STR, empty_multibyte_string))  \
>        (STR) = empty_unibyte_string;  \
> -    else XSTRING (STR)->size_byte = -1; } while (0)
> +    else if (XSTRING (STR)->u.imm.immbit) \
> +      XSTRING (STR)->u.imm.size_byte = -1; \
> +    else \
> +      XSTRING (STR)->u.dat.size_byte = -1; } while (0)

BTW, TAB in cc-mode will not only properly indent the code, but also
nicely align the \ at the end of lines.  ;-)

> +    union {
> +
> +      /* GC mark bit and subtype bit are in IMM just by convention - when
> +      IMMBIT is 0, the DAT field is used except it's UNUSED field.  */
> +
> +      struct {
> +     unsigned gcmarkbit : 1;
> +     unsigned immbit : 1;
> +     EMACS_INT size : 7;
> +     EMACS_INT size_byte : 7;
> +     unsigned char data[STRING_IMM_MAX];
> +      } PACKED imm;
> +      
> +      struct {
> +     unsigned unused : 2; /* Do not access this placeholder.  */
> +     EMACS_INT size : BITS_PER_EMACS_INT - 1;
> +     EMACS_INT size_byte : BITS_PER_EMACS_INT - 1;
> +     unsigned char *data;
> +      } PACKED dat;
> +
> +    } u;
>    };

Access to your `size' field is inefficient because it will straddle
two words.  We could move the gcmarkbit as Paul suggests (at the cost of
having to check immbit before knowing where gcmarkbit is located), or
maybe we can move both the mark bit and the immbit into the `intervals'
fields (after all, we know those are aligned on a multiple of at least
4 on all architectures on which Emacs is known to run, so we have
2 bits free for (ab)use there).


        Stefan



reply via email to

[Prev in Thread] Current Thread [Next in Thread]