emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Quesition about Lisp_Vector


From: Stephen J. Turnbull
Subject: Re: Quesition about Lisp_Vector
Date: Sat, 12 Nov 2011 12:42:42 +0900

Ken Raeburn writes:

 > This is a common idiom for handling a structure with a
 > variable-sized array at the end.

More precisely, "a concrete struct moduling a (abstract) structure
containing exactly one array whose size is variable at compile time".
It's always possible to move one array to the end of the struct, but
this only works well if there's just one.  (If there's more than one,
I would consider the cognitive load of figuring out which one to
optimize and then later remembering that decision too great to be
worth it.)

 > The latest spec for the C language has support built in for a
 > slightly different mechanism, but this scheme used above has been a
 > common way to get around the absence of variable-sized arrays in
 > structures in traditional C for

That is to say, it's an optimization hack.  It saves the space of one
pointer, and the time of one pointer dereference (plus possible
allocation overhead), versus a member that is a pointer to an array
allocated elsewhere.  The alternative of a fixed-size array would be
unacceptable in Lisp, but obviously in most cases you'd end up with a
huge amount of space wasted since the size distribution of arrays very
likely follows a power law, so the fixed size "should" be "big".

 > a long time, and it works on most if not all C compilers.

clang (llvm) warns about accesses with a constant index greater than
zero, but it works.

 > The "len-1" means we want an array with "len" elements in it, but
 > the first element is covered by "sizeof *p" because "struct
 > Lisp_Vector" includes one array element already.  If you use an
 > index greater than zero, the value referenced may be stored in
 > extra padding space at the end of the structure, if there is any,

Ah, now that interests me.  If there actually is such space, most
constant index accesses are to a[1], and in theory we could allocate
array space to fill out the padding.  However, since the structure
alignment requirement is usually no more than sizeof(pointer), which
is the same as sizeof(Lisp_Object), and IIRC all of the Lisp_Vectors
warned about are general (ie, the elements are Lisp_Objects), we
probably don't buy anything.

I wonder if there is any real loss to allocating two or three elements
in the base structure (ie, how often vectors of length less than three
or four are used).

Aside to OP: In Emacs Lisp although strings (ie, arrays with bytes as
elements and a non-uniform indexing scheme) are vectors in the API,
they don't share an implementation with general vectors.  In fact the
string storage is allocated separately (another optimization, because
strings with len(data) % pointersize != 0 would waste substantial
amounts of memory, at least by 1980 standards), so there's always a
pointer indirection.  This usually is not too inefficient since many
operations on strings are of the form "get string data, do O(len)
operation on it", so the overhead of a pointer dereference to "get
string data" is tolerable.

 > If you use a compiler or interpreter designed to run C code with
 > extensive, very pedantic run-time checks, it *might* notice if the
 > array index used is greater than 0 and warn you.  But under pretty
 > much any normal compiler this should just work without any
 > complaint.

I guess you would consider clang "perverse" then.  I agree, but don't
care:  it saves about 40% of compilation time, and 10% of total build
time for me. :-)

It would also be possible to do dataflow analysis on loops and
determine that for non-degenerate termination values there would be an
access to elements after the declared end.  Ie,

    for (i=0; i < n; i++) foo(lispvec.data[i]);

will be flagged for n > 1.  Since there's no real point to a `for'
loop if n <= 1 always, issue a warning.  This would be stupid in C, of
course, because this idiom is very common (and typically used to
implement a structure that enables the program to check bounds etc at
runtime, as Emacs Lisp does).




reply via email to

[Prev in Thread] Current Thread [Next in Thread]