Re: Proposal: block-based vector allocator

From: Dmitry Antipov
Subject: Re: Proposal: block-based vector allocator
Date: Sun, 11 Dec 2011 17:18:09 +0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:8.0) Gecko/20111115 Thunderbird/8.0

On 12/10/2011 01:04 AM, Stefan Monnier wrote:

BTW, your code seems to compute the free list index by dividing by
word_size (= 4 on 32bit systems) rather than by 8, so half of the slots
will always be unused, IIUC.

You're correct, and this is fixed.

Let us know how it turns out.

It needs to be run again, since I introduced some fixes, including
those suggested by you.

If it's an arbitrary choice, it should not be called "page_size" but
"vector_block_size" or some such.  Otherwise, we should fetch its value
from the OS.

A better name is OK (the old one was a trace of mmap()-related experiments :-).

Where do you make use of this alignment?

Not needed any more.

This test looks dangerous (what if we allocate a large vector of this

It's impossible, because VECTOR_BLOCK_BYTES is an 'exact fit' for the largest
possible block-allocated vector. That's why I'm using xmalloc/mem_insert
and not lisp_malloc: VECTOR_BLOCK_SIZE bytes are allocated for the block,
but only the block->data..block->data + VECTOR_BLOCK_BYTES range is marked as
vector memory within the mem_node tree. So, any MEM_TYPE_VECTORLIKE mem_node
with a VECTOR_BLOCK_BYTES-sized memory range is guaranteed to match a
vector block, and this vector block may contain the only vector of

Hmm... I didn't think of it, but this is a disadvantage of
non-segregated blocks: during conservative scanning, you need to scan
the vector block to determine if we really have a live_vector_p whereas
segregated blocks would just divide the offset by the block's
vector size.

Yes, but this linear scan may be optimized:
 1) a scan is required only if the examined pointer is less than
    the block's bump allocation pointer;
 2) the scan should not look at addresses beyond the examined pointer.

For small enough vector blocks, it's probably not a big deal, but for
10-slot vectors on 32bit systems with 4KB blocks, that's about
100 iterations.

This is the worst-case scenario, where the scan has to run to the end of the
vector block and the block is full of small vectors. Although most vectors
are small, small vectors are usually allocated and die (become unreachable)
in groups, and such a group of adjacent dead vectors is subject to coalescing,
which reduces the number of vectors in the block. For the byte-compile
benchmark, this loop does ~12 iterations on average (for 64-bit code; expect a
larger value for 32-bit, since the average vector size is ~2x smaller).

BTW, what do you think about hooking buffer allocation into the common vector
allocation code? I find it controversial (buffers are mid-sized vector-like
objects, so it might be reasonable to generalize buffer allocation with
other vectors; on the other hand, buffers need special treatment for
liveness detection and special processing during sweep), but I want to try.


Attachment: vector_alloc.patch
Description: Text document
