The cost (CPU or memory, it doesn't matter much here, and we ignore
fragmentation) of allocating a vector is something like:
and CPU costs alike, ignores fragmentation):
- outside of a vector-block: malloc + mem_node
- inside a vector-block: vector-block + frac * (malloc + mem_node + a-bit-more)
where "frac" is a fraction that's more or less vector-size /
VECTOR_BLOCK_BYTES.
So for small vectors, "frac" is small and since we expect the
vector-block overhead to be significantly smaller than malloc+mem_node
we win. But past a certain value of "frac", we're better off allocating
outside of a vector-block. As I said, I don't know exactly where that
threshold is, but using VECTOR_BLOCK_BYTES / 2 should be good enough.