Re: hash_rehash allocation failure

From: Jim Meyering
Subject: Re: hash_rehash allocation failure
Date: Wed, 17 Jun 2009 23:43:12 +0200
Eric Blake wrote:
> Jim Meyering <jim@meyering.net> writes:
>> Eric Blake wrote:
>> > + /* Guarantee that once we start moving entries into the new table,
>> > + we will not need any further memory allocations. We only need
>> > + new allocations if the hasher consolidates multiple buckets from
>> > + the old size into one bucket with the new size (a sign of a bad
>> > + hasher, but we must deal with it). Therefore, we need merely
>> > + check that there are at least table->n_buckets_used-1 entries on
>> > + the free entry list for the worst case collision (everything gets
>> > + combined into one bucket during the rehash). */
>>
>> That strikes me as pessimistic.
>> Imagine that we have a full table, with n_buckets_used == 20
>> and n_buckets == 23.
>>
>> When rehashing to say 31, odds are very good that we can
>> get by with *no* entries on the free list -- if we're careful.
>> So requiring 22 is a recipe for unnecessary failure.
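The pessimistic guarantee under discussion amounts to a pre-check along
these lines (a sketch against the internal layout of gnulib's hash.c;
enough_spares_for_worst_case is a hypothetical helper, not part of the
hash API):

    /* Walk table->free_entry_list and check that at least
       n_buckets_used - 1 spare entries are available before starting
       the move, reserving for the worst case up front.  */
    static bool
    enough_spares_for_worst_case (const Hash_table *table)
    {
      size_t needed = table->n_buckets_used ? table->n_buckets_used - 1 : 0;
      size_t spares = 0;
      struct hash_entry const *e;

      for (e = table->free_entry_list; e && spares < needed; e = e->next)
        spares++;
      return needed <= spares;
    }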
>
> You are correct that in the common case, we probably won't need any extra free
> entries. And I can prove that in the worst case, we need 19 free entries (not
> 22), when all 20 buckets of the old table hash to the same bucket of the new
Yep. n_buckets_used - 1 is what I meant, not 22.
> (and this would be easy to reproducibly test, by writing a hasher that is
> well-behaved for the initial bucket size but returns a constant for the new
> size).
> So the question is now whether we want to be pessimistic and cause unnecessary
> memory pressure, or optimistic but have more work to undo if our optimism was
> wrong. But it is usually the right approach to cater to the common case, so
> I'm starting to lean towards the optimistic approach with complicated undo if
> we guessed wrong.
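For concreteness, a degenerate hasher of the sort Eric describes might look
like this minimal sketch (it matches gnulib's Hash_hasher signature;
INITIAL_SIZE is a hypothetical constant equal to the candidate size passed
to hash_initialize):

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical: the candidate size passed to hash_initialize.  */
    #define INITIAL_SIZE 23

    /* Well behaved while the table keeps its initial size, but constant
       once the table grows, so a rehash funnels every used bucket of the
       old table into bucket 0 of the new one.  */
    static size_t
    degenerate_hasher (const void *data, size_t n_buckets)
    {
      if (n_buckets <= INITIAL_SIZE)
        return (uintptr_t) data % n_buckets;
      return 0;
    }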
>
> My first patch attempt[1] was to provide cleanup on allocation failure; it
> was only my second version that switched over to the pessimistic version of
> guaranteeing no possibility of allocation failure to avoid the need for
> cleanup in the first place.  The algorithm you describe below rather closely
> resembles my first version (unfortunately, my first attempt is not posted in
> any public repo at the moment).
>
> [1] http://article.gmane.org/gmane.comp.lib.gnulib.bugs/17849
>
>>
>> Right after your initial report, I began rewriting hash_rehash
>> to work as follows:
>>
>> allocate new table
>>
>> [ optimization to minimize possibility of unnecessary failure:
>> in the old table,
>> liberate any overflow (chained) entries by moving "data"
>> into empty buckets ]
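A sketch of that liberation pass, assuming the entry layout in gnulib's
hash.c (bucket heads live in the array [table->bucket, table->bucket_limit);
overflow entries are heap-allocated and chained via ->next):

    /* Park each chained entry's data in some empty head of the *old*
       table (placement is irrelevant, since everything gets rehashed
       afterwards) and return the freed entry to the free list.  */
    static void
    liberate_overflow_entries (Hash_table *table)
    {
      struct hash_entry *bucket;
      struct hash_entry *empty = table->bucket;

      for (bucket = table->bucket; bucket < table->bucket_limit; bucket++)
        while (bucket->data && bucket->next)
          {
            struct hash_entry *overflow = bucket->next;

            /* Find the next empty bucket head, if any remain.  */
            while (empty < table->bucket_limit && empty->data)
              empty++;
            if (empty == table->bucket_limit)
              return;

            empty->data = overflow->data;
            table->n_buckets_used++;

            bucket->next = overflow->next;
            overflow->next = table->free_entry_list;
            table->free_entry_list = overflow;
          }
    }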
>
> I hadn't thought of that.  It frees (n_buckets - n_buckets_used) entries up
> front, leaving less pressure for the rest of the algorithm.  But it is
> worthless for my pessimistic algorithm (why? because although you have more
> free entries, you have also increased n_buckets_used by the same amount, so
> you have ended up only wasting time in moving data).  Even for the
> optimistic approach, it means that a failed allocation requires one more
> pass through the original table to undo this effect.  And in the optimal
> case, it makes no difference (with a perfect hasher, there are no overflow
> entries in the first place).  At this point, I'm inclined to say that it
> probably won't help, unless we can come up with a contrived hasher where we
> really do see higher memory pressure mid-move than with either the pre- or
> post-resize table, such that reducing pressure up front would be worthwhile.
I agree. Just an idea, and probably not worthwhile.
>> iterate through old table members, hashing them into the new table,
>>
>> [Possible optimization: process all overflow-entries before
>> any single-entry bucket. But that means two passes.
>> Alternatively: one pass through buckets, but for chains of
>> length > 1, process all allocated entries before the first,
>> non-allocated one, which (hence) might require an allocation
>> in the new table. ]
>
> My first attempt used a full two-pass algorithm for the cleanup case.  With
> one pass, the allocations and frees are intermingled; with two passes, all
> frees occur before any allocations.  However, I have been unable (so far) to
> contrive any such hasher that would show a difference in worst-case pressure
> with the intermingled operation.  So, for better cache locality, I'd rather
> avoid two passes in the common case.  But we definitely want to code things
> to move the overflow first and bucket head last within each bucket, as that
> is a trivial rewrite with obvious benefits.
I agree. Two-pass sounds expensive.
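The overflow-first, head-last order might be coded per bucket roughly as
follows (a sketch against hash.c's internals; allocate_entry stands for
hash.c's take-from-the-free-list-or-malloc helper, and the
n_buckets_used/n_entries bookkeeping is elided):

    /* Move one old bucket's worth of data into NEW_TABLE: overflow
       entries first (each donates its own struct, so no allocation),
       the bucket head last (the only move that may need a fresh entry).  */
    static bool
    transfer_bucket (Hash_table *new_table, struct hash_entry *bucket)
    {
      struct hash_entry *cursor = bucket->next;

      while (cursor)
        {
          struct hash_entry *next = cursor->next;
          size_t idx = new_table->hasher (cursor->data, new_table->n_buckets);
          struct hash_entry *head = new_table->bucket + idx;

          if (head->data)
            {
              /* Collision: chain the recycled entry, allocation-free.  */
              cursor->next = head->next;
              head->next = cursor;
            }
          else
            {
              /* Data moves into the empty head; the entry struct
                 becomes a spare for later.  */
              head->data = cursor->data;
              cursor->next = new_table->free_entry_list;
              new_table->free_entry_list = cursor;
            }
          cursor = next;
        }

      /* Bucket head last: only here can an allocation be required.  */
      {
        size_t idx = new_table->hasher (bucket->data, new_table->n_buckets);
        struct hash_entry *head = new_table->bucket + idx;

        if (!head->data)
          head->data = bucket->data;
        else
          {
            struct hash_entry *spare = allocate_entry (new_table);
            if (!spare)
              return false;     /* caller must undo the moves so far */
            spare->data = bucket->data;
            spare->next = head->next;
            head->next = spare;
          }
      }
      return true;
    }

Only the final head move can fail, so at most one allocation attempt happens
per old bucket, which is what keeps the undo bounded.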
>> If, in spite of all that, allocation is required, restore to
>> the original table any entries that we've moved into the new one,
>> free the new one, and return false.  However, restoring them
>> may be tricky, so we'd have to be careful there, too.
>
> I think I've convinced myself that recovery is always possible without
> further allocation, although I'm still unsure whether there is ever a case
> where the cleanup would require a two-pass algorithm because the one-pass
> approach has higher pressure.
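One way to see why allocation-free recovery is plausible: every datum now
parked in a new-table bucket head either never owned an entry struct (it was
an old bucket head) or donated its struct to the free list when it moved, so
the walk back with the old bucket count can always chain from the free list.
A sketch, reusing the hypothetical transfer_bucket above and assuming the
two tables share one free list during the migration:

    static void
    undo_rehash (Hash_table *old_table, Hash_table *new_table)
    {
      struct hash_entry *bucket;

      /* Hand the spares back so allocate_entry can pop them.  */
      old_table->free_entry_list = new_table->free_entry_list;
      new_table->free_entry_list = NULL;

      for (bucket = new_table->bucket; bucket < new_table->bucket_limit;
           bucket++)
        if (bucket->data && !transfer_bucket (old_table, bucket))
          abort ();   /* unreachable: the free list holds enough spares */
    }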
>
>>
>> This approach has the added advantage of decreasing the memory
>> footprint slightly, since we're more likely to reuse existing
>> allocated entries than to allocate new ones that would eventually
>> end up on the free list.
>
> It decreases the memory footprint of the table, but does increase the code
> size. But that's the price of avoiding worst-case pessimism (and besides, a
> code size increase of even a few k is probably much smaller than the cost of
> pessimistically overallocating, where real world resizing will probably be in
> the millions of hash table entries before memory pressure is an issue).
>
> Do you want me to resurrect the bits of my first patch, and submit something
> that recovers from failure rather than preventing it?
Sure! Thanks.