[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] hash: declare some functions with the warn_unused_result att

From: Jim Meyering
Subject: Re: [PATCH] hash: declare some functions with the warn_unused_result attribute
Date: Tue, 16 Jun 2009 18:36:57 +0200

Eric Blake wrote:
> Jim Meyering <jim <at> meyering.net> writes:
> My code inspection didn't turn up obvious problems, but I have not yet tested
> with USE_OBSTACK either, so I wouldn't be surprised to find bit rot.  If we do
> get it working, would it be worth making the use of obstacks a runtime 
> decision
> via hash_tuning, rather than a compile-time decision?

Code size is one potential reason not to make it a run-time decision.
Consider an application that doesn't otherwise use obstacks and whose
hash table usage is unlikely to benefit from it.  All they'd get is
the code size penalty.

> There might also be alternative implementations possible with better
> performance.  For example, instead of malloc'ing free entries one at a time 
> and
> tracking bucket overflows as pointers to malloc'd objects, we could instead
> malloc/realloc an array of overflows, with bucket overflows tracked as indices
> into that array.  This would provide some of the same benefits as obstack
> allocation (no per-entry malloc overhead, and fewer calls to malloc and with
> larger chunks of malloc claimed per call), such that it is no longer necessary
> to use an obstack for free-entry management in the first place.

I've found it useful to have one malloc'd/free'd buffer per entry
when debugging memory corruption problems.  For performance/production,
I would use the obstack-based code, and when testing, I'd compile/test
both with and without USE_OBSTACK.

> Another hash implementation I am familiar with treats the slots array as a
> circular array, and has no overflow linked lists (thus, the entire hash table
> is contained within a single malloc'd block, rather than the hash.c
> implementation of one block for bucket heads and other blocks for collisions).
> Instead, the hasher merely determines where in the circular array to start
> searching, and on collisions, you end up searching until you hit a free slot.
> But this has other ramifications; searching for a match means traversing
> through the array until you encounter an empty slot, and you no longer have 
> the
> guarantee that the comparison function will always be called with two elements
> with the same hash.  Also, whereas linked lists only search the set of
> collisions, a nearly full circular array may end up searching the bulk of the
> table before it arrives at the next free entry.  However, I have never
> benchmarked this implementation to see how it fares when compared with the 
> more
> traditional bucket of linked lists.

Performance-improving patches are always welcome ;-)
However, finding a sufficiently representative set of applications on
which to perform benchmarks may be a challenge, considering the variety
of hash functions and differing sizes of objects in the table.

> We may also want to borrow from Bruno's idioms in the gl_list arena, where
> hash_initialize is implemented as setting up a vtable optimized to an
> appropriate implementation, such that you could then experiment with different
> implementations by swapping the vtable at initialization time.
>> I've begun writing a test module, and may even find the time to test
>> the USE_OBSTACK code, too.  As you say, it also needs to be improved
>> to handle allocation failure, probably the same way remove.c now does:
>>   http://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=a5111af33ea6a5d27
> Unrelated to this topic, but in looking at that patch, I noticed you added the
> following line:
>  cleanup:;
> Any reason for the empty statement after the label?

While it appears not to have been the case here, I've sometimes had to
add a semicolon after a label's colon because otherwise the label would
refer to a following declaration, and that's not valid C.  A label must
refer to a "statement", not a declaration, so if I always use a semicolon,
I don't have to think about it.

> Is this worth a syntax check in maint.mk?

Might be hard to get right in code that allows decl-after-stmt.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]