bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#6524: RFC: modules for generic unordered sets and mappings


From: Jim Meyering
Subject: bug#6524: RFC: modules for generic unordered sets and mappings
Date: Fri, 02 Jul 2010 10:21:25 +0200

Eric Blake wrote:

> On 07/01/2010 06:44 PM, Paul Eggert wrote:
>> I've just gone through "du", and it's fresh in my
>> mind, so I thought I'd run through what I needed there, to see how
>> well it maps to the proposal.
>>
>> I needed three kinds of hash tables.
>>
>> * Table A is indexed by device number (a dev_t value) and the value is a
>>   secondary hash table.  Typically the number of devices is relatively
>>   small.
>>
>> * Table B is indexed by inode number (an ino_t value) and has no value.
>>   All that matters is whether the key is present.  So it is a set,
>>   not a general mapping.  Often there are many, many inode numbers.
>>
>> * The gnulib hash package uses void * keys, but having a void * value
>>   that exists only to point to an ino_t value wastes memory.  So, if
>>   an inode number is small (less than M, say), it maps to itself.  If
>>   it is large, it is a key into a third table (table C) whose values
>>   are mapped inode numbers.  (Values in table C are always M or
>>   greater.)  The mapped inode number (whether taken from table C, or
>>   used directly) is cast to void * and then used as the index into
>>   table B.  In practice, most inode numbers are less than M so this is
>>   a storage win.
>
> We need to be careful on cygwin.
>
> $ ls -i | head
>  1125899907178954 ABOUT-NLS
> 28147497671067760 AUTHORS
>  1970324837308495 COPYING
>  9851624184873962 ChangeLog
>  4785074604197847 ChangeLog-2005
>  1970324837180710 ChangeLog-2006
>  1970324837078885 ChangeLog-2007
>  2251799813904421 ChangeLog-2008
>   281474977732635 GNUmakefile
>  1125899907781258 GNUmakefile~
>
> sizeof(void*) == sizeof(size_t) == 4, but sizeof(ino_t) == 8, and most
> inodes are quite randomly dispersed but definitely larger than 4 bytes.
>  Does your scheme work well at mapping cygwin's 8-byte inodes into
> 4-byte pointers?

It's better not to modify algorithms in order to optimize
for systems with 4-byte pointers.

We can find/use a set/hash package that can efficiently
handle keys of type uintmax_t regardless of pointer size.

Combine that with a slight re-design suggested by Paul
and (before him, privately) by Tim Waugh -- to map device
numbers to inode hash tables rather than to small integers
that need to be encoded -- and you get all the benefits
with much less complexity.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]