[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: RFC: modules for generic unordered sets and mappings
From: |
Jim Meyering |
Subject: |
Re: RFC: modules for generic unordered sets and mappings |
Date: |
Fri, 02 Jul 2010 10:21:25 +0200 |
Eric Blake wrote:
> On 07/01/2010 06:44 PM, Paul Eggert wrote:
>> I've just gone through "du", and it's fresh in my
>> mind, so I thought I'd run through what I needed there, to see how
>> well it maps to the proposal.
>>
>> I needed three kinds of hash tables.
>>
>> * Table A is indexed by device number (a dev_t value) and the value is a
>> secondary hash table. Typically the number of devices is relatively
>> small.
>>
>> * Table B is indexed by inode number (an ino_t value) and has no value.
>> All that matters is whether the key is present. So it is a set,
>> not a general mapping. Often there are many, many inode numbers.
>>
>> * The gnulib hash package uses void * keys, but having a void * value
>> that exists only to point to an ino_t value wastes memory. So, if
>> an inode number is small (less than M, say), it maps to itself. If
>> it is large, it is a key into a third table (table C) whose values
>> are mapped inode numbers. (Values in table C are always M or
>> greater.) The mapped inode number (whether taken from table C, or
>> used directly) is cast to void * and then used as the index into
>> table B. In practice, most inode numbers are less than M so this is
>> a storage win.
>
> We need to be careful on cygwin.
>
> $ ls -i | head
> 1125899907178954 ABOUT-NLS
> 28147497671067760 AUTHORS
> 1970324837308495 COPYING
> 9851624184873962 ChangeLog
> 4785074604197847 ChangeLog-2005
> 1970324837180710 ChangeLog-2006
> 1970324837078885 ChangeLog-2007
> 2251799813904421 ChangeLog-2008
> 281474977732635 GNUmakefile
> 1125899907781258 GNUmakefile~
>
> sizeof(void*) == sizeof(size_t) == 4, but sizeof(ino_t) == 8, and most
> inodes are quite randomly dispersed but definitely larger than 4 bytes.
> Does your scheme work well at mapping cygwin's 8-byte inodes into
> 4-byte pointers?
It's better not to modify algorithms in order to optimize
for systems with 4-byte pointers.
We can find/use a set/hash package that can efficiently
handle keys of type uintmax_t regardless of pointer size.
Combine that with a slight re-design suggested by Paul
and (before him, privately) by Tim Waugh -- to map device
numbers to inode hash tables rather than to small integers
that need to be encoded -- and you get all the benefits
with much less complexity.