Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'

gnu-arch-users

From:	Denys Duchier
Subject:	Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'
Date:	Thu, 21 Apr 2005 02:05:22 +0200
User-agent:	Gnus/5.110003 (No Gnus v0.3) Emacs/21.4 (gnu/linux)

Tom Lord <address@hidden> writes:

> Thank you for your experiment.

you are welcome.

> I think that to a large extent you are seeing artifacts
> of the questionable trade-offs that (reports tell me) the
> ext* filesystems make.   With a different filesystem, the 
> results would be very different.

No, this is not the only thing that we observe.  For example, here are the
reports for the following two experiments:

Indexing method = [2]

Max keys at level  0:     256
Max keys at level  1:     108
Total number of dirs:     257
Total number of keys:   21662
Disk footprint      :    1.8M

Indexing method = [4 4]

Max keys at level  0:   18474
Max keys at level  1:       5
Max keys at level  2:       1
Total number of dirs:   40137
Total number of keys:   21662
Disk footprint      :    157M

Notice the huge number of directories in the second experiment: 40137 of them
for 21662 keys.  They don't help at all in performing discrimination, since no
directory below the top level holds more than 5 entries.

> I'm imagining a blob database containing many revisions of the linux
> kernel.  It will contain millions of blobs.

It is very easy to write code that uses an adaptive discrimination method
(i.e. when a directory becomes too full, introduce an additional level of
discrimination and rehash).  In fact I have code that does that (rehashing
when the size of a leaf directory exceeds 256), but the [2] method above never
triggers it, even with 21662 keys: its fullest leaf directory holds only 108.
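
To make the adaptive idea concrete, here is a minimal sketch.  It is not the
code mentioned above: it models directories in memory instead of touching the
filesystem, and the names Dir, insert and leaves, as well as the width
parameter, are made up for illustration.  It assumes fixed-length keys (e.g.
hex digests) and splits a leaf as soon as it holds more than `limit` keys.

```python
class Dir:
    """In-memory stand-in for a directory: a leaf holding key suffixes,
    or an inner node holding subdirectories indexed by a key prefix."""

    def __init__(self):
        self.keys = set()     # key suffixes stored here while a leaf
        self.subdirs = None   # dict prefix -> Dir once the leaf is split


def _split_if_full(node, width, limit):
    """Add one level of discrimination below `node` if it is too full."""
    if len(node.keys) <= limit:
        return
    if any(len(k) <= width for k in node.keys):
        return  # keys too short to discriminate any further
    old = node.keys
    node.keys, node.subdirs = set(), {}
    for k in old:  # rehash: redistribute keys by their next `width` chars
        node.subdirs.setdefault(k[:width], Dir()).keys.add(k[width:])
    for child in node.subdirs.values():
        _split_if_full(child, width, limit)


def insert(root, key, width=2, limit=256):
    """Insert `key`, descending through the levels that already exist."""
    node = root
    while node.subdirs is not None:
        prefix, key = key[:width], key[width:]
        node = node.subdirs.setdefault(prefix, Dir())
    node.keys.add(key)
    _split_if_full(node, width, limit)


def leaves(node):
    """Yield every leaf directory below (and including) `node`."""
    if node.subdirs is None:
        yield node
    else:
        for child in node.subdirs.values():
            yield from leaves(child)
```

An on-disk version would replace the sets and dicts with mkdir and rename
calls, but the splitting logic would stay the same.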

Just in case there is some interest, I attach below the python scripts which I
used for my experiments:

To create an indexed archive:

        python build.py SRC DST N1 ... Nk

where SRC is the root directory of the tree to be indexed, and DST names the
root directory of the indexed archive to be created.  N1 through Nk are integers
that each indicate how many chars to chop off the key to create the next level
indexing key.
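
As an illustration of how N1 ... Nk carve up a key, here is a short sketch.
It is not taken from build.py; the function name key_to_path is hypothetical,
and it assumes that whatever remains of the key after the last chop becomes
the file name at the bottom level.

```python
import os


def key_to_path(key, widths):
    """Map `key` to a relative path by chopping off widths[0] chars,
    then widths[1] chars, and so on.  The remainder of the key is
    used as the final path component (an assumption of this sketch)."""
    if sum(widths) >= len(key):
        raise ValueError("key too short for the requested widths")
    parts = []
    rest = key
    for n in widths:
        parts.append(rest[:n])   # next-level indexing key
        rest = rest[n:]          # what is left to discriminate on
    parts.append(rest)
    return os.path.join(*parts)
```

With widths [2], a hex key is filed under one of at most 256 directories;
with widths [4, 4], it goes two directory levels deep before the file itself.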

        python info.py DST

collects and then prints out statistics about an indexed archive.
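
For readers without the attachments, the counts in the reports above can be
approximated along these lines.  This is a sketch, not info.py itself: the
function name archive_stats is hypothetical, and the definitions are inferred
from the reported numbers (the root counts as a directory, and "max keys at
level i" is taken to be the largest number of entries in any directory at
depth i).  Disk footprint is left out.

```python
import os
from collections import defaultdict


def archive_stats(root):
    """Return (max entries per depth, total dirs, total keys) for `root`."""
    root = os.path.abspath(root)
    max_entries = defaultdict(int)   # depth -> largest entry count seen
    total_dirs = 0
    total_keys = 0
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        total_dirs += 1                  # the root itself is counted
        total_keys += len(filenames)     # each file stands for one key
        entries = len(dirnames) + len(filenames)
        max_entries[depth] = max(max_entries[depth], entries)
    return dict(max_entries), total_dirs, total_keys
```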

For example, the invocation that relates to your original proposal would be:

        python build.py /usr/src/linux store 4 4
        python info.py store

build.py
Description: script to build an indexed archive

info.py
Description: script to print statistics about an indexed archive

Cheers,

PS: I should mention again that my indexed archives only contain empty files,
because I am only interested in measuring overhead.

-- 
Dr. Denys Duchier - IRI & LIFL - CNRS, Lille, France
AIM: duchierdenys
