Re: [GNUnet-developers] Insertion speed

From: Christian Grothoff
Subject: Re: [GNUnet-developers] Insertion speed
Date: Fri, 9 Aug 2002 18:11:23 -0500
User-agent: KMail/1.4.1

On Friday 09 August 2002 12:12 pm, Igor Wronsky wrote:
> On Tue, 6 Aug 2002, Niklas Höglund wrote:
> > I've experimented a bit with the insertion speed.  I did a benchmark
> > using different storage methods and file systems to see if I could get
> > gnunet to insert files faster.
> How about the 3 files with names "database.?"? What's their
> part in all this? The actual storage scheme doesn't affect
> how they are handled... hmm, randomly accessing lots of very
> small blocks might be hard for anything using disk media.

Well, yes. It would be nice to keep them in memory, but that
may not be possible (256 MB). But if we're using them a lot, the OS
may cache them in memory for us.

> Let's see. You want to access block A. First A is looked up
> from database.1 (and 2 and 3 if not found). Then we
> retrieve it from gdbm/tdbm/etc. This might mean moving the disk
> head more than 4 times. There is no reason why these
> reads should be near each other on the disk. Process lots of
> blocks. Enter lots of disk thrashing. And remember, boys and
> girls, moving that head is quite expensive.

Right, but notice that we should really have 99% of all hits in
database.1, and that 2 and 3 should hardly ever be used. Also, the most
frequent type of access is for an inbound query -- and most queries
will fail (not there in database.1), so we will *not* access gdbm/tdb
at all. This one-shot read in database.1 (even if not cached) should be
much faster than searching the database (which we cannot do anyway,
because the query is the triple-hash and we have the double-hash in the
database). Also note that database.X tells us whether the block is
stored in plaintext in some file and not in the gdbm/tdb database at all.
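
To make the lookup order concrete, here is a rough sketch in Python
(not the actual gnunetd code). The names IndexEntry, ContentStore and
lookup are made up for illustration, plain dicts stand in for the
database.X files and for gdbm/tdb, and the record layout is an
assumption. How exactly a miss in database.1 falls through to 2 and 3
is not spelled out here, so the sketch simply tries them in order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexEntry:
    # Hypothetical record layout; the real database.X format is not shown here.
    double_hash: bytes                    # key of the block in the block database
    plaintext_file: Optional[str] = None  # set if the block lives in a plain file


class ContentStore:
    def __init__(self, index_tables, block_db):
        # index_tables: list of dicts (stand-ins for database.1, .2, .3),
        #               each mapping triple-hash -> IndexEntry
        # block_db:     dict (stand-in for gdbm/tdb), double-hash -> block bytes
        self.index_tables = index_tables
        self.block_db = block_db
        self.block_db_reads = 0  # counts how often the block database is touched

    def lookup(self, triple_hash):
        # Step 1: consult the index tables in order; database.1 should
        # answer almost every query, hit or miss.
        entry = None
        for table in self.index_tables:
            entry = table.get(triple_hash)
            if entry is not None:
                break
        if entry is None:
            # Common case for inbound queries: not indexed, so the block
            # database is never accessed.
            return None
        # Step 2: the index says where the content lives.
        if entry.plaintext_file is not None:
            return ("file", entry.plaintext_file)
        self.block_db_reads += 1
        return ("block", self.block_db[entry.double_hash])


store = ContentStore(
    index_tables=[
        {b"t1": IndexEntry(double_hash=b"d1")},
        {},
        {b"t2": IndexEntry(double_hash=b"d2", plaintext_file="shared/file.bin")},
    ],
    block_db={b"d1": b"encrypted block"},
)
```

In this toy setup a query for an unknown triple-hash returns None
without ever reading the block database, which is the case the design
is optimized for.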

> Do we really need to do it this way?

Yes. But it's really only slow when you insert *and* the kernel cannot
cache the writes to database.X in memory. Design-wise, it's optimized
for the common case -- a lookup, not an insertion.

> btw, sometimes gnunetd takes a lot of time to start. Also,
> gnunet-stats, after running gnunetd for long, might take ages
> to produce results. They too seem to be spending their time
> on disk activity. These issues might have a connection.

Not quite. The reason is that gnunetd counts the number of entries
in the database on startup, and the database APIs only allow counting
via a "'for all entries' - do nothing" iteration. We also do a re-count
on gnunet-stats (which we may be able to avoid). A better solution
would be to store the 'count' as a value in the database and keep
it up-to-date at all times --- and never re-count. But that's just a
little piece of code missing that I intended to write on a rainy
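
The stored-count idea could look roughly like the Python sketch below.
This is only an illustration under assumptions: CountedStore and the
reserved key COUNT_KEY are invented names, and a plain dict stands in
for the gdbm/tdb handle (any mapping with bytes keys and values would
do).

```python
COUNT_KEY = b"__entry_count__"  # hypothetical reserved key, not a GNUnet name


class CountedStore:
    """Wraps a bytes->bytes mapping and keeps the entry count inside it."""

    def __init__(self, db):
        self.db = db
        if COUNT_KEY not in db:
            # One-time full scan for a database that has no counter yet;
            # after this, startup never has to iterate over all entries.
            existing = sum(1 for _ in db.keys())
            db[COUNT_KEY] = str(existing).encode()

    def count(self):
        # A single keyed read instead of a 'for all entries' pass.
        return int(self.db[COUNT_KEY].decode())

    def put(self, key, value):
        if key == COUNT_KEY:
            raise ValueError("key is reserved for the entry counter")
        is_new = key not in self.db
        self.db[key] = value
        if is_new:  # overwriting an existing entry must not change the count
            self.db[COUNT_KEY] = str(self.count() + 1).encode()

    def delete(self, key):
        if key == COUNT_KEY:
            raise ValueError("key is reserved for the entry counter")
        if key in self.db:
            del self.db[key]
            self.db[COUNT_KEY] = str(self.count() - 1).encode()


store = CountedStore({b"a": b"1"})  # pre-existing entry is counted once
store.put(b"b", b"2")
store.put(b"b", b"3")               # overwrite, count unchanged
store.delete(b"a")
```

The counter and the entry are written in two separate steps here, so a
crash in between would leave the count off by one; a real version
would need to handle that, for example by re-counting after an unclean
shutdown.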
