
Re: [Monotone-devel] net.venge.monotone.experiment.performance


From: Eric Anderson
Subject: Re: [Monotone-devel] net.venge.monotone.experiment.performance
Date: Wed, 2 Aug 2006 17:38:58 -0700

Nathaniel Smith writes:
 > On Tue, Aug 01, 2006 at 12:39:05AM -0700, Eric Anderson wrote:
 > > Suitable for mainline:
 > >   eddb7e59361efeb8d9300ba0ddd7483272565097:
 > >     Make an upper bound on the amount of memory that will be consumed 
 > > during
 > >     a single commit.  Right now a commit will keep all of the compressed 
 > >     differences in memory, which is not a good thing on a big import of
 > >     an existing project.  Patch limits the amount stored in memory to 16MB,
 > >     has no effect on sync because sync is flushing every 1MB.
 > >     Detailed performance improvement included at the bottom since I forgot
 > >     to include it in the commit message.
 > 
 > [ don't use fprintf ]

Fixed in 033e5805502c5441ef539ab5cbc1a497e9930b00, as an update to the
commit above, to make it easier to take just the diff for those pieces.

 > Could you say a few more words to convince me of the correctness of
 > your approach?  I don't totally understand the existing pending write
 > stuff to comment more knowledgeably, but sqlite already has
 > bounded-memory write buffering, so if we're not using it we probably
 > have some reason, and this code makes it so we do silently use it in
 > some cases, and not in others. 

The change to buffer pending writes was introduced in
54bba40d63b8f4bd8103cf1049b3045a162e540a with a comment that it gives
about 20% performance gain on initial pull.  This is, I assume,
because 1) it batches multiple updates to the database together,
2) anything that gets canceled before being written is almost free,
and 3) repeated writes of the same item are handled faster in the
pending-write map than in the database.

I'm not sure what you mean about "this code makes it so we do silently
use it in some cases, and not in others".  The patch makes it so that
we won't ever buffer more than constants::db_max_pending_writes_bytes
in memory.
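To illustrate the bound, here is a minimal, hypothetical sketch of a
bounded pending-write buffer in the spirit of the patch.  The names
echo the real code (db_max_pending_writes_bytes, have_pending_write,
cancel_pending_write), but the class itself, its stubbed flush(), and
the configurable cap are illustrative only; monotone's actual buffer
lives inside the database layer and flushes into sqlite.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical sketch of bounded pending-write buffering, NOT
// monotone's real implementation: writes accumulate in a map until
// the byte total exceeds a cap, then everything is spilled at once.
class pending_writes
{
  std::map<std::string, std::string> buf; // id -> compressed data
  std::size_t buffered = 0;               // bytes currently held
  std::size_t flushed = 0;                // rows spilled so far
  std::size_t const max_bytes;            // e.g. 16MB in the patch

public:
  explicit pending_writes(std::size_t max) : max_bytes(max) {}

  void schedule_write(std::string const & id, std::string const & data)
  {
    // Rewriting the same id replaces the buffered copy cheaply.
    auto it = buf.find(id);
    if (it != buf.end())
      buffered -= it->second.size();
    buffered += data.size();
    buf[id] = data;
    if (buffered > max_bytes)
      flush(); // spill everything once the cap is exceeded
  }

  bool have_pending_write(std::string const & id) const
  { return buf.count(id) != 0; }

  // Precondition: have_pending_write(id) is true.
  void cancel_pending_write(std::string const & id)
  {
    auto it = buf.find(id);
    buffered -= it->second.size();
    buf.erase(it);
  }

  void flush()
  {
    flushed += buf.size(); // stand-in for the real INSERT statements
    buf.clear();
    buffered = 0;
  }

  std::size_t buffered_bytes() const { return buffered; }
  std::size_t flushed_rows() const { return flushed; }
};
```

With a cap this small the flush path triggers immediately; with the
real 16MB cap it should almost never trigger except on huge imports,
which matches the intent described above.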

 >  For a more specific worrisome
 > instance, it doesn't look like cancel_pending_write can possibly
 > fulfill its contract now?  So, does this all work, and if so, why?

Yes.  Both callers of cancel_pending_write first call
have_pending_write; if it returns true they call
cancel_pending_write, and if false they call drop().  Before
54bba40d63b8f4bd8103cf1049b3045a162e540a we simply called drop()
immediately.  So the only change is that if you ever try to buffer
more than db_max_pending_writes_bytes of data, the code may revert to
the behavior from before 54bba40d63b8f4bd8103cf1049b3045a162e540a.
My guess is that the memory-bounding code will very rarely trigger,
but if someone is committing a huge tree (for example an import), I
don't think you want monotone to try to keep the entire compressed
import in memory.
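The call-site pattern described above can be sketched as follows.
The map/set "database" and the three helper functions are stand-ins
with the same contract as the real ones, purely for illustration:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Minimal stand-ins for the pending-write map and the sqlite rows.
static std::map<std::string, std::string> pending; // buffered writes
static std::set<std::string> db_rows;              // rows already written

static bool have_pending_write(std::string const & id)
{ return pending.count(id) != 0; }

static void cancel_pending_write(std::string const & id)
{ pending.erase(id); }

static void drop(std::string const & id)
{ db_rows.erase(id); }

// The pattern both call sites follow: if the row was already flushed
// out of the pending map (or written before buffering existed), fall
// back to drop(), i.e. the pre-54bba40d behavior.
static void cancel_or_drop(std::string const & id)
{
  if (have_pending_write(id))
    cancel_pending_write(id);
  else
    drop(id);
}
```

This is why cancel_pending_write's contract still holds: it is only
invoked after have_pending_write has confirmed the entry exists.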

 > >   4e99cc37f548b5884d63c48bc486dfe98c8d0bd2:
 > >     Add support for expedited parsing of rosters during annotation.  
 > >     Also skip verification of SHA1 hashes only during annotation.
 > >     Worth 5-20x speedup on annotation, but the faster parsing code may
 > >     not succeed on all rosters that the standard code should parse.  I 
 > >     believe the faster parsing code will abort in any case where it might
 > >     do the wrong thing.
 > 
 > Do you have any measurements of how much of that gain is due to which
 > of the optimizations?  Each half has different hurdles to overcome to
 > make it into mainline, so it'd be nice to be able to prioritize them
 > separately...

Most of the benefit is from the faster parsing code.  I don't believe
you'd see any significant benefit from skipping the SHA1 checks until
you have the faster parsing.  Once the faster parsing code was in,
IIRC annotating Makefile.am took around 20 seconds (down from 195);
at that point disabling the SHA1 checks got it down to 8.
        -Eric



