Re: [Monotone-devel] Re: partial pull #3 - calling conventions


From: Matt Johnston
Subject: Re: [Monotone-devel] Re: partial pull #3 - calling conventions
Date: Sun, 27 May 2007 19:22:48 +0800
User-agent: Mutt/1.5.13 (2006-08-11)

On Sat, May 26, 2007 at 10:22:54PM +0200, Christian Ohler wrote:
> Lapo Luchini, 2007-05-26:
> 
> >Space is not the scarce resource here (well, not the most important one,
> >at least, IMHO): time is.
> >Pull time is not only a question of size, it's also (mainly?) a question
> >of the time taken by the multiple hash and signature verifications.
> 
> Ok.  Still, verifying signatures on 10MB worth of data is very likely 
> faster than verifying them on 71MB worth of data.
> 
> Assuming that cryptographic verification is what's really taking too 
> long at pull time, maybe we shouldn't be doing it at pull time? 
> Wouldn't it be possible to defer the verification of each file's hash, 
> each revision's id, each cert's signature etc. until the respective item 
> is accessed for the first time?

It's not the SHA1 or RSA verification that is slow. From a
profile [1] of a local pull of n.v.m* taking ~22 mins wall
time (~14 mins user time):

- Around 1min30 is spent in sha160 hashing. Half of that is
  just checking that the reconstructed file versions that
  get pulled out of the database match what was expected.
- zlib inflate/deflate doesn't show up that much (40 secs
  total perhaps?)
- A couple of minutes seem to be spent on verifying and
  writing out file data/deltas (the delayed write cache
  seems pretty efficient).
- 12 minutes is spent in put_revision(), mostly in roster
  construction (put_roster_for_revision()).
  - get_uncommon_ancestors() takes ~3 minutes. It's mostly
    to do with traversing up long-lived diversions such as
    n.v.m.cvssync* and *.select-heads-of (I think).
  - ~2 minutes are spent writing out the full manifest
    data of every revision, so that we can check that its
    hash matches that specified in the revision.
  - Lots of time seems to be spent destroying std::maps in
    dir_node and in other small memory operations like that
    (this may depend on the OS memory allocator - profile
    it yourself).
- Do RSA signatures even get checked in a pull? I can't see
  them.
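
To illustrate why the long-lived branches hurt here, this is a
toy sketch of an "uncommon ancestors" calculation. It is not
monotone's actual get_uncommon_ancestors(); the graph layout
(a dict mapping each revision to its parents) and the function
names are assumptions made for the example, and it is written in
Python only for brevity.

```python
from collections import deque


def ancestors(graph, rev):
    """Return rev plus every revision reachable via parent edges.

    graph is a hypothetical dict: revision id -> list of parent ids.
    """
    seen = {rev}
    queue = deque([rev])
    while queue:
        current = queue.popleft()
        for parent in graph.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def uncommon_ancestors(graph, a, b):
    """Return (ancestors of a not shared with b, and vice versa)."""
    anc_a = ancestors(graph, a)
    anc_b = ancestors(graph, b)
    return anc_a - anc_b, anc_b - anc_a


# A mainline and a branch that diverged at "root" and never merged.
example_graph = {
    "m1": ["root"], "m2": ["m1"], "m3": ["m2"],
    "c1": ["root"], "c2": ["c1"], "c3": ["c2"], "c4": ["c3"],
}
only_main, only_branch = uncommon_ancestors(example_graph, "m3", "c4")
```

Every revision on either side of the divergence ends up in one of
the two result sets, so the work grows with the length of the
branch no matter how cleverly the traversal is done.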

My conclusion is that removing file data will be useful for
people with slow network connections, but not for speeding
up netsync generally. Optimisation of the revision->roster
operations seems like it would be fairly beneficial.  They
do count as "consistency checking", but not in the
cryptographic sense. I'm not really sure how the
revision->roster checking could be delayed at pull time,
since the current head revision (most likely to be of
interest) depends on all the previous revisions that have
been received.
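
A rough sketch of that dependency, under heavy assumptions: a
roster is modelled as a plain dict of path -> file id, history is
linear (no merges), and the manifest serialisation is made up.
None of these names or formats are monotone's; the point is only
that the roster for the head can't be built or checked without
building every roster before it.

```python
import hashlib


def manifest_hash(roster):
    """SHA-1 of a sorted 'path file_id' listing (made-up format)."""
    text = "".join(
        f"{path} {file_id}\n" for path, file_id in sorted(roster.items())
    )
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def apply_changes(parent_roster, changes):
    """Return a new roster: parent plus changes (None means delete)."""
    roster = dict(parent_roster)
    for path, file_id in changes.items():
        if file_id is None:
            roster.pop(path, None)
        else:
            roster[path] = file_id
    return roster


def build_rosters(revisions):
    """Build and check a roster for each revision, parents first.

    revisions is a list of (rev_id, parent_id or None, changes,
    expected_manifest_hash) tuples in topological order.
    """
    rosters = {}
    for rev_id, parent_id, changes, expected in revisions:
        parent = rosters[parent_id] if parent_id is not None else {}
        roster = apply_changes(parent, changes)
        if manifest_hash(roster) != expected:
            raise ValueError(f"manifest mismatch in revision {rev_id}")
        rosters[rev_id] = roster
    return rosters
```

Skipping the check for an old revision would save the hashing, but
its roster still has to be constructed to get to the head.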

I'm also curious why the pull process was only active for
about 2/3 of the pull time (~14 mins user time out of ~22
mins wall time) - possibly the netsync protocol could be
pipelined better.

Matt


[1] 1.83 GHz Core Duo MacBook, Mac OS X 10.4.9, profiled
using Shark, statistically sampling every 60 ms.



