From: Nathaniel Smith
Subject: Re: [Monotone-devel] Re: [patch] roster-deltas (40% pull speedup)
Date: Wed, 6 Sep 2006 14:46:30 -0700
User-agent: Mutt/1.5.12-2006-07-14

On Wed, Sep 06, 2006 at 01:47:58AM -0700, Graydon Hoare wrote:
> Nathaniel Smith wrote:
> 
> >Thanks for looking it over -- I know you don't have much time these
> >days :-).
> 
> s/have/make/. It'd be dishonest to claim innocence; I'm just not 
> allocating as much time for this stuff anymore.

A philosophical difference, perhaps... if you don't want to do it, you
shouldn't, and people can hardly be blamed for what they find they
prefer to spend time on :-).

> >Well, it's a tricky bit of state tracking.  For instance, at one
> >point, I had converted things so that rosters were indexed by
> >roster_id, but hadn't actually changed their storage format or
> >anything yet.  I was still caching them in the vcache.  I only
> >realized much later that this meant there was a major bug -- because
> >rollback() does not flush the vcache, I could have left stale rosters
> >that never existed in there... this doesn't apply to files in the
> >vcache, because they're indexed by content, but it did to rosters once
> >I switched them over to being indexed by...index.
> 
> Well, I don't immediately see any places you're not handling. It's late 
> and there's an increasing amount of state in this class, so I'm not very 
> confident in that assessment, but I don't know what else to do. I read 
> through the methods and their sub-methods using etags. It seems coherent.
> 
> I mean, you put them in the write-through cache during put_roster, you 
> flush them to disk on commit, you cancel them all on rollback, you 
> consult the cache on get_roster_version and populate it on cache-misses. 
>  Sounds sufficient to me.

'kay; sounds sufficient to me too, but it's always good to get a
sanity check.

> Three points stand out:
> 
>   1. I'm nervous about handing out shared pointers to cache entries.
>      I see they're const and all, so realistically I can't imagine
>      anything that'd go wrong... it's not like we had anything in
>      the old code to prevent fetching soon-to-be-cancelled writes
>      either, it just seems a bit ominous.
> 
>      Is it possible that the db invalidates a cache entry and some
>      caller will keep the cache entry alive, thinking it's a real
>      roster? If it does, what are the possible consequences?

This is a general problem with transactional stores, right, you might
do BEGIN/write/read/ROLLBACK and the read becomes invalid?  I don't
have anything super useful to say about the problem in this particular
case -- I just tried to make the shared_ptr stuff have exactly the
same semantics one would get with a copying approach like we use
elsewhere (e.g., for files, for revisions proper, etc.).
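
Concretely, the worry is this sequence (reusing the toy roster_store
sketch above; illustrative calls, not our real API):

   void hazard(roster_store & store)
   {
     std::shared_ptr<roster_t const> r(new roster_t);
     store.put_roster("some-roster", r);    // BEGIN ... write
     std::shared_ptr<roster_t const> p = store.get_roster("some-roster");
     store.rollback();                      // ROLLBACK: write never happened
     // ...yet p keeps the cancelled roster alive.  Since entries are
     // const, p behaves just like the copy a copying interface would
     // have handed back: a value that was true inside the transaction,
     // now orphaned.  Callers must not treat it as durable state.
   }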

>   2. The manifest-reconstruction stuff is still present. Soon this
>      should all be deleted. Maybe now? I don't know. I'm surprised
>      you kept it alive. If you're migrating the schema, why not just
>      toss the manifest tables entirely? They're empty, no?

I got about half way through tearing it out as being completely
ancient, but then I realized that no, 'rosterify' actually needs the
manifest reconstruction stuff too.  0.30 seems a little early to break
migration from pre-0.26.

>   3. Why is the delayed_files map separate from the vcache? Shouldn't
>      the two be unified (and dedicated to files alone, ditching
>      manifests) now that you have a proper LRUWritebackCache?

Assuming we do want to keep manifests around for now, using a
writeback cache for files is a little bit tricky, because we'd have to
separate out the file and manifest logic somehow, or something...

Anyway, I call "defer!" on this; the current file/manifest logic is
just like it is on mainline, and if this branch creates new
opportunities, we don't have to take all of them before it lands.

> >That's the kind of bug I'm worried about -- somehow dropping a step in
> >the dance between roster_cache insertion, marking things clean/dirty,
> >and letting them eventually get written or not -- depending on how the
> >transaction resolves, and whether a delta gets written straight to the
> >db first.  I _think_ I got all this stuff right, but I would feel
> >better if someone independently convinced themselves of the same thing.
> 
> Again, I think this is even less likely if you make the file cache use 
> LRUWritebackCache too: identical code paths and relationship to 
> transactions as roster cache, less opportunity for logic mismatch.

The file logic is already "known good", so I think this is okay for
now.

> If the delta gets "written straight to the db" it's still inside a 
> transaction; it'll still get backed out if the transaction rolls back. 
> The key is to ensure that everything in memory that might relate to the 
> current transaction is written out before issuing "COMMIT", or discarded 
> from cache before issuing "ROLLBACK". The fewer paths we have that cause 
> write-out or cache-clearing, the easier this is to ensure.
> 
> >The other place I'm worried for similar reasons is in the roster delta
> >code -- this is much simpler, but again, I'd appreciate it if someone
> >else independently read through and convinced themselves that we
> >aren't going to accidentally drop some field or whatever.
> 
> One thing that stands out here is that you're detaching and attaching in 
> random order, not bottom-up / top-down like in the editable tree code.

Yes -- this makes the code much simpler :-).  Is this a problem?  We
can get away with it because we are not screwing around with paths,
where interpreting one path requires that other nodes be in the proper
place.
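
A toy model of why the order-independence is safe -- illustrative
names, nothing to do with the real node/roster types:

   #include <map>
   #include <string>
   #include <utility>

   typedef unsigned long node_id;

   // Roster deltas address nodes by node_id, so each detach/attach
   // touches exactly one entry and never needs any other node to be
   // "in place" first.
   struct toy_roster
   {
     // node_id -> (parent, basename)
     std::map<node_id, std::pair<node_id, std::string> > nodes;

     void detach(node_id n) { nodes.erase(n); }
     void attach(node_id n, node_id parent, std::string const & name)
     { nodes[n] = std::make_pair(parent, name); }
   };

With path-based addressing, attaching "a/b/c" requires "a/b" to exist
already, which is what forces the editable_tree code into its top-down
attach / bottom-up detach ordering; this code gets to skip all that.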

> You can add some "use std::foo" entries in here (especially make_pair) 
> to cut down on some chatter.
> 
> You have a couple long lines still that need wrapping.

Okay.

> It seems a bit wasteful to encode the whole marking of each node that 
> changes, rather than just the aspect of the marking that changed. Most 
> edits are patches. You're going to write out the parent+name marking and 
> all the attr marks in each such delta. This makes the deltas likely 
> twice as large as they need to be. Maybe gzip will handle that?

I was being lazy, yeah.  Looking at 'db info' on nvm* both migrated
and not, I get:
                              mainline        .roster-delta
   number of full rosters:        65                 64
  number of roster deltas:      9083               7397
        full roster bytes:   3408371            3452574
       roster delta bytes:   6813125            6121118
              total bytes:  10221496            9573692

Probably a lot of the size difference is due to .roster-delta being
more thrifty about creating multiple reconstruction paths to the same
roster, but there doesn't seem to be too bad a size overhead in any
case.

> Are the function names "delta_only_in_to" and "delta_in_both" really 
> correct? It seems they're called when a given *node* is in to/both, not 
> a delta. I.e. they should be called "node_only_in_to" and 
> "node_in_both". (And really, using "to" as a noun sort of irritates me; 
> can you call it "target", "dest" or such?)

Delta is intended as a verb, as in, perform the delta for a node that
is only on one side, or on both :-).  But yeah, the names could be
better.  Fixed.

> >Hm.  I'm not sure.  Basically, IIUC, the difference between signed and
> >unsigned is in:
> >  1) comparison operators
> >  2) widening coercion
> >  3) IO/printing/parsing
> >and in particular, arithmetic couldn't care less _which_
> >representatives we silly humans think we're using for our 2^n modulo
> >equivalence classes, the computer'll happily use the same bit strings
> >either way?
> >
> >(1) and (2) are irrelevant here, because these are just uninterpreted
> >blobs, we never _do_ anything with them (also, they're 64 bit, so not
> >much widening going on :-)).
> ...
> >However, I really do not care all that much, and am curious what your
> >reasons are.
> 
> This is not much of an argument, and I don't *really* care, in the sense
> that I won't hold you up. But you asked, so:
> 
> I generally hate signed integers. They are an unmitigated source of 
> evil: something to confine to a very small corner of a program, surround 
> with an iron box, and glare at.
> 
> Once you have any signed value floating around, C will do its best to
> infect everything else with signedness via promotion. It only takes one 
> signedness-where-you-didn't-expect-it to cause all manner of 
> memory-corrupting security bugs, from underflowing loop counters to 
> negative array indices to unexpected sign-extending of "small" numbers 
> when widened. I prefer to keep them in a place where they'll infect 
> less, sort of like a plague.
> 
> (Note that if you compile sqlite with any level of checking for this
> stuff, it's absolutely lousy with mixed-signedness arithmetic. Makes my 
> skin crawl)
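
For the archives, the classic shape of the promotion bug in question
-- a deliberately minimal example, nothing to do with our code:

   #include <cstdio>

   int main()
   {
     int have = -1;       // e.g. a "bytes remaining" count gone negative
     unsigned want = 10;

     // The usual arithmetic conversions promote 'have' to unsigned for
     // the comparison, so -1 becomes 4294967295 and the test is false.
     if (have < want)
       std::printf("need more bytes\n");    // never printed
     else
       std::printf("looks fine; proceed to underflow\n");
     return 0;
   }

(This is the sort of thing gcc's -Wsign-compare complains about.)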

Hmm, I have another idea -- looking at this code with fresher eyes,
I'm not sure why we have a roster_id at all.  There is a 1-1
relationship between revisions and rosters, we always access rosters,
via revisions, and the roster id is arbitrary anyway -- so no point in
allocating these ids and using a bridge table, let's just index
rosters by revision_id.

I just implemented this on .roster-delta.no-roster_id, and it seems to
be working... I'll merge it back into .roster-delta once the tests
finish running.
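
In sketch form, the simplification is just this (stand-in types, not
the real schema or code):

   #include <map>
   #include <string>

   typedef std::string revision_id;    // stand-in
   struct roster_t {};                 // stand-in

   // Before: a bridge table plus an id allocator.
   //   std::map<revision_id, unsigned long> roster_index; // rev -> roster_id
   //   std::map<unsigned long, roster_t>    rosters;      // id -> data
   //
   // After: the 1-1 relationship collapses to a single keyed table.
   std::map<revision_id, roster_t> rosters;

   roster_t const & get_roster(revision_id const & rev)
   {
     return rosters.at(rev);  // one lookup, no id allocation, no bridge
   }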

> >>>-database::remove_version(hexenc<id> const & target_id,
> >>>-                         string const & data_table,
> >>>-                         string const & delta_table)
> >>I wonder if it's wise to remove this function. Isn't it useful?
> >
> >Its only use was so that 'db kill_rev_locally' could remove a
> >killed revision's roster (and potentially files?).  I have a bit of a
> >passive aggressive relationship with this feature :-).
> 
> Really? Hmm, istr having a use for it when we receive redundant storage 
> edges, nuking one of the reconstruction paths to save space. I thought 
> this was an important space optimization at one time. Is it no longer, 
> or am I mis-remembering? Maybe that's "deltify"...

I dunno, maybe we did at some point, but it only had the one caller.

> >>>@@ -2439,7 +2486,6 @@ parse_marking(basic_io::parser & pa,
> >>>          pa.str(k);
> >>>          pa.hex(rev);
> >>>          attr_key key = attr_key(k);
> >>>-          I(n->attrs.find(key) != n->attrs.end());
> >>>          safe_insert(marking.attrs[key], revision_id(rev));
> >>>        }
> >>Slightly nervous about this. Why is it ok?
> >
> >Because the only caller for ros.parse_from is read_roster_and_marking,
> >which immediately calls roster.check_sane_against(marking), which does
> >the same check.
> 
> Eh. Unless profiles say it's expensive, I prefer leaving O(1) and 
> O(lg(N)) sanity checks in cold code paths. Suppose the path gains 
> another caller someday.

The problem is that when parsing a roster_delta, I don't have the
roster it produces.  So I can't sanity check the marking_t against
that roster as I parse it.

> Otherwise looks good. Very nice work. I wish I could grant you 100% 
> confidence that it's correct, but I can't. All I can say is that I've 
> *certainly* pushed buggier code before; I don't have a recipe for 
> avoiding bugs. "make backups" (seriously: save a database to a $2 DVD
> every few months, you'll be glad you did someday)

Sure, we do what we can...

I already added a sanity check on every roster before we put it
into the cache; I think I'll add a check of the manifest_id too
(using revision_ids to index rosters makes this easier too), and
then call it done.

-- Nathaniel

-- 
"If you can explain how you do something, then you're very very bad at it."
  -- John Hopfield



