[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Monotone-devel] Re: nvm.experiment.db-compaction

From: Zack Weinberg
Subject: [Monotone-devel] Re: nvm.experiment.db-compaction
Date: Tue, 26 Feb 2008 00:16:57 -0500

On Mon, Feb 25, 2008 at 7:58 PM, Markus Schiltknecht <address@hidden> wrote:
>  > Rather than add lots and lots of en/decode_hexenc calls all over the
>  > place, I was thinking that it would be better to make operator<< and
>  > operator% (for formatting) on 'id' automatically convert to hexenc,
>  > and the 'id' constructor automatically convert from hexenc when it
>  > detects it (the hexenc string will be twice as long as the binary
>  > string, so it's unambiguous).
>  I thought about that as well, but it would circumvent the type system
>  somewhat. I'm not sure if we want that.

Well...  I'm certain we want this for operator% formatting, because
its contract is to produce something human-readable.  I really don't
want to have to introduce new hexenc<id> variables and encode_hexenc()
calls everywhere we might need to print out an id in an N() or a L().
Similarly for dump(), because otherwise MM() on ids is useless.  And I
didn't audit them in detail, but it appears to me that most of the
places operator<< is used with ids are places where the output goes to
a human, and so the same argument applies.  [I like what you did about
this for basic_io, by the way.]

>  I've been under the impression, that we sometimes want the binary
>  representation. For example for netsync, but at least in we
>  now need the binary representation. Just using id::inner() to get that
>  seems rather confusing and doesn't help with type safety either.

I'm confused, what would we be doing if not foo_id::inner()?

>  Also keep in mind, that the original verify() routine for id's checked
>  for length 40. Where I now check for length 20 for ids, but length 40
>  for hexenc<id>s. Without that distinction, this verification would not
>  be possible, and I'd have an even harder time finding the mismatches.

I can agree with you more readily on the input side, but I'm kinda
scared of the number of places where command line parsing will have to
grow decode_hexenc() calls.  (I think this is behind an awful lot of
the test failures.)

Maybe it wouldn't be so bad if we backed away from the "no large
return objects" rule for this case (I personally think that rule is
mostly obsolete with modern compilers, anyway).  If en/decode_hexenc
returned their result, we wouldn't need as many intermediate

>  > Doing this of course runs us afoul of vocab_macros ... I want to
>  > dynamite the entire vocab system and start over,
>  How would that look like?

I'm not sure yet, but there are several things I want out of it:

1) Turn the templates inside out; id<revision_t> instead of
revision<id>, with the template parameter being just a cookie for the
type system.  Conceptually simpler *and* easier to implement; in fact
I think this makes a lot of the macro goo go away in one fell swoop.

2) Symtab/immutable_string goo only for those cases that can be
demonstrated to need it, which I suspect is hardly any of them.

3) Some sane way of doing forward declarations of these types.

4) Some sane way of doing the actual definitions of these types along
with their associated data structures.

>  What I didn't understand is, that it is operator<<, which the linker
>  complains about. But the compiler went trough it all *without* having
>  such an operator defined for id. I did even check the .ii file: no such
>  operator. Is there some kind of implicit declarations those? Or does
>  template instantiation of the DECORATE implicitly declare them?

I'm not sure, but I think there might be a very generic template
operator<< lurking somewhere.

>  > On the longer scale I'd like to think about binary encoding of
>  > revisions and rosters, with an isomorphism to the text format for user
>  > presentation, but having the hash calculation be over the binary form
>  That would be nice as well, but lets take one step after another ;-)


BTW, since it sounds like you have a bunch of pending work and I don't
want to trip you up, I'm going to leave your branch alone for now and
look at eliminating unnecessary base64 encoding and decoding - I think
there's a lot less of this, so the patches can just go straight to

>  > ... unfortunately it would break all signatures.
>  Uh, not necessarily, I think. We could just bump the format_version, and
>  keep compatibility code in there, no?

Yes, but then we'd be regenerating the text format on the fly for all
historical rosters, in order to checksum them.  I'm not sure whether
that would end up being a win.  (It certainly would for brand new
databases, but ...)


reply via email to

[Prev in Thread] Current Thread [Next in Thread]