
From: Zack Weinberg
Subject: [Monotone-devel] Re: nvm.experiment.db-compaction
Date: Mon, 25 Feb 2008 18:16:58 -0500

On Mon, Feb 25, 2008 at 2:58 PM, Markus Schiltknecht <address@hidden> wrote:
>  I've started to migrate to an internal binary representation of hashes:
>  "id" now always means a binary id, and only hexenc<id> should contain
>  hex-encoded ids. As a quick "make check" shows, this currently doesn't
>  quite work, because of intermixed binary and hex representations. I'm
>  trying to get rid of those - and I already have a bigger change pending
>  (lots of added hexenc_{en,de}code calls).

Yeah, I was just looking at that myself.

Rather than add lots and lots of en/decode_hexenc calls all over the
place, I was thinking that it would be better to make operator<< and
operator% (for formatting) on 'id' automatically convert to hexenc,
and the 'id' constructor automatically convert from hexenc when it
detects it (the hexenc string will be twice as long as the binary
string, so it's unambiguous).
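A minimal sketch of what I mean (names and details are illustrative, not
monotone's actual vocab code; assuming SHA-1, so 20 raw bytes vs. 40 hex
characters):

```cpp
#include <cassert>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical 'id' that stores raw binary internally. The constructor
// accepts either form, disambiguating by length: the hex string is
// exactly twice as long as the binary one.
class id {
  std::string data_;  // raw binary, 20 bytes for SHA-1

  static std::string decode_hex(std::string const & hex) {
    std::string out;
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2)
      out += static_cast<char>(std::stoi(hex.substr(i, 2), nullptr, 16));
    return out;
  }

public:
  explicit id(std::string const & s) {
    if (s.size() == 40)        // hex form: convert on the way in
      data_ = decode_hex(s);
    else {
      assert(s.size() == 20);  // otherwise require raw binary
      data_ = s;
    }
  }

  std::string const & binary() const { return data_; }

  // operator<< always presents the hex form for human consumption
  friend std::ostream & operator<<(std::ostream & os, id const & i) {
    std::ostringstream tmp;
    for (unsigned char c : i.data_)
      tmp << std::hex << std::setw(2) << std::setfill('0') << int(c);
    return os << tmp.str();
  }
};
```

With something like this, code that formats an id for output never calls
encode_hexenc explicitly, and code that parses either representation
just constructs an id.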

Doing this of course runs afoul of vocab_macros ... I want to
dynamite the entire vocab system and start over, but that's hard, and
hopefully not necessary for this work.  Perhaps your ATOMIC_BINARY
approach would work.

>  What I wanted
>  was to disallow operator<< for a new type ATOMIC_BINARY, but still allow
>  (read: define) it for normal ATOMIC and ATOMIC_NOVERIFY types.

Well, see above - but you probably need something like the default
template for dump() [in base.hh], visible in the header, if you want
a compiler error rather than a linker error.
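To illustrate the compiler-error-vs-linker-error point (types here are
made up for the example, not monotone's actual vocab types; note monotone
itself was C++03 at the time, where the usual declare-but-don't-define
trick only fails at link time - C++11's `= delete` moves the failure to
compile time, which is the behaviour you'd want from the header):

```cpp
#include <iostream>
#include <sstream>
#include <string>

struct hexenc_id { std::string s; };  // hex-encoded, human-readable
struct binary_id { std::string s; };  // raw bytes, not for printing

// Printing the hex form is allowed:
std::ostream & operator<<(std::ostream & os, hexenc_id const & h) {
  return os << h.s;
}

// Merely declaring (and never defining) this would compile everywhere
// and only blow up at link time. Deleting it instead makes
// 'std::cout << some_binary_id' a hard error at the point of use:
std::ostream & operator<<(std::ostream &, binary_id const &) = delete;
```

Because the deleted overload still participates in overload resolution,
any attempt to stream a binary_id is diagnosed in the translation unit
that does it, with a sensible error message.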

>  P.S.: I'm really wondering about the performance of that thing. You once
>  mentioned that those hashes are stored in hex in revisions and
>  manifests - thus we'd often need to hex-encode them anyway. While that's
>  certainly true, we might save some encoding steps when comparing against
>  a newly calculated hash.
>  Plus, there's the space savings, which might be negligible for single
>  revision_ids, but consider ancestry_maps, where we store lots of revids
>  in memory...

Right, my bet is that we wind up not needing to hex-encode the
majority of them, so it will be an unambiguous speed win.

In the longer term I'd like to think about a binary encoding of
revisions and rosters, with an isomorphism to the text format for user
presentation, but with the hash calculation done over the binary form
... unfortunately that would break all existing signatures.

