monotone-devel

From: Christof Petig
Subject: Re: [Monotone-devel] Rarely mentioned 25-50% efficiency boost for monotone ; -)
Date: Fri, 15 Apr 2005 17:57:58 +0200
User-agent: Mozilla Thunderbird 1.0.2 (X11/20050404)

Nathaniel Smith wrote:
> On Fri, Apr 15, 2005 at 12:02:14PM +0200, Christof Petig wrote:

Hi Nathaniel,

Please consider this thread a reminder, not an urgent request.
[Useful when comparing to git?]

>>Hi,
>>
>>since monotone's performance had been a topic on this list for some days
>>and nobody else mentioned this fact I will do it:
>>
>>Monotone both internally and externally (database) still uses
>>hexadecimal or base64 encoded data while sqlite3 already supports BLOBs.
>>It also does not use query parameters (which are needed by BLOBs), but
>>merges data and commands into a big SQL statement.
>>
>>There is still preliminary work on prepared statements in the
>>n.v.m.sqlite3 branch by Derek Scherger which has seen neither review nor
>>further work (IIRC).
> 
> 
> Do you have any data supporting this 25-50% efficiency increase?

You get a 25% reduction in IO and memory usage (on the specified fields)
when you move from base64 to BLOBs.
You get a 50% reduction in IO and memory usage (on IDs) when you move from
hex encoding to BLOBs. (Note that I am not proposing this yet.)
You get an additional 50% memory reduction (on queries) if you use query
parameters. [Right now we hold the raw binary data, its base64 equivalent,
and the query (which copies the base64 equivalent once more).]
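
The size figures above can be checked with a quick calculation. This is a minimal Python sketch, not monotone code; it assumes a 20-byte ID and a 128-byte signature (the field sizes used in the sample row at the end of this mail), filled with zero bytes as placeholder data.

```python
import base64
import binascii

# Placeholder data (assumption): only the lengths matter here.
raw_id = bytes(20)     # a binary ID, 20 bytes
raw_sig = bytes(128)   # a binary signature, 128 bytes

hex_id = binascii.hexlify(raw_id)    # hex: 2 characters per byte
b64_sig = base64.b64encode(raw_sig)  # base64: 4 characters per 3 bytes


def saving(encoded: bytes, raw: bytes) -> float:
    """Fraction of space saved by storing the raw bytes instead of the encoding."""
    return 1 - len(raw) / len(encoded)


hex_saving = saving(hex_id, raw_id)
b64_saving = saving(b64_sig, raw_sig)

print(f"hex:    {len(hex_id)} -> {len(raw_id)} bytes ({hex_saving:.0%} saved)")
print(f"base64: {len(b64_sig)} -> {len(raw_sig)} bytes ({b64_saving:.0%} saved)")
```

Hex always doubles the size, so going back to binary saves half; base64 expands by a factor of 4/3 (plus padding), so going back saves roughly a quarter.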

All in all, this results in more than 25% less memory and IO usage for
some data structures (netsync is not affected, since it already uses
binary data!). Considering the effects on the processor cache, I conclude
that every space reduction should result in a speed increase (though
seldom a linear one). Base64 encoding/decoding also takes some time
(usually not much).

Of course I exaggerated a bit with this subject ;-) but I still believe
that a certain amount of efficiency is lost due to base64 encoding _and_
not using query parameters.
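
To illustrate the difference between the two approaches, here is a small sketch using Python's sqlite3 module. It is not monotone's code: the table `demo_certs`, its columns, and the 256-byte placeholder value are assumptions made up for the example. The first insert merges base64 data into the SQL text; the second binds the raw bytes as a query parameter, which SQLite stores as a BLOB.

```python
import base64
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table for illustration only, not monotone's schema.
conn.execute("CREATE TABLE demo_certs (id TEXT, name TEXT, value BLOB)")

rev_id = "740c19ffcc75a608cbea0289345b9ce71748c18a"
raw_value = bytes(range(256))  # placeholder binary data

# Approach 1: encode to base64 and merge the data into the statement text.
# The statement string now carries a second copy of the encoded data.
encoded = base64.b64encode(raw_value).decode("ascii")
merged_sql = "INSERT INTO demo_certs VALUES ('%s', '%s', '%s')" % (
    rev_id, "author", encoded)
conn.execute(merged_sql)

# Approach 2: a fixed statement with placeholders; the raw bytes are bound
# as a parameter and stored as a BLOB without any text encoding.
param_sql = "INSERT INTO demo_certs VALUES (?, ?, ?)"
conn.execute(param_sql, (rev_id, "author", raw_value))

rows = conn.execute(
    "SELECT typeof(value), length(value) FROM demo_certs ORDER BY rowid"
).fetchall()
print(rows)
print(len(merged_sql), len(param_sql))
```

With parameters, the statement text stays short and constant no matter how large the value is, so it can also be prepared once and reused.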

> Obviously I'm interested if it's true :-), but I like to see profiling
> data to support such ideas before suggesting people put work into
> them...

Of course I do not have profiling data. There are probably plenty of
other places where speed could be gained, but the 25% reduction in disk
usage is significant. Combining the author+date+changelog certs into one
might give a further size reduction on fast-moving projects.

> (You do get a space improvement, of about 25%; I don't think we're IO
> bound enough for this to matter for speed, but perhaps there are cases
> where we are.)
> 
> Saving 25% disk space is also nice, of course, but there still is the
> original motivation for using base64: that it's nice that you can poke
> around at the database, copy/paste data pulled out of it, etc.,
> without having to take precautions against binary data being blatted
> out on your screen... all in all, I can imagine deciding that the
> space is more important, but it doesn't seem like an urgent question?

I would love to see the query parameter approach reevaluated/discussed.
And I would love to say "select id,value from revision_certs where value
like 'something%'" without having to encode 'something' into
'c29tZXRoaW5n'. Encoding IDs as binary would not be a good idea
unless the database user interface shows BLOBs as hex strings.
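
The following sketch shows why prefix searches are awkward on base64 values. It uses Python's sqlite3 module with two made-up tables (`plain_certs` and `b64_certs`) and made-up sample values; none of this is monotone's actual schema. On plain text a LIKE prefix works directly. On base64 text the prefix has to be encoded first, and the match is only reliable when the prefix length is a multiple of 3 bytes.

```python
import base64
import sqlite3


def b64(text: str) -> str:
    """Base64-encode a string and return the result as ASCII text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


conn = sqlite3.connect(":memory:")
# Hypothetical tables for illustration only.
conn.execute("CREATE TABLE plain_certs (id INTEGER, value TEXT)")
conn.execute("CREATE TABLE b64_certs (id INTEGER, value TEXT)")
for i, value in enumerate(["something", "something else", "nothing"]):
    conn.execute("INSERT INTO plain_certs VALUES (?, ?)", (i, value))
    conn.execute("INSERT INTO b64_certs VALUES (?, ?)", (i, b64(value)))

# Plain storage: the human-written query works as is.
plain_hits = [row[0] for row in conn.execute(
    "SELECT value FROM plain_certs WHERE value LIKE 'something%' ORDER BY id")]


def b64_prefix_hits(prefix: str) -> list:
    """Search the base64 table by encoding the prefix first (case-sensitive GLOB)."""
    pattern = b64(prefix) + "*"
    rows = conn.execute(
        "SELECT value FROM b64_certs WHERE value GLOB ? ORDER BY id", (pattern,))
    return [base64.b64decode(row[0]).decode("utf-8") for row in rows]


nine_byte_hits = b64_prefix_hits("something")   # 9 bytes: aligned to 3
eight_byte_hits = b64_prefix_hits("somethin")   # 8 bytes: not aligned

print(plain_hits)
print(b64("something"), nine_byte_hits)
print(b64("somethin"), eight_byte_hits)
```

The 8-byte prefix finds nothing because its encoding ends in padding and a different final character than the encoding of the longer strings, even though both stored values start with it.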

>>Migrating the internal representation to binary (instead of hex) IDs
>>would be an enormous effort (IIRC) but storing certs and data plain
>>instead of base64 encoded should be easy to accomplish.
>>
>>Another benefit would be the easier interface to cert data via human
>>initiated SQL.
> 
> 
> It's already trivial to get decoded data with 'db execute' -- just use
> the unpack() and unbase64() functions.

Unless you tend to use sqlite3 as the interface ...

   Christof

To put some numbers on this:
revision_certs: consider this row (select * from revision_certs limit 1;)
740c19ffcc75a608cbea0289345b9ce71748c18a|62035893bc12369f31b0ffeb04aa18f28f1ecb69|author|Z3JheWRvbkBkdWIudmVuZ2UubmV0
|address@hidden|Xnk0I1y4bWOtoPAAnHjyoLYl3mOY6S5L7DGxPE9jWnu3XsbQDu0mhmfgWdyxMFENENHCJHXf
G8+OzvNrtrPoiW9ndnM7JGqIWvLVQa6d2cp1es7r38jTtldJd2E/B6UxDsMUHJhrPfpDrV8T
KvnVucfLFMCPxVY0rhKYDHRBUdw=
312 bytes

If you store the base64 fields as BLOBs you get
740c19ffcc75a608cbea0289345b9ce71748c18a|62035893bc12369f31b0ffeb04aa18f28f1ecb69|author|address@hidden|address@hidden|<128bytes>
257 bytes (-18%)

If you also store the hex IDs as binary you get
<20bytes>|<20bytes>|author|address@hidden|address@hidden|<128bytes>
217 bytes (-30%)

[I shamelessly assumed that SQLite uses about 1 byte internally to
separate the strings; this should hold for short strings AFAIK.]
