[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Monotone-devel] Re: Support for binary files, scalability and Windo
From: |
Zbynek Winkler |
Subject: |
Re: [Monotone-devel] Re: Support for binary files, scalability and Windows port |
Date: |
Tue, 20 Jan 2004 00:31:45 +0100 |
User-agent: |
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6b) Gecko/20031208 |
graydon hoare wrote:
as background (another email asked "how this stuff works"): the idea
with a hashtree (a.k.a. merkle tree) is that you and I both store our
entire set of "objects" (say, blocks or files or whatever) in an
N-ary tree where each leaf is positioned in the tree at some unique
spot (in our case the tree represents the entire space of SHA1, each
branch peeling off some number of bits, and the leaf's position at the
bottom of the tree is determined by content hash). each interior
node's slots are the hashes of the subnodes beneath it. the hash
algorithm used for interior nodes isn't necessarily related to the one
pointing out to leaves, but we might as well use SHA1 again.
Let me see if I understand this right. First we have a regular 256-ary
tree structure 20 levels deep (so SHA1 is 20 bytes, right?). There is
really nothing unexpected about this tree (just a regular search tree).
Now in addition to this we take each node in the tree and calculate a
hash of its contents and place the hash to the parent (we would do this
bottom up). When we have a new SHA1 that we want to add to the tree we
use the tree structure to find a place for this new leaf (going down
from the root) and add it to the appropriate node at the bottom. Now we
have changed the node so we have to update its hash in its parent (and
so on up the tree). Did I get it right? If so, it seems very elegant to
me. Why haven't you done it this way in the first place? ;-)
anyways, when you and I want to sync, I send you my root node, you can
see all the slots in it which are non-equal to *your* values for those
slots: those are 1/N subtrees which have "a difference" in them,
somewhere. so for each of those which is different, you send me *your*
subtree node, and then I look at them and pick out the 1/(N*2)
subtrees which contain the differences. we bounce back and forth at
worst log_N(X) times where X is the size of the space we're spanning
(I'm thinking a 256-ary tree over SHA1: 20 levels, so at worst 10
round trips) and exchange at worst O(D*K) bytes overhead locating the
K different objects. then we just transmit the objects missing from
each party's collection, having isolated them. there are some cheap
hacks used to make the average case costs collapse to the load of the
tree rather than its depth (so, more likely say 3 or 4 round trips),
but even in the worst case it's very efficient, and it's easy to
pipeline.
What is "D" in the bytes overhead?
note that this is a very general algorithm for synchronizing two
collections of arbitrary things. the "tree" here has nothing to do
with a filesystem tree or the xdelta/suffix-tree storage
representation for a file. those are orthogonal.
Yes, so it appears. Judging from the description the algo handles
deletes from the collection as well so it should work. I believe that
deletes are needed for the table 'files' where only the newest versions
of files are stored, aren't they? Or your term 'collection' does not
equal to 'table'?
Zbynek
--
http://zw.matfyz.cz/ http://robotika.cz/
Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
- [Monotone-devel] Support for binary files, scalability and Windows port, Asger Ottar Alstrup, 2004/01/12
- [Monotone-devel] Re: Support for binary files, scalability and Windows port, graydon hoare, 2004/01/12
- [Monotone-devel] Re: Support for binary files, scalability and Windows port, Asger Kunuk Ottar Alstrup, 2004/01/15
- Re: [Monotone-devel] Re: Support for binary files, scalability and Windows port, Zbynek Winkler, 2004/01/15
- [Monotone-devel] Re: Support for binary files, scalability and Windows port, graydon hoare, 2004/01/16
- Re: [Monotone-devel] Re: Support for binary files, scalability and Windows port, Ori Berger, 2004/01/16
- Re: [Monotone-devel] Re: Support for binary files, scalability and Windows port, graydon hoare, 2004/01/17
- Re: [Monotone-devel] Re: Support for binary files, scalability and Windows port, Nathaniel Smith, 2004/01/17
- [Monotone-devel] Re: Support for binary files, scalability and Windows port, graydon hoare, 2004/01/19
- Re: [Monotone-devel] Re: Support for binary files, scalability and Windows port,
Zbynek Winkler <=
- Re: [Monotone-devel] Re: Support for binary files, scalability and Windows port, Ori Berger, 2004/01/18
- Re: [Monotone-devel] Re: Support for binary files, scalability and Windows port, Zack Weinberg, 2004/01/18
- [Monotone-devel] Re: Support for binary files, scalability and Windows port, graydon hoare, 2004/01/19
- [Monotone-devel] RE: Support for binary files, scalability and Windows port, Asger Kunuk Alstrup, 2004/01/18
- [Monotone-devel] Re: Support for binary files, scalability and Windows port, Peter Simons, 2004/01/18
- [Monotone-devel] Re: Support for binary files, scalability and Windows port, graydon hoare, 2004/01/19
- [Monotone-devel] RE: Support for binary files, scalability and Windows port, Asger Kunuk Ottar Alstrup, 2004/01/19
- [Monotone-devel] Re: Support for binary files, scalability and Windows port, graydon hoare, 2004/01/19
- [Monotone-devel] RE: Support for binary files, scalability and Windows port, Asger Kunuk Ottar Alstrup, 2004/01/20
- [Monotone-devel] Re: Support for binary files, scalability and Windows port, graydon hoare, 2004/01/20