Re: [Monotone-devel] Re: Support for binary files, scalability and Windows port

From: graydon hoare
Subject: Re: [Monotone-devel] Re: Support for binary files, scalability and Windows port
Date: Sat, 17 Jan 2004 01:11:14 -0500
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6b) Gecko/20031205 Thunderbird/0.4
Ori Berger wrote:
> I had been thinking about this as well.
there are some things in here I'm curious about, and some I disagree with.
> First, it's possible to use a suffix tree to locate repeating parts.
> It can be implemented much, much faster than rsync's algorithm.
rsync's algorithm is a 2-level thing which works when only one source is
available and you only have an [adler32,md4] tuple list from the other
guy; xdelta's "modification" is to drop that assumption and extend
adler32-matched windows forwards and backwards by directly comparing
source and destination regions. cost per byte in that phase is 1
compare. the cost per byte when scanning for matching windows is 1 add,
1 subtract, a mask, and a hashtable lookup (the expensive part). I don't
really see how or why a suffix array would find anchor regions any
faster. wouldn't the cost per byte be a partial (failing) suffix array
lookup? can you elaborate?
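to make the per-byte costs above concrete, here is a minimal sketch of the window-scanning phase in Python. the simple byte-sum below stands in for a real adler32, and WINDOW and MASK are illustrative parameters, not rsync's or xdelta's actual values:

```python
# Minimal sketch of the window-scanning phase described above: per byte,
# one add, one subtract, a mask, and a hashtable lookup. The byte-sum
# here stands in for a real adler32; WINDOW and MASK are illustrative.

WINDOW = 48
MASK = 0xFFFF

def weak_sums(data, window=WINDOW):
    """Yield (offset, weak checksum) for every window in data."""
    if len(data) < window:
        return
    a = sum(data[:window]) & MASK
    yield 0, a
    for i in range(1, len(data) - window + 1):
        # roll: subtract the byte leaving the window, add the byte entering
        a = (a - data[i - 1] + data[i + window - 1]) & MASK
        yield i, a

def find_anchors(source, target, window=WINDOW):
    """Verified (source_offset, target_offset) anchor matches."""
    table = {}
    for off, s in weak_sums(source, window):
        table.setdefault(s, []).append(off)
    anchors = []
    for off, s in weak_sums(target, window):
        for src_off in table.get(s, ()):          # the hashtable lookup
            # confirm the weak match by direct byte comparison
            if source[src_off:src_off + window] == target[off:off + window]:
                anchors.append((src_off, off))
    return anchors
```

an xdelta-style pass would then extend each verified anchor forwards and backwards by direct comparison, as described above.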
> <http://www.cl.cam.ac.uk/~cpk25/libstree/>. This implementation doesn't
> use persistent storage, but if it did, then it would take more or less
> zero time to locate _all_ sources for a file almost immediately.
I don't know what you mean by "all sources for a file". could you be
more specific about where you mean to use a suffix array, and how it
helps with a particular bit of retrieval or storage? the way I see it, a
persistent suffix array makes the on-disk representation of a string
considerably larger, without improving the xdelta problem (which is:
find all common regions between these two strings).
> Find a way to partition files into blocks, in such a way that:
>
> 1. Block lengths are reasonable (definitions later)
> 2. The block boundaries are likely to be unchanged by small
>    modifications - i.e. an insertion of 1 byte somewhere should have a
>    small probability of changing one boundary, a smaller probability
>    of changing two boundaries, ... and a vanishingly small probability
>    of changing all boundaries.
>
> Now, given a file, break it into blocks; store each block
> independently, with its own hash; store a manifest that says how to
> rebuild a file from its blocks. And that's it.
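the boundary property asked for in point 2 is what content-defined chunking gives you; a rough sketch, where WIN, AVG_BITS, and the block-size limits are illustrative choices, not Monotone's actual parameters:

```python
# Sketch of the boundary-selection property quoted above: cut a block
# wherever a hash of the preceding WIN bytes hits a fixed bit pattern,
# so boundaries depend only on nearby content and a 1-byte insertion
# disturbs only the blocks around the edit. All parameters illustrative.

import hashlib

WIN = 16                       # bytes of context that decide a boundary
AVG_BITS = 11                  # boundary odds 1/2048 -> ~2KB average blocks
MIN_BLOCK, MAX_BLOCK = 256, 16384

def window_hash(window):
    # recomputed per position for clarity; a real implementation would
    # use an O(1)-updatable rolling hash (e.g. a Rabin fingerprint)
    return int.from_bytes(hashlib.sha1(window).digest()[:4], "big")

def split_blocks(data):
    """Split data into content-defined blocks (concatenation == data)."""
    mask = (1 << AVG_BITS) - 1
    blocks, start = [], 0
    for i in range(WIN, len(data)):
        size = i - start
        at_boundary = (window_hash(data[i - WIN:i]) & mask) == 0
        if size >= MAX_BLOCK or (size >= MIN_BLOCK and at_boundary):
            blocks.append(data[start:i])
            start = i
    blocks.append(data[start:])
    return blocks
```

because each boundary depends only on the WIN bytes before it, inserting one byte re-chunks only the neighborhood of the edit; blocks elsewhere keep their content and hence their hashes.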
yes, I've been thinking along somewhat similar lines (for dealing with
the file storage issue, anyways). I would make a few modifications to
this scheme, though.
- since this scheme doesn't involve coalescing or extending matching
blocks (like xdelta does) the blocks need to be larger than you're
thinking. a block identified by a SHA1 needs to be a few hundred
bytes at minimum. the SHA1 itself is 20 bytes (binary) and as I
will note momentarily I think adding extents will help, so a
reference to a block extent will likely be 20 + 8 + 8 = 36 bytes
(sha1,off,len).
- for inserts smaller than minimum block size, it is better to just
put an "insert len [literal data]" marker in the script, rather
than a reference to a tiny block fragment. if the sum of all
the literals in a script is larger than block size, you can
concatenate them and push that one block down to the block layer,
replacing the literals with references.
- for deletes less than the size of the block containing the delete,
or any sort of insert in the middle of a block, your script will
benefit from being able to denote extents within blocks, rather than
all-or-nothing references to blocks.
- I'm still not precisely sure how to accommodate searching for
  matching blocks or sub-block extents between a new file and "the
  entire database". it's possible your thoughts on suffix arrays
  will come in handy here; otherwise I am thinking of a simpler
  algorithm:
  - when the file is "new" to the database (has no previous
    matching manifest entry), split it up into new blocks.
  - when a file is being "patched", run fine-grained xdelta
    to locate matching extents against a table containing the
    adler32s of all small windows within all blocks of the
    previous version of the file.
  - after each xdelta-like operation, check whether there are
    enough inserts to coalesce into a new block.
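the script structure sketched in the bullets above (extent references plus small literal inserts) might look something like this; the names and layout are illustrative, not Monotone's actual format:

```python
# One possible shape for the "script" described in the bullets above: an
# ordered list of extent references into stored blocks (sha1, offset,
# length) or small literal inserts. Names and layout are illustrative.

import hashlib
from dataclasses import dataclass

@dataclass
class Extent:
    sha1: bytes   # 20-byte block id; serialized ref is 20 + 8 + 8 = 36 bytes
    off: int      # offset within the block
    length: int

@dataclass
class Literal:
    data: bytes   # "insert len [literal data]" for sub-block-size edits

def rebuild(script, block_store):
    """Reconstruct a file from its script and a {sha1: bytes} block store."""
    out = bytearray()
    for entry in script:
        if isinstance(entry, Extent):
            block = block_store[entry.sha1]
            out += block[entry.off:entry.off + entry.length]
        else:
            out += entry.data
    return bytes(out)

# a 1-byte insert in the middle of a block becomes two extent references
# with a small literal between them, as described above
block = b"0123456789" * 10
bid = hashlib.sha1(block).digest()
store = {bid: block}
script = [Extent(bid, 0, 50), Literal(b"X"), Extent(bid, 50, 50)]
```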
that technique is really nothing more than extending xdelta to work
over block-structured files, which is (as I'll get to) advantageous
since it permits Very Large Files (~2**40 bytes or so) and also makes
network sync work.
hashing Very Large Files would still be a bit of a bitch, though.
> The blocks could be identified by their SHA1, and the manifest could
> either be file-specific, or the standard manifest extended to include
> a part specification, e.g.
hmm, no, not changing the manifest format. the file remains identified
by its SHA1. if you don't like SHA1, substitute some other function
which makes an identifier given an input string, but if the filesystem
can maintain the notion of a "file" using inodes, so can manifests.
as far as I'm concerned, I'd keep doing what I'm doing now, reuse the
block + delta storage system for *storing* manifests, too. they're data too.
> This also interoperates nicely with network stuff - to fetch a version
> from a depot, you fetch the manifest, and then fetch all sha1 blocks
> listed in the manifest that you don't already have. As simple as that.
hm, for the network I have a broader idea: use hash trees over the
entire space of SHA1 to synchronize my collection + your collection into
the union of both (on both ends). with some special accounting to manage
singletons and tombstones, and a good spread factor, it's very efficient
to synchronize hash trees, and I'd use the exact same scheme to sync the
collection of blocks, the collection of manifests, the collection of
files, the collection of keys, and the collection of certs.
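the hash-tree comparison described above can be sketched roughly as follows; the 16-way fan-out and leaf threshold are arbitrary choices here, not a proposal for the actual wire protocol:

```python
# Rough sketch of hash-tree synchronization over a set of 20-byte ids
# (e.g. SHA1s of blocks): hash each subtree, and descend only into
# subtrees whose hashes differ. Fan-out and leaf size are arbitrary.

import hashlib

def subtree_hash(ids):
    h = hashlib.sha1()
    for i in sorted(ids):
        h.update(i)
    return h.digest()

def children(ids, depth):
    """Partition ids into 16 buckets by the nibble of the id at `depth`."""
    buckets = {}
    for i in ids:
        nibble = (i[depth // 2] >> (0 if depth % 2 else 4)) & 0xF
        buckets.setdefault(nibble, set()).add(i)
    return buckets

def missing(mine, theirs, depth=0):
    """Ids present in `theirs` but absent from `mine`, comparing only
    the subtrees whose hashes disagree."""
    if subtree_hash(mine) == subtree_hash(theirs):
        return set()                      # identical subtrees: skip entirely
    if len(theirs) <= 16 or depth >= 38:  # small subtree: compare directly
        return theirs - mine
    need = set()
    m, t = children(mine, depth), children(theirs, depth)
    for nibble, their_ids in t.items():
        need |= missing(m.get(nibble, set()), their_ids, depth + 1)
    return need
```

running this in both directions (swapping the two sides) yields exactly the transfers needed to turn each collection into the union of both, which is the goal described above.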
I'm cooking up a little proof of concept for this scheme now. it'll be a
few days before I have anything working, but I'm pretty excited about it.
> There's no need to have a staging queue for the depot.
indeed. removing queues and all the associated logic is something I
would very much like.
> About NNTP, email, "dumb web" distribution - all you have to do is
> record, for each block, whether or not it was sent to a specific
> destination.
no, I think I'd just remove these things altogether, let people sync
between databases directly as the primary mode of operation.
> * Requires more storage and/or bandwidth; a one-byte change in a
>   100MB file would cost ~500KB with this scheme, and ~100B with the
>   current delta scheme.
nah, it's not quite so bad. a 100mb file will have something like 10
blocks of 10mb each. the script for building it will be at minimum 360
bytes long (10 refs at 36 bytes each; though I'd prefer a human-readable
script form, so more like 500 bytes). inserting 1 byte will add perhaps
10 bytes to the script, plus require transmission of a new script: 510
bytes total. deleting 1 byte will probably split 1 extent reference into
2, adding maybe 50 bytes: 550 bytes total.
for a heavily edited file, it'll slowly get worse, but maybe you could
have a "defragment" routine which builds some fresh blocks (especially
if a bunch of blocks appear with refcount=1; might as well toss them).
> * Latest version, unless properly cached, will take longer to
>   construct (need to pull all blocks, which will be scattered
>   in the database). And proper caching costs space....
true. this could get to be a noticeable cost. again, defragmentation is
a possibility, or else just building an LRU file cache in the db. I'm
not averse to that.
-graydon