Re: [GNUnet-developers] more efficient sharing of similar data

From: Martin Uecker
Subject: Re: [GNUnet-developers] more efficient sharing of similar data
Date: Sun, 18 Aug 2002 16:03:08 +0200
User-agent: Mutt/1.4i

Hi gnunetters!

On Sat, Aug 17, 2002 at 03:33:17PM -0500, Christian Grothoff wrote:
> On Saturday 17 August 2002 06:25 am, Martin Uecker wrote:
> > In the preprocessing case each place where the rolling checksum
> > hits the magic value would be moved to a 1k boundary. This scheme
> > would work without modifying one line of code in gnunet.
> I kind of doubt that - if we take on-demand encoding into consideration 
> (where 
> GNUnet receives a request and then looks at 1k of the (unmodified) file in 
> plaintext on the drive and sends back the reply).

Yes. The file must be modified (preprocessed) in this case.

> > Integrating it into the client has the advantage that the user
> > can serve unmodified files (this is an argument for putting ECC
> > into the client too). But the index where the hashes are stored
> > must be extended with information where each block starts.
> > (10 bits of additional information for each 1k block)
> You can have 10 bits, and only 10 bits: :-)

I might have two bits left! ;-)


> The last 10 bits in 'fileOffset' are currently always 0 (1k alignment!). 

Yes, the blocks would just not be aligned anymore.

> > > | Also I'm not sure how tightly you can fit all blocks into the 1k
> > > | scheme (which would of course be best since you gain anonymity if all
> > > | blocks are truly uniform in size).
> >
> > Yes, this is critical for efficiency too.
> >
> > I have to think about the idea some more but I will provide
> > patches to play with next month.
> Well, I hope you'll give us a pile of documentation / descriptions with them, 
> because this sounds like it's going to be *very* complicated (but promising, 
> nevertheless, could solve one of the problems that we had with our 
> tree-encoding which was that inserting a single character at the beginning 
> would make the entire thing look totally different). So I'm looking forward 
> to this.

The idea is very simple! Have a look at this (stupid) implementation;
the algorithm and constants were chosen without much thought.

For on-demand loading I would suggest using overlapping blocks
of constant size 1024 instead of padding.

Note that the file can only be reconstructed sequentially. For
random access, the downloader needs to know how much each block
was moved before downloading the file (this is one byte per block).

#include <stdio.h>
#include <stdbool.h>
#include <string.h>

#define CONTEXT         99
#define ALIGN           1024
#define MAX_PAD         256
#define MAGIC           ALIGN

/* usage: ./a.out < plain > padded
 *        ./a.out -d < padded > plain
 */
int main(int argc, const char* argv[])
{
        int c;
        unsigned int inpos = 0;
        unsigned int outpos = 0;
        unsigned int sum = 0;
        unsigned char rol_buf[CONTEXT];
        bool in = true;

        if ((2 == argc) && (0 == strcmp(argv[1], "-d")))
                in = false;

        memset(rol_buf, 0, sizeof rol_buf);

        while (EOF != (c = getchar())) {

                /* rolling sum over the last CONTEXT bytes */
                sum -= rol_buf[inpos % CONTEXT];
                rol_buf[inpos % CONTEXT] = (unsigned char)c;
                sum += (unsigned char)c;
                inpos++;

                /* the data byte itself passes through in both directions */
                putchar(c);
                outpos++;

                if (0 == sum % MAGIC) {
                        /* distance to the next 1k boundary of the
                           padded stream */
                        int space = ALIGN - (int)(outpos % ALIGN);

                        if (space < MAX_PAD) {

                                outpos += space;

                                while (space--) // FIXME: error checking
                                        (true == in) ? putchar('.') : getchar();
                        }
                }
        }
        return 0;
}