[GNUnet-developers] DistribNet and GNUNet
Tue, 23 Apr 2002 07:40:45 -0400 (EDT)
I am starting a new distributed network, DistribNet, which has similar
aims to Freenet and GNUnet except that my main focus is speed and
stability rather than anonymity. Compared to Freenet I like your
network a lot better, if for no other reason than that it is not written
in goddamn Java. Sorry about that, it is just that I hate Java in just
about every way possible. Seriously though, I like most things about your
project except for your extreme approach of using UDP packets for
*everything*, tiny 1K block sizes, and using the filesystem to store these
tiny blocks. I plan to use a mixture of UDP and TCP packets and 32K block
sizes for splitting up files.
Below is an outline of DistribNet. Let me know what you think.
Perhaps we can work together provided our design goals don't conflict
too much. Maybe in the future we can even merge the two networks.
I am most interested in GNUnet's lookup services and accounting.
Feedback more than welcome.
A global peer-to-peer Internet file system that anyone can tap into
or add content to.
Kevin Atkinson (kevin at atkinson dhs org)
Last Modified: 2002-04-22
Project Page: http://distribnet.sourceforge.net/
Mailing list: http://lists.sourceforge.net/lists/listinfo/distribnet-devel
*) To allow anyone, possibly anonymously, to publish web sites
   without having to pay a commercial provider for the bandwidth
   or having to put up with the increasingly ad-ridden free web
   sites. One should not have to worry about bandwidth
   considerations at all.
*) Bring back the sense of community on the Internet that was
   present before the Internet became so commercialized.
*) Serve as an efficient replacement for current file sharing networks
such as Morpheus and Gnutella.
*) To have the network stable and working before some commercial
   company designs a proprietary network similar to what I envision
   that can only be accessed via a freely available, but not
   FSF-approved, license.
(Possibly Impossible) Goals:
*) *Really* fast lookup to find data. The worst case should be O(log(n))
and the average case should be O(1) or very close to it.
*) Actually retrieving the data should also be really fast. Popular
   data should be sitting on the same subnet. On average it should
   be as fast or faster than a typical web site (such as slashdot,
   google, etc.). It should make effective use of the
   topology of the Internet to minimize network load and maximize
   transfer speed.
*) General searching based on keywords will be built into the protocol
   from the beginning. The search facility will be designed in
   such a way as to make message boards trivial to implement.
*) Ability to update data while keeping old revisions around so data never
disappears until it is truly unwanted. No one person will have
the power to delete data once it spreads throughout the network.
*) Will try very hard to keep all but the most unpopular content from
   falling off the network. Basically, before deleting a locally
   unpopular key a node will first check whether other nodes are
   storing the key and how popular they find it. If not enough nodes
   are storing the key and there is any indication that the data may
   be useful at a later date, the key will not be deleted unless it
   absolutely has to be. And before deleting it, the node will first
   try uploading it to other nodes with more disk space available.
*) Ability to store data indefinitely if someone is willing to provide
   the space for it (and being able to find that data in O(log(n))
   time).
*) Extremely robust so that the only way to kill the network is to
disable almost all of the nodes. The network should still
function even if say 90% of it goes down.
*) Extremely efficient CPU-wise so that a fully functional node can run
   in the background and only take 1-2% of the CPU.
I would like the protocol to be able to efficiently support (i.e.,
without any of the ugly hacks that many of the applications for
Freenet use):
1) Efficient Web like sites (with HTTP gateway to make browsing easy)
2) Efficient sharing of files large and small.
3) Public message forums (with IMAP gateway to make reading easy)
4) Private Email (with the message encrypted so only the intended
recipient can read it, again with IMAP gateway)
5) Streaming Media
6) Online Chat (with possible IRC or similar gateway)
(Also see philosophy for why I don't find these issues that important)
*) Complete anonymity for the browser. I want to focus first on
   performance and only then on anonymity. In fact I plan to use
   extensive logging in the development versions so that I can track
   network performance and quickly catch performance bugs. As
   DistribNet stabilizes, anonymity will be improved at the expense
   of logging.
   The initial version will only use cryptography when absolutely
   necessary (for example key signing). Most communications will be
   done in the clear. After DistribNet stabilizes, encryption will
   slowly be added. When I add encryption I will carefully monitor
   the effect it has on CPU load, and if it proves to be expensive I
   will allow it to be optional.
Please note that I still wish to allow for anonymous posting of
content. However, without encryption, it probably won't be as
anonymous as Freenet or your GNUNet.
*) Data in the cache will be stored in a straightforward manner. No
   attempt will be made to prevent the node operator from knowing what
   is in his own cache. Also, by default, very little attempt will
   be made to prevent others from knowing what is on a particular
   node.
*) I have nothing against complete anonymity; it is just that I am
   afraid that both Freenet and GNUnet are designed more around the
   anonymity and privacy issues than they are around the performance
   and scalability issues.
*) For most types of things the level of anonymity that Freenet and
   GNUnet offer is simply not needed. Even for copyrighted and
   censored material there is, in general, little risk in actually
   viewing the information, because it is simply impractical to go
   after every single person who accesses forbidden information.
   Almost all of the time the lawsuits and such go after the original
   distributors of the information and not the viewers. Therefore
   DistribNet will aim to provide anonymity for distributing
   information, but not for actually viewing it. However, since
   there *is* some information where even viewing it is extremely
   risky, DistribNet will eventually be able to provide the same
   level of anonymity that Freenet or GNUnet offers, but it will be
   optional.
*) I also believe that knowing what is in one's own datastore, and
   being able to block certain types of material from one's own node,
   is not that big of a deal. Unless almost everyone blocks a certain
   type of information, the availability of blocked information will
   not be harmed. This is because even if 90% of the nodes block,
   say, kiddie porn, the information will still be available on the
   other 10% of the nodes, which, if the network is designed
   correctly, should be more than enough for anyone to get at blocked
   information. Furthermore, since the source code for DistribNet
   will be protected under the GPL or a similar license, it will be
   completely impractical for others to force a significant number of
   nodes to block information. Due to the dynamic nature of the
   cache I find it legally difficult to hold anyone responsible for
   the contents of their cache, as it is constantly changing.
DistribNet Key Types:
There will essentially be two types of keys. Map keys and data keys.
Map keys will be uniquely identified in a similar manner as Freenet SSK
keys. Data keys will be identified in a similar manner as Freenet's
CHK keys.
Map keys will contain the following information:
* Short Description
* Public Namespace Key
* Timestamped Index pointers
* Timestamped Data pointers
_At any given point in time_ each map key will only be associated with
one index pointer and one data pointer. Map keys can be updated by
appending a new index or data pointer to the existing list. By
default, when a map key is queried only the most recent pointer will
be returned. However, older pointers are still there and may be
retrieved by specifying a specific date. Thus, map keys may be
updated, but information is never lost or overwritten.
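The append-only scheme above can be sketched roughly as follows. This is a minimal illustration, not DistribNet's actual data structures; the type and member names are my own invention:

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <string>

// Rough sketch (hypothetical names) of a map key's timestamped pointer
// list. Pointers are only ever appended, never overwritten, so no
// revision is ever lost.
struct MapKey {
    std::string description;                // short description
    std::string public_namespace_key;       // owner's public namespace key
    std::map<std::int64_t, std::string> data_pointers; // timestamp -> data key

    // Updating appends a new pointer under its timestamp.
    void update(std::int64_t timestamp, const std::string& data_key) {
        data_pointers[timestamp] = data_key;
    }

    // Default query: return the most recent pointer.
    const std::string& latest() const {
        return data_pointers.rbegin()->second;
    }

    // Query by date: the newest pointer with timestamp <= date, so older
    // revisions remain retrievable.
    const std::string& at(std::int64_t date) const {
        auto it = data_pointers.upper_bound(date);
        return std::prev(it)->second;
    }
};
```

Because older pointers stay in the map, retrieving a past revision is just a lookup with an earlier date.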
Data keys will be very much like freenet's CHK keys except that they will
not be encrypted. Since they are not encrypted delta compression may
be used to save space.
There will not be anything like Freenet's KSK keys, as those proved to
be completely insecure. Instead, map keys may be requested without a
signature. If there is more than one map key by that name, then a list
of keys is presented, sorted by popularity. To make such a list
meaningful, every public key in DistribNet will have a descriptive
string associated with it.
Data Key Details:
Data keys will be stored in maximum size blocks of just under 32K. If
an object is larger than 32K it will be broken down into smaller size
chunks and an index block, also with a maximum size of about 32K, will
be created so that the final object can be reassembled. If an object
is too big to be indexed by one index block the index blocks themselves
will be split up. This can be done as many times as necessary therefore
providing the ability to store files of arbitrary size. DistribNet
will use 64 bit integers to store the file size therefore supporting
file sizes up to 2^64-1 bytes.
Data keys will be retrieved by blocks rather than all at once. When a
client first requests a data key that is too large to fit in a block,
an index block will be returned. It is then up to the client to figure
out how to retrieve the individual blocks.
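As a sketch of how deep the index tree gets for a given object size under this scheme (the constants are the ones given below; the function name is mine, not DistribNet's):

```cpp
#include <cstdint>

// Sketch (not DistribNet code): number of index-block levels needed to
// reassemble an object of a given size, assuming a 32640-byte block
// size and 1630 keys per index block.
const std::uint64_t BLOCK_SIZE = 32640;     // 2^15 - 128
const std::uint64_t KEYS_PER_INDEX = 1630;  // (32640 - 40) / 20

int index_levels_needed(std::uint64_t size) {
    if (size <= BLOCK_SIZE) return 0;       // fits directly in one data block
    // round up to the number of data blocks the object splits into
    std::uint64_t blocks = (size + BLOCK_SIZE - 1) / BLOCK_SIZE;
    int levels = 0;
    while (blocks > 1) {                    // each index level fans out 1630-way
        blocks = (blocks + KEYS_PER_INDEX - 1) / KEYS_PER_INDEX;
        ++levels;
    }
    return levels;
}
```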
Please note that even though blocks are retrieved individually,
they are not treated as truly independent keys by the nodes. For
example, a node can be asked which blocks it has based on a given index
block rather than having to ask for each and every data block. Also,
nodes maintain persistent connections so that blocks can be retrieved
one after another without having to re-establish the connection each
time.
Data and index blocks will be indexed based on the SHA-1 hash of their
contents. The exact numbers are as follows:
Data block size: 2^15 - 128 = 32640
Index block header size: 40
Maximum number of keys per index block: (32640 - 40) / 20 = 1630
Key size: 20
Maximum object sizes:
direct   => 2^14.99 bytes, about 31.9 KB
1 level  => 2^25.66 bytes, about 50.7 MB
2 levels => 2^36.34 bytes, about 80.8 GB
3 levels => 2^47.01 bytes, about 129 TB
4 levels => 2^57.68 bytes
5 levels => 2^68.35 bytes (but limited to 2^64 - 1)
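The figures above follow from multiplying the block size by the 1630-way fanout once per level; a throwaway sanity check (my own helper, not part of the protocol):

```cpp
#include <cmath>

// Maximum object size reachable with a given number of index levels:
// 32640 bytes per data block, times a 1630-way fanout per index level.
double max_object_size(int levels) {
    double size = 32640.0;
    for (int i = 0; i < levels; ++i)
        size *= 1630.0;
    return size;
}
```

For example, log2(max_object_size(1)) comes out to about 25.66, matching the 50.7 MB figure above.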
A block size of just under 32K was chosen because I wanted a size
which will allow most text files to fit in one block, most other files
with one level of indexing, and just about anything anybody would
think of transferring on a public network in two levels, and 32K worked
out perfectly. Also, files around 32K are rather rare, therefore
preventing a lot of unnecessary splitting of files that don't quite
make it. 32640 rather than exactly 32K was chosen to allow some
additional information to be transferred with the block without pushing
the total size over 32K. 32640 can also be stored nicely in a 16-bit
integer without having to worry whether it is signed or unsigned.
Blocks are currently stored in one of three ways:
1) Blocks smaller than a fixed threshold (currently 1K) are stored
   using Berkeley DB (version 3.3 or better).
2) Blocks larger than the threshold are stored as files. The primary
   reason for doing this is to avoid limiting the size of the data
   store to the maximum size of a file, which is often 2 or 4 GB on
   most filesystems.
3) Blocks are not stored at all; instead they are linked to an external
   file outside of the data store, much like a symbolic link links to a
   file outside of the current directory. However, since blocks often
   only represent part of the file, the offset is also stored as part
   of the link. These links are stored in the same database that
   small blocks are stored in. Since the external file can easily be
   changed by the user, the SHA-1 hashes will be recomputed when the
   file modification date changes. If the SHA-1 hash of a block
   differs, all the links to the file will be thrown out and the file
   will be relinked. (This part is not implemented yet.)
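The choice among the three cases might be dispatched along these lines. This is a hypothetical sketch, not code from data_key.cpp; the enum, function, and threshold names are all mine (the 1K value is the threshold mentioned above):

```cpp
#include <cstddef>

// Sketch of choosing among the three storage strategies described above.
enum class BlockStorage { SmallInDatabase, LargeAsFile, ExternalLink };

const std::size_t SMALL_THRESHOLD = 1024;  // "currently 1k"

BlockStorage storage_for(std::size_t block_size, bool links_external_file) {
    if (links_external_file)
        return BlockStorage::ExternalLink;    // path + offset kept in the DB
    if (block_size < SMALL_THRESHOLD)
        return BlockStorage::SmallInDatabase; // Berkeley DB record
    return BlockStorage::LargeAsFile;         // one file per block
}
```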
Most of the code for the data keys can be found in data_key.cpp
Lookup will probably be done using the Chord protocol.
DistribNet is/will be written in fairly modern C++. It will use
several external libraries; however, it will not use any C++-specific
libraries. In particular I have no plan to use any sort of
abstraction library for POSIX functionality. Instead, thin wrapper
classes will be used which I have complete control over and which will
serve mainly to make the process of using POSIX functions less tedious
rather than abstract away the details of using them.