[Gzz-commits] manuscripts/storm article.rst


From: Hermanni Hyytiälä
Subject: [Gzz-commits] manuscripts/storm article.rst
Date: Tue, 28 Jan 2003 04:16:47 -0500

CVSROOT:        /cvsroot/gzz
Module name:    manuscripts
Changes by:     Hermanni Hyytiälä <address@hidden>      03/01/28 04:16:47

Modified files:
        storm          : article.rst 

Log message:
        CFS/Freenet/PAST storage

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/manuscripts/storm/article.rst.diff?tr1=1.44&tr2=1.45&r1=text&r2=text

Patches:
Index: manuscripts/storm/article.rst
diff -u manuscripts/storm/article.rst:1.44 manuscripts/storm/article.rst:1.45
--- manuscripts/storm/article.rst:1.44  Mon Jan 27 22:44:00 2003
+++ manuscripts/storm/article.rst       Tue Jan 28 04:16:47 2003
@@ -15,8 +15,9 @@
 ever to be changed, but that it is not possible to resolve such names 
 on a global scale. However, recent development in peer-to-peer systems
 has made scalable indexing systems possible, rendering this assumption
-obsolete. This, we believe, is the most important result of
-peer-to-peer research with regard to hypermedia.
+obsolete [refs: chord, can, tapestry, pastry, kademlia, symphony, viceroy]. 
+This, we believe, is the most important result of peer-to-peer research with 
+regard to hypermedia.
 [/maybe]
 
 In today's computing world, documents move quite freely between computers, being 
@@ -145,13 +146,21 @@
 ================
 
 In our system, Storm (for *storage module*), all data is stored
-in *blocks*, byte sequences identified by a SHA-1 cryptographic hash 
+as *blocks*, byte sequences identified by a SHA-1 cryptographic content-hash 
 [ref SHA-1 and our ht'02 paper]. Blocks often have a similar granularity
 as regular files, but they are immutable, since any change to the
 byte sequence would change the hash (and thus create a different block).
 Mutable data structures are built on top of the immutable blocks
 (see Section 5).
 
+CFS [ref], which is built upon the Chord routing layer [ref], stores data as blocks. 
+However, CFS *splits* files into several miniblocks and spreads the blocks over the 
+available CFS servers. Freenet [ref] and PAST [ref, pastry ref] do not split 
+files into blocks, since they store data as whole files. All the previously mentioned 
+systems lack the immutability property of Storm blocks.
+
+Immutable blocks have several benefits...
+
 Block storage makes it easy to replicate data between systems.
 Different versions of the same document can easily coexist at this level,
 stored in different blocks. To replicate all data from computer A
@@ -187,8 +196,9 @@
 When used in a network environment, Storm ids do not provide
 a hint as to where in the network the matching block can be found.
 However, current peer-to-peer systems could be used to
-find blocks in a distributed fashion; for example, Freenet [ref]
-and some Gnutella clients [ref] also use SHA-1-based identifiers.
+find blocks in a distributed fashion; for example, Freenet [ref], 
+a few recent Gnutella clients [e.g. ref: shareaza], and Overnet/eDonkey2000 [ref] 
+also use SHA-1-based identifiers [e.g. ref: magnet uri].
 However, we have not put a network implementation into regular use
 yet and thus can only describe our design, not report on
 implementation experience.
@@ -280,8 +290,171 @@
 XXX
 
 
+<<<<<<< article.rst
+Idea/Plan
+=========
+
+[Notes for the authors, not part of the final document
+though text may be moved from here to there.]
+
+Whenever a document moves on the current web, links to it break, 
+be it from an author's computer to a public server,
+from one server to another, from the server to a client,
+or from one personal computer to another. We subsume
+these forms of movement under the term 'data mobility.'
+
+
+Storm goals/benefits:
+
+- Reliability
+  - Append-and-delete-only
+  - The same data can be stored in many locations,
+    allowing it to be easily reconstructed after failure
+  - Versioning: Old versions remain accessible
+- Xanalogical storage
+- If a document is accessible, references to it work
+- Links do not break
+- Easy syncing:
+  - Just copy a bunch of blocks
+  - Documents can be synced & merged
+  - Inter-document structures can be synced & merged
+  - Syncing can be done without merging immediately,
+    leaving two alternative versions current
+    (so e.g. an automated process is entirely possible,
+    even when there are conflicts)
+- Versioning
+
+
+Grouped differently,
+
+- Reliability (as above)
+- Usability in the face of intermittent connectivity 
+  (includes syncing, finding a document if available...)
+- Xanalogical structure 
+  (includes versioning, non-breaking links etc.)
+
+Storm limitations/weaknesses:
+
+- what, actually?
+
+antont ponders: for files storm is ok, but how about:
+- irc? (latency?)
+- video? (throughput)
+
+and:
+.. multipoint live video? (both latency and throughput demands)
+
+* does it make sense to think of irc messages, and/or video frames, as
+datablocks .. or what?
+
+  
+hemppah's comment on the term 'syncing':
+I'd prefer the term 'replication' to 'syncing' when we mean
+updating data to 'the most recent state'. E.g. Lotus Notes uses the
+term replication when locally made updates are applied to
+a centralized server --> 'used within the same system'. The term syncing, however, 
+is used when importing/exporting e.g. Nokia Communicator calendar data into/from 
+Lotus Notes calendar --> 'used between different systems'.
+
+
+hemppah: it is worth mentioning that Ray Ozzie is the man behind both Lotus Notes and Groove; 
+Lotus Notes is based on the client-server model and Groove on the p2p model --> 
+possible direction etc. ?
+
+hemppah: I think we should mention that in Gzz one refers to data in a non-hierarchical 
+way, whereas in Notes (and other systems also, references!!) we must use 
+a hierarchical way. In Notes the most important IDs are:
+1) every document has a unique identifier, which is unique among all replicas 
+of the database
+2) every document/design element has an identifier, called the noteID, which is unique 
+in the database, but not among all replicas of the database 
+3) every view has a unique identifier, which is unique among all replicas of 
+the database
+4) every database has a replica ID, which identifies the database's replicas 
+among all databases
+
+So, if we want to refer to a document, we use format:
+
+replicaID/viewID/documentID
+
+Also, we can refer to the same document through *many different* views (analogous to Gzz's dimensions?):
+notes://<server>/replicaID/viewID1/documentID
+notes://<server>/replicaID/viewID2/documentID
+
+Here's a real example:
+Notes://server/D235632D00313587/38D22BF5E8F088348525JK7500129B2C/REWB3FDE0D53807B67C2256CB50026FCVV
+
+For information about IDs in Notes:
+http://www-12.lotus.com/ldd/doc/tools/c/4.5/api45ug.nsf/85255d56004d2bfd85255b1800631684/00d000c1005800c985255e0e00726863?OpenDocument
+
+In Gzz, however, we don't know the location; we know only the *identity* of the data we are looking for, as follows:
+
+urn-5:FAB3FDE0hgfD5kkjj3807B67C2256CfsdB50026FC51 
+
+The above is not a *correct* urn-5, but it is very similar to the last part of the Notes syntax.
+
+<<<<<<< article.rst
+benja's reply:
+Hm. Replication to me means, the same data is kept on multiple
+machines. This is not what we are talking about here: We're talking
+about *different versions* of the same data being kept
+on multiple machines, and occasionally being 'brought into sync'
+with each other. If I send you a draft article and you comment on it,
+and I make changes too, and later I merge the two divergent
+versions back together, 'syncing' seems approximately right,
+but 'replication' seems completely wrong to me.
+
+hemppah's reply:
+Hm ;).
+When the same data is kept on multiple machines, each instance is called a replica
+of the data. When we merge different *versions* of replicas into the same 'version', this
+is called replication.
+
+When we want information to be exchanged between different kinds of *machines*
+(e.g. between a TabletPC's calendar and Notes), we call it syncing.
+
+The above describes how things work in Notes and what terms are used. And your 
+example is just what replication does in Notes: merge changes, made in
+different versions of replicas, into a new *different* version.
+
+
+=======
+(Of course, this is very similar to 'normal' URLs, but our purpose here is to give an example
+of how one refers to a particular data item in a collaboration tool like Notes)
+>>>>>>> 1.28
+
+In Notes, unlike in Gzz, there are servers which maintain the replication of data. What is 
+interesting in Notes' replication is the fact that replicating a database replicates not only 
+the *data* but also the design of the data, which represents the data.
+It is also worth mentioning that, even though the data and the design of the data (logic etc.) are 
+in the *same* (physical) structure, the data and its design are very loosely coupled with 
+each other.
+
+Additionally, we should emphasize how things are moving towards non-hierarchical reference models, 
+for instance, Notes (hierarchical) and Gzz (non-hierarchical), which are both based on the same 
+xanalogical model.
+
+"Usability in the face of intermittent connectivity" is
+more than just mobile applications: It is also copying data
+from one computer to another, where the two computers'
+file systems are not kept in sync through a permanent
+network connection. Hmm, maybe "Usability in the face
+of irregular synchronization" or some such would
+make it clearer?
+
+Ok, let's split that in two:
+
+- Usability in the face of intermittent connectivity
+  (we cannot access data stored on the internet)
+- Usability in the face of non-synchronization
+  (we can have two independent versions of something
+  on two unconnected computers and we can easily
+  synchronize the two versions when desired)
+  
+=======
 6. Peer-to-peer implementations
 ===============================
+>>>>>>> 1.44
 
 XXX
 



