gzz-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Gzz-commits] manuscripts/storm article.rst


From: Benja Fallenstein
Subject: [Gzz-commits] manuscripts/storm article.rst
Date: Sat, 08 Feb 2003 11:56:36 -0500

CVSROOT:        /cvsroot/gzz
Module name:    manuscripts
Changes by:     Benja Fallenstein <address@hidden>      03/02/08 11:56:36

Modified files:
        storm          : article.rst 

Log message:
        more

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/manuscripts/storm/article.rst.diff?tr1=1.113&tr2=1.114&r1=text&r2=text

Patches:
Index: manuscripts/storm/article.rst
diff -u manuscripts/storm/article.rst:1.113 manuscripts/storm/article.rst:1.114
--- manuscripts/storm/article.rst:1.113 Sat Feb  8 08:57:01 2003
+++ manuscripts/storm/article.rst       Sat Feb  8 11:56:36 2003
@@ -574,8 +574,8 @@
 comments of articles etc.
 
 
-5. Indexing
-===========
+5. Application-specific reverse indexing
+========================================
 
 Finding links to and transclusions of a piece of content in
 Xanalogical storage is but one example of *reverse indexing*
@@ -687,27 +687,29 @@
 
 .. [Figure ? -Hermanni]
 
-6.1. Pointers
--------------
+6.1. Pointers: implementing mutable resources
+---------------------------------------------
 
 In Storm, *pointers* are used to implement mutable resources.
-A pointer is a globally unique identifier that can refer to
-different blocks over time.
-
-To assign targets to pointers, a special type of block is used.
-If we have a block B and a pointer P (P is simply a random id),
-then we create an additional block whose mere presence tells Storm
-that pointer P points to block B. This is one application of Storm
+A pointer is a globally unique identifier (usually created randomly)
+that can refer to different blocks over time. A block a pointer
+points to is called the pointer's *target*.
+
+To assign a target to a pointer, we create a special kind of block,
+a *pointer block*, representing an assertion like *pointer P targets
+block B*. To find the target of pointer P, Storm searches for
+blocks of this form. This is one application of Storm
 indexing (Section 5): Keeping an index of all pointer blocks
 about pointer ``P``, we can quickly find out the current one.
 
 UML to be drawn::
 
-    +---------+ 1      * +---------------+
-    | Pointer |----------| Pointer block |
-    +---------+          +===============+
-                         |list: obsoleted| 
-                         +---------------+
+                                   obsoleted block
+                                     * -----\ 
+                                       /      \
+    +---------+ 1      * +---------------+    |
+    | Pointer |----------| Pointer block |----/
+    +---------+          +---------------+ 1
                                * |
                                  | 
                                  |
@@ -718,12 +720,16 @@
 
 In addition to the pointer and the target, pointer blocks contain
 a list of zero or more *obsoleted* pointer blocks. When a new version
-is created, it usually supersedes one older version; when the
-pointer block targeting the new version is created, it obsoletes
+is created, it usually supersedes one older version; 
+the corresponding pointer block then 'obsoletes'
 the pointer block targeting the superseded version.
 Only the new, non-obsoleted block will be considered when
 loading the document (although the pointer blocks pointing to
-past versions remain accessible for tracing the document's history).
+past versions remain accessible for tracing the document's history) [#]_.
+
+.. [#] All known pointer blocks for a pointer are still loaded
+   when the pointer is resolved. Storm then discards
+   the obsoleted ones.
 
 If, on the other hand, two people collaborate on a document
 and produce two independent versions, neither will obsolete
@@ -736,7 +742,23 @@
 alternative versions have been consolidated, a pointer block
 obsoleting both consolidated previous versions is created.
 
-XXX talk about security -> we need a way to authenticate authors...
+In a multi-user environement, we generally want only
+one user or group of users to be able to produce new
+official versions of a given document (an exception may
+be wikis, which are collaboratively edited by anyone
+interested [ref]). It is not yet clear how to do this.
+Signing pointer blocks digitally may be sensible, but
+digital signatures require a public key infrastructure
+and a trusted timestamping mechanism [#]_, which
+is hardly feasible for a system intended to be used
+for off-line as well as on-line work.
+For long-term publishing, one-time signatures have been
+found useful [ref]. For the time being, the pointer mechanism
+works only in trusted Storm zones (Section 3), e.g.
+in a workgroup collaborating on a set of documents.
+
+.. [#] Without timestamps, digital signatures are only valid
+   for a limited time [ref].
 
 The ability to retain multiple 'current' versions of a document
 can be useful, for example when there is no time to consolidate
@@ -745,26 +767,28 @@
 the computer pops up a little window saying, "There are currently
 three different versions of this resource. Which one would you like
 to open?" Overcoming this problem will be an important issue
-as the system matures. Possible situation-specific solutions include one
-where there may be (simple) rules to prioritize them based on metadata, e.g.
-if some sort of an official or original version is preferred.
+as the system matures. A possible way to alleviate this problem
+would be to allow prioritizing alternate versions based on metadata,
+for example opening an official or original version automatically
+if one exists.
 
-While we see the pointer system as useful for 
-asynchronous collaboration, it is not well suited to Web-like publishing.
+While we think that alternative current versions are useful for
+asynchronous collaboration, they aren't well suited to Web-like publishing.
 For this, a different system may be practical, where pointer blocks
 store a target and a timestamp; when resolving a pointer, the newest
 pointer block for that pointer would then be selected.
 
 In summary, the current pointer system seems promising, but 
-there are a number of unresolved issues with it:
+there are a number of unresolved issues with it: 
+authenticating pointer blocks; the user interface for choosing
+between alternative current versions; and the suitability
+for Web-like publishing. More research is needed in this area.
 
-- signatures needed online [possible refs: ConChord, SDSI/SPKI ? -Hermanni]
-- multiple 'current' versions annoying
-- not suited to Web-like publishing
+.. authenticating -> [possible refs: ConChord, SDSI/SPKI ? -Hermanni]
 
 
-6.2. Diffs
-----------
+6.2. Diffs: storing alternative versions efficiently
+----------------------------------------------------
 
 [benja says: Please do not touch this section, but tell me
 how to improve it instead. Reason: this is the meat for my thesis,
@@ -788,8 +812,8 @@
 leading up to the current one would be broken if any
 previous version were deleted. 
 
-Additionally, many versioning systems
-[ref-- CVS?] store the current version as well as the differences,
+Additionally, many versioning systems (e.g. CVS [ref])
+store the current version as well as the differences,
 enabling them to retrieve the current version quickly and compute
 recent versions by applying the differences 'backwards,'
 starting from the current version. This technique would also be
@@ -885,33 +909,59 @@
         | writeDiff(:OutputStream, :Diff)       |
         +---------------------------------------+
 
+The diff system is more complicated than simple block storage,
+and therefore more liable to bugs. Yet, as long as we do not
+do backward diffing, saving is purely additive: New diffs
+are added, but old diffs aren't changed. Therefore, when a save
+goes wrong, again only the changes after the previous save are lost.
+With backward diffing, we remove the cached full version,
+but we can reconstruct it using the diffs. We believe that
+diff-based Storm storage is still more reliable than file storage,
+where a simple application bug can lose all previous work
+on a document.
+
+To protect against buggy ``Diff`` or ``VersionFormat``
+implementations, before storing a diff, we always check
+that we can reconstruct the appropriate version block from it;
+if this fails for some reason, we store the full version block
+instead. At the cost of some storage space, this protects
+the user's data.
+
+Currently, Storm pool implementations do not know anything about diffs;
+all the functionality described here is implemented on top of them.
+For a networked system, however, it would be useful if a server
+could recreate version blocks before sending them to a client.
+Then, instead of transferring all the diffs, only the full version
+would have to be sent through the network.
+
 
 7. Peer-to-peer implementations
 ===============================
 
-7.1. Overview
--------------
 During the last few years, there have been a lot of research efforts related 
 to Peer-to-Peer (p2p) resource discovery, both in industry and academic world.
 Intensive work in p2p field has yielded two main approaches: broadcasting 
 [ref: gnutella1, kazaa, limewire, shareaza] and Distributed Hash Tables (DHT) 
 [refs: chord, can, tapestry, pastry, kademlia, symphony, viceroy, skip graphs, 
-swan]. Both of these approaches build an *application* level overlay 
connectivity 
-graph network. However, there are *significant* differences between 
broadcasting 
-and DHT approach, i.e. in scalability and efficiency properties, how the 
overlay 
-network is created and maintained and how queries are performed. DHT is seen 
as 
-scalable approach and usually provides (poly)logarithmic bounds to *all* 
internal 
-operations (footnote about 'stable state' ?), while broadcasting can't achieve 
+swan]. Both of these approaches use an application level overlay network.
+However, there are significant differences between broadcasting 
+and DHT approach in scalability and efficiency properties. A DHT
+usually provides log-like bounds to *all* internal 
+operations [#]_ (footnote about 'stable state' ?), while broadcasting can't 
achieve 
 either of these.
 
-[footnote: It's not clear whether *all* proposed DHT designs can preserve
-(poly)logarithmic properties when participants are heterogeneous and they 
-join and leave the system in a dynamic manner. 
-
-In DHT approach, both keys and the addresses of peers are mapped into one 
virtual 
-key space. The form of key space depends on implementation (e.g. can be a 
circle). 
-The mapping makes possible to assign number of data items to a peer, based on 
how 
-'close' (e.g. numerical, XOR) data item's and peer's keys are each other. 
Thus, 
+.. [#] It's not clear whether *all* proposed DHT designs can preserve
+   log-like properties when participants are heterogeneous and they 
+   join and leave the system in a dynamic manner. 
+
+A distributed hashtable stores key/value pairs.
+In a DHT, both hashtable items and the addresses of peers 
+are mapped into a single virtual key space. The form of the key space 
+depends on implementation (for example, Chord uses a circle).
+A distance metric (e.g. numerical, XOR) is used to find the peer
+whose position in the key space is 'closest' to the key of a given item.
+This peer is responsible to store the item (so both queries and insertions
+relating to the key are routed to it.) Thus, 
 DHT's overlay connectivity graph is structured. On the other hand, the overlay 
 connectivity graph of broadcasting approach is formed more or less (depends on 
 implementation) in a random manner. 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]