[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Gzz-commits] manuscripts/storm article.rst
From: |
Benja Fallenstein |
Subject: |
[Gzz-commits] manuscripts/storm article.rst |
Date: |
Sat, 15 Feb 2003 08:29:04 -0500 |
CVSROOT: /cvsroot/gzz
Module name: manuscripts
Changes by: Benja Fallenstein <address@hidden> 03/02/15 08:29:04
Modified files:
storm : article.rst
Log message:
start making the section 3 <-> section 4 swap work text-wise
CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/manuscripts/storm/article.rst.diff?tr1=1.154&tr2=1.155&r1=text&r2=text
Patches:
Index: manuscripts/storm/article.rst
diff -u manuscripts/storm/article.rst:1.154 manuscripts/storm/article.rst:1.155
--- manuscripts/storm/article.rst:1.154 Sat Feb 15 06:54:00 2003
+++ manuscripts/storm/article.rst Sat Feb 15 08:29:03 2003
@@ -127,7 +127,7 @@
We address the mobility of documents by block storage
and versioning, while we use Xanalogical storage
to address the movement of content between documents (copy&paste);
-see Fig. [ref-storm_layers]_.
+Fig. [ref-storm_layers]_ provides an overview of Storm's components.
.. uml:: storm_layers
:caption: Components of the Storm model
@@ -172,7 +172,7 @@
usable yet.
This paper is structured as follows. In the next section, we describe
-related work. In section 3, we give an overview of xanalogical model.
+related work. In section 3, we give an overview of the xanalogical storage
model.
In section 4, we introduce the basic storage unit of our
system, i.e. file-like blocks identified by cryptographic hashes. In section
5,
we discuss application-specific reverse indexing of blocks by their
@@ -326,34 +326,6 @@
at the cost of each peer maintaining one node in the overlay network
for each (key,value) pair it publishes.
-.. hemppah's original text before benja's changes:
- In a DHT, both hashtable items and the addresses of peers
- are mapped into a single virtual key space. The form of the key space
- depends on implementation (for example, Chord uses a circle).
- A distance metric (e.g. numerical, XOR) is used to find the peer
- whose position in the key space is 'closest' to the key of a given item.
- This peer is responsible to store the item (so both queries and insertions
- relating to the key are routed to it.) Thus,
- DHT's overlay connectivity graph is structured. On the other hand, the
overlay
- connectivity graph of broadcasting approach is formed more or less (depends
on
- implementation) in a random manner.
-
- When performing queries, in broadcasting approach, peer sends a query
request to a
- subset of its neighbors and these peers to their subsequent neighbors. The
- process will continue as long as query's time-to-live (TTL) value hasn't
been reached.
- In DHT approach, query request is deterministically routed towards the peer
- which hosts a specific data item. Routing is based on 'hints' (based on
- differences between data item's key and peer's key), which each peer
provides
- along the routing path.
-
- Obviously, there are major differences within approaches. For the DHT
approach,
- perhaps the main difference is *what* is self-organized into a
- virtual key space. For instance, in SWAN [ref] and Skip Graph [ref], *data
- items* self-organise into a virtual address space, while in other DHT
- implementations *peers* self-organise in structured form in a virtual
space.
- In the broadcasting approach, implementations' differences mostly lie in
the
- *structural level* of the overlay network, i.e. super peers and peer
clusters.
-
The basic definition of a distributed hashtable does not indicate
how large the keys and values used may be. Intuitively, we expect keys
to be small, maybe a few hundred bytes at most; however, there are different
@@ -364,12 +336,6 @@
a *home-store* and the latter a *directory* scheme (they call the peer
responsible for a hashtable item its 'home node,' thus 'home-store').
-.. Should we discuss applications of p2p systems (CFS, OceanStore, Squirrel,
...)
- here? If so, which ones?
- [CFS/PAST(DHT, GUID, block vs. files), perhaps Freenet too (GUID) ?
-Hermanni]
-
-.. thesis-benja: remove paragraph below
-
CFS [ref] is a global peer-to-peer storage system. CFS is built upon Chord DHT
peer-to-peer routing layer[ref]. CFS stores data as blocks. However, CFS
*splits* data
(files) into several miniblocks and spreads blocks over the available CFS
servers.
@@ -427,63 +393,37 @@
Through this mechanism, the system can show to the user all documents
that share text with the current document.
-To keep track of links and transclusions, the system keeps a global index
-of documents by the characters they contain, and of links by the characters
-they refer to. Thus, for each character in the document, the system
-queries the index for other documents containing this character,
-and shows them as transclusions. Resolving links is a multi-step process.
-Each link is modeled as two collections of characters: the two
-endpoints of the link. To show links to a specific document,
-the system firstly uses the link index to find links
-to each character in the document. Secondly, for each link,
-it looks at the *other* set of characters in the link-- the target
-of the link, if the original character was the source, and vice versa.
-Thirdly, it looks for documents containing these target characters.
-This way, even if both the source and target characters of the link
-are moved to a different document, the link stays connected to them.
+To track links and transclusions, the system indexes documents by
+the characters they contain, and links by the characters they refer to.
+To find transclusions of a document, we search the index for other
+documents containing any character from this document. To show links,
+we first search for links refering to any character in this document.
+However, when we have a link, we don't yet have the document it links to.
+Therefore, in a second step, we search for documents containing
+the characters the link targets.
-Of course, doing any expensive operation for *every* character
+Of course, doing any expensive operation
+(like an index lookup) for *every* character
in a document does not scale very well. In practice,
characters typed in consecutively are given consecutive ids,
such as ``...:4``, ``...:5``, ``...:6`` and so on, and
-operations are on *spans*, i.e. consecutive ranges of characters
-(``...:4-6``) in a document. In Storm, in each editor session we
-create a block with all characters entered in this session (the content type
-being ``text/plain``). To designate a span of characters
-from that session, we use the block's id, the offset of the first
-character, and the number of characters in the span.
-This technique was first introduced in [lukka02guids]_.
-
-In Xanadu, characters are stored to append-only *scrolls*
-when they are typed [ref]. Because of this, in Storm, we call the
-blocks containing the actual characters *scroll blocks*. The documents
-do not actually contain the characters; instead, they are
-*virtual files* containing span references as described above.
-To show a document, the scroll blocks it references are loaded
-and the characters retrieved from there [#]_.
+operations are on *spans*, i.e. ranges of consecutive characters
+(``...:4-6``).
-.. [#] It is unclear whether this approach is efficient for text
- in the Storm framework; in the future, we may try storing
- the characters in the documents themselves, along with their
- permanent identifiers. For images or video, on the other hand,
- it is clearly beneficial if content appearing in different
- documents-- or different versions of a document-- is only
- stored once, in a block only referred to wherever
- the data is transcluded.
-
Our current implementation shows only links between documents
that are in memory at the same time [screenshot of xupdf, perhaps ref too
(submitted) antont: was thinking the same, it would illustrate this well].
-In the future, we will implement a global distributed index at top of
-a distributed hashtable, with the scroll blocks' ids as the keys.
+In the future, we will implement a global distributed index on top of
+a distributed hashtable (Section 5).
To find the transclusions of a span, the system will retrieve
-all transclusions of any span in the scroll block, then
+all transclusions of any span with the same prefix (``...:``), then
sort out those that do not overlap the span in question.
Since the problem is to search for overlapping ranges,
-the spans cannot be used as hashtable keys. However, as the blocks
-will be relatively small (limited by the amount of text
-the user enters between two saves of a document), we hope
+the spans themselves can not be used as hashtable keys.
+However, we keep the number of spans with the same prefix
+relatively small (limited by the amount of text
+the user enters between two saves of a document). Therefore, we hope
that this will not be a major scalability problem. Otherwise,
systems that allow range queries, such as skip graphs [AspnesS2003]_,
skipnet [ref], may prove useful.
@@ -502,7 +442,7 @@
Figure [ref-figdocmovement]_ illustrates how xanalogical storage addresses the
issue of
movement of data between documents. Initially, there are documents D1 and
D2, with two links (directed arrows in the figure) from D1 to two different
-elements in D2, A and B. The links actually are to the /spans/ A and B that
+elements in D2, A and B. The links actually are to the *spans* A and B that
are stored in the scroll, but shown as parts of D2, as illustrated with the
dashed lines. Then, when in the next step the document D2 is split in two --
becoming documents D2.1 and D2.2 -- with link target A in the first and B
@@ -750,6 +690,33 @@
version of a document whose identifier is hard-wired into
the software (mutable documents are described in section 6.1).
+4.2. Xanalogical storage on top of blocks
+-----------------------------------------
+
+In Storm, in each editor session we
+create a block with all characters entered in this session (the content type
+being ``text/plain``). To designate a span of characters
+from that session, we use the block's id, the offset of the first
+character, and the number of characters in the span.
+This technique was first introduced in [lukka02guids]_.
+
+In Xanadu, characters are stored to append-only *scrolls*
+when they are typed [ref?]. Because of this, in Storm, we call the
+blocks containing the actual characters *scroll blocks*. The documents
+do not actually contain the characters; instead, they are
+*virtual files* containing span references as described above.
+To show a document, the scroll blocks it references are loaded
+and the characters retrieved from there [#]_.
+
+.. [#] It is unclear whether this approach is efficient for text
+ in the Storm framework; in the future, we may try storing
+ the characters in the documents themselves, along with their
+ permanent identifiers. For images or video, on the other hand,
+ it is clearly beneficial if content appearing in different
+ documents-- or different versions of a document-- is only
+ stored once, in a block only referred to wherever
+ the data is transcluded.
+
5. Application-specific reverse indexing
========================================
@@ -839,25 +806,10 @@
Clearly, for block storage to be useful, there has to be a way to
efficiently update documents/maintain different versions of documents.
We achieve this by a combination of two mechanisms. Firstly, a
-*pointer* is an updatable reference to a block;
-pointers can be updated by creating a specific kind of Storm block
-representing an assertion of the form, "pointer ``P`` now points
-to block ``B``." Pointers are resolved with the help of a Storm index
-mapping pointer identifiers to blocks providing targets for that pointer.
-Through this mechanism, we can keep old versions of documents
-along with the current versions.
-
-.. [Figure ? -Hermanni]
-
-Secondly, in the spirit of version control systems like CVS,
-we do not store *each version*, but only the differences between versions.
-However, we still refer to each full version by the id of a block
-containing that version, even though we do not store this block.
-When we want to access a particular version, we reconstruct it
-using the differences, and then check the result using
-the cryptographic hash in the full version's block id.
+*pointer* is an updatable reference to a block.
+Secondly, similar to version control systems like CVS,
+we do not store each version, but only the differences between versions.
-.. [Figure ? -Hermanni]
6.1. Pointers: implementing mutable resources
---------------------------------------------
@@ -865,7 +817,7 @@
In Storm, *pointers* are used to implement mutable resources.
A pointer is a globally unique identifier (usually created randomly)
that can refer to different blocks over time. A block a pointer
-points to is called the pointer's *target*.
+points to is called the pointer's *target* (Fig. [ref-storm_pointers]_).
To assign a target to a pointer, we create a special kind of block,
a *pointer block*, representing an assertion like *pointer P targets
@@ -931,7 +883,7 @@
for off-line as well as on-line work.
For long-term publishing, one-time signatures have been
found useful [anderson98erl]_. For the time being, the pointer mechanism
-works only in trusted Storm zones (Section 3), e.g.
+works only in trusted Storm zones (Section 4), e.g.
in a workgroup collaborating on a set of documents.
.. [#] Without timestamps, digital signatures are only valid
@@ -967,13 +919,8 @@
6.2. Diffs: storing alternative versions efficiently
----------------------------------------------------
-[benja says: Please do not touch this section, but tell me
-how to improve it instead. Reason: this is the meat for my thesis,
-due Feb 9th, so I want all possible improvements on it
-to go there, too ;-) [and I'm of course allowed to solicite feedback,
-but not allowed to use stuff written by someone else...]]
-[Hm, should we move/remove 'Additionally, many versioning'
-paragraph into related work ? -Hermanni]
+.. [Hm, should we move/remove 'Additionally, many versioning'
+ paragraph into related work ? -Hermanni]
The pointer system suggests that for each version of a document,
we store an independent block containing this version. This
- [Gzz-commits] manuscripts/storm article.rst, (continued)
- [Gzz-commits] manuscripts/storm article.rst, Hermanni Hyytiälä, 2003/02/13
- [Gzz-commits] manuscripts/storm article.rst, Toni Alatalo, 2003/02/13
- [Gzz-commits] manuscripts/storm article.rst, Hermanni Hyytiälä, 2003/02/14
- [Gzz-commits] manuscripts/storm article.rst, Hermanni Hyytiälä, 2003/02/14
- [Gzz-commits] manuscripts/storm article.rst, Benja Fallenstein, 2003/02/14
- [Gzz-commits] manuscripts/storm article.rst, Benja Fallenstein, 2003/02/14
- [Gzz-commits] manuscripts/storm article.rst, Benja Fallenstein, 2003/02/14
- [Gzz-commits] manuscripts/storm article.rst, Benja Fallenstein, 2003/02/15
- [Gzz-commits] manuscripts/storm article.rst, Benja Fallenstein, 2003/02/15
- [Gzz-commits] manuscripts/storm article.rst, Benja Fallenstein, 2003/02/15
- [Gzz-commits] manuscripts/storm article.rst,
Benja Fallenstein <=
- [Gzz-commits] manuscripts/storm article.rst, Benja Fallenstein, 2003/02/15
- [Gzz-commits] manuscripts/storm article.rst, Benja Fallenstein, 2003/02/15
- [Gzz-commits] manuscripts/storm article.rst, Benja Fallenstein, 2003/02/15
- [Gzz-commits] manuscripts/storm article.rst, Benja Fallenstein, 2003/02/15
- [Gzz-commits] manuscripts/storm article.rst, Benja Fallenstein, 2003/02/15
- [Gzz-commits] manuscripts/storm article.rst, Benja Fallenstein, 2003/02/15
- [Gzz-commits] manuscripts/storm article.rst, Toni Alatalo, 2003/02/15
- [Gzz-commits] manuscripts/storm article.rst, Benja Fallenstein, 2003/02/15
- [Gzz-commits] manuscripts/storm article.rst, Benja Fallenstein, 2003/02/15
- [Gzz-commits] manuscripts/storm article.rst, Toni Alatalo, 2003/02/15