gzz-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Gzz-commits] gzz/Documentation/misc/hemppah-progradu researc...


From: Hermanni Hyytiälä
Subject: [Gzz-commits] gzz/Documentation/misc/hemppah-progradu researc...
Date: Wed, 18 Dec 2002 08:26:24 -0500

CVSROOT:        /cvsroot/gzz
Module name:    gzz
Changes by:     Hermanni Hyytiälä <address@hidden>      02/12/18 08:26:24

Modified files:
        Documentation/misc/hemppah-progradu: research_problems 

Log message:
        Some updates to answer for a)

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/gzz/Documentation/misc/hemppah-progradu/research_problems.diff?tr1=1.7&tr2=1.8&r1=text&r2=text

Patches:
Index: gzz/Documentation/misc/hemppah-progradu/research_problems
diff -u gzz/Documentation/misc/hemppah-progradu/research_problems:1.7 
gzz/Documentation/misc/hemppah-progradu/research_problems:1.8
--- gzz/Documentation/misc/hemppah-progradu/research_problems:1.7       Wed Dec 
18 06:08:14 2002
+++ gzz/Documentation/misc/hemppah-progradu/research_problems   Wed Dec 18 
08:26:24 2002
@@ -221,15 +221,28 @@
        -metadata ?
        
 2.1.5. Answers to research problem
+-the most efficient algorithm: O(log n)
+       -DHTs requires O(log n) hops, log n neighbors, except Viceroy (however, 
robustness suffers)
+       -SWNs requires O(log^2 n) hops, much less neighbors (constant, is not 
depedent of n)
+-in DHTs and SWNs, it is possible to locate data in constant time, BUT it 
requires knowing all n neighbors!!!
+-in basic scenario, DHTs and SWTs assume that we have to know value's key 
beforehand
+-in this case, DHTs and SWNs require little bandwidth, because "network knows 
where the block resides"
+-in this case, FBNs, Hybrids, Social Discovery require much bandwidth, since 
there is no benefit from the block's ID at all (they have to ask constantly)
+-SWNs and FBSs require less memory than DHTs, since in these approaches, there 
are less connections to other nodes
+-SWNs relies less to neighbor nodes than DHTs
 -there are different kinds of uses of DHTs, e.g. DHT actually stores the data, 
or DHT only stores 
 the values, which are used for locating the data
--current systems mainly uses to latter method
+-existing systems mainly uses the latter method
 -both have pros and cons: 
        -in the value use, several hostile nodes might say that they hosts the 
value, but refuse to serve any host
        -in the storage use, hostile node might say that someone must save data 
block size of 1TB for her computer
-       -however, there are methods for preventing these kinds of actions
--DHTs and SWNs are very robust to failures: e.g. 20% of the nodes 
simultaneously fail, less than 0.1% of the data retrievals are failed
--the basic difference between DHTs/SWNs is that in DHT/SWN approach, systems 
"knows", where the desired data block resides
+       -however, there are methods for preventing these kinds of actions (look 
below)
+-DHTs and SWNs are very robust to failures: e.g. 20% of the nodes 
simultaneously fail, less than 0.1% of the data retrievals are failed 
+(each node has "enough neighbors for alternative routing path)
+-however, in other approaches, the network infracstructure is very vulnerable: 
usually FBNs/hybrid networks follows the power-law distribution, 
+in which, there are many nodes connected to one node. So, when this "super 
node" is under DoS attach, the performance of the network suffers a lot --> 
+most of data retrievals could fail
+-when locating data blocks, the basic difference between DHTs/SWNs is that in 
DHT/SWN approach, systems "knows", where the desired data block resides
        -based on data block ID's, nodes gives "hints" (routes more close to ID 
in the key space) constantly to a query lookup until 
        the specific node is found which hosts the block
        -no extra network traffic
@@ -237,21 +250,52 @@
 -instead, in other approaches, system doesn't "know" where the data block 
resides
        -we have to ask from any participating node, if they have the data block
        -very much extra network traffic
-
--the most efficient algorithm: O(log n)
-       -DHTs requires O(log n) hops, log n neighbors, except Viceroy (however, 
robustness suffers)
-       -SWNs requires O(log^2 n) hops, much less neighbors (constant, is not 
depedent of n)
--in DHTs and SWNs, it is possible to locate data in constant time, BUT it 
requires knowing all n neighbors!!!
--in basic scenario, DHTs and SWTs assume that we have to know value's key 
beforehand
--in this case, DHTs and SWNs require little bandwidth, because "network knows 
where the block resides"
--in this case, FBNs, Hybrids, Social Discovery require much bandwidth, since 
there is no benefit from the block's ID at all (they have to ask constantly)
--SWNs require less memory than DHTs, since there are less connections to other 
nodes
--SWNs relies less to neighbor nodes than DHTs
 -proposol: most efficient approaches for this research problem are DHTs and 
SWNs, since they are based on key-value pairs (Storm has a key as block ID)
- 
-
-
-
+-research challenges for DHTs/SWNs:
+       -handling failures
+               -replication            
+               -caching
+       -coping with system in flux
+               -large system may never achieve a ideal state
+               -no overwhelming theory for handling system in flux
+       -robustness with untrusted participants
+               -data authentication
+               -self-certifying pathnames
+               -integrity of routing tables            
+       -network-awareness for performance
+               -topology-awareness
+               -traffic-awareness
+-there are known security considerations related to DHTs/SWNs, but there are 
also (partial) solutions to them:
+               -incorrect lookup routing
+                       -problem: an individual malicious node could forward 
lookups to an incorrect or non-existent node
+                       -solution: at each hop, querier knows that the lookup 
is supposed to get "closer". The querier should check this so that
+                                  this attack can be detected and backtrack as 
necessary in routing path
+               -incorrect routing updates
+                       -problem: an individual malicious node could corrupt 
the routing tables of other nodes by sending incorrect updates
+                       -solution: system maintains information of requirements 
of correct routing tables (and verifies them)                   
+               -partition
+                       -problem: A new node may join parallel network, formed 
a set of malicious nodes
+                       -solution: Node bootstrap via some sort of trusted 
source (e.g. history information of trusted nodes)
+               -storage and retrieval attacks
+                       -problem: an individual malicious node could join in 
the lookup correctly, but deny the existence of data it is responsible for
+                       -solution: replication: storage layer must implement 
replication in a way so that no single node is responsible for replication
+               -inconsistent behaviour
+                       -problem: an individual malicious node could present a 
good face to only its neighbors, so that neighbors don't see any reason 
+                                 to remove node from their routing tables.
+                       -solution: (partial solution) most DHT systems have 
ways of jumping to distant points in the identifier space in order to speed up 
queris and
+                                  jump over the "bad" neighborhood 
(backtracking and jump over)
+               -overload of targeted nodes
+                       -problem: an individual malicious node could attempt to 
overload targetted nodes with garbage packets
+                       -solution: (partial solution) data replication in 
physichally disparate locations 
+               -rapid joins and leaves
+                       -problem: an individual malicious node could trick the 
system with rapid joins and leaves for extra rebalancing of the system
+                       -solution: (partial solution) The rate of replication 
and amount of data stored at each node must ne kept at the levels that allow 
+                                  for timely replication without causing 
network overload when even regular nodes join and leave the network
+               -unsolicited messages
+                       -problem: an individual malicious node could send an 
unsolicited response to a query (man in the middle attack)
+                       -solution: (partial solution) the defence against this 
would be standard authentication techique, such as digital signatures. However, 
+                                  there are no working implementation for this 
yet (digital signatures are expensive currently)
+               
 2.2. "Searching for most recent Storm block associated with specific urn-5 
name, where
      the block has been signed with a given key"
 



reply via email to

[Prev in Thread] Current Thread [Next in Thread]