[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [GNUnet-developers] How do updates work?
Re: [GNUnet-developers] How do updates work?
Sat, 8 Aug 2009 00:17:42 +0200
Am Thursday 06 August 2009 01:01:41 schrieb Kenneth Almquist:
> I've been trying to understand how namespaces are supposed to support
> As I understand it, a namespace advertisement contains a string called
> the root, which can be used in a namespace search. A result of a
> search of a namespace may contain a string which can be used for a
> subsequent search, and it looks like the search code will initiate
> a search using this string automatically.
> Thus one could post a file under the keyword A with next_id set
> to B, and subsequently publish a file under keyword B with next_id
> set to C, and finally publish a file under keyword C. This would
> allow someone to find all three files using repeated searches starting
> with A. The question is: how does this differ from publishing all
> three files under the keyword A?
Good question. The answer is simple: you could tell people (for example, in
an updated namespace announcement) that they should search for "B". Then
they would never see "A". Or you could use an informal system:
MyContent-2007 update: MyContent-2008
Someone may then guess that the next update will be "MyContent-2009" and
jump-ahead manually. Those who don't know the system would find the 2009
version as you describe it. With more years (or finer granularity) this may
become more useful.
> In terms of speed, a single search
> for A is likely to be faster than performing a series of searches
> starting with A. In terms of completeness, the answer is less clear.
> Possibly a chain of searches would be more likely to locate all the
> files, because each search only has to produce one result. On the
> other hand, a single failed search in the middle of chain means that
> none of the files in the rest of the chain will be found, so I would
> suspect that the odds of the search finding *most* of the files would
> be higher with a single keyword search than with a chain once the
> chain reached a certain length.
Well, who said you need to do chains? The system supports publishing multiple
results under "A", so you could say that "B" updates "A" initially, and then
later you can ALSO publish that "Z" updates "A". Then, a search would follow
both chains (presumably A->B->D->...->Z and A->Z). The real question in this
case is of course if the GUI support for this is adequate (and I'm sure the
answer is "no").
> If the above mechanism provides a way to perform updates, I don't see
> it. When the ECRS paper states that, "since SBlocks are signed, it
> is possible to allow updates," the point is that updating an existing
> file involves deleting the old version, so we want the originator of
> the content to be able to do that, without allowing anyone else to do
Ah, well, maybe the paper is not precise here. The idea was always to support
finding of "more recent" versions, but always without the ability to
remove "older" revisions. We never intended to support deletions.
> A scheme to do this might work as follows:
> Allow an SBlock to include a 16 bit version number. (For backward
> compatibility, the version number is optional, and an omitted version
> number is equivalent to a version number of zero.) The rules for
> dealing with version numbers are:
> 1) Any time a host sees an SBlock with a given version number, it
> gets rid of versions of the SBlock with lower version numbers.
So the version number would have to be stored in plaintext, which is likely
fine. Now, we may want to allow multiple (valid/current) SBlocks per
namespace, so we would need to refine this to say "lower version numbers" for
SBlocks with the same identifier. But identifiers are encrypted, so hosts
can not tell if two SBlocks have the same identifier => oops.
> 2) A host does not earn trust for supplying an obsolete version.
> This means that if a host is doing a search, gets one version
> of an SBlock, and then receives a later version, it should
> revoke the trust credited to the host that provided the older
> version and give it to the host that provided the newer
> version. This rule means that there is no incentive for
> hosts to keep old versions around.
Other intermediaries would most likely not be able to tell that the version is
obsolete. Or worse: a host could deliberately keep the older versions, first
supply the oldest (cash in), then the second-oldest (cash in again), etc..
Combined with providing low expiration times this would give peers an
incentive to keep older revisions (assuming the namespace is popular) and
hand them out first.
> 3) If a host receives an obsolete version of an SBlock, it can
> request a refund of the trust expended on the search by
> sending a more recent version of the SBlock. This requires
> a new message type. This rule means that hosts which have
> cached obsolete SBlocks are likely to be provided with updated
> versions of those SBlocks. (Otherwise, a host might cache
> an obsolete version of an SBlock indefinitely, since it would
> receive periodic search requests that matched it.)
To ask for a refund, you would have to supply proof (more recent SBlock),
which would cost both peers bandwidth (more than the search query!) and IO
(need to look up the block) and, possibly worse, might allow peers to abuse
the system: the author of the namespace could just always make up a new
revision and then ask for a refund from peers storing his SBlocks. So I'm not
sure this can be done cleanly and efficiently. Not to mention that asking
for a refund would require peers to keep a charge-history (maybe the peer
returned the SBlock without charging in the first place?), which would again
Finally, given that the sqstore provides long-term soft-state storage, we have
some expiration of old, obsolete versions anyway. So I'm not convinced that
deletions in general are necessary.
> To allow a user with a given version of a file to see if a more recent
> version of the file exists, we need a variant of the keyword search
> which specifies the minimum version number that is being searched for.
> It would be conceptually simplest to say that a search with a version
> number of N would return any version greater than or equal to N. To
> allow for reasonably efficent Bloomfield filtering, I would suggest
> that a search specifying version number 0 should query the data base
> for any version of the file, and a search with a nonzero version
> number N should query for versions N through at least N+4.
You can already do this: simply use version numbers for your identifiers.
Use "0" for your root, then publish 1 as the first update, 2 for the second,
etc. Then if some user wants to query for versions > N, he just has to use N
for the identifier. So if you want a simple, linear versioning system, this
would certainly be a good identifier-convention to use. If you want to
publish on a regular basis, using some date-format would likely be better.
The current use of user-defined strings is in this respect a very flexible
way to choose identifiers that can be adapted (by humans, not software) to
fit many needs. The only thing you don't get is deletion of older versions,
but as a VCS-addict I tend to think of this as a good thing.
> Versioning only applies to search records. Since DBlocks and IBlocks
> may be shared by different files, I don't see any way to explicitly
> get rid of the IBlocks and DBlocks from an obsolete version of a file.
That's where it is nice that sqstore is soft-state: DBlocks, IBlocks, SBlocks
all will eventually be "expired" by the sqstore (and typically at the same
> Once the obsolete version of the file no longer shows up in searches,
> there will be no more requests for the IBlocks and DBocks (unless
> they are, in fact, shared by a different file), and they can be pushed
> out of caches to make space for blocks thare are being requested.
That would also be true (we expire by expiration and popularity).
Anyway, please take this just as comments and clarifications, this is a good
discussion to have at this point in time (before 0.9.0's FS APIs are too far