
Re: [Demexp-dev] A protocol proposal to track modifications on server


From: felix . henry
Subject: Re: [Demexp-dev] A protocol proposal to track modifications on server
Date: Thu, 16 Jun 2005 11:56:49 +0200
User-agent: Internet Messaging Program (IMP) H3 (4.0.2)

Dear David,

Here are my comments:

1/ Concerning your proposed approach
-I think it is probably the most efficient in terms of bandwidth.
I would recommend using a binary tree (instead of the 10-ary tree you
seemed to suggest in your mail) because I would intuitively guess it
is optimal in terms of the bandwidth needed to check the timestamps
(but not optimal in terms of server workload, see below).
-It may put a heavy workload on the server, because each time an
object is modified (mainly questions and tags), all timestamps
hierarchically "above" that object have to be updated. In this case a
non-binary tree is better: the larger the order of the tree, the fewer
parents to update (see the sketch after this list).
-It may be complex to implement (compared to the brute-force method,
see below) because of the recursive nature of the algorithm and
because the server and client trees do not have the same shape (there
are always more objects on the server, due to new questions etc.).
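
To make the trade-off concrete, here is a small sketch of such a
timestamp tree in Python (my own illustration, not demexp code;
StampTree and its fields are invented names). One update writes one
node per level, so going from arity 2 to arity 10 divides the number
of parent updates by about log2(10) ~ 3.3, at the price of comparing
up to 10 children per level during synchronisation.

  import math
  import time

  class StampTree:
      def __init__(self, n_objects, arity):
          self.k = arity
          # levels below the root needed to address n_objects leaves
          self.depth = max(1, math.ceil(math.log(n_objects, arity)))
          self.nodes = {}  # (level, index) -> timestamp; level 0 = root

      def update(self, obj_id, stamp=None):
          """Set a leaf timestamp and refresh every ancestor."""
          stamp = int(time.time()) if stamp is None else stamp
          idx = obj_id
          for level in range(self.depth, -1, -1):  # leaf .. root
              key = (level, idx)
              self.nodes[key] = max(self.nodes.get(key, 0), stamp)
              idx //= self.k  # climb to the parent

  tree = StampTree(n_objects=2_100_000, arity=10)
  tree.update(obj_id=123_456)
  print(len(tree.nodes))  # depth + 1 = 8 nodes written for one update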

2/ Possible other approach: brute force
Another approach would consist of brute-force compression of all the
timestamps on the server. Imagine that the server compresses its
timestamps into a single file, updated every 2 minutes. The compressed
file would be downloaded at client connection. All comparisons would
then be performed on the client in a single shot.
Now, suppose the project gets moderately big, say:
-100000 users
-1000000 questions
-1000000 tags
(we assume that if the project gets that big, funding people to implement a more
complex approach will be easy)

That's 2,100,000 timestamps. Now, obviously, the time resolution of
the timestamps should not be finer than the time resolution of the
file update on the server. With 24 bits per timestamp and one tick
every 2 minutes, your system can last for 2^24 = 16,777,216 ticks,
i.e. 16777216/(30*24*365) = 63.8 years (30 being the number of ticks
per hour).
The uncompressed file would be:
2,100,000 timestamps * 24 bits / (8*1024*1024) = 6 MB
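
For the record, the same arithmetic in a few lines of Python:

  ticks = 2 ** 24                           # distinct 24-bit stamp values
  years = ticks * 2 / (60 * 24 * 365)       # one tick = 2 minutes
  print(round(years, 1))                    # 63.8

  stamps = 100_000 + 1_000_000 + 1_000_000  # users + questions + tags
  mb = stamps * 24 / (8 * 1024 * 1024)      # 24 bits each, in megabytes
  print(round(mb, 1))                       # 6.0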

Now when you compress the timestamps with a gzip-like algorithm, you
are likely to obtain a very high compression ratio, I'd say at least
around 10:1. This is because most of the timestamps will belong to a
very short time range (they will have been updated within the past few
weeks), so they will be highly redundant. That gets you down to a
600 kB file *in the very worst case*.
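
The 10:1 figure is easy to test. Here is a throwaway experiment with
an entirely invented distribution of stamps, so the measured ratio is
only indicative: stamps that are exactly equal (many objects touched
in the same 2-minute tick) compress far better than stamps that are
merely close, so the real ratio hinges on how the updates cluster.

  import gzip
  import random
  import struct

  random.seed(0)
  now = 170_000                             # an arbitrary tick count
  stamps = [now - random.randint(0, 5_000)  # mostly recent updates
            for _ in range(2_100_000)]
  raw = b"".join(struct.pack("<I", s) for s in stamps)
  packed = gzip.compress(raw, compresslevel=9)
  print(len(raw) // 1024, "kB ->", len(packed) // 1024, "kB")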

That's very fast to generate on the server (only needed every 2
minutes) and pretty quick to download (imagine the available bandwidth
in 63 years, if Earth is still there, of course!).

This method has the advantage of being very simple to implement. In a
way it is also scalable, because compression is inherently
logarithmic: if you double the range of values a timestamp can take,
you only need one more bit per stamp, and after compression usually
much less. Add bandwidth growth to this and things should run
smoothly.

Once again, the main advantage is the simplicity of implementation:
-use 24-bit stamps holding the number of seconds since 1/1/2005
divided by 120, set whenever an object is updated (you can actually
use 32-bit words if that is more convenient for you, because the
compression will automatically remove the redundancy)
-compress with gzip every 2 minutes on the server
-when a client connects, download the whole file
-make the comparisons on the client in a single shot

You would need a file format to be able to assign each timestamp to
its corresponding object, but that should be easy; see the sketch
below.
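
Here is a minimal end-to-end sketch of the scheme (stamp_of,
write_snapshot and find_stale are names I made up; demexp would plug
in its own object ids and transport). The format is the simplest one
possible: the i-th 32-bit word of the file is the stamp of object i,
and new objects simply extend the file at the end.

  import gzip
  import struct
  from datetime import datetime, timezone

  EPOCH = datetime(2005, 1, 1, tzinfo=timezone.utc).timestamp()
  TICK = 120  # seconds, i.e. 2 minutes

  def stamp_of(unix_time):
      """Seconds since 1/1/2005, divided by 120."""
      return int((unix_time - EPOCH) // TICK)

  def write_snapshot(stamps, path):
      """Server side, run every 2 minutes: dump all stamps, gzipped."""
      raw = b"".join(struct.pack("<I", s) for s in stamps)
      with gzip.open(path, "wb") as f:
          f.write(raw)

  def find_stale(path, local_stamps):
      """Client side: one pass over the downloaded file, returning
      the ids of objects whose server stamp is newer than the local
      copy (the server file may be longer, since new objects appear
      there first)."""
      with gzip.open(path, "rb") as f:
          raw = f.read()
      server = struct.unpack("<%dI" % (len(raw) // 4), raw)
      return [i for i, local in enumerate(local_stamps)
              if server[i] > local]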

Let me know what you think.


  Felix





