Re: Reliability of RPC services

From: Jonathan S. Shapiro
Subject: Re: Reliability of RPC services
Date: Tue, 25 Apr 2006 16:20:34 -0400

On Tue, 2006-04-25 at 20:23 +0200, Tom Bachmann wrote:
> >   [...] I can see some ways to reduce
> >   its overhead if a small delay in notification is acceptable. The
> >   achievable delay is likely to be about 5 minutes, [...]
> are you speaking about something related to checkpointing (disk garbage
> collection?) here? I wonder why the checkpoint interval is so long. I
> think I read somewhere that checkpointing takes about 1/100 of a second,
> so wouldn't it be feasible to checkpoint more often, say, once a minute?

Let me answer the "speed of checkpointing" first, and then go back to
your initial question.

Checkpointing has two phases: snapshot and stabilization.

On a modern Pentium, the snapshot phase can be done in O(1000) cycles --
it's a system call that sets a single word.

The stabilization phase takes time that is proportional to the number of
dirty pages/objects. In real systems, the proportion of dirty objects is
very stable at around 25%, so in practice the stabilization phase is
proportional to the amount of memory installed.

Depending on the amount of memory, one may or may not be able to
checkpoint more often than once a minute.
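
To make the proportionality concrete, here is a rough back-of-the-envelope
sketch. The 25% dirty fraction is the figure given above; the disk write
bandwidth (50 MB/s) and the memory sizes are purely illustrative assumptions,
not measurements of any real system.

```python
# Rough estimate of stabilization time. The disk bandwidth is an
# assumed, illustrative number; only the 25% dirty fraction comes
# from the discussion above.
GIB = 1024 ** 3

def stabilization_seconds(memory_bytes, dirty_fraction=0.25,
                          disk_bytes_per_sec=50e6):
    """Time to write all dirty pages/objects at a steady disk rate."""
    return memory_bytes * dirty_fraction / disk_bytes_per_sec

for gib in (1, 4, 16):
    secs = stabilization_seconds(gib * GIB)
    verdict = "fits" if secs < 60 else "does not fit"
    print(f"{gib} GiB: about {secs:.0f} s, {verdict} in a one-minute interval")
```

Under these assumed numbers a small machine stabilizes well inside a minute
and a large one does not, which is the sense in which the answer depends on
the amount of memory.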

Now let me back up and answer your first question. If I may be permitted
to rephrase it, the question is "How long does it take to notice that
there are no more references to an object?"

Here is a sketch of a hybrid design, but I am concerned about the

I assume here that we are only interested in death notices on FCRBs,
since those are the objects named by reply capabilities.

First, we divide capabilities into two groups: on-disk capabilities and
in-memory capabilities. For on-disk capabilities we use reference
counting. Whenever a capability is converted from in-memory format to
on-disk format, we increment the reference count (which is efficient
because we know the object is in memory at that time). When the
capability goes from on-disk to in-memory form, we decrement the
reference count.
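
As a minimal sketch of that rule (the class and method names here are
hypothetical, invented for illustration, and not taken from any kernel
source):

```python
# Sketch of the on-disk reference count rule. All names are hypothetical.
class Target:
    """An object named by capabilities, e.g. an FCRB."""
    def __init__(self):
        self.disk_refcount = 0   # counts on-disk-format capabilities only

class Capability:
    def __init__(self, target):
        self.target = target
        self.in_memory = True    # assumption: a new capability starts in-memory

    def to_disk_form(self):
        # The target is known to be in memory at this point,
        # so the increment is cheap.
        if self.in_memory:
            self.target.disk_refcount += 1
            self.in_memory = False

    def to_memory_form(self):
        if not self.in_memory:
            self.target.disk_refcount -= 1
            self.in_memory = True
```

The count therefore tracks only capabilities currently in on-disk format;
in-memory capabilities are left to the main-memory collector.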

Second, we impose the requirement that all capabilities are converted to
in-memory form before being copied (i.e. both the source capability and
the capability that lives in the target slot). This will sometimes cause
unrelated objects to come into memory, but empirically this will not
occur very often and on modern memories it is probably an acceptable

The goal here is to avoid changing the reference count during a
normal-case copy, because that update causes a cache line miss, which
may cost several hundred cycles on modern machines.
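
A sketch of the copy rule, again with hypothetical names. The point it tries
to show is that any count adjustment happens during conversion, so a copy
between two capabilities that are already in in-memory form touches no
reference count at all.

```python
# Sketch of "convert both capabilities before copying". Names are
# hypothetical and chosen for illustration only.
class Obj:
    def __init__(self):
        self.disk_refcount = 0

class Slot:
    """A capability slot: a target plus the capability's current format."""
    def __init__(self, target=None, on_disk=False):
        self.target = target
        self.on_disk = on_disk
        if target is not None and on_disk:
            target.disk_refcount += 1

def prepare(slot):
    """Convert the slot's capability to in-memory form."""
    if slot.target is not None and slot.on_disk:
        slot.target.disk_refcount -= 1   # may force the target into memory
        slot.on_disk = False

def copy_cap(src, dst):
    prepare(src)
    prepare(dst)              # releases the count held by the old occupant
    dst.target = src.target   # normal case: no reference count is touched
    dst.on_disk = False
```

Converting the capability in the target slot is what brings the "unrelated
object" into memory: the old occupant's disk count has to be released before
the slot is overwritten.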

An object is unreachable when *all* of the following statements are true
at the same time:

  1. The disk reference count value is zero.
  2. The main-memory collector has determined that no
     valid, in-memory capabilities to this object exist.
  3. The object has not been destroyed.
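
Written as a predicate, the three conditions look like this. The field names
are hypothetical, and the in-memory flag stands in for whatever the
main-memory collector actually reports.

```python
# The three-part unreachability test as a predicate. Field names are
# hypothetical.
from dataclasses import dataclass

@dataclass
class ObjectState:
    disk_refcount: int        # condition 1
    has_in_memory_caps: bool  # condition 2, as reported by the collector
    destroyed: bool           # condition 3

def is_unreachable(s: ObjectState) -> bool:
    """True only when all three conditions hold at the same time."""
    return (s.disk_refcount == 0
            and not s.has_in_memory_caps
            and not s.destroyed)
```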

There is an unfortunate problem with this design: we must either

  (a) guarantee that the target object stays in memory as long as
      an in-memory capability exists. This makes paging much more


  (b) find a way to defer the disk counter increments -- which may
      be possible if we are very very careful, but this is the
      part that I am nervous about getting right if the system
      crashes at the wrong moment.

Assuming that we can get this right, then there is a delay in
notification, but that delay will be no longer than the checkpoint
interval (between 1 and 5 minutes).

Note that this design relies on the fact that a valid FCRB never
contains a resume capability. This guarantees that no cycles are
possible, and therefore ensures that a reference count is sufficient.

Unfortunately, we are likely to need to add a capability slot to the
FCRB specifically for the purpose of holding a reply capability, and
this introduces the possibility of cycles. Given a cycle in disk data
structures, reference counting is still a good optimization, but certain
cases can only be discovered by disk GC.
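
The cycle problem can be shown with a toy model. This is a generic
illustration of why reference counts miss cycles, with hypothetical names; it
is not a model of the actual FCRB layout or of the disk GC.

```python
# Toy model: two objects that name each other through a capability slot.
class Node:
    def __init__(self):
        self.refcount = 0
        self.slot = None      # stands in for the extra capability slot

def store(holder, target):
    """Put a reference to target in holder's slot, adjusting counts."""
    if holder.slot is not None:
        holder.slot.refcount -= 1
    holder.slot = target
    if target is not None:
        target.refcount += 1

def reachable(roots):
    """Trace from the roots, as a tracing collector would."""
    seen, stack = set(), list(roots)
    while stack:
        n = stack.pop()
        if n is not None and id(n) not in seen:
            seen.add(id(n))
            stack.append(n.slot)
    return seen

a, b = Node(), Node()
a.refcount += 1               # one external reference to a
store(a, b)
store(b, a)                   # a and b now form a cycle
a.refcount -= 1               # the external reference goes away

# Neither count reaches zero, yet nothing is reachable from the
# (now empty) set of roots.
print(a.refcount, b.refcount, id(a) in reachable([]))
```

Both counts stay at one after the last outside reference is dropped, so only
a trace from the roots notices that the pair is garbage.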

Given this explanation, I hope it becomes clear that "notify on
unreferenced" with any sort of low latency is probably feasible only if
the reply capability is a "single copy" capability.

