l4-hurd

Re: A comment about changing kernels


From: Jonathan S. Shapiro
Subject: Re: A comment about changing kernels
Date: Sun, 30 Oct 2005 09:40:18 -0500

On Sat, 2005-10-29 at 18:47 +0200, Bernhard Kauer wrote:
> On Sat, Oct 29, 2005 at 11:42:28AM -0400, Jonathan S. Shapiro wrote:
> > > > Three questions for Bernhard:
> > > > 
> > > > 1. Who pays for the storage for all of these endpoint capabilities that
> > > >    the server must retain?
> > > 
> > > Think of a server that offers session-based protocols: for example, a
> > > network stack. A TCP/IP network server has to hold TCP connections and
> > > receive and send buffers. In this situation, accounting the resources
> > > to the client is needed if denial of service is to be avoided.
> > > Therefore, let the client pay for the storage the server needs to keep
> > > the return endpoint.
> > 
> > I understand that there are servers that must operate this way. These
> > servers are extremely difficult to build, and they must manage their
> > storage with tremendous care.
> > 
> > My problem with your proposal that a server must hold many endpoint
> > capabilities is that it has the effect of insisting that *all* servers
> > undertake this error-prone and complex management task.
> 
> If the performance of a session-based protocol is not needed, then build an
> easier but slower session-less server. If they are already a map() slower for
> every operation, they should not care whether a copy() takes one more IPC.

I disagree with this statement. It appears (to me) to involve an error
of cost estimation and a misleading cost comparison. It also involves an
assumption about sessionless protocols that seems very unfortunate.

COST ESTIMATION

You assume that the difference in cost between (IPC w/map) and 2(IPC w/map)
can be ignored. This is incorrect. Have a look at the
graph on page 73 of

        http://www.l4ka.org/publications/1996/towards-ukernels.pdf

What the graph says is that a VERY careful IPC implementation can obtain
a factor of 10 advantage over badly built legacy kernels. Since that
measurement, the factor of advantage has already been reduced
substantially by the declining relative performance of SYSENTER/SYSEXIT,
to the point where the real advantage now may be as low as a factor of
four. You now propose to throw away a factor of two, and state that we
should not care.
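To make the arithmetic concrete, here is a minimal sketch. The cycle
counts are placeholders that I am assuming purely for illustration, not
measurements; the point is only how quickly the remaining advantage
evaporates once every operation needs a second IPC.

    /* Illustrative only: the cycle counts are assumed, not measured. */
    #include <stdio.h>

    int main(void)
    {
        double legacy_ipc = 4000.0;           /* assumed legacy-kernel IPC cost      */
        double tuned_ipc  = legacy_ipc / 4.0; /* the remaining "factor of 4" advantage */

        double one_ipc_op = tuned_ipc;        /* operation done with one IPC w/map     */
        double two_ipc_op = 2 * tuned_ipc;    /* same operation needing an extra IPC   */

        printf("advantage, one IPC per operation:  %.1fx\n", legacy_ipc / one_ipc_op);
        printf("advantage, two IPCs per operation: %.1fx\n", legacy_ipc / two_ipc_op);
        return 0;
    }

Whatever the real constants are, the ratio halves: a factor of four
becomes a factor of two.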

But what this appears to ignore are the L4Linux numbers. What the L4Linux
numbers (and also the EROS microbenchmark numbers) show is that the
current L4 (and EROS) IPC times are only barely good enough. This, by
the way, is also borne out by the recent work at UNSW confirming the
continuing importance of the small-space optimization.

In order to believe that an extra IPC is nothing to worry about, we
would need to re-measure those results with an extra IPC performed
everywhere. As it happens, the UNSW people have (in effect) already
measured a design much like this one. The Mungi simulation of
capabilities used something close to a cap-server design, with an
extra IPC for every call. Performance was definitely a concern.

The EROS numbers reported at IWOOOS in 1996 specifically examined the
cost of third-party tail-call style IPC of the type that you propose.
The tail-call two-IPC cost is definitely lower than two unrelated IPCs,
but this is primarily due to retained D-cache and I-cache residency. In
the actual protocol that you propose, some amount of work must be done
by the middle party. This will almost certainly destroy L1 I-cache
residency on current generation Pentiums, and it is likely to intrude
significantly on D-cache residency as well (because the
server-implemented capability tracking data structure needs to be
updated).

Finally, there was a meeting between Jochen and myself, sometime in 1995,
where Jochen tried to convince me to move EROS onto L4 using this type
of design. I explained to him that doubling the IPC overheads (because
of the kind of indirection you propose) wasn't going to work for EROS,
because cap transfer is a critical and high-frequency operation. After
some discussion, **he agreed.**

MISTAKEN COST ASSESSMENT

If the COPY operation is first class (as is being proposed), then the
correct comparison is the cost of (IPC w/COPY) against 2(IPC w/map). So we
must ask what the cost of COPY actually is.

Hmm. I have just discovered that some talk slides have fallen offline.
What I was going to point you at was the performance graph from the
IWOOOS talk in 1996 comparing L4 and EROS IPC performance. The summary
is that the measured cost of transferring the capability by COPY is
insignificant. Jochen challenged these numbers at the talk and then
agreed that they were correct.

Now there are three things to say about that comparison:

1. L4 has gotten better since then, but every optimization that L4 has
applied (that I know about) would work equally well in EROS.

2. The EROS numbers include a linked list update in the capability chain
that is surprisingly expensive. This is going away in Coyotos.

3. I have noticed that many of the older L4 numbers published by Jochen
were based on implementations that were not entirely correct. At that
time, for example, the L4 kernel *restored* segment registers, but did
not save them. This turns out to be a communication channel. If the EROS
implementation had taken the same (incorrect) short cut, our numbers
would have been noticeably faster than that particular implementation of
L4.

However, based on those measurements, I think that the cost of IPC+COPY
for one capability in 50% of transfers (namely: the reply endpoint,
which is only passed in the CALL phase of the round-trip RPC) is not
measurably different from the cost of IPC without any transfer.
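As a back-of-the-envelope check on that claim, the extra cost is just the
COPY amortized over the two IPCs of the round trip. The symbolic costs
below are assumptions for illustration, not the IWOOOS measurements.

    /* Sketch: relative overhead of carrying one capability by COPY on the
     * CALL half of a round-trip RPC. All costs are assumed placeholders. */
    #include <stdio.h>

    int main(void)
    {
        double ipc  = 1000.0;   /* assumed one-way IPC cost               */
        double copy = 20.0;     /* assumed cost of COPYing one capability */

        double round_trip_plain = ipc + ipc;          /* CALL + REPLY, no transfer   */
        double round_trip_copy  = (ipc + copy) + ipc; /* COPY only on the CALL phase */

        printf("overhead of IPC+COPY: %.1f%%\n",
               100.0 * (round_trip_copy - round_trip_plain) / round_trip_plain);
        return 0;
    }

So long as COPY is small relative to the IPC path itself, the difference
disappears into measurement noise.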

ASSUMPTIONS ABOUT SESSIONS

A session is a long-term relationship between a client and a service.
That is: it is a communication channel having the property that mistakes
can occur for as long as it is open. Further, sessions make cleanup more
complicated. It is desirable to narrowly limit the temporal scope of
sessions.

Because of this, it would be very unfortunate to demote sessionless
protocols to second-class status. Sessionless protocols are inherently
cleaner, and should be preferred whenever possible because they minimize
the temporal scope of relationships.

But further, you are saying, in effect, that the high-performance case
(the session-based case) requires complex storage management code in
each server combined with a multi-party trust relationship to ensure
cleanup. Subjectively, this does not seem like a good set of defaults
for a security kernel to impose.
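To make the "complex storage management" point concrete, here is roughly
the shape of the per-client bookkeeping that a session-based server ends
up carrying. The type and field names are mine and purely illustrative;
this is not any actual L4.sec or EROS interface.

    /* Sketch only: per-client state a session-based server must create,
     * look up, account to the client, and reliably reclaim when the client
     * goes away. A sessionless server keeps no such table. */
    typedef unsigned long cap_t;

    struct session {
        cap_t           reply_endpoint;  /* endpoint capability retained for the client */
        unsigned long   badge;           /* which client this state belongs to          */
        void           *client_state;    /* buffers, connection state, ...              */
        unsigned long   storage_charged; /* storage accounted back to the client        */
        struct session *next;            /* hash chain / reclamation list               */
    };

Every entry in that table is storage that somebody must pay for and code
that somebody must get right, including the reclamation path.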

> > > > 2. If the client dies, how does the server learn that the session is
> > > >    terminated and the server's endpoint capability for that client can
> > > >    be dropped?
> > > 
> > > In the network server example: if a connection times out. Otherwise
> > > the parent, who deletes the client, has to reclaim the memory or
> > > notify the server.
> > 
> > So the party who deletes the client must have complete understanding of
> > the actions of the client? Doesn't this violate encapsulation?
> 
> If the party who gives the resources to the client wants them back, it needs
> to be notified of the death.

You appear to be proposing a protocol that relies on a security
violation [1] in order to achieve storage recovery. This does not seem
entirely wise, since it is only correct if the server can trust the
client (or its parent), and we have already established that this is an
unsafe practice.

Also, your estimates of performance cost do not take into account the
extra protocol overhead required to arrange for these notifications to
occur.

[1] There is no security violation if the client agrees to notify. The
security violation occurs if the client chooses NOT to notify and the
system nonetheless performs a notify.

> > > > 3. Consider a server that has multiple clients, each with a session.
> > > >    One of these clients invokes the server. How does the server know
> > > >    which client it is supposed to respond to?
> > > 
> > > We use the badge for this.
> >
> > In any case, the badge is insufficient. Two clients can hold the same
> > capability (including the same badge) and the server must be able to
> > know which one it is replying to in this case.
> 
> The badge is a session identifier for the server, comparable with an IP
> address or, better, a MAC address in a network. If a client gives his badge
> directly to another party, it allows that party to act on behalf of the
> client. I do not know whether this is needed, but optionally sending a
> return endpoint should solve it.

I understand the protocol design. I have stated elsewhere my belief that
it will not work in practice, and I questioned the "extensible badge"
idea very seriously at the Dresden summit, but my ultimate argument is
based on assumptions about usage, and these can only be tested by
experience.

Just so they are documented, there are two things about this design that
seem to me to lead to surprising and hard-to-recover errors:

1. If client C1 desires to transfer a capability to client C2, then
client C1 must extend the badge in order to ensure that a new session ID
is generated. There are several ways that this protocol can become
complicated:

  a) There is no guarantee in the design concerning how many
     extension bits are available in the badge.

  b) The badge is very size-constrained. When it runs out of bits,
     what happens?

  c) A badge used in this way is not very adaptable, since the
     server cannot extend *its* portion of the badge in response
     to growth.

2. There is no way in this protocol for C2 to obtain a capability that
is fully independent of C1 **even if COPY is the primitive**, because
your assumed interpretation of the badge bits is hierarchical. Because
it is hierarchical, C1 will always control the session of C2 even if
the badge bits are extended. (A sketch of both concerns follows below.)
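Here is a minimal sketch of both concerns. The 64-bit width and the
extend operation are my assumptions for illustration, not the L4.sec
definition.

    /* Sketch: C1 mints a badge for C2 by appending bits to its own badge.
     * Two problems are visible directly in the code: the bit budget is
     * finite, and C2's badge always contains C1's prefix, so the hierarchy
     * (and C1's control over C2's session) is never escaped, even with COPY.
     * 'ext' is assumed to fit in ext_bits. */
    #include <stdint.h>

    typedef uint64_t badge_t;

    int extend_badge(badge_t parent, unsigned bits_used,
                     unsigned ext_bits, badge_t ext, badge_t *child)
    {
        if (ext_bits == 0 || bits_used + ext_bits > 64)
            return -1;                         /* out of badge bits: now what? */
        *child = parent | (ext << bits_used);  /* C1's prefix is always retained */
        return 0;
    }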

My point is only to state that the hierarchy assumption is very deeply
embedded in the L4.sec design. The assumption that hierarchy is okay
appears to rely on assumptions about the cost of multi-IPC protocols
that do not seem consistent with existing measurements, and also does
not appear to have adequately considered the complexity, in the OS, of
managing and avoiding denial-of-resource issues.

Whether the "extensible badge" approach will work well in practice is
something that only time and usage will permit us to learn.


Bernhard:

I think that the arguments about performance that I am making are
plausible based on experience and other measurements, but they *could*
very well turn out to be completely wrong. I suggest that you (or
somebody in Dresden) should try to extract them into a set of
microbenchmarks that can be used to validate or invalidate these
assertions before a large body of application-level code is written on
L4.sec. Whether I am correct or not, it would be good to know for sure,
and if I am incorrect it would be good to know *why*.
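One possible shape for such a microbenchmark, purely as a sketch: the
ipc_call(), ipc_call_copy(), and rdtsc() names below stand in for
whatever the actual L4.sec / EROS primitives and cycle counter turn out
to be; they are assumptions, not existing interfaces.

    /* Microbenchmark skeleton (sketch): compare a round trip that carries
     * one capability by COPY against the two-IPC indirection design. */
    #include <stdint.h>
    #include <stdio.h>

    #define ITERATIONS 100000

    extern void     ipc_call(void);       /* round trip, no capability transfer */
    extern void     ipc_call_copy(void);  /* round trip, one capability by COPY */
    extern uint64_t rdtsc(void);          /* cycle counter                      */

    static uint64_t measure(void (*op)(void))
    {
        uint64_t start = rdtsc();
        for (int i = 0; i < ITERATIONS; i++)
            op();
        return (rdtsc() - start) / ITERATIONS;
    }

    int main(void)
    {
        printf("plain round trip:      %llu cycles\n",
               (unsigned long long)measure(ipc_call));
        printf("round trip with COPY:  %llu cycles\n",
               (unsigned long long)measure(ipc_call_copy));
        printf("two plain round trips: %llu cycles\n",
               (unsigned long long)(2 * measure(ipc_call)));
        return 0;
    }

If the second line is indistinguishable from the first, and both are well
under the third, my argument holds; if not, it does not.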


shap




