Re: Part 2: System Structure

l4-hurd

From:	Jonathan S. Shapiro
Subject:	Re: Part 2: System Structure
Date:	Thu, 18 May 2006 18:46:06 -0400

On Fri, 2006-05-19 at 00:29 +0200, Bas Wijnen wrote:
> On Thu, May 18, 2006 at 02:09:28PM -0400, Jonathan S. Shapiro wrote:
> > > But does this mean every piece of critical code should be in its own
> > > address space?
> > 
> > Yes. That is *exactly* what it means. More generally, it means that
> > every piece of code whose robustness is pragmatically important -- even
> > if it is not critical -- should be in a separate address space.
> 
> I expect this costs performance (for setting up the address spaces all the
> time).  I see this is useful in case recovery is indeed possible, but in many
> cases I don't see the use of it.

The experience in KeyKOS is that yes, (a) there is a cost in
performance, but (b) it is offset by other simplifications that are made
possible by this structure. Taken overall, the user sees no loss of
performance.

> > You are arguing contrary to empirically established fact. Isolation
> > boundaries have been consistently observed to make systems several
> > decimal orders of magnitude more robust when used appropriately.
> 
> I shall believe you and agree that we do want constructors, but I don't agree
> that the memory they accept should by default be opaque to the user who owns
> it.

The question of whether memory should be opaque is an orthogonal
question. In the current conversation, I was only trying to argue that
the constructor "pattern" for creating processes is a good thing to do
even if you think that memory should be translucent.

> > You are, in essence, proposing to throw away the only fundamental advantage
> > that a microkernel offers: the ability to isolate and contain faults.
> 
> This ability is only useful if you can actually recover, I would think.

Or analyze.

> > We are operating from very different philosophies. My philosophy is:
> > wherever you *can* design a system in a way that makes faults better
> > isolated and easier to analyze, you *should*. The issue goes far beyond
> > what "fails as a unit" in the field. It also applies to debugging and
> > post-failure analysis. Code that is not isolated in the field cannot be
> > instrumented in the field -- which is important when you are trying to
> > figure out what went wrong.
> 
> I'm sorry, I do not understand what you are saying here.  Can you rephrase it,
> please?

I can try. :-)

You have a system in the field. It is doing something strange that you
cannot reproduce, and your customer wants it fixed. You would like to be
able to instrument the suspected parts of the system in-situ so that you
can see what is actually happening.

Of course, you cannot do this effectively unless the suspected parts are
decently isolated components.
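
As a rough illustration of why isolation makes in-situ instrumentation practical, the sketch below interposes a tracing wrapper between a client and a component that is reachable only through its interface. This is a toy under stated assumptions: `Directory`, `Tracer`, and the method names are hypothetical, and a Python object boundary merely stands in for the address-space boundary discussed here.

```python
class Directory:
    """Hypothetical isolated component; clients can only call bind/lookup."""

    def __init__(self):
        self._entries = {}

    def bind(self, name, obj):
        self._entries[name] = obj

    def lookup(self, name):
        return self._entries.get(name)


class Tracer:
    """Interposer: forwards every call to the target and records it."""

    def __init__(self, target, log):
        self._target = target
        self._log = log

    def __getattr__(self, method):
        # Only reached for names not found on the Tracer itself,
        # i.e. the methods of the wrapped component.
        real = getattr(self._target, method)

        def traced(*args):
            result = real(*args)
            self._log.append((method, args, result))
            return result

        return traced


# The client is handed the tracer instead of the component and cannot tell.
log = []
directory = Tracer(Directory(), log)
directory.bind("motd", "hello")
directory.lookup("motd")
```

Because the client holds nothing but a reference to the interface, swapping in the tracer requires no change to either side. Code that shares state with its caller offers no such seam.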

There are also significant advantages for live upgrade.

> > > > The problem here isn't the destruction of storage per se. It is the fact
> > > > that the destruction of storage used by the child wasn't "all or
> > > > nothing".
> > 
> > File servers are a rare case: programs that must manage storage on
> > behalf of multiple clients. This is well known to be exceptionally hard
> > to deal with, and file systems must be written with great care.
> > Realistically, they should not use client-revocable storage at all.
> > 
> > But the overwhelming majority of objects are single-client, or if
> > multi-client, all clients are in the same storage allocation domain. For
> > these, the right thing to do is have the subsystem fail as a complete
> > unit instead of having its storage be partially violated.
> > 
> > Note that "serves one client" does not mean "trusts that client".
> 
> So what would be an example of a single-client server, which does not run on
> the space bank of the same user as its client?

In practice, they all seem to run on a *child bank* of their requestor's
bank (for ease of destruction), but I don't think this alters your
question.

Obvious example: a directory object.

The problem in the file system is not one person running from another
person's storage. The problem is the *commingling* of storage from
multiple sources.
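
The child-bank arrangement can be sketched in a few lines. This is a simplified model, not the KeyKOS or EROS space bank interface: the class `SpaceBank` and its methods are hypothetical names chosen for illustration. It shows a single-client object (such as a directory) drawing all of its storage from a child of the requestor's bank, so that destroying the child bank reclaims the object's storage as a unit and touches nothing else.

```python
class SpaceBank:
    """Hypothetical storage allocator arranged in a parent/child tree."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.pages = set()
        self.alive = True
        if parent is not None:
            parent.children.append(self)

    def create_child(self):
        return SpaceBank(parent=self)

    def alloc(self, page_id):
        if not self.alive:
            raise RuntimeError("bank destroyed")
        self.pages.add(page_id)
        return page_id

    def destroy(self):
        """Reclaim this bank and every bank below it in one step."""
        for child in list(self.children):  # copy: children unlink themselves
            child.destroy()
        self.pages.clear()
        self.alive = False
        if self.parent is not None:
            self.parent.children.remove(self)


# The requestor's own storage and the directory's storage stay separate.
user_bank = SpaceBank()
user_bank.alloc("user-page-0")

dir_bank = user_bank.create_child()   # the directory runs entirely from here
dir_bank.alloc("dir-page-0")
dir_bank.alloc("dir-page-1")

dir_bank.destroy()                    # all of the directory's storage goes at once
```

A multi-client file server does not fit this model, because its data structures would have to draw pages from several clients' banks at once, and any one client could then revoke part of a shared structure.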

shap
