Re: New channel concept

From: olafBuddenhagen
Subject: Re: New channel concept
Date: Wed, 30 Jan 2008 06:21:26 +0100
User-agent: Mutt/1.5.17 (2007-11-01)


Sorry for answering so late again. We do seem to have very bad
timing -- that's the second time your mail arrived just at the beginning
of a longer period of email abstinence on my side ;-)

On Sat, Jan 19, 2008 at 01:57:55PM +0100, Carl Fredrik Hammar wrote:
> <olafBuddenhagen@gmx.net> writes:

> No, a hub is an object similar to a channel, except it deals with
> filesystem requests, i.e. opens.  Which results in a channel.

I think it's clearer now: You are talking about the distinction between
the object doing the channel management, i.e. handling opening the
connection to a channel-aware FS node, optionally uploading to the
client; and the object that implements the calls on the once opened
connection -- the one that actually gets uploaded.

There might be some confusion here (at least I must admit that it wasn't
so clear to myself up to now): Opening a channel session is really
orthogonal to opening files!

Think of translators that expose a whole directory tree rather than just
one node: If we want to optimize stacking of such translators, it's not
sufficient to open channel sessions whenever opening some of the files
exposed by this translator. Rather, we want all operations -- both FS
and IO -- invoked on the directory tree exposed by this translator, to
happen in a single large session, created on first access to the
translator's directory tree.

Considering this, it seems obvious that the connection management is
completely distinct from the invocations (FS, IO, or else) done on an
existing connection. No call for trying to unify them in any way, I
would say...

> My original idea was to implement hubs as channels, using channels
> ability to implement extra interfaces and not implementing io.  This
> didn't fit with the channel concept.  Expanding the concept to allow
> it, we can throw out hub as a special concept, it's ``just a channel
> implementing the fs interface''.  (The channel fs interface would
> return channels instead of ports on `open()'.)

I see. Terminology is really problematic here...

AIUI the channel concept as originally planned has two major elements:
One is the connection management. The other is an API that can work both
with modules loaded in the client, and with actual RPCs to the
translator; and implements the standard IO interface as well as hooks
for any family-specific extensions.

Your idea now was to use the extension mechanism to implement the
management interface as well, right?

I don't remember how you intended to implement the extension hooks in
your original design -- or maybe I never really knew. Thinking about it
now, I don't quite see a reason why channels would need any special
handling for extensions at all? It seems to me that the family-specific
extensions are Hurd interfaces like any others -- each coming with a
.defs file for MiG, and a library for programmer's convenience (and for
optimized stacking...)

In this sense, "implementing something as channel" just means that the
respective convenience library supports libchannel. (Or even better,
something with a more appropriate name, like libtstack or so...)

For FS and IO interfaces ideally this would be libc, or perhaps some
special libstackio offering a similar interface, if there are some
objections against implementing it in libc itself. For stores it would be
libstore; for audio and network it would be libnoize and libpacket, or
something like that.

> > Not sure though what exactly you mean by "Hurd objects" in this
> > context. Did I mention already that the word "object" is way
> > overloaded? ;-)
> By Hurd objects I meant server-side object, as in objects servers of
> the Hurd provide.  Typically file objects, but also others, for
> instance user identities and processes through the auth and proc
> servers respectively.  (I'm not suggesting these particular objects
> are useful in the context of channels.)

OK, that's what I guessed. Just wanted to make sure :-)

> I'll stick with the term `server-side object' instead.

I didn't say "Hurd object" is a bad term. It's not fully
self-explanatory, but probably neither would be any other term one
could come up with...

> > Considering the more generic scope, maybe we should drop the whole
> > "channel" terminology altogether, and try to find something more
> > intuitively describing translator stacking... I have no suggestions
> > offhand, though :-)
> How about `virtual port' or `vport' for short.

Well, the "virtual" part has some merit; but "port" is rather confusing
IMHO. Note that a connection to a translator exporting several files can
have many open ports. (In fact, it could have many even with a
single-file translator -- though this is not a very likely case...)

> > (For stores for example it's probably not really useful most of the
> > time... It seems to me that the major motivation behind libstore was
> > actually to allow the root FS to run without relying on other
> > processes. Personally, I'm not convinced this was really a good
> > idea. But well, what do I know :-) )
> Interesting.  I think I agree with you, at the very least libstore
> seems a much too complicated solution with respect to the problem.

You think so? To be honest, *if* the root fs running all in a single
process is considered a requirement, I can't think of any simpler
solution that wouldn't lose a lot of flexibility...

It's really only the requirement I have doubts about.

> > The idea of channels (or more generally, optimizing translator
> > stacking) is *not* merely to avoid actual RPC calls. I don't think
> > that would be a worthwhile goal. The actual IPC is very slow on
> > Mach, but modern microkernels show that it is possible to do much
> > better. The cost of IPC itself is not exactly negligible, but only
> > in few situations really a relevant performance factor.
> I would argue that (potentially deep) translator stacks are one such
> situation.

The depth is not really decisive here.

IPC overhead only becomes a relevant factor when having many calls, each
doing only very little work. That can happen without any stacking just
as well.

Also, even if it's a relevant *factor*, it's still not necessarily a
*problem*, unless we have really lots of calls in absolute numbers --
with Mach, in the order of at least tens of thousands per second; on a
modern machine, probably more like hundreds of thousands. There are not
really that many situations where we reach such numbers, I'd guess...

I'm not saying IPC overhead is meaningless. But we shouldn't
overestimate it.

(Admittedly, I heard claims that on Mach there is a much larger indirect
overhead from IPC, because of poor scheduling. I don't know enough about
the details to form a good opinion on whether it's true, and if so,
whether it can be fixed...)

> Also my impressions are that we will be stuck with Mach for quite a
> while, and that IPC on Mach is inherently slow.  So the fast IPC
> argument doesn't really apply.

Well, opinions on that are vastly diverging. Marcus and Neal seem to
consider the existing implementation useless, and to believe that we
should all focus on new designs instead. (At least that is my
impression... Hope I won't be accused of misrepresenting their
opinion.)
On the other hand there are people like me, convinced that the research
on new designs is interesting and will give some inspiration for future
developments; but otherwise has little direct effect on current Hurd
development. Convinced that before we switch to a totally new design, we
first should take the existing design (and Mach) as far as we can; and
only when we have reached the real limits of what can be done with it,
go for a design that addresses precisely these limits -- based on
knowledge, not speculation.

While this means sticking with Mach for the time being, it doesn't mean
we need to take it as given. The process of improving the existing
implementation can very well encompass improvements to Mach as well.
Presently, there is lower hanging fruit for improving Hurd performance;
but once we get to a stage where IPC performance becomes the major
bottleneck, I hope that we can improve on it, rather than trying to work
around it...

It seems for example that a good part of what makes Mach IPC so slow is
owed to network transparency at kernel level -- which we don't make use
of anyways. Considerable simplification might be possible here; the
question is only whether it can be done without changing the semantics
too much, so that everything would need to be rewritten...

> > The main overhead of RPCs is not from the actual calls, but from the
> > implications of an RPC interface -- from the fact that client and
> > server can run in different processes (address spaces), possibly
> > even on different hosts. (Though we don't employ this latter
> > possibility presently, and I'm not convinced it is really useful to
> > preserve network transparency at such a low level.) Meaning the
> > client needs to be prepared for communication to fail; meaning that
> > the interface is constrained to passing mostly plain values, no
> > pointers, no global variables, no function pointers etc.; meaning
> > client and server don't have access to the same resources; meaning
> > server and client threads run asynchronously (unless using passive
> > objects, which we don't).
> I'll tackle the issues one at the time in the order you enumerated
> them.
> * Communication failure
>   We have to deal with this in either case, since we might be using a
>   port wrapper.

That's exactly my point: *If* you want to do everything the same as if
it was a real port you are communicating through, you will have to
handle things like possible communication failure, even when the code is
actually loaded in the client, and communication failure thus can't
actually occur.

That is precisely why pretending that we are always talking through
ports is not really helpful. We need to do things at an abstraction
level where it is possible to skip these things when not necessary.

>   Even if not using a port wrapper directly, the bottom layer of a
>   channel stack probably uses IPC, (it is most likely a port wrapper).

Not at all. Translator stacks that are mostly standalone, where incoming
requests are handled internally rather than being forwarded to some
other entity, are perfectly possible -- and in fact it's those that can
profit most from stacking optimization.

More importantly, the fact that communication can fail at some lower
layer, does not mean that *every* layer has to guard against it.

> * No pointers
>   The problem here is unnecessary copying.  To illustrate this let's
>   compare the Hurd's `io_read' to POSIX's `read'.
>   `io_read' optionally takes a buffer as input and returns a buffer
>   which is either the input buffer or a newly allocated one.  Note
>   that the input buffer is deallocated from the client on a successful
>   send and that the output buffer is deallocated on a successful
>   reply.  Mach can take advantage of this and avoid copying the buffer
>   if the server reuses the input buffer.
>   Instead of taking a buffer, `read' takes a pointer as input and
>   writes to the underlying memory, thus avoiding any copy.
>   The problem with `io_read' is that we have to pass page aligned
>   data.  (We can pass unaligned data, but the entire page would be
>   visible to the server.)  This means that we have to resort to copy
>   if we want to store the data at an unaligned address. Unfortunately,
>   this is quite common and for instance it's needed for buffering.

Yes, out-of-band transfer can help in some cases. However, the need for
alignment already extremely narrows down the use: It means that you
can't just use it for any piece of memory you wish to transfer, but only
for specially prepared buffers. It also means that it can only be used
in situations with few, large buffers.

Furthermore, while it can avoid the cost of copying, it is far from free
-- the VM manipulations necessary are quite expensive too. (In fact, I
once saw a claim on lkml that VM tricks tend to be *more* expensive than
copying! My intuition tells me that with standard 4k page size, this
might very well be true.)

While these points strongly reduce the usefulness of out-of-band
transfer as a remedy to the lack of pointers in case of one-shot buffer
transfers, that's not even the worst of it.

Pointers can do much more than that. For one, a pointer once transferred
allows both caller and library to update the referred data repeatedly.
With RPC, each subsequent update needs to be communicated explicitly.
(Shared memory can be used to avoid it, but is expensive at setup time.)
While in some cases this can actually be considered a good thing, as it
tends to be more robust, you must see that it can be an enormous cost.

And there is yet more. Pointers are crucial for complex data structures.
For RPC, any data structure needs to be flattened to a set of arrays and
indices, and reconstructed into a proper data structure on the receiver
side. (Or used in the awkward flattened representation.)

Also, when transferring data over RPC, the caller either needs to have a
very good understanding of what will actually be needed, or in some
cases has to transfer much more than really necessary -- which is
especially painful when the data structure needs to be converted...

> * No globals
>   The use of globals implies that memory be shared between different
>   clients each having a channel from the same translator.  I think we
>   can agree that is a bad thing in this context, (unless read-only
>   like code).

I don't see how globals are related to a translator having multiple
clients. I'm talking about server and client code sharing global
variables.

Of course that also means that when a client contacts multiple instances
of the same translator, i.e. one uploaded module is used multiple times,
all the instances share the same variables, which obviously limits its
applicability somewhat...

> * No function pointers
>   Right.  But we do have ports which can do the same thing, just send
>   it and listen for call-backs.

That's not quite the same thing. They are more expensive, both in terms
of resources and of complexity, by several orders of magnitude --
prohibitively expensive in all but a very few cases.

> * Asynchronism
>   While Mach's IPC primitive `mach_msg' is asynchronous, we are only
>   interested in RPCs and these are synchronous.  In some sense an RPC
>   is just a function call to a function in another address space.

I guess you are right on this one. While I'm not convinced that the RPC
mechanism fully hides the underlying asynchrony, I can't think of any
situation right now, where it would add complexity above the RPC
level... Maybe I was too hasty here.

> * etc.
>   It's hard for me to counter this one.  I hope you don't mind me
>   skipping it.  ;-)

You left out access to resources... But that's beside the point anyways.
I must have expressed myself very badly indeed, if I left you in the
belief that by commenting on some individual issues I picked out, you
could in any way disprove the general problem I'm presenting here.

You can hardly question the fact that RPC mechanisms put very severe
restrictions on communication interfaces; and by that, on the structure
of the whole program. Things that work naturally in a library interface,
require explicit, all but trivial handling, when dealing with RPCs.

We have to put up with vast overheads; lots of code to handle all the
communication, the context, the possible error conditions -- where the
actual functionality of the translator might boil down to a simple
function of no more than a few lines.

On top of the direct and indirect cost of communication itself, the
constraints and inefficiency of RPC often call for redundant checks and
calculations at the various layers; for data and control flow patterns
not at all optimal when transferred to a library call environment. And a
framework working at the lowest level, unaware of semantics of the
communication, has no means to offer anything that might reduce the
inefficiency -- anything that could help avoid redundancies or
rearrange the data and control flow.

> > (I must confess that I don't know the actual store interfaces; but
> > my guess is that libstore uses such a special interface between the
> > modules internally?)
> It seems the only difference is that offsets are mandatory and given
> in blocks instead of bytes, amount to be read is in bytes but must be
>   a multiple of block size.  In this case it would have been better to
> keep it wholly consistent with the io interface, and just use block
> aligned offsets.
> Also it has some funky functionality to remap the blocks of a store
> without any cooperation from the back-end.  But this just seems
> awfully complex and could probably be reimplemented through a store
> module with only a slight loss of performance.
> It seems that libstore really could use a clean-up.  :-/

Don't be too hasty in your judgements. These things were designed by
some very smart folks, and I'm sure they did have something in mind
there. Problem is that we do not know what it was...

(If you put down specific questions, and directly address the mail to
Roland and/or Thomas, you *might* have a chance of getting a response;
but don't count on it :-( )

> > In some cases the actual functionality of the individual layers
> > perhaps could even be implemented using some kind of abstract
> > description, rather than C code.
> I don't really see how that would work, do you want to elaborate?

Take the example of stores: Many of them just do some kind of remapping
of the blocks of underlying stores (striping, concatenating), or other
trivial operations (zero store). These can easily be expressed
mathematically. In such a case, instead of passing the client requests
through all the layers one by one, each doing some transformation, the
framework could get the mathematical descriptions from all layers, and
assemble them in a single translation function.

Note that I'm not claiming this is really feasible or useful in
practice. Just wanted to give an idea what possibilities exist when
working at a higher abstraction level...

> So where do we go from here?  As I see it we have two extremes, which
> I will call dynamic and static channels (at least for now).

For the sake of understanding, let me present my own take on this --
which seems mostly to restate what you are saying below, in slightly
different terms.

I see some three or four distinct levels at which translator stacking
could be optimized. Level 1 would just replace RPC by library calls,
having no clue about the semantics of the communication, and giving no
clue to the actual translator implementations.

This is pretty similar in spirit to what I was originally pondering way
back -- only that my crazy mind was actually thinking even lower level
(let's call it 0.5): Hooking right into the program execution
mechanisms... The library variant seems much simpler and just as
effective, though.

The great advantage of such an approach is simplicity of use: Once the
necessary machinery is implemented, everything can make use of it, with
hardly any modification at all.

However, I'm no longer convinced that such a mechanism would be really
worthwhile -- that the limited performance increase would make up for
the disadvantages of giving up the process boundaries. If we already
have to coax all communication through the restrictions of RPC, we can
just as well reap the benefits of having distinct hardware protected
address spaces...

(I mentioned in the "vision" mail that I was thinking of using LLVM in
the hope that it would optimize away the remaining overhead of RPC-based
communication that I described above, after linking. But I have serious
doubts regarding LLVM's ability to optimize a lot at such a high level.)

This approach might have some merit perhaps, because I was also
pondering a similar approach to do network transparent RPC in user space
-- both things might be handled in a common RPC abstraction framework.
OTOH, tschwinge suggested a much simpler approach for network RPC...
(Just using a port forwarding translator.)

Network transparency could of course also be achieved at a higher
abstraction level, e.g. FS like in Plan9. This way it could be smarter,
but also less universal. Not sure which approach is better.

Translator stacking at level 2 is totally different from level 1: It
works directly with the public, POSIX (for FS and IO) and POSIX-like
APIs. It knows the semantics of these interfaces. It is also fully aware
of the difference between real RPC and library interface, and
intelligently maps the APIs in either case to minimize overhead.

This level is still fairly simple: It requires an adapted library for
each interface; but actual translators and applications can still reap
the benefits with only little or no explicit support. At the same time
it should already cut considerably at the communication overhead.

(Some intermediate level, say 1.5, might also be possible: Hooking at
some internal interfaces below the level of the public APIs; already
aware about the differences between communication over real RPC and
library calls, but not having full understanding of the semantics of the
specific API. I can't see any benefit in this approach, though...)

I believe that level 2 should be the base line for the translator
stacking framework, working for all stackable translator families.

Individual families like stores can optionally support even more
optimized interfaces, for internal use by translators of that family.
This would be level 3. It's pretty much the same as the existing
libstore (and libchannel as originally intended), except that it uses a
common base framework for all families, and that modules are loaded
directly from the server rather than from some external location.

> Dynamic channels are the ones I have presented, sans the non-RPC
> interfaces.  That is they closely emulate the existing RPC interfaces,
> and appear as nothing more than fast RPCs to clients.

That seems to be what I'm describing as level 1 above.

> The big selling point here is transparency.  Using a channel is just
> like using a port, so existing clients need little change to benefit
> from using channels.  Same thing with servers, since they implement
> RPCs already, it's mostly a matter of hooking the existing RPC
> implementations to channels instead.

Yeah, the simplicity of use as I described it.

> Unfortunately we miss out on the convenience offered by libc, unless
> we reimplement them over channels and supply them also.  (We could
> also integrate channels into libc directly, but I suspect that
> wouldn't happen anytime soon if at all.)

Indeed, I also fear that there might be problems with that. Which would
be rather unfortunate, as not only level 1 is in this situation, but
also level 2 (or 1.5).

> Other benefits include that different channel families need not be
> aware of each-other to benefit from using the channel interface.

That is also true for level 2, with FS and IO interfaces forming the
common denominator for all translator families. Only Level 3 offers no
compatibility per se -- but then, as I said, I'd only implement level 3
as optional extensions beside level 2, so no problem here.

> Though I suspect interoperability isn't very useful.  Why would an
> audio channel want to be layered over a network channel?

Oh, this is actually a boring case: Streaming an audio channel over the
network is way too obvious an application. But what about the inverse,
layering a network channel stack over an audio channel? Now that calls
for a bit more imagination :-) How about pushing network packets through
an audio link? (Think analog modem!) Or maybe just listening to the
network stream to get a feel for the traffic? Or maybe an art project?

Obviously, I'm not totally serious with the latter use cases. But my
point is to demonstrate that just because applications are not obvious,
we should not assume that there are none. In fact, for me the whole
point of the Hurd is that it's much more flexible in use of interfaces
and mechanisms than other operating systems -- that it's much more open
to implementing new ideas!

> With static channels each channel belongs to a family, where each
> family corresponds to its own abstraction, and where each channel
> implements an interface optimized for this abstraction.
> That is, we introduce a libaudio, libnet, etc. for each family.  Each
> being like libstore currently is, only cleaner and sharing common code
> through libchannel.  Where libchannel itself doesn't introduce a
> channel abstraction per se, it's just a support library.

I'm not certain about the details of what you are describing here. I
think you mean something akin to my level 3, but I'm not entirely sure.

> (Although in this case I'd much rather split libchannel into smaller,
> clear cut pieces, for instance a `libenc' to deal with encoding and
> decoding transferred data.)

Splitting out sub-libraries only makes sense if there will actually be
other users of these...

> The thing about static channels is that they are simple.  Both simple
> to implement and use because their interface can be brought closer to
> the problem domain.
> The downside being that a suitable abstraction must be engineered for
> each channel family, including support libraries to implement the
>   translators corresponding to each module.

Well, simpler by what measure? This is really hard to classify. Writing
a translator using the level 3 interface is completely different from
writing a traditional translator centered around POSIX interfaces. On
one hand, the interface can be much better optimized for the problem
domain; and the programmer is also relieved from handling all the gory
details of FS-like interfaces.

On the other hand, this means losing a common language, and a very
intuitive abstraction -- FS interfaces, while usually far from optimal
for specific applications, also have strong advantages!

So it's really hard to come to a definite conclusion as to which way
it's easier to write translators...

> Also any interoperability must be explicit, by creating modules that
>   adapt one channel type to another.

Not at all: I personally believe that every translator using a specific
interface, should also expose an FS-based compatibility interface, at
least for the basic operations. Special RPCs more suitable for the
specific application should be implemented as additions, not
replacements for FS interfaces.

The nice thing when working with level 3 libraries, is that handling of
the compatibility interface can be done automatically by the library. So
compatibility in fact gets easier: Once the support is implemented in
the level 3 library, the programmers of the actual translators don't
need to think about it.

> The middle ground as I see it is dynamic channels with non-RPC
> interfaces, where each such interface corresponds to a channel family.

Not sure what you mean here. Is it related to my level 2?

> Somehow I think providing *both* static and dynamic channels would be
> cleaner and more straightforward, using whichever handles the task
> well enough.

As I said, I believe we should provide level 2, and additionally level 3
(what you seem to call "static") for some families. I don't see much use
in also providing level 1 ("dynamic")...

