Re: Improving object mobility within the Hurd

From: Carl Fredrik Hammar
Subject: Re: Improving object mobility within the Hurd
Date: Sat, 28 Feb 2009 14:17:59 +0100
User-agent: Mutt/1.5.18 (2008-05-17)


Sorry for the late reply.  I had written most of the mail quite a while
ago, except for the terminology discussion.  Then I fell ill, and was
unable to complete that part, which I felt required a lot of concentration.
Then it took additional time to get back on the horse and back into the
nitty-gritty details.  In hindsight, I should have broken the mail up
and replied to the different parts as I finished them.

On Fri, Jan 30, 2009 at 10:39:51AM +0100, olafBuddenhagen@gmx.net wrote:
> On Thu, Jan 22, 2009 at 10:54:53AM +0100, Carl Fredrik Hammar wrote:
> > On Fri, Jan 16, 2009 at 01:11:09PM +0100, olafBuddenhagen@gmx.net
> > wrote:
> > > On Sat, Jan 10, 2009 at 06:56:15PM +0100, Carl Fredrik Hammar wrote:
> > I actually think we agree on what an object is: a bundle of state and
> > code with a specific interface, i.e. what you call abstract objects.
> > The interface can be RPCs, function calls, direct state manipulation,
> > or some other way of using the object.
> I'm not sure we are talking about the same... By "abstract object", I
> mean a bundle of state and code, but not necessarily bound to a specific
> interface. It could have multiple interfaces, or a single internal one
> that can be mapped to different external interfaces (RPC, local function
> call etc.).

OK, so it was a distinct concept.  I see how it could be useful.  However,
abstract objects seem to be more a matter of policy, and I'm more
interested in the underlying mechanism: we still need to be able to
discuss how an abstract object is to be implemented concretely.

Unless otherwise stated, I'm referring to ``concrete'' objects.

> Or rather, there is a single interface at an abstract level, but this
> can be implemented using different transport mechanisms or containers or
> whatever we call them.

Ah, an abstract interface.  I guess you can see related local and remote
interfaces as instances of a single abstract interface.

What you call transports, I have called wrapper objects.  I prefer
your terminology, but maintain that transports are distinct objects.

As an example: on the server side, a transparent object would be
implemented using a remote transport around a local object.  On the
client side, it would be a local transport connected to the remote
transport.  After migration, the client would use a copy of the server's
local object directly.
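
To make that layering concrete, here is a minimal sketch in plain C,
using function-pointer tables in place of real Mach ports and MiG stubs.
All names here (object, string_read, remote_read, and so on) are
hypothetical illustrations, not actual Hurd or libstore identifiers.

```c
#include <string.h>

/* The abstract interface: a single read-like operation. */
typedef struct object object;
struct object {
    int (*read)(object *self, char *buf, int len);
    const char *state;   /* implementation-private state */
    object *backend;     /* for transports: the wrapped object */
};

/* A concrete local object serving bytes from an in-memory string. */
static int string_read(object *self, char *buf, int len)
{
    int n = (int) strlen(self->state);
    if (n > len)
        n = len;
    memcpy(buf, self->state, n);
    return n;
}

/* A "remote transport": in a real system this would marshal the call
   into an RPC; here it just forwards to the wrapped object, counting
   forwarded calls so the layering is observable. */
static int forwarded_calls;
static int remote_read(object *self, char *buf, int len)
{
    forwarded_calls++;
    return self->backend->read(self->backend, buf, len);
}

/* Before migration: the client holds a transport; calls go through it. */
int read_via_transport(void)
{
    object local = { string_read, "hello", 0 };
    object transport = { remote_read, 0, &local };
    char buf[16];
    return transport.read(&transport, buf, sizeof buf);
}

/* After migration: the client uses a copy of the local object directly,
   through the same abstract interface, with no forwarding. */
int read_after_migration(void)
{
    object local = { string_read, "hello", 0 };
    object migrated = local;   /* the migrated copy */
    char buf[16];
    return migrated.read(&migrated, buf, sizeof buf);
}

int calls_forwarded(void)
{
    return forwarded_calls;
}
```

The point of the sketch is that the same abstract interface serves both
the transported and the migrated case; only the indirection disappears.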

> > A /remote/ object is an object that can be called remotely.  A /local/
> > object is one that can be called locally.
> I'm not sure remote object vs. local object is a meaningful distinction.
> We have the normal Hurd servers, where the objects are hard-wired to the
> RPC transport. We have the store framework (and hopefully a more generic
> framework in the future) for mobile objects, which can reside in the
> server and be accessed through the RPC transport, or be loaded into the
> client and accessed through local function calls. And we discussed the
> possibility of objects that can only reside in the client, and could be
> hard-wired to a local function call transport.

It isn't meaningful to distinguish objects that can be called remotely
using RPCs from objects that cannot?  Perhaps local is a bit redundant
since almost all abstract objects can at least be called locally from
the implementing process.  More on this below.

> > They are the best /pair/ of terms I've found so far.  `RPC object' is
> > more specific than remote, but I haven't been able to find a good
> > substitute for /local/, the best I have mustered is /C/ object.
> I'm not sure whether this was clear: By "RPC object", I did not mean a
> specific subclass of some larger class I happened to call "abstract
> objects", but rather *any* abstract object, when accessed through the
> RPC transport/container. The *same* abstract object can be accessed as an
> RPC object, or as something else...

I've been quite torn over this issue.  This is part of the abstract object
discussion.  The question is where to draw the line of what constitutes
a single object.  While I consider transports as clearly separate objects
on a concrete level, it's less clear when considering normal Hurd objects.

Is the RPC handling part an additional interface or a separate proxy
object?  Or if preference is given to the first view, is there any point
in regarding the proxy as separate from the proxied object?

An object depends on its interfaces but is independent of its proxies.
While it controls which interfaces it implements, it has no control
over its proxies.  This makes proxies more flexible and dynamic than
interfaces: you can have several different or equivalent proxies to the
same object, and even proxies of proxies.

Consider an object bound to several ports, i.e. messages to either port
will result in the same operation on the same object.  Clients would
not be able to establish that they are bound to the same object (in
general), thus it is more appropriate to view the ports as references
to different proxy objects rather than references to the same object.

This might seem a bit contrived: why wouldn't you just reuse a single
port?  One reason could be selective revocability, but the main
point is that it's a possibility that should not be ignored.

Another point is that remote use of an object is usually restricted
compared to the actual implementation, for protection against misuse if
nothing else.  It's hard to consider a port as a direct reference when
it limits the object in this manner; again, it works more like a proxy.

Even though this view means that remote objects are all proxies to
some local object or ports used only for their identity, I think it
reflects the actual topology of Hurd's object system better.  Luckily,
clients can remain oblivious to this fact since being a proxy is an
implementation detail.

Note that according to this view remote and local objects are always
distinct.  For an abstract object, being remote would mean that it has
some RPC transport.  Being local would mean it has a local function
call transport, which is almost always true (except for id ports),
so it's not very useful in this case.

In addition, it makes the full topology of a remote transport more
complex: a remote object connected to a remote-to-local transport
connected to a local object.

> > A mobile object is one that can be copied from one process to another,
> > code and all.  Note that both local and remote objects can both be
> > mobile or not.
> This statement doesn't make sense to me... I haven't really understood
> the definition of local/remote objects you proposed. It's not intuitive.

I was just pointing out that mobility and whether an object is remote or
local are orthogonal.  Remember that local and remote refer to interfaces,
not location.  An example of a local object that isn't mobile is a copy
store.  A mobile remote object would be one that is accessed through RPCs
both before and after migration.

> > An /object system/ is a framework for implementing objects and
> > controls how they may be formed.  libstore is a trivial object system
> > where all objects have the same single interface.  Mach's IPC and MiG
> > forms the object system for remote objects, which allows objects with
> > several interfaces.
> > 
> > A /mob/ is an object specifically implemented through my future object
> > system.  Unless otherwise mentioned, a mob is assumed to be mobile as
> > it is the framework's primary purpose.
> This seems a bit confusing to me: The object system encompasses both the
> transport(s), and the methods for mapping an abstract interface to them?
> BTW, I just realized that I never considered the interaction between MiG
> and abstract interfaces... Stores support only a single abstract
> interface, so providing a matching MiG definition file for the RPC
> transport is not a problem; but if we have a generic framework that can
> handle any abstract interface, would we still create matching MiG
> definitions for each one manually?...

Well yes, unless we implement an interface generator for abstract
interfaces that can be compiled to MiG and mob interfaces.  But that is
a consideration I'll happily postpone.  ;-)

> > /Transparent/ in this context means that either a local or remote
> > object can be used with the same interface (using a wrapper).  This is
> > to make it possible to fall-back on using the object remotely if the
> > object can't be transferred.
> Well, when a remote object is migrated to run locally, I would actually
> still call it the same (abstract) object; only it uses a different
> transport/container now...

Abstractly: yes, concretely: no.  :-)

> Anyways, things are becoming clearer now :-)


> > I'm trying to avoid making assumptions on how interfaces might look
> > like.
> Well, to be honest, this discussion is in parts a bit too abstract/vague
> to my taste... I'd rather talk about more specific bits.

It'll get more specific once I get around to writing the mails discussing
the individual issues.  (I hope.)

> > I do, however, suspect that transparent interfaces will be optimized
> for the local object case.  For an io interface that would probably
> > mean a POSIX style, rather than a Hurd style, interface.
> Not sure what you mean by POSIX style vs. Hurd style... But admittedly I
> do not really have any idea at all how the abstract interfaces of
> transparently migrating objects could look.

Take the difference between how read and hurd_io_read return data:
read fills caller-specified memory, while hurd_io_read might either fill
a specified buffer or return a new one.
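
A rough sketch of the contrast.  The names posix_style_read and
hurd_style_read are illustrative only; the real io_read prototype
involves Mach types and out-of-line memory, but the shape is similar:
the caller passes a pointer to its buffer pointer, and the callee may
either fill that buffer or substitute a newly allocated one.

```c
#include <stdlib.h>
#include <string.h>

/* POSIX style: the caller provides the memory, period. */
int posix_style_read(const char *src, char *buf, int len)
{
    int n = (int) strlen(src);
    if (n > len)
        n = len;
    memcpy(buf, src, n);
    return n;
}

/* Hurd style (sketch): *data points to a caller-supplied buffer of
   *datalen bytes.  If the result fits, fill it in place; otherwise
   return a freshly allocated buffer, which the caller must free.
   The real io_read returns out-of-line Mach memory instead. */
int hurd_style_read(const char *src, char **data, int *datalen)
{
    int n = (int) strlen(src);
    if (n > *datalen) {
        char *p = malloc(n);
        if (!p)
            return -1;
        *data = p;               /* caller's pointer is replaced */
    }
    memcpy(*data, src, n);
    *datalen = n;
    return 0;
}
```

A transparent interface optimized for the local case would likely pick
the POSIX shape, with the RPC transport copying any returned buffer back
into the caller's memory.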

> However, I tend to think they would rather resemble the standard RPC
> interfaces. For one, the RPC transport is obviously the more limiting
> one, so it must dictate what is possible.

Of course.  However it's definitely possible to implement POSIX style
interfaces on top of Hurd interfaces.  Just look at glibc.  ;-)

The other way around should not be a problem either.  (Though it might
be a problem to do it over the *actual* POSIX interface.)

> Also, for practical reasons it seems inevitable that the interfaces
> should not be too different -- having two completely different
> approaches for creating Hurd objects would be too big a burden on
> programmers.

I agree.  Though, in many cases, the burden can be reduced by supplying
wrappers that implement one interface in terms of another.  If the
wrapped interface is simpler than the wrapper interface, programmers
might prefer it where applicable.

> And last but not least, I actually think the RPC case should probably
> have higher priority. I guess migration will be rather the exception
> than the rule: While not my primary interest in the Hurd, I think the
> potentially better robustness resulting from small isolated components
> is a nice bonus, which we shouldn't give up readily except in cases
> where really necessary for good performance...

You have a point, but I'm not totally convinced.  Luckily this decision
can be postponed until the interfaces are actually designed, as libmob
will be able to support either, and even both at the same time.

> > > So I guess by your definition, the use case I'm interested in for
> > > translator stacking, would actually not classify under object
> > > migration, but under other uses... I guess you remember that I don't
> > > consider actual RPC emulation particularily useful :-)
> > 
> > I'm guessing it classifies under partially transparent or
> > non-transparent object migration.
> This classification really depends on what level you are looking at...
> At the transport level, it would be totally non-transparent; on the
> abstract level, it would be completely transparent.
> (It might be worth considering cases where it's only transparent to the
> client but not the object, i.e. there is some special handling in the
> server implementation. I think we should try to avoid that however, to
> keep the implementations as simple as possible.)

Yes, I have considered this too.  For instance, an object that isn't
really mobile but doesn't change very often might send an object
that caches its state and reloads it when notified by the server.
A ``fat transport'', if you will.

You are right that this complicates the implementation of an object and
should normally be avoided.  But we might as well consider it a use-case,
as it's not as if we could prevent someone from (mis)using it this way.

> > > > The command line mechanism can ignore many of the issues that
> > > > arise in mobility, e.g. consistency between different copies.
> > > 
> > > I must admit that I don't see the difference... Please explain.
> > 
> > Take copy store as an example.  The copy store makes a copy-on-write
> > copy of another store and discards changes when closed.  For instance,
> > a copy store over a zero store is useful for backing /tmp.
> > 
> > If a copy store where to migrate, then all modifications would also be
> > copied.  Writes made to the copy would not be reflected in the
> > original and vice versa.  Because of this, the copy store has the
> > enforced flag set, which makes storeio refuse migration requests.
> > 
> > When creating an object instead, there will only be a single copy,
> > which circumvents the problem entirely.
> So it's really not because the objects are created in the client in the
> first place instead of being migrated, but simply because objects
> created from the textual store representation are never shared between
> clients... We could get the very same situation with actual migration,
> by enforcing an "exclusive" property. Not a fundamental difference, but
> rather just a special case really.

A problem actual migration still faces is the need for authority
verification.  With command-line creation, the necessary authority is
given by the creator, so there's no need to check whether the receiver
already has it.

> I'd consider these to be special cases of object mobility, rather than
> completely different use cases of parts of the framework.

Well, the command-line mechanism requires a language to describe the
objects, a feature that isn't necessary for mobile objects.  I guess it
could be seen as a human-readable marshaling format, but I think that's
stretching it.

> > > > I do not have high hopes for this method though, mostly because
> > > > it's hard for the recipient to determine if it can trust the code.
> > > 
> > > Well, in the simple case -- using the traditional UNIX model -- it's
> > > pretty trivial: The client trusts the code if it trust the server,
> > > which is the case when the server is run by the same user, or by
> > > root. In this case, there is no problem at all.
> > 
> > Ah, but -- as per the Hurd's design goals -- we want to reduce the
> > trust needed between normal users to take advantage of this feature
> > when cooperating.  And the client doesn't need to trust the server if
> > it acquires the code from a trusted source, e.g. from /lib, /usr/lib,
> > or $LD_LIBRARY_PATH, or even statically linked code.
> The problem with Hurd's design goals is that everyone has a different
> opinion on what they are...
> The only reference on that is "Towards a New Strategy of OS Design" --
> but this is mostly a mixture of design ideas and nice features resulting
> from them; the goals are never really stated very explicitly. There is
> only one thing that manifests throughout the paper: Giving users more
> control over their computing environment. This is the one fundamental
> idea behind the Hurd design. Everything else in there is either a
> consequence of this fundamental goal, or just mentioned as another nice
> thing incidentally resulting from the design...

OK, I'll try to avoid citing design goals to motivate features.
Instead I'll try to be more direct.

> Accessing servers that are run by untrusted users IMHO is one of the
> things in the latter category. It is mentioned in the paper, but it
> doesn't actually work: As Marcus and Neal pointed out, you *can't*
> blindly trust a filesystem server. It could do all kinds of nasty
> things, like creating infinite amounts of garbage, or just stalling
> indefinitely, both resulting in various kinds of DoS.

I'm aware of this.

Also note that I have not ruled out transferring objects from client to
server (which is why I usually say sender and receiver instead).  I don't
think there will be any controversy over servers not trusting their
clients.

> Or it could
> provide a malicious link that tricks the user process into doing
> something destructive, like deleting or overwriting a precious file.
> (This last scenario is known as the "firmlink problem". AFAIK the
> standard firmlink implementation doesn't actually expose this problem,
> but it shouldn't be hard to create a corrupted variant that does.)
> Probably it wouldn't be hard to come up with other exploit scenarios
> that allow spying, manipulating files, and even completely taking over a
> user's account.

Thank you for pointing this out.  I have considered it before, but not in
the context of mobile objects.  Not only should the sender be convinced
that the receiver has the authority to hold an object's dependencies, but
also the other way around.  Thus the receiver can't be fooled into
exposing or modifying an object that isn't accessible to the sender.

Though a general mechanism eludes me, for io objects io_restrict_auth can
be used to limit access to the intersection of both parties' authority.
Perhaps a generalized version could be used.
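
The core of such a generalization is just set intersection over the two
parties' credentials.  A sketch (plain uid lists stand in for the uid
and gid vectors the real io_restrict_auth RPC takes; the helper name is
made up):

```c
/* Keep only the uids present in both lists; returns the count written
   to `out` (sized at least min(na, nb)).  An io_restrict_auth-style
   restriction would then hand out a port carrying only this
   intersected authority. */
int uid_intersect(const int *a, int na, const int *b, int nb, int *out)
{
    int i, j, n = 0;
    for (i = 0; i < na; i++)
        for (j = 0; j < nb; j++)
            if (a[i] == b[j]) {
                out[n++] = a[i];
                break;
            }
    return n;
}
```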

The worst-case scenario, in which I don't find a solution, is no worse
than the status quo for the Hurd.

> So what does that mean for the Hurd? IMHO not much. (One of my major
> qualms with the "Critique" is that it presents this as a major failure
> of the Hurd architecture. I don't agree it is.)
> It has nothing to do with the fundamental goal of the Hurd design -- and
> it's not even a terribly useful feature. Why would anyone provide a
> filesystem server for other users of the machine? If someone wants to
> provide a service for others, the usual way of doing that would be
> implementing a network service. Not only does that remove the arbitrary
> limitation to other users of the same machine, but also network software
> is generally well aware of the possibility of misbehaving servers, and
> usually can handle it more or less well.

You make some interesting points.  Though I don't really like thinking of
the file system as a minefield just waiting to be wandered into.  ;-)

> Back on topic: Essentially it's out of the question ever to use a server
> that is run by an untrusted user. (In theory we could design clients
> that are immune against misbehaving file servers, but the ones we have
> so far aren't.)
> Admittedly, executing untrusted code is a more direct threat. Perhaps
> it's still worth trying to prevent it in the mobility framework; I'm not
> sure.

I guess it comes down to whether fixing the problem should be considered
adding a feature or fixing a bug.  That is, whether by relying on servers
I'll be introducing more bugs or just keeping the status quo.

I'm not convinced either way yet.  Though, for now at least, I'll avoid
trusting the server unless it's unavoidable.

> > > Admittedly, this is more tricky when leaving the UNIX model, and
> > > working with pure capabilities... I'm not sure that an object named
> > > through a textual file name is indeed more trustworthy than one
> > > named through a port directly -- but I haven't really thought about
> > > it yet. I'm curious what you have to say on that in the promised
> > > later mails :-)
> > 
> > Using a file name, you can figure out who controls the file, and
> > decide whether you trust it based on that.  (Or at least I think so,
> > I'm not sure yet if a malicious file system can't fool you.)
> > 
> > This might not be impossible with ports, but I imagine it's trickier.
> The problem with file names is that they aren't very reliable. For one,
> they only work if client and server are in the same name space. (You
> even mentioned chroot yourself...)
> Also, file names aren't stable temporaly: The meaning of the name could
> change between the time the server passes the name, and the time the
> client opens it.

The time frame is probably longer in most cases, as the server will most
likely use a file name determined at start up.

That said, the server can send an io identity object (as returned by
io_identity) which the client can use to check whether it resolves to
the same file.
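
On POSIX systems the analogous check compares device and inode numbers;
io_identity gives Hurd clients a port-based equivalent of the same test,
without trusting the name resolution itself.  A stat-based sketch:

```c
#include <sys/stat.h>

/* Two names refer to the same file iff they resolve to the same
   (device, inode) pair.  With io_identity, the client would instead
   compare the identity ports returned for the two objects. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return 0;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```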

> In the end, I think the only thing we can do with file names is
> resolving them, and then doing exactly the same checks we do on a
> directly passed FD: Checking that the node is owned and writable only by
> trusted users, and that it resides on a file system that can be trusted
> regarding this information.
> I can't see how we could derive any additional trust from the file name
> itself. It seems only to open additional potential for failure.

I guess this is true.

However, if we only want to use files in /lib:/usr/lib:$LD_LIBRARY_PATH,
we are forced to use file names.  (AFAIK there's no way to go from an FD
to the paths of all its links.)

I like this approach because it's the conventional way to specify
libraries to use, and thereby to trust.  There might be reasons to
avoid using certain modules, even if the controlling user is trusted.
For instance, the module might be considered experimental.  The method
might not be the most convenient one, but it's one that is useful for
users to learn anyway.

This is pretty much how libstore does it BTW.
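
A sketch of that lookup: split the colon-separated list and take the
first directory that actually contains the module.  This only loosely
mirrors how libstore locates store type modules before dlopen'ing them;
the function name and exact behavior here are illustrative.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Search `dirs` (colon-separated, e.g. "/lib:/usr/lib") for `name`;
   on success write the full path to `out` and return 1, else 0.
   Because the search starts from trusted directories, finding the
   module here is what establishes trust in its code. */
int find_module(const char *dirs, const char *name, char *out, size_t outlen)
{
    const char *p = dirs;
    while (*p) {
        const char *colon = strchr(p, ':');
        size_t len = colon ? (size_t)(colon - p) : strlen(p);
        if (len > 0 && len < outlen) {
            snprintf(out, outlen, "%.*s/%s", (int) len, p, name);
            if (access(out, F_OK) == 0)
                return 1;
        }
        p += len;
        if (*p == ':')
            p++;
    }
    return 0;
}
```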

> > > In the UNIX case, it is actually quite symmetrical: The client
> > > trusts the object code provided by the server, if the server is the
> > > same user or root. The server entrusts the client with the content
> > > of the object, if the client is the same user or root.
> > 
> > I'm hoping to make it so that the server doesn't need to trust that
> > the client doesn't miss use the content of the object.  This by
> > verifying that the client already has the authority needed to hold it,
> > and would thus already able to acquire the content through other
> > means.
> You are right: I oversimplified it. Indeed it's not necessary that the
> client runs as the same user -- it suffices that it's a user that has
> access to all the capabilities the object requires. A simple mechanism
> for checking that is called for.
> (Note that the mechanism doesn't actually need to be secure: If the
> client turns out not to have the necessary capabilities after all, it
> will simply fail, harming only itself...

Well, it needs to be secure or it'll leak capabilities the receiver
shouldn't have access to...  But the rest of the paragraph seems to imply
that capabilities don't leak, which leaves me confused.  :-(

Perhaps you meant that it may return false negatives, i.e. fail to detect
that the receiver does have the necessary access.  In which case I agree:
only false positives must be avoided to ensure security; the number of
false negatives should be minimized, but some are probably unavoidable.

> Unless of course the object has
> some temporary state that must be migrated, and contains classified
> information -- but temporary state in translators is problematic in
> general, and should be avoided as far as possible.)

Classified information, such as passwords, should be treated as
capabilities, in that it should not be shared unless the receiver already
has access to it.  This requires that the information be discoverable
by the receiver.  For instance, it might be available at some canonical
location, like the .netrc file for FTP passwords.

> > Also note that checking that it's the same user is not enough, a
> > process can have its authority limited by chroots and sub-Hurds.
> Interesting point. Does chroot normally prevent communication between
> processes inside the chroot and outside the chroot having the same
> UID?...

At least through ports that are only obtainable from the file system.
It seems that processes can register a port with the proc server, which
it'll give out to whomever asks.  And this isn't limited by a chroot
alone, since a port to proc is inherited from the parent process.

I'm not sure what these ports are normally used for, but I assume they
refer to objects distinct from file system objects.  Thus, chroots still
limit a process's authority to objects in processes of the same UID.

If an object is normally accessed through the proc server, then that
needs to be analyzed instead of the file system.  However, I will focus
on, or perhaps even limit my thesis to, objects reachable through the
file system, which is the normal case for the Hurd.

This raises a question: should the root user be considered limited by
chroots?  Or rather should root always be treated as all powerful?

> > > We could still move the handling to the client in the more common
> > > case that there is only one client -- but that wouldn't solve the
> > > resource management problem, as there are still the cases where it
> > > must remain in the server.
> > 
> > It doesn't need to be in *the* server, though someone must act as a
> > server for the file cursor object.  This could be the original client,
> > the new client, the server, or a third-party server in the system/per
> > user/per login/whatever.
> > 
> > My thoughts mostly revolve around clients pushing the cursor to a
> > third-party server and reloading if it becomes the sole client again.
> There is a somewhat similar situation with pipes: As long as there is
> only one reader and one writer, there is really no need for a server
> proccess -- the users could just communicate directly. When there are
> more readers and/or writers, an explicit server again becomes necessary.
> Note however that both in the FD case and the pipe case, the object
> migration is only an optimisation. The actual problem with the resource
> management is making sure that the (possibly shared) client state is
> accounted to the clients, not to the server. Depending on how the
> resource accounting framework works, your suggestion to keep the file
> pointer in an extra server could indeed help with that.
> But this doesn't require any object mobility framework. Introducing an
> explicit FD server is something that can be trivially done. All the
> object mobility framework does here is move the object to the client in
> the single-client case.
> This is again just a specific use of the standard mobility mechanism
> BTW, not a different use of some components of the framework :-)

I agree; however, this particular part of the discussion was about
whether the framework could be used for things other than optimization.
Though, as you pointed out, cursor mobility is an optimization of the FD
server idea.

But consider a similar object that maintains session state that can't
be shared.  That helps resource management without being an optimization,
if communication with the server is still required.

> As I already said, I don't discourage conceptually considering the
> various ingredients of the mobility framework as independent components.
> But as long as we don't actually have other users, you shouldn't try to
> make them any more generic than is strictly required for the standard
> mobility mechanism -- anything else would just be overengineering.

I'm mostly avoiding artificially coupling components for no apparent
reason.

The general problem I'm trying to avoid is that frameworks usually get
in your way when doing something the original developers didn't foresee.
I want to minimize the actual framework part, making most of the
functionality available as utility functions, which are easily ignored
or replaced.

> > > I like the translator concept, because it allows intuitively naming
> > > objects through filesystem locations; and the objects are
> > > standalone, i.e. can be accessed directly from the command line,
> > > typically through a filesystem interface.
> > 
> > I'm not sure what you mean by an object being standalone...
> Well, saying that they can be accessed from the command line is not
> exactly a precise definition, but I thought it would be sufficient to
> show what I mean...
> Standalone means that it is usable on its own. It doesn't require any
> external framework to use it; it doesn't need to be loaded in a special
> way or anything like that.

Oh OK.

> > > An obvious use case are ioctl handlers: I believed for a long time
> > > that rather than being hardcoded in libc, they should be handled by
> > > some kind of loadable modules. This was actually discussed as part
> > > of the channel concept, but I discarded it back then, as it doesn't
> > > fulfill the transparency requirement, and thus didn't seem useful to
> > > me back then.
> > 
> > I never did look into how ioctls are handled so I can't tell offhand
> > whether this is a good idea.
> This is a bit surprising, considering that you explicitly mentioned
> that in your original libchannel design...

Only at the server side, i.e. mapping RPC interfaces to channel
interfaces.  You'd never use ioctl() directly on a channel object.

> Anyways, it's pretty simple really. Every ioctl is mapped to a distinct
> RPC. For simple ioctls, the mapping is systematic, and is done
> automatically by some crazy preprocessor magic. For more complex ones
> this isn't possible however. libc has explicit stubs for these,
> transforming the parameters as necessary (dereferencing pointers etc.),
> and then invoking the actual RPC.

Thankfully, I didn't have to wade through all that crazy preprocessor
magic.  :-)

> These stubs are individual for every ioctl: to support a new type of
> device, new stubs need to be added to libc -- which is obviously
> painful. Would be nice to have a mechanism that loads the stubs
> dynamically from some external source.

Yes, but transforming ioctls to RPCs seems totally independent of the
server.  The fact that the client is attempting to do an ioctl is enough to
warrant loading the code.  I figure a proper solution would go something
like: libc gets an ioctl(fd, KDSETLED, ...), so libc loads libioctl-kd.
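
A hypothetical dispatch sketch.  BSD-style ioctl encodings pack a group
letter into the request code ('k' for the KD* console ioctls, for
example); a loader could derive the module name from that group and
dlopen it on first use.  The encoding macro and naming scheme below are
purely illustrative, not the Hurd's actual ones.

```c
#include <stdio.h>

/* Illustrative BSD-style encoding: group letter in bits 8-15. */
#define SKETCH_IO(g, n)  (((unsigned long)(g) << 8) | (unsigned long)(n))

/* Derive the handler module for a request, e.g. "libioctl-k" for the
   keyboard/display group; libc could then dlopen the result and hand
   the request to the stub found inside. */
void module_for_ioctl(unsigned long req, char *out, size_t outlen)
{
    char group = (char)((req >> 8) & 0xff);
    snprintf(out, outlen, "libioctl-%c", group);
}
```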

> > > Now I see that it might be still useful to implement this using a
> > > common mobility framework, so they can be handled like something
> > > akin to translators -- providing objects that are not really
> > > standalone, but are named through filesystem locations.
> > 
> > They should be implementable as mobs.  However, as they are more
> > specialized I don't think they need more than a single interface, so
> > they might want to use a separate object system.
> Well, is it better to use a generic framework that does more than
> strictly necessary in this case, or to create a specific framework for
> this particular use case? This is a very hard question.
> A too generic framework is problematic, because in addition to
> understanding the framework itself, you need to decide on how to make
> good use of it for a particular use case; and you have to implement a
> lot of code on top of the framework to achieve the desired properties.
> Too specific frameworks are problematic, because you have to learn a lot
> of specifics for each use case; and for a new use cases, you either need
> to create new frameworks, or use some existing ones that aren't really
> suitable for the purpose.
> Finding a good middle ground in any particular area is one of the main
> challenges of good design...


