l4-hurd

Hurdish applications for persistence


From: Marcus Brinkmann
Subject: Hurdish applications for persistence
Date: Mon, 10 Oct 2005 23:51:24 +0200
User-agent: Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka) FLIM/1.14.7 (Sanjō) APEL/10.6 Emacs/21.4 (i386-pc-linux-gnu) MULE/5.0 (SAKAKI)

Hi,

I want to elaborate a bit why persistence is an attractive feature to
have, purely from a Hurd point of view.  This mail is targeted to
everyone who is familiar with the Hurd and wants to know why I have
even raised the issue of persistence in an earlier mail.

This mail does not discuss persistence itself, i.e. its technical
implications, its advantages and disadvantages, its problems.  That
would be a different discussion.  What follows is simply a line of
thought that will begin with "in the Hurd we want to do X" and end
with "persistence can do it for us".  Moreover, persistence is the
only way I know to achieve X in a practical manner.  If you have
better ideas, please step forward :)

This text is the result of discussions among Neal, Jonathan and me.
All errors are mine.

I will approach the issue from two different angles.

1. Passive translators

Passive translators are fundamentally flawed.  They "don't work",
meaning that they violate several good design principles and, even
worse, that they are insecure in a way that can not be easily fixed.

To give a bit of background: A translator in the Hurd is nothing but a
server, but one which is integrated into the filesystem, and thus is
accessible through the filesystem namespace.  We call a running
filesystem server an "active translator".  An active translator can be
started and attached ("mounted") to a filesystem node by any user who
has the necessary permissions for this node (I think you must be the
owner).  A passive translator is simply a command string in the
filesystem node.  If the node is accessed, and no active translator is
running on the node, the parent filesystem will transparently run the
command string stored in the filesystem node as the owner of the node
(if it has permission to do so), and transparently attach the started
filesystem to the node.  Only then is the initial access that
triggered this "auto-mounting" processed.
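
The two ways of starting a translator described above can be sketched
as a small toy model.  This is not Hurd code: the names Node,
ParentFilesystem and fake_start are invented for illustration, and
the command string is only an example value.

```python
# Toy model only: none of these names are Hurd APIs.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    owner: str
    passive: Optional[str] = None   # command string stored in the node
    active: Optional[str] = None    # stands in for a running translator


class ParentFilesystem:
    def __init__(self, start: Callable[[str, str], str]):
        # start(command, run_as_user) returns a handle for the new translator
        self._start = start

    def set_active(self, node: Node, handle: str) -> None:
        """Explicit start: the user attaches an already running translator."""
        node.active = handle

    def access(self, node: Node, request: str) -> str:
        """Implicit start: run the stored command on first access."""
        if node.active is None and node.passive is not None:
            node.active = self._start(node.passive, node.owner)
        if node.active is None:
            return f"parent handles {request}"
        # Only now is the access that triggered the start processed.
        return f"{node.active} handles {request}"


started = []


def fake_start(command: str, user: str) -> str:
    # Hypothetical stand-in for actually executing the command string.
    started.append((command, user))
    return f"<{command} as {user}>"


fs = ParentFilesystem(fake_start)
node = Node(owner="alice", passive="/hurd/firmlink /tmp")
first = fs.access(node, "lookup")
second = fs.access(node, "lookup")
```

In this sketch the second access finds the translator already running,
so the stored command is executed only once, and it runs as the owner
of the node rather than as whoever happened to access it.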

As is apparent from this description, there are two possible ways a
translator can come into existence: It can be started explicitly by
the user, or it can be started implicitly by the filesystem.  This
difference is what makes translators problematic.

What exactly is the difference?  If the command string stored as a
passive translator is the same one the user uses for the active
translator, then the difference can not be in the translator itself.
In both cases, the same code is executed.  Rather, the difference is
the execution environment of the translator.  In particular, we are
talking about the environment variables, umask, sigmask, and the
initial capabilities.  In the Unix world, the latter translates to:
authentication capabilities (user IDs), current and root directory
port, initial file descriptors (stdin, stdout, stderr), controlling
terminal, and the process server.  As you can see, this is quite a
hefty list.

What is the execution environment of an active translator?  This is
simple to answer.  It is inherited from the process starting the
translator.  This is usually the "settrans" utility, which itself
inherits it from the user's shell (for example).  settrans has the
ability to modify this environment a bit (for example, running the
translator in a chroot), but usually it will just pass it through.
It is important to note:

  There is nothing wrong with how active translators work.

The capabilities the active translator gets are obviously available to
the user.  The user has access to the node to which the translator is
attached.  No security issues relevant to this discussion arise.

What is the execution environment of a passive translator?  Here the
answer is trickier.  The execution environment is now provided not by
the task installing the translator, but by the filesystem doing the
transparent startup.  The situation is so muddy and underdocumented
that I couldn't even answer the question without looking at the source
code.  The filesystem will provide what it thinks is a "sane"
environment.  The root directory for example will be the root
directory of the file system.  The authentication port will be to the
filesystem's authentication server, but representing only the user IDs
of the owner of the filesystem node on which the passive translator is
installed.  This works only if the filesystem can create such an
authentication port, i.e. if the filesystem's user ID list contains
the user ID of the owner, which is not a problem if the filesystem
runs as superuser.  I assume that the umask etc. are set to some
"sane" values.

But let's take a closer look.  Take for example a "firmlink"
translator, which is a bit of a cross between a soft link and a hard
link.  Furthermore, consider a user running a shell in a chrooted
environment.  The user is malicious and wants to escape the chroot.
If the user installs an active firmlink to "/" somewhere in its
chrooted filesystem tree, this fails: The firmlink translator will see
the same chrooted root directory as the user themselves.  But what if
the user installs a passive firmlink translator and then accesses it
(for example via an ls -l)?  Then the filesystem will do the startup
_and give the firmlink translator a port to the real, unchrooted root
directory_.  The user can escape the chroot by following the
firmlink.  This is an obvious security exploit.
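
The firmlink example can be reduced to a few lines.  The following is
a minimal sketch under simplifying assumptions, not the real startup
code: Directory stands in for a directory port, Environment holds only
the root directory, and start_active and start_passive are
hypothetical names for the two startup paths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Directory:
    """Stands in for a directory port (a capability), not a path string."""
    label: str


@dataclass(frozen=True)
class Environment:
    root: Directory


def firmlink_target(env: Environment) -> Directory:
    # A firmlink to "/" resolves "/" against whatever root it was given.
    return env.root


REAL_ROOT = Directory("real /")
JAIL_ROOT = Directory("chroot jail")

user_env = Environment(root=JAIL_ROOT)        # the chrooted user's shell
filesystem_env = Environment(root=REAL_ROOT)  # the filesystem server itself


def start_active(installer_env: Environment) -> Directory:
    # Active start: the translator inherits the installer's environment.
    return firmlink_target(installer_env)


def start_passive(installer_env: Environment, fs_env: Environment) -> Directory:
    # Passive start: nothing of installer_env was stored with the node,
    # so the filesystem supplies its own idea of a "sane" environment.
    return firmlink_target(fs_env)


active_result = start_active(user_env)
passive_result = start_passive(user_env, filesystem_env)
```

Here active_result is the jail, while passive_result is the real root,
although the same translator code ran in both cases.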

Now the question is, can this be fixed?  Can for example the
filesystem realize that the user installing the passive translator is
chrooted, and thus store the chroot information in the passive
translator setting?  I will leave it to you to think about this for a
while.  I hope you will realize that it is impossible to do this
correctly in the general case (I can follow up if this is not clear).
The main reason why this is impossible is that the chrooted root
directory of the program installing the passive translator is a
capability, and not a string, and a capability can not be represented
as a byte stream and written to disk in the Hurd.
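
Why a string is no substitute for the capability can be illustrated
with a toy namespace.  This is a sketch only: Python object references
play the role of capabilities, and Directory and resolve are invented
names.  It shows just one way the stored name can go wrong, namely that
the name is rebound between installation and startup.

```python
# Toy model: a "capability" is an object reference, a "name" is a string.
class Directory:
    def __init__(self, label):
        self.label = label
        self.entries = {}   # name -> Directory


def resolve(root, path):
    """Walk a path string starting from the given root object."""
    node = root
    for part in path.split("/"):
        if part:
            node = node.entries[part]
    return node


real_root = Directory("real /")
jail = Directory("jail of the installing user")
real_root.entries["jail"] = jail

# The installing task holds `jail` itself.  All that can be written to
# disk is a string that happened to name it at that moment.
stored_on_disk = "/jail"
resolved_at_install = resolve(real_root, stored_on_disk)

# Later somebody rebinds the name.  The stored string is unchanged, but
# it no longer designates the object the installer actually held.
real_root.entries["jail"] = Directory("some other directory")
resolved_later = resolve(real_root, stored_on_disk)
```

The string survives on disk, but whether it still designates the same
object at startup depends on a namespace that the installer does not
control.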

And this is only one example.  You can easily contrive more, using
other parts of the execution environment as a lever for an attack.

The root of the problem is that the passive translator gets its
execution environment from the wrong source.  It gets it from the
wrong source, because the execution environment is not stored with the
passive translator setting.  In fact, there is no mechanism in the
Hurd to store the execution environment with the passive translator
setting.  Neither is there a "trusted" way to extract it from the
program installing the passive translator, nor is there a way to
actually write it out to disk (while the environment etc could be
written to disk easily, it is not clear how you could verify that it
still _means_ the same thing when the passive translator is started.
And of course, the capability set that is part of the execution
environment can not be written to disk at all in the Hurd).  Please
note that this argument goes _double_ if you shut down the system and
the task installing the passive translator, including its execution
environment, is lost completely!

Thus, I declare passive translators to be thoroughly and unfixably
broken, while active translators are fine.

Now, let's revisit the reason why passive translators were invented.
They were invented to make translator settings "persistent" across
reboots, because active translators are lost across reboots.  Above I
showed why passive translators do _not_ make translator settings
"persistent" across reboots.  They only make part of a translator
setting persistent, while ignoring the rest.

This problem simply goes away if active translators are never, ever
lost.  In other words: If you never reboot.  Then the active
translator, including its execution environment, will persist in the
real sense of the word.  This is how persistence solves the passive
translator problem.

Interlude: Orthogonal persistence.

Note that not the whole system needs to be persistent to make this
work.  "Orthogonal persistence", where only the active translator
tasks and their execution environment are persistent, would suffice.
In practice, however, the initial file descriptor set of an active
translator alone can contain arbitrary capabilities to arbitrary other
servers.  It is very hard to define a useful "object perimeter", i.e.
a boundary, in a system like the Hurd.  Thus, orthogonal persistence
seems to be hard to achieve in practice.  The same argument applies to
the following point.

2. User IDs

User IDs are the root of all evil.  Actually, it's not the user ID
itself.  It's the way it is used as the basis for access control list
(ACL) based authentication.  So let me restate that: ACLs are the root
of all evil.  Or in the words of Shapiro: "Unix is just as insecure as
Windows".  Here is a short list of problems with ACLs: "Failure of
least privilege", "Failure of selective access right delegation",
"Failure of rights and information transfer control", "Failure of
endogenous verification" (see Shapiro's thesis).

The Hurd is, in some sense, a capability system, because Mach is.
Mach does not know about user IDs.  However, after the root filesystem
and init, the very first server the Hurd starts is the "auth" server,
which implements ACL based authentication.  And it goes downhill
from there.

In the end, the Hurd is a schizophrenic system in that it has the
powerful notion of capabilities, but capabilities are obtained through
a mix of other capabilities plus ACL based authentication.  It is true
that in the Hurd you can not open a file if you don't have a
capability to its parent directory.  So, you can completely hide whole
filesystem hierarchies from users, and no user ID in the world will
give them access.  In reality, however, all files are accessible from
the root directory port, and that can be obtained even in chrooted
environments, either with the above exploit, or by simply asking the
proc server for it (!!!).

So, to get rid of ACLs, we need to get rid of user IDs, too.  Now you
will ask, if we don't have user IDs, how can you authenticate yourself
to a server?  The answer is: you can not!  That is the whole purpose of
having a capability system: that you can not access objects you don't
have the authorization to access.  So, to get access to an object,
someone will have to _give_ it to you.  Your initial set of
capabilities will define which objects you can access and which you
can't.  How do you get your initial set of capabilities?  The system
administrator will give them to you when he creates your "account".
So far, so good.

Now, it is unreasonable to have the system administrator come into
action every time you log on to the machine.  So, the system
administrator will set up a "session server", which will store your
capabilities while you are not logged on to the machine.  Again, easy.

However, it is also unreasonable to expect that the system
administrator will come into action and set up thousands of accounts
every time the machine boots.  So, you will want to store the
capabilities a user has in their session on disk.  Note that we have
come back to a problem similar to that of storing the execution
environment of a passive translator on disk.  But now, we don't want
to store the execution environment of a task, but the whole session
environment of a user.

So it seems that getting rid of ACLs means introducing sessions, and
you want to save sessions to disk.

Again, persistence solves the problem: If the session manager is
persistent, then the sessions will be created once, and then just stay
around for as long as it is meaningful to talk about an account on the
machine!
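
The role of the session manager can be sketched as follows.  This is a
toy model under stated assumptions, not a design: Capability and
SessionManager are invented names, object references stand in for
capabilities, and how a user proves who they are at log-on is
deliberately left out.

```python
class Capability:
    """Opaque object reference; holding it is the only authorization."""
    def __init__(self, description):
        self.description = description


class SessionManager:
    def __init__(self):
        self._sessions = {}   # account name -> list of capabilities

    def create_account(self, account, capabilities):
        # Done once, by the system administrator.
        self._sessions[account] = list(capabilities)

    def log_on(self, account):
        # Hands back exactly what was stored for this session.
        return list(self._sessions[account])


home = Capability("alice's home directory")
printer = Capability("printer queue")

manager = SessionManager()
manager.create_account("alice", [home, printer])

first_login = manager.log_on("alice")
second_login = manager.log_on("alice")

# A manager that is not persistent starts out empty after a reboot,
# so the administrator would have to recreate every account.
rebooted = SessionManager()
try:
    rebooted.log_on("alice")
    lost_after_reboot = False
except KeyError:
    lost_after_reboot = True
```

As long as the manager object lives, every log-on yields the very same
capabilities.  The freshly constructed manager models what is lost when
the session state does not survive a reboot.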

Again, only some "basic" set of capabilities needs to be stored in a
persistent manner.  Further capabilities could be derived from that
set.  However, finding an object perimeter is again a challenging
problem in general.

3. Summary

I have shown two examples where persistence solves a fundamental
security problem quite elegantly.  The first problem is that of
long-lasting translators which should survive a machine restart.  The
second is that of user sessions which replace the ACL based
authentication.

In both cases, some sort of orthogonal persistence could be used
instead, but the fundamental problem of finding suitable object
perimeters is challenging, to say the least.





