From: olafBuddenhagen
Subject: Re: Drivers
Date: Sun, 5 Jun 2005 02:00:50 +0200
User-agent: Mutt/1.5.9i

Hi,

Thanks for all the input :-)

> > For one, with drivers at POSIX level, there is no need for a
> > sophisticated, full-featured special driver framework: Library
> > functions, loading and unloading drivers, communication amongst
> > drivers and to the outside world, configuration management/parameter
> > passing, resource management -- it's all already there, using the
> > ordinary POSIX/Hurd interfaces. All we need are a few extensions to
> > the standard POSIX/Hurd mechanisms, to allow for driver-specific
> > stuff like IRQ handling and I/O port access.
> > 
> 
> That depends on what you mean by 'POSIX level'. Drivers should be
> 'user processes', but they don't need the full POSIX functionality.
> Actually most of the posix functions are useless for driver
> programming.

Many functions are probably useless, but they do not do any harm either.
Originally I wasn't even sure about using the standard libc myself; the
proposal would work with a stripped down driver-specific library as
well. But how do you know in advance which functions a driver developer
might really want to use? Where do you draw the line between
low-level drivers using a stripped down library, and higher level stuff
using the standard libc? What about parts of a user-space UNIX program
being turned into a Hurd driver? After thinking about it a bit, I
arrived at the conclusion that unless there are some very good reasons
not to, we should use the standard libc. So far, I haven't found any
such reasons.

> At most you need basic C functions like memcpy etc. The standard posix
> read/write interface is not suited for communication between drivers
> as it's stream based. This means the driver has to figure out where
> the message boundaries are which is a waste of CPU time and error
> prone.

I'm not sure what you mean here. Could you elaborate on this a bit,
please?

In general, if the existing standard interfaces are too limited, we can
extend them as needed, but not replace them. It's up to the individual
driver developers to decide which function best fits the purpose.

The whole point of this proposal is to allow drivers to use as much
standard functionality as possible, rather than limiting driver authors
to some arbitrary set of features the creator of the framework considered
useful.

> I never found the 'limited environment' a problem for doing drivers.

If you do not feel it to be a problem, could that be because you are
used to thinking of the driver environment as some closed world, and
have learned to accept its limits as natural?

With those borders lifted, we get completely new possibilities for what
drivers can do -- and once we get used to the new way of looking at
drivers, any restrictions could very soon become painful. Remember:
Those who do not move won't feel the chains :-)

(The same is true for the Hurd in general: Most people have a very hard
time really understanding the great advantages of the Hurd, as they
aren't able to think beyond the limitations of a monolithic system they
have grown to accept as natural, and fail to see how vast numbers of
problems can be solved considerably more simply without such limitations.)

> Lack of clear specifications and correct understanding of the hardware
> operation is what generally causes mistakes.

This is not so much about mistakes; it's more about how many people
are willing to write drivers, and how long it takes to write or (even
more importantly) to port existing drivers. A nicer framework can help
here, I believe.

> The differences in the way hardware and software people look at a
> system obviously don't help here.

True... Recently I was asking several companies about IT jobs, and most
told me that their hardware development and software development are
strictly separated :-( (So it's hard to find an interesting job for
people like me, interested in both...) No wonder most hardware comes out
crappy as it is. Well, but that's pretty OT :-)

> > Furthermore, drivers being ordinary programs removes the need for
> > any (imperfect) magic making drivers more accessible by giving them
> > some semantics of ordinary processes -- they just *are* ordinary
> > processes, with all the nice things that come with that. Perfect
> > transparency.
> > 
> 
> But they are not ordinary programs. You can't just kill -9 a driver
> and hope the system survives for example. If you're lucky not much
> happens, but likely some other programs will stop working and if
> you're unlucky you seriously damaged your filesystem for example. This
> is because hardware does things completely asynchronous from the main
> CPUs. Only the driver knows how to properly shut down the hardware so
> resources can be freed. 

Obviously you have to know what you are doing -- but that's the general
assumption when working with root privileges. Of course, some actions
may be risky or useless in certain cases; but the idea is that we should
nevertheless offer the possibility. After all, there still might be --
and in the example of forcefully removing malfunctioning drivers
definitely are -- legitimate use cases for any functionality. Just
because we do not see the need doesn't mean something is useless.
Leaving all possibilities open greatly enhances the power of the system.

The whole "Why?" section in my proposal is really about showing examples
how it creates new possibilities, and how these could be used. Not every
single example might be appealing to you; but it should show how the
nearly unlimited possiblities opened, generally turn out useful in many
many situations.

> > There is one more advantage of drivers being ordinary programs,
> > contributing to the ease of driver development: All the ordinary
> > debugging tools, like GDB or rpctrace, can be used in the usual
> > manner. The fact that we use standard filesystem operations for all
> > the interfaces of a driver, also considerably helps debugging.
> > 
> 
> You don't need to have full POSIX functionality to be able to debug an
> L4 process.

Sure... But it makes it easier :-)

> Also you should be very careful as stopping drivers can cause the
> system to crash (eg. if you would stop the HD driver, other programs
> will likely fail when they need to page in code for example.) In
> general GDB is not very helpful for debugging drivers either. Some
> sort of trace functionality makes much more sense in my experience.

Of course not every debugging tool turns out useful in every situation.
But I can think of enough cases where standard tools are useful in
debugging parts of a driver, including GDB. Again, having the
possibility is definitely a Good Thing (TM).

Also note that while GDB might not be the optimal tool in many
situations, being able to use the tool they are familiar with certainly
gives some confidence to novice driver developers/porters. It lowers the
entry barrier, which is very important IMHO.

> > If some application needs access at a lower level than usual, there
> > is no need for creating special interfaces circumventing the higher
> > level driver parts. Just plug in at the desired level in the
> > hierarchy -- since the driver components are not special in any way,
> > every program can take over their function.
> > 
> 
> Most hardware has no provisions to allow access by multiple processes
> safely. Probably the most important task of the driver is to provide
> this multiplexing. 

My proposal mentions the need for extra driver layers wherever some
multiplexing is necessary, in the context of locking. I guess I should
mention it in a more generic context as well.

An application might want to plug in just above the multiplexing layer,
so the resource can still be shared, but skipping all the higher-level
driver layers (abstraction). Or in some situations it might even be
useful to pass the control over a resource to some specific application
altogether. We have full flexibility here :-)

> > Hot-plugging isn't special anymore: From a fully dynamic system,
> > with drivers always being launched explicitly (automatically on
> > system startup and hotplug events, or manually by the user), to a
> > totally static system, where drivers are set up only once and remain
> > there (using passive translators), everything is possible. You can
> > handle non-removable devices dynamically, to automatically adapt to
> > changed system configuration, or you can set up hot-pluggable
> > devices statically, like a mouse that is always connected to the
> > same USB port for example. You can even combine them in kind of an
> > inverse manner, e.g. handle connecting to a docking station by
> > dynamically inserting a tree that has drivers for all the devices in
> > the docking station set up statically.
> > 
> 
> The special thing about hotplugging is not the loading of the drivers
> itself, but deciding which driver to load, having the driver cope with
> unexpected removal of the device, handle power management properly,
> cope with a (theoretically) unlimited number of device instances, ...

I had actually mentioned most of these in the document, though not that
explicitly. I have reworded the section somewhat now, hoping it is
clearer.

Several instances of the same device shouldn't actually require any
special attention in the driver itself, if the driver is cleanly
implemented. For the system it's an interesting problem of course, but
this is not really limited to hotpluggable devices (the same happens
with several sound cards plugged into normal PCI slots, for example),
and IMHO it is not within the scope of the device driver framework either.
My proposal is flexible enough not to impose any limitations in this
regard at the driver level.

Power management is actually a topic I hadn't covered at all. Thanks for
pointing this out. I have added a new section now.

> > The drivers themselves can also be designed very flexibly: Maybe using
> > a simplistic approach with a single process handling everything. Or
> > separate the critical lowest-level parts (register access) from the
> > more complex but uncritical higher-level parts into two separate
> > components (like KGI/GGI does).
> 
> I don't see how POSIX helps here. You can always divide functionality
> in multiple processes and have them communicate in some way. (eg using
> L4 messages)

Obviously. The problem is that if you have a separate driver domain, you
will have the lower-level driver parts running in the driver framework,
while higher-level functionality will work in the normal application
domain, using a different framework. With two different systems in the
stack, matching the driver components at both sides might be quite a
headache. In my proposal OTOH all the drivers at different levels work
within the same framework, allowing for uniform management and thus
avoiding such trouble. Modular drivers become more feasible.

> > One interesting problem is handling dependencies between the
> > drivers: What happens if a driver tries to access some functionality
> > that is not available due to some other driver missing?
[...]
> It's much more difficult than that. Eg the VM generally relies on the
> backingstore being available. So having the drivers for disk or
> network rely on libraries being paged in on demand is obviously a recipe
> for a deadlock.

This actually reveals two omissions: For one, I forgot to mention memory
management altogether. (Though I thought about it.) Second, I didn't
consider the specific problem you mentioned...

The solution is quite easy, though. All we need to do is declare the
relevant parts of drivers involved in swapping non-pageable. (Using the
same interface that is necessary anyway for protecting time-critical
driver parts.)

> It's important to have as little overhead as possible for accessing the
> registers. Some devices (eg UART) are latency sensitive.

As I've mentioned, it's possible to implement shortcuts when it turns
out necessary. However, I'm not even sure it's worth implementing a
generic interface for that -- my guess is that there are only very few
cases (if any) where we really get into serious trouble. For most, we
can probably just ignore it, or add shortcuts as an optimization later
on.

In any case, knowing that we have a way to handle this if it really
turns out necessary seems sufficient for now; no need to give it too
much consideration in advance.

> > There are basically two kinds of DMA: The old ISA DMA uses a central
> > DMA controller in the chipset, with a number of DMA channels
> > statically assigned to individual devices. This one is easy to
> > handle: Simply use a driver for the DMA controller, which handles
> > requests from the drivers of devices using DMA. To make sure a
> > client actually has the permission to read/write the requested
> > memory region, it is required to map that region to the DMA driver.
> > 
> 
> Some non-PC architectures have separate DMA controllers which can
> autonomously do memory - memory, memory - bus and bus - memory
> transactions. They are very useful as DMA controllers can use the
> busses more efficiently as they tend to have better capabilities in
> generating burst transactions. Also the CPUs can continue processing
> while the DMA is in progress. I believe it's sometimes more efficient
> to have the DMA controller on the host side than on the PCI side, as
> read transactions over a PCI bus are rather slow due to the bus
> turnaround times etc. So in practice we need to support both
> efficiently.

I'm not sure I understand this section properly. Is this to say that not
only ISA, but also some PCI systems use central DMA controllers?

Supporting them shouldn't be a problem; we can use a DMA driver just
like for ISA DMA.

> > The other kind of DMA is more problematic: Modern systems (PCI) use
> > a builtin DMA facility in the individual device, allowing the device
> > to access RAM completely on its own. This means however that only
> > the specific device driver knows how to setup DMA -- there is no way
> > for the parent driver to prevent the child from doing something
> > harmful, by writing wrong values into the device's address
> > registers.
> > 
> > While this efficiently prevents really optimal robustness, the
> > robustness can be improved by a few orders of magnitude using
> > modular drivers, as explained above: Use an extra process
> > (sub-driver) that is responsible exclusively for setting up DMA for
> > the specific device. (Of course checking for a valid memory mapping
> > from the requesting process beforehand.)
> > 
> 
> The driver process needs to know the physical address of the pages
> involved. There is no way around that in systems without an IO-MMU.

AIUI, all memory mapping operations on L4 use physical addresses, so
this shouldn't be a problem, I believe...

> > I'm not sure how IRQs should be handled. If possible, I'd go for a
> > solution using a central IRQ driver receiving all interrupts (L4
> > allows setting a receiver thread for each interrupt slot, which in
> > trun gets the interrupts through special IPCs), and passing them on
> > to the individual drivers through IRQ ports exported as a
> > filesystem. Connecting to those ports would be managed by the bus
> > drivers.
> > 
> > There are two problems with this approach: For one, the interrupt
> > driver doing an RPC to the actual driver on each interrupt
> > introduces an overhead, which can be considerable in some extreme
> > cases. (Fast serial ports and gigabit Ethernet can generate up to
> > several hundred thousand interrupts per second.)
> > 
> > Also, I'm not sure whether a central IRQ driver could handle PCI
> > shared IRQs. (Did I mention already that these should be outlawed?
> > ;-) )
> > 
> 
> The solution here is to have the driver directly listen for L4
> interrupts. For shared IRQs the drivers have to cooperate. Basically a
> driver forwards the IRQ message to the next driver unless it's the
> last driver in the chain.

I'm aware that this is a possible approach. (After all, I've read your
fabrica proposal :-) ) I only wonder: Do we *have* to do it that way, or
could it be possible to handle shared IRQs in a central IRQ server
somehow? I do not like the idea of chaining drivers very much, as it
makes drivers depend on other, unrelated drivers behaving correctly --
which is what I'm generally trying to avoid at all costs, even at the
expense of some efficiency. (IMHO, robustness is more important than
performance.)

Thanks again for your comments. Sorry the reply took so long; I was
too busy to incorporate all your feedback into the proposal and to
write up this reply sooner :-( Of course, I'm still very much interested
in any feedback...

Here is the updated proposal: (Some changes and several additions in the
"How?" section)

  POSIX Level Drivers
>=====================<

_What?_

In "traditional" monolithical kernels (Linux, BSD), hardware drivers (like many
other things) reside in the kernel. This is simple, but there are a couple of
drawbacks: Most notably a single malfunctioning driver running with kernel
privileges, can do any amount of harm to the whole system. (Data loss, crashes,
security problems.)

Thus, it is desirable to have the drivers in userspace instead, with strict
protection domains (separate address spaces etc.) -- a driver shouldn't have
access to anything it doesn't need for its operation. While the
first-generation microkernel Mach still kept the drivers in kernelspace, the
second-generation L4 allows (actually, forces) us to have the drivers in
userspace.

However, userspace is a very broad term in the case of a microkernel based
system: It ranges from the most basic system services up to application
programs. Technically (from the processor's perspective), this is all the same:
Either code runs in full privileged mode (kernelspace), or in limited privilege
mode (userspace). No other distinction. However, to users and programmers,
there is a considerable difference between the level of the basic system
services, and that of application programs.

In Hurd/L4, there are some core servers directly above the kernel, providing
what is absolutely necessary for the system to run; and some more, providing
(with the help of the C library) a more abstract and convenient, POSIX-like
environment for applications to run in.

,------.   ,------.  ,------.
|      |   |      |  |      |
`------'   `------'  `------'
,------.        ,------. application programs,
|      |        |      | filesystems etc.
`------'        `------'
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ POSIX level
     ,----. ,----. ,----.
     |    | |    | |    |
     `----' `----' `----' system servers
 ,----. ,----. ,----. ,----.
 |    | |    | |    | |    |
 `----' `----' `----' `----'      userspace
===========================================
        ,----------.            kernelspace
        | µ-kernel |
        `----------'

Programs running at that level can directly use all the features available in a
traditional UNIX environment: Users and groups, access permissions, resource
management, program execution, processes, signals, threading, the filesystem,
streams, controlling terminals, access to standard utilities and the shell; and
of course the C library, to make use of all of these features from within a
program.

The POSIX standard(s) define all of these features. Thus, we often talk of the
"POSIX environment" for convenience. It doesn't really refer to the standard;
it just describes the set of features available in a typical UNIX environment.
Throughout this document, the term "POSIX level" is used as a synonym for the
UNIX-like (POSIX compatible) environment provided by the Hurd. (Including also
the Hurd-specific extensions, like translators, generic RPCs, or direct access
to native facilities like the auth server, for example, which allows
implementing alternative security schemes.)

It is roughly the level of functionality which is offered by a monolithic
kernel (plus C library) -- POSIX level can be considered what user space is in
a traditional UNIX system.

Note that even on such a monolithical system, various higher-level system
services run in userspace, using POSIX facilities. (Think of all the essential
daemons always running on a UNIX system.) On the Hurd, also all file systems
run at POSIX level, as translators. (Monolithic systems traditionally have
those in the kernel, but some of them recently added an option of running file
systems in userspace too, e.g. via FUSE in Linux.)

So the Hurd system servers running *below* the POSIX level, providing the
facilities for those programs running above it, are roughly speaking what was
moved out of the kernel in the transition from a monolithic to a
microkernel-based system.

Now when moving drivers to userspace, where should we put them? An obvious
place would be somewhere among the system servers, like the other stuff moved
out of the kernel. After all, they are very low-level and essential to the
system.

Well, are they really? How about putting even low-level hardware drivers at the
POSIX level, among filesystems and applications, running them as (almost)
ordinary translators?

_Why?_

This proposal might sound crazy at first. But not only am I pretty certain it
can be done; I'm also very, very certain this is something we really want to
have.

Still, this is the hardest part to explain... It's easy to have visions; to see
wonderful things in one's mind. It's not easy to write them down such that
others can get at least a glimpse of the vision... Anyway, here's me trying:

For one, with drivers at POSIX level, there is no need for a sophisticated,
full-featured special driver framework: Library functions, loading and
unloading drivers, communication amongst drivers and to the outside world,
configuration management/parameter passing, resource management -- it's all
already there, using the ordinary POSIX/Hurd interfaces. All we need are a few
extensions to the standard POSIX/Hurd mechanisms, to allow for driver-specific
stuff like IRQ handling and I/O port access.

Having no extra framework for drivers also means driver development becomes
much easier: No need to cope with some limited environment. No need to learn
special APIs for a driver-specific library; driver registering/startup and
shutdown; memory management; threading and locking; configuration management;
communication to other drivers and the outside world; permission handling.
Drivers are written just like any other translator, using all your normal
programming experience. All one has to know are a few fairly simple extensions
to the ordinary POSIX/Hurd interfaces. The only driver-specific APIs are
access to lower-level drivers via their filesystem interfaces, and exporting
the driver's own filesystem to the world.
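
To give an idea of the amount of boilerplate involved, here is the canonical
skeleton of a minimal translator, as it looks with the libtrivfs of the
current Hurd on Mach. (The Hurd/L4 libraries aren't settled yet, so take this
only as a stand-in sketch; a driver would serve its device nodes from code of
roughly this shape.)

    #include <hurd/trivfs.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <stdlib.h>
    #include <error.h>

    int trivfs_fstype = FSTYPE_MISC;
    int trivfs_fsid = 0;
    int trivfs_support_read = 0;
    int trivfs_support_write = 0;
    int trivfs_support_exec = 0;
    int trivfs_allow_open = O_READ | O_WRITE;

    void
    trivfs_modify_stat (struct trivfs_protid *cred, struct stat *st)
    {
    }

    error_t
    trivfs_goaway (struct trivfs_control *cntl, int flags)
    {
      exit (0);
    }

    int
    main (void)
    {
      error_t err;
      mach_port_t bootstrap;
      struct trivfs_control *fsys;

      /* A translator gets a bootstrap port from its parent filesystem.  */
      task_get_bootstrap_port (mach_task_self (), &bootstrap);
      if (bootstrap == MACH_PORT_NULL)
        error (1, 0, "must be started as a translator");

      /* Reply to the parent and attach to the node we are set on.  */
      err = trivfs_startup (bootstrap, 0, 0, 0, 0, 0, &fsys);
      if (err)
        error (2, err, "trivfs_startup");

      /* Serve RPCs (read, write, fsysopts, ...) until asked to go away.  */
      ports_manage_port_operations_one_thread (fsys->pi.bucket,
                                               trivfs_demuxer, 0);
      return 0;
    }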

To the user, having no separate driver domain means there is no longer any
need for magic incantations to manage drivers that are accessible only
indirectly through special interfaces -- as drivers are now ordinary programs,
all the standard system tools can be used instead. Starting a driver is only a
matter of setting a translator, for example. Parameters can be changed at
runtime using fsysopts.

Furthermore, drivers being ordinary programs removes the need for any
(imperfect) magic making drivers more accessible by giving them some semantics
of ordinary processes -- they just *are* ordinary processes, with all the nice
things that come with that. Perfect transparency.

Transparency is also achieved by the fact that, in a hurdish manner, all
connection/communication between the drivers happens through the filesystem.

There is one more advantage of drivers being ordinary programs, contributing to
the ease of driver development: All the ordinary debugging tools, like GDB or
rpctrace, can be used in the usual manner. The fact that we use standard
filesystem operations for all the interfaces of a driver, also considerably
helps debugging.

Another problem solved by drivers residing in the normal application space: In
the "traditional" monolithic approach, there is always the dilemma whether some
functionality should be included in the kernel together with the low-level
drivers, or pushed to userspace. There is often no obvious separation line in
the functionality; but some division needs to be made due to the strict
technical border between kernel and userspace.

Putting the drivers in a special driver realm in userspace doesn't lift that
dilemma: There is still a strict separation line between the drivers and the
actual application realm. The solution is making some provisions that allow
putting everything, even the lowest-level drivers, into the application realm
-- drivers at POSIX level.

With hardware drivers, higher-level driver layers, and the actual applications
all in a single uniform framework, we get perfect consistency. Hardware
autodetection and configuration for example -- from the low-level drivers up to
application program modules -- is possible without any anomalies caused by
having to work at several different layers to manage a single piece of
hardware. If some application needs access at a lower level than usual, there
is no need for creating special interfaces circumventing the higher level
driver parts. Just plug in at the desired level in the hierarchy -- since the
driver components are not special in any way, every program can take over
their function.

It gives us unprecedented simplicity and flexibility in combining the various
components in the system; in managing configuration, ranging from fully manual
setup, over simple config files and custom shell scripts, to sophisticated
fully-automated configuration managers, or any combination thereof, uniformly
through the whole driver/application stack.

Hot-plugging isn't special anymore: From a fully dynamic system, with drivers
always being launched explicitly (automatically on system startup and hotplug
events, or manually by the user), to a totally static system, where drivers
are set up only once and remain there (using passive translators), everything
is possible. You can handle non-removable devices dynamically, to automatically
adapt to changed system configuration, or you can set up hot-pluggable devices
statically, like a mouse that is always connected to the same USB port for
example. You can even combine them in kind of an inverse manner, e.g. handle
connecting to a docking station by dynamically inserting a tree that has
drivers for all the devices in the docking station set up statically.

Another thing that becomes trivial is starting drivers on demand: Just set a
passive translator, and the system will take care of the rest.

The drivers themselves can also be designed very flexibly: Maybe using a
simplistic approach with a single process handling everything. Or separate the
critical lowest-level parts (register access) from the more complex but
uncritical higher-level parts into two separate components (like KGI/GGI does).
Or even use several processes at various levels to handle all the bits: All can
be done easily and without negative consequences. No problems like having to
match components at both sides of some driver realm vs. user realm border.

There are some more advantages from drivers being ordinary programs: Not only
is access to each device decided by standard UNIX file permissions; who is
allowed to install drivers for a particular device can be managed the same
way -- just change the permissions on the underlying node.

Another very useful feature is that process accounting applies just as it
does to every other process on the system. How much memory and CPU time a
driver can get
compared to other drivers and normal programs, depends solely on the priority
it is given. In a more sophisticated resource management system, drivers can
get resources on account of the programs that access them -- thus the sound
card driver for example can get a high priority if the sound recording
application has a high priority and generously donates resources to the driver,
making sure the recording won't be disrupted by lower priority processes.
(Quality of Service)

Summing up, we get a *much* simpler framework; *lots* more convenience,
flexibility, transparency etc. for users/admins/system builders; and *lots*
more convenience and a considerably lower entry barrier for driver developers.

The only disadvantages I can see are more dependencies, and a slight overhead
here and there. (Due to using filesystem semantics instead of free-form RPCs,
for example.)

_How?_

While I claimed earlier that I'm pretty sure what I'm proposing here is
possible, this doesn't mean I've worked it out in every detail yet :-) I tried
to think through all issues as well as I can; but having no experience with
driver development (I only got dragged into it by my KGI work, and by the idea
of POSIX level drivers appealing to me mostly from a user's point of view), I'm
often at a loss here. Thus any comments, suggestions, and corrections on this
section will be especially appreciated.

Also note that this proposal builds on the original proposal for a driver
framework using deva/fabrica by Peter de Schrijver and Daniel Wagner. While I'm
proposing some radical departures (drivers running at POSIX level making full
use of standard Hurd mechanisms, instead of running in a special driver domain
managed by deva and using many private facilities/interfaces), there are also
many things that can be taken directly from the original proposal. Wherever
something isn't explicitly mentioned here, it can be considered to refer to
the original deva/fabrica proposal.

   _Overview_

As mentioned at the outset, the fundamental idea is to run hardware drivers as
more or less ordinary translators. They can use all the POSIX (UNIX-like)
mechanisms any other program can use. Like every program running on the Hurd,
of course they can also use the Hurd extensions to the POSIX facilities, or
access the more generic underlying Hurd interfaces directly.

Various drivers (translators) at different levels are combined to form a driver
hierarchy: Root bus driver, intermediate bus drivers (possibly nested, e.g. for
a USB controller connected to the PCI bus), leaf device drivers, and possibly
higher-level drivers (e.g. a sound driver accessing the specific sound card
drivers).

The drivers at different levels communicate exclusively by file I/O (i.e. RPCs)
on filesystem nodes exported by the lower-level drivers.

The translators are only more or less ordinary, because there are obviously
things that are special about hardware drivers, requiring some additional
facilities (usually extensions of the standard mechanisms) not necessary for
other programs/translators. However, these facilities will be generic: A driver
can use some of them as needed, while other translators (non-drivers) -- and
probably even some higher-level hardware drivers -- won't use them at all.
Technically, there is no strict distinction between drivers and other
translators.

The drivers also do not get any special permissions from the system that other
programs do not have; all they need for their work is exported by the
lower-level drivers, using the standard filesystem mechanisms.

The following sections will discuss various topics relevant for hardware
drivers, and how they could be handled. (Sometimes requiring additional
facilities, often making use of the existing standard mechanisms.)

   _Dependencies_

One interesting problem is handling dependencies between the drivers: What
happens if a driver tries to access some functionality that is not available
due to some other driver missing?

Note that this problem is not really specific to POSIX level drivers; it's only
somewhat more tricky, because the use of libc can make such dependencies less
obvious. Nonetheless, after considering it for a while, it doesn't seem to be
such a big problem after all: If we call a libc function that relies on some
specific device, it will just generate an error, like it does in many other
situations. The calling driver has to decide whether a failure in this call is
non-critical and can be ignored, or whether it's better to bail out. Nothing
special about it. The user (or driver manager) has to fix the order in which
drivers are loaded if such a problem occurs. I don't think there is anything
the system
can or needs to do about it.
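
As a sketch of the pattern (the node name is made up): a driver that can live
without some lower-level service just treats the failed open like any other
error.

    #include <fcntl.h>
    #include <errno.h>
    #include <error.h>

    static int
    open_mixer (void)
    {
      /* Hypothetical node exported by another driver we may depend on.  */
      int fd = open ("/dev/sound/mixer", O_RDWR);
      if (fd < 0)
        {
          if (errno == ENOENT || errno == ENODEV)
            return -1;              /* non-critical: continue without it */
          error (1, errno, "mixer");  /* anything else: bail out */
        }
      return fd;
    }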

A special case of dependencies is drivers referencing themselves. A console
driver for example obviously shouldn't try to print an error message on the
screen. Note that such self-reference loops could go over several drivers, so
they are not always obvious. (The error could happen in some lower-level driver
the console driver depends on, for example.) Still, this can be easily fixed:
The driver just needs to set some kind of lock preventing it from reentering
itself. Again, an issue to be handled by the individual drivers.
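
A sketch of such a reentrancy guard (all names invented; a multithreaded
driver would use a per-thread flag instead):

    #include <unistd.h>
    #include <string.h>

    static int in_driver;   /* set while serving one of our own requests */

    /* Called around every request handler of the console driver.  */
    static void request_enter (void) { in_driver = 1; }
    static void request_leave (void) { in_driver = 0; }

    static void
    log_message (const char *msg)
    {
      /* While serving a request, printing could come back to this very
         driver (possibly via several lower-level drivers) and loop
         forever, so we drop -- or queue -- the message instead.  */
      if (in_driver)
        return;
      write (2, msg, strlen (msg));
    }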

There is another interesting dependency case: The pager relies on the backing
store (usually hard disk) being available -- you can't swap out the parts of
the drivers necessary for doing the swapping...

As some time-critical driver parts need to be protected from paging anyway,
all we really need to do is extend the protection for the drivers involved in
swapping to cover all relevant functionality. (Not only the time-critical
parts, as would normally be the case.) Whether a driver is needed for swapping
depends on the actual setup, so this should be passed using a command line
switch, telling the driver whether it needs to protect itself from being
swapped out or not.
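
With the existing POSIX interfaces, the whole-driver variant of this could be
as simple as the following ("--wire-all" being a made-up switch):

    #include <string.h>
    #include <errno.h>
    #include <error.h>
    #include <sys/mman.h>

    int
    main (int argc, char **argv)
    {
      /* Passed when this driver instance is part of the paging path,
         e.g. the disk driver backing the swap partition.  */
      if (argc > 1 && !strcmp (argv[1], "--wire-all")
          && mlockall (MCL_CURRENT | MCL_FUTURE) < 0)
        error (1, errno, "cannot wire down driver memory");
      /* ... normal driver startup ... */
      return 0;
    }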

   _Bootstrap_

Another interesting point is loading the initial set of drivers, until we have
enough to fetch further drivers from disk. This too is an issue that is not
really specific to this proposal. It's only a greater problem here, because
there are more dependencies to fulfill before the framework becomes functional.

The usual way to handle this (on *every* system using dynamically loaded
drivers) is employing a ramdisk. It would need to contain at least the core
system servers, the necessary drivers, libc, ld, and a section of /dev (with
the necessary drivers as passive translators).

If we want to employ autodetection, or some other dynamic setup scheme instead
of passive translators at this stage already, more drivers are necessary, as
well as some program doing the detection and loading the appropriate drivers,
and possibly additional tools necessary to run the program (e.g. a shell).

Setting up the drivers is done by a special task that is started by wortel,
after all the other core system servers and the ramdisk are up. It does
whatever is needed to set up all drivers necessary for the rest of the bootup
process, until the real root partition can be accessed. (Root bus driver, some
more bus drivers and other core drivers, and typically a disk driver.)

Once the necessary drivers are loaded, the boot process can continue. (How this
happens is outside of the scope of the driver framework.) At some point after
the root partition is mounted, the initial driver tree is moved off the ramdisk
to the real root partition somehow. Additional drivers can then be loaded by
whatever method is used, through boot scripts and after the system is up. (This
is also out of scope; some suggestions can be found in the "Why?" section.)

An alternative approach could be using special minimal drivers for the bootup,
until we can replace them with the proper ones. The idea being that the boot
drivers can be considerably simpler, as they need to support only very
rudimentary functionality: only read operations, no multiuser capabilities
(resource management/accounting, access permissions), no speed optimization;
and most importantly, they can use some much simpler framework. They could
even be statically linked into a single program.

The downside is of course that there would be a (slight) duplication of effort
between boot drivers and real drivers. It could also lead to an unnerving
situation where some device is supported by the boot drivers but not the real
ones, or the other way round. So this is a somewhat suboptimal solution.

Well -- unless we hack GRUB to allow us to use its drivers. As GRUB needs
exactly the same drivers anyway (or we couldn't boot the machine at all), why
not simply reuse them for our bootstrapping purposes?

There is one more issue with bootstrapping: A set of capabilities for handling
system resources (IRQs, memory regions for I/O and standard I/O ports) needs to
be distributed to the drivers managing those resources. The capabilities should
be passed to the driver setup task from wortel, and need to be forwarded to the
appropriate drivers, probably using a special RPC.

This RPC could either be done immediately (after loading the drivers, or
loading them automatically if they are set up as passive translators), or upon
request from the drivers. In the second case, we explicitly need to make sure
we are talking to the translator attached to the node where the driver is
expected to be. (Otherwise, someone else might sneak in and ask for the
critical capability!)

   _I/O_

The fundamental distinction between hardware drivers and normal programs is of
course that drivers have to access the actual hardware devices. This means
access
to regions in the I/O address space, as well as special regions in the normal
address space (for memory mapped I/O).

It is important to understand that drivers do *not* get any special privileges
for hardware access that are not available to other programs. All hardware
access
is managed by standard Hurd mechanisms.

The *only* driver that actually gets special access privileges is the root bus
driver, which is the primary manager for all relevant memory and I/O regions.
The child bus drivers can access the subregions relevant to them through file
I/O operations on nodes of the filesystem exported by the root bus driver, and
in turn export subregions to their children. (Further bus drivers or leaf
device drivers.)

Whenever possible, a driver should attempt to export only those regions that
are really used by the device represented by the respective node. This is not
always possible: In some cases the parent driver has no way to know what
regions the device occupies without knowing the specific device. (E.g. for
non-PnP ISA devices.) In such a case the parent will have to offer a special
interface allowing the child driver to register for regions, which will then be
exported. It just has to trust the child driver not to request wrong regions.

For faster access to memory, the child can also request the region to be mapped
to it directly (instead of having to use file I/O operations for each and every
access), by means of the standard mmap RPC. If the parent doesn't have direct
access to that region yet itself, it will request a map from its own parent,
all the way up to the root bus server.
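
From the child's point of view, this is just ordinary file I/O plus mmap; a
sketch, with an invented node layout:

    #include <fcntl.h>
    #include <stdint.h>
    #include <errno.h>
    #include <error.h>
    #include <sys/mman.h>

    int
    main (void)
    {
      /* Hypothetical node the PCI bus driver exports for one memory
         region of the device.  */
      int fd = open ("/servers/bus/pci/1.5.0/mem0", O_RDWR);
      if (fd < 0)
        error (1, errno, "device region");

      /* Map the registers directly, instead of one RPC per access.  */
      volatile uint32_t *regs = mmap (0, 4096, PROT_READ | PROT_WRITE,
                                      MAP_SHARED, fd, 0);
      if (regs == MAP_FAILED)
        error (1, errno, "mmap");

      regs[0] = 1;              /* poke a (made-up) control register */
      return 0;
    }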

Direct access to memory regions can usually be safely granted, unless some
other device's memory region falls within the same memory page. (This
shouldn't ever happen, considering that the memory address space is big enough
to allow for a wasted page fragment for padding.)

I'm not sure how mapping I/O regions should be handled. My hope is that all
performance-relevant buffers will always allow memory mapped I/O, so the default
RPC method is fast enough for the remaining I/O registers, and we need not
bother about mapping at all.

Note that I/O registers are much more critical anyway: The I/O space on x86
was originally severely constrained, so it is standard for many devices to
share a single 4k page of I/O space. I'm not sure this can be safely handled
in a simple manner.

On a different note, even with access restricted to memory and I/O regions that
are really used by the device in question, we are not always on the safe side:
DMA for example can cause the device to read/write wrong memory regions, if
some bogus address is stored in the DMA setup registers. Just like with
regions requested by the child, we can only trust the child driver not to do
something harmful.

Of course, only a privileged user will be allowed to load such trusted drivers.
(This has to be ensured by appropriate file permissions on the relevant nodes.)

Also, those cases are candidates for modular drivers, using an extra process
for the dangerous low-level stuff, or, even better, several processes for
individual
parts of the low-level functionality. (E.g. a micro-driver only for the DMA
setup.)

   _DMA_

There are basically two kinds of DMA: The old ISA DMA uses a central DMA
controller in the chipset, with a number of DMA channels statically assigned to
individual devices. This one is easy to handle: Simply use a driver for the DMA
controller, which handles requests from the drivers of devices using DMA. To
make sure a client actually has the permission to read/write the requested
memory region, it is required to map that region to the DMA driver.
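
For illustration, the client side of such a DMA driver could look roughly like
this -- every name and the request format are invented, and in this variant
the buffer is simply obtained from the DMA driver itself, which makes the
permission check trivial:

    #include <fcntl.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/mman.h>

    #define BUFSIZE 4096

    /* Invented request format understood by the DMA channel node.  */
    struct dma_request
    {
      uint32_t offset, length;
      int to_device;
    };

    int
    start_transfer (void)
    {
      int chan = open ("/servers/dma/2", O_RDWR);   /* hypothetical node */
      if (chan < 0)
        return -1;

      /* The DMA driver hands out memory it knows to be wired and
         suitable for the controller.  */
      void *buf = mmap (0, BUFSIZE, PROT_READ | PROT_WRITE,
                        MAP_SHARED, chan, 0);
      if (buf == MAP_FAILED)
        return -1;

      /* ... fill BUF ..., then ask for the transfer: */
      struct dma_request req = { 0, BUFSIZE, 1 };
      return write (chan, &req, sizeof req) == sizeof req ? 0 : -1;
    }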

The other kind of DMA is more problematic: Modern systems (PCI) use a built-in
DMA facility in the individual device, allowing the device to access RAM
completely on its own. This means however that only the specific device driver
knows how to set up DMA -- there is no way for the parent driver to prevent the
child from doing something harmful, by writing wrong values into the device's
address registers.

While this effectively precludes truly optimal robustness, the robustness can
be improved by a few orders of magnitude using modular drivers, as explained
above: Use an extra process (sub-driver) that is responsible exclusively for
setting up DMA for the specific device. (Of course checking for a valid memory
mapping from the requesting process beforehand.)

   _IRQ_

I'm not sure how IRQs should be handled. If possible, I'd go for a solution
using a central IRQ driver receiving all interrupts (L4 allows setting a
receiver thread for each interrupt slot, which in turn gets the interrupts
through special IPCs), and passing them on to the individual drivers through
IRQ ports exported as a filesystem. Connecting to those ports would be managed
by the bus drivers.
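
With such an IRQ filesystem, a device driver's interrupt loop could be as
plain as this (the node name is invented):

    #include <fcntl.h>
    #include <unistd.h>
    #include <errno.h>
    #include <error.h>

    int
    main (void)
    {
      /* Hypothetical IRQ port node; the right to open it would be
         granted through the bus driver.  */
      int irq = open ("/servers/irq/14", O_RDONLY);
      if (irq < 0)
        error (1, errno, "IRQ port");

      char token;
      /* Each successful read is one delivered interrupt.  */
      while (read (irq, &token, 1) == 1)
        {
          /* ... service the device and acknowledge the interrupt ... */
        }
      return 0;
    }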

There are two problems with this approach: For one, the interrupt driver doing
an RPC to the actual driver on each interrupt introduces an overhead, which can
be considerable in some extreme cases. (Fast serial ports and gigabit Ethernet
can generate up to several hundred thousand interrupts per second.)

Also, I'm not sure whether a central IRQ driver could handle PCI shared IRQs.
(Did I mention already that these should be outlawed? ;-) )

   _Connecting_

Connecting the individual drivers to form a driver hierarchy is simple, as it
all happens through the filesystem, setting up translators referencing each
other as necessary.

There are two approaches to the structure of the driver setup however. The
simpler one is to put all translators directly in /dev, and let them only
indirectly reference each other. (The locations of the parent drivers need to
be passed on the command line.)

A more elegant and intuitive approach is to organize the translators
themselves as a hierarchy: The root bus server exports a couple of nodes, on
which some core drivers are set. One of them is the PCI bus driver, which in
turn exports one node for each device attached. On each of those device nodes,
a driver for the respective device is set. Some of these are leaf devices,
while others are further bus drivers, like IDE or USB. And so forth.

One problem with this approach is that, if we want to set up the drivers
statically using passive translators, we need some method to permanently store
a hierarchy of passive translators, e.g. using some special translatorfs.
(Note that this is important in other situations as well, so most likely we
will get something to handle this sooner or later anyway.)

In reality, we do not have a perfect tree structure anyway. Some driver might
depend on several lower-level drivers, for example on the bus driver and the
DMA driver. In this case we need to decide which is the major parent on which
to set the driver, and which one will only be supplied through a command line
option. This already suggests that in practice, we will have some combination
of the possible approaches mentioned above.

   _Permissions_

As stated before, hardware drivers are not distinguished from other
applications (translators) by any special hardware access permissions -- with
the single exception of the special drivers at the core of the driver system.
(Root bus driver for memory and I/O access, IRQ driver for interrupt handling.)

All hardware access by regular device drivers is done using filesystem
operations (RPCs) on nodes exported by the lower level drivers. (Bus drivers,
IRQ driver.) In principle, any program could access those nodes.

So how can we prevent unauthorized hardware access, if any program run by any
user could access the hardware nodes? Simple: By restricting file access
permissions on these nodes. So while anybody could run a program (possibly
self-created) that tries to do hardware access, the program won't be able to
actually access any hardware unless the user who started it has the necessary
access permissions on the required hardware node.

Who is effectively allowed to access the nodes, depends on the policy of the
system vendor and/or administrator: Critical nodes, which allow breaking system
integrity if used improperly, will usually be only accessible to a privileged
user. (Root only, or optionally some special system user.) In a typical
workstation setup, some nodes might be accessible to the user logged in at the
system's primary console. (As that is the one who can physically use the
devices.) Devices that can be safely shared by higher-level drivers, can even
be accessible to everyone.

UNIX file access permissions also allow more sophisticated setups: The
administrator might choose for example to allow ordinary users to load drivers
accessing even some critical nodes, if the drivers themselves are only from a
trusted set. This can be achieved by using suid or sgid on the trusted drivers,
so users can run them even if they have no direct access permissions to the
relevant hardware nodes.

   _Small Spaces_

Small spaces are an optimization in L4 on x86, using the segmentation features
to map a task's address space into all other tasks' address spaces. That allows
switching to and between the tasks in small spaces at any time without doing a
full context switch. (Otherwise a context switch is quite expensive on x86.)
Thus drivers, which are called often due to frequent hardware interrupts or
requests from the users, do not cause a considerable penalty.

To allow for small spaces to be used, we need to make sure that a task's
address space is compact (no big holes), so it actually fits in a small space.
This isn't driver-specific, though -- in principle, *any* task that is small
enough is eligible.

   _Locking_

I don't really know what kinds of locking issues can be involved with hardware
drivers. I think there should be nothing requiring special handling.

Whenever some resource needs to be used by more than one client, it should be
handled by an extra driver, which makes sure that access is properly
encapsulated. Priority to access the resource has to be handled by the
accounting mechanisms.

If we only need to make sure some operation happens quickly enough, no real
locking should be necessary. Instead, temporarily raise the priority, again
using accounting mechanisms.

   _Hotplugging_

As already mentioned, hotplugging isn't really any special according to this
proposal. If a bus driver detects a new device, it will simply create a new
device node in the exported filesystem. The user could now set a driver on that
node manually, or some hotplugging daemon (shell script or C program or
whatever) can listen for file/dir change notifications, and perform the
necessary actions to load the correct driver and/or set appropriate access
permission, once a new node appears.
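
A toy hotplug daemon along these lines, with an invented bus directory, and
polling standing in for proper file/dir change notifications:

    #include <dirent.h>
    #include <limits.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define MAXDEV 64

    int
    main (void)
    {
      const char *bus = "/servers/bus/usb";    /* hypothetical */
      char seen[MAXDEV][NAME_MAX + 1];
      int nseen = 0;

      for (;;)
        {
          DIR *dir = opendir (bus);
          struct dirent *d;
          while (dir && (d = readdir (dir)))
            {
              int i, new = d->d_name[0] != '.';
              for (i = 0; new && i < nseen; i++)
                if (!strcmp (seen[i], d->d_name))
                  new = 0;
              if (new && nseen < MAXDEV)
                {
                  strcpy (seen[nseen++], d->d_name);
                  /* Here we would pick a driver and set it on the new
                     node, and/or adjust access permissions.  */
                  printf ("new device: %s/%s\n", bus, d->d_name);
                }
            }
          if (dir)
            closedir (dir);
          sleep (1);
        }
    }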

Unplugging means the driver node will disappear. The driver should detect this
and exit cleanly. (i.e. drivers need to be prepared for the situation where the
device is removed unexpectedly.) Higher-level drivers will in turn detect the
FS exported by the driver going away. A hotplug manager will also get a
file/dir change notification, and can take appropriate action (like shutting
down services connected to the device) if necessary.

All this means that bus and device drivers allowing hotplugging of course need
some handling for that; but we do not need any special functionality in the
framework itself. (Nor any interface beyond the obvious fact of device nodes
appearing or disappearing.)

   _Power Saving_

When entering power saving states, various drivers need to prepare first. For
suspend to RAM or suspend to disk, all hardware state needs to be saved, so it
can be restored when waking up. Services attached to external interfaces (e.g.
network card) may need to be shut down, as they won't be able to react to input
while suspended.

Shutting down all drivers can be done recursively: The root bus driver is
informed that the system wants to suspend. It notifies all the bus drivers
about it, which subsequently inform the child bus drivers and so on, all the
way down to the leaf device drivers. The interested drivers receive the
notification, do whatever is needed, and then tell the bus drivers that they
are ready. Once all device drivers on a bus have indicated readiness, the bus
driver does its shutdown and subsequently reports to the parent bus driver
that it is ready. Once all busses are finished, the root bus driver initiates
the necessary steps to actually perform the suspend.
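
A sketch of this walk from one bus driver's perspective (all types and calls
invented):

    /* Every name in this sketch is invented for illustration.  */
    struct device { struct device *next; };
    struct bus { struct device *children; };

    /* Notify a child driver and wait until it reports readiness.  */
    extern int device_suspend (struct device *dev);
    extern void bus_save_state (struct bus *bus);
    extern int parent_report_ready (struct bus *bus);

    int
    bus_suspend (struct bus *bus)
    {
      struct device *dev;

      /* Children first: each device driver saves its hardware state
         and confirms before the bus itself shuts down.  */
      for (dev = bus->children; dev; dev = dev->next)
        if (device_suspend (dev) != 0)
          return -1;            /* abort the suspend */

      bus_save_state (bus);
      return parent_report_ready (bus);
    }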

On wakeup, the drivers are recursively notified again. (Only that the bus
drivers do their reinitialisation before notifying their children, and no
confirmation is needed.) If some device has been removed during the suspend,
the device driver will get the notification about that from the parent bus
driver, instead of a wakeup notification. (Again, drivers for hot-pluggable
devices must be prepared for the device being removed at *any* time...)

Shutting down higher-level services is handled the same way, only that instead
of the device driver, the interested application itself and/or some manager
process is listening for the notifications.

   _Logging_

Another interesting issue for drivers is logging: For debugging purposes, the
user will want the drivers to report some diagnostic messages, which will
usually show up on the screen and also be written to a log file on disk.
However, (unless using boot drivers) the messages can be printed to screen only
once the display driver is up, and to disk only once the disk driver and
filesystem are up. Furthermore, in an alternative setup a user might want the
messages reported to a serial console for example, or a network connection, or
whatever.

The power of the POSIX level driver approach shows again here: We need
absolutely *no* special handling for any of the scenarios described above; all
of them and many more can be set up easily and intuitively using standard
facilities.

Basically, the drivers just print all output to the standard output and
standard error streams, like any other program would do. (Optionally,
additional streams might be used for verbose diagnostic messages.) For testing
purposes, the user might run the driver directly, with debugging output printed
to the console.

In typical use, the streams will be redirected however, e.g. to a FIFO. Once
the display driver is up, we could tee the output of that FIFO to the screen
and to another FIFO, which in turn will be forwarded to a log file once the disk
and FS drivers are up. Or instead of such a FIFO setup, we could use a special
log translator, which could handle this automatically. (More elegant but less
transparent and flexible, probably...) In either case, instead of writing to
screen/disk directly, we could forward the messages to a more traditional
syslog daemon. And so on. Again, the number of possibilities is virtually
unlimited :-)

   _Memory Management_

Hardware drivers can have some specific memory requirements: Some code and data
areas need to be safe from swapping, as otherwise the driver might not be able
to react to certain critical events fast enough. Other memory regions not only
are not allowed to be swapped out, but actually need to keep a fixed physical
address, as the hardware relies on that (DMA).

To avoid the need for a special memory manager, we need to extend the standard
memory manager library with an API for requesting such special memory regions.
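
This could amount to no more than a few additional allocation calls;
hypothetical prototypes (none of these names exist yet):

    #include <stddef.h>
    #include <stdint.h>

    /* Allocate LEN bytes that will never be paged out.  */
    void *driver_alloc_wired (size_t len);

    /* Allocate LEN bytes of physically contiguous memory suitable for
       DMA; the fixed physical address is returned in *PHYS.  */
    void *driver_alloc_dma (size_t len, uintptr_t *phys);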

   _Implementation_

From the above considerations, there are only a few extensions over the standard
POSIX/Hurd mechanisms necessary. Most of the driver infrastructure is actually
contained within the core drivers, like the root bus driver, IRQ driver etc.,
as well as the intermediate bus drivers.

What extensions/additions we need are:
- Some interface for requesting non-pageable or wired memory in the default
  memory manager
- The initial driver setup task
- The interfaces for passing the hardware access capabilities
- Handling of RPCs on these capabilities in wortel
- Maybe the interfaces for passing direct access on I/O ports, and other
  possible shortcuts as necessary

That's it. Everything else is in the drivers themselves, and the interfaces
between them.

Of course, the core drivers are special and need to be considered part of the
framework, unlike other drivers that can just be plugged in. Defining the
interfaces for the bus drivers (and maybe some helper functions for handling
them) is also part of the framework. So when implementing the proposal, all of
these have to be considered.

Note that, since it depends heavily on the functionality offered by the Hurd,
the development of the driver framework is inherently interwoven with the
development of the Hurd on L4 itself.

Implementing the proposal should probably start with getting a ramdisk (or
hacking GRUB2 so we can access the real disk), and writing a first take on the
driver setup task and the hardware capability handling. Having this, the root
bus driver and PCI driver can be implemented. (Possibly without direct memory
region mapping at first -- this is an optimization that is not strictly
necessary.) At this point, we can start implementing the first device drivers.
Meanwhile, the IRQ driver and more bus drivers can be created, gradually
extending the range of possible device drivers to write.

All along the way, missing functionality in the Hurd needs to be filled: libc
functions, core server facilities, process accounting.

Note that, while the way outlined is probably the logical route following the
dependencies, different approaches are possible. In principle, we can write
some first preliminary drivers just now: Running the (few) drivers directly
from wortel for now, not implementing the proper bootstrapping procedure using
filesystem and process (translator) startup mechanisms; implementing the bare
RPC interfaces directly, without help of the libc and libnetfs wrappers for the
filesystem abstraction; and passing the RPC ports through the environment,
instead of using filesystem lookups -- all the missing facilities can be
(temporarily) avoided. This way we can get started right away, creating the
badly needed IDE driver for example. It can then be moved to the proper
infrastructure later, once the foundations are in place.

===============================================================================

-antrik-



