From: Neal H. Walfield
Subject: Paging Interface
Date: 14 Jun 2002 22:17:20 +0200
User-agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/21.2

The Mach Paging Interface as Used by the Hurd
--- ------------------------------------- - -

Mach uses a single monolithic virtual memory server.  This server is
implemented in the kernel and dictates the paging policy (i.e. when
pages are evicted and by whom).  Where pages come from and where they
go is irrelevant from the kernel's perspective: Mach merely provides
the mechanism in the IPC subsystem.

Establishing a Mapping
--- -------------- - -

The following describes how mappings are currently established under
Mach by the Hurd.  Some of the details have been abridged for the sake
of brevity.

                          /        \
                         |   Mach   |
                     /| /           |\  \
        (C) vm_map  /  / m_o_ready (E)\  \ (D) memory_object_init
                   / |/ (F) return     \  \|
                ________              ________
               /        \   ----->   /        \
              |  Client  | (A) open |  Server  |
               \________/   <-----   \________/
                     (B) memory_object

(A) The client sends an "open" rpc to the server.

(B) The server creates a handle, i.e. a receive right, adds it to the
port set that it is listening on and returns a send right to the client.

(C) The client attempts to map the object into its address space using
the vm_map rpc.  It passes a copy of the handle that the server gave
it to the vm server, i.e. Mach.

(D) Since Mach has never seen the object before, it queues a
memory_object_init on the given port along with a send right (the
memory control port) for the manager to use to send messages to the
kernel and also as an authentication mechanism for future
interactions: the port is supplied so that the manager will be able to
identify which kernel a given memory_object_* rpc came from.

(E) The server dequeues the message, initializes internal data
structures to manage the mapping and then queues a memory_object_ready
rpc on the supplied kernel port.

(F) The kernel sees that the manager is ready, sets up the appropriate
mappings in the client and then replies to the vm_map rpc indicating
success.

There is nothing stopping others from playing "the kernel."  This is
not a security problem: clients must trust the server from whom they
obtain memory object ports and also the servers with whom they share
the object.  Multiple memory managers are a reality that should be
dealt with gracefully: they are useful for network transparent
mappings etc.
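
The handshake in steps (A) through (F) can be sketched as a toy
simulation.  This is not Mach code: the Port, Server and Kernel
classes are hypothetical stand-ins, the server is run synchronously
for simplicity, and the "KERN_SUCCESS" reply is my assumption about
what vm_map returns once the manager is ready.

```python
from collections import deque


class Port:
    """Toy stand-in for a Mach port: a FIFO message queue."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def send(self, msg, **payload):
        self.queue.append((msg, payload))

    def receive(self):
        return self.queue.popleft()


class Server:
    """Memory manager: hands out memory object ports and answers init."""

    def __init__(self):
        self.objects = {}  # memory object port -> control port (once known)

    def open(self):
        # (B) create a handle and return a send right to the client
        memory_object = Port("memory_object")
        self.objects[memory_object] = None
        return memory_object

    def serve_one(self, memory_object):
        # (E) dequeue memory_object_init, remember the control port, reply
        msg, payload = memory_object.receive()
        assert msg == "memory_object_init"
        control = payload["control"]
        self.objects[memory_object] = control
        control.send("memory_object_ready")


class Kernel:
    """Toy vm server: tracks which objects it has already initialized."""

    def __init__(self):
        self.known = {}     # memory object port -> control port
        self.mappings = []  # (client name, memory object port)

    def vm_map(self, client, memory_object, server):
        if memory_object not in self.known:
            # (D) first sighting: queue memory_object_init with a control port
            control = Port("control")
            self.known[memory_object] = control
            memory_object.send("memory_object_init", control=control)
            server.serve_one(memory_object)  # really asynchronous in Mach
            msg, _ = control.receive()
            assert msg == "memory_object_ready"
        # (F) manager is ready: install the mapping and reply to vm_map
        self.mappings.append((client, memory_object))
        return "KERN_SUCCESS"


kernel, server = Kernel(), Server()
handle = server.open()                             # (A) and (B)
result = kernel.vm_map("client", handle, server)   # (C) through (F)
print(result, server.objects[handle] is kernel.known[handle])
```

Note that the server recognizes the kernel only by the control port
it was handed in memory_object_init, which is why anyone holding the
memory object port can play "the kernel."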

Resolving Page Faults
--- ------------- - -

  (G) Client      ________
      resumed    /        \
                |   Mach   |
 (A) Fault +----|------+   |  \ (B) m_o_request  (C) store_read
       ____|___  \_____|__/ |\  \| ________         _________  
      /    +---\-------+       \  /        \       /         \ 
     |  Client  |          (F)   |  Server  |<===>|  storeio  |
      \________/       m_o_supply \________/       \_________/ 
                                      (E) return data  | ^
                                                       | | (D) device_read 
                                                       v |
                                                    / Device \
                                                   |  Driver  |
                                                       | ^
                                                       | |
                                                 /  Hardware  \

(A) The client does a memory access and faults.  The kernel catches
the fault and maps the address to the appropriate manager.  It then
queues memory_object_request on the port that was initially supplied
by the client (that is, the memory object handle).  It also supplies
the control port which the server can use to determine which kernel
sent the message.

(B) The manager dequeues the message.  On the Hurd, this is translated
into a store_read: a function in the libstore library which is used to
transparently manage block devices.  The storeio server starts off as
a separate process; however, if the server has the appropriate
permission, it can contact the backing object directly.  This layer of
indirection is desirable when, for instance, a storeio running as root
wants to permit only read-only access to a resource, yet cannot safely
transfer its handle to the client.  In this case, it would proxy the
requests.

(C) The storeio server contacts, for instance, a device driver to do
the read.  This could also be a network block device (the NBD server
in GNU/Linux), a file, a memory object etc.

(D) The device driver allocates an anonymous page from the default
pager and reads the data into it.  Once the operation is complete, the
driver returns the data to its client, unmapping it from its own
address space at the same time.  In L4 terminology, this is a grant
operation.

(E) The storeio transfers the page to the server.  The page is still
anonymous.

(F) The manager does a memory_object_supply transferring the page to
the kernel.  Only now is the page not considered to be anonymous but

(G) The kernel caches the page, installs it in the client's virtual
address space and, finally, resumes the client.
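
The point of steps (D) through (F) is that the page is moved rather
than copied at every hop.  A minimal sketch of that grant-style
hand-off follows; the Holder class and resolve_fault function are
hypothetical names of mine, and a dictionary stands in for the disk.

```python
class Holder:
    """Anything with an address space that can hold pages."""

    def __init__(self, name):
        self.name = name
        self.pages = {}  # page id -> bytes

    def grant(self, page_id, receiver):
        # Move, not copy: the sender loses the page as the receiver gains it.
        receiver.pages[page_id] = self.pages.pop(page_id)


def resolve_fault(offset, disk):
    driver, storeio = Holder("driver"), Holder("storeio")
    server, kernel = Holder("server"), Holder("kernel")
    trace = []

    # (D) the driver reads the block into a fresh anonymous page
    driver.pages[offset] = disk[offset]

    # (D), (E), (F): hand the page along the chain, one grant per hop
    for sender, receiver in ((driver, storeio), (storeio, server), (server, kernel)):
        sender.grant(offset, receiver)
        trace.append(f"{sender.name}->{receiver.name}")

    # Who still has the page in its address space at the end?
    holders = [h.name for h in (driver, storeio, server, kernel) if offset in h.pages]
    return kernel.pages[offset], trace, holders


data, trace, holders = resolve_fault(0, {0: b"block zero"})
print(trace, holders)
```

After the last hop only the kernel holds the page, which is the
property that lets it cache the page and install it in the client.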

Paging Data Out
--- ------- - -
           Change manager   Pager m_o_return    store_write
    \      _________  (B)  __(A)__   (C)  ________  (D)  _______
  S  |    / Default \     /        \     /        \     /       \ 
  W  |<=>|   Pager   |<=>|   Mach   |==>|  server  |<=>| storeio |<=>
  A  |    \_________/     \________/     \________/     \_______/
  P  |

(A) The paging policy is chosen by the kernel: servers must implement
the mechanism.  The kernel chooses the pages that it wants to evict
using a second-chance FIFO to approximate LRU.

(B) Once the kernel has selected a page that it would like to evict, it
changes the manager from the server to the default pager.  This way,
if the server does not deallocate the page quickly enough, it cannot
cause a denial of service: the kernel will just later double page it
to swap (the default pager is part of the secure computing base).

(C) Mach then transfers the page to the server in the form of a
memory_object_return rpc.  The server is expected to take the page and
send it to the appropriate backing store in a timely fashion.  The
server is not required to send a response to the kernel.

(D) The manager then transfers the data to the storeio which
eventually sends it to disk.  The device driver consumes the memory
doing the equivalent of a vm_deallocate.
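
For readers unfamiliar with the replacement policy mentioned in (A),
here is a minimal sketch of a second-chance FIFO.  It is the textbook
algorithm, not Mach's implementation; the class name is mine, and I
assume for simplicity that a freshly loaded page starts with its
reference bit clear.

```python
from collections import deque


class SecondChanceFifo:
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()   # resident pages, oldest first
        self.referenced = {}   # page -> reference bit

    def access(self, page):
        """Touch a page; return the page evicted to make room, or None."""
        if page in self.referenced:
            self.referenced[page] = True
            return None
        victim = None
        if len(self.queue) >= self.capacity:
            victim = self._evict()
        self.queue.append(page)
        self.referenced[page] = False
        return victim

    def _evict(self):
        while True:
            page = self.queue.popleft()
            if self.referenced[page]:
                # Recently used: clear the bit and give it a second chance.
                self.referenced[page] = False
                self.queue.append(page)
            else:
                del self.referenced[page]
                return page


cache = SecondChanceFifo(3)
evicted = [cache.access(p) for p in ["a", "b", "c", "a", "d", "e"]]
print(evicted)
```

A plain FIFO would evict "a" when "d" arrives.  Here "a" was touched
again, so it is skipped once and "b" goes first.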

Paging Under L4
--- ------- - -

In L4, the kernel provides mechanism and minimal policy.  Even more so
than with Mach, the operating system can happily shoot itself in each
finger individually if it so chooses.

During system boot, the sigma0 server is started.  This server
initially contains all of the memory in the system and acts as the
default pager, at least insofar as it services page faults for the
root task, providing a one-to-one physical to virtual memory mapping.
Memory can also be requested explicitly.  Sigma0 will grant it to
whoever asks; however, once memory is handed out, sigma0 will not map
it a second time.
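
The sigma0 behaviour described above amounts to a hand-out-once
allocator.  A toy model, with a hypothetical Sigma0 class and frame
numbers standing in for physical pages:

```python
class Sigma0:
    """Toy sigma0: owns every physical frame at boot, hands each out once."""

    def __init__(self, frame_count):
        self.free = set(range(frame_count))
        self.owner = {}  # frame -> task that received it

    def request(self, task, frame):
        """Return the frame if it is still unclaimed, else None."""
        if frame not in self.free:
            return None  # already handed out: never mapped a second time
        self.free.remove(frame)
        self.owner[frame] = task
        return frame

    def page_fault(self, task, virtual_page):
        # One-to-one mapping: virtual page N is backed by physical frame N.
        return self.request(task, virtual_page)


sigma0 = Sigma0(4)
first = sigma0.page_fault("root", 2)   # root task faults on page 2
second = sigma0.request("other", 2)    # someone else asks for the same frame
print(first, second)
```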

Each thread in L4 has a pager associated with it.  When it faults, the
pager is sent an IPC by the kernel.  The pager can supply the
indicated page or just ignore the request.  The faulted thread is
suspended in the kernel waiting for an IPC.  To resume the thread,
someone sends it an IPC containing a grant or map (i.e. memory) and
the thread is placed in the run queue.  When it next starts up, it may
either continue running or immediately fault again.
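
The fault protocol just described can be modelled in a few lines.
This is only a sketch: Thread and Pager are hypothetical classes,
method calls stand in for IPC, and the pager either maps the page or
silently ignores the fault.

```python
class Thread:
    def __init__(self, name, pager):
        self.name = name
        self.pager = pager
        self.mapped = set()     # pages currently in the address space
        self.state = "running"

    def touch(self, page):
        """Access a page; on a miss the kernel blocks us and IPCs the pager."""
        if page in self.mapped:
            return True
        self.state = "blocked"  # suspended in the kernel, awaiting an IPC
        self.pager.handle_fault(self, page)
        return self.state == "running" and page in self.mapped

    def receive_mapping(self, page):
        """An IPC carrying a map or grant arrives: become runnable again."""
        self.mapped.add(page)
        self.state = "running"


class Pager:
    def __init__(self, backed_pages):
        self.backed_pages = set(backed_pages)

    def handle_fault(self, thread, page):
        if page in self.backed_pages:
            thread.receive_mapping(page)
        # Otherwise ignore the request: the thread stays blocked.


pager = Pager({1, 2})
thread = Thread("worker", pager)
ok = thread.touch(1)        # pager supplies the page
stuck = thread.touch(7)     # pager ignores the fault
print(ok, stuck, thread.state)
```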

Beyond this, the memory policy is up to the underlying operating
system; L4 provides all of the mechanism that is required to do so.
The important question is, what policy do we want to enforce?  There
are several flaws with the current system and exporting policy to the
servers seems quite natural.  Consider reading _Extending The Mach
External Pager Interface To Accommodate User-Level Page Replacement
Policies_ [1].

At the very minimum, we want to be able to supply the interfaces
described in _The SawMill Framework for Virtual Memory Diversity_ [2].
The Mach interface already does a good job of meeting most of these

One of my ideas is to have a pager thread for each task.  This thread
would manage all of the mappings: both files and anonymous memory;
essentially, all memory objects.  When a mapping is made, the task
tells the pager that a mapping has been established over a given range
and which backing store server provides it.  Then, when any faults
occur in that region, the pager thread can quickly contact the
appropriate server.  There is a bit more hair for task startup;
however, this is just a question of details.  This scheme is similar
to the region mapper that is being used in SawMill (I came up with it
independently so it must be very obvious).
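
To make the per-task pager idea concrete, here is one way the lookup
table could work.  It is a sketch under my own assumptions: the
RegionMapper name, the sorted-list representation and the server
names in the example are all hypothetical, and servers are plain
strings rather than ports or thread ids.

```python
import bisect


class RegionMapper:
    """Per-task table from address ranges to the server backing them."""

    def __init__(self):
        self.starts = []    # sorted region start addresses
        self.regions = []   # parallel list of (start, end, server, offset)

    def add_mapping(self, start, length, server, offset=0):
        end = start + length
        i = bisect.bisect_left(self.starts, start)
        # Reject overlap with the neighbour on either side.
        if i > 0 and self.regions[i - 1][1] > start:
            raise ValueError("overlaps previous region")
        if i < len(self.starts) and self.starts[i] < end:
            raise ValueError("overlaps next region")
        self.starts.insert(i, start)
        self.regions.insert(i, (start, end, server, offset))

    def lookup(self, address):
        """Return (server, offset into the object) or None if unmapped."""
        i = bisect.bisect_right(self.starts, address) - 1
        if i < 0:
            return None
        start, end, server, offset = self.regions[i]
        if address >= end:
            return None
        return server, offset + (address - start)


mapper = RegionMapper()
mapper.add_mapping(0x1000, 0x2000, "ext2fs")
mapper.add_mapping(0x8000, 0x1000, "anon", offset=0x100)
print(mapper.lookup(0x1800), mapper.lookup(0x3000))
```

On a fault, the pager thread would call lookup with the faulting
address and forward the request to whichever server comes back; a
None result means the access was to unmapped memory.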

What is not clear to me, however, is how we should manage physical
memory.  Would it be appropriate to make every server a vm server?
This could be coupled with a single monolithic anonymous memory server
which does system wide paging like Mach.  Perhaps we should have
multiple core servers which compete for physical memory at startup and
then give it out according to some as of yet undetermined

I await your reactions and ideas.


[1] http://citeseer.nj.nec.com/mcnamee90extending.html
[2] http://i30www.ira.uka.de/research/publications/sawmill-framework.pdf
