bug-hurd

Re: Review of Thomas's >2GB ext2fs proposal


From: Neal H. Walfield
Subject: Re: Review of Thomas's >2GB ext2fs proposal
Date: Tue, 17 Aug 2004 04:37:29 -0400
User-agent: Wanderlust/2.8.1 (Something) SEMI/1.14.3 (Ushinoya) FLIM/1.14.3 (Unebigoryƍmae) APEL/10.6 Emacs/21.2 (i386-debian-linux-gnu) MULE/5.0 (SAKAKI)

At 16 Aug 2004 13:11:26 -0700,
Thomas Bushnell BSG wrote:
> 
> "Neal H. Walfield" <neal@cs.uml.edu> writes:
> 
> > > 4) And finally, what about data caching--which is vastly more
> > >    important than mapping caching?  My version has it that data is
> > >    cached as long as the kernel wants to keep it around, and in a
> > >    fashion decoupled from mappings.
> > 
> > I evict when the kernel evicts.  This keeps the accounting data
> > proportional to the data in core.
> 
> How do you know when the kernel evicts?

If the pages are dirty, then we already get a message from the kernel,
specifically, memory_object_data_return.  In the case of non-dirty
pages, we have to ask the kernel to tell us explicitly via precious
pages.  Because there is some overhead involved, data is only marked
precious if it needs to be.  In my proposal this is decided on a
per-pager basis (although Ogi's code does it on a per-page basis) via a
simple extension to pager_create.
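
To make that concrete, here is a rough sketch of the sort of extension I
mean; the flag name is only illustrative and the eventual interface may
well differ:

    /* Sketch only: pager_create with one additional flag.  When
       NOTIFY_ON_EVICT (an illustrative name, not a settled interface) is
       set, libpager supplies pages to the kernel with the precious bit
       set, so that even clean pages come back via
       memory_object_data_return on eviction instead of being silently
       discarded.  When it is clear, the current, cheaper behavior is
       preserved.  */
    struct pager *
    pager_create (struct user_pager_info *upi,
                  struct port_bucket *bucket,
                  boolean_t may_cache,
                  memory_object_copy_strategy_t copy_strategy,
                  boolean_t notify_on_evict);

A pager which has no use for eviction notifications simply passes zero
for the new argument and pays no extra cost.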

If you didn't have this eviction strategy in mind for draining the
mapping cache, I am curious what you were going to do.  It seems to me
that anything else would be in far greater contention with the
kernel's eviction strategy.  This is my analysis from the email I sent
on the 16th entitled "Review of Thomas's >2GB ext2fs proposal":

    We must also consider what the eviction mechanism will look like.
    Thomas makes no suggestions of which I am aware.  If we just evict
    mappings when the reference count drops to zero, the only mappings
    which will profit from the cache are those for pages being accessed
    concurrently.  Although I have done no experiments to suggest that
    this is never the case, we need only consider a sequential read of
    a file to realize that it is often not the case: a client sends a
    request for X blocks to the file system.  The server replies and
    then, after the client processes the returned blocks, the client
    asks for more.  Clearly, the inode will be consulted again.  This
    scenario would have elided a vm_unmap and vm_map had the mapping
    remained in the cache.  Given this, I see a strong theoretical
    motivation to make cache entries more persistent.
    
    If we make cache entries semi-persistent, a mechanism needs to be
    in place to drain the cache when it is full.  The easiest place to
    trigger the draining is just before a new mapping is created.  But
    how do we find the best region to evict?  The information we are
    able to account for is: when a given region is mapped into the
    address space, the last time the kernel requested pages from a
    region, and when the last references to a region were added.
    Except for the last bit, this information is already in the
    kernel.  One way we could take advantage of this is to use the
    pager_notify_eviction mechanism which Ogi is using and I described
    in a previous email [1].  If the kernel does not have a copy (and
    there are no extant user references), then the page likely makes a
    good eviction candidate.  This data can be augmented with the
    number of recent references in conjunction with a standard clock
    aging algorithm.  But really, that final bit is unnecessary: once
    the kernel has dropped a page, the only way we can get the data
    back is by reading it from disk, which makes an extra vm_unmap and
    vm_map rather cheap by comparison.  Strictly following this offers
    another advantage: the cache data in the file system remains
    proportional to the amount of data cached in the kernel.  This, it
    seems to me, is a good argument to keep the region size equal to
    vm_page_size, as I have in my proposal.
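
To sketch how the two pieces fit together (the cache helpers are
hypothetical, and the notification's exact name and signature here are
assumptions; Ogi's actual interface may differ):

    #include <mach.h>
    #include <hurd/pager.h>

    /* Hypothetical mapping-cache entry for one vm_page_size region.  */
    struct mapping
    {
      vm_offset_t addr;   /* Address at which the region is mapped.  */
      int refs;           /* Outstanding users of the mapping.  */
    };

    /* Hypothetical cache helpers, only for the purposes of the sketch.  */
    extern struct mapping *mapping_cache_lookup (struct user_pager_info *upi,
                                                 vm_offset_t offset);
    extern void mapping_cache_remove (struct user_pager_info *upi,
                                      struct mapping *m);

    /* Called when the kernel reports that it has dropped its copy of the
       page at OFFSET.  Once the kernel has evicted the page, getting the
       data back means a disk read, which dwarfs the cost of a
       vm_map/vm_deallocate pair, so an unreferenced mapping of an evicted
       page is the natural eviction candidate.  Dropping it here keeps the
       file system's cache proportional to what the kernel has in core.  */
    void
    pager_notify_eviction (struct user_pager_info *upi, vm_offset_t offset)
    {
      struct mapping *m = mapping_cache_lookup (upi, offset);
      if (m && m->refs == 0)
        {
          vm_deallocate (mach_task_self (), m->addr, vm_page_size);
          mapping_cache_remove (upi, m);
        }
    }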

Thanks,
Neal



