bug-hurd

Re: removing an ext2fs file forces disk activity


From: Marcus Brinkmann
Subject: Re: removing an ext2fs file forces disk activity
Date: Mon, 25 Mar 2002 22:09:17 +0100
User-agent: Mutt/1.3.27i

On Mon, Mar 25, 2002 at 09:53:41PM +0100, Niels Möller wrote:
> One problem is that if the filesystem modifies block A, then block B,
> and then block A again, then you may need to keep this ordering, and
> not merge it as one write to A and one to B. Is the touch-rm-loop of
> this kind? Then I guess the simple ordering breaks down.

That's basically what we are talking about.  If you only want to delay a
write, you can always do that, but currently we cannot delay a write _and_
keep some ordering, because if you write and don't sync, and then unlock the
node, all information about the write is "forgotten".

If you read Thomas' reply again, he suggests that one way to optimize it is
to record the dependencies, so delayed writes can be ordered appropriately.
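
To make "recording the dependencies" concrete, here is a minimal sketch in
plain C.  It is not libdiskfs code: every name in it (delay_write,
must_precede, flush_all, ...) is hypothetical, and the "disk write" is just
an entry in an array.  It assumes each delayed write remembers which other
writes must reach the disk first, and it flushes prerequisites before the
write itself.  It does not try to handle dependency cycles.

```c
#include <assert.h>

#define MAX_WRITES 8

/* One delayed write.  All names in this sketch are hypothetical.  */
struct delayed_write
{
  const char *name;        /* label for the page being written */
  int dirty;               /* still has to go to disk */
  int ndeps;               /* number of prerequisites */
  int deps[MAX_WRITES];    /* writes that must reach disk first */
};

static struct delayed_write writes[MAX_WRITES];
static int nwrites;
static int flush_order[MAX_WRITES];   /* stand-in for the real disk */
static int nflushed;

/* Remember a write instead of performing it; returns its index.  */
int
delay_write (const char *name)
{
  assert (nwrites < MAX_WRITES);
  writes[nwrites].name = name;
  writes[nwrites].dirty = 1;
  writes[nwrites].ndeps = 0;
  return nwrites++;
}

/* Record that FIRST must be on disk before THEN.  */
void
must_precede (int first, int then)
{
  assert (writes[then].ndeps < MAX_WRITES);
  writes[then].deps[writes[then].ndeps++] = first;
}

/* Flush W after everything it depends on.  */
void
flush_write (int w)
{
  int i;

  if (!writes[w].dirty)
    return;
  /* Clear the flag early so a cycle cannot recurse forever; the
     ordering inside a cycle is NOT preserved by this sketch.  */
  writes[w].dirty = 0;
  for (i = 0; i < writes[w].ndeps; i++)
    flush_write (writes[w].deps[i]);
  flush_order[nflushed++] = w;
}

void
flush_all (void)
{
  int w;

  for (w = 0; w < nwrites; w++)
    flush_write (w);
}

/* Removing a file: the directory page should be written before the
   inode page.  Returns 1 if the flush respected that order.  */
int
unlink_example (void)
{
  int inode, dir;

  nwrites = nflushed = 0;
  inode = delay_write ("inode page");
  dir = delay_write ("directory page");
  must_precede (dir, inode);
  flush_all ();
  return nflushed == 2 && flush_order[0] == dir && flush_order[1] == inode;
}
```

Even though the inode write was delayed first, the flush emits the directory
page first.  The hard part in the real filesystem is that this information
would have to survive unlocking the node and be shared between pagers, which
the sketch does not address.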

> Ah, thanks for the explanation. Are directories on disk treated just
> like files, in this respect?

Directories are always treated like files, except by the user (who will call
different callbacks on either one) and in the operations supported on
them.  We call this thing a node.
 
> > Well, this is the interface used by the pager to read/write the underlying
> > store data.  Note that the actual caching is done in the kernel
> > (gnumach/vm/*).
> 
> Hmm, and this level of caching deals with both caching reads and
> delaying writes?

If you look at the implementation of io_write in libdiskfs, at the root of
the stuff is _diskfs_rdwr_internal, which calls:

  memobj = diskfs_get_filemap (np, prot);

  if (memobj == MACH_PORT_NULL)
    return errno;

  err = pager_memcpy (diskfs_get_filemap_pager_struct (np), memobj,
                      offset, data, amt, prot);

What makes the disk blink is the explicit synchronization at specific
places where it is required.  A normal write will simply write to the mapped
file content, which will mark the pages as dirty in Mach for later page out.
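
The difference between a plain write to mapped content and an explicit
synchronization can be sketched with the POSIX mmap/msync calls.  This is
only an analogy on a POSIX system, not Hurd or libpager code, and
mapped_write is a hypothetical helper: the memcpy merely dirties pages in
memory, and only the optional msync forces them out before returning.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Write MSG to PATH through a shared mapping.  If SYNC_NOW is set,
   wait until the dirty pages have been written back.  Returns 0 on
   success and -1 on failure.  Hypothetical helper, POSIX only.  */
int
mapped_write (const char *path, const char *msg, int sync_now)
{
  size_t len = strlen (msg);
  char *p;
  int fd, err = 0;

  assert (len > 0);   /* mmap rejects a zero-length mapping */

  fd = open (path, O_RDWR | O_CREAT | O_TRUNC, 0600);
  if (fd < 0)
    return -1;
  if (ftruncate (fd, (off_t) len) < 0)
    {
      close (fd);
      return -1;
    }

  p = mmap (NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (p == MAP_FAILED)
    {
      close (fd);
      return -1;
    }

  /* A "normal write": this only dirties the mapped pages; the kernel
     is free to page them out whenever it likes.  */
  memcpy (p, msg, len);

  /* The explicit synchronization point: this is the step that makes
     the disk blink.  */
  if (sync_now && msync (p, len, MS_SYNC) < 0)
    err = -1;

  munmap (p, len);
  close (fd);
  return err;
}
```

Calling mapped_write with sync_now set to 0 returns as soon as the pages are
dirty; with sync_now set to 1 it returns only after write-back.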
 
> > There is also a synchronization interface (and the amount of
> > synchronization done by the filesystem here is what started the
> > discussion).
> 
> Synchronization is performed on a per-memory-object basis, or per page?

You choose.  Per page, if you want.  In libpager/pager-sync.c:

/* Have the kernel write back all dirty pages in the pager; if
   WAIT is set, then wait for them to be finally written before
   returning. */
void pager_sync (struct pager *p, int wait);

/* Have the kernel write back some pages of a pager; if WAIT is set,
   then wait for them to be finally written before returning. */
void pager_sync_some (struct pager *p, vm_address_t offset,
                      vm_size_t size, int wait);

> I think I'm starting to realize what the hair is about. The filesystem
> wants ordered transactions on the store. And to give it that, without
> updating the real disk blocks at transaction boundaries, one needs to
> maintain information about the transactions in the actual caching
> code, in the kernel.

Well, I think what Thomas wants is to maintain the information in the
filesystem pagers.  But for this they have to cooperate and synchronize with
each other.  For example, when deleting a node, the directory node pager must
make sure it can write back the directory node before the metadata (disk
pager) writes back the page with the inode information.
And this cooperation between pagers is what makes it hard.
 
> Do the block caches of modern unices deal with this synchronization
> problem at all, or do they just leave it to fsck or to special
> transactional file systems?

Well, I haven't checked.  Thomas seems to think that Linux doesn't deal with
this specific synchronization in ext2fs.

Thanks,
Marcus

-- 
`Rhubarb is no Egyptian god.' Debian http://www.debian.org address@hidden
Marcus Brinkmann              GNU    http://www.gnu.org    address@hidden
address@hidden
http://www.marcus-brinkmann.de


