
Re: ext2fs kernel profiling: most of the time in thread management

From: Sergio Lopez
Subject: Re: ext2fs kernel profiling: most of the time in thread management
Date: Thu, 29 Dec 2011 18:45:39 +0100

On Thu, 29 Dec 2011 01:25:36 +0100,
Samuel Thibault <samuel.thibault@gnu.org> wrote:

> It might be biased by the clock not being able to tick everywhere in
> the kernel (though I guess e.g. most of the IPC machinery is running
> at ipl0?), but I believe it's still a bit revealing: I had already
> noticed that ext2fs spends most of its time in the kernel (like 90%),
> and it here seems we're spending a lot of time just managing the
> ext2fs thread sleeps (no, there aren't many threads in that test,
> just two dozen).

It's good to see some real numbers, thanks Samuel.

In fact, what is a (relatively) simple and straightforward operation
on other systems (like read()/write()) is, for us, (too?) complex and
can involve several context switches.

> while true; do rm -f blop ; \cp -f blip blop ; done

In this example, "blip" is cached after the first run, so subsequent
io_read requests can be served without blocking. But with "blop", for
each io_write:

  1) Eventually the thread handling the io_write reaches pager_memcpy.
  When it tries to touch the first page of the destination object, it
  faults, enters the kernel, sends an m_o_data_request message, and
  waits for the page fault resolution at vm_fault_continue.

  2) Another thread receives the m_o_data_request and services it.

  3) The first thread briefly continues, sends an m_o_data_unlock
  message, and waits again at vm_fault_continue.

  4) Another thread receives the m_o_data_unlock and services it.

  5) The first thread continues, returns to user space, copies the
  data, and faults on the next page, going back to step 2.

  6) When there is no more data to copy, the first thread exits
  pager_memcpy and replies to its client.

But at this point, no data has actually been written to disk. So we
also have the synchronization interval, which:

  1) Iterates over all active pagers, locks them, and requests the
  return of all their dirty pages.

  2) At the same time, a bunch of m_o_data_return messages are being
  received (on an arbitrary thread) as a result of the requests in 1).
  Those messages require locking the pager again, and they generate a
  lot of I/O (page by page).

And we also have the "rm", which terminates the object, which in turn
makes it compete with the synchronization interval at generating those
m_o_data_return messages.

We have plenty of reasons not to be happy with this approach. In
fact, in addition to poor performance, we suffer other problems that
can be directly related to it, like thread storms, erratic pageouts,
bad cache utilization, and multiple headaches dealing with locks :-)

In memfs (which is still WIP, but already able to perform simple
operations) I'm trying a different strategy: pagers are only used for
mmap'ed objects, and simple read/write operations against the backing
store serve the rest of the requests. This would require some kind of
cache in libstore for the other filesystems, but it's not a problem
for memfs, as it only works with memory.

