qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] AIO: Reduce number of threads for 32bit hosts


From: Alexander Graf
Subject: Re: [Qemu-devel] [PATCH] AIO: Reduce number of threads for 32bit hosts
Date: Wed, 14 Jan 2015 15:09:24 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.0

On 01/14/15 15:07, Kevin Wolf wrote:
Am 14.01.2015 um 14:49 hat Paolo Bonzini geschrieben:

On 14/01/2015 14:38, Kevin Wolf wrote:
Well, what do you want to use it for? I thought it would only be for a
one-time check where we usually end up rather than something that would
be enabled in production, but maybe I misunderstood.
No, you didn't.  Though I guess we could limit the checks to the yield
points.  If we have BDS recursion, as in the backing file case, yield
points should not be far from the deepest part of the stack.

Another possibility (which cannot be enabled in production) is to fill
the stack with a known 64-bit value, and do a binary search when the
coroutine is destroyed.
Maybe that's the easiest one, yes.

I tried gathering warning from GCC's -Wstack-usage=1023 option and the
block layer does not seem to have functions with huge stacks in the I/O
path.

So, assuming a maximum stack depth of 50 (already pretty generous since
there shouldn't be any recursive calls) a 100K stack should be pretty
much okay for coroutines and thread-pool threads.
The potential problem in the block layer is long backing file chains.
Perhaps we need to do something to solve that iteratively instead of
recursively.
Basically first read stuff from the current BDS, and then "fill in the
blanks" with a tail call on bs->backing_file?  That would be quite a
change, and we'd need a stopgap measure like Alex's patch in the meanwhile.
Basically block.c would do something like get_block_status() first and
only then call the read/write functions of the individual drivers. But
yes, that's more a theoretical consideration at this point.

I think with the 50 recursions that you calculated we should be fine in
practice for now. I would however strongly recommend finally implementing
a guard page for coroutine stacks before we make that change.

We could just write mprotect an excessively mapped page as guard page, no?

Anyway, the thread pool workers aren't affected by any of this, so they
would be the obvious first step.

Yes, ideally we would have the maximum number of threads be runtime configurable too. That way you can adjust them to your workload.


Alex




reply via email to

[Prev in Thread] Current Thread [Next in Thread]