[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows gu

From: Anthony Liguori
Subject: Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
Date: Thu, 25 Feb 2010 09:23:19 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20091209 Fedora/3.0-4.fc12 Lightning/1.0pre Thunderbird/3.0

On 02/25/2010 09:06 AM, Paul Brook wrote:
Idle bottom halves (i.e. qemu_bh_schedule_idle) are just bugs waiting to
happen, and should never be used for anything.
Idle bottom halves make considerable more sense than the normal bottom

The fact that rescheduling a bottom half within a bottom half results in
an infinite loop is absurd.  It is equally absurd that bottoms halves
alter the select timeout.  The result of that is that if a bottom half
schedules another bottom half, and that bottom half schedules the
previous, you get a tight infinite loop.  Since bottom halves are used
often times deep within functions, the result is very subtle infinite
loops (that we've absolutely encountered in the past).
I disagree. The "select timeout" is a completely irrelevant implementation
detail. Anything that relies on it is just plain wrong.

No, it's an extremely important detail because its the difference between an infinite loop in an idle function.

Very simply, without idle bottom halves, there's no way to implement polling with the main loop. If we dropped idle bottom halves, we would have to add explicit polling back to the main loop.

How would you implement polling?

  If you require a delay
then you should be using a timer. If scheduling a BH directly then you should
expect it to be processed without delay.

If you need something that doesn't require a delay, schedule a timer with no delay. What's the point of BHs?

It's because there's very subtle differences between BHs and timers and because we don't adjust the select timeout for the next timer deadline, there's a race between when we check for timer expiration and when we invoke select.

Actually, that's probably fixed now because we've changed the SIGALRM handler to write to a file descriptor to eliminate the signal/select race. I'll give it a try, it might actually work now.

If a BH reschedules itself (indirectly or indirectly) without useful work
occuring then you absolutely should expect an infinite loop. Rescheduling
itself after doing useful work should never cause an infinite loop. The only
way it can loop inifinitely is if we have infinite amount of work to do, in
which case you loose either way.  Looping over work via recursive BHs is
probably not the most efficient way to do things, but I guess it may sometimes
be the simplest in practice.

I think the point is getting lost. My contention is that a main loop needs three things 1) an idle callback 2) timers 3) io notification. Bottom halves act both today as a no-delay timer and an idle callback. I agree, that's unfortunate. What we should do is remove idle bottom halves, add proper idle callbacks, and make bottom halves implemented in terms of no-delay timers.

Interaction between multiple BH is slightly trickier. By my reading BH are
processed in the order they are created. It may be reasonable to guarantee
that BH are processed in the order they are scheduled. However I'm reluctant
to even go that far without a good use-case. You could probably come up with
arguments for processing them in most-recently-scheduled order.

It's no different than two timers that happen to fire at the same deadline. You can process in terms of when the timers were created initially or when they were last scheduled. Ultimately, it shouldn't matter to any code that uses them.

A main loop should have only a few characteristics.  It should enable
timeouts (based on select), it should enable fd event dispatch, and it
should allow for idle functions to be registered.  There should be no
guarantees on when idle functions are executed other than they'll
eventually be executed.
If you don't provide any guarantees, then surely processing them immediately
must be an acceptable implementation.  I don't believe there is a useful
definition of "idle".

idle is necessary to implement polling. You can argue that polling should never be necessary but practically speaking, it almost always is.

All existing uses of qemu_bh_schedule_idle are in fact poorly implemented
periodic polling. Furthermore these should not be using periodic polling, they
can and should be event driven.  They only exist because noone has bothered to
fix the old code properly. Having arbitrary 10ms latency on DMA transfers is
just plain wrong.

I agree with you here :-)

But there are more legitimate uses of polling. For instance, with virtio-net, we use a timer to implement tx mitigation. It's a huge boost for throughput but it tends to add latency on packet transmission because our notification comes at a longer time. We really just need the notification as a means to re-enable interrupts, not necessarily to slow down packet transmission.

So using an idle callback to transmit packets would certainly decrease latency in many cases while still getting the benefits of throughput improvement.

The way we use "bottom halves" today should be implemented in terms of a
relative timeout of 0 or an absolute timeout of now.  The fact that we
can't do that in our main loop is due to the very strange dependencies
deep within various devices on io dispatch ordering.  I would love to
eliminate this but I've not been able to spend any time on this recently.
I don't see how this helps. A self-triggering event with a timeout of "now" is
still an infinite loop. Any delay is a bugs in the dispatch loop. "idle" BHs
are relying on this bug.

The main point is that BHs should not be implemented in the actual main loop and that "idle" BHs are really the only type of BHs that should exist as far as the main loop is concerned. s/"idle" BHs/idle callbacks/g and I think we're at a more agreeable place.


Anthony Liguori


reply via email to

[Prev in Thread] Current Thread [Next in Thread]