[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Qemu-devel] Re: [PATCH 09/10] Exit loop if we have been there too long
From: |
Michael S. Tsirkin |
Subject: |
[Qemu-devel] Re: [PATCH 09/10] Exit loop if we have been there too long |
Date: |
Wed, 24 Nov 2010 13:14:42 +0200 |
User-agent: |
Mutt/1.5.21 (2010-09-15) |
On Wed, Nov 24, 2010 at 12:01:51PM +0100, Juan Quintela wrote:
> "Michael S. Tsirkin" <address@hidden> wrote:
> > On Wed, Nov 24, 2010 at 12:03:06AM +0100, Juan Quintela wrote:
> >> From: Juan Quintela <address@hidden>
> >>
> >> cheking each 64 pages is a random magic number as good as any other.
> >> We don't want to test too many times, but on the other hand,
> >> qemu_get_clock_ns() is not so expensive either.
> >>
> >
> > Could you please explain what's the problem this fixes?
> > I would like to see an API that documents the contract
> > we are making with the backend.
>
> buffered_file is an "abstraction" that uses a buffer.
>
> live migration code (remember it can't sleep, it runs on the main loop)
> stores its "stuff" on that buffer. And a timer writes that buffer to
> the fd that is associated with migration.
>
> This design is due to the main_loop/no threads qemu model.
>
> buffered_file timer runs each 100ms. And we "try" to measure channel
> bandwidth from there. If we are not able to run the timer, all the
> calculations are wrong, and then stalls happens.
So the problem is the timer in the buffered file abstraction?
Why don't we just flush out data if the buffer is full?
>
>
> >> @@ -269,6 +272,19 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int
> >> stage, void *opaque)
> >> if (bytes_sent == 0) { /* no more blocks */
> >> break;
> >> }
> >> + /* we want to check in the 1st loop, just in case it was the 1st time
> >> + and we had to sync the dirty bitmap.
> >> + qemu_get_clock_ns() is a bit expensive, so we only check each
> >> some
> >> + iterations
> >> + */
> >> + if ((i & 63) == 0) {
> >> + uint64_t t1 = (qemu_get_clock_ns(rt_clock) - t0) / 1000000;
> >
> > This adds even more non-determinism to savevm behaviour. If bandwidth
> > limit is higth enough, I expect it to just keep going.
>
> If we find a row of 512MB of zero pages together (and that happens if
> you have a 64GB iddle guest, then you can spent more than 3seconds to
> fill the default bandwith). After that everything that uses the main
> loop has had stalls.
>
>
> >> + if (t1 > buffered_file_interval/2) {
> >
> > arch_init should not depend on buffered_file implementation IMO.
> >
> > Also - / 2?
>
> We need to run a timer each 100ms. For times look at the 0/6 patch.
> We can't spent more that 50ms in each function. It is something that
> should happen for all funnctions called from io_handlers.
>
> >> + printf("big delay %ld milliseconds, %d iterations\n", t1,
> >> i);
> >
> > Is this a debugging aid?
>
> I left that on purpose, to show that it happens a lot. There is no
> DEBUG_ARCH or DEBUG_RAM around, I can create them if you preffer. But
> notice that this is something that shouldn't happen (but it happens).
>
> DPRINTF for that file should be a good idea, will do.
>
> >> + break;
> >> + }
> >> + }
> >> + i++;
> >> }
> >>
> >> t0 = qemu_get_clock_ns(rt_clock) - t0;
> >> --
> >> 1.7.3.2
> >>
- [Qemu-devel] Re: [PATCH 03/10] Add printf debug to savevm, (continued)
[Qemu-devel] [PATCH 06/10] Only calculate expected_time for stage 2, Juan Quintela, 2010/11/23
[Qemu-devel] [PATCH 07/10] ram_save_remaining() returns an uint64_t, Juan Quintela, 2010/11/23
- Message not available
- Message not available
[Qemu-devel] [PATCH 08/10] Count nanoseconds with uint64_t not doubles, Juan Quintela, 2010/11/23
[Qemu-devel] [PATCH 09/10] Exit loop if we have been there too long, Juan Quintela, 2010/11/23
Message not available
[Qemu-devel] Re: [PATCH 09/10] Exit loop if we have been there too long, Avi Kivity, 2010/11/30
[Qemu-devel] Re: [PATCH 09/10] Exit loop if we have been there too long, Anthony Liguori, 2010/11/30
[Qemu-devel] Re: [PATCH 09/10] Exit loop if we have been there too long, Avi Kivity, 2010/11/30