[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9
From: |
Alex Bennée |
Subject: |
Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9 |
Date: |
Fri, 31 Mar 2017 14:14:30 +0100 |
User-agent: |
mu4e 0.9.19; emacs 25.2.12 |
Alex Bennée <address@hidden> writes:
> Pavel Dovgalyuk <address@hidden> writes:
>
>>> From: address@hidden [mailto:address@hidden
>>> Pavel Dovgalyuk <address@hidden> writes:
>>> >> From: address@hidden [mailto:mttcg-
>>> address@hidden
>>> >> Pavel Dovgalyuk <address@hidden> writes:
>>> >> >> From: address@hidden [mailto:mttcg-
>>> >> address@hidden
>>> >> >> Pavel Dovgalyuk <address@hidden> writes:
>>> <snip>
>>> >> >> > I tested on vexpress-a9 platform with Debian wheezy.
>>> >> >>
>>> >> >> Thanks for that. I now have a test case that I can reproduce failures
>>> >> >> on
>>> >> >> without needing graphics.
>>> >> >>
>>> >> >> I've been investigating if there are any problems with the timer
>>> >> >> processing now they have been moved into the TCG thread. The record
>>> >> >> stage seems to work fine but I'm having difficulty figuring out why
>>> >> >> playback freezes. It seems we get to a point where we are stuck
>>> >> >> waiting
>>> >> >> for a suspiciously exact timer deadline:
>>> >> >
>>> >> > I've encountered the following behavior at replay stage:
>>> >> > - replay takes some instructions to execute (qemu_icount += counter)
>>> >> > - virtual timer is fired
>>> >>
>>> >> This is the virtual timer based on icount not the virtual rt timer? So
>>> >
>>> > Regular virtual timer. It's value is based on icount.
>>> > virtual_rt is used for internal icount purposes.
>>>
>>> And this is where the clock warps come in? The bias brings the
>>> virtual_rt time forward because execution is waiting for some external
>>> event to fire (e.g. a timer IRQ)?
>>
>> I guess so. But bias is not updated when the vCPU works.
>> vCPU thread updates only qemu_icount which is used for virtual clock
>> calculation.
>>
>>> >> under the new scheme of being processed in the vCPU loop we should only
>>> >> fire this one once all execution is done (although you may exit the loop
>>> >> early before icount is exhausted).
>>> >
>>> > We should stop the vCPU before processing virtual timers because:
>>> > - virtual clock depends on instruction counter
>>> > - virtual timers may change virtual hardware state
>>>
>>> If we do the processing in the top of main vCPU loop this should be
>>> equivalent. The instruction counter cannot run because we haven't
>>> entered tcg_exec_cpu. We also process virtual timers in this thread
>>> outside the loop so nothing else can be poking the hardware state.
>>
>> This is how qemu worked in older version - it switched between
>> processing tasks (vCPU and timers) in one thread.
>
> The only real difference is the sequencing in the old case was protected
> by the BQL - now its my program order.
>
>> But how we'll join this behavior with the current design and MTTCG?
>
> Ignore MTTCG for now. We don't even try and run multiple-threads when
> icount is enabled. However the change in the BQL locking is what has
> triggered the failures.
>
> Anyway I think I'm getting closer to narrowing it down. On record I see
> replay_current_step jump backwards with this:
>
> /* A common event print, called when reading or saving an event */
> static void print_event(uint8_t event)
> {
> static int event_count;
> static uint64_t last_step;
> uint64_t this_step = replay_get_current_step();
>
> fprintf(stderr, "replay: event %d is %d @ step=%#" PRIx64 "\n",
> event_count, event, this_step);
>
> if (this_step < last_step) {
> fprintf(stderr,"%s: !!! step %d went backwards
> %#"PRIx64"=>%#"PRIx64"!!!\n",
> __func__, event_count, last_step, this_step);
> abort();
> }
> last_step = this_step;
> event_count++;
> }
>
> void replay_put_event(uint8_t event)
> {
> assert(event < EVENT_COUNT);
> replay_put_byte(event);
> print_event(event);
> }
>
> The jump back is fairly consistent in my case (event 66 in the vexpress
> run) but I'm fairly sure that is the bug. I can't see any reason for the
> step count to go backwards - indeed that breaks the premise of .
>
> Unfortunately when I start putting break and watchpoints in to see how
> this jump back occurs the problem goes away until I disable them. So
> this very much looks like a race condition corrupting the icount data.
So this is a symptom of cpu_get_icount_raw(void) only accounting for in
progress instructions when in the vCPU context and the fact
timers_state.qemu_icount is "in credit" while the vCPU is running.
--
Alex Bennée
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, (continued)
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, Alex Bennée, 2017/03/22
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, Pavel Dovgalyuk, 2017/03/29
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, Alex Bennée, 2017/03/29
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, Pavel Dovgalyuk, 2017/03/30
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, Alex Bennée, 2017/03/30
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, Pavel Dovgalyuk, 2017/03/31
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, Paolo Bonzini, 2017/03/31
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, Alex Bennée, 2017/03/31
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, Paolo Bonzini, 2017/03/31
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, Alex Bennée, 2017/03/31
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9,
Alex Bennée <=