qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH for-6.2 0/2] target/sparc: Drop use of gen_io_end()


From: Peter Maydell
Subject: Re: [PATCH for-6.2 0/2] target/sparc: Drop use of gen_io_end()
Date: Sun, 25 Jul 2021 16:00:40 +0100

On Sat, 24 Jul 2021 at 21:48, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 7/24/21 10:27 AM, Peter Maydell wrote:
> > On Sat, 24 Jul 2021 at 14:49, Peter Maydell <peter.maydell@linaro.org> 
> > wrote:
> >> There is a slight difficulty here with testing this: icount
> >> doesn't seem to work for sparc Linux guests in master at the
> >> moment. For instance if you get the advent calendar image from
> >>    https://www.qemu-advent-calendar.org/2018/download/day11.tar.xz
> >> it will boot without icount with a command line like
> >>    qemu-system-sparc -display none -vga none -machine SS-20 -serial stdio 
> >> -kernel /tmp/day11/zImage.elf
> >> But if you add '-icount auto' it will get as far as
> >> "bootconsole [earlyprom0] disabled" and then apparently hang.
> >> I'm not sure what's going on here :-(
> >> (I filed this as https://gitlab.com/qemu-project/qemu/-/issues/499)
> >
> > This turns out to be a recent regression, caused by commit 78ff82bb
> > ("accel/tcg: Reduce CF_COUNT_MASK to match TCG_MAX_INSNS"). It's
> > an intermittent rather than a 100% reproducible hang.
>
> Ouch.  Ok, I'll have a look.

I did a bit more messing around with a repro case under rr,
and I think I now see why we end up hanging, although I'm not
100% sure what best to do to fix it:

 * We have a TB with 512 insns (tb->icount == 512)
 * We want to execute 511 insns (icount_decr == 511)
 * the code generated by gen_tb_start() does "subtract
   this TB's instruction count from icount_decr.u16.low, and
   if this is negative jump to the exitlabel". 511 - 512 == -1,
   so we exit the TB with status TB_EXIT_REQUESTED and without
   executing any guest insns
 * in cpu_loop_exit_tb() we look at insns_left, which is still 511,
   so we don't take the "early return because exit_request" path
 * we calculate a new insns_left = MIN(CF_COUNT_MASK, cpu->icount_budget).
   icount_budget is 59267, so the new insns_left is still 511.
 * we set icount_decr.u16.low to insns_left (ie same value as before)
 * we set cpu->icount_extra to icount_budget - insns_left, which
   is 58756
 * because icount_extra is non-zero, we don't set cflags_next_tb
   to force us to find an exactly 511 insn TB
 * so we come out to the cpu_exec() main loop, and find again
   the same 512 insn TB we started with.
 * Nothing changed from the last time we tried to execute it so
   we just go round and round in circles never making any progress...

We don't get this failure mode if CF_COUNT_MASK is larger than
TCG_MAX_INSNS, because the calculation of insns_left will
produce a larger number than TCG_MAX_INSNS, unless we really
are running out of icount budget (in which case icount_extra
should be 0 and we will force execution of that smaller TB).

So the primary bug here is that cpu_loop_exec_tb() needs updating
to follow the new logic of "allow insns_left = TCG_MAX_INSNS and
indicate that with 0 in the CF_COUNT_MASK field".

Q: in cpu_loop_exec_tb() in this calculation:

    /*
     * If the next tb has more instructions than we have left to
     * execute we need to ensure we find/generate a TB with exactly
     * insns_left instructions in it.
     */
    if (!cpu->icount_extra && insns_left > 0 && insns_left < tb->icount)  {
        cpu->cflags_next_tb = (tb->cflags & ~CF_COUNT_MASK) | insns_left;
    }

why are we testing "!cpu->icount_extra" ? The thing the comment
says we're looking for ("this TB has more insns than we have
left to execute") would be just "insns_left > 0 && insns_left < tb->icount".
And the code generated in gen_tb_start() will exit without doing anything
if insns_left < tb->icount (as described above), so we'll get into
an infinite loop pretty much any time we decide not to force execution
of a smaller TB. It's merely that we're much more likely to do so
with CF_COUNT_MASK==511, because we are accidentally very often trying
to execute 511 insns of a 512 insn TB when we could execute all 512.

thanks
-- PMM



reply via email to

[Prev in Thread] Current Thread [Next in Thread]