[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Qemu-devel] Re: sparc64 lazy conditional codes evaluation
From: |
Blue Swirl |
Subject: |
[Qemu-devel] Re: sparc64 lazy conditional codes evaluation |
Date: |
Sun, 9 May 2010 23:22:55 +0300 |
On 5/8/10, Igor Kovalenko <address@hidden> wrote:
> On Thu, May 6, 2010 at 10:51 PM, Blue Swirl <address@hidden> wrote:
> > On 5/5/10, Igor Kovalenko <address@hidden> wrote:
> >> On Wed, May 5, 2010 at 12:21 AM, Blue Swirl <address@hidden> wrote:
> >> > On 5/3/10, Igor Kovalenko <address@hidden> wrote:
> >> >> On Mon, May 3, 2010 at 11:54 PM, Blue Swirl <address@hidden> wrote:
> >> >> > On 5/3/10, Igor Kovalenko <address@hidden> wrote:
> >> >> >> On Mon, May 3, 2010 at 11:24 PM, Blue Swirl <address@hidden>
> wrote:
> >> >> >> > On 5/3/10, Igor Kovalenko <address@hidden> wrote:
> >> >> >> >> Hi!
> >> >> >> >>
> >> >> >> >> There is an issue with lazy conditional codes evaluation
> where
> >> >> >> >> we return from trap handler with mismatching conditionals.
> >> >> >> >>
> >> >> >> >> I seldom reproduce it here when dragging qemu window while
> >> >> >> >> machine is working through silo initialization. I use gentoo
> minimal cd
> >> >> >> >> install-sparc64-minimal-20100322.iso but I think anything
> with silo boot
> >> >> >> >> would experience the same. Once in a while it would report
> crc error,
> >> >> >> >> unable to open cd partition or it would fail to decompress
> image.
> >> >> >> >
> >> >> >> > I think I've also seen this.
> >> >> >> >
> >> >> >> >> Pattern that fails appears to require a sequence of compare
> insn
> >> >> >> >> possibly followed by a few instructions which do not touch
> conditionals,
> >> >> >> >> then conditional branch insn. If it happens that we trap
> while processing
> >> >> >> >> conditional branch insn so it is restarted after return from
> trap then
> >> >> >> >> seldom conditional codes are calculated incorrectly.
> >> >> >> >>
> >> >> >> >> I cannot point to exact cause but it appears that after trap
> return
> >> >> >> >> we may have CC_OP and CC_SRC* mismatch somewhere,
> >> >> >> >> since adding more cond evaluation flushes over the code
> helps.
> >> >> >> >>
> >> >> >> >> We already tried doing flush more frequently and it is still
> not
> >> >> >> >> complete, so the question is how to finally do this once and
> right :)
> >> >> >> >>
> >> >> >> >> Obviously I do not get the design of lazy evaluation right,
> but
> >> >> >> >> the following list appears to be good start. Plan is to
> prepare
> >> >> >> >> a change to qemu and find a way to test it.
> >> >> >> >>
> >> >> >> >> 1. Since SPARC* is a RISC CPU it seems to be not profitable
> to
> >> >> >> >> use DisasContext->cc_op to predict if flags should be not
> evaluated
> >> >> >> >> due to overriding insn. Instead we can drop cc_op from
> disassembler
> >> >> >> >> context and simplify code to only use cc_op from env.
> >> >> >> >
> >> >> >> > Not currently, but in the future we may use that to do even
> lazier
> >> >> >> > flags computation. For example the sequence 'cmp x, y; bne
> target'
> >> >> >> > could be much more optimal by changing the branch to do the
> >> >> >> > comparison. Here's an old unfinished patch to do some of this.
> >>
> >>
> >> I wonder if it buys anything. Sparc RISC architecture means optimizing
> compiler
> >> would prevent any extra flags computation, right? So it is basically
> 1-to-1
> >> conditional computation and use. Or even worse, if we delay computation
> >> until there are two or more consumers, correct?
> >
> > For x86 target, that is the other part of savings from using lazy
> > condition computation. It's true that it will not benefit RISC targets
> > much.
> >
> > But I think you are missing the other part, where the actual flags are
> > not calculated but instead the original comparison can be used.
> >
> > For example, consider this Sparc64 code:
> > IN:
> > 0x000001fff000be58: cmp %g2, 0x51
> >
> > OUT: [size=82]
> > 0x40df16b0: mov 0x10(%r14),%rbp
> > 0x40df16b4: mov %rbp,%rbx
> > 0x40df16b7: sub $0x51,%rbx
> > 0x40df16bb: mov %rbx,%r12
> > 0x40df16be: mov %r12,0x10a60(%r14)
> > 0x40df16c5: mov %rbp,0x60(%r14)
> > 0x40df16c9: mov $0x51,%ebp
> > 0x40df16ce: mov %rbp,0x68(%r14)
> > 0x40df16d2: mov %rbx,0x70(%r14)
> > 0x40df16d6: mov $0x7,%ebp
> > 0x40df16db: mov %ebp,0x78(%r14)
> > 0x40df16df: mov $0x1fff000be5c,%rbp
> > 0x40df16e9: mov %rbp,0x48(%r14)
> > 0x40df16ed: mov $0x1fff000be60,%rbp
> > 0x40df16f7: mov %rbp,0x50(%r14)
> > 0x40df16fb: xor %eax,%eax
> > 0x40df16fd: jmpq 0xbfface
> >
> > --------------
> > IN:
> > 0x000001fff000be5c: bne 0x1fff000c268
> >
> > OUT: [size=95]
> > 0x40df1710: callq 0x518260
> > 0x40df1715: mov 0x98(%r14),%ebp
> > 0x40df171c: mov %ebp,%ebx
> > 0x40df171e: shr $0x16,%rbx
> > 0x40df1722: and $0x1,%ebx
> > 0x40df1725: xor $0x1,%rbx
> > 0x40df1729: mov %rbx,0x90(%r14)
> > 0x40df1730: mov $0x1fff000be60,%rbp
> > 0x40df173a: mov %rbp,0x48(%r14)
> > 0x40df173e: test %rbx,%rbx
> > 0x40df1741: je 0x40df175a
> > 0x40df1747: mov $0x1fff000c268,%rbp
> > 0x40df1751: mov %rbp,0x50(%r14)
> > 0x40df1755: jmpq 0x40df1768
> > 0x40df175a: mov $0x1fff000be64,%rbp
> > 0x40df1764: mov %rbp,0x50(%r14)
> > 0x40df1768: xor %eax,%eax
> > 0x40df176a: jmpq 0xbfface
> >
> > Instead of calculating any flags, we should generate for combined
> > 'cmp/btest + branch' sequences something like:
> > mov 0x10(%r14),%rbp
> > mov %rbp,%rbx
> > cmp $0x51,%rbx
> > je 0x40ac275a
> > mov $0x1fff000c268,%rbp
> > mov %rbp,0x50(%r14)
> > jmpq 0x40ac2768
> > mov $0x1fff000be64,%rbp
> > mov %rbp,0x50(%r14)
> > xor %eax,%eax
> > jmpq 0xbfface
> >
> > But I fully agree that correct code generation is the most important issue.
> >
> >> >> >> >
> >> >> >> >> Another point is that we always write to env->cc_op when
> >> >> >> >> translating *cc insns
> >> >> >> >> This should solve any issue with dc->cc_op prediction going
> >> >> >> >> out of sync with env->cc_op and cpu_cc_src*
> >> >> >> >
> >> >> >> > I think this is what is happening now.
> >> >> >> >
> >> >> >> >> 2. We must flush lazy evaluation back to CC_OP_FLAGS in a
> few cases when
> >> >> >> >> a. conditional code is required by insn (like addc, cond
> branch etc.)
> >> >> >> >> - here we can optimize by evaluating specific bits
> (carry?)
> >> >> >> >> - not sure if it works in case we have two cond
> consuming insns,
> >> >> >> >> where first needs carry another needs the rest of
> flags
> >> >> >> >
> >> >> >> > Here's another patch to optimize C flag handling. It doesn't
> pass my
> >> >> >> > tests though.
> >> >> >> >
> >> >> >> >> b. CCR is read by rdccr (helper_rdccr)
> >> >> >> >> - have to compute all flags
> >> >> >> >> c. trap occurs and we prepare trap level context (saving
> pstate)
> >> >> >> >> - have to compute all flags
> >> >> >> >> d. control goes out of tcg runtime (so gdbstub reads
> correct value from env)
> >> >> >> >> - have to compute all flags
> >> >> >> >
> >> >> >> > Fully agree.
> >> >> >>
> >> >> >>
> >> >> >> Cool
> >> >> >>
> >> >> >> Still I'd propose to kill dc->cc_op, find a reliable way to test
> it
> >> >> >> and then add it back possibly with more optimizations.
> >> >> >> I'm lost in the code up to the point where I believe we need to
> >> >> >> save/restore cc_op and cpu_cc* while switching trap levels.
> >> >> >
> >> >> > I'd think this should do the trick:
> >> >> >
> >> >> > diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c
> >> >> > index b27778b..94921cd 100644
> >> >> > --- a/target-sparc/op_helper.c
> >> >> > +++ b/target-sparc/op_helper.c
> >> >> > @@ -3506,6 +3506,8 @@ void do_interrupt(CPUState *env)
> >> >> > }
> >> >> > tsptr = cpu_tsptr(env);
> >> >> >
> >> >> > + helper_compute_psr();
> >> >> > +
> >> >> > tsptr->tstate = ((uint64_t)GET_CCR(env) << 32) |
> >> >> > ((env->asi & 0xff) << 24) | ((env->pstate & 0xf3f) << 8) |
> >> >> > GET_CWP64(env);
> >> >> >
> >> >>
> >> >>
> >> >> Thanks, this change seems to work here for silo issue.
> >> >>
> >> >> Another change would be to flush for gdbstub use of GET_CCR and for
> >> >> helper_rdccr.
> >> >> I tried to embed flush into GET_CCR but the code looks ugly since we
> >> >> need to proxy a call to helper_compute_psr from gdbstub passing
> >> >> available env pointer.
> >> >>
> >> >> Not really tested with your changes, but still what is the breakage
> you see?
> >> >
> >> > Aurora 2.0
> (http://distro.ibiblio.org/pub/linux/distributions/aurora/build-2.0/sparc/iso/)
> >> > breaks.
> >> >
> >> > This is what I get with git HEAD, having pressed enter key twice:
> >> > Welcome to Aurora SPARC Linux
> >> >
> >> >
> >> >
> >> >
> >> > +--------------+ CD Found +--------------+
> >> > | |
> >> > | To begin testing the CD media before |
> >> > | installation press OK. |
> >> > | |
> >> > | Choose Skip to skip the media test |
> >> > | and start the installation. |
> >> > | |
> >> > | +----+ +------+ |
> >> > | | OK | | Skip | |
> >> > | +----+ +------+ |
> >> > | |
> >> > | |
> >> > +----------------------------------------+
> >> >
> >> >
> >> >
> >> >
> >> > <Tab>/<Alt-Tab> between elements | <Space> selects | <F12> next
> screen
> >> >
> >> > This is what I get with the C flag patch applied:
> >> > Welcome to Aurora SPARC Linux
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > +--------------+ Error +---------------+
> >> > | |
> >> > | failed to read keymap information: |
> >> > | Success |
> >> > | |
> >> > | +----+ |
> >> > | | OK | |
> >> > | +----+ |
> >> > | |
> >> > | |
> >> > +--------------------------------------+
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > <Tab>/<Alt-Tab> between elements | <Space> selects | <F12> next
> screen
> >> >
> >>
> >>
> >> I do reproduce the issue here with 0001-Convert-C-flag-input-BROKEN.patch
> >
> > The patch seems harmless, so maybe it uncovers some hideous bug.
> >
>
>
> Right, the bug is inside the gen_op_*cc helpers - we need to compute C
> using current flag source, whereas current implementation clobbers source
> then computes C.
>
> I think I now get the lazy evaluation design right :)
>
> Something along these changes to your 0001-Convert-C-flag-input-BROKEN.patch
> appears to be good for aurora 2.0 test:
>
> diff --git a/target-sparc/translate.c b/target-sparc/translate.c
> index 94c343d..ea7c71b 100644
> --- a/target-sparc/translate.c
> +++ b/target-sparc/translate.c
> @@ -334,9 +334,9 @@ static inline void gen_op_add_cc(TCGv dst, TCGv
> src1, TCGv src2)
>
> static inline void gen_op_addxi_cc(TCGv dst, TCGv src1, target_long src2)
> {
> + gen_helper_compute_C_icc(cpu_tmp0);
> tcg_gen_mov_tl(cpu_cc_src, src1);
> tcg_gen_movi_tl(cpu_cc_src2, src2);
> - gen_helper_compute_C_icc(cpu_tmp0);
> tcg_gen_add_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
> tcg_gen_addi_tl(cpu_cc_dst, cpu_cc_dst, src2);
> tcg_gen_mov_tl(dst, cpu_cc_dst);
> @@ -344,9 +344,9 @@ static inline void gen_op_addxi_cc(TCGv dst, TCGv
> src1, target_long src2)
>
> static inline void gen_op_addx_cc(TCGv dst, TCGv src1, TCGv src2)
> {
> + gen_helper_compute_C_icc(cpu_tmp0);
> tcg_gen_mov_tl(cpu_cc_src, src1);
> tcg_gen_mov_tl(cpu_cc_src2, src2);
> - gen_helper_compute_C_icc(cpu_tmp0);
> tcg_gen_add_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
> tcg_gen_add_tl(cpu_cc_dst, cpu_cc_dst, cpu_cc_src2);
> tcg_gen_mov_tl(dst, cpu_cc_dst);
> @@ -417,9 +417,9 @@ static inline void gen_op_sub_cc(TCGv dst, TCGv
> src1, TCGv src2)
>
> static inline void gen_op_subxi_cc(TCGv dst, TCGv src1, target_long src2)
> {
> + gen_helper_compute_C_icc(cpu_tmp0);
> tcg_gen_mov_tl(cpu_cc_src, src1);
> tcg_gen_movi_tl(cpu_cc_src2, src2);
> - gen_helper_compute_C_icc(cpu_tmp0);
> tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
> tcg_gen_subi_tl(cpu_cc_dst, cpu_cc_dst, src2);
> tcg_gen_mov_tl(dst, cpu_cc_dst);
> @@ -427,9 +427,9 @@ static inline void gen_op_subxi_cc(TCGv dst, TCGv
> src1, target_long src2)
>
> static inline void gen_op_subx_cc(TCGv dst, TCGv src1, TCGv src2)
> {
> + gen_helper_compute_C_icc(cpu_tmp0);
> tcg_gen_mov_tl(cpu_cc_src, src1);
> tcg_gen_mov_tl(cpu_cc_src2, src2);
> - gen_helper_compute_C_icc(cpu_tmp0);
> tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
> tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_dst, cpu_cc_src2);
> tcg_gen_mov_tl(dst, cpu_cc_dst);
Thanks a lot, with this patch my tests passed! I applied the combined patch.
I also did a bit of refactoring to get the original Sparc64 issue fixed.
- [Qemu-devel] sparc64 lazy conditional codes evaluation, Igor Kovalenko, 2010/05/03
- [Qemu-devel] Re: sparc64 lazy conditional codes evaluation, Blue Swirl, 2010/05/03
- [Qemu-devel] Re: sparc64 lazy conditional codes evaluation, Igor Kovalenko, 2010/05/03
- [Qemu-devel] Re: sparc64 lazy conditional codes evaluation, Blue Swirl, 2010/05/03
- [Qemu-devel] Re: sparc64 lazy conditional codes evaluation, Igor Kovalenko, 2010/05/03
- [Qemu-devel] Re: sparc64 lazy conditional codes evaluation, Blue Swirl, 2010/05/04
- [Qemu-devel] Re: sparc64 lazy conditional codes evaluation, Igor Kovalenko, 2010/05/05
- [Qemu-devel] Re: sparc64 lazy conditional codes evaluation, Blue Swirl, 2010/05/06
- [Qemu-devel] Re: sparc64 lazy conditional codes evaluation, Igor Kovalenko, 2010/05/08
- [Qemu-devel] Re: sparc64 lazy conditional codes evaluation,
Blue Swirl <=
- Re: [Qemu-devel] Re: sparc64 lazy conditional codes evaluation, Mark Cave-Ayland, 2010/05/10
- Re: [Qemu-devel] Re: sparc64 lazy conditional codes evaluation, Blue Swirl, 2010/05/10
- Re: [Qemu-devel] Re: sparc64 lazy conditional codes evaluation, Mark Cave-Ayland, 2010/05/15