Re: [Qemu-devel] [PATCH v7 04/11] target-mips: improve exception handling
From: Pavel Dovgaluk
Subject: Re: [Qemu-devel] [PATCH v7 04/11] target-mips: improve exception handling
Date: Wed, 16 Sep 2015 15:10:53 +0300
> From: Leon Alrae [mailto:address@hidden]
> On 28/08/2015 10:08, Pavel Dovgaluk wrote:
> >> From: Aurelien Jarno [mailto:address@hidden]
> >> On 2015-08-13 14:12, Leon Alrae wrote:
> >>> On 10/07/2015 10:57, Pavel Dovgalyuk wrote:
> >>>> @@ -2364,14 +2363,12 @@ static void gen_st_cond (DisasContext *ctx, uint32_t opc, int rt,
> >>>> #if defined(TARGET_MIPS64)
> >>>> case OPC_SCD:
> >>>> case R6_OPC_SCD:
> >>>> - save_cpu_state(ctx, 1);
> >>>> op_st_scd(t1, t0, rt, ctx);
> >>>> opn = "scd";
> >>>> break;
> >>>> #endif
> >>>> case OPC_SC:
> >>>> case R6_OPC_SC:
> >>>> - save_cpu_state(ctx, 1);
> >>>> op_st_sc(t1, t0, rt, ctx);
> >>>> opn = "sc";
> >>>> break;
> >>>
> >>> Wouldn't we be better off assuming that conditional stores in linux-user
> >>> always take an exception (we generate a fake EXCP_SC exception) and avoid
> >>> retranslation? After applying these changes I observed a significant impact
> >>> on performance in linux-user multithreaded apps. For instance, the
> >>> c11-atomic-exec test took just 2 seconds to finish before the change,
> >>> whereas now it takes more than 30...
> >>
> >> This really shows the impact of retranslation and why we should avoid
> >> it when not necessary. Coming back to the issue here, we go through
> >> retranslation because helper_raise_exception has been changed to go
> >> through retranslation.
> >>
> >> Given the code path between user-mode and softmmu is quite different,
> >> we definitely need a different code path wrt exception and retranslation
> >> for the two cases. That said if we want deterministic code execution
> >> (the original purpose of this patch), I don't see how we can do without
> >> forcing retranslation. Pavel, do you have an idea for that?
> >
> > There is only one case in which we can execute without retranslation:
> > when the instruction is the last instruction in the translation block.
> > Then we can set up the PC and flags before this last instruction.
> > If the exception happens, we can just break the execution.
> > The drawback of this method is that it breaks translation blocks into
> > smaller parts.
>
> c11-atomic-exec.4 test execution time in linux-user:
>
> * no changes:
> real 0m3.039s
> user 0m2.976s
> sys 0m1.908s
>
> * tb_lock + patch:
> real 1m1.167s
> user 0m57.240s
> sys 0m36.678s
>
> * tb_lock + patch + SC-without-retranslation:
> real 0m3.016s
> user 0m2.988s
> sys 0m1.848s
>
> I had to add tb_lock() to cpu_restore_state() in the first place, otherwise
> all of my multithreaded user mode tests crash QEMU with this patch.
>
> SC-without-retranslation (the diff below) seems to improve the situation,
> and if I understand correctly we retain deterministic code execution.
> Therefore if there are no objections I'll apply this patch + SC correction
> to mips-next.
The diff below implements exactly what I meant.
Pavel Dovgalyuk
>
> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index 99b99c5..006cb96 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -2060,7 +2060,7 @@ static inline void op_st_##insn(TCGv arg1, TCGv arg2, int rt, DisasContext *ctx)
> tcg_gen_movi_tl(t0, rt | ((almask << 3) & 0x20)); \
> tcg_gen_st_tl(t0, cpu_env, offsetof(CPUMIPSState, llreg)); \
> tcg_gen_st_tl(arg1, cpu_env, offsetof(CPUMIPSState, llnewval)); \
> - gen_helper_0e0i(raise_exception, EXCP_SC); \
> + generate_exception_end(ctx, EXCP_SC); \
> gen_set_label(l2); \
> tcg_gen_movi_tl(t0, 0); \
> gen_store_gpr(t0, rt); \