qemu-arm
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: ldrd implementation issue?


From: Peter Maydell
Subject: Re: ldrd implementation issue?
Date: Tue, 11 Feb 2025 10:46:04 +0000

(added qemu-devel to the cc list)

On Mon, 10 Feb 2025 at 17:26, Stu Grossman <stu.grossman@gmail.com> wrote:
>
> I've been getting SIGBUS cores with a bunch of user apps running under
> linux 5.15 and qemu-system-aarch64.  These happen to be 32 bit (T32?)
> programs.
>
> All of the cores point at the following instruction:
>
>         ldrd r2,r3,[r2]
>
> where r2 points at four bytes prior to the end of a page.  Eg: 0x10ffc.
>
> The bug appears to be that we get a page fault between the accesses for
> each word.  Prior to the fault, the first register is updated.  Later,
> after the fault is serviced, the instruction is restarted with the index
> register containing the word loaded prior to the fault, which is
> probably not the desired address, resulting in a coredump.
>
> So the conditions for the problem are:
>
>         - the index register must be one of the registers being loaded.
>         - the index register must point at a uint64_t that spans a page
>           boundary.
>
> I understand that the uint64_t spanning a page boundary may not be
> allowed by the arm 32 bit API.  However, the ldrd instruction definition
> allows for four byte alignment.

Yes, you're right; this is a longstanding bug in our LDRD implementation.
The Arm ARM defines in section G1.17.8.3 that if the execution of
an instruction generates a synhcronous Data Abort then the base
register should be restored to its original value if the aborted
instruction is a load and the list of registers to be loaded includes
the base register.

> Note that ldm may have similar issues if the index is one of the
> registers being loaded and the words cross a page boundary.

I believe in LDM we get this correct: in the do_ldm() function
we have a case for 'i == a->rn' which stashes the loaded value
into a TCG temporary, and we don't write that into the actual
register until after we've completed all the loads.

> The fix is to defer the register stores till after both words have been
> read from memory.
>
> Here is my fix:
>
> diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
> index 9ee761fc64..78ad0ed186 100644
> --- a/target/arm/tcg/translate.c
> +++ b/target/arm/tcg/translate.c
> @@ -5006,7 +5006,7 @@ static bool op_store_rr(DisasContext *s,
> arg_ldst_rr *a,
>   static bool trans_LDRD_rr(DisasContext *s, arg_ldst_rr *a)
>   {
>       int mem_idx = get_mem_index(s);
> -    TCGv_i32 addr, tmp;
> +    TCGv_i32 addr, tmp1, tmp2;
>
>       if (!ENABLE_ARCH_5TE) {
>           return false;
> @@ -5017,15 +5017,16 @@ static bool trans_LDRD_rr(DisasContext *s,
> arg_ldst_rr *a)
>       }
>       addr = op_addr_rr_pre(s, a);
>
> -    tmp = tcg_temp_new_i32();
> -    gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
> -    store_reg(s, a->rt, tmp);
> +    tmp1 = tcg_temp_new_i32();
> +    gen_aa32_ld_i32(s, tmp1, addr, mem_idx, MO_UL | MO_ALIGN);
>
>       tcg_gen_addi_i32(addr, addr, 4);
>
> -    tmp = tcg_temp_new_i32();
> -    gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
> -    store_reg(s, a->rt + 1, tmp);
> +    tmp2 = tcg_temp_new_i32();
> +    gen_aa32_ld_i32(s, tmp2, addr, mem_idx, MO_UL | MO_ALIGN);
> +
> +    store_reg(s, a->rt, tmp1);
> +    store_reg(s, a->rt + 1, tmp2);
>
>       /* LDRD w/ base writeback is undefined if the registers overlap.  */
>       op_addr_rr_post(s, a, addr, -4);
> @@ -5153,19 +5154,20 @@ static bool op_store_ri(DisasContext *s,
> arg_ldst_ri *a,
>   static bool op_ldrd_ri(DisasContext *s, arg_ldst_ri *a, int rt2)
>   {
>       int mem_idx = get_mem_index(s);
> -    TCGv_i32 addr, tmp;
> +    TCGv_i32 addr, tmp1, tmp2;
>
>       addr = op_addr_ri_pre(s, a);
>
> -    tmp = tcg_temp_new_i32();
> -    gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
> -    store_reg(s, a->rt, tmp);
> +    tmp1 = tcg_temp_new_i32();
> +    gen_aa32_ld_i32(s, tmp1, addr, mem_idx, MO_UL | MO_ALIGN);
>
>       tcg_gen_addi_i32(addr, addr, 4);
>
> -    tmp = tcg_temp_new_i32();
> -    gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
> -    store_reg(s, rt2, tmp);
> +    tmp2 = tcg_temp_new_i32();
> +    gen_aa32_ld_i32(s, tmp2, addr, mem_idx, MO_UL | MO_ALIGN);
> +
> +    store_reg(s, a->rt, tmp1);
> +    store_reg(s, rt2, tmp2);
>
>       /* LDRD w/ base writeback is undefined if the registers overlap.  */
>       op_addr_ri_post(s, a, addr, -4);

Yes, this fix looks correct to me. Can you provide a
Signed-off-by: tag for it? We can't accept it as a patch
without that. (I can do the other administrative tidying
up of it into a commit, but the signed-off-by is what says
you have the legal right and are happy to submit it to QEMU
under our license (LGPLv2.1+ in this case)).

thanks
-- PMM



reply via email to

[Prev in Thread] Current Thread [Next in Thread]