[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: ldrd implementation issue?
From: |
Peter Maydell |
Subject: |
Re: ldrd implementation issue? |
Date: |
Tue, 11 Feb 2025 10:46:04 +0000 |
(added qemu-devel to the cc list)
On Mon, 10 Feb 2025 at 17:26, Stu Grossman <stu.grossman@gmail.com> wrote:
>
> I've been getting SIGBUS cores with a bunch of user apps running under
> linux 5.15 and qemu-system-aarch64. These happen to be 32 bit (T32?)
> programs.
>
> All of the cores point at the following instruction:
>
> ldrd r2,r3,[r2]
>
> where r2 points at four bytes prior to the end of a page. Eg: 0x10ffc.
>
> The bug appears to be that we get a page fault between the accesses for
> each word. Prior to the fault, the first register is updated. Later,
> after the fault is serviced, the instruction is restarted with the index
> register containing the word loaded prior to the fault, which is
> probably not the desired address, resulting in a coredump.
>
> So the conditions for the problem are:
>
> - the index register must be one of the registers being loaded.
> - the index register must point at a uint64_t that spans a page
> boundary.
>
> I understand that the uint64_t spanning a page boundary may not be
> allowed by the arm 32 bit API. However, the ldrd instruction definition
> allows for four byte alignment.
Yes, you're right; this is a longstanding bug in our LDRD implementation.
The Arm ARM defines in section G1.17.8.3 that if the execution of
an instruction generates a synhcronous Data Abort then the base
register should be restored to its original value if the aborted
instruction is a load and the list of registers to be loaded includes
the base register.
> Note that ldm may have similar issues if the index is one of the
> registers being loaded and the words cross a page boundary.
I believe in LDM we get this correct: in the do_ldm() function
we have a case for 'i == a->rn' which stashes the loaded value
into a TCG temporary, and we don't write that into the actual
register until after we've completed all the loads.
> The fix is to defer the register stores till after both words have been
> read from memory.
>
> Here is my fix:
>
> diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
> index 9ee761fc64..78ad0ed186 100644
> --- a/target/arm/tcg/translate.c
> +++ b/target/arm/tcg/translate.c
> @@ -5006,7 +5006,7 @@ static bool op_store_rr(DisasContext *s,
> arg_ldst_rr *a,
> static bool trans_LDRD_rr(DisasContext *s, arg_ldst_rr *a)
> {
> int mem_idx = get_mem_index(s);
> - TCGv_i32 addr, tmp;
> + TCGv_i32 addr, tmp1, tmp2;
>
> if (!ENABLE_ARCH_5TE) {
> return false;
> @@ -5017,15 +5017,16 @@ static bool trans_LDRD_rr(DisasContext *s,
> arg_ldst_rr *a)
> }
> addr = op_addr_rr_pre(s, a);
>
> - tmp = tcg_temp_new_i32();
> - gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
> - store_reg(s, a->rt, tmp);
> + tmp1 = tcg_temp_new_i32();
> + gen_aa32_ld_i32(s, tmp1, addr, mem_idx, MO_UL | MO_ALIGN);
>
> tcg_gen_addi_i32(addr, addr, 4);
>
> - tmp = tcg_temp_new_i32();
> - gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
> - store_reg(s, a->rt + 1, tmp);
> + tmp2 = tcg_temp_new_i32();
> + gen_aa32_ld_i32(s, tmp2, addr, mem_idx, MO_UL | MO_ALIGN);
> +
> + store_reg(s, a->rt, tmp1);
> + store_reg(s, a->rt + 1, tmp2);
>
> /* LDRD w/ base writeback is undefined if the registers overlap. */
> op_addr_rr_post(s, a, addr, -4);
> @@ -5153,19 +5154,20 @@ static bool op_store_ri(DisasContext *s,
> arg_ldst_ri *a,
> static bool op_ldrd_ri(DisasContext *s, arg_ldst_ri *a, int rt2)
> {
> int mem_idx = get_mem_index(s);
> - TCGv_i32 addr, tmp;
> + TCGv_i32 addr, tmp1, tmp2;
>
> addr = op_addr_ri_pre(s, a);
>
> - tmp = tcg_temp_new_i32();
> - gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
> - store_reg(s, a->rt, tmp);
> + tmp1 = tcg_temp_new_i32();
> + gen_aa32_ld_i32(s, tmp1, addr, mem_idx, MO_UL | MO_ALIGN);
>
> tcg_gen_addi_i32(addr, addr, 4);
>
> - tmp = tcg_temp_new_i32();
> - gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
> - store_reg(s, rt2, tmp);
> + tmp2 = tcg_temp_new_i32();
> + gen_aa32_ld_i32(s, tmp2, addr, mem_idx, MO_UL | MO_ALIGN);
> +
> + store_reg(s, a->rt, tmp1);
> + store_reg(s, rt2, tmp2);
>
> /* LDRD w/ base writeback is undefined if the registers overlap. */
> op_addr_ri_post(s, a, addr, -4);
Yes, this fix looks correct to me. Can you provide a
Signed-off-by: tag for it? We can't accept it as a patch
without that. (I can do the other administrative tidying
up of it into a commit, but the signed-off-by is what says
you have the legal right and are happy to submit it to QEMU
under our license (LGPLv2.1+ in this case)).
thanks
-- PMM