qemu-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Qemu-commits] [qemu/qemu] ae4e46: KVM: Dynamic sized kvm memslots array


From: Christian Schoenebeck
Subject: [Qemu-commits] [qemu/qemu] ae4e46: KVM: Dynamic sized kvm memslots array
Date: Fri, 08 Nov 2024 09:44:04 -0800

  Branch: refs/heads/staging-7.2
  Home:   https://github.com/qemu/qemu
  Commit: ae4e46cd20f39e8cbd9d83785434a881f582efca
      
https://github.com/qemu/qemu/commit/ae4e46cd20f39e8cbd9d83785434a881f582efca
  Author: Peter Xu <peterx@redhat.com>
  Date:   2024-11-08 (Fri, 08 Nov 2024)

  Changed paths:
    M accel/kvm/kvm-all.c
    M accel/kvm/trace-events
    M include/sysemu/kvm_int.h

  Log Message:
  -----------
  KVM: Dynamic sized kvm memslots array

Zhiyi reported an infinite loop issue in VFIO use case.  The cause of that
was a separate discussion, however during that I found a regression of
dirty sync slowness when profiling.

Each KVMMemoryListerner maintains an array of kvm memslots.  Currently it's
statically allocated to be the max supported by the kernel.  However after
Linux commit 4fc096a99e ("KVM: Raise the maximum number of user memslots"),
the max supported memslots reported now grows to some number large enough
so that it may not be wise to always statically allocate with the max
reported.

What's worse, QEMU kvm code still walks all the allocated memslots entries
to do any form of lookups.  It can drastically slow down all memslot
operations because each of such loop can run over 32K times on the new
kernels.

Fix this issue by making the memslots to be allocated dynamically.

Here the initial size was set to 16 because it should cover the basic VM
usages, so that the hope is the majority VM use case may not even need to
grow at all (e.g. if one starts a VM with ./qemu-system-x86_64 by default
it'll consume 9 memslots), however not too large to waste memory.

There can also be even better way to address this, but so far this is the
simplest and should be already better even than before we grow the max
supported memslots.  For example, in the case of above issue when VFIO was
attached on a 32GB system, there are only ~10 memslots used.  So it could
be good enough as of now.

In the above VFIO context, measurement shows that the precopy dirty sync
shrinked from ~86ms to ~3ms after this patch applied.  It should also apply
to any KVM enabled VM even without VFIO.

NOTE: we don't have a FIXES tag for this patch because there's no real
commit that regressed this in QEMU. Such behavior existed for a long time,
but only start to be a problem when the kernel reports very large
nr_slots_max value.  However that's pretty common now (the kernel change
was merged in 2021) so we attached cc:stable because we'll want this change
to be backported to stable branches.

Cc: qemu-stable <qemu-stable@nongnu.org>
Reported-by: Zhiyi Guo <zhguo@redhat.com>
Tested-by: Zhiyi Guo <zhguo@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240917163835.194664-2-peterx@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 5504a8126115d173687b37e657312a8ffe29fc0c)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: context fixup in accel/kvm/kvm-all.c and accel/kvm/trace-events;
 also remove now-unused local variable `KVMState *s` in 
kvm-all.c:kvm_log_sync_global() )


  Commit: fbc2a49d3cb3ab9b40b44e0276b3f602bd23146d
      
https://github.com/qemu/qemu/commit/fbc2a49d3cb3ab9b40b44e0276b3f602bd23146d
  Author: Tom Dohrmann <erbse.13@gmx.de>
  Date:   2024-11-08 (Fri, 08 Nov 2024)

  Changed paths:
    M accel/kvm/kvm-all.c

  Log Message:
  -----------
  accel/kvm: check for KVM_CAP_READONLY_MEM on VM

KVM_CAP_READONLY_MEM used to be a global capability, but with the
introduction of AMD SEV-SNP confidential VMs, this extension is not
always available on all VM types [1,2].

Query the extension on the VM level instead of on the KVM level.

[1] 
https://patchwork.kernel.org/project/kvm/patch/20240809190319.1710470-2-seanjc@google.com/
[2] 
https://patchwork.kernel.org/project/kvm/patch/20240902144219.3716974-1-erbse.13@gmx.de/

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Tom Dohrmann <erbse.13@gmx.de>
Link: https://lore.kernel.org/r/20240903062953.3926498-1-erbse.13@gmx.de
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 64e0e63ea16aa0122dc0c41a0679da0ae4616208)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>


  Commit: f709f03d77ea0c86e13dae539724f7ec768e7a36
      
https://github.com/qemu/qemu/commit/f709f03d77ea0c86e13dae539724f7ec768e7a36
  Author: Stefan Berger <stefanb@linux.ibm.com>
  Date:   2024-11-08 (Fri, 08 Nov 2024)

  Changed paths:
    M tests/qtest/tpm-tests.c

  Log Message:
  -----------
  tests: Wait for migration completion on destination QEMU to avoid failures

Rather than waiting for the completion of migration on the source side,
wait for it on the destination QEMU side to avoid accessing the TPM TIS
memory mapped registers before QEMU could restore their state. This
error condition could be triggered on busy systems where the destination
QEMU did not have enough time to restore the TIS state while the test case
was already reading its registers. The test case was for example reading
the STS register and received an unexpected value (0xffffffff), which
lead to a segmentation fault later on due to trying to read 0xffff bytes
from the TIS into a buffer.

Cc:  <qemu-stable@nongnu.org>
Reported-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
(cherry picked from commit d9280ea3174700170d39c4cdd3f587f260757711)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>


  Commit: 5a72b8785289c115099c20997984122b5ae0d0ca
      
https://github.com/qemu/qemu/commit/5a72b8785289c115099c20997984122b5ae0d0ca
  Author: Kevin Wolf <kwolf@redhat.com>
  Date:   2024-11-08 (Fri, 08 Nov 2024)

  Changed paths:
    M block/raw-format.c

  Log Message:
  -----------
  raw-format: Fix error message for invalid offset/size

s->offset and s->size are only set at the end of the function and still
contain the old values when formatting the error message. Print the
parameters with the new values that we actually checked instead.

Fixes: 500e2434207d ('raw-format: Split raw_read_options()')
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20240829185527.47152-1-kwolf@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 04bbc3ee52b32ac465547bb40c1f090a1b8f315a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>


  Commit: b3265c7d8688866f5f2ae03a2e5c89ec5a22ce07
      
https://github.com/qemu/qemu/commit/b3265c7d8688866f5f2ae03a2e5c89ec5a22ce07
  Author: Richard Henderson <richard.henderson@linaro.org>
  Date:   2024-11-08 (Fri, 08 Nov 2024)

  Changed paths:
    M tcg/tcg.c

  Log Message:
  -----------
  tcg: Reset data_gen_ptr correctly

This pointer needs to be reset after overflow just like
code_buf and code_ptr.

Cc: qemu-stable@nongnu.org
Fixes: 57a269469db ("tcg: Infrastructure for managing constant pools")
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit a7cfd751fb269de4a93bf1658cb13911c7ac77cc)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>


  Commit: 34198e465ab46baccc7809cfbc26c9a9b5843075
      
https://github.com/qemu/qemu/commit/34198e465ab46baccc7809cfbc26c9a9b5843075
  Author: Peter Maydell <peter.maydell@linaro.org>
  Date:   2024-11-08 (Fri, 08 Nov 2024)

  Changed paths:
    M target/i386/tcg/sysemu/excp_helper.c

  Log Message:
  -----------
  target/i386: Avoid unreachable variable declaration in mmu_translate()

Coverity complains (CID 1507880) that the declaration "int error_code;"
in mmu_translate() is unreachable code. Since this is only a declaration,
this isn't actually a bug, but:
 * it's a bear-trap for future changes, because if it was changed to
   include an initialization 'int error_code = foo;' then the
   initialization wouldn't actually happen (being dead code)
 * it's against our coding style, which wants declarations to be
   at the start of blocks
 * it means that anybody reading the code has to go and look up
   exactly what the C rules are for skipping over variable declarations
   using a goto

Move the declaration to the top of the function.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20230406155946.3362077-1-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 987b63f24afe027a09b1c549c05a032a477f7e96)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: cherry-pick this for stable-7.2 so that the next patch applies cleanly)


  Commit: 74d8efa1487cffd57aec610c5110d0d9e2dae927
      
https://github.com/qemu/qemu/commit/74d8efa1487cffd57aec610c5110d0d9e2dae927
  Author: Alexander Graf <graf@amazon.com>
  Date:   2024-11-08 (Fri, 08 Nov 2024)

  Changed paths:
    M target/i386/tcg/sysemu/excp_helper.c

  Log Message:
  -----------
  target/i386: Walk NPT in guest real mode

When translating virtual to physical address with a guest CPU that
supports nested paging (NPT), we need to perform every page table walk
access indirectly through the NPT, which we correctly do.

However, we treat real mode (no page table walk) special: In that case,
we currently just skip any walks and translate VA -> PA. With NPT
enabled, we also need to then perform NPT walk to do GVA -> GPA -> HPA
which we fail to do so far.

The net result of that is that TCG VMs with NPT enabled that execute
real mode code (like SeaBIOS) end up with GPA==HPA mappings which means
the guest accesses host code and data. This typically shows as failure
to boot guests.

This patch changes the page walk logic for NPT enabled guests so that we
always perform a GVA -> GPA translation and then skip any logic that
requires an actual PTE.

That way, all remaining logic to walk the NPT stays and we successfully
walk the NPT in real mode.

Cc: qemu-stable@nongnu.org
Fixes: fe441054bb3f0 ("target-i386: Add NPT support")
Signed-off-by: Alexander Graf <graf@amazon.com>
Reported-by: Eduard Vlad <evlad@amazon.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240921085712.28902-1-graf@amazon.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit b56617bbcb473c25815d1bf475e326f84563b1de)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>


  Commit: 394cb8a76974db4145975aa075c402648d29ffd6
      
https://github.com/qemu/qemu/commit/394cb8a76974db4145975aa075c402648d29ffd6
  Author: Ilya Leoshkevich <iii@linux.ibm.com>
  Date:   2024-11-08 (Fri, 08 Nov 2024)

  Changed paths:
    M linux-user/ppc/signal.c

  Log Message:
  -----------
  linux-user/ppc: Fix sigmask endianness issue in sigreturn

do_setcontext() copies the target sigmask without endianness handling
and then uses target_to_host_sigset_internal(), which expects a
byte-swapped one. Use target_to_host_sigset() instead.

Fixes: bcd4933a23f1 ("linux-user: ppc signal handling")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20241017125811.447961-2-iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 8704132805cf7a3259d1c5a073b3c2b92afa2616)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>


  Commit: db3299ca1ba491211c7fedff1e86061f8d23e263
      
https://github.com/qemu/qemu/commit/db3299ca1ba491211c7fedff1e86061f8d23e263
  Author: Alex Bennée <alex.bennee@linaro.org>
  Date:   2024-11-08 (Fri, 08 Nov 2024)

  Changed paths:
    M .gitlab-ci.d/check-dco.py
    M .gitlab-ci.d/check-patch.py

  Log Message:
  -----------
  gitlab: make check-[dco|patch] a little more verbose

When git fails the rather terse backtrace only indicates it failed
without some useful context. Add some to make the log a little more
useful.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20241023113406.1284676-11-alex.bennee@linaro.org>
(cherry picked from commit 97f116f9c6fd127b6ed2953993fa9fb05e82f450)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: context fix for stable-7.2)


  Commit: 4f6d0b08be69f4331f8d8d5acef757a6ce0a60fe
      
https://github.com/qemu/qemu/commit/4f6d0b08be69f4331f8d8d5acef757a6ce0a60fe
  Author: Stefan Weil <sw@weilnetz.de>
  Date:   2024-11-08 (Fri, 08 Nov 2024)

  Changed paths:
    M net/colo-compare.c

  Log Message:
  -----------
  Fix calculation of minimum in colo_compare_tcp

GitHub's CodeQL reports a critical error which is fixed by using the MIN macro:

    Unsigned difference expression compared to zero

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Cc: qemu-stable@nongnu.org
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit e29bc931e1699a98959680f6776b48673825762b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>


  Commit: 0643301b955de78f720977c0f0b3662a6d64b169
      
https://github.com/qemu/qemu/commit/0643301b955de78f720977c0f0b3662a6d64b169
  Author: Bernhard Beschow <shentey@gmail.com>
  Date:   2024-11-08 (Fri, 08 Nov 2024)

  Changed paths:
    M net/tap-win32.c

  Log Message:
  -----------
  net/tap-win32: Fix gcc 14 format truncation errors

The patch fixes the following errors generated by GCC 14.2:

../src/net/tap-win32.c:343:19: error: '%s' directive output may be truncated 
writing up to 255 bytes into a region of size 176 [-Werror=format-truncation=]
  343 |              "%s\\%s\\Connection",
      |                   ^~
  344 |              NETWORK_CONNECTIONS_KEY, enum_name);
      |                                       ~~~~~~~~~

../src/net/tap-win32.c:341:9: note: 'snprintf' output between 92 and 347 bytes 
into a destination of size 256
  341 |         snprintf(connection_string,
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~
  342 |              sizeof(connection_string),
      |              ~~~~~~~~~~~~~~~~~~~~~~~~~~
  343 |              "%s\\%s\\Connection",
      |              ~~~~~~~~~~~~~~~~~~~~~
  344 |              NETWORK_CONNECTIONS_KEY, enum_name);
      |              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

../src/net/tap-win32.c:242:58: error: '%s' directive output may be truncated 
writing up to 255 bytes into a region of size 178 [-Werror=format-truncation=]
  242 |         snprintf (unit_string, sizeof(unit_string), "%s\\%s",
      |                                                          ^~
  243 |                   ADAPTER_KEY, enum_name);
      |                                ~~~~~~~~~

../src/net/tap-win32.c:242:9: note: 'snprintf' output between 79 and 334 bytes 
into a destination of size 256
  242 |         snprintf (unit_string, sizeof(unit_string), "%s\\%s",
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  243 |                   ADAPTER_KEY, enum_name);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~

../src/net/tap-win32.c:620:52: error: '%s' directive output may be truncated 
writing up to 255 bytes into a region of size 245 [-Werror=format-truncation=]
  620 |     snprintf (device_path, sizeof(device_path), "%s%s%s",
      |                                                    ^~
  621 |               USERMODEDEVICEDIR,
  622 |               device_guid,
      |               ~~~~~~~~~~~
../src/net/tap-win32.c:620:5: note: 'snprintf' output between 16 and 271 bytes 
into a destination of size 256
  620 |     snprintf (device_path, sizeof(device_path), "%s%s%s",
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  621 |               USERMODEDEVICEDIR,
      |               ~~~~~~~~~~~~~~~~~~
  622 |               device_guid,
      |               ~~~~~~~~~~~~
  623 |               TAPSUFFIX);
      |               ~~~~~~~~~~

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2607
Cc: qemu-stable@nongnu.org
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 75fe36b4e8a994cdf9fd6eb601f49e96b1bc791d)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>


  Commit: 2d3109e239c7044eec6b65083cde987061d97526
      
https://github.com/qemu/qemu/commit/2d3109e239c7044eec6b65083cde987061d97526
  Author: Peter Maydell <peter.maydell@linaro.org>
  Date:   2024-11-08 (Fri, 08 Nov 2024)

  Changed paths:
    M target/arm/internals.h

  Log Message:
  -----------
  target/arm: Don't assert in regime_is_user() for E10 mmuidx values

In regime_is_user() we assert if we're passed an ARMMMUIdx_E10_*
mmuidx value. This used to make sense because we only used this
function in ptw.c and would never use it on this kind of stage 1+2
mmuidx, only for an individual stage 1 or stage 2 mmuidx.

However, when we implemented FEAT_E0PD we added a callsite in
aa64_va_parameters(), which means this can now be called for
stage 1+2 mmuidx values if the guest sets the TCG_ELX.{E0PD0,E0PD1}
bits to enable use of the feature. This will then result in
an assertion failure later, for instance on a TLBI operation:

#6  0x00007ffff6d0e70f in g_assertion_message_expr
    (domain=0x0, file=0x55555676eeba "../../target/arm/internals.h", line=978, 
func=0x555556771d48 <__func__.5> "regime_is_user", expr=<optimised out>)
    at ../../../glib/gtestutils.c:3279
#7  0x0000555555f286d2 in regime_is_user (env=0x555557f2fe00, 
mmu_idx=ARMMMUIdx_E10_0) at ../../target/arm/internals.h:978
#8  0x0000555555f3e31c in aa64_va_parameters (env=0x555557f2fe00, 
va=18446744073709551615, mmu_idx=ARMMMUIdx_E10_0, data=true, el1_is_aa32=false)
    at ../../target/arm/helper.c:12048
#9  0x0000555555f3163b in tlbi_aa64_get_range (env=0x555557f2fe00, 
mmuidx=ARMMMUIdx_E10_0, value=106721347371041) at ../../target/arm/helper.c:5214
#10 0x0000555555f317e8 in do_rvae_write (env=0x555557f2fe00, 
value=106721347371041, idxmap=21, synced=true) at ../../target/arm/helper.c:5260
#11 0x0000555555f31925 in tlbi_aa64_rvae1is_write (env=0x555557f2fe00, 
ri=0x555557fbeae0, value=106721347371041) at ../../target/arm/helper.c:5302
#12 0x0000555556036f8f in helper_set_cp_reg64 (env=0x555557f2fe00, 
rip=0x555557fbeae0, value=106721347371041) at 
../../target/arm/tcg/op_helper.c:965

Since we do know whether these mmuidx values are for usermode
or not, we can easily make regime_is_user() handle them:
ARMMMUIdx_E10_0 is user, and the other two are not.

Cc: qemu-stable@nongnu.org
Fixes: e4c93e44ab103f ("target/arm: Implement FEAT_E0PD")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20241017172331.822587-1-peter.maydell@linaro.org
(cherry picked from commit 1505b651fdbd9af59a4a90876a62ae7ea2d4cd39)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>


  Commit: 8844f4fb0aa2988930739b78898b8f922858ac3a
      
https://github.com/qemu/qemu/commit/8844f4fb0aa2988930739b78898b8f922858ac3a
  Author: Evgenii Prokopiev <evgenii.prokopiev@syntacore.com>
  Date:   2024-11-08 (Fri, 08 Nov 2024)

  Changed paths:
    M target/riscv/csr.c

  Log Message:
  -----------
  target/riscv/csr.c: Fix an access to VXSAT

The register VXSAT should be RW only to the first bit.
The remaining bits should be 0.

The RISC-V Instruction Set Manual Volume I: Unprivileged Architecture

The vxsat CSR has a single read-write least-significant bit (vxsat[0])
that indicates if a fixed-point instruction has had to saturate an output
value to fit into a destination format. Bits vxsat[XLEN-1:1]
should be written as zeros.

Signed-off-by: Evgenii Prokopiev <evgenii.prokopiev@syntacore.com>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20241002084436.89347-1-evgenii.prokopiev@syntacore.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit 5a60026cad4e9dba929cab4f63229e4b9110cf0a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>


  Commit: 74a47fbb890808f99fab0352657b74340e1e5ca1
      
https://github.com/qemu/qemu/commit/74a47fbb890808f99fab0352657b74340e1e5ca1
  Author: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
  Date:   2024-11-08 (Fri, 08 Nov 2024)

  Changed paths:
    M target/riscv/cpu.h

  Log Message:
  -----------
  target/riscv: Correct SXL return value for RV32 in RV64 QEMU

Ensure that riscv_cpu_sxl returns MXL_RV32 when runningRV32 in an
RV64 QEMU.

Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Fixes: 05e6ca5e156 ("target/riscv: Ignore reserved bits in PTE for RV64")
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20240919055048.562-4-zhiwei_liu@linux.alibaba.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit 929e4277c128772bad41cc795995f754cb9991af)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>


  Commit: 085eef5918aa9705acdf20c0d8d471aba62ad625
      
https://github.com/qemu/qemu/commit/085eef5918aa9705acdf20c0d8d471aba62ad625
  Author: Sergey Makarov <s.makarov@syntacore.com>
  Date:   2024-11-08 (Fri, 08 Nov 2024)

  Changed paths:
    M hw/intc/sifive_plic.c

  Log Message:
  -----------
  hw/intc: Don't clear pending bits on IRQ lowering

According to PLIC specification (chapter 5), there
is only one case, when interrupt is claimed. Fix
PLIC controller to match this behavior.

Signed-off-by: Sergey Makarov <s.makarov@syntacore.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20240918140229.124329-3-s.makarov@syntacore.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit a84be2baa9eca8bc500f866ad943b8f63dc99adf)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>


  Commit: 4f9ade22a0846232412d897b39ca42b2e75a590c
      
https://github.com/qemu/qemu/commit/4f9ade22a0846232412d897b39ca42b2e75a590c
  Author: Rob Bradford <rbradford@rivosinc.com>
  Date:   2024-11-08 (Fri, 08 Nov 2024)

  Changed paths:
    M target/riscv/cpu.c

  Log Message:
  -----------
  target/riscv: Set vtype.vill on CPU reset

The RISC-V unprivileged specification "31.3.11. State of Vector
Extension at Reset" has a note that recommends vtype.vill be set on
reset as part of ensuring that the vector extension have a consistent
state at reset.

This change now makes QEMU consistent with Spike which sets vtype.vill
on reset.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-ID: <20240930165258.72258-1-rbradford@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit f8c1f36a2e3dab4935e7c5690e578ac71765766b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>


  Commit: fd9d48a1111544560aeec39944b50291f949e4b7
      
https://github.com/qemu/qemu/commit/fd9d48a1111544560aeec39944b50291f949e4b7
  Author: Anup Patel <apatel@ventanamicro.com>
  Date:   2024-11-08 (Fri, 08 Nov 2024)

  Changed paths:
    M hw/intc/riscv_aplic.c

  Log Message:
  -----------
  hw/intc/riscv_aplic: Fix in_clrip[x] read emulation

The reads to in_clrip[x] registers return rectified input values of the
interrupt sources.

A rectified input value of an interrupt source is defined by the section
"4.5.2 Source configurations (sourcecfg[1]–sourcecfg[1023])" of the RISC-V
AIA specification as:
"rectified input value = (incoming wire value) XOR (source is inverted)"

Update the riscv_aplic_read_input_word() implementation to match the above.

Fixes: e8f79343cfc8 ("hw/intc: Add RISC-V AIA APLIC device emulation")
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-ID: <20240306095722.463296-3-apatel@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit 0678e9f29c2301d0a1afc8d01a78cdfa7ad2ddbd)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>


  Commit: c17bbcade84211049c9822f8be174c21eef97576
      
https://github.com/qemu/qemu/commit/c17bbcade84211049c9822f8be174c21eef97576
  Author: Yong-Xuan Wang <yongxuan.wang@sifive.com>
  Date:   2024-11-08 (Fri, 08 Nov 2024)

  Changed paths:
    M hw/intc/riscv_aplic.c

  Log Message:
  -----------
  hw/intc/riscv_aplic: Check and update pending when write sourcecfg

The section 4.5.2 of the RISC-V AIA specification says that any write
to a sourcecfg register of an APLIC might (or might not) cause the
corresponding interrupt-pending bit to be set to one if the rectified
input value is high (= 1) under the new source mode.

If an interrupt is asserted before the driver configs its interrupt
type to APLIC, it's pending bit will not be set except a relevant
write to a setip or setipnum register. When we write the interrupt
type to sourcecfg register, if the APLIC device doesn't check
rectified input value and update the pending bit, this interrupt
might never becomes pending.

For APLIC.m, we can manully set pending by setip or setipnum
registers in driver. But for APLIC.w, the pending status totally
depends on the rectified input value, we can't control the pending
status via mmio registers. In this case, hw should check and update
pending status for us when writing sourcecfg registers.

Update QEMU emulation to handle "pre-existing" interrupts.

Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20241004104649.13129-1-yongxuan.wang@sifive.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit 2ae6cca1d3389801ee72fc5e58c52573218f3514)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: context fixup in hw/intc/riscv_aplic.c)


  Commit: f7c0528e0cec07c22c280733b0304d9cf65b56ce
      
https://github.com/qemu/qemu/commit/f7c0528e0cec07c22c280733b0304d9cf65b56ce
  Author: Anton Blanchard <antonb@tenstorrent.com>
  Date:   2024-11-08 (Fri, 08 Nov 2024)

  Changed paths:
    M target/riscv/vector_helper.c

  Log Message:
  -----------
  target/riscv: Fix vcompress with rvv_ta_all_1s

vcompress packs vl or less fields into vd, so the tail starts after the
last packed field. This could be more clearly expressed in the ISA,
but for now this thread helps to explain it:

https://github.com/riscv/riscv-v-spec/issues/796

Signed-off-by: Anton Blanchard <antonb@tenstorrent.com>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20241030043538.939712-1-antonb@tenstorrent.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit c128d39edeff337220fc536a3e935bcba01ecb49)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>


  Commit: 3e7842e231fcfdf2b77c41f9c9e1ff6a329a8d24
      
https://github.com/qemu/qemu/commit/3e7842e231fcfdf2b77c41f9c9e1ff6a329a8d24
  Author: Ilya Leoshkevich <iii@linux.ibm.com>
  Date:   2024-11-08 (Fri, 08 Nov 2024)

  Changed paths:
    M target/ppc/translate.c

  Log Message:
  -----------
  target/ppc: Set ctx->opcode for decode_insn32()

divdu (without a dot) sometimes updates cr0, even though it shouldn't.
The reason is that gen_op_arith_divd() checks Rc(ctx->opcode), which is
not initialized. This field is initialized only for instructions that
go through decode_legacy(), and not decodetree.

There already was a similar issue fixed in commit 86e6202a57b1
("target/ppc: Make divw[u] handler method decodetree compatible.").

It's not immediately clear what else may access the uninitialized
ctx->opcode, so instead of playing whack-a-mole and changing the check
to compute_rc0, simply initialize ctx->opcode.

Cc: qemu-stable@nongnu.org
Fixes: 99082815f17f ("target/ppc: Add infrastructure for prefixed insns")
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
(cherry picked from commit c9b8a13a8841e0e23901e57e24ea98eeef16cf91)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>


  Commit: fb64da3c32ac33fd8a557a528fbf8e06d25a8ac8
      
https://github.com/qemu/qemu/commit/fb64da3c32ac33fd8a557a528fbf8e06d25a8ac8
  Author: Peter Maydell <peter.maydell@linaro.org>
  Date:   2024-11-08 (Fri, 08 Nov 2024)

  Changed paths:
    M target/arm/vec_helper.c

  Log Message:
  -----------
  target/arm: Fix SVE SDOT/UDOT/USDOT (4-way, indexed)

Our implementation of the indexed version of SVE SDOT/UDOT/USDOT got
the calculation of the inner loop terminator wrong.  Although we
correctly account for the element size when we calculate the
terminator for the first iteration:
   intptr_t segend = MIN(16 / sizeof(TYPED), opr_sz_n);
we don't do that when we move it forward after the first inner loop
completes.  The intention is that we process the vector in 128-bit
segments, which for a 64-bit element size should mean (1, 2), (3, 4),
(5, 6), etc.  This bug meant that we would iterate (1, 2), (3, 4, 5,
6), (7, 8, 9, 10) etc and apply the wrong indexed element to some of
the operations, and also index off the end of the vector.

You don't see this bug if the vector length is small enough that we
don't need to iterate the outer loop, i.e.  if it is only 128 bits,
or if it is the 64-bit special case from AA32/AA64 AdvSIMD.  If the
vector length is 256 bits then we calculate the right results for the
elements in the vector but do index off the end of the vector. Vector
lengths greater than 256 bits see wrong answers. The instructions
that produce 32-bit results behave correctly.

Fix the recalculation of 'segend' for subsequent iterations, and
restore a version of the comment that was lost in the refactor of
commit 7020ffd656a5 that explains why we only need to clamp segend to
opr_sz_n for the first iteration, not the later ones.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2595
Fixes: 7020ffd656a5 ("target/arm: Macroize helper_gvec_{s,u}dot_idx_{b,h}")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241101185544.2130972-1-peter.maydell@linaro.org
(cherry picked from commit e6b2fa1b81ac6b05c4397237c846a295a9857920)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>


  Commit: 6ebf6c67838e739d42a07071829fd20c5ffafe05
      
https://github.com/qemu/qemu/commit/6ebf6c67838e739d42a07071829fd20c5ffafe05
  Author: Klaus Jensen <k.jensen@samsung.com>
  Date:   2024-11-08 (Fri, 08 Nov 2024)

  Changed paths:
    M hw/nvme/ctrl.c

  Log Message:
  -----------
  hw/nvme: fix handling of over-committed queues

If a host chooses to use the SQHD "hint" in the CQE to know if there is
room in the submission queue for additional commands, it may result in a
situation where there are not enough internal resources (struct
NvmeRequest) available to process the command. For a lack of a better
term, the host may "over-commit" the device (i.e., it may have more
inflight commands than the queue size).

For example, assume a queue with N entries. The host submits N commands
and all are picked up for processing, advancing the head and emptying
the queue. Regardless of which of these N commands complete first, the
SQHD field of that CQE will indicate to the host that the queue is
empty, which allows the host to issue N commands again. However, if the
device has not posted CQEs for all the previous commands yet, the device
will have less than N resources available to process the commands, so
queue processing is suspended.

And here lies an 11 year latent bug. In the absense of any additional
tail updates on the submission queue, we never schedule the processing
bottom-half again unless we observe a head update on an associated full
completion queue. This has been sufficient to handle N-to-1 SQ/CQ setups
(in the absense of over-commit of course). Incidentially, that "kick all
associated SQs" mechanism can now be killed since we now just schedule
queue processing when we return a processing resource to a non-empty
submission queue, which happens to cover both edge cases. However, we
must retain kicking the CQ if it was previously full.

So, apparently, no previous driver tested with hw/nvme has ever used
SQHD (e.g., neither the Linux NVMe driver or SPDK uses it). But then OSv
shows up with the driver that actually does. I salute you.

Fixes: f3c507adcd7b ("NVMe: Initial commit for new storage interface")
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2388
Reported-by: Waldemar Kozaczuk <jwkozaczuk@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
(cherry picked from commit 9529aa6bb4d18763f5b4704cb4198bd25cbbee31)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>


  Commit: 64d9173a02917aeb6289da471b203c363d836f5b
      
https://github.com/qemu/qemu/commit/64d9173a02917aeb6289da471b203c363d836f5b
  Author: Christian Schoenebeck <qemu_oss@crudebyte.com>
  Date:   2024-11-08 (Fri, 08 Nov 2024)

  Changed paths:
    M hw/9pfs/9p.c

  Log Message:
  -----------
  9pfs: fix crash on 'Treaddir' request

A bad (broken or malicious) 9p client (guest) could cause QEMU host to
crash by sending a 9p 'Treaddir' request with a numeric file ID (FID) that
was previously opened for a file instead of an expected directory:

  #0  0x0000762aff8f4919 in __GI___rewinddir (dirp=0xf) at
    ../sysdeps/unix/sysv/linux/rewinddir.c:29
  #1  0x0000557b7625fb40 in do_readdir_many (pdu=0x557bb67d2eb0,
    fidp=0x557bb67955b0, entries=0x762afe9fff58, offset=0, maxsize=131072,
    dostat=<optimized out>) at ../hw/9pfs/codir.c:101
  #2  v9fs_co_readdir_many (pdu=pdu@entry=0x557bb67d2eb0,
    fidp=fidp@entry=0x557bb67955b0, entries=entries@entry=0x762afe9fff58,
    offset=0, maxsize=131072, dostat=false) at ../hw/9pfs/codir.c:226
  #3  0x0000557b7625c1f9 in v9fs_do_readdir (pdu=0x557bb67d2eb0,
    fidp=0x557bb67955b0, offset=<optimized out>,
    max_count=<optimized out>) at ../hw/9pfs/9p.c:2488
  #4  v9fs_readdir (opaque=0x557bb67d2eb0) at ../hw/9pfs/9p.c:2602

That's because V9fsFidOpenState was declared as union type. So the
same memory region is used for either an open POSIX file handle (int),
or a POSIX DIR* pointer, etc., so 9p server incorrectly used the
previously opened (valid) POSIX file handle (0xf) as DIR* pointer,
eventually causing a crash in glibc's rewinddir() function.

Root cause was therefore a missing check in 9p server's 'Treaddir'
request handler, which must ensure that the client supplied FID was
really opened as directory stream before trying to access the
aforementioned union and its DIR* member.

Cc: qemu-stable@nongnu.org
Fixes: d62dbb51f7 ("virtio-9p: Add fidtype so that we can do type ...")
Reported-by: Akihiro Suda <suda.kyoto@gmail.com>
Tested-by: Akihiro Suda <suda.kyoto@gmail.com>
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <E1t8GnN-002RS8-E2@kylie.crudebyte.com>
(cherry picked from commit 042b4ebfd2298ae01553844124f27d651cdb1071)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>


Compare: https://github.com/qemu/qemu/compare/51c136eb0bb2...64d9173a0291

To unsubscribe from these emails, change your notification settings at 
https://github.com/qemu/qemu/settings/notifications



reply via email to

[Prev in Thread] Current Thread [Next in Thread]