qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level


From: Qiuhao Li
Subject: Re: [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level
Date: Fri, 17 Dec 2021 06:27:44 +0000

Thanks Alex. It seems this patch sets and checks if the destination device is busy. But how about the data transfers not triggered directly by PMIO/MMIO handlers? For example:

1. Device A Timer's callback -> Device A MMIO handler
2. Device A BH's callback -> Device A MMIO handler

In these situations, when A launches a DMA to itself, the dev->engaged_in_direct_io is not set, so the operation is allowed. Maybe we should log the source and check the destination when we launch data transfers. Is there a way to do that?

Below is a reproducer in NVMe which triggers DMA in a timer's callback (nvme_process_sq). I can still trigger use-after-free exception with this patch on qemu-6.1.0:

cat << EOF | ./qemu-system-x86_64 -display none -machine accel=qtest \
-machine q35 -nodefaults -drive file=null-co://,if=none,format=raw,id=disk0 \
-device nvme,drive=disk0,serial=1 -qtest stdio \

outl 0xcf8 0x80000810               /* MLBAR (BAR0) – Memory Register Base Address, lower 32-bits */
outl 0xcfc 0xe0000000               /* MMIO Base Address = 0xe0000000 */
outl 0xcf8 0x80000804               /* CMD - Command */
outw 0xcfc 0x06                     /* Bus Master Enable, Memory Space Enable */
write 0xe0000024 0x4 0x02000200     /* [3] 3.1.8, Admin Queue Attributes */
write 0xe0000028 0x4 0x00100000     /* asq = 0x1000 */
write 0xe0000030 0x4 0x00200000     /* acq = 0x2000 */
write 0xe0000014 0x4 0x01004600     /* [3] 3.1.5, Controller Configuration, start ctrl */
write 0xe0001000 0x1 0x01           /* [3] 3.1.24, SQyTDBL – Submission Queue y Tail Doorbell */
write 0x1000 0x1 0x02               /* cmd->opcode, NVME_ADM_CMD_GET_LOG_PAGE, nvme_get_log() */
write 0x1018 0x4 0x140000e0         /* prp1 = 0xe0000014, NVME_REG_CC, nvme_ctrl_reset() */
write 0x1028 0x4 0x03000004         /* cmd->cdw10, lid = 3 NVME_LOG_FW_SLOT_INFO, nvme_fw_log_info, buf_len = 4 */
write 0x1030 0x4 0xfc010000         /* cmd->cdw12 = 0x1fc, Log Page Offset, trans_len = sizeof(fw_log) - 0x1fc = 4 */
clock_step
EOF

CC: Mauro Matteo Cascella and Philippe Mathieu-Daudé. Should we put the reproducer above to https://gitlab.com/qemu-project/qemu/-/issues/556?


From: Alexander Bulekov <alxndr@bu.edu>
Sent: Friday, December 17, 2021 11:08
To: qemu-devel@nongnu.org <qemu-devel@nongnu.org>
Cc: Alexander Bulekov <alxndr@bu.edu>; Philippe Mathieu-Daudé <philmd@redhat.com>; Mauro Matteo Cascella <mcascell@redhat.com>; Qiuhao Li <Qiuhao.Li@outlook.com>; Peter Xu <peterx@redhat.com>; Jason Wang <jasowang@redhat.com>; David Hildenbrand <david@redhat.com>; Gerd Hoffmann <kraxel@redhat.com>; Peter Maydell <peter.maydell@linaro.org>; Li Qiang <liq3ea@gmail.com>; Thomas Huth <thuth@redhat.com>; Laurent Vivier <lvivier@redhat.com>; Bandan Das <bsd@redhat.com>; Edgar E . Iglesias <edgar.iglesias@gmail.com>; Darren Kenny <darren.kenny@oracle.com>; Bin Meng <bin.meng@windriver.com>; Paolo Bonzini <pbonzini@redhat.com>; Stefan Hajnoczi <stefanha@redhat.com>; Daniel P. Berrangé <berrange@redhat.com>; Eduardo Habkost <eduardo@habkost.net>
Subject: [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level
 
Here's my shot at fixing dma-reentracy issues. This patch adds a flag to
the DeviceState, which is set/checked when we call an accessor
associated with the device's IO MRs.

The problem, in short, as I understand it: For the vast majority of
cases, we want to prevent a device from accessing it's own PIO/MMIO
regions over DMA.

This patch/solution is based on some assumptions:
1. DMA accesses that hit mmio regions are only dangerous if they end up
interacting with memory-regions belonging to the device initiating the
DMA.
Not dangerous:  sdhci_pio->dma_write->e1000_mmio
Dangerous:      sdhci_pio->dma_write->sdhci_mmio

2. Most devices do not interact with their own PIO/MMIO memory-regions
using DMA.

3. There is no way for there to be multiple simultaneous accesses to a
device's PIO/MMIO memory-regions.

4. All devices are QOMified :-)

With this patch, I wasn't able to reproduce the issues being tracked
here, with QTest reproducers:
https://gitlab.com/qemu-project/qemu/-/issues/556

This passes the i386 qos/qtests for me and I was able to boot some linux/windows
VMs with basic devices configured, without any apparent problems.

Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: Mauro Matteo Cascella <mcascell@redhat.com>
Cc: Qiuhao Li <Qiuhao.Li@outlook.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Li Qiang <liq3ea@gmail.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: Bandan Das <bsd@redhat.com>
Cc: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Cc: Darren Kenny <darren.kenny@oracle.com>
Cc: Bin Meng <bin.meng@windriver.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>

Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
---
 include/hw/qdev-core.h |  1 +
 softmmu/memory.c       | 15 +++++++++++++++
 softmmu/trace-events   |  1 +
 3 files changed, 17 insertions(+)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 20d3066595..32f7c779ab 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -193,6 +193,7 @@ struct DeviceState {
     int instance_id_alias;
     int alias_required_for_version;
     ResettableState reset;
+    int engaged_in_direct_io;
 };
 
 struct DeviceListener {
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 7340e19ff5..255c3c602f 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -532,6 +532,7 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
     uint64_t access_mask;
     unsigned access_size;
     unsigned i;
+    DeviceState *dev = NULL;
     MemTxResult r = MEMTX_OK;
 
     if (!access_size_min) {
@@ -541,6 +542,17 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
         access_size_max = 4;
     }
 
+    /* Do not allow more than one simultanous access to a device's IO Regions */
+    if (mr->owner &&
+        !mr->ram_device && !mr->ram && !mr->rom_device && !mr->readonly) {
+        dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);
+        if (dev->engaged_in_direct_io) {
+            trace_memory_region_reentrant_io(get_cpu_index(), mr, addr, size);
+            return MEMTX_ERROR;
+        }
+        dev->engaged_in_direct_io = true;
+    }
+
     /* FIXME: support unaligned access? */
     access_size = MAX(MIN(size, access_size_max), access_size_min);
     access_mask = MAKE_64BIT_MASK(0, access_size * 8);
@@ -555,6 +567,9 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
                         access_mask, attrs);
         }
     }
+    if (dev) {
+        dev->engaged_in_direct_io = false;
+    }
     return r;
 }
 
diff --git a/softmmu/trace-events b/softmmu/trace-events
index 9c88887b3c..d7228316db 100644
--- a/softmmu/trace-events
+++ b/softmmu/trace-events
@@ -13,6 +13,7 @@ memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, u
 memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size, const char *name) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u name '%s'"
 memory_region_subpage_read(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_subpage_write(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
+memory_region_reentrant_io(int cpu_index, void *mr, uint64_t offset, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" size %u"
 memory_region_ram_device_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_ram_device_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_sync_dirty(const char *mr, const char *listener, int global) "mr '%s' listener '%s' synced (global=%d)"
--
2.33.0


reply via email to

[Prev in Thread] Current Thread [Next in Thread]