From: Paolo Bonzini
Subject: Re: [Qemu-devel] [RFC PATCH] memory: Don't use memcpy for ram marked as skip_dump
Date: Sat, 22 Oct 2016 05:14:21 -0400 (EDT)
----- Original Message -----
> From: "Alex Williamson" <address@hidden>
> To: address@hidden
> Cc: address@hidden, "thorsten kohfeldt" <address@hidden>
> Sent: Friday, October 21, 2016 7:11:44 PM
> Subject: [RFC PATCH] memory: Don't use memcpy for ram marked as skip_dump
>
> With a vfio assigned device we lay down a base MemoryRegion registered
> as an IO region, giving us read & write accessors. If the region
> supports mmap, we lay down a higher priority sub-region MemoryRegion
> on top of the base layer initialized as a RAM pointer to the mmap.
> Finally, if we have any quirks for the device (i.e. address ranges that
> need additional virtualization support), we put another IO sub-region
> on top of the mmap MemoryRegion. When this is flattened, we now
> potentially have sub-page mmap MemoryRegions exposed which cannot be
> directly mapped through KVM.
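
To make the layering above concrete, here is a toy model of how overlapping regions resolve by priority. It is only a sketch: `struct region`, `resolve()` and the `bar_layout` table are hypothetical names invented for illustration, not QEMU types, and the 4 KiB BAR with a 4-byte quirk window at offset 0x70 is an assumed layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for the three layers described above; not QEMU types. */
enum kind { KIND_IO_BASE, KIND_RAM_MMAP, KIND_IO_QUIRK };

struct region {
    uint64_t start;   /* first byte covered */
    uint64_t end;     /* one past the last byte covered */
    int priority;     /* higher priority wins where regions overlap */
    enum kind kind;
};

/* Return the highest-priority region covering addr, or NULL if none does. */
static const struct region *resolve(const struct region *r, int n,
                                    uint64_t addr)
{
    const struct region *best = NULL;

    assert(n > 0);
    for (int i = 0; i < n; i++) {
        if (addr >= r[i].start && addr < r[i].end &&
            (!best || r[i].priority > best->priority)) {
            best = &r[i];
        }
    }
    return best;
}

/* Assumed layout: 4 KiB BAR, quirk window covering bytes 0x70..0x73. */
static const struct region bar_layout[] = {
    { 0x000, 0x1000, 0, KIND_IO_BASE  },  /* pread/pwrite accessors */
    { 0x000, 0x1000, 1, KIND_RAM_MMAP },  /* RAM pointer to the mmap */
    { 0x070, 0x0074, 2, KIND_IO_QUIRK },  /* quirk on top of the mmap */
};
```

With this layout, `resolve(bar_layout, 3, 0x070)` lands on the quirk, while 0x06c and 0x074 land on the mmap layer. The page is therefore split into mmap pieces smaller than a page on either side of the quirk, which is the sub-page case described above.
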
>
> This is as expected, but a subtle detail of this is that we end up
> with two different access mechanisms through QEMU. If we disable the
> mmap MemoryRegion, we make use of the IO MemoryRegion and service
> accesses using pread and pwrite to the vfio device file descriptor.
> If the mmap MemoryRegion is enabled and we end up in one of these
> sub-page gaps, QEMU handles the access as RAM, using memcpy to the
> mmap. Using the mmap through QEMU is a subtle difference, but it's
> fine; the problem is the memcpy. My assumption is that memcpy makes
> no guarantees about access width and potentially uses all sorts of
> optimized memory transfers that are not intended for talking to device
> MMIO. It turns out that this has been a problem for Realtek NIC
> assignment, which has such a quirk that creates a sub-page mmap
> MemoryRegion access.
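
As an illustration of the access-width concern, the sketch below contrasts with memcpy by issuing one access of the requested width through a volatile pointer. The names `fixed_width_read()` and `fixed_width_write()` are hypothetical, and it assumes naturally aligned offsets and a compiler that emits a single load or store for an aligned volatile scalar (common in practice, but not something the C standard spells out for device memory).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Read `size` bytes (1, 2 or 4) at base + off using one access of that
 * width. volatile keeps the compiler from merging, splitting or eliding
 * the access; memcpy gives no such promise.
 */
static uint32_t fixed_width_read(const volatile void *base, size_t off,
                                 unsigned size)
{
    const volatile uint8_t *p = (const volatile uint8_t *)base + off;

    assert(size == 1 || size == 2 || size == 4);
    assert(off % size == 0);   /* assumption: naturally aligned access */

    switch (size) {
    case 1:
        return *p;
    case 2:
        return *(const volatile uint16_t *)p;
    default:
        return *(const volatile uint32_t *)p;
    }
}

/* Write the low `size` bytes of val at base + off with one access. */
static void fixed_width_write(volatile void *base, size_t off, uint32_t val,
                              unsigned size)
{
    volatile uint8_t *p = (volatile uint8_t *)base + off;

    assert(size == 1 || size == 2 || size == 4);
    assert(off % size == 0);   /* assumption: naturally aligned access */

    switch (size) {
    case 1:
        *p = (uint8_t)val;
        break;
    case 2:
        *(volatile uint16_t *)p = (uint16_t)val;
        break;
    default:
        *(volatile uint32_t *)p = val;
        break;
    }
}
```

Here `base` would be the pointer returned by mmap on the device file descriptor. A memcpy of the same length is free to use byte, vector or overlapping transfers, which is the behavior the paragraph above is worried about.
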
>
> My proposal to fix this is to leverage the skip_dump flag that we
> already use for special handling of these device-backed MMIO ranges.
> When skip_dump is set for a MemoryRegion, we mark memory access as
> non-direct and automatically insert MemoryRegionOps with basic
> semantics to handle accesses. Note that we only enable dword
> accesses because some devices don't particularly like qword accesses
> (Realtek NICs are such a device). This also fixes memory inspection
> via the xp command in the QEMU monitor.
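
The dword limit can be pictured as follows: a wider access is carried out as a series of accesses no larger than four bytes. This is a rough sketch and does not reproduce QEMU's dispatch code. `read_split()`, `MAX_ACCESS_SIZE`, `struct fake_dev` and `fake_dev_read()` are hypothetical names, and the pieces are assembled in little-endian order as an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ACCESS_SIZE 4u  /* assumption: device tolerates at most dwords */

typedef uint64_t (*reg_read_fn)(void *opaque, uint64_t addr, unsigned size);

/*
 * Read `size` bytes (1, 2, 4 or 8) using accesses no wider than
 * MAX_ACCESS_SIZE, assembling the pieces in little-endian order.
 */
static uint64_t read_split(reg_read_fn rd, void *opaque, uint64_t addr,
                           unsigned size)
{
    unsigned step = size < MAX_ACCESS_SIZE ? size : MAX_ACCESS_SIZE;
    uint64_t mask = (1ULL << (step * 8)) - 1;
    uint64_t val = 0;

    assert(size == 1 || size == 2 || size == 4 || size == 8);

    for (unsigned i = 0; i < size; i += step) {
        val |= (rd(opaque, addr + i, step) & mask) << (i * 8);
    }
    return val;
}

/* Fake device that records how it was accessed, for illustration only. */
struct fake_dev {
    uint32_t regs[4];
    unsigned accesses;  /* number of reads seen */
    unsigned widest;    /* widest read seen, in bytes */
};

static uint64_t fake_dev_read(void *opaque, uint64_t addr, unsigned size)
{
    struct fake_dev *dev = opaque;
    uint64_t mask;

    assert(size == 1 || size == 2 || size == 4);
    assert(addr / 4 < 4 && addr % size == 0);

    mask = (1ULL << (size * 8)) - 1;
    dev->accesses++;
    if (size > dev->widest) {
        dev->widest = size;
    }
    return (dev->regs[addr / 4] >> ((addr % 4) * 8)) & mask;
}
```

Calling `read_split(fake_dev_read, &dev, 0, 8)` reaches the fake device as two 4-byte reads, so a device that dislikes qword accesses never sees one.
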
>
> Please comment. Is this the best way to solve this problem? Thanks
Looks good to me.
Paolo
>
> Reported-by: Thorsten Kohfeldt <address@hidden>
> Signed-off-by: Alex Williamson <address@hidden>
> ---
> include/exec/memory.h | 6 ++++--
> memory.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 48 insertions(+), 2 deletions(-)
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 10d7eac..a4c3acf 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -1464,9 +1464,11 @@ void *qemu_map_ram_ptr(RAMBlock *ram_block, ram_addr_t addr);
>  static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
>  {
>      if (is_write) {
> -        return memory_region_is_ram(mr) && !mr->readonly;
> +        return memory_region_is_ram(mr) &&
> +               !mr->readonly && !memory_region_is_skip_dump(mr);
>      } else {
> -        return memory_region_is_ram(mr) || memory_region_is_romd(mr);
> +        return (memory_region_is_ram(mr) && !memory_region_is_skip_dump(mr)) ||
> +               memory_region_is_romd(mr);
>      }
>  }
>
> diff --git a/memory.c b/memory.c
> index 58f9269..7ed7ca9 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1136,6 +1136,46 @@ const MemoryRegionOps unassigned_mem_ops = {
>      .endianness = DEVICE_NATIVE_ENDIAN,
>  };
>
> +static uint64_t skip_dump_mem_read(void *opaque, hwaddr addr, unsigned size)
> +{
> +    uint64_t val = (uint64_t)~0;
> +
> +    switch (size) {
> +    case 1:
> +        val = *(uint8_t *)(opaque + addr);
> +        break;
> +    case 2:
> +        val = *(uint16_t *)(opaque + addr);
> +        break;
> +    case 4:
> +        val = *(uint32_t *)(opaque + addr);
> +        break;
> +    }
> +
> +    return val;
> +}
> +
> +static void skip_dump_mem_write(void *opaque, hwaddr addr, uint64_t data, unsigned size)
> +{
> +    switch (size) {
> +    case 1:
> +        *(uint8_t *)(opaque + addr) = (uint8_t)data;
> +        break;
> +    case 2:
> +        *(uint16_t *)(opaque + addr) = (uint16_t)data;
> +        break;
> +    case 4:
> +        *(uint32_t *)(opaque + addr) = (uint32_t)data;
> +        break;
> +    }
> +}
> +
> +const MemoryRegionOps skip_dump_mem_ops = {
> +    .read = skip_dump_mem_read,
> +    .write = skip_dump_mem_write,
> +    .endianness = DEVICE_NATIVE_ENDIAN,
> +};
> +
>  bool memory_region_access_valid(MemoryRegion *mr,
>                                  hwaddr addr,
>                                  unsigned size,
> @@ -1366,6 +1406,10 @@ void memory_region_init_ram_ptr(MemoryRegion *mr,
>  void memory_region_set_skip_dump(MemoryRegion *mr)
>  {
>      mr->skip_dump = true;
> +    if (mr->ram && mr->ops == &unassigned_mem_ops) {
> +        mr->ops = &skip_dump_mem_ops;
> +        mr->opaque = mr->ram_block->host;
> +    }
>  }
>
>  void memory_region_init_alias(MemoryRegion *mr,
>
>