qemu-riscv
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] hw/riscv: split RAM into low and high memory


From: Daniel Henrique Barboza
Subject: Re: [PATCH] hw/riscv: split RAM into low and high memory
Date: Mon, 31 Jul 2023 19:46:37 -0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0



On 7/30/23 22:53, Fei Wu wrote:
riscv virt platform's memory started at 0x80000000 and
straddled the 4GiB boundary. Curiously enough, this choice
of a memory layout will prevent from launching a VM with
a bit more than 2000MiB and PCIe pass-thru on an x86 host, due
to identity mapping requirements for the MSI doorbell on x86,
and these (APIC/IOAPIC) live right below 4GiB.

So just split the RAM range into two portions:
- 1 GiB range from 0x80000000 to 0xc0000000.
- The remainder at 0x100000000

...leaving a hole between the ranges.

I am afraid this breaks some existing distro setups, like Ubuntu. After this 
patch
this emulation stopped working:

~/work/qemu/build/qemu-system-riscv64 \
        -machine virt -nographic -m 8G -smp 8 \
        -kernel ./uboot-ubuntu/usr/lib/u-boot/qemu-riscv64_smode/uboot.elf \
        -drive file=snapshot.img,format=qcow2,if=virtio \
        -netdev bridge,id=bridge1,br=virbr0 -device 
virtio-net-pci,netdev=bridge1


This is basically a guest created via the official Canonical tutorial:

https://wiki.ubuntu.com/RISC-V/QEMU

The error being thrown:

=================

Boot HART ID              : 4
Boot HART Domain          : root
Boot HART Priv Version    : v1.12
Boot HART Base ISA        : rv64imafdch
Boot HART ISA Extensions  : time,sstc
Boot HART PMP Count       : 16
Boot HART PMP Granularity : 4
Boot HART PMP Address Bits: 54
Boot HART MHPM Count      : 16
Boot HART MIDELEG         : 0x0000000000001666
Boot HART MEDELEG         : 0x0000000000f0b509


U-Boot 2022.07+dfsg-1ubuntu4.2 (Nov 24 2022 - 18:47:41 +0000)

CPU:   
rv64imafdch_zicbom_zicboz_zicsr_zifencei_zihintpause_zawrs_zfa_zca_zcd_zba_zbb_zbc_zbs_sstc_svadu
Model: riscv-virtio,qemu
DRAM:  Unhandled exception: Store/AMO access fault
EPC: 00000000802018b8 RA: 00000000802126a0 TVAL: 00000000ff733f90

Code: b823 06b2 bc23 06b2 b023 08b2 b423 08b2 (b823 08b2)


resetting ...
System reset not supported on this platform
### ERROR ### Please RESET the board ###
QEMU 8.0.90 monitor - type 'help' for more infor
=================


Based on the change made I can make an educated guess on what is going wrong.
We have another board with a similar memory topology you're making here, the
Microchip Polarfire (microchip_pfsoc.c). We were having some problems with this
board while trying to consolidate the boot process between all boards in
hw/riscv/boot.c because of its non-continuous RAM bank. The full story can be
read in the commit message of 4b402886ac89 ("hw/riscv: change 
riscv_compute_fdt_addr()
semantics") but the short version can be seen in riscv_compute_fdt_addr()
from boot.c:

 - if ram_start is less than 3072MiB, the FDT will be  put at the lowest value
between 3072 MiB and the end of that RAM bank;

- if ram_start is higher than 3072 MiB the FDT will be put at the end of the
RAM bank.

So, after this patch, since riscv_compute_fdt_addr() is being used with the now
lower RAM bank, the fdt is being put in LOW_MEM - fdt_size for any setup that 
has
more than 1Gb RAM, and this breaks assumptions made by uboot and Ubuntu and 
possibly
others that are trying to retrieve the FDT from the gap that you created between
low and hi mem in this patch.

In fact, this same Ubuntu guest I mentioned above will boot if I put only 1 Gb 
of RAM
(-m 1Gb). If I try with -m 1.1Gb I reproduce this error. This can be a 
validation of
the guess I'm making here: Ubuntu is trying to fetch stuff (probably the fdt) 
from
the gap between the memory areas.

This change on top of this patch doesn't work either:

$ git diff
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 8fbdc7220c..dfff48d849 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -1335,9 +1335,16 @@ static void virt_machine_done(Notifier *notifier, void 
*data)
                                          kernel_start_addr, true, NULL);
     }
- fdt_load_addr = riscv_compute_fdt_addr(memmap[VIRT_DRAM].base,
+    if (machine->ram_size < memmap[VIRT_DRAM].size) {
+        fdt_load_addr = riscv_compute_fdt_addr(memmap[VIRT_DRAM].base,
                                            memmap[VIRT_DRAM].size,
                                            machine);
+    } else {
+        fdt_load_addr = riscv_compute_fdt_addr(memmap[VIRT_DRAM_HIGH].base,
+                                           memmap[VIRT_DRAM_HIGH].size,
+                                           machine);
+    }
+
This would put the fdt at the end of the HI RAM for guests with more than 1Gb 
of RAM.
This change in fact makes the situation even worse, breaking setups that were 
working
before with this patch.

There's a chance that reducing the gap between the RAM banks can make Ubuntu 
work
reliably again but now I'm a little cold feet with this change.


I think we'll need some kernel/Opensbi folks to weight in here to see if 
there's a
failsafe memory setup that won't break distros out there and allow your 
passthrough
to work.



Thanks,

Daniel


Signed-off-by: Andrei Warkentin <andrei.warkentin@intel.com>
Signed-off-by: Fei Wu <fei2.wu@intel.com>
---
  hw/riscv/virt.c         | 74 ++++++++++++++++++++++++++++++++++++-----
  include/hw/riscv/virt.h |  1 +
  2 files changed, 66 insertions(+), 9 deletions(-)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index d90286dc46..8fbdc7220c 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -75,7 +75,9 @@
  #error "Can't accomodate all IMSIC groups in address space"
  #endif
-static const MemMapEntry virt_memmap[] = {
+#define LOW_MEM (1 * GiB)
+
+static MemMapEntry virt_memmap[] = {
      [VIRT_DEBUG] =        {        0x0,         0x100 },
      [VIRT_MROM] =         {     0x1000,        0xf000 },
      [VIRT_TEST] =         {   0x100000,        0x1000 },
@@ -96,6 +98,7 @@ static const MemMapEntry virt_memmap[] = {
      [VIRT_PCIE_ECAM] =    { 0x30000000,    0x10000000 },
      [VIRT_PCIE_MMIO] =    { 0x40000000,    0x40000000 },
      [VIRT_DRAM] =         { 0x80000000,           0x0 },
+    [VIRT_DRAM_HIGH] =    { 0x100000000,          0x0 },
  };
/* PCIe high mmio is fixed for RV32 */
@@ -295,15 +298,12 @@ static void create_fdt_socket_cpus(RISCVVirtState *s, int 
socket,
      }
  }
-static void create_fdt_socket_memory(RISCVVirtState *s,
-                                     const MemMapEntry *memmap, int socket)
+static void create_fdt_socket_mem_range(RISCVVirtState *s, uint64_t addr,
+                                        uint64_t size, int socket)
  {
      char *mem_name;
-    uint64_t addr, size;
      MachineState *ms = MACHINE(s);
- addr = memmap[VIRT_DRAM].base + riscv_socket_mem_offset(ms, socket);
-    size = riscv_socket_mem_size(ms, socket);
      mem_name = g_strdup_printf("/memory@%lx", (long)addr);
      qemu_fdt_add_subnode(ms->fdt, mem_name);
      qemu_fdt_setprop_cells(ms->fdt, mem_name, "reg",
@@ -313,6 +313,34 @@ static void create_fdt_socket_memory(RISCVVirtState *s,
      g_free(mem_name);
  }
+static void create_fdt_socket_memory(RISCVVirtState *s,
+                                     const MemMapEntry *memmap, int socket)
+{
+    uint64_t addr, size;
+    MachineState *mc = MACHINE(s);
+    uint64_t sock_offset = riscv_socket_mem_offset(mc, socket);
+    uint64_t sock_size = riscv_socket_mem_size(mc, socket);
+
+    if (sock_offset < memmap[VIRT_DRAM].size) {
+        uint64_t low_mem_end = memmap[VIRT_DRAM].base + memmap[VIRT_DRAM].size;
+
+        addr = memmap[VIRT_DRAM].base + sock_offset;
+        size = MIN(low_mem_end - addr, sock_size);
+        create_fdt_socket_mem_range(s, addr, size, socket);
+
+        size = sock_size - size;
+        if (size > 0) {
+            create_fdt_socket_mem_range(s, memmap[VIRT_DRAM_HIGH].base,
+                                        size, socket);
+        }
+    } else {
+        addr = memmap[VIRT_DRAM_HIGH].base +
+               sock_offset - memmap[VIRT_DRAM].size;
+
+        create_fdt_socket_mem_range(s, addr, sock_size, socket);
+    }
+}
+
  static void create_fdt_socket_clint(RISCVVirtState *s,
                                      const MemMapEntry *memmap, int socket,
                                      uint32_t *intc_phandles)
@@ -1334,10 +1362,12 @@ static void virt_machine_done(Notifier *notifier, void 
*data)
static void virt_machine_init(MachineState *machine)
  {
-    const MemMapEntry *memmap = virt_memmap;
+    MemMapEntry *memmap = virt_memmap;
      RISCVVirtState *s = RISCV_VIRT_MACHINE(machine);
      MemoryRegion *system_memory = get_system_memory();
      MemoryRegion *mask_rom = g_new(MemoryRegion, 1);
+    MemoryRegion *ram_below_4g, *ram_above_4g;
+    uint64_t ram_size_low, ram_size_high;
      char *soc_name;
      DeviceState *mmio_irqchip, *virtio_irqchip, *pcie_irqchip;
      int i, base_hartid, hart_count;
@@ -1448,6 +1478,17 @@ static void virt_machine_init(MachineState *machine)
          }
      }
+ if (machine->ram_size > LOW_MEM) {
+        ram_size_high = machine->ram_size - LOW_MEM;
+        ram_size_low = LOW_MEM;
+    } else {
+        ram_size_high = 0;
+        ram_size_low = machine->ram_size;
+    }
+
+    memmap[VIRT_DRAM].size = ram_size_low;
+    memmap[VIRT_DRAM_HIGH].size = ram_size_high;
+
      if (riscv_is_32bit(&s->soc[0])) {
  #if HOST_LONG_BITS == 64
          /* limit RAM size in a 32-bit system */
@@ -1460,7 +1501,8 @@ static void virt_machine_init(MachineState *machine)
          virt_high_pcie_memmap.size = VIRT32_HIGH_PCIE_MMIO_SIZE;
      } else {
          virt_high_pcie_memmap.size = VIRT64_HIGH_PCIE_MMIO_SIZE;
-        virt_high_pcie_memmap.base = memmap[VIRT_DRAM].base + 
machine->ram_size;
+        virt_high_pcie_memmap.base = memmap[VIRT_DRAM_HIGH].base +
+                                     memmap[VIRT_DRAM_HIGH].size;
          virt_high_pcie_memmap.base =
              ROUND_UP(virt_high_pcie_memmap.base, virt_high_pcie_memmap.size);
      }
@@ -1468,8 +1510,22 @@ static void virt_machine_init(MachineState *machine)
      s->memmap = virt_memmap;
/* register system main memory (actual RAM) */
+    ram_below_4g = g_malloc(sizeof(*ram_below_4g));
+    memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", machine->ram,
+                             0, memmap[VIRT_DRAM].size);
      memory_region_add_subregion(system_memory, memmap[VIRT_DRAM].base,
-        machine->ram);
+                                ram_below_4g);
+
+    if (memmap[VIRT_DRAM_HIGH].size) {
+        ram_above_4g = g_malloc(sizeof(*ram_above_4g));
+        memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g",
+                                 machine->ram,
+                                 memmap[VIRT_DRAM].size,
+                                 memmap[VIRT_DRAM_HIGH].size);
+        memory_region_add_subregion(system_memory,
+                                    memmap[VIRT_DRAM_HIGH].base,
+                                    ram_above_4g);
+    }
/* boot rom */
      memory_region_init_rom(mask_rom, NULL, "riscv_virt_board.mrom",
diff --git a/include/hw/riscv/virt.h b/include/hw/riscv/virt.h
index e5c474b26e..36004eb6ef 100644
--- a/include/hw/riscv/virt.h
+++ b/include/hw/riscv/virt.h
@@ -79,6 +79,7 @@ enum {
      VIRT_IMSIC_S,
      VIRT_FLASH,
      VIRT_DRAM,
+    VIRT_DRAM_HIGH,
      VIRT_PCIE_MMIO,
      VIRT_PCIE_PIO,
      VIRT_PLATFORM_BUS,



reply via email to

[Prev in Thread] Current Thread [Next in Thread]