[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 8/9] spapr: Implement processor compatibility in

From: Alexander Graf
Subject: Re: [Qemu-devel] [PATCH 8/9] spapr: Implement processor compatibility in ibm, client-architecture-support
Date: Mon, 19 May 2014 11:01:09 +0200
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:24.0) Gecko/20100101 Thunderbird/24.5.0

On 17.05.14 03:45, Alexey Kardashevskiy wrote:
On 05/17/2014 06:46 AM, Alexander Graf wrote:
On 16.05.14 17:57, Alexey Kardashevskiy wrote:
On 05/17/2014 12:16 AM, Alexander Graf wrote:
On 15.05.14 13:28, Alexey Kardashevskiy wrote:
Modern Linux kernels support last POWERPC CPUs so when a kernel boots,
in most cases it can find a matching cpu_spec in the kernel's cpu_specs
list. However if the kernel is quite old, it may be missing a definition
of the actual CPU. To provide an ability for old kernels to work on modern
hardware, a Processor Compatibility Mode has been introduced
by the PowerISA specification.

   From the hardware prospective, it is supported by the Processor
Compatibility Register (PCR) which is defined in PowerISA. The register
enables one of the compatibility modes (2.05/2.06/2.07).
Since PCR is a hypervisor privileged register and cannot be
accessed from the guest, the mode selection is done via
ibm,client-architecture-support (CAS) RTAS call using which the guest
specifies what "raw" and "architected" CPU versions it supports.
QEMU works out the best match, changes a "cpu-version" property of
every CPU and notifies the guest about the change by setting these
properties in the buffer passed as a response on a custom H_CAS hypercall.

Signed-off-by: Alexey Kardashevskiy <address@hidden>
    hw/ppc/spapr.c       | 40 +++++++++++++++++++++++++
    hw/ppc/spapr_hcall.c | 83
    trace-events         |  5 ++++
    3 files changed, 128 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index a2c9106..a0882a1 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -35,6 +35,7 @@
    #include "kvm_ppc.h"
    #include "mmu-hash64.h"
    #include "cpu-models.h"
+#include "qom/cpu.h"
      #include "hw/boards.h"
    #include "hw/ppc/ppc.h"
@@ -592,11 +593,50 @@ int spapr_h_cas_compose_response(target_ulong addr,
target_ulong size)
        void *fdt;
        sPAPRDeviceTreeUpdateHeader hdr = { .version_id = 1 };
+    CPUState *cs;
+    int smt = kvmppc_smt_threads();
          size -= sizeof(hdr);
          fdt = g_malloc0(size);
        _FDT((fdt_create(fdt, size)));
+    _FDT((fdt_begin_node(fdt, "cpus")));
+    CPU_FOREACH(cs) {
+        PowerPCCPU *cpu = POWERPC_CPU(cs);
+        DeviceClass *dc = DEVICE_GET_CLASS(cpu);
+        int smpt = spapr_get_compat_smp_threads(cpu);
+        int i, index = ppc_get_vcpu_dt_id(cpu);
+        uint32_t servers_prop[smpt];
+        uint32_t gservers_prop[smpt * 2];
+        char tmp[32];
+        if ((index % smt) != 0) {
+            continue;
+        }
+        snprintf(tmp, 32, "address@hidden", dc->fw_name, index);
+        trace_spapr_cas_add_subnode(tmp);
+        _FDT((fdt_begin_node(fdt, tmp)));
+        _FDT((fdt_property_cell(fdt, "cpu-version", cpu->cpu_version)));
+        /* Build interrupt servers and gservers properties */
+        for (i = 0; i < smpt; i++) {
+            servers_prop[i] = cpu_to_be32(index + i);
+            /* Hack, direct the group queues back to cpu 0 */
+            gservers_prop[i*2] = cpu_to_be32(index + i);
+            gservers_prop[i*2 + 1] = 0;
+        }
+        _FDT((fdt_property(fdt, "ibm,ppc-interrupt-server#s",
+                           servers_prop, sizeof(servers_prop))));
+        _FDT((fdt_property(fdt, "ibm,ppc-interrupt-gserver#s",
+                           gservers_prop, sizeof(gservers_prop))));
+        _FDT((fdt_end_node(fdt)));
+    }
+    _FDT((fdt_end_node(fdt)));
Why is this so much code? Can we only replace full nodes?
It is a diff tree, in only has properties to change.
So why do we change so much then? :)

3 (three) properties are "much" now? :)

Then please
extract the original CPU node creation into a function and just call it
from the original generation and from here.

If we can also replace single properties I'd say we only need to override
Mmm. The user could run qemu with threads=8 on POWER8 with SLES11 which
must not see 8 threads but it can update itself and reboot with the kernel
which is aware of POWER8. You are saying we do not want to support that?
First off, I think that practically speaking people will want to use
2-thread granularity on POWER8 because that gives them the best load
exploitation on the system.
Practically speaking, people will want to use kernels which can work on
POWER8 in raw mode, no? Here we are trying to fix older guests.

So if we just disable CPU nodes that would not be visible usually (there
was a disabled flag in cpu nodes, right?) for threads that are incompatible
with a new CAS mode, we can then remove the disabled flag when the guest
accepts more threads again.

That means we could expose 8 threads (POWER8), guest only supports 2
threads (POWER6), so we waste 6 threads. But the remaining 2 will be fast
;). We basically degrade the system to SMT2 mode.
We do not want to show more threads (i.e. bigger ibm,ppc-interrupt-server#s
properties) if we do not support more threads as we do not really know what
guest kernel (including AIX) or some ppc64_cpu/drmgr/whatever tool will do
if they find more.

What does pHyp do here?
I wish it was easy to verify :)

I fail to see what exactly your objections are all about, I definitely miss
something big here. Is it a minor code duplication or some dangerous bug?
If it is just a code duplication, I could do something about it.

My main concern is that this is a big chunk of magic code that swizzles around random device tree properties without an easy to follow explanation that would show people what exactly the code does.

I think you can overcome all of this by combining the function you introduce here with the one that we already have for cpu node generation. You would of course need to parameterize it.

While you're at the process of combining the functions, you will realize that some things are not exactly obvious to every reader. For example

+        if ((index % smt) != 0) {
+            continue;
+        }

is based on the fact that only cores are listed in /cpus. But the code doesn't tell you, so if anyone who is not Alexey wants to debug this code and find out what's going wrong, he has to sit down for a few minutes are read up on that fact. It'd be a lot nicer to hint him towards the correct direction. This could either be a comment or reader friendly code such as

  for_each_core(...) {

at which point the reader would immediately know what we're trying to do.

The same goes for the interrupt servers and gservers properties. If we don't explain the format at least by variable / function names in the code, you either have to know the format off the top of your head or constantly have the sPAPR spec (which is not public) lying next to you. This increases the barrier to understanding the code.

For example, if you would just extract the interrupt server generation into a function and give it parameters such as "num_threads" and give every value a good name before you assign it to the property array, the code suddenly becomes self-documenting. If you then share that function with the normal cpu node generation, anyone who fixes a bug in it also only has to look at a single spot.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]