[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Qemu-devel] [PATCH v3 0/4] tcg: support heterogenous CPU clusters
From: |
Peter Maydell |
Subject: |
[Qemu-devel] [PATCH v3 0/4] tcg: support heterogenous CPU clusters |
Date: |
Mon, 21 Jan 2019 15:22:14 +0000 |
In the process of debugging my board code that uses this
feature, I realized I needed to support the CPU object
not being a direct child of the cluster object, which
also lead me to the object_child_foreach_recursive()
function which is a neater way to iterate children anyway.
I also figured out we could add an assert which catches
most "parenting CPU vs cluster realize ordering" bugs.
Only patch 2 has changed here, others are already reviewed.
Changes v2->v3:
* allow CPU objects to be indirect children of the cluster;
this is useful for ARMv7M, where the CPU object is a child
of the armv7m container and the board code that sets up
the cluster object only has the armv7m container object:
this is done by using object_child_foreach_recursive()
rather than an open-coded child iteration
* add an assertion that the cluster has at least one CPU,
which catches the easiest-to-make errors when creating
and populating the cluster
Changes v1->v2:
* just fixing a bug in patch 3 that meant we were
accidentally never reusing TBs. Patch 3 is the only
one that needs review.
Original cover letter included below for context.
thanks
-- PMM
Currently, TCG implicitly assumes that all CPUs are alike,
because we have a single cache of generated TBs and we
don't account for which CPU generated the code or is looking
for the TB when adding or searching for generated TBs.
This can go wrong in two situations:
(1) two CPUs have different physical address spaces
(eg CPU 1 has one lot of RAM/ROM, and CPU 2 has
different RAM/ROM): the physical address alone is
then not sufficient to distinguish what code to run
(2) two CPUs have different features (eg FPU vs no FPU):
since our TCG frontends bake assumptions into the
generated code about the presence/absence of features,
if a CPU with FPU picks up a TB for one generated
without an FPU it will behave wrongly
This is unfortunate, because we already have one board in
the tree which has a heterogenous setup: the xlnx-zcu102.
This board has 4xCortex-A53 and 2xCortex-R5F. Although
most "normal" guest code doesn't run into this, it is
possible to demonstrate the bug with it. There's a test case
at http://people.linaro.org/~peter.maydell/xlnx-het.tgz
which arranges to run a fragment of code in AArch32 which
should behave differently on the two CPUs, but does not
(either the A53 gets the behaviour the R5 should have, or
vice-versa).
This patchset adds the "cluster ID" to the set of things
we include in the TCG TB hash, which means that the two
sets of CPUs in this board can no longer accidentally
get TBs generated by the other cluster. It fixes the
bug demonstrated by the test case.
Design notes:
* Adding cluster ID to the hash is RTH's suggestion. It might
also be possible to make each cluster have a TCGContext
code generation context, and have the CPUs use the right
one for the cluster, but that would be a lot more code
rework compared to this approach.
* I put the cluster number in the existing cflags, since
we have plenty of spare bits there. If in future we
need either more than 256 clusters (unlikely) or want
to reclaim bits in the cflags field for some other
purpose we can always extend our xxhash from 28 to 32 bytes.
(I actually wrote the patch to do that before realising
I didn't need it...)
* The cluster object now fills in the CPU object's
cluster_index field when the cluster object is realized.
This imposes an ordering constraint that all CPUs must
be added to a cluster before realizing the cluster. That's
not a big deal, except that unfortunately QOM provides
no way that I could find to enforce this. If anybody
has a suggestion for a better approach that would be great.
Patch 1 therefore makes sure our only current user of the
cluster feature obeys the ordering constraint.
* Patch 4 is a tidyup to the gdbstub code which is possible
now that the CPUState struct has the cluster ID in it.
I believe that this fix is basically all we need to support
heterogenous setups (assuming of course that you just mean
"within an architecture family" like Arm, rather than
systems with say both an Arm and a Microblaze core in them).
The other things I have considered are:
* code that calls cpu_physical_memory_read() and friends,
uses address_space_memory, or otherwise assumes there's
only a single view of "CPU memory": I sent some patches
out before Christmas that fixed these in generic code
like the monitor and disassembler. There are also some
uses in target-arch or device-specific code, which I
think we can in practice ignore unless/until we come to
implement a board that is heterogenous and also uses those
devices or target architectures.
* handling of TB invalidation: I think this should Just Work,
because tb_invalidate_phys_addr() takes an AddressSpace+hwaddr,
which it converts into a ramaddr for (the badly misnamed)
tb_invalidate_phys_page_range(). So the TB caching all works
on ramaddrs, and if CPU 1 writes to some ram X that's mapped
at physaddr A for it but at physaddr B for CPU 2, we will
still correctly invalidate any TBs that CPU 2 had for code
that from its point of view lives at physaddr B.
Note that translate-all.c has a single l1_map[] structure
which will have PageDesc entries for pages for all CPUs in
the system. I believe this is OK because the only thing
we use them for is TB invalidation.
* dirty memory tracking: like TB invalidation, this is OK
because it all works either on MemoryRegions or on ramaddrs,
not on physaddrs.
Have I missed anything that we also need to fix ?
thanks
-- PMM
Peter Maydell (4):
hw/arm/xlx-zynqmp: Realize cluster after putting RPUs in it
qom/cpu: Add cluster_index to CPUState
accel/tcg: Add cluster number to TCG TB hash
gdbstub: Simplify gdb_get_cpu_pid() to use cpu->cluster_index
include/exec/exec-all.h | 4 +++-
include/hw/cpu/cluster.h | 24 ++++++++++++++++++++
include/qom/cpu.h | 7 ++++++
accel/tcg/cpu-exec.c | 3 +++
accel/tcg/translate-all.c | 3 +++
gdbstub.c | 48 ++++-----------------------------------
hw/arm/xlnx-zynqmp.c | 4 ++--
hw/cpu/cluster.c | 46 +++++++++++++++++++++++++++++++++++++
qom/cpu.c | 1 +
9 files changed, 94 insertions(+), 46 deletions(-)
--
2.20.1
- [Qemu-devel] [PATCH v3 0/4] tcg: support heterogenous CPU clusters,
Peter Maydell <=