Re: [Qemu-devel] [RFC PATCH 00/34] Multi Architecture System Emulation

From: Peter Crosthwaite
Subject: Re: [Qemu-devel] [RFC PATCH 00/34] Multi Architecture System Emulation
Date: Mon, 11 May 2015 01:21:01 -0700

On Mon, May 11, 2015 at 12:13 AM, Peter Maydell
<address@hidden> wrote:
> On 11 May 2015 at 07:29, Peter Crosthwaite <address@hidden> wrote:
>> This is target-multi, a system-mode build that can support multiple
>> cpu-types. Patches 1-3 are the main infrastructure. The hard part
>> is the per-target changes needed to get each arch into an includable
>> state.
> Interesting. This is something I'd thought we were still some way
> from being able to do :-)
>> The hardest part is what to do about bootloading. Currently each arch
>> has its own architecture-specific bootloading, which may assume a
>> single architecture. I have applied some hacks to at least get this
>> RFC testable using a -kernel -firmware split, but going forward, being
>> able to associate an elf/image with a cpu explicitly needs to be
>> solved.
> My first thought would be to leave the -kernel/-firmware stuff as
> legacy (or at least with semantics defined by the board model in use)
> and have per-CPU QOM properties for setting up images for genuinely
> multi-CPU configs.


>> For the implementation of this series, the trickiest part is cpu.h
>> inclusion management. There is now more than one cpu.h, and different
>> parts of the tree need a different include scheme. target-multi defines
>> its own cpu.h, which is the bare minimum defs needed by core code only.
>> The target-foo/cpu.h's are mostly the same but refactored to reuse common
>> code (with target-multi/cpu-head.h). The inclusion scheme goes something like
>> this (for the multi-arch build):
>> 1: All obj-y modules include target-multi/cpu.h
>> 2: Core code includes no other cpu.h's
>> 3: target-foo/ implementation code includes target-foo/cpu.h
>> 4: System level code (e.g. mach models) can use multiple target-foo/cpu.h's
>> Point 4 means that the cpu.h's need to be refactored so they can be included
>> one after the other. The interrupts for ARM and MB needed to be renamed to
>> avoid namespace collisions. A few other defs needed multiple include guards,
>> and a few defs which were only for user mode are compiled out or relocated.
>> No attempt is made at support for multi-arch linux-user mode (if that even
>> makes sense?).
> I don't think it does make much sense -- our linux-user code hardwires
> a lot of ABI details like size of 'long' and struct layouts. In any
> case we should probably leave it for later.
>> The env as handled by common code now needs to be architecture-agnostic. The
>> MB and ARM envs are refactored to have CPU_COMMON as the first field(s)
>> allowing QOM-style pointer casts to/from a generic env which contains only
>> CPU_COMMON. Might need to lock down some struct packing for that but it
>> works for me so far.
> Have you managed to retain the "generated code passes around a pointer
> to an env which starts with the CPU specific fields"? We have the env
> structs the layout we do because it's a performance hit if the registers
> aren't a short distance away from the pointer...

OK, I knew there had to be a reason. So I guess the simplest
alternative is to pad the env out so the arch-specific env sections are
all the same length, followed by a CPU_COMMON. A bit of union { struct {} }
stuff might just do the trick, although there will be some earthworks
on cpu.h.
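
A rough sketch of the padding idea, to make the layout concrete. This is
not code from the series: every Sketch* name and field below is made up
for illustration, and SketchCommon is only a stand-in for whatever
CPU_COMMON really expands to. The arch registers stay at offset 0 of the
env pointer, and a union pads each arch section to the same size so the
common fields sit at one fixed offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the CPU_COMMON fields. */
typedef struct SketchCommon {
    int interrupt_request;
    int exception_index;
} SketchCommon;

/* Hypothetical arch-specific register blocks of different sizes. */
typedef struct SketchARMRegs { uint32_t regs[16]; uint64_t xregs[32]; } SketchARMRegs;
typedef struct SketchMBRegs  { uint32_t regs[32]; uint32_t sregs[24]; } SketchMBRegs;

/* Union as large as the biggest arch block; used purely as padding. */
typedef union SketchArchPad {
    SketchARMRegs arm;
    SketchMBRegs mb;
} SketchArchPad;

/* Each arch env: registers first (close to the env pointer), common last. */
typedef struct SketchARMEnv {
    union { SketchARMRegs r; SketchArchPad pad; };
    SketchCommon common;
} SketchARMEnv;

typedef struct SketchMBEnv {
    union { SketchMBRegs r; SketchArchPad pad; };
    SketchCommon common;
} SketchMBEnv;

/* What arch-agnostic core code would see. */
typedef struct SketchGenericEnv {
    SketchArchPad pad;
    SketchCommon common;
} SketchGenericEnv;

/* The layout assumption is checked at compile time rather than trusted. */
_Static_assert(offsetof(SketchARMEnv, common) == offsetof(SketchGenericEnv, common),
               "ARM common section is not at the generic offset");
_Static_assert(offsetof(SketchMBEnv, common) == offsetof(SketchGenericEnv, common),
               "MB common section is not at the generic offset");
_Static_assert(offsetof(SketchARMEnv, r) == 0 && offsetof(SketchMBEnv, r) == 0,
               "arch registers must start at the env pointer");

/* Core code reaches the common fields without knowing the arch. */
static SketchCommon *sketch_env_common(void *env)
{
    return (SketchCommon *)((char *)env + offsetof(SketchGenericEnv, common));
}
```

The static asserts are what I mean by locking down the struct packing:
if an arch block outgrows the pad or alignment shifts things, the build
fails instead of core code silently reading the wrong offset.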

>> The helper function namespace is going to be tricky. I haven't tackled the
>> problem just yet, but looking for ideas on how we can avoid prefacing all
>> helpers with arch prefixes to avoid link-time collisions because multiple
>> arches use the same helper names.
>> A lowest common denominator approach is taken on architecture specifics. E.g.
>> TARGET_LONG is 64-bit, and the address space sizes and NUM_MMU_MODES is set
>> to the maximum of all the supported arches.
> ...speaking of performance hits.
> I'm not sure you can do lowest-common-denominator for TARGET_PAGE_SIZE,
> incidentally. At minimum it will result in a perf hit for the CPUs with
> larger pages (because we end up taking the hugepage support paths in the
> cputlb.c code), and at worst TLB flushing in the target's helper routines
> might not take out the right pages. (I think ARM has some theoretical
> bugs here which we don't hit in practice; ARM already has to cope with
> a TARGET_PAGE_SIZE smaller than its usual pagesize, though.)

So I have gone for TARGET_PAGE_BITS = 12 (4K pages) as the only initially
supported config. This will go a long way while we figure out mixing
page sizes on the core level. I chose to ignore the ARM 1k page size
thing as the code comment suggests it's a legacy thing anyway.
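
For reference, the arithmetic behind the concern, as a standalone
sketch. The SKETCH_* macros and helper names are made up for this mail
and are not the QEMU definitions. It assumes a fixed 12-bit core page
and shows how many core pages one target page spans.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed core-level page geometry: 2^12 = 4K pages. */
#define SKETCH_PAGE_BITS 12
#define SKETCH_PAGE_SIZE ((uint64_t)1 << SKETCH_PAGE_BITS)
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Base address of the core-level page containing addr. */
static uint64_t sketch_page_base(uint64_t addr)
{
    return addr & SKETCH_PAGE_MASK;
}

/*
 * Number of core-level pages spanned by one target page of 2^target_bits
 * bytes. A target page smaller than the core page still occupies (part
 * of) one core page, so the result is never less than 1.
 */
static uint64_t sketch_core_pages_per_target_page(unsigned target_bits)
{
    if (target_bits <= SKETCH_PAGE_BITS) {
        return 1;
    }
    return (uint64_t)1 << (target_bits - SKETCH_PAGE_BITS);
}
```

So a target with 64K pages (16 bits) spans 16 core pages per page, which
is the case where a flush in a target helper has to take out more than
one core-level entry. A 1K page (10 bits) fits inside a single core
page.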

>> The remaining globally defined interfaces between core code and CPUs are
>> QOMified per-cpu (P2)
>> Microblaze translation needs a change pattern to allow conversion to 64-bit
>> TARGET_LONG. Uses of TCGv need to be removed and made explicitly 32-bit.
> Yeah, this will be a tedious job for the other targets (I had to do it
> for ARM when I added the AArch64 support).

It's very scriptable. I had it to a point where I could use vim s//cg
mode to turn it into an interactive conversion.

>> This RFC will serve as a reference as I send bits and pieces to the respective
>> maintainers (many major subsystems are patched).
>> No support for KVM. I'm not sure if a mix of TCG and KVM is supported even for
>> a single arch? (which would be a prerequisite to MA KVM).
> You can build a single binary which supports both TCG and KVM for a
> particular architecture. You just can't swap back and forth between
> TCG and KVM at runtime. We should probably start by supporting KVM
> only on boards with a single CPU architecture. I don't think it's
> in-principle impossible to get a setup with 4 KVM CPUs and one
> TCG-emulated CPU to work, but it probably needs to wait until we've
> got multi-threaded TCG working before we even think about it.


>> Depends (not heavily) on my on-list disas QOMification. Test instructions
>> available on request. I have tested ARM & MB elfs handshaking through shared
>> memory and both printfing to the same UART (verifying system level
>> connectivity). -d in_asm works with the mix of disas arches coming out.
> Did you do any benchmarking to see whether the performance hits are
> noticeable in practice?

No, do you have any recommendations?

> Do you give each CPU its own codegen buffer? (I'm thinking that some
> of this might also be more easily done once multithreaded TCG is
> complete, since that will properly split the datastructures.)

No, the approach taken here is everything is exactly the same as
existing SMP. My logic is we already have the core support in that
AArch64 SMP lets us runtime mix-and-match arches. E.g. there's nothing
stopping the bootloader putting one core in AA32 and the other in 64
leading to basically multi-arch. I just extend that to cross
target-foo boundaries with some code re-arrangement.
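
To illustrate the "same as SMP" shape, here is a toy sketch. It is not
the code in the series: the Sketch* types and the exec_one hook are
hypothetical stand-ins for a QOM CPU class with per-CPU hooks. The loop
is an ordinary round-robin over a CPU list and never looks at the
architecture; everything arch-specific goes through the per-CPU class
pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct SketchCPU SketchCPU;

/* Hypothetical per-architecture hook table (stand-in for a QOM class). */
typedef struct SketchCPUClass {
    const char *arch_name;
    void (*exec_one)(SketchCPU *cpu);   /* run one "instruction" */
} SketchCPUClass;

struct SketchCPU {
    const SketchCPUClass *cc;
    uint64_t pc;
    unsigned executed;
};

/* Toy hooks: both just advance the pc by one 32-bit instruction. */
static void sketch_arm_exec_one(SketchCPU *cpu) { cpu->pc += 4; cpu->executed++; }
static void sketch_mb_exec_one(SketchCPU *cpu)  { cpu->pc += 4; cpu->executed++; }

static const SketchCPUClass sketch_arm_class = { "arm", sketch_arm_exec_one };
static const SketchCPUClass sketch_mb_class  = { "microblaze", sketch_mb_exec_one };

/* Arch-agnostic scheduler: identical whether the CPUs match or not. */
static void sketch_run_round_robin(SketchCPU *cpus, size_t n, unsigned rounds)
{
    for (unsigned r = 0; r < rounds; r++) {
        for (size_t i = 0; i < n; i++) {
            cpus[i].cc->exec_one(&cpus[i]);
        }
    }
}
```

A mixed machine is then just a CPU array whose entries point at
different classes, e.g. one entry using sketch_arm_class and one using
sketch_mb_class, passed to sketch_run_round_robin().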


> thanks
> -- PMM
