qemu-devel

Re: [Qemu-devel] RFC Multi-threaded TCG design document


From: Mark Burton
Subject: Re: [Qemu-devel] RFC Multi-threaded TCG design document
Date: Wed, 17 Jun 2015 20:23:11 +0200

> On 17 Jun 2015, at 18:57, Dr. David Alan Gilbert <address@hidden> wrote:
> 
> * Alex Bennée (address@hidden) wrote:
>> Hi,
> 
>> Shared Data Structures
>> ======================
>> 
>> Global TCG State
>> ----------------
>> 
>> We need to protect the entire code generation cycle including any post
>> generation patching of the translated code. This also implies a shared
>> translation buffer which contains code running on all cores. Any
>> execution path that comes to the main run loop will need to hold a
>> mutex for code generation. This also includes times when we need to
>> flush code or jumps from the tb_cache.
>> 
>> DESIGN REQUIREMENT: Add locking around all code generation, patching
>> and jump cache modification
> 
> I don't think that you require a shared translation buffer between
> cores to do this - although it *might* be the easiest way.
> You could have a per-core translation buffer, the only requirement is
> that most invalidation operations happen on all the buffers
> (although that might depend on the emulated architecture).
> With a per-core translation buffer, each core could generate new translations
> without locking the other cores as long as no one is doing invalidations.

I agree it’s not a design requirement - however we’ve kind of gone round this 
loop in terms of getting things to work.
Fred will doubtless fill in some details, but basically it looks like making 
the TCG able to run several instances in parallel is a nightmare. We seem to 
get reasonable performance having just one CPU at a time generating TBs. At 
the same time, of course, the way QEMU is constructed there are actually 
several ‘layers’ of buffer - from the CPU-local ones through to the TB ‘pool’. 
So, actually, by accident or design, we benefit from a sort of caching 
structure.


> 
>> Memory maps and TLBs
>> --------------------
>> 
>> The memory handling code is fairly critical to the speed of memory
>> access in the emulated system.
>> 
>>  - Memory regions (dividing up access to PIO, MMIO and RAM)
>>  - Dirty page tracking (for code gen, migration and display)
>>  - Virtual TLB (for translating guest address->real address)
>> 
>> There is both a fast path walked by the generated code and a slow
>> path when resolution is required. When the TLB tables are updated we
>> need to ensure they are done in a safe way by bringing all executing
>> threads to a halt before making the modifications.
>> 
>> DESIGN REQUIREMENTS:
>> 
>>  - TLB Flush All/Page
>>    - can be across-CPUs
>>    - will need all other CPUs brought to a halt
>>  - TLB Update (update a CPUTLBEntry, via tlb_set_page_with_attrs)
>>    - This is a per-CPU table - by definition can't race
>>    - updated by its own thread when the slow-path is forced
>> 
>> Emulated hardware state
>> -----------------------
>> 
>> Currently the hardware emulation has no protection against
>> multiple-accesses. However guest systems accessing emulated hardware
>> should be carrying out their own locking to prevent multiple CPUs
>> confusing the hardware. Of course there is no guarantee that there
>> couldn't be a broken guest that doesn't lock, so you could get racing
>> accesses to the hardware.
>> 
>> There is the class of paravirtualized hardware (VIRTIO) that works in
>> a purely mmio mode. Often setting flags directly in guest memory as a
>> result of a guest triggered transaction.
>> 
>> DESIGN REQUIREMENTS:
>> 
>>  - Access to IO Memory should be serialised by an IOMem mutex
>>  - The mutex should be recursive (e.g. allowing the same thread to relock it)
>> 
>> IO Subsystem
>> ------------
>> 
>> The I/O subsystem is heavily used by KVM and has seen a lot of
>> improvements to offload I/O tasks to dedicated IOThreads. There should
>> be no additional locking required once we reach the Block Driver.
>> 
>> DESIGN REQUIREMENTS:
>> 
>>  - The dataplane should continue to be protected by the iothread locks
> 
> Watch out for where DMA invalidates the translated code.
> 


Need to check - that might be a great catch!

Cheers

Mark.

> Dave
> 
>> 
>> 
>> References
>> ==========
>> 
>> [1] 
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/plain/Documentation/memory-barriers.txt
>> [2] http://thread.gmane.org/gmane.comp.emulators.qemu/334561
>> [3] http://thread.gmane.org/gmane.comp.emulators.qemu/335297
>> 
>> 
>> 
>> -- 
>> Alex Bennée
> --
> Dr. David Alan Gilbert / address@hidden / Manchester, UK


