qemu-devel

Re: [Qemu-devel] [RFC PATCH] target-arm: protect cpu_exclusive_*.


From: Mark Burton
Subject: Re: [Qemu-devel] [RFC PATCH] target-arm: protect cpu_exclusive_*.
Date: Thu, 18 Dec 2014 15:20:45 +0100


> On 18/12/2014 13:24, Alexander Graf wrote:
>> That's the nice thing about transactions - they guarantee that no other
>> CPU accesses the same cache line at the same time. So you're safe
>> against other vcpus even without blocking them manually.
>> 
>> For the non-transactional implementation we probably would need an "IPI
>> others and halt them until we're done with the critical section"
>> approach. But I really wouldn't concentrate on making things fast on old
>> CPUs.
> 
> The non-transactional implementation can use softmmu to trap access to
> the page from other VCPUs.  This makes it possible to implement (at the
> cost of speed) the same semantics on all hosts.
> 
> Paolo

I believe what you're describing, using transactional memory or using softmmu, 
amounts to either option 3 below or option 4.
Relying on it totally was option 4. 

Seems to me, the problem with that option is that support for some hosts will 
be a pain, and covering everything will take some time :-(

Option 3 suggests that we build a ‘slow path’ mechanism first - make sure that 
works (as a backup), and then add optimisations for specific hosts/guests 
afterwards. To me that still seems preferable?

Cheers

Mark.




> On 18 Dec 2014, at 13:24, Alexander Graf <address@hidden> wrote:
> 
> 
> 
> On 18.12.14 10:12, Mark Burton wrote:
>> 
>>> On 17 Dec 2014, at 17:39, Peter Maydell <address@hidden> wrote:
>>> 
>>> On 17 December 2014 at 16:29, Mark Burton <address@hidden> wrote:
>>>>> On 17 Dec 2014, at 17:27, Peter Maydell <address@hidden> wrote:
>>>>> I think a mutex is fine, personally -- I just don't want
>>>>> to see fifteen hand-hacked mutexes in the target-* code.
>>>>> 
>>>> 
>>>> Which would seem to favour the helper function approach?
>>>> Or am I missing something?
>>> 
>>> You need at least some support from QEMU core -- consider
>>> what happens with this patch if the ldrex takes a data
>>> abort, for instance.
>>> 
>>> And if you need the "stop all other CPUs while I do this"
>> 
>> It looks like a corner case, but working this through - the 'simple' 
>> approach of putting a mutex around the atomic instructions would indeed need 
>> to ensure that no other core was doing anything (that just happens to be 
>> true for QEMU today; otherwise we would have to put a mutex around all 
>> writes), in order to handle the case where a store exclusive could 
>> potentially fail if a non-atomic instruction wrote (a different value) to 
>> the same address. This is currently guaranteed by the implementation in 
>> QEMU - how useful it is I don't know, but if we break it, we run the risk 
>> that something will fail (at the least, we could not claim to have kept 
>> things the same).
>> 
>> This also has implications for the idea of adding TCG ops I think...
>> The ideal scenario is that we could ‘fallback’ on the same semantics that 
>> are there today - allowing specific target/host combinations to be optimised 
>> (and to improve their functionality). 
>> But that means, from within the TCG op, we would need a mechanism to cause 
>> the other vCPUs' TCG execution to take an exit... etc etc. In the end, I'm 
>> sure it's possible, but it feels so awkward.
> 
> That's the nice thing about transactions - they guarantee that no other
> CPU accesses the same cache line at the same time. So you're safe
> against other vcpus even without blocking them manually.
> 
> For the non-transactional implementation we probably would need an "IPI
> others and halt them until we're done with the critical section"
> approach. But I really wouldn't concentrate on making things fast on old
> CPUs.
> 
> Also keep in mind that for the UP case we can always omit all the magic
> - we only need to detect when we move into an SMP case (linux-user clone
> or -smp on system).
> 
>> 
>> To re-cap where we are (for my own benefit if nobody else):
>> We have several propositions in terms of implementing Atomic instructions
>> 
>> 1/ We wrap the atomic instructions in a mutex using helper functions (this 
>> is the approach others have taken, it’s simple, but it is not clean, as 
>> stated above).
> 
> This is horrible. Imagine you have this split approach with a load
> exclusive and then a store, where the load takes the mutex and the
> store releases it. At that point, if the store causes a segfault you'll
> be left with a dangling mutex.
> 
> This stuff really belongs into the TCG core.
> 
>> 
>> 1.5/ We add a mechanism to ensure that when the mutex is taken, all other 
>> cores are ‘stopped’.
>> 
>> 2/ We add some TCG ops to effectively do the same thing, but this would give 
>> us the benefit of being able to provide better implementations. This is 
>> attractive, but we would end up needing ops to cover at least exclusive 
>> load/store and atomic compare exchange. To me this looks less than elegant 
>> (being pulled close to the target, rather than being able to generalise), 
>> but it’s not clear how we would implement the operations as we would like, 
>> with a machine instruction, unless we did split them out along these lines. 
>> This approach also (probably) requires the 1.5 mechanism above.
> 
> I'm still in favor of just forcing the semantics of transactions onto
> this. If the host doesn't implement transactions, tough luck - do the
> "halt all others" IPI.
> 
>> 
>> 3/ We have discussed a 'h/w' approach to the problem. In this case, all 
>> atomic instructions are forced to take the slow path, and additional 
>> flags are added to the memory API. We then deal with the issue closer to the 
>> memory, where we can record who has a lock on a memory address. For this to 
>> work, we would also either
>> a) need to add an mprotect-type approach to ensure no 'non-atomic' writes 
>> occur - or
>> b) need to force all cores to mark the page with the exclusive memory as IO 
>> or similar, to ensure that all write accesses follow the slow path.
>> 
>> 4/ There is an option to implement exclusive operations within the TCG using 
>> mprotect (and signal handlers). I have some concerns about this: would we 
>> have to have support for each host OS? I also think we might end up with a 
>> lot of protected regions causing a lot of SIGSEGVs, because an errant guest 
>> doesn't behave well - basically we will need to see the impact on 
>> performance. Finally, this will be really painful to deal with for cases 
>> where the exclusive memory is held in what QEMU considers IO space!
>>      In other words - putting the mprotect inside TCG looks to me like it's 
>> mutually exclusive to supporting a memory-based scheme like (3).
> 
> Again, I don't think it's worth caring about legacy host systems too
> much. In a few years from now transactional memory will be commodity,
> just like KVM is today.
> 
> 
> Alex
> 
>> My personal preference is for 3b): it is "safe" - it's where the hardware is.
>> 3a is an optimisation of that.
>> To me, (2) is an optimisation again. We are effectively saying: if you are 
>> able to do this directly, then you don't need to pass via the slow path. 
>> Otherwise, you always have the option of reverting to the slow path.
>> 
>> Frankly - 1 and 1.5 are hacks - they are not optimisations, they are just 
>> dirty hacks. However - their saving grace is that they are hacks that exist 
>> and “work”. I dislike patching the hack, but it did seem to offer the 
>> fastest solution to get around this problem - at least for now. I am no 
>> longer convinced.
>> 
>> 4/ is something I'd like other people's views on too… Is it a better 
>> approach? What about the slow path?
>> 
>> I increasingly begin to feel that we should really approach this from the 
>> other end, and provide a ‘correct’ solution using the memory - then worry 
>> about making that faster…
>> 
>> Cheers
>> 
>> Mark.
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>>> semantics linux-user currently uses then that definitely needs
>>> core code support. (Maybe linux-user is being over-zealous
>>> there; I haven't thought about it.)
>>> 
>>> -- PMM
>> 
>> 
>> 


+44 (0)20 7100 3485 x 210
+33 (0)5 33 52 01 77 x 210
+33 (0)603762104
mark.burton



