Re: [Qemu-devel] [RFC v8 00/14] Slow-path for atomic instruction transla

From:	Sergey Fedorov
Subject:	Re: [Qemu-devel] [RFC v8 00/14] Slow-path for atomic instruction translation
Date:	Thu, 9 Jun 2016 15:52:09 +0300
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.8.0

On 09/06/16 15:35, alvise rigo wrote:
> On Thu, Jun 9, 2016 at 1:42 PM, Sergey Fedorov <address@hidden> wrote:
>> On 19/04/16 16:39, Alvise Rigo wrote:
>>> The implementation heavily uses the software TLB together with a new
>>> bitmap that has been added to the ram_list structure which flags, on a
>>> per-CPU basis, all the memory pages that are in the middle of a LoadLink
>>> (LL)/StoreConditional (SC) operation.  Since all these pages can be
>>> accessed directly through the fast-path and alter a vCPU's linked value,
>>> the new bitmap has been coupled with a new TLB flag for the TLB virtual
>>> address which forces the slow-path execution for all the accesses to a
>>> page containing a linked address.
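
For readers following along, the mechanism quoted above can be pictured with a toy Python model. This is not QEMU code: the class, its fields and the 4 KiB page size are assumptions made for illustration. It only shows the idea that a store to a page flagged as exclusive is diverted to a slow path, where it can break another vCPU's link.

```python
PAGE_BITS = 12  # assumed 4 KiB pages


class ToyMachine:
    """Toy model: per-vCPU exclusive page flags force stores onto a slow path."""

    def __init__(self, ncpus):
        self.mem = {}
        # excl_pages[cpu] holds the page that vCPU currently has linked, if any
        self.excl_pages = [set() for _ in range(ncpus)]
        self.linked = [None] * ncpus  # linked address per vCPU
        self.slow_path_stores = 0

    def _break_other_links(self, cpu, addr):
        # Any other vCPU linked to this address loses its reservation.
        for other, link in enumerate(self.linked):
            if other != cpu and link == addr:
                self.linked[other] = None
                self.excl_pages[other].discard(addr >> PAGE_BITS)

    def load_link(self, cpu, addr):
        self.linked[cpu] = addr
        self.excl_pages[cpu] = {addr >> PAGE_BITS}
        return self.mem.get(addr, 0)

    def store(self, cpu, addr, value):
        page = addr >> PAGE_BITS
        if any(page in pages for pages in self.excl_pages):
            # The page is flagged: take the slow path and check the links.
            self.slow_path_stores += 1
            self._break_other_links(cpu, addr)
        self.mem[addr] = value  # unflagged pages go straight through

    def store_conditional(self, cpu, addr, value):
        ok = self.linked[cpu] == addr
        if ok:
            self.mem[addr] = value
            self._break_other_links(cpu, addr)
        self.linked[cpu] = None
        self.excl_pages[cpu].discard(addr >> PAGE_BITS)
        return ok


m = ToyMachine(ncpus=2)
m.load_link(0, 0x1000)
m.store(1, 0x1000, 5)                    # flagged page: slow path, link broken
print(m.store_conditional(0, 0x1000, 7)) # the SC fails
```

In the real series the flag lives in the software TLB entry, so making a page "flagged" for every vCPU is what requires the TLB flushes discussed below; the toy model skips that cost entirely.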
>> But I'm afraid we've got a scalability problem using the software TLB engine
>> heavily. This approach relies on a TLB flush of all CPUs, which is not a very
>> cheap operation. That is going to be even more expensive in the case of
>> MTTCG, as you need to exit the CPU execution loop in order to avoid
>> deadlocks.
>>
>> I see you try to mitigate this issue by introducing a history of the last N
>> pages touched by an exclusive access. That would work fine, avoiding
>> excessive TLB flushes as long as the current working set of exclusively
>> accessed pages does not go beyond N. Once we exceed this limit we'll get
>> a global TLB flush on most LL operations. I'm afraid we can get a dramatic
> Indeed, if the guest does a loop of N+1 atomic operations, at each
> iteration we will have N flushes.
>
>> performance decrease as guest code implements a finer-grained locking
>> scheme. I would like to emphasise that performance can degrade sharply
>> and dramatically as soon as the limit gets exceeded. How could we tackle
>> this problem?
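
The cliff described above is easy to reproduce in a toy model. The sketch below is plain Python, not QEMU code. It assumes a FIFO history of N pages and counts one global flush for every LL whose page is not in the history; both the replacement policy and the one-miss-one-flush mapping are assumptions, so the exact count in the real patches may differ.

```python
from collections import deque


def count_flushes(accesses, history_len):
    """Count history misses, each assumed to cost one global TLB flush."""
    history = deque(maxlen=history_len)  # FIFO: the oldest page drops out
    flushes = 0
    for page in accesses:
        if page in history:
            continue  # page already flagged everywhere, no flush needed
        flushes += 1
        history.append(page)
    return flushes


N = 4
rounds = 10
fits = [p for _ in range(rounds) for p in range(N)]         # working set of N pages
exceeds = [p for _ in range(rounds) for p in range(N + 1)]  # working set of N+1 pages

print(count_flushes(fits, N))     # 4: only the first round misses
print(count_flushes(exceeds, N))  # 50: every single LL misses
```

Going from N to N+1 pages takes this model from 4 flushes to one flush per LL, which is the sharp degradation in question.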
> In my opinion, the length of the history should not be fixed, to avoid
> the drawback above. We can make the history's length dynamic (until
> a given threshold is reached) according to the pressure of atomic
> instructions. What should remain constant is the time it takes to
> cycle through the history array. We can for instance store in the lower
> bits of the addresses in the history a sort of timestamp used to
> calculate the period and adjust the length of the history accordingly.
> What do you think?
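
To have something concrete to discuss, here is one way the timestamp idea could look, as a Python sketch rather than QEMU code. Everything in it is an assumption: the 4 KiB page size (which leaves 12 low bits free in a page-aligned address), the helper names, and the doubling/halving policy are illustrative only and do not come from the patch series.

```python
PAGE_BITS = 12                      # assumed 4 KiB pages
STAMP_MASK = (1 << PAGE_BITS) - 1   # low bits are free in a page-aligned address


def pack(page_addr, now):
    """Store a truncated timestamp in the unused low bits of a page address."""
    return (page_addr & ~STAMP_MASK) | (now & STAMP_MASK)


def unpack(entry):
    """Split a history entry back into (page address, truncated timestamp)."""
    return entry & ~STAMP_MASK, entry & STAMP_MASK


def cycle_period(oldest_entry, now):
    """Time the history took to go around once, tolerant of counter wrap."""
    _, stamp = unpack(oldest_entry)
    return (now - stamp) & STAMP_MASK


def adjust_length(length, period, target_period, max_length):
    """Hypothetical policy: grow when cycling too fast, shrink when too slow."""
    if period < target_period and length < max_length:
        return min(length * 2, max_length)
    if period > 2 * target_period and length > 1:
        return max(length // 2, 1)
    return length


entry = pack(0x7F001000, now=5)
print(hex(entry))                         # 0x7f001005
print(cycle_period(entry, now=15))        # 10
print(adjust_length(4, 10, 100, 64))      # 8: cycling too fast, so grow
```

Even this small version needs a time source, a target period and a cap on the length, and each of those is a tunable that would have to be justified.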

It really depends on what algorithm we'll introduce for dynamic history
length. I'm afraid it could complicate things and introduce its own
overhead. I'm also going to look at Emilio's approach
http://thread.gmane.org/gmane.comp.emulators.qemu/335297.

Kind regards,
Sergey
