From: Jan Kiszka
Subject: Re: [Qemu-devel] [PATCH 10/10] qdev: fix create in place obj's life cycle problem
Date: Wed, 29 Aug 2012 19:41:04 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); de; rv:1.8.1.12) Gecko/20080226 SUSE/2.0.0.12-1.1 Thunderbird/2.0.0.12 Mnenhy/0.7.5.666

On 2012-08-29 19:27, Avi Kivity wrote:
> On 08/29/2012 10:21 AM, Jan Kiszka wrote:
>> On 2012-08-29 19:13, Avi Kivity wrote:
>>> On 08/27/2012 06:01 PM, Jan Kiszka wrote:
>>>> On 2012-08-27 22:53, Avi Kivity wrote:
>>>>> On 08/27/2012 12:38 PM, Jan Kiszka wrote:
>>>>>>>> Even worse, apply
>>>>>>>> restrictions on how the dispatched objects, the regions, have to be
>>>>>>>> treated because of this.
>>>>>>>
>>>>>>> Please elaborate.
>>>>>>
>>>>>> The fact that you can't manipulate a memory region object arbitrarily
>>>>>> after removing it from the mapping because you track the references to
>>>>>> the object that the region points to, not the region itself. The region
>>>>>> remains in use by the dispatching layer and potentially the called
>>>>>> device, even after deregistration.
>>>>>
>>>>> That object will be a container_of() the region, usually literally but
>>>>> sometimes only in spirit.  Reference counting the regions means they
>>>>> cannot be embedded into other objects any more.
>>>>
>>>> I cannot follow the logic of the last sentence. Reference counting just
>>>> means that we track if there are users left, not necessarily that we
>>>> perform asynchronous releases. We can simply wait for those users to
>>>> complete.
>>>
>>> I don't see how.  Suppose you add a reference count to MemoryRegion. 
>>> How do you delay its containing object's destructor from running?  Do
>>> you iterate over all member MemoryRegions and examine their reference counts?
>>
>> memory_region_transaction_commit (or calls that trigger this) will have
>> to wait. That is required as the caller may start fiddling with the
>> region right afterward.
> 
> However, it may be running within an mmio dispatch, so it may wait forever.

We won't be able to wait for devices that can acquire the same lock we
hold while waiting, of course. That's why a lock ordering of device-lock
-> BQL (the BQL being the one we hold over memory region changes) won't
be allowed. I was wrong on this before: we must not take coarse-grained
locks while holding fine-grained ones, only the other way around. Pretty
obvious, especially for realtime / low-latency.
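
A minimal sketch of that ordering rule, using plain pthreads rather than
QEMU's actual locking primitives; all names here are illustrative only:

#include <pthread.h>

/* coarse-grained: stands in for the BQL */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
/* fine-grained: a per-device lock */
static pthread_mutex_t device_lock = PTHREAD_MUTEX_INITIALIZER;

/* Allowed order: coarse -> fine. */
static void update_device(void)
{
    pthread_mutex_lock(&big_lock);
    pthread_mutex_lock(&device_lock);
    /* ... touch device state / memory region layout ... */
    pthread_mutex_unlock(&device_lock);
    pthread_mutex_unlock(&big_lock);
}

/* Forbidden order: fine -> coarse. A thread here could deadlock
 * against update_device() above. */
static void device_io(void)
{
    pthread_mutex_lock(&device_lock);
    /* pthread_mutex_lock(&big_lock);   <-- must not happen */
    pthread_mutex_unlock(&device_lock);
}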

> 
> Ignoring this, how does it wait? sleep()? or wait for a completion?

Ideally via completion.
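
For illustration, a hypothetical sketch (not QEMU's actual API) of such
a completion: the committing thread blocks on a condition variable until
the last in-flight user drops its reference:

#include <pthread.h>

typedef struct {
    int refs;
    pthread_mutex_t lock;
    pthread_cond_t released;   /* signalled when refs drops to zero */
} RegionRefs;

static void region_ref(RegionRefs *r)
{
    pthread_mutex_lock(&r->lock);
    r->refs++;
    pthread_mutex_unlock(&r->lock);
}

static void region_unref(RegionRefs *r)
{
    pthread_mutex_lock(&r->lock);
    if (--r->refs == 0) {
        pthread_cond_broadcast(&r->released);
    }
    pthread_mutex_unlock(&r->lock);
}

/* Called from a commit-like path: block until all in-flight users
 * of the region have dropped their references. */
static void region_wait_unused(RegionRefs *r)
{
    pthread_mutex_lock(&r->lock);
    while (r->refs > 0) {
        pthread_cond_wait(&r->released, &r->lock);
    }
    pthread_mutex_unlock(&r->lock);
}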

> 
>>>
>>> Usually a reference count controls the lifetime of the reference counted
>>> object, what are you suggesting here?
>>
>> To synchronize on reference count going to zero. 
> 
> Quite unorthodox.  Do you have real life examples?

synchronize_rcu.
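
That is, the writer waits for all pre-existing readers to leave their
read-side critical sections before reclaiming the old object. A sketch
of the pattern with the userspace RCU library (liburcu); the kernel API
is analogous:

#include <urcu.h>      /* liburcu; link with -lurcu */
#include <stdlib.h>

struct region {
    int data;
};

static struct region *cur_region;

/* Reader: assumes the thread has called rcu_register_thread(). */
static int read_region(void)
{
    int val = -1;

    rcu_read_lock();
    struct region *r = rcu_dereference(cur_region);
    if (r) {
        val = r->data;
    }
    rcu_read_unlock();
    return val;
}

/* Writer: publish the new region, then wait until no reader can
 * still hold a pointer to the old one before freeing it. */
static void replace_region(struct region *new_r)
{
    struct region *old = cur_region;

    rcu_assign_pointer(cur_region, new_r);
    synchronize_rcu();   /* the "synchronize on users leaving" step */
    free(old);
}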

> 
>> Or all readers leaving
>> the read-side critical sections.
> 
> This is rcu.  But again, we may be in an rcu read-side critical section
> while needing to wait.  In fact this was what I suggested in the first
> place, until Marcelo pointed out the deadlock, so we came up with the
> refcount.

We must avoid that deadlock scenario. I bent my brain around it, and I
think that is the only answer. We can avoid it if we apply clear rules
regarding locking and BQL-protected services to those devices that will
actually use fine-grained locking, and there only for those paths that
take per-device locks. I'm pretty sure we won't get far with an
"all-or-nothing" model where every device uses a private lock.

> 
>>
>>>
>>>>>
>>>>> We can probably figure out a way to flush out accesses.  After switching
>>>>> to rcu, for example, all we need is synchronize_rcu() in a
>>>>> non-deadlocking place.  But my bet is that it will not be needed.
>>>>
>>>> If you properly flush out accesses, you don't need to track at device
>>>> level anymore - and mess with abstraction layers. That's my whole point.
>>>
>>> To flush out an access you need either rwlock_write_lock() or
>>> synchronize_rcu() (depending on the implementation).  But neither of
>>> these can be run from an rcu read-side critical section or
>>> rwlock_read_lock().
>>>
>>> You could defer the change to a bottom half, but if the hardware demands
>>> that the change be complete before returning, that doesn't work.
>>
>> Right, therefore synchronous transactions.
> 
> I don't follow.  Synchronous transactions mean you can't
> synchronize_rcu() or upgrade the lock or wait for the refcount.  That's
> the source of the problem!

Our current device models assume synchronous transactions on the memory
layer, actually on all layers. The proposals I've seen so far try to
change this. But I'm very skeptical that this would succeed, due to the
conversion effort and due to the fact that it would give us a
tricky-to-use API.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


