qemu-devel


From: Jan Kiszka
Subject: Re: [Qemu-devel] Re: Release of COREMU, a scalable and portable full-system emulator
Date: Fri, 23 Jul 2010 11:47:51 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); de; rv:1.8.1.12) Gecko/20080226 SUSE/2.0.0.12-1.1 Thunderbird/2.0.0.12 Mnenhy/0.7.5.666

Stefan Hajnoczi wrote:
> 2010/7/23 Alexander Graf <address@hidden>:
>> On 23.07.2010, at 09:53, Jan Kiszka wrote:
>>
>>> wang Tiger wrote:
>>>> On 22 July 2010 at 11:47 PM, Stefan Hajnoczi <address@hidden> wrote:
>>>>> 2010/7/22 wang Tiger <address@hidden>:
>>>>>> In our implementation for the x86_64 target, all devices except LAPIC are
>>>>>> emulated in a separate thread. VCPUs are emulated in other threads
>>>>>> (one thread per VCPU).
>>>>>> By observing some device drivers in Linux, we have a hypothesis that
>>>>>> drivers in the OS have already ensured correct synchronization on
>>>>>> concurrent hardware accesses.
>>>>> This hypothesis is too optimistic.  If hardware emulation code assumes
>>>>> it is only executed in a single-threaded fashion, but guests can
>>>>> execute it in parallel, then this opens up the possibility of race
>>>>> conditions that malicious guests can exploit.  There needs to be
>>>>> isolation: a guest should not be able to cause QEMU to crash.
>>>> In our prototype, we assume the guest behaves correctly. If hardware
>>>> emulation code can ensure atomic access (behaving like real hardware),
>>>> VCPUs can access devices freely.  We actually refined some hardware
>>>> emulation code (e.g. BMDMA, IOAPIC) to ensure the atomicity of
>>>> hardware access.
>>> This approach is surely helpful for a prototype to explore the limits.
>>> But it's not applicable to production systems. It would create a huge
>>> source of potential subtle regressions for other guest OSes,
>>> specifically those that you cannot analyze regarding synchronized
>>> hardware access. We must play safe.
>>>
>>> That's why we currently have the global mutex. Its conversion can only
>>> happen step-wise, e.g. by establishing an infrastructure to declare the
>>> need of device models for that Big Lock. Then you can start converting
>>> individual models to private locks or even smart lock-less patterns.
>> But isn't that independent from making TCG atomic capable and parallel? At 
>> that point a TCG vCPU would have the exact same issues and interfaces as a 
>> KVM vCPU, right? And then we can tackle the concurrent device access issues 
>> together.
> 
> An issue that might affect COREMU today is core QEMU subsystems that
> are not thread-safe and used from hardware emulation, for example:
> 
> cpu_physical_memory_read/write() to RAM will use qemu_get_ram_ptr().
> This function moves the found RAMBlock to the head of the global RAM
> blocks list in a non-atomic way.  Therefore, two unrelated hardware
> devices executing cpu_physical_memory_*() simultaneously face a race
> condition.  I have seen this happen when playing with parallel
> hardware emulation.

Those issues need to be identified and, as a first step, worked around
by holding dedicated locks or just the global mutex. The conflict above
could perhaps also be resolved directly by creating per-VCPU lookup lists
(likely more efficient than stepping on other VCPUs' toes by constantly
reordering a global list). That would likely be a good example of a
self-contained preparatory patch.
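
To make the two options concrete, here is a rough sketch, not taken from
the QEMU tree. RamBlock, ram_list, ram_list_lock, last_hit, MAX_VCPUS and
all function names are hypothetical stand-ins for illustration only; the
real RAMBlock structure and qemu_get_ram_ptr() look different. The first
lookup keeps the move-to-front reordering but guards it with a dedicated
lock. The second leaves the global list untouched and remembers the last
hit per VCPU, assuming blocks are never removed while VCPUs are running.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a RAM block descriptor (not QEMU's RAMBlock). */
typedef struct RamBlock {
    uint64_t offset;            /* start of the block in RAM address space */
    uint64_t length;            /* size of the block in bytes */
    struct RamBlock *next;
} RamBlock;

#define MAX_VCPUS 8             /* arbitrary limit for this sketch */

static RamBlock *ram_list;      /* head of the global block list */
static pthread_mutex_t ram_list_lock = PTHREAD_MUTEX_INITIALIZER;
static RamBlock *last_hit[MAX_VCPUS];   /* one cached block per VCPU */

static int block_contains(const RamBlock *b, uint64_t addr)
{
    return addr >= b->offset && addr - b->offset < b->length;
}

/* Register a block; expected to happen during setup. */
void ram_block_add(RamBlock *b)
{
    pthread_mutex_lock(&ram_list_lock);
    b->next = ram_list;
    ram_list = b;
    pthread_mutex_unlock(&ram_list_lock);
}

/* Option 1: keep move-to-front, but serialize it with a dedicated lock. */
RamBlock *ram_lookup_locked(uint64_t addr)
{
    RamBlock *found = NULL;

    pthread_mutex_lock(&ram_list_lock);
    for (RamBlock **pp = &ram_list; *pp; pp = &(*pp)->next) {
        if (block_contains(*pp, addr)) {
            found = *pp;
            *pp = found->next;          /* unlink from current position */
            found->next = ram_list;     /* relink at the head */
            ram_list = found;
            break;
        }
    }
    pthread_mutex_unlock(&ram_list_lock);
    return found;
}

/*
 * Option 2: never reorder the global list. Each VCPU thread only touches
 * its own last_hit[] slot, so the fast path needs no lock at all.
 * Assumption: blocks are not removed while VCPUs run, otherwise the
 * cached pointer could go stale.
 */
RamBlock *ram_lookup_cached(unsigned vcpu, uint64_t addr)
{
    RamBlock *b;

    assert(vcpu < MAX_VCPUS);
    b = last_hit[vcpu];
    if (b && block_contains(b, addr)) {
        return b;                       /* fast path, lock-free */
    }

    /* Slow path: walk the list under the lock, without modifying it. */
    pthread_mutex_lock(&ram_list_lock);
    for (b = ram_list; b; b = b->next) {
        if (block_contains(b, addr)) {
            break;
        }
    }
    pthread_mutex_unlock(&ram_list_lock);

    if (b) {
        last_hit[vcpu] = b;
    }
    return b;
}
```

With the second variant, VCPUs that keep hitting the same block contend
on nothing after the first lookup, while the first variant serializes
every access on ram_list_lock.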

However, getting concurrency right is tricky enough. We should really be
careful not to turn too much upside down in a rush. Even if TCG may have
some deeper hooks into the device model or thread-unsafe core parts than
KVM, parallelizing it can and should remain a separate topic. And we
also have to keep an eye on performance when somewhat fewer than 255
VCPUs are to be emulated.

Jan
