qemu-devel

Re: [Qemu-devel] Re: KQEMU code organization


From: Anthony Liguori
Subject: Re: [Qemu-devel] Re: KQEMU code organization
Date: Sun, 01 Jun 2008 17:58:37 -0500
User-agent: Thunderbird 2.0.0.14 (X11/20080501)

Fabrice Bellard wrote:
Anthony Liguori wrote:
[...]
FWIW, the l1_phys_map table is a current hurdle in getting performance. When we use proper accessors to access the virtio_ring, we end up taking a significant performance hit (around 20% on iperf). I have some simple patches that implement a page_desc cache that caches the RAM regions in a linear array. That helps get most of it back.

I'd really like to remove the l1_phys_map entirely and replace it with a sorted list of regions.  I think this would be an overall performance improvement since it's much more cache-friendly.  One thing keeping this from happening is that the data structure is passed up to the kernel for kqemu.  Eliminating that dependency would be a very good thing!

If the l1_phys_map is a performance bottleneck, it means that the internals of QEMU are not properly used. In QEMU/kqemu, it is not accessed to do I/Os: a cache is used through tlb_table[]. I don't see why KVM cannot use a similar system.

This is for device emulation. KVM doesn't use l1_phys_map() for things like shadow page table accesses.

In the device emulation, we're currently using stl_phys() and friends. This goes through a full lookup in l1_phys_map.

Looking at other devices, some use phys_ram_base + PA and stl_raw(), which is broken but faster. A few places call cpu_get_physical_page_desc(), then use phys_ram_base and stl_raw(). This is okay, but it still requires at least one l1_phys_map lookup per operation in the device (packet receive, I/O notification, etc.). I don't think that's going to help much because in our fast paths, we're only doing 2 or 3 stl_phys() operations.

At least on x86, there are very few regions of RAM, which makes them very easy to cache. A TLB-style cache seems wrong to me because there are so few RAM regions. I don't see a better way to do this with the existing APIs.

Regards,

Anthony Liguori

Fabrice.