From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [PATCH 0/3] exec: further refine address_space_get_iotlb_entry()
Date: Tue, 6 Jun 2017 18:29:04 +0300

On Mon, Jun 05, 2017 at 11:20:13AM +0800, Peter Xu wrote:
> On Fri, Jun 02, 2017 at 05:51:07PM +0300, Michael S. Tsirkin wrote:
> > On Fri, Jun 02, 2017 at 07:50:51PM +0800, Peter Xu wrote:
> > > With the patch applied:
> > > 
> > >   [PATCH v3] exec: fix address_space_get_iotlb_entry page mask
> > >   (already in Paolo's pull request but not yet merged)
> > > 
> > > Now we can have valid address masks. However, this is still not ideal,
> > > because the mask may not be aligned to guest page sizes. One
> > > example is when huge pages are used in the guest (please see the commit
> > > message in patch 1 for details). It applies to normal pages too. So we
> > > not only need a valid address mask, we should also make sure it is a page
> > > mask (for x86, it should be one of 4K/2M/1G pages).
> > 
> > Why should we? To get better performance, right?
> 
> IMHO one point is performance, the other is how we should
> define the IOTLB interface. My opinion is that it is better to have valid
> masks.
> 
> > 
> > > Patches 1+2 fix the problem. Tested with both the kernel net driver and
> > > testpmd, on both 4K and 2M pages, to make sure the page mask is correct.
> > > 
> > > Patch 3 is cherry-picked from the PT series. With the fixes from 1+2
> > > applied, we'll definitely want patch 3. Here's the simplest TCP streaming
> > > test using vhost dmar and iommu=pt in the guest:
> > > 
> > >   without patch 3:    12.0Gbps
> > 
> > And what happens without patches 1-2?
> 
> Without 1-2, performance is good. But I think it is hacky to have such
> a good result (I explained why the performance is good in the VT-d PT
> support thread with some logs)...
> 
> > 
> > >   with patch 3:       33.5Gbps
> > 
> > This is the part I don't get. Patches 1-2 will return a bigger region to
> > callers. The result should be better performance - instead it seems to
> > slow down vhost for some reason and we need tricks to get
> > performance back. What's going on?
> 
> Yes. The problem is that without patches 1/2 I think the code lacks
> correctness. With correctness, we lose performance, so I picked up
> patch 3 as well.
> 
> Again, I think the first thing we need to settle is what the
> best definition for IOTLB should be (addr_mask or arbitrary length).
> 
> Thanks,

If arbitrary length means we don't require prefaulting hacks,
I'm for using arbitrary length.
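
To make the two candidate definitions concrete, here is a minimal sketch of
how an arbitrary-length range could be narrowed to a page mask. It is not
QEMU code: largest_page_mask() and its parameters are hypothetical names
used only for illustration, and the page sizes are the x86 ones mentioned
above (4K/2M/1G).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x86 page sizes named in the thread, largest first: 1G, 2M, 4K. */
static const uint64_t page_sizes[] = { 1ULL << 30, 1ULL << 21, 1ULL << 12 };

/*
 * Hypothetical helper (illustration only, not a QEMU API).
 *
 * Given a contiguous translated range [start, start + len) and an address
 * inside it, return the addr_mask (page size - 1) of the largest naturally
 * aligned page that contains addr and lies entirely inside the range.
 * Returns 0 when not even a 4K page fits.
 */
static uint64_t largest_page_mask(uint64_t start, uint64_t len, uint64_t addr)
{
    /* Precondition: addr must fall inside the range. */
    assert(addr >= start && addr - start < len);

    for (size_t i = 0; i < sizeof(page_sizes) / sizeof(page_sizes[0]); i++) {
        uint64_t size = page_sizes[i];
        uint64_t base = addr & ~(size - 1);   /* aligned page start */

        /* Page must begin at or after start and end at or before the
         * end of the range; written so that nothing can overflow. */
        if (base >= start && len >= size && base - start <= len - size) {
            return size - 1;
        }
    }
    return 0;
}
```

With the addr_mask definition a caller would see only the page returned
here; with the arbitrary-length definition it would see the whole
(start, len) range and could cache all of it in one entry.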


> -- 
> Peter Xu


