Re: [PATCH v5 00/15] vDPA shadow virtqueue


From: Eugenio Perez Martin
Subject: Re: [PATCH v5 00/15] vDPA shadow virtqueue
Date: Tue, 8 Mar 2022 14:56:36 +0100

On Tue, Mar 8, 2022 at 1:17 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Mar 08, 2022 at 12:37:33PM +0100, Eugenio Perez Martin wrote:
> > On Tue, Mar 8, 2022 at 11:48 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
> > > > > Not by itself but I'm not sure we can guarantee guest will not
> > > > > attempt to use the IOVA addresses we are reserving down
> > > > > the road.
> > > >
> > > > The IOVA is allocated via the listeners and stored in the iova tree
> > > > per GPA range as IOVA->(GPA)->HVA. Guests will only see GPA, the Qemu
> > > > virtio core sees the GPA to HVA mapping, and we do a reverse lookup
> > > > to find the HVA->IOVA we allocated previously. So we have a double
> > > > check here:
> > > >
> > > > 1) the Qemu memory core makes sure the GPA that the guest uses is valid
> > > > 2) the IOVA tree guarantees that no HVA beyond what the guest can see
> > > > is used
> > > >
> > > > So technically, there's no way for the guest to use the IOVA address
> > > > allocated for the shadow virtqueue.
> > > >
> > > > Thanks
> > >
> > > I mean, IOVA is programmed in the host hardware to translate to HPA, 
> > > right?
> > >
> >
> > Yes, that's right if the device uses physical maps. Note also that the
> > SVQ vring is allocated in multiples of the host huge page size to avoid
> > garbage or unintended access from the device.
> >
> > If a vdpa device uses physical addresses, the kernel vdpa will pin qemu
> > memory first and then send the IOVA to HPA translations to the hardware.
> > But this IOVA space is controlled by SVQ, not by the guest. If a guest's
> > virtqueue buffer cannot first be translated to GPA, it will not be
> > forwarded.
> >
> > Thanks!
>
> Right. So if the guest sends a buffer whose address overlaps the range we
> used for the SVQ, then I think at the moment the guest won't work.
>

I'm going to dissect a few cases so we can sync on where our points of
view differ. I'm leaving out vIOMMU for simplicity.

If qemu uses an emulated device, it reads the VirtQueue and translates
addresses from GPA to HVA via virtqueue_pop. If the guest places an
address outside of GPA, dma_memory_map returns an error ("virtio: bogus
descriptor or out of resources").
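
To make that concrete, this is roughly what happens inside virtqueue_pop
for each descriptor. It's a simplified sketch in the spirit of
hw/virtio/virtio.c, not the literal code; desc_gpa, desc_len, dir and vdev
are stand-ins for the descriptor's fields and the VirtIODevice:

/* Sketch: GPA -> HVA translation inside virtqueue_pop(); not literal code. */
dma_addr_t len = desc_len;
void *hva = dma_memory_map(vdev->dma_as, desc_gpa, &len, dir,
                           MEMTXATTRS_UNSPECIFIED);
if (!hva || len != desc_len) {
    /* The GPA is not backed by guest memory, so the element is rejected. */
    virtio_error(vdev, "virtio: bogus descriptor or out of resources");
    return NULL;
}
/* Otherwise the caller of virtqueue_pop() only ever sees the HVA. */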

It doesn't make sense to say "the buffer address overlaps with qemu
memory" here, since the GPA to HVA conversion is not defined for every
address. If the range is not valid GPA, it's a bogus descriptor.

Now we use a vdpa device that uses physical mapping and we start qemu
with no SVQ. When qemu starts, it maps IOVA == GPA to HVA. When the
vdpa kernel receives the mapping, it pins the HVA memory, obtaining the
HPA, and sends the IOVA == GPA to HPA mappings to the hardware. This
case is already supported.
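
For reference, the listener path without SVQ does roughly this. It's a
sketch modeled on vhost_vdpa_listener_region_add() in
hw/virtio/vhost-vdpa.c, with the alignment checks and error handling
dropped; section, v and ret are the usual variables in that function:

/* Sketch of the non-SVQ listener path; checks and error handling omitted. */
hwaddr iova = section->offset_within_address_space;   /* IOVA == GPA */
void *vaddr = memory_region_get_ram_ptr(section->mr) +
              section->offset_within_region;          /* HVA backing it */
hwaddr size = int128_get64(section->size);
/* The kernel pins [vaddr, vaddr + size) and programs IOVA -> HPA: */
ret = vhost_vdpa_dma_map(v, iova, size, vaddr, section->readonly);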

If we add SVQ, the IOVA is not GPA anymore. GPA chunks are mapped to
IOVA, and the SVQ vring is mapped to IOVA too; the two don't overlap, so
the device can access both. When the memory listener tells vdpa that a
new chunk of memory has been added, the SVQ code does not care about
GPA: it allocates a free IOVA region for the HVA region of the guest's
memory. GPA to HVA is already tracked and translated by virtqueue_pop.
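
In code terms that allocation looks roughly like this, using the
iova_tree_alloc_map() API this series adds. It's a sketch, not the literal
listener code; vaddr, size, iova_tree, iova_first and iova_last are
placeholders for the chunk's HVA/length, the tree and the usable IOVA
window:

/* Sketch: allocate a free IOVA range for an HVA chunk; GPA is not involved. */
DMAMap map = {
    .translated_addr = (hwaddr)(uintptr_t)vaddr,  /* HVA of the new chunk */
    .size = size - 1,                             /* DMAMap sizes are inclusive */
    .perm = IOMMU_RW,
};
ret = iova_tree_alloc_map(iova_tree, &map, iova_first, iova_last);
/* On success, map.iova holds the IOVA the device will use for this chunk. */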

Let's use example numbers:
- SVQ occupies HVA [0xa000, 0xb000). It's the first one to call
iova_tree_alloc_map, so it gets the mapping IOVA [0, 0x1000) ->
HVA [0xa000, 0xb000).
- The memory listener now reports GPA [0, 0x1000), translated to
HVA [0x8000, 0x9000). The new call to iova_tree_alloc_map assigns
IOVA [0x1000, 0x2000) to HVA [0x8000, 0x9000).
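
The same example as a tiny self-contained program (plain C, not QEMU code),
just to show the two allocations and the reverse HVA -> IOVA lookup with
concrete numbers:

/* Toy model of the example above: bump-allocate IOVA, reverse-lookup by HVA. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct map { uint64_t iova, hva, size; };

static struct map maps[16];
static unsigned n_maps;
static uint64_t next_iova;   /* trivial allocator, enough for the example */

static uint64_t alloc_iova(uint64_t hva, uint64_t size)
{
    struct map *m = &maps[n_maps++];
    m->iova = next_iova;
    m->hva = hva;
    m->size = size;
    next_iova += size;
    return m->iova;
}

static int hva_to_iova(uint64_t hva, uint64_t *iova)
{
    for (unsigned i = 0; i < n_maps; i++) {
        if (hva >= maps[i].hva && hva < maps[i].hva + maps[i].size) {
            *iova = maps[i].iova + (hva - maps[i].hva);
            return 0;
        }
    }
    return -1;   /* HVA not exposed to the device */
}

int main(void)
{
    uint64_t iova;

    /* SVQ vring first: HVA [0xa000, 0xb000) -> IOVA [0, 0x1000) */
    assert(alloc_iova(0xa000, 0x1000) == 0x0);
    /* Guest memory: HVA [0x8000, 0x9000) -> IOVA [0x1000, 0x2000) */
    assert(alloc_iova(0x8000, 0x1000) == 0x1000);

    /* A guest buffer at GPA 0x10 is HVA 0x8010; the device sees IOVA 0x1010 */
    assert(hva_to_iova(0x8010, &iova) == 0 && iova == 0x1010);
    /* An HVA outside the two regions simply has no IOVA */
    assert(hva_to_iova(0xc000, &iova) == -1);

    printf("ok\n");
    return 0;
}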

Then that IOVA tree is sent to the device. From the kernel's POV it is
the same: it gets HVA addresses, pins them, and configures the hardware
so it can translate IOVA (!= GPA) to HPA.

SVQ now reads descriptors from the guest using virtqueue_pop, so SVQ, as
its caller, addresses the buffers by HVA, not GPA. If the guest's vring
descriptor is outside of GPA [0, 0x1000), it's an error, just as with an
emulated device. After that, SVQ translates the HVA to IOVA with the
iova-tree; the result must be within [0x1000, 0x2000).
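
That last translation step is roughly the following, written from memory
rather than copied from vhost-shadow-virtqueue.c; iov_base, iov_len and
iova_tree are placeholders for the popped element's buffer and the tree:

/* Sketch of the per-descriptor HVA -> IOVA translation in SVQ. */
DMAMap needle = {
    .translated_addr = (hwaddr)(uintptr_t)iov_base,  /* HVA from virtqueue_pop */
    .size = iov_len - 1,
};
const DMAMap *found = iova_tree_find_iova(iova_tree, &needle);
if (!found) {
    /* The HVA is not covered by any mapping exposed to the device: error. */
    return false;
}
hwaddr iova = found->iova + (needle.translated_addr - found->translated_addr);
/* With the numbers above, iova always lands inside [0x1000, 0x2000). */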

So the guest should not be able to make the device write to qemu's
memory outside of the guest's own memory, unless it hits a bug either in
the SVQ code or in qemu's virtqueue/DMA code.

Let me know if this makes sense to you.

Thanks!
