Re: [Qemu-devel] [virtio-dev] Re: Vhost-pci RFC2.0

From:	Wei Wang
Subject:	Re: [Qemu-devel] [virtio-dev] Re: Vhost-pci RFC2.0
Date:	Wed, 19 Apr 2017 18:02:55 +0800
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0

On 04/19/2017 05:31 PM, Jan Kiszka wrote:

On 2017-04-19 11:09, Wei Wang wrote:

On 04/19/2017 04:49 PM, Jan Kiszka wrote:

On 2017-04-19 10:42, Wei Wang wrote:

On 04/19/2017 03:35 PM, Jan Kiszka wrote:

On 2017-04-19 08:38, Wang, Wei W wrote:

Hi,
    We made some design changes to the original vhost-pci design, and
want to open a discussion about the latest design (labelled 2.0) and
its extension (2.1).
2.0 design: One VM shares the entire memory of another VM
2.1 design: One VM uses an intermediate memory shared with another VM
            for packet transmission.
    For the convenience of discussion, I have some pictures presented
at this link:
https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf


    Fig. 1 shows the common driver frame that we want to use to build
the 2.0 and 2.1 designs. A TX/RX engine consists of a local ring and an
exotic ring.
Local ring:
1) allocated by the driver itself;
2) registered with the device (i.e. virtio_add_queue())
Exotic ring:
1) ring memory comes from outside the driver, and is exposed to the
   driver via a BAR MMIO;

Small additional requirement: In order to make this usable with
Jailhouse as well, we also need a side-channel configuration for the
regions, i.e. likely via a PCI capability. There are too few BARs, and
they suggest relocatability, which is not available under Jailhouse for
simplicity reasons (IOW, the shared regions are statically mapped by
the hypervisor into the affected guest address spaces).

What kind of configuration would you need for the regions?
I think adding a PCI capability should be easy.

Basically address and size, see
https://github.com/siemens/jailhouse/blob/wip/ivshmem2/Documentation/ivshmem-v2-specification.md#vendor-specific-capability-id-09h
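
To make "address and size" concrete, here is a minimal sketch of packing and parsing such a capability. Only the capability ID 09h (vendor-specific) and the ID/next/length header come from the PCI convention; the remaining layout (one pad byte, a 64-bit address, a 64-bit size) and the helper names `pack_region_cap` / `parse_region_cap` are assumptions for illustration, not the layout defined by the Jailhouse ivshmem v2 specification or by vhost-pci.

```python
import struct

PCI_CAP_ID_VNDR = 0x09  # PCI vendor-specific capability ID

# Hypothetical layout (little-endian, no implicit padding):
#   u8 cap_id, u8 next_ptr, u8 length, u8 pad, u64 region_addr, u64 region_size
_REGION_CAP = struct.Struct("<BBBxQQ")


def pack_region_cap(next_ptr, addr, size):
    """Build the raw bytes of the assumed capability."""
    return _REGION_CAP.pack(PCI_CAP_ID_VNDR, next_ptr, _REGION_CAP.size, addr, size)


def parse_region_cap(raw):
    """Parse the assumed capability and return the region description."""
    cap_id, next_ptr, length, addr, size = _REGION_CAP.unpack(raw[:_REGION_CAP.size])
    if cap_id != PCI_CAP_ID_VNDR or length != _REGION_CAP.size:
        raise ValueError("not the expected vendor-specific capability")
    return {"next": next_ptr, "addr": addr, "size": size}
```

A guest driver would read the region's fixed address and size from this capability instead of from a relocatable BAR.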

Got it, thanks. That should be easy to add to 2.1.

2) is not registered with the device, so no ioeventfd/irqfd or
   configuration registers are allocated for it in the device
    Fig. 2 shows how the driver frame is used to build the 2.0 design.
1) Asymmetric: vhost-pci-net <-> virtio-net
2) VM1 shares the entire memory of VM2, and the exotic rings are the
   rings from VM2.
3) Performance (in terms of copies between VMs):
       TX: 0-copy (packets are put to VM2’s RX ring directly)
       RX: 1-copy (the green arrow line in the VM1’s RX engine)
    Fig. 3 shows how the driver frame is used to build the 2.1 design.
1) Symmetric: vhost-pci-net <-> vhost-pci-net

This is interesting!

2) Share an intermediate memory, allocated by VM1’s vhost-pci device,
for data exchange, and the exotic rings are built on the shared memory
3) Performance:
       TX: 1-copy
       RX: 1-copy

I'm not yet sure I got this right: there are two different MMIO regions
involved, right? One is used for VM1's RX / VM2's TX, and the other for
the reverse path? That would meet our requirement to have those regions
mapped with asymmetric permissions (RX read-only, TX read/write).

The design presented here intends to use only one BAR to expose
both TX and RX. The two VMs share an intermediate memory
here, so why couldn't we give the same permission to TX and RX?

For security and/or safety reasons: the TX side can then safely prepare
and sign a message in-place because the RX side cannot mess around with
it while it is not yet signed (or check-summed). This saves one copy
from a secure place into the shared memory.

If we allow guest1 to write to RX, what safety issue would it cause to
guest2?

This way, guest1 could trick guest2, in a race condition, into signing
a modified message instead of the original one.
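
The race can be sketched as a deterministic sequence of steps. This is only a model: the shared region is a `bytearray`, the peer's mapping is a `memoryview` that is either writable or read-only, and the HMAC key, the message, and the function names are made up for illustration.

```python
import hashlib
import hmac

KEY = b"guest2-signing-key"  # hypothetical key known only to the sender (guest2)


def sign(buf):
    """Sign the current contents of a buffer."""
    return hmac.new(KEY, bytes(buf), hashlib.sha256).digest()


def transmit(message, peer_can_write):
    """guest2 prepares and signs `message` in place; guest1 interferes in between."""
    region = bytearray(len(message))       # stands in for the shared region
    tx_view = memoryview(region)           # guest2's read/write mapping
    peer_view = tx_view if peer_can_write else tx_view.toreadonly()

    tx_view[:] = message                   # step 1: guest2 prepares the message in place
    try:
        peer_view[0:1] = b"X"              # step 2: guest1 writes before the signature exists
    except TypeError:
        pass                               # read-only mapping: the write is rejected
    signature = sign(region)               # step 3: guest2 signs what is in the region now
    return bytes(region), signature
```

With `peer_can_write=True` the returned signature is valid for the tampered bytes, not for the message guest2 intended to send; with `peer_can_write=False` the region and signature match the original message.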

Just to align on the context that we are talking about: RX is the
intermediate shared ring that guest1 uses to receive packets and guest2
uses to send packets.

Seems the issue is that guest1 will receive a hacked message from RX
(modified by itself). How would it affect guest2?

    Fig. 4 shows the inter-VM notification path for 2.0 (2.1 is
similar).
The four eventfds are allocated by virtio-net, and shared with
vhost-pci-net:
Uses virtio-net’s TX/RX kickfd as the vhost-pci-net’s RX/TX callfd
Uses virtio-net’s TX/RX callfd as the vhost-pci-net’s RX/TX kickfd
Example of how it works:
After packets are put into vhost-pci-net’s TX, the driver kicks TX,
which causes an interrupt associated with fd3 to be injected to
virtio-net.
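
The cross-wiring of the four eventfds described above can be written down as a small mapping. This is a sketch only: the fd numbers in the usage note are placeholders and do not correspond to the fd labels in Fig. 4, and `wire_notifications` is a made-up helper name.

```python
def wire_notifications(virtio_net_fds):
    """Derive vhost-pci-net's eventfd roles from virtio-net's.

    `virtio_net_fds` maps (queue, role) to an fd, with queue in {"tx", "rx"}
    and role in {"kick", "call"}. Each fd is reused on the other side with
    both the queue and the role swapped:
      virtio-net TX/RX kickfd -> vhost-pci-net RX/TX callfd
      virtio-net TX/RX callfd -> vhost-pci-net RX/TX kickfd
    """
    other_queue = {"tx": "rx", "rx": "tx"}
    other_role = {"kick": "call", "call": "kick"}
    return {
        (other_queue[queue], other_role[role]): fd
        for (queue, role), fd in virtio_net_fds.items()
    }
```

For example, with virtio-net's RX callfd set to a placeholder fd 13, `wire_notifications(...)[("tx", "kick")]` is also 13: kicking vhost-pci-net's TX signals the eventfd that raises virtio-net's RX interrupt.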
    The draft code of the 2.0 design is ready, and can be found here:
QEMU: https://github.com/wei-w-wang/vhost-pci-device
Guest driver: https://github.com/wei-w-wang/vhost-pci-driver
    We tested the 2.0 implementation using the Spirent packet
generator to transmit 64B packets. The results show that the
throughput of vhost-pci reaches around 1.8Mpps, which is around
twice that of the legacy OVS+DPDK. Also, vhost-pci shows
better scalability than OVS+DPDK.
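
For a sense of scale, the packet rate can be converted to a bit rate. The calculation below assumes that "64B" is the full Ethernet frame size and that the usual 20 bytes of per-frame wire overhead (8 bytes preamble/SFD plus 12 bytes inter-frame gap) apply on a physical link; neither assumption is stated above, and `mpps_to_gbps` is just a helper name for this sketch.

```python
ETH_WIRE_OVERHEAD = 20  # bytes per frame: 8 preamble/SFD + 12 inter-frame gap


def mpps_to_gbps(mpps, frame_bytes, wire_overhead=0):
    """Convert a packet rate in Mpps to Gbit/s for a given frame size in bytes."""
    return mpps * 1e6 * (frame_bytes + wire_overhead) * 8 / 1e9
```

Under these assumptions, 1.8Mpps of 64B frames is about 0.92 Gbit/s of frame data, or about 1.21 Gbit/s counted at the wire level.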

Do you have numbers for the symmetric 2.1 case as well? Or is the
driver not ready for that yet? Otherwise, I could try to make it work
over a simplistic vhost-pci 2.1 version in Jailhouse as well. That
would give a better picture of how much additional complexity this
would mean compared to our ivshmem 2.0.

Implementation of 2.1 is not ready yet. We can extend it to 2.1 after
the common driver frame is reviewed.

Can you assess the needed effort?

For us, this is a critical feature, because we need to decide if
vhost-pci can be an option at all. In fact, the "exotic ring" will be
the only way to provide secure inter-partition communication on
Jailhouse.

If what is here for 2.0 is suitable to be upstreamed, I think it will
be easy to extend it to 2.1 (probably within 1 month).

Unfortunate ordering here, though. Specifically if we need to modify
existing things instead of just adding something. We will need 2.1 prior
to committing to 2.0 being the right thing.


If you want, we can get the common part of the design ready first,
then we can both start to build on the common part at the same time.
The draft code of 2.0 is ready. I can clean it up, making it easier for
us to continue and change.

Best,
Wei
