
Re: [Qemu-devel] Towards an ivshmem 2.0?


From: Wang, Wei W
Subject: Re: [Qemu-devel] Towards an ivshmem 2.0?
Date: Tue, 17 Jan 2017 09:13:45 +0000

Hi Jan,

On Monday, January 16, 2017 9:10 PM, Jan Kiszka wrote:
> On 2017-01-16 13:41, Marc-André Lureau wrote:
> > On Mon, Jan 16, 2017 at 12:37 PM Jan Kiszka <address@hidden> wrote:
> >     some of you may know that we are using a shared memory device similar to
> >     ivshmem in the partitioning hypervisor Jailhouse [1].
> >
> >     We started out compatible with the original ivshmem that QEMU
> >     implements, but we quickly deviated in some details, and in recent
> >     months even more. Some of the deviations are about making the
> >     implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is
> >     aiming at safety-critical systems and, therefore, at a small code base.
> >     Other changes address deficits in the original design, like missing
> >     life-cycle management.
> >
> >     Now the question is whether there is interest in defining a common new
> >     revision of this device and maybe also of some protocols used on top,
> >     such as virtual network links. Ideally, this would enable us to share
> >     Linux drivers. We will definitely go for upstreaming at least a network
> >     driver such as [2], a UIO driver and maybe also a serial port/console.
> >
> >
> > This sounds like duplicating efforts done with virtio and vhost-pci.
> > Have you looked at Wei Wang proposal?
> 
> I didn't follow it recently, but the original concept was about introducing an
> IOMMU model to the picture, and that's complexity-wise a no-go for us (we can
> do this whole thing in less than 500 lines, even virtio itself is more
> complex). IIUC, the alternative to an IOMMU is mapping the whole frontend VM
> memory into the backend VM - that's security/safety-wise an absolute no-go.

Though the virtio-based solution might be complex for you, a big advantage is
that we have lots of people working to improve virtio. For example, the
upcoming virtio 1.1 brings vring improvements, and we can easily upgrade all
the virtio-based solutions, such as vhost-pci, to take advantage of them. From
a long-term perspective, I think this kind of complexity is worthwhile.

Furthermore, security features (e.g. vIOMMU) can be applied to vhost-pci.

> >
> >     Deviations from the original design:
> >
> >     - Only two peers per link
> >
> >
> > Sounds sane, that's also what vhost-pci aims for, AFAIK.
> >
> >
> >       This simplifies the implementation and also the interfaces (think of
> >       life-cycle management in a multi-peer environment). Moreover, we do
> >       not have an urgent use case for multiple peers, and thus also no
> >       reference protocol that could be used in such setups. If someone
> >       else happens to share such a protocol, it would be possible to discuss
> >       potential extensions and their implications.
> >
> >     - Side-band registers to discover and configure shared memory regions
> >
> >       This was one of the first changes: We removed the memory regions from
> >       the PCI BARs and gave them special configuration space registers. By
> >       now, these registers are embedded in a PCI capability. The reasons are
> >       that Jailhouse does not allow relocating the regions in guest address
> >       space (though other hypervisors may, if they like) and that we now have
> >       up to three of them.
> >
> >
> >  Sorry, I can't comment on that.
> >
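
Just to make sure I understand the side-band register idea - is it something
like the vendor capability sketched below? The field names and offsets are
purely my guess, not your actual ivshmem 2.0 layout:

#include <stdint.h>

/* Hypothetical capability describing the (up to three) shared memory
 * regions; not the real Jailhouse/ivshmem 2.0 register definition. */
struct ivshmem2_shmem_cap {
        uint8_t  cap_id;            /* PCI_CAP_ID_VNDR (0x09) */
        uint8_t  cap_next;          /* pointer to the next capability */
        uint8_t  cap_len;           /* total length of this capability */
        uint8_t  reserved;
        uint64_t region_addr[3];    /* guest-physical base of each region */
        uint64_t region_size[3];    /* region size in bytes */
} __attribute__((packed));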
> >
> >     - Changed PCI base class code to 0xff (unspecified class)
> >
> >       This allows us to define our own subclasses and interfaces. That is
> >       now exploited for specifying the shared memory protocol the two
> >       connected peers should use. It also allows the Linux drivers to match
> >       on that.
> >
> >
> > Why not, but it worries me: you are going to invent protocols
> > similar to virtio devices, aren't you?
> 
> That partly comes with the desire to simplify the transport (pure shared
> memory). With ivshmem-net, we are at least reusing virtio rings and will try
> to do this with the new (and faster) virtio ring format as well.
> 
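
On the Linux driver side, matching on the new class code could stay very
small. A minimal sketch - the subclass value for a network link is only an
assumption on my part, not something from your spec:

#include <linux/module.h>
#include <linux/pci.h>

#define IVSHMEM2_BASE_CLASS   0xff   /* from your proposal */
#define IVSHMEM2_SUBCLASS_NET 0x01   /* assumed subclass for network links */

static const struct pci_device_id ivshmem2_net_ids[] = {
        { PCI_DEVICE_CLASS((IVSHMEM2_BASE_CLASS << 16) |
                           (IVSHMEM2_SUBCLASS_NET << 8), 0xffff00) },
        { 0 }
};
MODULE_DEVICE_TABLE(pci, ivshmem2_net_ids);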
> >
> >
> >     - INTx interrupt support is back
> >
> >       This is needed on target platforms without MSI controllers, i.e.
> >       without the required guest support. Namely some PCI-less ARM SoCs
> >       required the reintroduction. While doing this, we also took care of
> >       keeping the MMIO registers free of privileged controls so that a
> >       guest OS can map them safely into a guest userspace application.
> >
> >
> > Right, it's not completely removed from ivshmem qemu upstream,
> > although it should probably be allowed to set up a doorbell-ivshmem
> > with msi=off (this may be quite trivial to add back).
> >
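
Regarding mapping the registers into a guest userspace application - do you
mean the usual UIO pattern, roughly as below? The device node name and the
4 KiB map size are just placeholders on my side:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        uint32_t events;
        int fd = open("/dev/uio0", O_RDWR);   /* placeholder node name */
        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* UIO exposes the device's first memory region at offset 0. */
        volatile uint32_t *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                       MAP_SHARED, fd, 0);
        if (regs == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        /* read() blocks until the next interrupt and returns the event
         * count, so no privileged register access is needed here. */
        if (read(fd, &events, sizeof(events)) > 0)
                printf("interrupt count: %u\n", events);

        munmap((void *)regs, 4096);
        close(fd);
        return 0;
}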
> >
> >     And then there are some extensions of the original ivshmem:
> >
> >     - Multiple shared memory regions, including unidirectional ones
> >
> >       It is now possible to expose up to three different shared memory
> >       regions: The first one is read/writable for both sides. The second
> >       region is read/writable for the local peer and read-only for the
> >       remote peer (useful for output queues). And the third is read-only
> >       locally but read/writable remotely (i.e. for input queues).
> >       Unidirectional regions prevent the receiver of some data from
> >       interfering with the sender while it is still building the message, a
> >       property that is useful not just for safety-critical communication,
> >       we are sure.
> >
> >
> > Sounds like a good idea, and something we may want in virtio too

Could you please explain in more detail how a packet is transferred using the
three different memory regions?
In the kernel implementation, the sk_buff can be allocated anywhere.
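
For reference, here is roughly how I currently picture it - the names and the
layout below are purely my own assumption, not your actual ivshmem-net design:

#include <stdint.h>

/* My rough mental model only. Each peer fills the region that only it can
 * write, so packet data would have to be copied from the sk_buff into the
 * shared buffer on transmit, rather than handing over kernel memory. */
struct ivshmem_queue {
        uint16_t write_idx;        /* advanced by the producing side only */
        uint8_t  buf[64 * 1024];   /* copied frame data */
};

/* Your region 2: local read/write, remote read-only. */
volatile struct ivshmem_queue *tx_queue;

/* Your region 3: local read-only, remote read/write (the peer's output). */
const volatile struct ivshmem_queue *rx_queue;

Is that roughly the direction, or does the first (bidirectional) region also
play a role in the data path?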

Btw, this looks similar to the memory access protection mechanism using EPTP
switching (slide 25):
http://www.linux-kvm.org/images/8/87/02x09-Aspen-Jun_Nakajima-KVM_as_the_NFV_Hypervisor.pdf
The missing right side of the figure shows an alternative EPT, which gives the
small piece of security code full access permissions.

> >
> >
> >     - Life-cycle management via local and remote state
> >
> >       Each device can now signal its own state in the form of a value to the
> >       remote side, which triggers an event there. Moreover, state changes
> >       done by the hypervisor to one peer are signalled to the other side.
> >       And we introduced a write-to-shared-memory mechanism for the
> >       respective remote state so that guests do not have to issue an MMIO
> >       access in order to check the state.
> >
> >
> > There is also ongoing work to better support disconnect/reconnect in
> > virtio.
> >
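
If I read the write-to-shared-memory part correctly, checking the peer state
then becomes a plain memory read. A tiny sketch with made-up names (the state
values and the location of the mirrored word are my assumptions):

#include <stdint.h>

/* Assumption: the device mirrors the peer's state value into a well-known
 * word of the shared memory region, so no MMIO access is needed to read it. */
enum ivshmem_peer_state {
        IVSHMEM_PEER_RESET = 0,
        IVSHMEM_PEER_READY = 1,
};

static inline int peer_is_ready(const volatile uint32_t *remote_state)
{
        /* A state change also triggers an event on this side, so this is a
         * cheap re-check after the interrupt, not a polling loop. */
        return *remote_state == IVSHMEM_PEER_READY;
}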
> >
> >
> >     So, this is our proposal. It would be great to hear some opinions on
> >     whether you see value in adding support for such an "ivshmem 2.0" device
> >     to QEMU as well and expanding its ecosystem towards Linux upstream, maybe
> >     also DPDK again. If you see problems in the new design w.r.t. what QEMU
> >     provides so far with its ivshmem device, let's discuss how to resolve
> >     them. Looking forward to any feedback!
> >
> >
> > My feeling is that ivshmem is not being actively developed in qemu;
> > the focus is rather on virtio-based solutions (vhost-pci for vm2vm).
> 
> As pointed out, for us it's most important to keep the design simple - even at
> the price of "reinventing" some drivers for upstream (at least, we do not need
> two sets of drivers because our interface is fully symmetric). I don't see yet
> how vhost-pci could achieve the same, but I'm open to learning more!

Maybe I didn't fully understand this - "we do not need two sets of drivers
because our interface is fully symmetric"?

The vhost-pci driver is a standalone network driver from the local guest's
point of view - it's no different from any other network driver in the guest.
In terms of usage, it's used together with another VM's virtio device - would
that be the "two sets of drivers" you meant? I think this is pretty natural and
reasonable, as it is essentially VM-to-VM communication. Furthermore, we are
able to dynamically create/destroy and hot-plug in/out a vhost-pci device based
on runtime requests.

Thanks for sharing your ideas.

Best,
Wei
