
From: Hannes Reinecke
Subject: Re: [Qemu-devel] Re: [RFC] virtio-fc: draft idea of virtual fibre channel HBA
Date: Tue, 16 May 2017 17:22:31 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.1.0

On 05/16/2017 10:19 AM, Paolo Bonzini wrote:
> 
>> Maybe a union with an overall size of 256 bytes (to hold the iSCSI IQN
>> string), which for FC carries the WWPN and the WWNN?
> 
> That depends on how you would like to do controller passthrough in
> general.  iSCSI doesn't have the 64-bit target ID, and doesn't have
> (AFAIK) hot-plug/hot-unplug support, so it's less important than FC.
> 
iSCSI has its 'IQN' string, which is defined to be a 256-byte string.
Hence the number :-)
And if we're updating virtio anyway, we might as well update it to carry
_all_ possible SCSI IDs.
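
Purely as an illustration (field names and layout are just my assumption,
nothing that has been agreed on), such a union could look like this:

#include <stdint.h>

/* Sketch of a transport-agnostic SCSI port identifier, sized so that
 * an iSCSI IQN fits; which member is valid would be selected by the
 * 'transport' field.  Illustrative only, not a proposed spec. */
struct virtio_scsi_port_id {
        uint8_t  transport;            /* e.g. FC, iSCSI, SAS, ... */
        uint8_t  reserved[7];
        union {
                struct {
                        uint64_t wwpn; /* FC world-wide port name */
                        uint64_t wwnn; /* FC world-wide node name */
                } fc;
                struct {
                        uint64_t sas_address;
                } sas;
                uint8_t iqn[256];      /* iSCSI qualified name, NUL-padded */
        } id;
};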

>>> 2) If the initiator ID is the moral equivalent of a MAC address,
>>> shouldn't it be the host that provides the initiator ID to the guest in
>>> the virtio-scsi config space?  (From your proposal, I'd guess it's the
>>> latter, but maybe I am not reading correctly).
>>
>> That would be dependent on the emulation. For an emulated SCSI disk I guess
>> we need to specify it on the command line somewhere, but for SCSI
>> passthrough we could grab it from the underlying device.
> 
> Wait, that would be the target ID.  The initiator ID would be the NPIV
> vport's WWNN/WWPN.  It could be specified on the QEMU command line, or
> it could be tied to some file descriptor (created and initialized by
> libvirt, which has CAP_SYS_ADMIN, and then passed to QEMU; similar to
> tap file descriptors).
> 
No, I do mean the initiator ID.
If we allow qemu to specify an initiator ID, qemu could find the host
NPIV instance and expose that ID via virtio to the guest.
Or we could specify the NPIV host for qemu, and qemu does the magic
internally.
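
To make that concrete: one purely hypothetical way for qemu to expose the
vport identity to the guest would be an extra field in the virtio-scsi
config space (the existing fields below mirror today's layout; only
initiator_id is new and reuses the union sketched above):

#include <stdint.h>

/* Hypothetical virtio-scsi config space extension; not part of the
 * current virtio specification. */
struct virtio_scsi_config_ext {
        uint32_t num_queues;
        uint32_t seg_max;
        uint32_t max_sectors;
        uint32_t cmd_per_lun;
        uint32_t event_info_size;
        uint32_t sense_size;
        uint32_t cdb_size;
        uint16_t max_channel;
        uint16_t max_target;
        uint32_t max_lun;
        struct virtio_scsi_port_id initiator_id; /* new: WWPN/WWNN or IQN */
};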

>>>> b) stop exposing the devices attached to that NPIV host to the guest
>>>
>>> What do you mean exactly?
>>>
>> That's one of the longer term plans I have.
>> Currently, when doing NPIV, all devices from the NPIV host appear on the
>> host, including all partitions, LVM devices and whatnot. [...]
>> If we make the (guest) initiator ID identical to the NPIV WWPN we can
>> tag the _host_ to not expose any partitions on any LUNs, making the
>> above quite easy.
> 
> Yes, definitely.
> 
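
(As an aside: the host side already has a hook for part of this. The
kernel's struct scsi_device carries a no_uld_attach flag that keeps sd/sr
from binding, so no partitions or LVM devices show up. A rough sketch of
how an HBA's slave_alloc could use it follows; the guest-only check is
purely hypothetical and would have to come from however the NPIV vport is
marked.)

#include <scsi/scsi_device.h>

/* Hypothetical helper: decide whether this device sits behind an NPIV
 * vport that was created only for passthrough to a guest. */
static bool vport_is_guest_only(struct scsi_device *sdev)
{
        return false; /* placeholder */
}

static int example_slave_alloc(struct scsi_device *sdev)
{
        if (vport_is_guest_only(sdev))
                sdev->no_uld_attach = 1; /* sd/sr won't claim the LUN */
        return 0;
}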
>>> At this point, I can think of several ways to do this, one being SG_IO
>>> in QEMU while the others are more esoteric.
>>>
>>> 1) use virtio-scsi with userspace passthrough (current solution).
>>
>> With option (1) and the target/initiator ID extensions we should be able
>> to get basic NPIV support to work, and would even be able to handle
>> reservations in a sane manner.
> 
> Agreed, but I'm not that sure anymore that the advantages outweigh the
> disadvantages.  Also, let's add the lack of FC-NVMe support to the disadvantages.
> 
>>> 2) the exact opposite: use the recently added "mediated device
>>> passthrough" (mdev) framework to present a "fake" PCI device to the
>>> guest.
>>
>> (2) sounds interesting, but I'd have to have a look into the code to
>> figure out if it could easily be done.
> 
> Not that easy, but it's the bread and butter of the hardware manufacturers.
> If we want them to do it alone, (2) is the way.  Both nVidia and Intel are
> using it.
> 
>>> 3) handle passthrough with a kernel driver.  Under this model, the guest
>>> uses the virtio device, but the passthrough of commands and TMFs is
>>> performed by the host driver.
>>>
>>> We can then choose whether to do it with virtio-scsi or with a new
>>> virtio-fc.
>>
>> (3) would be feasible, as it would effectively mean 'just' updating the
>> current NPIV mechanism. However, this would essentially lock us in to
>> FC; any other transport (think NVMe) would require yet another solution.
> 
> An FC-NVMe driver could also expose the same vhost interface, couldn't it?
> FC-NVMe doesn't have to share the Linux code; but sharing the virtio standard
> and the userspace ABI would be great.
> 
> In fact, the main advantage of virtio-fc would be that (if we define it
> properly) it could be reused for FC-NVMe instead of having to extend
> e.g. virtio-blk.  For example, virtio-scsi has a request, to-device
> payload, response, and from-device payload.  virtio-fc's request format
> could be the initiator and target port identifiers, followed by FCP_CMD,
> to-device payload, FCP_RSP, and from-device payload.
> 
As already said: we do _not_ have access to the FCP frames.
So a virtio-fc protocol defined at that level would only work for
libfc-based HBAs, namely fnic, bnx2fc, and fcoe.
Given that the future of FCoE is somewhat unclear, I doubt it's a good
idea to restrict ourselves to that.
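
For reference, the framing quoted above would roughly amount to a request
header like the one below; field names and sizes are my assumptions, not
a spec, and as said it presumes FCP-level access that most HBAs simply do
not give us:

#include <stdint.h>

/* Illustrative virtio-fc request header following the quoted proposal:
 * port identifiers plus an FCP_CMND IU, with the to-device payload in
 * driver-writable buffers and FCP_RSP plus from-device payload coming
 * back from the device.  Not a proposed specification. */
struct virtio_fc_cmd_req {
        uint64_t initiator_wwpn;   /* NPIV vport WWPN */
        uint64_t initiator_wwnn;
        uint64_t target_wwpn;      /* remote port WWPN */
        uint64_t target_wwnn;
        uint8_t  fcp_cmnd[32];     /* FCP_CMND IU (fixed size assumed here) */
};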

>>> 4) same as (3), but in userspace with a "macvtap" like layer (e.g.,
>>> socket+bind creates an NPIV vport).  This layer can work on some kind of
>>> FCP encapsulation, not the raw thing, and virtio-fc could be designed
>>> according to a similar format for simplicity.
>>
>> (4) would require raw FCP frame access, which is one thing we do _not_
>> have. Each card (except for the pure FCoE ones like bnx2fc, fnic, and
>> fcoe) only allows access to pre-formatted I/O commands, and has its own
>> mechanism for generating sequence IDs etc. So anything requiring raw FCP
>> access is basically out of the game.
> 
> Not raw.  It could even be defined at the exchange level (plus some special
> things for discovery and login services).  But I agree that (4) is a bit
> pie-in-the-sky.
> 
>> Overall, I would vote to specify a new virtio-scsi format _first_,
>> keeping in mind all of these options.
>> (1), (3), and (4) all require an update anyway :-)
>>
>> The big advantage I see with (1) is that it can be added with just some
>> code changes to qemu and virtio-scsi. Every other option requires some
>> vendor buy-in, which inevitably leads to more discussions, delays, and
>> more complex interactions (changes to qemu, virtio, _and_ the affected HBAs).
> 
> I agree.  But if we have to reinvent everything in a couple of years for
> NVMe over fabrics, maybe it's not worth it.
> 
>> While we're at it: we also need a 'timeout' field in the virtio request
>> structure. I even posted an RFC for it :-)
> 
> Yup, I've seen it. :)
> 
Cool. Thanks.
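
For reference, on the request side such a timeout could look roughly like
the sketch below; the field name, units and placement are my assumptions,
not necessarily what the RFC proposed (the other fields mirror today's
virtio_scsi_cmd_req, and a new feature bit would have to gate it):

#include <stdint.h>

/* Sketch: virtio-scsi command request with an added timeout field. */
struct virtio_scsi_cmd_req_to {
        uint8_t  lun[8];      /* logical unit number */
        uint64_t tag;         /* command identifier */
        uint8_t  task_attr;   /* task attribute */
        uint8_t  prio;        /* SAM command priority */
        uint8_t  crn;
        uint32_t timeout;     /* new: per-command timeout, e.g. seconds */
        uint8_t  cdb[32];     /* VIRTIO_SCSI_CDB_SIZE */
} __attribute__((packed));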

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Teamlead Storage & Networking
address@hidden                                 +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


