Re: [Qemu-devel] [PATCH] msi/msix: added API to set MSI message address
From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [PATCH] msi/msix: added API to set MSI message address and data
Date: Thu, 19 Jul 2012 17:56:20 +0300
On Fri, Jul 20, 2012 at 12:50:26AM +1000, Alexey Kardashevskiy wrote:
> On 20/07/12 00:43, Michael S. Tsirkin wrote:
> > On Fri, Jul 20, 2012 at 12:24:05AM +1000, Alexey Kardashevskiy wrote:
> >> One comment below.
> >>
> >>
> >> On 19/07/12 19:27, Michael S. Tsirkin wrote:
> >>> On Thu, Jul 19, 2012 at 10:32:40AM +1000, Alexey Kardashevskiy
> >>> wrote:
> >>>> On 19/07/12 01:23, Michael S. Tsirkin wrote:
> >>>>> On Wed, Jul 18, 2012 at 11:17:12PM +1000, Alexey Kardashevskiy
> >>>>> wrote:
> >>>>>> On 18/07/12 22:43, Michael S. Tsirkin wrote:
> >>>>>>> On Thu, Jun 21, 2012 at 09:39:10PM +1000, Alexey
> >>>>>>> Kardashevskiy wrote:
> >>>>>>>> Added (msi|msix)_set_message() functions.
> >>>>>>>>
> >>>>>>>> Currently msi_notify()/msix_notify() write to these
> >>>>>>>> vectors to signal the guest about an interrupt, so the
> >>>>>>>> correct values have to be written there by the guest or
> >>>>>>>> QEMU.
> >>>>>>>>
> >>>>>>>> For example, POWER guest never initializes MSI/MSIX
> >>>>>>>> vectors, instead it uses RTAS hypercalls. So in order to
> >>>>>>>> support MSIX for virtio-pci on POWER we have to initialize
> >>>>>>>> MSI/MSIX message from QEMU.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Alexey Kardashevskiy <address@hidden>
> >>>>>>>
> >>>>>>> So guests do enable MSI through config space, but do not
> >>>>>>> fill in vectors?
> >>>>>>
> >>>>>> Yes. msix_capability_init() calls arch_setup_msi_irqs() which
> >>>>>> does everything it needs to do (i.e. calls hypervisor) before
> >>>>>> msix_capability_init() writes PCI_MSIX_FLAGS_ENABLE to the
> >>>>>> PCI_MSIX_FLAGS register.
> >>>>>>
> >>>>>> These vectors are the PCI bus addresses; the way they are set
> >>>>>> is specific to a PCI host controller, so I do not see why the
> >>>>>> current scheme is a bug.
> >>>>>
> >>>>> It won't work with any real PCI device, will it? Real PCI devices
> >>>>> expect vectors to be written into their memory.
> >>>>
> >>>>
> >>>> Yes. And the hypervisor does this. On POWER (at least book3s,
> >>>> i.e. server powerpc), the whole config space kitchen is hidden
> >>>> behind RTAS (a kind of BIOS). For the guest, RTAS is implemented
> >>>> in the hypervisor; for the host, in the system firmware. So powerpc
> >>>> linux does not have to have PHB drivers. Kinda cool.
> >>>>
> >>>> A typical powerpc server runs without a host linux at all; it
> >>>> runs a hypervisor called pHyp. And every guest knows that it
> >>>> is a guest: there is no full machine emulation, it is
> >>>> para-virtualization. In power-kvm, we replace pHyp with the
> >>>> host linux, and QEMU now plays the hypervisor role. Some day we
> >>>> may move the hypervisor to the host kernel completely, but for now
> >>>> it is in QEMU.
> >>>
> >>> Okay. So it is POWER-specific weirdness, as I suspected. Sure, if
> >>> this is what real hardware does, we pretty much have to emulate
> >>> this.
> >>>
> >>>>>>> Very strange. Are you sure it's not just a guest bug? How
> >>>>>>> does it work for other PCI devices?
> >>>>>>
> >>>>>> Did not get the question. It works the same for every PCI
> >>>>>> device under POWER guest.
> >>>>>
> >>>>> I mean for real PCI devices.
> >>>>>
> >>>>>>> Can't we just fix guest drivers to program the vectors
> >>>>>>> properly?
> >>>>>>>
> >>>>>>> Also pls address the comment below.
> >>>>>>
> >>>>>> Comment below.
> >>>>>>
> >>>>>>> Thanks!
> >>>>>>>
> >>>>>>>> ---
> >>>>>>>>  hw/msi.c  | 13 +++++++++++++
> >>>>>>>>  hw/msi.h  |  1 +
> >>>>>>>>  hw/msix.c |  9 +++++++++
> >>>>>>>>  hw/msix.h |  2 ++
> >>>>>>>>  4 files changed, 25 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/hw/msi.c b/hw/msi.c
> >>>>>>>> index 5233204..cc6102f 100644
> >>>>>>>> --- a/hw/msi.c
> >>>>>>>> +++ b/hw/msi.c
> >>>>>>>> @@ -105,6 +105,19 @@ static inline uint8_t msi_pending_off(const PCIDevice* dev, bool msi64bit)
> >>>>>>>>      return dev->msi_cap + (msi64bit ? PCI_MSI_PENDING_64 : PCI_MSI_PENDING_32);
> >>>>>>>>  }
> >>>>>>>>
> >>>>>>>> +void msi_set_message(PCIDevice *dev, MSIMessage msg)
> >>>>>>>> +{
> >>>>>>>> +    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
> >>>>>>>> +    bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
> >>>>>>>> +
> >>>>>>>> +    if (msi64bit) {
> >>>>>>>> +        pci_set_quad(dev->config + msi_address_lo_off(dev), msg.address);
> >>>>>>>> +    } else {
> >>>>>>>> +        pci_set_long(dev->config + msi_address_lo_off(dev), msg.address);
> >>>>>>>> +    }
> >>>>>>>> +    pci_set_word(dev->config + msi_data_off(dev, msi64bit), msg.data);
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>
> >>>>>>> Please add documentation. Something like
> >>>>>>>
> >>>>>>> /*
> >>>>>>>  * Special API for POWER to configure the vectors through
> >>>>>>>  * a side channel. Should never be used by devices.
> >>>>>>>  */
> >>>>>>
> >>>>>>
> >>>>>> It is useful for any para-virtualized environment, I believe,
> >>>>>> isn't it? For s390 as well, for example, if it supports PCI,
> >>>>>> which I am not sure it does :)
> >>>>>
> >>>>> I expect the normal guest to program the address into the MSI
> >>>>> register using config accesses, the same way that it enables
> >>>>> MSI/MSIX. I have not yet figured out why POWER does it differently,
> >>>>> but I hope this weirdness is not so widespread.
> >>>>
> >>>>
> >>>> In para-virt I would expect the guest not to touch config space at
> >>>> all. At least it should use one interface rather than two but this
> >>>> is how it is.
> >>>
> >>> It's not new that firmware developers consistently make
> >>> inconsistent design decisions :)
> >>
> >>
> >> It depends on how you look at it. Enabling MSI via the config space
> >> is also done via a special set of hypervisor calls (common and
> >> IBM-specific), so it is all hidden in one place, the system firmware,
> >> which is cool: no PHB drivers in the guest. Although MSI would not
> >> need any additional hypercall to init vectors (everything can be done
> >> via config space), there is MSI-X, which stores vectors in a BAR, and
> >> there is no hypercall for BARs as they are simply memory mapped. This
> >> is, I think, why the firmware people (or pHyp, but it is probably the
> >> same) added IBM-specific MSI/MSIX config hypercalls.
> >
> > Well what's wrong with guest doing this through a memory mapped
> > interface?
>
>
> Shouldn't the guest allocate addresses and program the PHB with them?
What are you asking about? How does MSIX work normally?
The OS gets the vector (address/data) pairs in some arch-specific way
and then programs them into the devices. There is no need for firmware
to touch any devices.
> The idea was to hide PHB details in the system firmware, this is the point.
The actual result is that POWER behaves differently from almost any other
architecture.
> >> And I do not quite understand why the MSIX people could not use the
> >> extended PCI config space, which is 4096 bytes, quite a lot, enough to
> >> fit 256 vectors (I have not seen a card which asked for more than 9
> >> _per function_). If somebody really needs 2048, he may want 16384 as
> >> well (or any other crazy number), so why did they put such a limit
> >> when it is a BAR and BARs are huge? :) Ah, off-topic anyway.
>
>
> > Well you have just described MSI, just don't use MSIX.
> >
> > The motivation for MSIX was as follows: PCI/PCI-X config space is not
> > 4096 bytes, it is 256 bytes, and is very crowded. You are thinking of
> > PCI express.
>
> MSIX is a PCIe feature, no?
No.
> > Config accesses are also non-posted, which means at most one can
> > be in flight at a time. This is not appropriate for vector
> > programming, which needs to be done from multiple CPUs in parallel.
>
> > Also offtopic, please try to avoid these super long lines in mail :).
>
> Ah. This is from the time when I posted patches via thunderbird and
> disabled wrapping :) Is wrapping at 75 chars ok?
Sure.
>
> >>
> >>
> >>>>>>>>  bool msi_enabled(const PCIDevice *dev)
> >>>>>>>>  {
> >>>>>>>>      return msi_present(dev) &&
> >>>>>>>> diff --git a/hw/msi.h b/hw/msi.h
> >>>>>>>> index 75747ab..6ec1f99 100644
> >>>>>>>> --- a/hw/msi.h
> >>>>>>>> +++ b/hw/msi.h
> >>>>>>>> @@ -31,6 +31,7 @@ struct MSIMessage {
> >>>>>>>>
> >>>>>>>>  extern bool msi_supported;
> >>>>>>>>
> >>>>>>>> +void msi_set_message(PCIDevice *dev, MSIMessage msg);
> >>>>>>>>  bool msi_enabled(const PCIDevice *dev);
> >>>>>>>>  int msi_init(struct PCIDevice *dev, uint8_t offset,
> >>>>>>>>               unsigned int nr_vectors, bool msi64bit, bool msi_per_vector_mask);
> >>>>>>>> diff --git a/hw/msix.c b/hw/msix.c
> >>>>>>>> index ded3c55..5f7d6d3 100644
> >>>>>>>> --- a/hw/msix.c
> >>>>>>>> +++ b/hw/msix.c
> >>>>>>>> @@ -45,6 +45,15 @@ static MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
> >>>>>>>>      return msg;
> >>>>>>>>  }
> >>>>>>>>
> >>>>>>>> +void msix_set_message(PCIDevice *dev, int vector, struct MSIMessage msg)
> >>>>>>>> +{
> >>>>>>>> +    uint8_t *table_entry = dev->msix_table_page + vector * PCI_MSIX_ENTRY_SIZE;
> >>>>>>>> +
> >>>>>>>> +    pci_set_quad(table_entry + PCI_MSIX_ENTRY_LOWER_ADDR, msg.address);
> >>>>>>>> +    pci_set_long(table_entry + PCI_MSIX_ENTRY_DATA, msg.data);
> >>>>>>>> +    table_entry[PCI_MSIX_ENTRY_VECTOR_CTRL] &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>>  /* Add MSI-X capability to the config space for the device. */
> >>>>>>>>  /* Given a bar and its size, add MSI-X table on top of it
> >>>>>>>>   * and fill MSI-X capability in the config space.
> >>>>>>>> diff --git a/hw/msix.h b/hw/msix.h
> >>>>>>>> index 50aee82..26a437e 100644
> >>>>>>>> --- a/hw/msix.h
> >>>>>>>> +++ b/hw/msix.h
> >>>>>>>> @@ -4,6 +4,8 @@
> >>>>>>>>  #include "qemu-common.h"
> >>>>>>>>  #include "pci.h"
> >>>>>>>>
> >>>>>>>> +void msix_set_message(PCIDevice *dev, int vector, MSIMessage msg);
> >>>>>>>> +
> >>>>>>>>  int msix_init(PCIDevice *pdev, unsigned short nentries,
> >>>>>>>>                MemoryRegion *bar,
> >>>>>>>>                unsigned bar_nr, unsigned bar_size);
> >>>>>>>> --
> >>>>>>>> 1.7.10
> >>>>>>>>
> >>>>>>>> ps. The double '-' and git version form an end-of-patch scissor,
> >>>>>>>> as I read somewhere; I cannot recall where exactly :)
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 21/06/12 20:56, Jan Kiszka wrote:
> >>>>>>>>> On 2012-06-21 12:50, Alexey Kardashevskiy wrote:
> >>>>>>>>>> On 21/06/12 20:38, Jan Kiszka wrote:
> >>>>>>>>>>> On 2012-06-21 12:28, Alexey Kardashevskiy wrote:
> >>>>>>>>>>>> On 21/06/12 17:39, Jan Kiszka wrote:
> >>>>>>>>>>>>> On 2012-06-21 09:18, Alexey Kardashevskiy
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> agrhhh. sha1 of the patch changed after
> >>>>>>>>>>>>>> rebasing :)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Added (msi|msix)_(set|get)_message() functions
> >>>>>>>>>>>>>> for whoever might want to use them.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Currently msi_notify()/msix_notify() write to
> >>>>>>>>>>>>>> these vectors to signal the guest about an
> >>>>>>>>>>>>>> interrupt, so the correct values have to be
> >>>>>>>>>>>>>> written there by the guest or QEMU.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For example, POWER guest never initializes
> >>>>>>>>>>>>>> MSI/MSIX vectors, instead it uses RTAS
> >>>>>>>>>>>>>> hypercalls. So in order to support MSIX for
> >>>>>>>>>>>>>> virtio-pci on POWER we have to initialize
> >>>>>>>>>>>>>> MSI/MSIX message from QEMU.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> As only the set* functions are required for now, the
> >>>>>>>>>>>>>> "get" functions were added or made public for
> >>>>>>>>>>>>>> symmetry.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <address@hidden>
> >>>>>>>>>>>>>> ---
> >>>>>>>>>>>>>>  hw/msi.c  | 29 +++++++++++++++++++++++++++++
> >>>>>>>>>>>>>>  hw/msi.h  |  2 ++
> >>>>>>>>>>>>>>  hw/msix.c | 11 ++++++++++-
> >>>>>>>>>>>>>>  hw/msix.h |  3 +++
> >>>>>>>>>>>>>>  4 files changed, 44 insertions(+), 1 deletion(-)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> diff --git a/hw/msi.c b/hw/msi.c
> >>>>>>>>>>>>>> index 5233204..9ad84a4 100644
> >>>>>>>>>>>>>> --- a/hw/msi.c
> >>>>>>>>>>>>>> +++ b/hw/msi.c
> >>>>>>>>>>>>>> @@ -105,6 +105,35 @@ static inline uint8_t msi_pending_off(const PCIDevice* dev, bool msi64bit)
> >>>>>>>>>>>>>>      return dev->msi_cap + (msi64bit ? PCI_MSI_PENDING_64 : PCI_MSI_PENDING_32);
> >>>>>>>>>>>>>>  }
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> +MSIMessage msi_get_message(PCIDevice *dev)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> MSIMessage msi_get_message(PCIDevice *dev, unsigned vector)
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Who/how/why is going to calculate the vector
> >>>>>>>>>>>> here?
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> +{
> >>>>>>>>>>>>>> +    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
> >>>>>>>>>>>>>> +    bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
> >>>>>>>>>>>>>> +    MSIMessage msg;
> >>>>>>>>>>>>>> +
> >>>>>>>>>>>>>> +    if (msi64bit) {
> >>>>>>>>>>>>>> +        msg.address = pci_get_quad(dev->config + msi_address_lo_off(dev));
> >>>>>>>>>>>>>> +    } else {
> >>>>>>>>>>>>>> +        msg.address = pci_get_long(dev->config + msi_address_lo_off(dev));
> >>>>>>>>>>>>>> +    }
> >>>>>>>>>>>>>> +    msg.data = pci_get_word(dev->config + msi_data_off(dev, msi64bit));
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> And I have this here in addition:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     unsigned int nr_vectors = msi_nr_vectors(flags);
> >>>>>>>>>>>>>     ...
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     if (nr_vectors > 1) {
> >>>>>>>>>>>>>         msg.data &= ~(nr_vectors - 1);
> >>>>>>>>>>>>>         msg.data |= vector;
> >>>>>>>>>>>>>     }
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> See PCI spec and existing code.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> What for? I really do not get why someone might want to
> >>>>>>>>>>>> read something other than the real value. Which PCI code
> >>>>>>>>>>>> should I look at?
> >>>>>>>>>>>
> >>>>>>>>>>> I'm not sure what your use case for reading the
> >>>>>>>>>>> message is. For KVM device assignment it is
> >>>>>>>>>>> preparing an alternative message delivery path for
> >>>>>>>>>>> MSI vectors. And for this we will need vector
> >>>>>>>>>>> notifier support for MSI as well. You can check the
> >>>>>>>>>>> MSI-X code for corresponding use cases of
> >>>>>>>>>>> msix_get_message.
> >>>>>>>>>>
> >>>>>>>>>>> And when we already have msi_get_message, another
> >>>>>>>>>>> logical use case is msi_notify. See msix.c again.
> >>>>>>>>>>
> >>>>>>>>>> Aaaa.
> >>>>>>>>>>
> >>>>>>>>>> I have no case for reading the message. All I need is
> >>>>>>>>>> writing. And I want it public as I want to use it from
> >>>>>>>>>> hw/spapr_pci.c. You suggested adding reading, so I added
> >>>>>>>>>> "get" to be _symmetric_ to "set" ("get" returns what
> >>>>>>>>>> "set" wrote). You want a different thing, which I can
> >>>>>>>>>> do, but it is not msi_get_message(); it is something
> >>>>>>>>>> like msi_prepare_message(MSImessage msg) or
> >>>>>>>>>> msi_set_vector(uint16_t data) or simply the internal
> >>>>>>>>>> kitchen of msi_notify().
> >>>>>>>>>>
> >>>>>>>>>> I can still do what you suggested; it just does not seem
> >>>>>>>>>> right.
> >>>>>>>>>
> >>>>>>>>> It is right - when looking at it from a different angle.
> >>>>>>>>> ;)
> >>>>>>>>>
> >>>>>>>>> I don't mind if you add msi_get_message now or leave
> >>>>>>>>> this to me. Likely the latter is better as you have no
> >>>>>>>>> use case for msi_get_message (and also
> >>>>>>>>> msix_get_message!) outside of their modules, thus we
> >>>>>>>>> should not export those functions anyway.
> >>
> >>
> >> -- Alexey
> >>
>
>
> --
> Alexey
>