[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Qemu-devel] Re: [PATCH 1/1] KVM/userspace: Support for assigning PCI de
From: |
Amit Shah |
Subject: |
[Qemu-devel] Re: [PATCH 1/1] KVM/userspace: Support for assigning PCI devices to guests |
Date: |
Thu, 28 Aug 2008 17:18:01 +0530 |
User-agent: |
KMail/1.9.9 |
* On Wednesday 27 Aug 2008 19:20:26 Anthony Liguori wrote:
> Hi Amit,
>
> Amit Shah wrote:
> > With this patch, we can assign a device on the host machine to a
> > guest.
> >
> > A new command-line option, -pcidevice is added.
> > For example, to invoke it for a device sitting at PCI bus:dev.fn
> > 04:08.0 with host IRQ 18, use this:
> >
> > -pcidevice host=04:08.0
>
> Let's make sure it also works via hotplug. We already have a pci_add
> command. I would expect the syntax should be host:04:08.0 to be more in
> line with usb.
I've not yet tested it (but based on your comments from last time, I made the
necessary changes).
> > libkvm/libkvm-x86.c | 14 +
> > libkvm/libkvm.h | 27 ++
> > qemu/Makefile.target | 1 +
> > qemu/hw/device-assignment.c | 600
> > +++++++++++++++++++++++++++++++++++++++++++ qemu/hw/device-assignment.h |
> > 94 +++++++
> > qemu/hw/isa.h | 2 +
> > qemu/hw/pc.c | 9 +
> > qemu/hw/pci.c | 12 +
> > qemu/hw/pci.h | 1 +
> > qemu/hw/piix_pci.c | 19 ++
> > qemu/vl.c | 18 ++
> > 11 files changed, 797 insertions(+), 0 deletions(-)
> > create mode 100644 qemu/hw/device-assignment.c
> > create mode 100644 qemu/hw/device-assignment.h
>
> This patch is too big on it's on. It should be split into logical parts.
However, it just adds device assignment support and does nothing else. I don't
see a way of splitting this any more.
> > diff --git a/qemu/Makefile.target b/qemu/Makefile.target
> > index 2332fe3..750ecd5 100644
> > --- a/qemu/Makefile.target
> > +++ b/qemu/Makefile.target
> > @@ -611,6 +611,7 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
> > OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
> > OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
> > OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o extboot.o
> > +OBJS+= device-assignment.o
>
> We should conditionally compile this into qemu only if we know we can
> support it (has to be a Linux host for instance).
Right now, it gets compiled only on x86. I'll see how to limit this further to
Linux.
> > +#include <stdio.h>
> > +#include <pthread.h>
>
> Why is pthread being included?
Leftover; removed this and other unneeded #includes.
> > +#include <sys/io.h>
> > +#include <sys/ioctl.h>
> > +#include <linux/types.h>
> > +
> > +/* From linux/ioport.h */
> > +#define IORESOURCE_IO 0x00000100 /* Resource type */
> > +#define IORESOURCE_MEM 0x00000200
> > +#define IORESOURCE_IRQ 0x00000400
> > +#define IORESOURCE_DMA 0x00000800
> > +#define IORESOURCE_PREFETCH 0x00001000 /* No side effects */
> > +
> > +#include "device-assignment.h"
> > +#include "irq.h"
> > +
> > +#include "qemu-kvm.h"
> > +#include <linux/kvm_para.h>
> > +
> > +extern FILE *logfile;
> > +
> > +/* #define DEVICE_ASSIGNMENT_DEBUG */
> > +
> > +#ifdef DEVICE_ASSIGNMENT_DEBUG
> > +#define DEBUG(fmt, args...) fprintf(stderr, "%s: " fmt, __func__ , ##
> > args) +#else
> > +#define DEBUG(fmt, args...)
> > +#endif
> > +
> > +#define assigned_dev_ioport_write(suffix) \
> > + static void assigned_dev_ioport_write##suffix(void *opaque, uint32_t
> > addr, \ + uint32_t value)
> > \
> > + { \
> > + assigned_dev_region_t *r_access = (assigned_dev_region_t *)opaque; \
> > + uint32_t r_pio = (unsigned long)r_access->r_virtbase \
> > + + (addr - r_access->e_physbase); \
> > + if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) { \
> > + fprintf(logfile, "assigned_dev_ioport_write" #suffix \
> > + ": r_pio=%08x e_physbase=%08x" \
> > + " r_virtbase=%08lx value=%08x\n", \
> > + r_pio, (int)r_access->e_physbase, \
> > + (unsigned long)r_access->r_virtbase, value); \
> > + } \
> > + iopl(3); \
> > + out##suffix(value, r_pio); \
> > + }
> > +
> > +assigned_dev_ioport_write(b)
> > +assigned_dev_ioport_write(w)
> > +assigned_dev_ioport_write(l)
>
> Please don't do this sort of trickery with macros. Also don't write to
> logfile, write to stderr. Debug should follow the same style as the
> rest of the code.
logfile is just #define logfile stderr. I removed it.
> Also, this whole file needs to be reformated to the QEMU style. How far
> has this diverged from the Xen version of the code? If it's
> sufficiently close, we should coordinate with those guys such that this
> code can eventually go upstream.
As you noticed, it's quite different. They have a lot more code in there. (3k+
lines vs 600 lines). I guess most of it went away when we realised we didn't
need all that. Also, I noticed most of the code there is not in the qemu
style.
> > +static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
> > + uint32_t e_phys, uint32_t e_size, int type)
> > +{
> > + assigned_dev_t *r_dev = (assigned_dev_t *) pci_dev;
> > + assigned_dev_region_t *region = &r_dev->v_addrs[region_num];
> > + int first_map = (region->e_size == 0);
> > + int ret = 0;
> > +
> > + DEBUG("e_phys=%08x r_virt=%p type=%d len=%08x region_num=%d \n",
> > + e_phys, r_dev->v_addrs[region_num].r_virtbase, type, e_size,
> > + region_num);
> > +
> > + region->e_physbase = e_phys;
> > + region->e_size = e_size;
> > +
> > + if (!first_map)
> > + kvm_destroy_phys_mem(kvm_context, e_phys, e_size);
>
> We try to avoid open coding calls to libkvm functions directly within
> QEMU. At the least, it should be wrapped with an if (kvm_enabled()).
With the direct-mmio patch, we just start depending on KVM. Ben-Ami, is there
a chance you can keep the MMIO-via-userspace method supported as well for
non-kvm hosts?
> > +static void assigned_dev_ioport_map(PCIDevice *pci_dev, int region_num,
> > + uint32_t addr, uint32_t size, int type)
> > +{
> > + assigned_dev_t *r_dev = (assigned_dev_t *) pci_dev;
> > + int i;
> > + uint32_t ((*rf[])(void *, uint32_t)) =
> > + { assigned_dev_ioport_readb,
> > + assigned_dev_ioport_readw,
> > + assigned_dev_ioport_readl
> > + };
> > + void ((*wf[])(void *, uint32_t, uint32_t)) =
> > + { assigned_dev_ioport_writeb,
> > + assigned_dev_ioport_writew,
> > + assigned_dev_ioport_writel
> > + };
>
> I don't think this any nicer (and certainly not less lines) than just
> open coding the register_ioport_{read,write} calls.
Sure.
> > + r_dev->v_addrs[region_num].e_physbase = addr;
> > + DEBUG("%s: address=0x%x type=0x%x len=%d region_num=%d \n",
> > + __func__, addr, type, size, region_num);
> > +
> > + for (i = 0; i < 3; i++) {
> > + register_ioport_write(addr, size, 1<<i, wf[i],
> > + (void *) (r_dev->v_addrs + region_num));
> > + register_ioport_read(addr, size, 1<<i, rf[i],
> > + (void *) (r_dev->v_addrs + region_num));
> > + }
> > +}
> > +
> > +static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t
> > address, + uint32_t val, int len)
> > +{
> > + int fd, r;
> > +
> > + DEBUG("%s: (%x.%x): address=%04x val=0x%08x len=%d\n",
> > + __func__, ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
> > + (uint16_t) address, val, len);
> > +
> > + if (address == 0x4)
> > + pci_default_write_config(d, address, val, len);
> > +
> > + if ((address >= 0x10 && address <= 0x24) || address == 0x34 ||
> > + address == 0x3c || address == 0x3d) {
> > + /* used for update-mappings (BAR emulation) */
> > + pci_default_write_config(d, address, val, len);
> > + return;
> > + }
>
> Is this first if check correct? It calls pci_default_write_config but
> then doesn't return? This definitely needs a comment.
>
> > + DEBUG("%s: NON BAR (%x.%x): address=%04x val=0x%08x len=%d\n",
> > + __func__, ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
> > + (uint16_t) address, val, len);
> > + fd = ((assigned_dev_t *)d)->real_device.config_fd;
>
> assigned_dev_t is not QEMU style. It should be AssignedDevice or
> something similar.
Will change this.
> > + lseek(fd, address, SEEK_SET);
>
> Needs error checking.
>
> I'm not going to go any further for now. The code needs an awful lot of
> cleanup. Please try to do at least a couple passes of cleanup and post
> again.
From your other mail:
> Where did this come from originally? It's completely different from
> what is in xen-unstable. What's in xen-unstable is actually a lot nicer
> IMHO.
Nicer in what way?
I think the original code was picked up from before the time it was merged in
Xen.
Amit