From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] Re: [PATCH] PCI: Bus number from the bridge, not the device
Date: Sun, 21 Nov 2010 18:38:31 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Sun, Nov 21, 2010 at 06:01:11PM +0200, Gleb Natapov wrote:
> On Sun, Nov 21, 2010 at 04:48:44PM +0200, Michael S. Tsirkin wrote:
> > On Sun, Nov 21, 2010 at 02:50:14PM +0200, Gleb Natapov wrote:
> > > On Sun, Nov 21, 2010 at 01:53:26PM +0200, Michael S. Tsirkin wrote:
> > > > > > The guests.
> > > > > Which one? There are many guests. Your favorite?
> > > > > 
> > > > > > For the CLI, we need an easy way to map a device in the guest to
> > > > > > the device in qemu and back.
> > > > > Then use eth0, /dev/sdb, or even C:. Your way is no less broken,
> > > > > since what you are saying is "let's use the name that the guest
> > > > > assigned to a device".
> > > > 
> > > > No, I am saying let's use the name that our ACPI tables assigned.
> > > > 
> > > ACPI does not assign any name. In the best case ACPI tables describe
> > > the resources used by a device.
> > 
> > Not only that: bus number and segment aren't resources as such.
> > They describe addressing.
> > 
> > > And not all guests qemu supports have support for ACPI. Qemu
> > > even supports machine types that do not support ACPI.
> > 
> > So? Different machines -> different names.
> > 
> You want to have a different CLI for each type of machine qemu
> supports?

Different device names.

> > > > > > 
> > > > > > 
> > > > > > > It looks like you identify yourself with most of qemu users,
> > > > > > > but if most qemu users are like you then qemu does not have
> > > > > > > enough users :) Most users that consider themselves to be
> > > > > > > "advanced" may know what eth1 or /dev/sdb means. This doesn't
> > > > > > > mean we should provide "device_del eth1" or
> > > > > > > "device_add /dev/sdb" commands, though.
> > > > > > > 
> > > > > > > More important is that "domain" (encoded as a number like you
> > > > > > > used to) and "bus number" have no meaning inside qemu. So while
> > > > > > > I have said many times that I don't care too much about the
> > > > > > > exact CLI syntax, it should at least make sense. It can use an
> > > > > > > id to specify the PCI bus in the CLI, like this:
> > > > > > > device_del pci.0:1.1. Or it can even use a device id too, like
> > > > > > > this: device_del pci.0:ide.0. Or it can use HW topology like in
> > > > > > > an OF device path. But doing ad-hoc device enumeration inside
> > > > > > > qemu and then using it for the CLI is not it.
> > > > > > > 
> > > > > > > > functionality in the guests.  Qemu is buggy at the moment in
> > > > > > > > that it uses the bus addresses assigned by the guest and not
> > > > > > > > the ones in ACPI, but that can be fixed.
> > > > > > > It looks like you have confused ACPI _SEG with something it isn't.
> > > > > > 
> > > > > > Maybe I did. This is what linux does:
> > > > > > 
> > > > > > struct pci_bus * __devinit
> > > > > > pci_acpi_scan_root(struct acpi_pci_root *root)
> > > > > > {
> > > > > >         struct acpi_device *device = root->device;
> > > > > >         int domain = root->segment;          /* ACPI _SEG value */
> > > > > >         int busnum = root->secondary.start;  /* start of the bus
> > > > > >                                                 range from _CRS */
> > > > > > 
> > > > > > And I think this is consistent with the spec.
> > > > > > 
> > > > > It means that one domain may include several host bridges. At that
> > > > > level a domain is defined as something that has a unique name for
> > > > > each device inside it, thus no two buses in one segment/domain can
> > > > > have the same bus number. This is what the PCI spec tells you.
> > > > 
> > > > And that really is enough for the CLI, because all we need is to
> > > > locate the specific slot in a unique way.
> > > > 
> > > At the qemu level we do not have bus numbers. They are assigned by a
> > > guest. So inside a guest domain:bus:slot.func points you to a device,
> > > but qemu itself does not enumerate buses.
> > > 
> > > > > And this further shows that using "domain" as defined by the guest
> > > > > is a very bad idea.
> > > > 
> > > > As defined by ACPI, really.
> > > > 
> > > ACPI is a part of the guest software and may not even be present in
> > > the guest. How is it relevant?
> > 
> > It's relevant because this is what guests use. To access the root
> > device with cf8/cfc you need to know the bus number assigned to it
> > by firmware. How that was assigned is of interest to the BIOS/ACPI but
> > not really interesting to the user or, I suspect, the guest OS.
> > 
> Of course this is incorrect. An OS can re-enumerate PCI if it wishes.
> Linux has a command-line option just for that.

I haven't looked, but I suspect linux will simply assume cf8/cfc and
start probing from there. If that doesn't get you the root
device you wanted, tough.

> And saying that ACPI is relevant because this is what guest software
> uses, in reply to a sentence stating that not all guests even use
> ACPI, is, well, strange.
> 
> And ACPI describes only HW that is present at boot time. What if you
> hot-plugged a root pci bridge? How does non-existent PCI naming help you?

That's described by ACPI as well.

> > > > > > > The ACPI spec
> > > > > > > says that a PCI segment group is a purely software concept
> > > > > > > managed by system firmware. In fact one segment may include
> > > > > > > multiple PCI host bridges.
> > > > > > 
> > > > > > It can't, I think:
> > > > > Read the _BBN definition:
> > > > >   The _BBN object is located under a PCI host bridge and must be
> > > > >   unique for every host bridge within a segment since it is the
> > > > >   PCI bus number.
> > > > > 
> > > > > Clearly the above speaks about multiple host bridges within a segment.
> > > > 
> > > > Yes, it looks like the firmware spec allows that.
> > > It even has an explicit example that shows it.
> > > 
> > > > 
> > > > > >     Multiple Host Bridges
> > > > > > 
> > > > > >     A platform may have multiple PCI Express or PCI-X host
> > > > > >     bridges. The base address for the MMCONFIG space for these
> > > > > >     host bridges may need to be allocated at different locations.
> > > > > >     In such cases, using MCFG table and _CBA method as defined in
> > > > > >     this section means that each of these host bridges must be in
> > > > > >     its own PCI Segment Group.
> > > > > > 
> > > > > This is not from the ACPI spec,
> > > > 
> > > > PCI Firmware Specification 3.0
> > > > 
> > > > > but without going too deep into it, the above paragraph talks
> > > > > about a particular case where each host bridge must be in its own
> > > > > PCI Segment Group, which is definite proof that in other cases
> > > > > multiple host bridges can be in one segment group.
> > > > 
> > > > I stand corrected. I think you are right. But note that if they are,
> > > > they must have distinct bus numbers assigned by ACPI.
> > > ACPI does not assign any numbers.
> > 
> > For all root pci devices the firmware must supply a _BBN number. This
> > is the bus number, isn't it? For nested buses, this is optional.
> Nonsense. _BBN is optional and is not present in the Seabios DSDT.

The spec says it's not optional for host bridges:

        Firmware must report Host Bridges in the ACPI name space. Each Host
        Bridge object must contain the following objects:
          ● _HID and _CID
          ● _CRS to determine all resources consumed and produced (passed
            through to the secondary bus) by the host bridge. Firmware
            allocates resources (Memory Addresses, I/O Port, etc.) to Host
            Bridges. The _CRS descriptor informs the operating system of
            the resources it may use for configuring devices below the
            Host Bridge.
          ● _TRA, _TTP, and _TRS translation offsets to inform the
            operating system of the mapping between the primary bus and
            the secondary bus.
          ● _PRT and the interrupt descriptor to determine interrupt
            routing.
          ● _BBN to obtain a bus number.

so seabios seems to be out of spec.

> As far
> as I can tell it is only needed if a PCI segment group has more than
> one pci host bridge.

No. Because cfc/cf8 are not aware of _SEG.

> > 
> > > The BIOS enumerates buses and assigns
> > > numbers.
> > 
> > There's no standard way to enumerate pci root devices in a guest AFAIK.
> > The spec says:
> >     Firmware must configure all Host Bridges in the systems, even if
> >     they are not connected to a console or boot device. Firmware must
> >     configure Host Bridges in order to allow operating systems to use the
> >     devices below the Host Bridges. This is because the Host Bridges
> >     programming model is not defined by the PCI Specifications. 
> > 
> > 
> A guest should be aware of the HW to use it, be it through the BIOS or
> a driver.

Why should it? You take the bus number, stick it in cf8/cfc, and you
get a config cycle. No magic HW awareness needed.
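
(For reference, a minimal sketch of the cf8/cfc mechanism under
discussion, assuming an x86 Linux host with root privileges; the port
layout is the standard Type 1 configuration mechanism, and outl/inl/iopl
come from glibc's sys/io.h:)

    #include <stdio.h>
    #include <sys/io.h>

    /* Read a 32-bit PCI config register via the legacy cf8/cfc ports. */
    static unsigned int pci_conf_read(unsigned int bus, unsigned int dev,
                                      unsigned int fn, unsigned int off)
    {
        unsigned int addr = 0x80000000u      /* enable bit            */
                          | (bus << 16)      /* the guest-visible bus */
                          | (dev << 11)
                          | (fn  << 8)
                          | (off & 0xFC);    /* dword-aligned offset  */
        outl(addr, 0xCF8);                   /* CONFIG_ADDRESS        */
        return inl(0xCFC);                   /* CONFIG_DATA           */
    }

    int main(void)
    {
        if (iopl(3) < 0) {                   /* need I/O privilege    */
            perror("iopl");
            return 1;
        }
        /* Vendor/device IDs of bus 0, device 0, function 0. */
        printf("%08x\n", pci_conf_read(0, 0, 0, 0));
        return 0;
    }

Note that nothing but the bus number selects the root bridge here, which
is exactly the point being argued above.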

> > > ACPI, in the base case, describes to the OSPM what the BIOS did. Qemu
> > > sits one layer below all this and does not enumerate PCI buses. Even
> > > if we made it do so, there is no way to guarantee that the guest will
> > > enumerate them in the same order, since there is more than one way to
> > > do the enumeration. I have repeated this to you numerous times already.
> > 
> > ACPI is really part of the motherboard. Calling it the guest just
> > confuses things. Guest OS can override bus numbering for nested buses
> > but not for root buses.
> > 
> If calling ACPI part of a guest confuses you then you are already
> confused. A guest OS can do whatever it wishes with any enumeration the
> FW did, if it knows better.
> 
> > > > 
> > > > > > 
> > > > > > > _SEG
> > > > > > > is not what the OSPM uses to tie a HW resource to an ACPI
> > > > > > > resource. It uses _CRS (Current Resource Settings) for that,
> > > > > > > just like OF. No surprise there.
> > > > > > 
> > > > > > OSPM uses both I think.
> > > > > > 
> > > > > > All I see linux do with _CRS is get the bus number range.
> > > > > So lets assume that HW has two PCI host bridges and ACPI has:
> > > > >         Device(PCI0) {
> > > > >             Name (_HID, EisaId ("PNP0A03"))
> > > > >             Name (_SEG, 0x00)
> > > > >         }
> > > > >         Device(PCI1) {
> > > > >             Name (_HID, EisaId ("PNP0A03"))
> > > > >             Name (_SEG, 0x01)
> > > > >         }
> > > > > I.e. no _CRS to describe resources. How do you think the OSPM
> > > > > knows which of the two pci host bridges is PCI0 and which one is
> > > > > PCI1?
> > > > 
> > > > You must be able to uniquely address any bridge using the
> > > > combination of _SEG and _BBN.
> > > 
> > > Not at all. And saying "you must be able" without actually showing
> > > how doesn't prove anything. _SEG is relevant only for those host
> > > bridges that support MMCONFIG (not all of them do, and none that qemu
> > > supports does yet). _SEG points to a specific entry in the MCFG table,
> > > and the MCFG entry holds the base address of the MMCONFIG space for
> > > the bridge (this address is configured by a guest). This is all _SEG
> > > really does, no magic at all. _BBN returns the bus number assigned by
> > > the BIOS to the host bridge. Nothing qemu-visible again. So _SEG and
> > > _BBN give you two numbers assigned by the guest FW. Nothing qemu can
> > > use to identify a device.
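
(For concreteness, a hedged sketch of roughly how an OSPM picks up these
two numbers, modeled loosely on Linux's drivers/acpi/pci_root.c; the
helper name is made up, but acpi_evaluate_integer is the real
ACPICA/Linux call, and both methods simply default to 0 when absent:)

    #include <linux/acpi.h>

    /* Evaluate _SEG and _BBN for a host bridge. Both are optional AML
     * methods, so a single-bridge DSDT (like Seabios's) can omit them
     * and the OS falls back to segment 0, bus 0. */
    static void root_get_seg_bbn(struct acpi_device *device,
                                 int *segment, int *busnum)
    {
        unsigned long long value;
        acpi_status status;

        status = acpi_evaluate_integer(device->handle, "_SEG", NULL, &value);
        *segment = ACPI_SUCCESS(status) ? (int)value : 0;

        status = acpi_evaluate_integer(device->handle, "_BBN", NULL, &value);
        *busnum = ACPI_SUCCESS(status) ? (int)value : 0;
    }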
> > 
> > This FW is given to the guest by qemu. It only assigns bus numbers
> > because qemu told it to do so.
> Seabios is just a guest qemu ships. There are other FWs for qemu: Bochs
> bios, openfirmware, efi. All of them were developed outside of the qemu
> project and all of them are usable without qemu. You can't consider
> them to be part of qemu any more than Linux/Windows with virtio
> drivers.
> 
> > 
> > > > 
> > > > > > And the spec says, e.g.:
> > > > > > 
> > > > > >     the memory mapped configuration base address (always
> > > > > >     corresponds to bus number 0) for the PCI Segment Group of
> > > > > >     the host bridge is provided by _CBA and the bus range
> > > > > >     covered by the base address is indicated by the
> > > > > >     corresponding bus range specified in _CRS.
> > > > > > 
> > > > > Don't see how it is relevant. And _CBA is defined only for PCI
> > > > > Express. Let's solve the problem for PCI first and then move to
> > > > > PCI Express. Jumping from one to the other distracts us from the
> > > > > main discussion.
> > > > 
> > > > I think this is what confuses us.  As long as you are using cf8/cfc
> > > > there's no concept of a domain, really.
> > > > Thus:
> > > >         /address@hidden
> > > > 
> > > > is probably enough for BIOS boot, because we'll need to make root
> > > > bus numbers unique for legacy guests/option ROMs.  But this is not a
> > > > hardware requirement and might become easier to ignore with EFI.
> > > > 
> > > You do not need MMCONFIG to have multiple PCI domains. You can have one
> > > configured via standard cf8/cfc and another one on ef8/efc and one more
> > > at mmio fce00000 and you can address all of them:
> > > /address@hidden
> > > /address@hidden
> > > /address@hidden
> > > 
> > > And each one of those PCI domains can have 256 subbridges.
> > 
> > Will common guests such as windows or linux be able to use them? This
> With proper drivers, yes. There is HW with more than one PCI bus and I
> think qemu emulates some of it (PPC Mac, for instance).
> 
> > seems to be outside the scope of the PCI Firmware specification, which
> > says that bus numbers must be unique.
> They must be unique per PCI segment group.
> 
> > 
> > > > > > 
> > > > > > > > 
> > > > > > > > That should be enough for e.g. device_del. We do have the
> > > > > > > > need to describe the topology when we interface with
> > > > > > > > firmware, e.g. to describe the ACPI tables themselves to
> > > > > > > > qemu (this is what Gleb's patches deal with), but that's
> > > > > > > > probably the only case.
> > > > > > > > 
> > > > > > > Describing the HW topology is the only way to unambiguously
> > > > > > > describe a device to something or someone outside qemu and to
> > > > > > > have persistent device naming between different HW
> > > > > > > configurations.
> > > > > > 
> > > > > > Not really, since ACPI is a binary blob programmed by qemu.
> > > > > > 
> > > > > ACPI is part of the guest, not qemu.
> > > > 
> > > > Yes it runs in the guest but it's generated by qemu. On real hardware,
> > > > it's supplied by the motherboard.
> > > > 
> > > It is not generated by qemu. Parts of it depend on the HW and other
> > > parts depend on how the BIOS configures the HW. _BBN for instance is
> > > clearly defined to return the address assigned by the BIOS.
> > 
> > BIOS is supplied on the motherboard and in our case by qemu as well.
> You can replace the MB bios with coreboot+seabios on some of them.
> Manufacturers don't want you to do it and make it hard to do, but
> otherwise this is just software, not some magic dust.
> 
> > There's no standard way for the BIOS to assign a bus number to the
> > pci root, so it does it in a device-specific way. Why should a
> > management tool or a CLI user care about these? As far as they are
> > concerned we could use some PV scheme to find root devices and assign
> > bus numbers, and it would be exactly the same.
> > 
> Go write a KVM userspace that does that. AFAIK there is a project out
> there that tries to do that. No luck so far. Your world view is very
> x86/Linux centric. You need to broaden it a little bit. Next time you
> propose something, ask yourself whether it will work with qemu-sparc,
> qemu-ppc, qemu-amd.
> 
> 
> > > > > Just saying "not really" doesn't
> > > > > prove much. I still haven't seen any proposal from you that
> > > > > actually solves the problem. No, "let's use guest naming" is not
> > > > > it. There is no such thing as "The Guest".
> > > > > 
> > > > > --
> > > > >                       Gleb.
> > > > 
> > > > I am sorry if I didn't make this clear.  I think we should use the
> > > > domain:bus pair to name the root device. As these are unique and 
> > > You forgot to complete the sentence :) But you made it clear enough,
> > > and it is incorrect. The domain:bus pair is not only not unique, it
> > > does not exist in qemu at all
> > 
> > Sure they do. The domain maps to the mcfg address for express. The
> > bus is used for
> mcfg is optional as far as I can see. You can compile out MMCONFIG
> support on Linux.
> 
> > cf8/cfc addressing. They are assigned by the BIOS, but since the BIOS
> > is supplied with the hardware the point is moot.
> Most PC hardware is supplied with Windows, so what? The BIOS is code
> that runs in a guest. It is part of the guest. Every line of code
> executed by a vcpu belongs to the guest. No need to redefine things to
> prove your point.
> 
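
(As an aside, the "domain maps to mcfg address" point above refers to
the ECAM layout: each PCI segment group gets its own MMCONFIG base
address from the MCFG table, and a function's config space sits at a
fixed arithmetic offset from that base. A minimal sketch of the mapping;
the 0xE0000000 base below is a hypothetical example value, not anything
qemu-specific:)

    #include <stdio.h>
    #include <stdint.h>

    /* ECAM/MMCONFIG: config-space address of (bus, dev, fn, offset)
     * within one PCI segment group. */
    static uint64_t ecam_addr(uint64_t seg_base, unsigned int bus,
                              unsigned int dev, unsigned int fn,
                              unsigned int off)
    {
        return seg_base
             | ((uint64_t)bus << 20)   /* 256 buses per segment  */
             | ((uint64_t)dev << 15)   /* 32 devices per bus     */
             | ((uint64_t)fn  << 12)   /* 8 functions per device */
             | (off & 0xFFF);          /* 4K config space per fn */
    }

    int main(void)
    {
        printf("%#llx\n", (unsigned long long)
               ecam_addr(0xE0000000ull, 1, 2, 0, 0x10));
        return 0;
    }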
> > 
> > > and as such cannot be used to address a device. It is the product of
> > > HW enumeration done by a guest OS, just like eth0 or C:.
> > > 
> > > --
> > >                   Gleb.
> > 
> > There's a huge difference between BIOS and guest OS,
> Not true.
> 
> > and between bus
> > numbers of pci root and of nested bridges.
> Really? What is it?
> 
> > 
> > Describing hardware io ports makes sense if you are trying to
> > communicate data from qemu to the BIOS.  But the rest of the world might
> > not care.
> > 
> The part of the world that manages HW cares. You may need to add a
> device from the monitor before the first line of BIOS code is even
> executed. How can you rely on BIOS enumeration of devices in this case?
> 
> 
> --
>                       Gleb.


