Re: [Qemu-devel] Multi GPU passthrough via VFIO
From: Maik Broemme
Subject: Re: [Qemu-devel] Multi GPU passthrough via VFIO
Date: Fri, 7 Feb 2014 19:07:09 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

Hi Alex,
Maik Broemme <address@hidden> wrote:
> Hi Alex,
>
> Alex Williamson <address@hidden> wrote:
> > On Thu, 2014-02-06 at 01:25 +0100, Maik Broemme wrote:
> > > Hi Alex,
> > >
> > > Maik Broemme <address@hidden> wrote:
> > > > > > > > Another minor issue is that the R9 290X is not reset during
> > > > > > > > shutdown of the VM (neither Linux nor Windows), but it can be
> > > > > > > > tricked by doing "suspend-to-ram" between two starts. That's
> > > > > > > > why I use the '-no-reboot' option in QEMU. The 7870 is doing
> > > > > > > > the reset properly.
> > > > > > >
> > > > > > >
> > > > > > > Is the NoSoftRst "-" on the 290X vs "+" on the 7870 in
> > > > > > > lspci -vvv by chance? Thanks,
> > > > > > >
> > > > > >
> > > > > > Here are both. Funnily, it is the opposite of what you described. :)
> > > > >
> > > > >
> > > > > Oops, yes. Does this help?
> > > > >
> > > > > --- a/hw/misc/vfio.c
> > > > > +++ b/hw/misc/vfio.c
> > > > > @@ -3136,7 +3136,7 @@ static void vfio_pci_reset_handler(void *opaque)
> > > > >
> > > > > >      QLIST_FOREACH(group, &group_list, next) {
> > > > > >          QLIST_FOREACH(vdev, &group->device_list, next) {
> > > > > > -            if (!vdev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
> > > > > > +            if (!vdev->reset_works || !vdev->has_flr) {
> > > > > >                  vdev->needs_reset = true;
> > > > > >              }
> > > > > >          }
> > > > >
> > > > > I can't figure out why I coded it the way that I did. Probably overly
> > > > > targeting a specific device. Thanks,
> > > > >
> > > >
> > > > This patch works absolutely fine. After applying it to my 'qemu-git',
> > > > the device reset works flawlessly. So it would be great to push it
> > > > upstream as it looks good.
> > > >
> > >
> > > Okay, sorry. I was too fast here. It worked the first time, but now,
> > > even after a clean reboot, it no longer works as expected and the
> > > behavior is very strange.
> > >
> > > Windows:
> > >
> > > 1st boot works fine - boot VGA and Windows ATI driver loaded, issue
> > > reboot and qemu stopped due to '-no-reboot'.
> > >
> > > 2nd boot works partially - boot VGA and Windows ATI driver loaded but
> > > black screen, and my system becomes terribly slow and mostly
> > > unresponsive. Immediately after the ATI driver enables the
> > > device, my dmesg shows the following:
> > >
> > > [ 159.984324] vfio_ecap_init: 0000:01:00.0 hiding ecap address@hidden
> > > [ 159.984340] vfio_ecap_init: 0000:01:00.0 hiding ecap address@hidden
> > > [ 160.129036] vfio_ecap_init: 0000:02:00.0 hiding ecap address@hidden
> > > [ 160.129049] vfio_ecap_init: 0000:02:00.0 hiding ecap address@hidden
> > > [ 172.977677] kvm: zapping shadow pages for mmio generation wraparound
> > > [ 173.160174] br0: port 2(tap0) entered forwarding state
> > > [ 175.902967] vfio-pci 0000:01:00.0: irq 46 for MSI/MSI-X
> > > [ 188.340430] Clocksource tsc unstable (delta = -119654611 ns)
> > > [ 188.340511] Switched to clocksource hpet
> > > [ 191.088693] hpet1: lost 12 rtc interrupts
> > > [ 191.926555] hpet1: lost 25 rtc interrupts
> > >
> > > So your patch did indeed fix the reset issue of the boot VGA, but
> > > something else is wrong now. :)
> >
> > Can you try the cards separately? If you run lspci on the device in the
> > host, does it report as normal? Often when the host gets slow and we
> > get these sorts of clock issues it means the bus is fatal and we get
> > timeouts trying to read from it.
> >
>
> Okay with only one card I don't have the clock issues anymore, so we
> should look into this a bit later as working reset seems more important
> for now.
>
> > > Linux (fglrx):
> > >
> > > 1st boot works fine - boot VGA, fglrx loads fine and X could be
> > > started, issue reboot via SSH and qemu stopped due to
> > > '-no-reboot'.
> > >
> > > 2nd boot works partially - boot VGA, fglrx loads fine but X couldn't
> > > be started and fails with:
> > >
> > > [ 34.265111] fglrx_pci 0000:02:00.0: irq 50 for MSI/MSI-X
> > > [ 34.344313] <6>[fglrx] Firegl kernel thread PID: 318
> > > [ 34.344400] <6>[fglrx] Firegl kernel thread PID: 319
> > > [ 34.344478] <6>[fglrx] Firegl kernel thread PID: 320
> > > [ 34.344589] <6>[fglrx] IRQ 50 Enabled
> > > [ 34.356105] <6>[fglrx] Reserved FB block: Shared offset:0,
> > > size:1000000
> > > [ 34.356107] <6>[fglrx] Reserved FB block: Unshared offset:fac3000,
> > > size:3000
> > > [ 34.356109] <6>[fglrx] Reserved FB block: Unshared offset:fac6000,
> > > size:23a000
> > > [ 34.356110] <6>[fglrx] Reserved FB block: Unshared offset:7fff4000,
> > > size:c000
> > > [ 34.386436] fglrx_pci 0000:01:00.0: irq 51 for MSI/MSI-X
> > > [ 34.490902] <6>[fglrx] Firegl kernel thread PID: 321
> > > [ 34.490994] <6>[fglrx] Firegl kernel thread PID: 322
> > > [ 34.491069] <6>[fglrx] Firegl kernel thread PID: 323
> > > [ 34.491166] <6>[fglrx] IRQ 51 Enabled
> > > [ 34.505271] <6>[fglrx] Reserved FB block: Shared offset:0,
> > > size:1000000
> > > [ 34.505273] <6>[fglrx] Reserved FB block: Unshared offset:f9c3000,
> > > size:3000
> > > [ 34.505274] <6>[fglrx] Reserved FB block: Unshared offset:f9c6000,
> > > size:23a000
> > > [ 34.505276] <6>[fglrx] Reserved FB block: Unshared offset:fc00000,
> > > size:100000
> > > [ 34.505277] <6>[fglrx] Reserved FB block: Unshared offset:fff8000,
> > > size:8000
> > > [ 34.505278] <6>[fglrx] Reserved FB block: Unshared offset:ffff4000,
> > > size:c000
> > > [ 34.526198] BUG: unable to handle kernel paging request at
> > > ffff880c724e8008
> > > [ 34.526203] IP: [<ffffffffa0399af6>]
> > > TF_PhwCIslands_PopulateAndUploadSclkMclkDPMLevels+0x96/0x3d0 [fglrx]
> > > [ 34.526277] PGD 1b3e067 PUD 0
> > > [ 34.526279] Oops: 0002 [#1] PREEMPT SMP
> > > [ 34.526282] Modules linked in: mousedev crct10dif_pclmul
> > > crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev
> > > aesni_intel snd_hda_codec_hdmi aes_x86_64 lrw gf128mul glue_helper
> > > ablk_helper cryptd snd_hda_intel microcode snd_hda_codec serio_raw
> > > psmouse parport_pc snd_hwdep snd_pcm parport snd_page_alloc processor
> > > snd_timer snd soundcore i2c_i801 intel_agp lpc_ich pcspkr intel_gtt
> > > i2c_core shpchp evdev fglrx(PO) amd_iommu_v2 button ext4 crc16 mbcache
> > > jbd2 atkbd libps2 virtio_blk virtio_net ahci libahci libata scsi_mod
> > > i8042 floppy serio virtio_pci virtio_ring virtio
> > > [ 34.526307] CPU: 1 PID: 316 Comm: X Tainted: P O
> > > 3.13.1-2-ARCH #1
> > > [ 34.526309] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > > Bochs 01/01/2011
> > > [ 34.526311] task: ffff8800776e2d00 ti: ffff880037a28000 task.ti:
> > > ffff880037a28000
> > > [ 34.526312] RIP: 0010:[<ffffffffa0399af6>] [<ffffffffa0399af6>]
> > > TF_PhwCIslands_PopulateAndUploadSclkMclkDPMLevels+0x96/0x3d0 [fglrx]
> > > [ 34.526353] RSP: 0018:ffff880037a29810 EFLAGS: 00010296
> > > [ 34.526354] RAX: 0000000000000001 RBX: ffff8800724e800c RCX:
> > > 0000000000000006
> > > [ 34.526356] RDX: 0000000000000003 RSI: 0000000000000002 RDI:
> > > ffff8800724e8264
> > > [ 34.526357] RBP: ffff88007b19a00c R08: 00000000000186a0 R09:
> > > 000000000001e848
> > > [ 34.526358] R10: 00000002fffffffd R11: 00000000ffffffff R12:
> > > 0000000000000001
> > > [ 34.526359] R13: ffff88007b19a00c R14: 0000000000000000 R15:
> > > ffff880037a298b0
> > > [ 34.526363] FS: 00007f0ba649b880(0000) GS:ffff88007fd00000(0000)
> > > knlGS:0000000000000000
> > > [ 34.526365] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > [ 34.526366] CR2: ffff880c724e8008 CR3: 0000000037998000 CR4:
> > > 00000000000406e0
> > > [ 34.526372] Stack:
> > > [ 34.526373] ffff88007b19a2f4 ffff88007bffcd1c 0000000000000001
> > > ffffffffa0322cf0
> > > [ 34.526375] 0000000000000000 0000000000000000 0000000000000000
> > > ffff880077ed2c08
> > > [ 34.526378] 0000000000000000 ffff880077ed2c08 ffff880037a298a0
> > > ffffffffa0327f14
> > > [ 34.526380] Call Trace:
> > > [ 34.526435] [<ffffffffa0322cf0>] ? PHM_DispatchTable+0xf0/0x220
> > > [fglrx]
> > > [ 34.526490] [<ffffffffa0327f14>] ?
> > > PECI_NotifyDALPreAdapterClockChange+0x144/0x160 [fglrx]
> > > [ 34.526546] [<ffffffffa031e321>] ? PHM_SetPowerState+0x31/0xc0 [fglrx]
> > > [ 34.526597] [<ffffffffa0340a5b>] ?
> > > PSM_ApplyHardwareAttributes_Dynamic+0x9b/0xf0 [fglrx]
> > > [ 34.526651] [<ffffffffa033fde9>] ?
> > > PSM_AdjustPowerState_Dynamic+0x169/0x540 [fglrx]
> > > [ 34.526668] [<ffffffffa0322cf0>] ? PHM_DispatchTable+0xf0/0x220
> > > [fglrx]
> > > [ 34.526668] [<ffffffffa0342ee4>] ? PEM_ExcuteEventChain+0x64/0xe0
> > > [fglrx]
> > > [ 34.526668] [<ffffffffa0341302>] ? PEM_HandleEvent+0x92/0xd0 [fglrx]
> > > [ 34.526668] [<ffffffffa03357c0>] ? PEM_CWDDEPM_NotifyEvent+0xe0/0x4d0
> > > [fglrx]
> > > [ 34.526668] [<ffffffffa0333869>] ? PP_Cwdde+0x109/0x180 [fglrx]
> > > [ 34.526668] [<ffffffffa02091dc>] ? firegl_pplib_cwddepm+0xbc/0x130
> > > [fglrx]
> > > [ 34.526668] [<ffffffffa02092d9>] ?
> > > firegl_pplib_notify_event+0x89/0xd0 [fglrx]
> > > [ 34.526668] [<ffffffffa020292f>] ? hal_init_gpu+0x2bf/0x480 [fglrx]
> > > [ 34.526668] [<ffffffffa01dcc7b>] ? firegl_open+0x2db/0x310 [fglrx]
> > > [ 34.526668] [<ffffffffa01cb287>] ? ip_firegl_open+0x17/0x20 [fglrx]
> > > [ 34.526668] [<ffffffffa01ccac8>] ? firegl_stub_open+0x98/0x100 [fglrx]
> > > [ 34.526668] [<ffffffff811a82bf>] ? chrdev_open+0x9f/0x1d0
> > > [ 34.526668] [<ffffffff811a1967>] ? do_dentry_open+0x1b7/0x2c0
> > > [ 34.526668] [<ffffffff811aed41>] ? __inode_permission+0x41/0xb0
> > > [ 34.526668] [<ffffffff811a8220>] ? cdev_put+0x30/0x30
> > > [ 34.526668] [<ffffffff811a1d91>] ? finish_open+0x31/0x40
> > > [ 34.526668] [<ffffffff811b1b72>] ? do_last+0x572/0xe90
> > > [ 34.526668] [<ffffffff811af036>] ? link_path_walk+0x236/0x8d0
> > > [ 34.526668] [<ffffffff811b254b>] ? path_openat+0xbb/0x6b0
> > > [ 34.526668] [<ffffffff811b3c6a>] ? do_filp_open+0x3a/0x90
> > > [ 34.526668] [<ffffffff811c0567>] ? __alloc_fd+0xa7/0x130
> > > [ 34.526668] [<ffffffff811a2f49>] ? do_sys_open+0x129/0x220
> > > [ 34.526668] [<ffffffff811a305e>] ? SyS_open+0x1e/0x20
> > > [ 34.526668] [<ffffffff8152136d>] ? system_call_fastpath+0x1a/0x1f
> > > [ 34.526668] Code: 8b 4a 1c 8b 93 e0 18 00 00 48 8d bb 58 02 00 00 85
> > > d2 0f 84 63 02 00 00 f6 c2 01 0f 84 20 01 00 00 44 8b 1b 41 ff cb 4f 8d
> > > 14 5b <46> 89 44 93 08 8b 95 3c 02 00 00 48 89 d0 48 c1 e8 07 a8 01 75
> > > [ 34.526668] RIP [<ffffffffa0399af6>]
> > > TF_PhwCIslands_PopulateAndUploadSclkMclkDPMLevels+0x96/0x3d0 [fglrx]
> > > [ 34.526668] RSP <ffff880037a29810>
> > > [ 34.526668] CR2: ffff880c724e8008
> > > [ 34.526668] ---[ end trace 5431e6dcf1c31dea ]---
> > > [ 69.317528] type=1006 audit(1391649552.046:4): pid=324 uid=0 old
> > > auid=4294967295 new auid=0 old ses=4294967295 new ses=3 res=1
> > >
> > > I know it is the binary driver. I could also retry with the radeon
> > > driver, but I believe there will be a similar crash. In my first try I
> > > just rebooted the Linux VM several times without starting X.
> > >
> > > I got it working one time without getting 'Clocksource tsc unstable',
> > > but now I'm unable to repeat it. So I believe something more is needed.
> >
> > Bus resets are a mixed blessing, it returns the card to a relatively
> > known state, but it's a fairly unusual event from a platform perspective
> > and we have no idea what kind of quirks the host system bios might have
> > in place to workaround hardware. If the bus is not fatal you might try
> > running lspci -vvv in the host at various points to see what changed.
> > For instance, boot a Linux guest to text mode and see if the card is in
> > the same state between first boot and second boot before starting X.
> > Thanks,
> >
>
> I tried the R9 290X separately now. You're right, the lspci -vvv output
> changes between the 1st and 2nd boot, and the changes are reset if I do
> "suspend-to-ram" and resume before the 3rd boot of the VM. Below are the
> lspci output from the 1st boot and the diffs between the lspci outputs:
>
> --- 001-lspci.290x.before.1st.log 2014-02-07 01:13:41.498827928 +0100
> +++ 002-lspci.290x.during.1st.before.X.log 2014-02-07 01:14:47.984612423
> +0100
> @@ -1,6 +1,6 @@
> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> Hawaii XT [Radeon HD 8970] (prog-if 00 [VGA controller])
> Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0b00
> - Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx-
> + Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR+ FastB2B- DisINTx-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
> <MAbort- >SERR- <PERR- INTx-
> Latency: 0, Cache Line Size: 64 bytes
> Interrupt: pin A routed to IRQ 18
> @@ -19,7 +19,7 @@
> DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
> Unsupported-
> RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
> MaxPayload 128 bytes, MaxReadReq 512 bytes
> - DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr-
> TransPend-
> + DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr-
> TransPend-
> LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit
> Latency L0s <64ns, L1 <1us
> ClockPM- Surprise- LLActRep- BwNot-
> LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
> @@ -39,13 +39,13 @@
> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> - CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> + CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
> Capabilities: [270 v1] #19
> Capabilities: [2b0 v1] Address Translation Service (ATS)
> ATSCap: Invalidate Queue Depth: 00
> - ATSCtl: Enable-, Smallest Translation Unit: 00
> + ATSCtl: Enable+, Smallest Translation Unit: 00
> Capabilities: [2c0 v1] #13
> Capabilities: [2d0 v1] #1b
> Kernel driver in use: vfio-pci
>
> --- 002-lspci.290x.during.1st.before.X.log 2014-02-07 01:14:47.984612423
> +0100
> +++ 003-lspci.290x.during.1st.after.X.log 2014-02-07 01:16:29.644846503
> +0100
> @@ -1,9 +1,9 @@
> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> Hawaii XT [Radeon HD 8970] (prog-if 00 [VGA controller])
> Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0b00
> - Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR+ FastB2B- DisINTx-
> + Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR+ FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
> <MAbort- >SERR- <PERR- INTx-
> Latency: 0, Cache Line Size: 64 bytes
> - Interrupt: pin A routed to IRQ 18
> + Interrupt: pin A routed to IRQ 47
> Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
> Region 2: Memory at df800000 (64-bit, prefetchable) [size=8M]
> Region 4: I/O ports at be00 [size=256]
> @@ -17,14 +17,14 @@
> DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1
> unlimited
> ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
> Unsupported-
> - RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
> + RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
> MaxPayload 128 bytes, MaxReadReq 512 bytes
> - DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr-
> TransPend-
> + DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr-
> TransPend-
> LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit
> Latency L0s <64ns, L1 <1us
> ClockPM- Surprise- LLActRep- BwNot-
> LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> - LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+
> DLActive- BWMgmt- ABWMgmt-
> + LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+
> DLActive- BWMgmt- ABWMgmt-
> DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-,
> OBFF Not Supported
> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-,
> OBFF Disabled
> LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> @@ -32,8 +32,8 @@
> Compliance De-emphasis: -6dB
> LnkSta2: Current De-emphasis Level: -3.5dB,
> EqualizationComplete-, EqualizationPhase1-
> EqualizationPhase2-, EqualizationPhase3-,
> LinkEqualizationRequest-
> - Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
> - Address: 0000000000000000 Data: 0000
> + Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> + Address: 00000000fee00000 Data: 0000
> Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1
> Len=010 <?>
> Capabilities: [150 v2] Advanced Error Reporting
> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>
> Then I stopped X, powered down the VM and started the 2nd cycle:
>
> --- 003-lspci.290x.during.1st.after.X.log 2014-02-07 01:16:29.644846503
> +0100
> +++ 004-lspci.290x.before.2nd.log 2014-02-07 01:16:50.966611282 +0100
> @@ -1,9 +1,9 @@
> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> Hawaii XT [Radeon HD 8970] (prog-if 00 [VGA controller])
> Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0b00
> - Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR+ FastB2B- DisINTx+
> + Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
> <MAbort- >SERR- <PERR- INTx-
> Latency: 0, Cache Line Size: 64 bytes
> - Interrupt: pin A routed to IRQ 47
> + Interrupt: pin A routed to IRQ 18
> Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
> Region 2: Memory at df800000 (64-bit, prefetchable) [size=8M]
> Region 4: I/O ports at be00 [size=256]
> @@ -17,7 +17,7 @@
> DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1
> unlimited
> ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
> Unsupported-
> - RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
> + RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
> MaxPayload 128 bytes, MaxReadReq 512 bytes
> DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr-
> TransPend-
> LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit
> Latency L0s <64ns, L1 <1us
> @@ -32,7 +32,7 @@
> Compliance De-emphasis: -6dB
> LnkSta2: Current De-emphasis Level: -3.5dB,
> EqualizationComplete-, EqualizationPhase1-
> EqualizationPhase2-, EqualizationPhase3-,
> LinkEqualizationRequest-
> - Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> + Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
> Address: 00000000fee00000 Data: 0000
> Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1
> Len=010 <?>
> Capabilities: [150 v2] Advanced Error Reporting
> @@ -45,7 +45,7 @@
> Capabilities: [270 v1] #19
> Capabilities: [2b0 v1] Address Translation Service (ATS)
> ATSCap: Invalidate Queue Depth: 00
> - ATSCtl: Enable+, Smallest Translation Unit: 00
> + ATSCtl: Enable-, Smallest Translation Unit: 00
> Capabilities: [2c0 v1] #13
> Capabilities: [2d0 v1] #1b
> Kernel driver in use: vfio-pci
>
> --- 004-lspci.290x.before.2nd.log 2014-02-07 01:16:50.966611282 +0100
> +++ 005-lspci.290x.during.2nd.before.X.log 2014-02-07 01:17:55.571676376
> +0100
> @@ -1,6 +1,6 @@
> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> Hawaii XT [Radeon HD 8970] (prog-if 00 [VGA controller])
> Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0b00
> - Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx-
> + Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR+ FastB2B- DisINTx-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
> <MAbort- >SERR- <PERR- INTx-
> Latency: 0, Cache Line Size: 64 bytes
> Interrupt: pin A routed to IRQ 18
> @@ -19,12 +19,12 @@
> DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
> Unsupported-
> RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
> MaxPayload 128 bytes, MaxReadReq 512 bytes
> - DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr-
> TransPend-
> + DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr-
> TransPend-
> LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit
> Latency L0s <64ns, L1 <1us
> ClockPM- Surprise- LLActRep- BwNot-
> LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> - LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+
> DLActive- BWMgmt- ABWMgmt-
> + LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+
> DLActive- BWMgmt- ABWMgmt-
> DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-,
> OBFF Not Supported
> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-,
> OBFF Disabled
> LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> @@ -33,7 +33,7 @@
> LnkSta2: Current De-emphasis Level: -3.5dB,
> EqualizationComplete-, EqualizationPhase1-
> EqualizationPhase2-, EqualizationPhase3-,
> LinkEqualizationRequest-
> Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
> - Address: 00000000fee00000 Data: 0000
> + Address: 0000000000000000 Data: 0000
> Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1
> Len=010 <?>
> Capabilities: [150 v2] Advanced Error Reporting
> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> @@ -45,7 +45,7 @@
> Capabilities: [270 v1] #19
> Capabilities: [2b0 v1] Address Translation Service (ATS)
> ATSCap: Invalidate Queue Depth: 00
> - ATSCtl: Enable-, Smallest Translation Unit: 00
> + ATSCtl: Enable+, Smallest Translation Unit: 00
> Capabilities: [2c0 v1] #13
> Capabilities: [2d0 v1] #1b
> Kernel driver in use: vfio-pci
>
> --- 005-lspci.290x.during.2nd.before.X.log 2014-02-07 01:17:55.571676376
> +0100
> +++ 006-lspci.290x.during.2nd.after.X.crash.log 2014-02-07
> 01:18:16.996855362 +0100
> @@ -1,9 +1,9 @@
> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> Hawaii XT [Radeon HD 8970] (prog-if 00 [VGA controller])
> Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0b00
> - Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR+ FastB2B- DisINTx-
> + Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR+ FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
> <MAbort- >SERR- <PERR- INTx-
> Latency: 0, Cache Line Size: 64 bytes
> - Interrupt: pin A routed to IRQ 18
> + Interrupt: pin A routed to IRQ 47
> Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
> Region 2: Memory at df800000 (64-bit, prefetchable) [size=8M]
> Region 4: I/O ports at be00 [size=256]
> @@ -17,9 +17,9 @@
> DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1
> unlimited
> ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
> Unsupported-
> - RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
> + RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
> MaxPayload 128 bytes, MaxReadReq 512 bytes
> - DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr-
> TransPend-
> + DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr-
> TransPend-
> LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit
> Latency L0s <64ns, L1 <1us
> ClockPM- Surprise- LLActRep- BwNot-
> LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
> @@ -32,8 +32,8 @@
> Compliance De-emphasis: -6dB
> LnkSta2: Current De-emphasis Level: -3.5dB,
> EqualizationComplete-, EqualizationPhase1-
> EqualizationPhase2-, EqualizationPhase3-,
> LinkEqualizationRequest-
> - Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
> - Address: 0000000000000000 Data: 0000
> + Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> + Address: 00000000fee00000 Data: 0000
> Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1
> Len=010 <?>
> Capabilities: [150 v2] Advanced Error Reporting
> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>
> What is interesting is the diff between the 1st and 2nd boot when I run
> lspci before booting the VM. The only differences between the 1st start
> and the 2nd start are:
>
> --- 001-lspci.290x.before.1st.log 2014-02-07 01:13:41.498827928 +0100
> +++ 004-lspci.290x.before.2nd.log 2014-02-07 01:16:50.966611282 +0100
> @@ -24,7 +24,7 @@
> ClockPM- Surprise- LLActRep- BwNot-
> LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> - LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+
> DLActive- BWMgmt- ABWMgmt-
> + LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+
> DLActive- BWMgmt- ABWMgmt-
> DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-,
> OBFF Not Supported
> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-,
> OBFF Disabled
> LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> @@ -33,13 +33,13 @@
> LnkSta2: Current De-emphasis Level: -3.5dB,
> EqualizationComplete-, EqualizationPhase1-
> EqualizationPhase2-, EqualizationPhase3-,
> LinkEqualizationRequest-
> Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
> - Address: 0000000000000000 Data: 0000
> + Address: 00000000fee00000 Data: 0000
> Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1
> Len=010 <?>
> Capabilities: [150 v2] Advanced Error Reporting
> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> - CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> + CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
> Capabilities: [270 v1] #19
>
> After that, if I do the suspend-to-ram / resume trick, I again get the
> lspci output from before the 1st boot.
>
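For anyone repeating the comparison above, here is a minimal Python sketch
that lists which lspci fields changed between two saved 'lspci -vvv'
captures. The function names are hypothetical, and it assumes each capture
was saved as plain text the same way as the .log files in the diffs above.

```python
import difflib


def snapshot_diff(before, after, before_name="before", after_name="after"):
    # Unified diff of two lspci captures, similar to the diffs quoted above.
    lines = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=before_name,
        tofile=after_name,
        lineterm="",
    )
    return "\n".join(lines)


def changed_fields(before, after):
    # Names of the lspci fields (the text before the first ':') whose line
    # was added or removed between the two captures, in order of appearance.
    fields = []
    for line in difflib.ndiff(before.splitlines(), after.splitlines()):
        if line.startswith(("- ", "+ ")):
            name = line[2:].strip().split(":", 1)[0]
            if name not in fields:
                fields.append(name)
    return fields
```

To use it on the captures above, read two of the log files (for example
001-lspci.290x.before.1st.log and 004-lspci.290x.before.2nd.log) and pass
their contents to changed_fields().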
Another workaround where your patch works fine is to do the following:
#1 Start VM
#2 Start X
#3 Stop X
#4 rmmod fglrx
#5 poweroff
After this I'm able to restart the VM as many times as I want with boot
VGA, fglrx and X, but obviously if the VM crashes I need to use the
"suspend-to-ram" / resume workaround. It looks like fglrx properly
disables the device when it is unloaded.
[ 36.081197] <6>[fglrx] IRQ 48 Disabled
[ 36.096488] <6>[fglrx] module unloaded - fglrx 13.35.5 [Jan 29 2014]
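Since the workaround depends on fglrx really being unloaded before the
poweroff, the guest could check that first. A minimal Python sketch, assuming
the usual /proc/modules layout with the module name as the first field; the
function names are made up for illustration.

```python
def loaded_modules(proc_modules_text):
    # Set of module names found in /proc/modules-style text
    # (first whitespace-separated field of every non-empty line).
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def module_unloaded(proc_modules_text, module="fglrx"):
    # True when the given module does not appear in the listing.
    return module not in loaded_modules(proc_modules_text)
```

In the guest this would be called with the contents of /proc/modules after
step #4 and before step #5.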
Should I retry it with the radeon driver or with VFIO debug enabled?
> > Alex
> >
>
> --Maik
>
--Maik