qemu-devel

Re: [Qemu-devel] Multi GPU passthrough via VFIO


From: Maik Broemme
Subject: Re: [Qemu-devel] Multi GPU passthrough via VFIO
Date: Thu, 6 Feb 2014 01:25:46 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

Hi Alex,

Maik Broemme <address@hidden> wrote:
> > > > > Another minor issue is that the R9 290X is not reset during shutdown of
> > > > > the VM (neither Linux nor Windows), but it can be worked around by doing
> > > > > a "suspend-to-ram" between two starts. That's why I use the '-no-reboot'
> > > > > option in QEMU. The 7870 does the reset properly.
> > > > 
> > > > 
> > > > Is the NoSoftRst "-" on the 290X vs "+" on the 7870 in lspci -vvv by
> > > > chance?  Thanks,
> > > > 
> > > 
> > > Here are both. Funnily enough, it is the opposite of what you described. :)
> > 
> > 
> > Oops, yes.  Does this help?
> > 
> > --- a/hw/misc/vfio.c
> > +++ b/hw/misc/vfio.c
> > @@ -3136,7 +3136,7 @@ static void vfio_pci_reset_handler(void *opaque)
> >  
> >      QLIST_FOREACH(group, &group_list, next) {
> >          QLIST_FOREACH(vdev, &group->device_list, next) {
> > -            if (!vdev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
> > +            if (!vdev->reset_works || !vdev->has_flr) {
> >                  vdev->needs_reset = true;
> >              }
> >          }
> > 
> > I can't figure out why I coded it the way that I did.  Probably overly
> > targeting a specific device.  Thanks,
> > 
> 
> This patch works absolutely fine. After applying it to my 'qemu-git', the
> device reset works flawlessly. So it would be great to push it upstream,
> as it looks good.
> 

Okay, sorry, I was too quick here. It only worked the first time; now,
even after a clean reboot, it no longer works as expected, and the
behavior is very strange.

Windows:

  1st boot works fine - boot VGA and Windows ATI driver loaded; issued a
      reboot and QEMU stopped due to '-no-reboot'.

  2nd boot works partially - boot VGA and Windows ATI driver loaded, but
      black screen, and my system becomes terribly slow and mostly
      unresponsive. Immediately after the ATI driver enables the device,
      dmesg shows the following:

[  159.984324] vfio_ecap_init: 0000:01:00.0 hiding ecap address@hidden
[  159.984340] vfio_ecap_init: 0000:01:00.0 hiding ecap address@hidden
[  160.129036] vfio_ecap_init: 0000:02:00.0 hiding ecap address@hidden
[  160.129049] vfio_ecap_init: 0000:02:00.0 hiding ecap address@hidden
[  172.977677] kvm: zapping shadow pages for mmio generation wraparound
[  173.160174] br0: port 2(tap0) entered forwarding state
[  175.902967] vfio-pci 0000:01:00.0: irq 46 for MSI/MSI-X
[  188.340430] Clocksource tsc unstable (delta = -119654611 ns)
[  188.340511] Switched to clocksource hpet
[  191.088693] hpet1: lost 12 rtc interrupts
[  191.926555] hpet1: lost 25 rtc interrupts

  So your patch did indeed fix the reset issue of the boot VGA, but
  something else is wrong now. :)

Linux (fglrx):

  1st boot works fine - boot VGA, fglrx loads fine and X could be
      started; issued a reboot via SSH and QEMU stopped due to
      '-no-reboot'.

  2nd boot works partially - boot VGA, fglrx loads fine, but X fails to
      start with:

[   34.265111] fglrx_pci 0000:02:00.0: irq 50 for MSI/MSI-X
[   34.344313] <6>[fglrx] Firegl kernel thread PID: 318
[   34.344400] <6>[fglrx] Firegl kernel thread PID: 319
[   34.344478] <6>[fglrx] Firegl kernel thread PID: 320
[   34.344589] <6>[fglrx] IRQ 50 Enabled
[   34.356105] <6>[fglrx] Reserved FB block: Shared offset:0, size:1000000 
[   34.356107] <6>[fglrx] Reserved FB block: Unshared offset:fac3000, size:3000 
[   34.356109] <6>[fglrx] Reserved FB block: Unshared offset:fac6000, size:23a000
[   34.356110] <6>[fglrx] Reserved FB block: Unshared offset:7fff4000, size:c000
[   34.386436] fglrx_pci 0000:01:00.0: irq 51 for MSI/MSI-X
[   34.490902] <6>[fglrx] Firegl kernel thread PID: 321
[   34.490994] <6>[fglrx] Firegl kernel thread PID: 322
[   34.491069] <6>[fglrx] Firegl kernel thread PID: 323
[   34.491166] <6>[fglrx] IRQ 51 Enabled
[   34.505271] <6>[fglrx] Reserved FB block: Shared offset:0, size:1000000 
[   34.505273] <6>[fglrx] Reserved FB block: Unshared offset:f9c3000, size:3000 
[   34.505274] <6>[fglrx] Reserved FB block: Unshared offset:f9c6000, size:23a000
[   34.505276] <6>[fglrx] Reserved FB block: Unshared offset:fc00000, size:100000
[   34.505277] <6>[fglrx] Reserved FB block: Unshared offset:fff8000, size:8000
[   34.505278] <6>[fglrx] Reserved FB block: Unshared offset:ffff4000, size:c000
[   34.526198] BUG: unable to handle kernel paging request at ffff880c724e8008
[   34.526203] IP: [<ffffffffa0399af6>] TF_PhwCIslands_PopulateAndUploadSclkMclkDPMLevels+0x96/0x3d0 [fglrx]
[   34.526277] PGD 1b3e067 PUD 0 
[   34.526279] Oops: 0002 [#1] PREEMPT SMP 
[   34.526282] Modules linked in: mousedev crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev aesni_intel snd_hda_codec_hdmi aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_hda_intel microcode snd_hda_codec serio_raw psmouse parport_pc snd_hwdep snd_pcm parport snd_page_alloc processor snd_timer snd soundcore i2c_i801 intel_agp lpc_ich pcspkr intel_gtt i2c_core shpchp evdev fglrx(PO) amd_iommu_v2 button ext4 crc16 mbcache jbd2 atkbd libps2 virtio_blk virtio_net ahci libahci libata scsi_mod i8042 floppy serio virtio_pci virtio_ring virtio
[   34.526307] CPU: 1 PID: 316 Comm: X Tainted: P           O 3.13.1-2-ARCH #1
[   34.526309] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Bochs 01/01/2011
[   34.526311] task: ffff8800776e2d00 ti: ffff880037a28000 task.ti: ffff880037a28000
[   34.526312] RIP: 0010:[<ffffffffa0399af6>]  [<ffffffffa0399af6>] TF_PhwCIslands_PopulateAndUploadSclkMclkDPMLevels+0x96/0x3d0 [fglrx]
[   34.526353] RSP: 0018:ffff880037a29810  EFLAGS: 00010296
[   34.526354] RAX: 0000000000000001 RBX: ffff8800724e800c RCX: 0000000000000006
[   34.526356] RDX: 0000000000000003 RSI: 0000000000000002 RDI: ffff8800724e8264
[   34.526357] RBP: ffff88007b19a00c R08: 00000000000186a0 R09: 000000000001e848
[   34.526358] R10: 00000002fffffffd R11: 00000000ffffffff R12: 0000000000000001
[   34.526359] R13: ffff88007b19a00c R14: 0000000000000000 R15: ffff880037a298b0
[   34.526363] FS:  00007f0ba649b880(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
[   34.526365] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   34.526366] CR2: ffff880c724e8008 CR3: 0000000037998000 CR4: 00000000000406e0
[   34.526372] Stack:
[   34.526373]  ffff88007b19a2f4 ffff88007bffcd1c 0000000000000001 ffffffffa0322cf0
[   34.526375]  0000000000000000 0000000000000000 0000000000000000 ffff880077ed2c08
[   34.526378]  0000000000000000 ffff880077ed2c08 ffff880037a298a0 ffffffffa0327f14
[   34.526380] Call Trace:
[   34.526435]  [<ffffffffa0322cf0>] ? PHM_DispatchTable+0xf0/0x220 [fglrx]
[   34.526490]  [<ffffffffa0327f14>] ? PECI_NotifyDALPreAdapterClockChange+0x144/0x160 [fglrx]
[   34.526546]  [<ffffffffa031e321>] ? PHM_SetPowerState+0x31/0xc0 [fglrx]
[   34.526597]  [<ffffffffa0340a5b>] ? PSM_ApplyHardwareAttributes_Dynamic+0x9b/0xf0 [fglrx]
[   34.526651]  [<ffffffffa033fde9>] ? PSM_AdjustPowerState_Dynamic+0x169/0x540 [fglrx]
[   34.526668]  [<ffffffffa0322cf0>] ? PHM_DispatchTable+0xf0/0x220 [fglrx]
[   34.526668]  [<ffffffffa0342ee4>] ? PEM_ExcuteEventChain+0x64/0xe0 [fglrx]
[   34.526668]  [<ffffffffa0341302>] ? PEM_HandleEvent+0x92/0xd0 [fglrx]
[   34.526668]  [<ffffffffa03357c0>] ? PEM_CWDDEPM_NotifyEvent+0xe0/0x4d0 [fglrx]
[   34.526668]  [<ffffffffa0333869>] ? PP_Cwdde+0x109/0x180 [fglrx]
[   34.526668]  [<ffffffffa02091dc>] ? firegl_pplib_cwddepm+0xbc/0x130 [fglrx]
[   34.526668]  [<ffffffffa02092d9>] ? firegl_pplib_notify_event+0x89/0xd0 [fglrx]
[   34.526668]  [<ffffffffa020292f>] ? hal_init_gpu+0x2bf/0x480 [fglrx]
[   34.526668]  [<ffffffffa01dcc7b>] ? firegl_open+0x2db/0x310 [fglrx]
[   34.526668]  [<ffffffffa01cb287>] ? ip_firegl_open+0x17/0x20 [fglrx]
[   34.526668]  [<ffffffffa01ccac8>] ? firegl_stub_open+0x98/0x100 [fglrx]
[   34.526668]  [<ffffffff811a82bf>] ? chrdev_open+0x9f/0x1d0
[   34.526668]  [<ffffffff811a1967>] ? do_dentry_open+0x1b7/0x2c0
[   34.526668]  [<ffffffff811aed41>] ? __inode_permission+0x41/0xb0
[   34.526668]  [<ffffffff811a8220>] ? cdev_put+0x30/0x30
[   34.526668]  [<ffffffff811a1d91>] ? finish_open+0x31/0x40
[   34.526668]  [<ffffffff811b1b72>] ? do_last+0x572/0xe90
[   34.526668]  [<ffffffff811af036>] ? link_path_walk+0x236/0x8d0
[   34.526668]  [<ffffffff811b254b>] ? path_openat+0xbb/0x6b0
[   34.526668]  [<ffffffff811b3c6a>] ? do_filp_open+0x3a/0x90
[   34.526668]  [<ffffffff811c0567>] ? __alloc_fd+0xa7/0x130
[   34.526668]  [<ffffffff811a2f49>] ? do_sys_open+0x129/0x220
[   34.526668]  [<ffffffff811a305e>] ? SyS_open+0x1e/0x20
[   34.526668]  [<ffffffff8152136d>] ? system_call_fastpath+0x1a/0x1f
[   34.526668] Code: 8b 4a 1c 8b 93 e0 18 00 00 48 8d bb 58 02 00 00 85 d2 0f 84 63 02 00 00 f6 c2 01 0f 84 20 01 00 00 44 8b 1b 41 ff cb 4f 8d 14 5b <46> 89 44 93 08 8b 95 3c 02 00 00 48 89 d0 48 c1 e8 07 a8 01 75
[   34.526668] RIP  [<ffffffffa0399af6>] TF_PhwCIslands_PopulateAndUploadSclkMclkDPMLevels+0x96/0x3d0 [fglrx]
[   34.526668]  RSP <ffff880037a29810>
[   34.526668] CR2: ffff880c724e8008
[   34.526668] ---[ end trace 5431e6dcf1c31dea ]---
[   69.317528] type=1006 audit(1391649552.046:4): pid=324 uid=0 old auid=4294967295 new auid=0 old ses=4294967295 new ses=3 res=1

I know it is the binary driver. I could also retry with the radeon driver,
but I believe it will crash in a similar way. In my first attempt I only
rebooted the Linux VM several times without starting X.

I got it working once without the 'Clocksource tsc unstable' message, but
now I'm unable to reproduce that. So I believe something more is needed.

> > Alex
> > 
> 
> --Maik
> 

--Maik



reply via email to

[Prev in Thread] Current Thread [Next in Thread]