qemu-devel

From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [BUG] Guest OS hangs on boot when 64bit BAR present (kvm-apic -msi resource conflict)
Date: Wed, 13 Feb 2013 12:24:15 +0200

On Wed, Feb 13, 2013 at 06:06:37PM +1300, Alexey Korolev wrote:
> Some time ago I reported an issue about a guest OS hang when a 64bit BAR is present.
> http://lists.gnu.org/archive/html/qemu-devel/2012-01/msg03189.html
> http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg00413.html
> 
> Some more investigation has been done, so in this post I'll try to explain 
> why it happens and offer possible solutions:
> 
> *When the issue happens*
> The issue occurs on Linux guests with kernel versions < 2.6.36.
> The guest OS hangs on boot when a 64bit PCI BAR is present in the system
> (for example, when the ivshmem device is used) and occupies a range within
> the first 4 GB.
> 
> *How to reproduce*
> I used the following qemu command to reproduce the case:
> /usr/local/bin/qemu-system-x86_64 -M pc-1.3 -enable-kvm -m 2000 \
>   -smp 1,sockets=1,cores=1,threads=1 -name Rh5332 \
>   -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/Rh5332.monitor,server,nowait \
>   -mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc -boot cd \
>   -drive file=/home/akorolev/rh5332.img,if=none,id=drive-ide0-0-0,format=raw \
>   -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
>   -chardev file,id=charserial0,path=/home/akorolev/serial.log \
>   -device isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:0 -k en-us \
>   -vga cirrus -device ivshmem,shm,size=32M \
>   -device virtio-balloon-pci,id=balloon0
> 
> I tried different guests: CentOS 5.8 64bit, RHEL 5.3 32bit and FC 12 64bit;
> on all of them the hang occurs in 100% of cases.
> 
> *Why it happens*
> The issue basically comes from Linux PCI enumeration code.
> 
> The OS enumerates 64bit BARs when a device is enabled, using the following
> procedure:
> 1. Write all FF's to lower half of 64bit BAR
> 2. Write address back to lower half of 64bit BAR
> 3. Write all FF's to higher half of 64bit BAR
> 4. Write address back to higher half of 64bit BAR
> 
> For QEMU this means that pci_default_write_config() receives all FFs for
> the lower half of the 64bit BAR. It then applies the mask and converts the
> value to "All FF's - size + 1" (0xFE000000 if the size is 32MB).
> 
> So for a short period of time the range [0xFE000000 - 0xFFFFFFFF] is
> occupied by the ivshmem resource. For some reason this is fatal for the rest
> of the boot process.
> 
> We have found that the boot process breaks completely if the kvm-apic-msi
> range is overlapped, even for a short period of time. (We still don't know
> why this happens; perhaps the QEMU maintainers can answer?)
> 
> The kvm-apic-msi memory region is a non-overlappable memory region with the
> hardcoded address range [0xFEE00000 - 0xFEF00000].

Thanks for looking into this!

> Here is a log we collected from render_memory_regions:
> 
>  system overlap 0 pri 0 [0x0 - 0x7fffffffffffffff]
>      kvmvapic-rom overlap 1 pri 1000 [0xca000 - 0xcd000]
>          pc.ram overlap 0 pri 0 [0xca000 - 0xcd000]
>          ++ pc.ram [0xca000 - 0xcd000] is added to view
>      ....................
>      smram-region overlap 1 pri 1 [0xa0000 - 0xc0000]
>          pci overlap 0 pri 0 [0xa0000 - 0xc0000]
>              cirrus-lowmem-container overlap 1 pri 1 [0xa0000 - 0xc0000]
>                  cirrus-low-memory overlap 0 pri 0 [0xa0000 - 0xc0000]
>                 ++cirrus-low-memory [0xa0000 - 0xc0000] is added to view
>      kvm-ioapic overlap 0 pri 0 [0xfec00000 - 0xfec01000]
>     ++kvm-ioapic [0xfec00000 - 0xfec01000] is added to view
>      pci-hole64 overlap 0 pri 0 [0x100000000 - 0x4000000100000000]
>          pci overlap 0 pri 0 [0x100000000 - 0x4000000100000000]
>      pci-hole overlap 0 pri 0 [0x7d000000 - 0x100000000]

So kvm-ioapic and pci-hole, which should both be non-overlapping,
actually overlap each other.
Isn't this a problem?

>          pci overlap 0 pri 0 [0x7d000000 - 0x100000000]
>              ivshmem-bar2-container overlap 1 pri 1 [0xfe000000 - 0x100000000]
>                  ivshmem.bar2 overlap 0 pri 0 [0xfe000000 - 0x100000000]
>                 ++ivshmem.bar2 [0xfe000000 - 0xfec00000] is added to view
>                 ++ivshmem.bar2  [0xfec01000 - 0x100000000] is added to view
>              ivshmem-mmio overlap 1 pri 1 [0xfebf1000 - 0xfebf1100]
>              e1000-mmio overlap 1 pri 1 [0xfeba0000 - 0xfebc0000]
>              cirrus-mmio overlap 1 pri 1 [0xfebf0000 - 0xfebf1000]
>              cirrus-pci-bar0 overlap 1 pri 1 [0xfa000000 - 0xfc000000]
>                  vga.vram overlap 1 pri 1 [0xfa000000 - 0xfa800000]
>                 ++vga.vram [0xfa000000 - 0xfa800000] is added to view
>                  cirrus-bitblt-mmio overlap 0 pri 0 [0xfb000000 - 0xfb400000]
>                 ++cirrus-bitblt-mmio [0xfb000000 - 0xfb400000] is added to view
>                  cirrus-linear-io overlap 0 pri 0 [0xfa000000 - 0xfa800000]
>              pc.bios overlap 0 pri 0 [0xfffe0000 - 0x100000000]
>      ram-below-4g overlap 0 pri 0 [0x0 - 0x7d000000]
>          pc.ram overlap 0 pri 0 [0x0 - 0x7d000000]
>         ++pc.ram [0x0 - 0xa0000] is added to view
>         ++pc.ram [0x100000 - 0x7d000000] is added to view
>      kvm-apic-msi overlap 0 pri 0 [0xfee00000 - 0xfef00000]
> 
> As you can see from the log, kvm-apic-msi is enumerated last, when the range
> [0xfee00000 - 0xfef00000] is already occupied by ivshmem.bar2
> [0xfec01000 - 0x100000000].
> 
> 
> *Possible solutions*
> Solution 1. Probably the best option would be to add a rule that regions
> which must not be overlapped are added to the view first (in other words,
> regions which must not be overlapped have the highest priority). Please find
> the patch in the following message.
> 
> Solution 2. Raise the priority of the kvm-apic-msi resource. This is a
> somewhat misleading solution, as priority only applies to overlappable
> regions, while this region must not be overlapped.
> 
> Solution 3. Fix the issue at the PCI level: track whether the resource is
> 64bit and apply the change only once both halves of the 64bit BAR have been
> programmed. (It appears that real PCI bus controllers on PCs are smart enough
> to track 64bit BAR writes, so could QEMU do the same? The drawbacks are that
> tracking PCI writes is a bit cumbersome, and such tracking may look like a
> hack to some.)
> 
> 
> Alexey

I have to say I don't understand what exactly the overlap attribute
is supposed to do.

In practice it currently seems to be ignored.

How about we drop it and rely exclusively on priorities?
It's probably easier to just give the APIC high priority.
There's precedent: kvmvapic does:

memory_region_add_subregion_overlap(as, rom_paddr, &s->rom, 1000);

Jan, could you please clarify where the value 1000 came from?

Maybe we need some predefined priority values in memory.h.

-- 
MST


