
Re: [Qemu-discuss] Qemu freeze on I/O intensive workload


From: Jean-Tiare LE BIGOT
Subject: Re: [Qemu-discuss] Qemu freeze on I/O intensive workload
Date: Tue, 2 Oct 2018 12:24:22 +0200

On one of the frozen guests, I can see ATA and clock errors in the console
output:

>
> [    4.513829] input: ImExPS/2 Generic Explorer Mouse as
> /devices/platform/i8042/serio1/input/input3
> [   66.539190] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> frozen
> [   66.542582] ata1.00: failed command: FLUSH CACHE
> [   66.544780] ata1.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 9
> [   66.544780]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4
> (timeout)
> [   66.549921] ata1.00: status: { DRDY }
> [   66.550952] ata1: hard resetting link
> [  103.850697] clocksource: timekeeping watchdog on CPU0: Marking
> clocksource 'tsc' as unstable because the skew is too large:
> [  103.958111] clocksource:                       'hpet' wd_now: 6db28ff8
> wd_last: 8f3cee23 mask: ffffffff
> [  103.960891] clocksource:                       'tsc' cs_now: 317cb4fc49
> cs_last: 20265917c3 mask: ffffffffffffffff
> [  103.969066] clocksource: Switched to clocksource hpet
> [  104.295987] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [  104.298393] ata1.00: configured for UDMA/100
> [  104.300329] ata1.00: retrying FLUSH 0xe7 Emask 0x4
> [  104.303472] ata1.00: device reported invalid CHS sector 0
> [  104.304884] ata1: EH complete
>

It looks like there is a 60 s timeout followed by a roughly 40 s link reset.
The virtual hard drives are raw files stored locally on an LVM RAID 0 of 2
SSD drives. Does this ring a bell? Similar reports on the Internet seem to be
related to remote disks, and I'm not sure what to make of the clock skew.
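
In case it helps with further diagnosis, the next time a guest freezes I can
also query it through the QMP monitor socket defined in the config quoted
below (mode = "control"). A minimal sketch of that, assuming the standard
query-status and query-block commands are the right things to look at:

> import json
> import socket
>
> def qmp_query(sock_path, command):
>     """Send a single QMP command over a machine's monitor socket."""
>     s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
>     s.connect(sock_path)
>     f = s.makefile("rw")
>     f.readline()                                   # QMP greeting banner
>     f.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
>     f.flush()
>     f.readline()                                   # capabilities reply
>     f.write(json.dumps({"execute": command}) + "\n")
>     f.flush()
>     return json.loads(f.readline())
>
> # e.g. qmp_query("%%MACHINE_DATA%%/monitor.sock", "query-status")
> #      qmp_query("%%MACHINE_DATA%%/monitor.sock", "query-block")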

Any pointers would be greatly appreciated!

On Wed, 26 Sep 2018 at 19:17, Jean-Tiare LE BIGOT <address@hidden> wrote:

> Hi,
>
> I am using Qemu in a test suite to virtualize 3 x86_64 machines in an
> isolated network. The end goal is to run integration tests on a
> Yocto-generated distribution.
>
> In the setup phase of the test suite, we start 4 Qemu instances with the
> "-daemonize" option; a global lock prevents parallel starts (it is held
> until each instance has daemonized).
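>
> For reference, the start sequence is roughly equivalent to the sketch below
> (simplified; the flock-based lock and the qemu-system-x86_64 invocation here
> are illustrative, not our exact code):
>
>> import fcntl
>> import subprocess
>>
>> def start_machine(config_path, lock_path="/tmp/qemu-start.lock"):
>>     with open(lock_path, "w") as lock:
>>         # Global lock: only one instance may be in its startup phase.
>>         fcntl.flock(lock, fcntl.LOCK_EX)
>>         # With -daemonize this returns once the guest is set up, so the
>>         # lock is only held until the instance has daemonized.
>>         subprocess.run(
>>             ["qemu-system-x86_64", "-readconfig", config_path,
>>              "-daemonize"],
>>             check=True,
>>         )
>>         # The lock is released when the lock file is closed.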
>
> The machines' configuration files start as follows (prior to variable
> replacement):
>
>>
>> #
>> # General setup
>> #
>>
>> [machine]
>>   accel = "kvm" # For acceleration
>>   type = "q35"  # For AHCI drive (sda instead of ide)
>>
>> [memory]
>>   size = "1G"
>>
>> #
>> # Control socket
>> #
>>
>> [chardev "monitor"]
>>   backend = "socket"
>>   server = "on"
>>   wait = "off"
>>   path = "%%MACHINE_DATA%%/monitor.sock"
>>
>> [mon]
>>   chardev = "monitor"
>>   mode = "control"
>>
>> #
>> # UEFI setup
>> #
>>
>> [drive]
>>   if = "pflash"
>>   format = "raw"
>>   readonly = "on"
>>   file = "%%MACHINE_DATA%%/OVMF_CODE.fd"
>>
>> [drive]
>>   if = "pflash"
>>   format = "raw"
>>   file = "%%MACHINE_DATA%%/OVMF_VARS.fd"
>>
>> #
>> # Harddrive and install ISO
>> #
>>
>> [drive]
>>   if = "ide"
>>   format = "raw"
>>   file = "%%MACHINE_DATA%%/sda.img"
>>
>> [drive]
>>   if = "ide"
>>   index = "2"
>>   format = "raw"
>>   file = "%%MACHINE_DATA%%/cdrom-rw.iso"
>>
>
> When started, the machines install themselves. The install is based on a
> "dd" of a pre-generated ext4 image to the newly created partition set.
>
> What we observe is that, sometimes, the virtualized machine freezes in the
> "dd" step. From the host side, we see that the "voluntary_ctxt_switches"
> counter of the Qemu thread does not increase for 3 seconds, suggesting that
> the emulated CPU is blocked.
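>
> The detection itself is nothing fancier than the following (a minimal
> sketch; the thread id handling and exact timing of the real check differ
> slightly):
>
>> import time
>>
>> def voluntary_switches(tid):
>>     # Read voluntary_ctxt_switches from /proc/<tid>/status.
>>     with open(f"/proc/{tid}/status") as f:
>>         for line in f:
>>             if line.startswith("voluntary_ctxt_switches"):
>>                 return int(line.split()[1])
>>
>> def looks_frozen(tid, window=3.0):
>>     # The thread is considered blocked if the counter does not move
>>     # during the observation window.
>>     before = voluntary_switches(tid)
>>     time.sleep(window)
>>     return voluntary_switches(tid) == before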
>
> The stack trace (/proc/[Thread ID]/stack) is:
>
>> [<ffffffffc055565a>] kvm_vcpu_block+0x8a/0x2f0 [kvm]
>> [<ffffffffc0571529>] kvm_arch_vcpu_ioctl_run+0x159/0x1620 [kvm]
>> [<ffffffffc0555146>] kvm_vcpu_ioctl+0x2a6/0x620 [kvm]
>> [<ffffffffaf287b55>] do_vfs_ioctl+0xa5/0x600
>> [<ffffffffaf288129>] SyS_ioctl+0x79/0x90
>> [<ffffffffaf89c0b7>] entry_SYSCALL_64_fastpath+0x1a/0xa5
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>
> We are using Qemu 3.0 with kernel 4.13.9-300.fc27.x86_64 on the host. The
> guest is a Yocto Linux 4.9 with a custom, stripped-down configuration.
>
> We do not know where the freeze comes from. From what we observe, the
> freeze may come from the host kernel, the guest kernel or Qemu itself.
>
> How can we go further in the diagnosis? We can enable traces, apply
> patches, run under gdb, ...
>
> Thanks!
>
> --
> Jean-Tiare Le Bigot
>


-- 
Jean-Tiare Le Bigot

