[Qemu-devel] Endless loop in qcow2_alloc_cluster_offset

From: Jan Kiszka
Subject: [Qemu-devel] Endless loop in qcow2_alloc_cluster_offset
Date: Thu, 19 Nov 2009 13:19:55 +0100

I just managed to push a qemu-kvm process (git rev. b496fe3431) into an
endless loop in qcow2_alloc_cluster_offset, namely over
QLIST_FOREACH(old_alloc, &s->cluster_allocs, next_in_flight):

(gdb) bt
#0  0x000000000048614b in qcow2_alloc_cluster_offset (bs=0xc4e1d0, 
offset=7417184256, n_start=0, n_end=16, num=0xcb351c, m=0xcb3568) at 
#1  0x00000000004828d0 in qcow_aio_write_cb (opaque=0xcb34d0, ret=0) at 
#2  0x0000000000482a44 in qcow_aio_writev (bs=<value optimized out>, 
sector_num=<value optimized out>, qiov=<value optimized out>, nb_sectors=<value 
optimized out>, cb=<value optimized out>, opaque=<value optimized out>) at 
#3  0x0000000000470e89 in bdrv_aio_writev (bs=0xc4e1d0, sector_num=2, 
qiov=0x7f48a9010ed0, nb_sectors=16, cb=0x470d20 <bdrv_rw_em_cb>, 
opaque=0x7f48a9010f0c) at /data/qemu-kvm/block.c:1362
#4  0x0000000000472991 in bdrv_write_em (bs=0xc4e1d0, sector_num=14486688, 
buf=0xd67200 "H\a", nb_sectors=16) at /data/qemu-kvm/block.c:1736
#5  0x0000000000435581 in ide_sector_write (s=0xc92650) at 
#6  0x0000000000425fc2 in kvm_handle_io (env=<value optimized out>) at 
#7  kvm_run (env=<value optimized out>) at /data/qemu-kvm/qemu-kvm.c:964
#8  0x0000000000426049 in kvm_cpu_exec (env=0x1000) at 
#9  0x000000000042627d in kvm_main_loop_cpu (_env=<value optimized out>) at 
#10 ap_main_loop (_env=<value optimized out>) at /data/qemu-kvm/qemu-kvm.c:1943
#11 0x00007f48ae89d070 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f48abf0711d in clone () from /lib64/libc.so.6
#13 0x0000000000000000 in ?? ()
(gdb) print ((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first 
$5 = (struct QCowL2Meta *) 0xcb3568
(gdb) print *((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first 
$6 = {offset = 7417176064, n_start = 0, nb_available = 16, nb_clusters = 0, 
depends_on = 0xcb3568, dependent_requests = {lh_first = 0x0}, next_in_flight = 
{le_next = 0xcb3568, le_prev = 0xc4ebd8}}

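So next == first: the head entry's le_next (0xcb3568) points back at the
entry itself, so QLIST_FOREACH never reaches NULL. Its depends_on field
points at the same address, which is also the m passed in frame #0.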
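
For illustration only, here is a minimal sketch of one way a list can end up
in this shape. It assumes that QLIST behaves like the BSD LIST macros from
<sys/queue.h>, and it uses a hypothetical struct meta in place of QCowL2Meta;
bounded_walk and self_linked_after_double_insert are made-up helper names.
Inserting an element that is already the head a second time leaves le_next
pointing at the element itself and le_prev pointing at the list head, which
matches the pattern in the dump above. This is not a claim that a double
insert is what happened here, only that it would produce the same picture.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for QCowL2Meta; only the list linkage matters. */
struct meta {
    int id;
    LIST_ENTRY(meta) next_in_flight;
};

LIST_HEAD(meta_list, meta);

/*
 * Walk the list, giving up after max_steps entries.
 * Returns the number of entries visited, or -1 if the bound was exceeded,
 * which is how an endless LIST_FOREACH shows up in this sketch.
 */
static int bounded_walk(struct meta_list *head, int max_steps)
{
    struct meta *m;
    int steps = 0;

    LIST_FOREACH(m, head, next_in_flight) {
        if (++steps > max_steps) {
            return -1;
        }
    }
    return steps;
}

/*
 * Insert the same element at the head twice.
 * Returns 1 if the element then links to itself and the walk no longer
 * terminates within the bound, 0 otherwise.
 */
static int self_linked_after_double_insert(void)
{
    struct meta_list head;
    struct meta m = { .id = 1 };

    LIST_INIT(&head);

    /* First insert: a sane one-element list. */
    LIST_INSERT_HEAD(&head, &m, next_in_flight);
    assert(bounded_walk(&head, 1000) == 1);

    /* Second insert of the same element: le_next is set to the old head,
     * which is the element itself. */
    LIST_INSERT_HEAD(&head, &m, next_in_flight);

    return m.next_in_flight.le_next == &m &&
           m.next_in_flight.le_prev == &head.lh_first &&
           bounded_walk(&head, 1000) == -1;
}
```

A bounded walk like this can also serve as a temporary debugging aid: it
turns the hang into a detectable condition at the point where the list is
first found to be cyclic.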

Is something fiddling with cluster_allocs concurrently, e.g. some signal
handler? Or what could cause this list corruption? Would it be enough to


