[Qemu-devel] Re: Endless loop in qcow2_alloc_cluster_offset

From: Kevin Wolf
Subject: [Qemu-devel] Re: Endless loop in qcow2_alloc_cluster_offset
Date: Thu, 19 Nov 2009 15:49:19 +0100
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20091014 Fedora/3.0-2.8.b4.fc11 Thunderbird/3.0b4

Hi Jan,

On 19.11.2009 13:19, Jan Kiszka wrote:
> (gdb) print ((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first 
> $5 = (struct QCowL2Meta *) 0xcb3568
> (gdb) print *((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first 
> $6 = {offset = 7417176064, n_start = 0, nb_available = 16, nb_clusters = 0, 
> depends_on = 0xcb3568, dependent_requests = {lh_first = 0x0}, next_in_flight 
> = {le_next = 0xcb3568, le_prev = 0xc4ebd8}}
> So next == first.

Oops. Doesn't sound quite right...

> Is something fiddling with cluster_allocs concurrently, e.g. some signal
> handler? Or what could cause this list corruption? Would it be enough to

Are there any specific signals you're thinking of? Related to block code
I can only think of SIGUSR2 and this one shouldn't call any block driver
functions directly. You're using aio=threads, I assume? (It's the default)

QLIST_FOREACH_SAFE shouldn't make a difference in this place as the loop
doesn't insert or remove any elements. If the list is corrupted now, I
think it would be corrupted with QLIST_FOREACH_SAFE as well - at best,
the endless loop would occur one call later.

The only way I see to get such a loop in a list is to re-insert an
element that is already part of the list. The only insert is at
qcow2-cluster.c:777. The question remains how we got there twice
without run_dependent_requests() removing the L2Meta from our list
first - because this is definitely wrong...
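
To illustrate the double-insert theory, here is a minimal standalone
sketch. It is not the QEMU code: struct meta, insert_head() and
walk_terminates() are hypothetical names, and the insertion is written
out by hand in the style of the BSD LIST_INSERT_HEAD macro, which I am
assuming QLIST_INSERT_HEAD behaves like. Inserting the same element at
the head twice leaves its next pointer pointing at itself, so any walk
over the list never reaches NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for QCowL2Meta; only the list linkage matters. */
struct meta {
    struct meta *le_next;   /* next element, or NULL at the tail */
    struct meta **le_prev;  /* address of the pointer that points at us */
};

struct meta_list {
    struct meta *lh_first;  /* first element, or NULL if the list is empty */
};

/* Head insertion in the style of the BSD LIST_INSERT_HEAD macro
 * (assumed here to match what QLIST_INSERT_HEAD does). */
static void insert_head(struct meta_list *head, struct meta *elm)
{
    elm->le_next = head->lh_first;
    if (elm->le_next != NULL) {
        head->lh_first->le_prev = &elm->le_next;
    }
    head->lh_first = elm;
    elm->le_prev = &head->lh_first;
}

/* Walk the list for at most `limit` steps.
 * Returns true if the NULL tail was reached within that bound. */
static bool walk_terminates(const struct meta_list *head, size_t limit)
{
    const struct meta *m = head->lh_first;

    for (size_t i = 0; i < limit; i++) {
        if (m == NULL) {
            return true;
        }
        m = m->le_next;
    }
    return m == NULL;
}

/* Insert one element twice without removing it in between.
 * Returns true if the element then links to itself and a bounded
 * walk no longer finds the end of the list. */
static bool double_insert_makes_self_loop(void)
{
    struct meta_list list = { NULL };
    struct meta m = { NULL, NULL };

    insert_head(&list, &m);
    assert(walk_terminates(&list, 8));   /* one element: walk ends */

    insert_head(&list, &m);              /* second insert, no removal */

    return m.le_next == &m
        && m.le_prev == &list.lh_first
        && !walk_terminates(&list, 8);
}
```

After the second insert_head() the element has le_next pointing at
itself and le_prev pointing at the list head, which is at least the
same shape as the "next == first" state in the gdb dump above. A
plain or a _SAFE style walk would both spin on such a list, since
neither ever sees a NULL next pointer.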

Presumably, it's not reproducible?


