[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] Segfaults in chardev due to races
From: |
Peter Xu |
Subject: |
Re: [Qemu-devel] Segfaults in chardev due to races |
Date: |
Mon, 11 Feb 2019 14:13:57 +0800 |
User-agent: |
Mutt/1.10.1 (2018-07-13) |
On Wed, Feb 06, 2019 at 07:38:18PM +0100, Marc-André Lureau wrote:
> Hi
>
> On Wed, Jan 23, 2019 at 4:39 PM Max Reitz <address@hidden> wrote:
> >
> > On 22.12.18 10:17, Paolo Bonzini wrote:
> > > On 21/12/18 23:31, Max Reitz wrote:
> > >> I suppose the issue is that QMP events are sent by one thread, and
> > >> client disconnects are handled by a different one. So if a QMP event is
> > >> sent while a client disconnects concurrently, races may occur; and the
> > >> only protection against concurrent access appears to be the
> > >> chr_write_lock, which I don't think is enough.
> > >
> > > I think disconnection (tcp_chr_disconnect) has to take the
> > > chr_write_lock too.
> >
> > That seems to fix the issue for me (can also be reproduced by running
> > iotest 169 in parallel), but how should this be implemented? I suppose
> > tcp_chr_disconnect() can't really take the lock itself, because it's
> > called by tcp_chr_write() which is invoked with the lock held.
> >
> > Max
> >
>
> Is anybody fixing this? Paolo suggestion seems right in general, but
> not easy to implement correctly for all chardevs. (C isn't helping!)
>
> I think this can be treated as a regression from commit
> 8258292e18c39480b64eba9f3551ab772ce29b5d, OOB monitor is now enabled
> by default (on socket chardev).
So far I see that only TCP implemented chr_disconnect, does it mean
it's still doable?
I'm uncertain on whether the response queue idea could help too since
that idea is/was trying to keep all the chardev IO/control paths in
the same thread. When you wanted (and managed) to remove the response
queue I was uncertain on things like this - because if with the
response queues we're still making the chardev work very like single
thread (we only offload some works that must be done in main thread,
and let the two threads talk only via req/response queues, protected
by mutexes). Now we're doing it real multi-threading in many places.
So IMHO we either fix all multi-thread safety, or may we fall back? I
don't know much on chardev enough to say what is required to prove
100% multithread safety just like when I was working on oob, that's
why I chose the response queue and tried to keep safe. I'm unsure
whether that's still an alternative.
Thanks,
--
Peter Xu