[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] test-filter-mirror hangs

From: Markus Armbruster
Subject: Re: [Qemu-devel] test-filter-mirror hangs
Date: Fri, 25 Jan 2019 09:44:16 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Jason Wang <address@hidden> writes:

> On 2019/1/25 下午3:12, Markus Armbruster wrote:
>> Jason Wang <address@hidden> writes:
>>> On 2019/1/24 下午5:47, Markus Armbruster wrote:
>>>> Please cc: me on QMP issues.
>>> Ok.
>>>> Jason Wang <address@hidden> writes:
>>>>> On 2019/1/24 上午3:53, Dr. David Alan Gilbert wrote:
>>>>>> * Jason Wang (address@hidden) wrote:
>>>>>>> On 2019/1/22 上午2:56, Peter Maydell wrote:
>>>>>>>> On Thu, 17 Jan 2019 at 09:46, Jason Wang<address@hidden>  wrote:
>>>>>>>>> On 2019/1/15 上午12:33, Zhang Chen wrote:
>>>>>>>>>> On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert
>>>>>>>>>> <address@hidden  <mailto:address@hidden>> wrote:
>>>>>>>>>>         * Peter Maydell (address@hidden
>>>>>>>>>>         <mailto:address@hidden>) wrote:
>>>>>>>>>>         > Recently I've noticed that test-filter-mirror has been 
>>>>>>>>>> hanging
>>>>>>>>>>         > intermittently, typically when run on some other TCG 
>>>>>>>>>> architecture.
>>>>>>>>>>         > In the instance I've just looked at, this was with s390x 
>>>>>>>>>> guest on
>>>>>>>>>>         > x86-64 host, though I've also seen it on other host archs 
>>>>>>>>>> and
>>>>>>>>>>         > perhaps with other guests.
>>>>>>>>>>         Watch out to see if you really do see it for other guests;
>>>>>>>>>>         it carefully avoids using virtio-net to avoid vhost; but on 
>>>>>>>>>> s390x it
>>>>>>>>>>         uses virtio-net-ccw - could that hit the vhost it was trying 
>>>>>>>>>> to avoid?
>>>>>>>>>>         > Below is a backtrace, though it seems to be pretty 
>>>>>>>>>> unhelpful.
>>>>>>>>>>         > Anybody got any theories ? Does the mirror test rely on 
>>>>>>>>>> dirty
>>>>>>>>>>         > memory bitmaps like the migration test (which also hangs
>>>>>>>>>>         > occasionally with TCG due to some bug I'm sure we've 
>>>>>>>>>> investigated
>>>>>>>>>>         > in the past) ?
>>>>>>>>>>         I don't think it relies on the CPU at all.
>>>>>>>>>>      I have no idea about this currently, but Jason and I designed 
>>>>>>>>>> the
>>>>>>>>>> test case.
>>>>>>>>>> Add Jason: Have any comments about this ?
>>>>>>>>> I can't reproduce this locally with s390x-softmmu. It looks to me the
>>>>>>>>> test should be independent to any kinds of emulation. It should pass
>>>>>>>>> when mainloop work.
>>>>>>>> I've just seen a hang with ppc64 guest on s390x host, so it is
>>>>>>>> indeed not specific to s390x guest (and so not specific to
>>>>>>>> virtio-net either, since the ppc64 guest setup uses e1000).
>>>>>>>> thanks
>>>>>>>> -- PMM
>>>>>>> Finally reproduced locally after hundreds (sometimes thousands) times of
>>>>>>> running.
>>>>>>> Bisection points to OOB monitor[1].
>>>>>>> It looks to me after OOB is used unconditionally we lose a barrier to 
>>>>>>> make
>>>>>>> sure socket is connected before sending packets in 
>>>>>>> test-filter-mirror.c. Is
>>>>>>> there any other similar and simple thing that we could do to kick the
>>>>>>> mainloop?
>>>>>> Do you mean the:
>>>>>>        /* send a qmp command to guarantee that 'connected' is setting to 
>>>>>> true. */
>>>>>>        qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
>>>>> Yes.
>>>>>> why was that ever sufficient to know the socket was ready?
>>>>> It was suggested by Fam, I don't remember the details. Can we make
>>>>> sure all pending events has been processed (UNIX socket was set to
>>>>> connected) after query-status is returned with an non OOB monitor?
>>>> I'm afraid I lack context.  Which socket are you talking about?  The
>>>> test has at least the QMP socket, the send_sock[], and recv_sock.  What
>>>> exactly are you trying to accomplish?
>>> I mean recv_sock. If mirror tries to send a packet to it before its
>>> is_connected is set to true, packet will be dropped.
>> So the *socket* is connected (in the TCP sense),
> UNIX domain socket actually in the case of this test.


>> but something else
>> (whatever owns is_connected) is not.  Can you point me to where
>> is_connected is set to true?
> Sorry, should be "connected". It was set in tcp_chr_connect(). So if
> filter want to send a packet to socket chardev before
> tcp_chr_connect() is called, the packet will be dropped silently by
> tcp_chr_write(). This will fail this unit-test.

Aha: the thing that isn't connected is the character device.

>>>> By the way, mkstemp(sock_path) followed by unix_connect(sock_path, NULL)
>>>> looks rather fishy.  Why create a temporary file only to create a Unix
>>>> domain socket right over it?
>>> I vaguely remember passing fd created by unix domain socket doesn't
>>> work when the test is introduced. So my understanding is the author
>>> needs a way to create a unique file name which will be used b Unix
>>> domain socket at that time.
>> We should really, really, really improve the test harness to run each
>> test program in its very own temporary directory.  Then tests can simply
>> create files with fixed names, and leave cleanup to the test harness.
> Agree, but for this test, since passing fd works now. I tend to using
> socketpair().

Resources that don't require manual cleanup (such as file descriptors
obtained with socketpair() or pipe()) are the best choice when they


reply via email to

[Prev in Thread] Current Thread [Next in Thread]