qemu-devel

Re: [Qemu-devel] net/tap.c: Possibly a way to stall tap input


From: Jan Kiszka
Subject: Re: [Qemu-devel] net/tap.c: Possibly a way to stall tap input
Date: Fri, 02 Aug 2013 18:49:11 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); de; rv:1.8.1.12) Gecko/20080226 SUSE/2.0.0.12-1.1 Thunderbird/2.0.0.12 Mnenhy/0.7.5.666

On 2013-08-02 14:45, Jan Kiszka wrote:
> On 2013-08-02 13:46, Stefan Hajnoczi wrote:
>> On Thu, Aug 01, 2013 at 07:15:54PM +0200, Jan Kiszka wrote:
>>> I was digging into the involved code and found something fishy:
>>>
>>> net/tap.c:
>>> static void tap_send(void *opaque)
>>> {
>>>     ...
>>>         size = qemu_send_packet_async(&s->nc, buf, size,
>>>                                       tap_send_completed);
>>>         if (size == 0) {
>>>             tap_read_poll(s, false);
>>>         }
>>>
>>> So, if tap_send is registered for the mainloop polling (ie. can_receive
>>> returned true before starting to poll) but qemu_send_packet_async
>>> returns 0 now as qemu_can_send_packet/can_receive happens to report
>>> false in the meantime, we will disable read polling. If also write
>>> polling is off, the fd will be completely removed from the iohandler
>>> list. But even if write polling remains on, I wonder what should bring
>>> read polling back?
>>
>> This behavior seems fine to me.  Once the peer (pcnet) is able to
>> receive again it must flush the queue, this will re-enable
>> tap_read_poll().
>>
>> Can you explain a bit more why this would be a problem?
> 
> The problem is that I don't see at all what will call tap_read_poll(s,
> 1), neither in theory nor in reality.
> 
> As long as the real test case is out of reach, I tried to emulate the
> faulty behaviour by letting tap_can_send always return 1. Result:
> reception stalls during boot as even qemu_flush_queued_packets cannot
> get it running again once tap_read_poll(s, 0) was called.

OK, this is the bug: when a NIC becomes ready to send or receive again,
the qemu_flush_queued_packets call it issues only flushes packets queued
to leave the NIC, not those that may have been queued at the output of
the corresponding backend. For hub-based setups, we need to propagate
this flush via the hub to all attached peers. That flush will trigger
the send callback of tap, and that will re-enable receive polling.
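
To make the stall concrete, here is a toy model of the two queues. It is
not QEMU code and not the fix I have in mind: struct endpoint,
send_packet_async, backend_send, flush_own_queue, nic_ready and
run_scenario are all made-up names, and the hub is left out. It only
shows that flushing the NIC's own output queue leaves the packet queued
at the backend's output in place, so nothing turns read polling back on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, none is QEMU API. */
struct endpoint {
    struct endpoint *peer;  /* other end of the link */
    bool can_receive;       /* stands in for the can_receive callback */
    int queued;             /* packets waiting at this endpoint's output */
    int delivered;          /* packets this endpoint has received */
    bool read_poll;         /* only meaningful for the tap-like backend */
};

/* Deliver one packet to the peer; return 0 if it had to be queued. */
static int send_packet_async(struct endpoint *src)
{
    if (!src->peer->can_receive) {
        src->queued++;
        return 0;
    }
    src->peer->delivered++;
    return 1;
}

/* Mirrors the quoted tap_send() logic: a queued packet stops polling. */
static void backend_send(struct endpoint *tap)
{
    if (send_packet_async(tap) == 0) {
        tap->read_poll = false;
    }
}

/* Flush what is queued at ep's own output. Completing a queued packet
 * plays the role of the send callback that turns polling back on. */
static void flush_own_queue(struct endpoint *ep)
{
    if (ep->queued > 0 && ep->peer->can_receive) {
        ep->peer->delivered += ep->queued;
        ep->queued = 0;
        ep->read_poll = true;
    }
}

/* The NIC becomes ready again. With flush_peer == false only the NIC's
 * own output queue is flushed, which is the behaviour described above. */
static void nic_ready(struct endpoint *nic, bool flush_peer)
{
    nic->can_receive = true;
    flush_own_queue(nic);
    if (flush_peer) {
        flush_own_queue(nic->peer);
    }
}

/* Returns whether the backend is polling for input at the end. */
static bool run_scenario(bool flush_peer)
{
    struct endpoint nic = {0}, tap = {0};

    nic.peer = &tap;
    tap.peer = &nic;
    tap.can_receive = true;
    tap.read_poll = true;

    backend_send(&tap);            /* NIC not ready: packet is queued */
    assert(!tap.read_poll && tap.queued == 1);

    nic_ready(&nic, flush_peer);
    return tap.read_poll;
}
```

run_scenario(false) models the current behaviour and leaves read polling
off; run_scenario(true) also flushes the peer's queue and polling comes
back. How that propagation should look in the real code, especially
through a hub, is still open.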

So this is actually a generic bug that should, in theory, affect any
user-space NIC, with or without a hub in the middle. I'll cook up a fix,
play with it on Monday and share the outcome.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


