Re: [edk2] [PATCH 2/2] efinet: fix lost packets due to active MNP instances

From: Michael Chang
Subject: Re: [edk2] [PATCH 2/2] efinet: fix lost packets due to active MNP instances
Date: Tue, 21 Apr 2015 14:49:35 +0800
User-agent: Mutt/1.5.21 (2010-09-15)
On Tue, Apr 21, 2015 at 02:12:54PM +0800, Michael Chang wrote:
> On Mon, Apr 20, 2015 at 02:30:00PM +0800, Michael Chang wrote:
> > On Sun, Apr 19, 2015 at 11:01:11AM +0300, Andrei Borzenkov wrote:
> > > The EDK2 network stack is based on the Managed Network Protocol, which is
> > > layered on top of the Simple Network Protocol and does background polling.
> > > This polling races with grub for received (and probably transmitted) packets,
> > > which causes either serious slowdown or complete failure to load files.
> > >
> > > Additionally, the PXE driver creates two child devices - IPv4 and IPv6 - with
> > > a bound SNP instance as well. This means we get three cards for every
> > > physical adapter when enumerating. Not only is this confusing, it
> > > may result in grub ignoring packets that come in via the "wrong" card.
> > >
> > > Example of the device hierarchy:
> > >
> > > Ctrl[91] PciRoot(0x0)/Pci(0x3,0x0)
> > > Ctrl[95] PciRoot(0x0)/Pci(0x3,0x0)/MAC(525400123456,0x1)
> > > Ctrl[B4] PciRoot(0x0)/Pci(0x3,0x0)/MAC(525400123456,0x1)/IPv4(0.0.0.0)
> > > Ctrl[BC] PciRoot(0x0)/Pci(0x3,0x0)/MAC(525400123456,0x1)/IPv6(0000:0000:0000:0000:0000:0000:0000:0000)
> > >
> > > Skip PXE created virtual devices and open SNP on base device exclusively.
> > > This destroys all child MNP instances and stops background polling. We
> > > cannot do it for virtual devices anyway as PXE would probably attempt
> > > to destroy them when stopped and current OVMF crashes when we try to do
> > > it.
> > >
> > > Exclusive open cannot be done when enumerating cards, as it would destroy
> > > PXE information we need to autoconfigure interface; and it cannot be done
> > > during autoconfiguration as we need to do it for non-PXE boot as well. So
> > > move SNP open to card ->open method and add matching ->close to clean up.
> > >
> > > References: https://savannah.gnu.org/bugs/?41731
> > > ---
> > > grub-core/net/drivers/efi/efinet.c | 147 ++++++++++++++++++++++++++++---------
> > > 1 file changed, 112 insertions(+), 35 deletions(-)
> > >
> > > diff --git a/grub-core/net/drivers/efi/efinet.c b/grub-core/net/drivers/efi/efinet.c
> > > index f171f20..5777016 100644
> > > --- a/grub-core/net/drivers/efi/efinet.c
> > > +++ b/grub-core/net/drivers/efi/efinet.c
> > > @@ -142,9 +142,72 @@ get_card_packet (struct grub_net_card *dev)
> > > return nb;
> > > }
> > >
> > > +static grub_err_t
> > > +open_card (struct grub_net_card *dev)
> > > +{
> > > + grub_efi_simple_network_t *net;
> >
> > I'm not sure whether adding a null check for dev->efi_net here is
> > necessary, as we don't close it in close_card, so reopening a closed
> > interface would make open_protocol fail with EFI_ACCESS_DENIED and in
> > turn make the entire open_card function fail with GRUB_ERR_BAD_DEVICE.
> >
> > > +
> > > + net = grub_efi_open_protocol (dev->efi_handle, &net_io_guid,
> > > + GRUB_EFI_OPEN_PROTOCOL_BY_EXCLUSIVE);
> > > + if (! net)
> > > + /* This should not happen... Why? */
> > > + return GRUB_ERR_BAD_DEVICE;
> > > +
> > > + if (net->mode->state == GRUB_EFI_NETWORK_STOPPED
> > > + && efi_call_1 (net->start, net) != GRUB_EFI_SUCCESS)
> > > + return GRUB_ERR_BAD_DEVICE;
> > > +
> > > + if (net->mode->state == GRUB_EFI_NETWORK_STOPPED)
> > > + return GRUB_ERR_BAD_DEVICE;
> > > +
> > > + if (net->mode->state == GRUB_EFI_NETWORK_STARTED
> > > + && efi_call_3 (net->initialize, net, 0, 0) != GRUB_EFI_SUCCESS)
> > > + return GRUB_ERR_BAD_DEVICE;
> > > +
> > > + dev->mtu = net->mode->max_packet_size;
> > > +
> > > + dev->txbufsize = ALIGN_UP (dev->mtu, 64) + 256;
> > > + dev->txbuf = grub_zalloc (dev->txbufsize);
> > > + if (!dev->txbuf)
> > > + {
> > > + grub_print_error ();
> > > + return GRUB_ERR_BAD_DEVICE;
> > > + }
> > > + dev->txbusy = 0;
> > > +
> > > + dev->rcvbufsize = ALIGN_UP (dev->mtu, 64) + 256;
> > > +
> > > + grub_memcpy (dev->default_address.mac,
> > > + net->mode->current_address,
> > > + sizeof (dev->default_address.mac));
> > > +
> > > + dev->efi_net = net;
> > > + return GRUB_ERR_NONE;
> > > +}
> > > +
> > > +static void
> > > +close_card (struct grub_net_card *dev)
> > > +{
> > > + if (dev->txbuf)
> > > + {
> > > + grub_free (dev->txbuf);
> > > + dev->txbuf = NULL;
> > > + }
> > > +
> > > + if (dev->rcvbuf)
> > > + {
> > > + grub_free (dev->rcvbuf);
> > > + dev->rcvbuf = NULL;
> > > + }
> > > +
> > > + /* FIXME do we need to close Simple Network Protocol here? */
> >
> > The question from me is why not? :)
> >
> > If we don't close it, the consequence might be preventing other
> > applications (e.g. one chainloaded from grub) from using the Managed
> > Network Protocol or related protocols?
> >
> > The rest of the patch looks good to me and is a lot better than my
> > workaround patch. Thanks for your work on this.
> >
> > I can give this patch a test to see if it fixes the slowness issue I
> > have also experienced in OVMF for IPv4 networking, and also try it
> > together with my net_bootp6 patch.
>
> The patch works well with IPv4 as expected; I verified that it fixes
> the slowness problem.
>
> But with IPv6, there were some problems found.
>
> 1. It failed with "packet too big" in grub_net_send_ip6_packet. The
> reason is that the card is not opened at that time, so the mtu is
> unset (zero). That makes the packet size test against the mtu fail
> as "too big".
>
> The call stack looks like:
>
>   grub_net_link_layer_resolve
>   grub_net_icmp6_send_request
>   grub_net_send_ip_packet
>   grub_net_send_ip6_packet
>
> Meanwhile the IPv4 call stack is:
>
>   grub_net_link_layer_resolve
>   grub_net_arp_send_request
>   send_ethernet_packet
>
> It calls send_ethernet_packet, which does not have the mtu size check
> and which also opens the card device if it is not opened.
>
> 2. I tried to get it going by adding a card open to
> grub_net_send_ip6_packet (), but hit another problem: the link layer
> resolve times out because the network interface's hardware mac address
> is unset (inf->hwaddress.mac). After some debugging I realized that in
> my net_bootp6 patch, the network interface is added during handling of
> cached dhcpv6 reply packets using the hardware mac provided by the
> card instance, which is again not yet opened. :(
>
> The reason why IPv4 works is that it doesn't use the card instance's
> mac address but the mac_addr from the bootp ack packet.
>
> It's possible to find the mac_addr in the DHCPv6 Reply packets, but
> it's not guaranteed you will always have one. I think it depends on
> which DUID type is used in the CLIENT_ID option. The spec defines:
>
>   DUID-LLT 1, DUID-EN 2, DUID-LL 3
>
> With DUID-LLT and DUID-LL you can read the mac_addr (and type), but
> with DUID-EN you are out of luck. The final blow that made me
> surrender is that in the cached reply packet in UEFI, the DUID seems
> to be of the undefined type "4"..
>
> 3. Even if I add the card open earlier, before handling the dhcpv6
> packets, it will freeze at grub_efi_open_protocol if the option in use
> is GRUB_EFI_OPEN_PROTOCOL_BY_EXCLUSIVE. I was actually using
> GRUB_EFI_OPEN_PROTOCOL_GET_PROTOCOL during the entire test and am
> running out of ideas how to deal with that difference.
Hi Andrei,
I checked the other grub_net_card implementations (emucard, grub_pxe_card,
ofnet). The mtu and default_address are determined during device
enumeration, not open. I think it would be better to follow them, as
those properties are needed quite early, and that may also fix problems
#1 and #2.
For #3, I'm now trying to get the log from OVMF.
Thanks,
Michael