grub-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [edk2] [PATCH 2/2] efinet: fix lost packets due to active MNP instan


From: Michael Chang
Subject: Re: [edk2] [PATCH 2/2] efinet: fix lost packets due to active MNP instances
Date: Tue, 21 Apr 2015 14:49:35 +0800
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, Apr 21, 2015 at 02:12:54PM +0800, Michael Chang wrote:
> On Mon, Apr 20, 2015 at 02:30:00PM +0800, Michael Chang wrote:
> > On Sun, Apr 19, 2015 at 11:01:11AM +0300, Andrei Borzenkov wrote:
> > > EDK2 network stack is based on Managed Network Protocol which is layered
> > > on top of Simple Management Protocol and does background polling. This
> > > polling races with grub for received (and probably trasmitted) packets
> > > which causes either serious slowdown or complete failure to load files.
> > > 
> > > Additionaly PXE driver creates two child devices - IPv4 and IPv6 - with
> > > bound SNP instance as well. This means we get three cards for every
> > > physical adapter when enumerating. Not only is this confusing, this
> > > may result in grub ignoring packets that come in via the "wrong" card.
> > > 
> > > Example of device hierarchy is
> > > 
> > >  Ctrl[91] PciRoot(0x0)/Pci(0x3,0x0)
> > >    Ctrl[95] PciRoot(0x0)/Pci(0x3,0x0)/MAC(525400123456,0x1)
> > >      Ctrl[B4] 
> > > PciRoot(0x0)/Pci(0x3,0x0)/MAC(525400123456,0x1)/IPv4(0.0.0.0)
> > >      Ctrl[BC] 
> > > PciRoot(0x0)/Pci(0x3,0x0)/MAC(525400123456,0x1)/IPv6(0000:0000:0000:0000:0000:0000:0000:0000)
> > > 
> > > Skip PXE created virtual devices and open SNP on base device exclusively.
> > > This destroys all child MNP instances and stops background polling.  We
> > > cannot do it for virtual devices anyway as PXE would probably attempt
> > > to destroy them when stopped and current OVMF crashes when we try to do 
> > > it.
> > > 
> > > Exclusive open cannot be done when enumerating cards, as it would destroy
> > > PXE information we need to autoconfigure interface; and it cannot be done
> > > during autoconfiguration as we need to do it for non-PXE boot as well. So
> > > move SNP open to card ->open method and add matching ->close to clean up.
> > > 
> > > References: https://savannah.gnu.org/bugs/?41731
> > > ---
> > >  grub-core/net/drivers/efi/efinet.c | 147 
> > > ++++++++++++++++++++++++++++---------
> > >  1 file changed, 112 insertions(+), 35 deletions(-)
> > > 
> > > diff --git a/grub-core/net/drivers/efi/efinet.c 
> > > b/grub-core/net/drivers/efi/efinet.c
> > > index f171f20..5777016 100644
> > > --- a/grub-core/net/drivers/efi/efinet.c
> > > +++ b/grub-core/net/drivers/efi/efinet.c
> > > @@ -142,9 +142,72 @@ get_card_packet (struct grub_net_card *dev)
> > >    return nb;
> > >  }
> > >  
> > > +static grub_err_t
> > > +open_card (struct grub_net_card *dev)
> > > +{
> > > +  grub_efi_simple_network_t *net;
> > 
> > I'm not sure about adding null check for dev->efi_net here is necessary
> > or not, as we don't close it in close_card so that reopening a closed
> > interface would make open_protocol fail with EFI_ACCESS_DENIED and in
> > turn makes the entire open_card funtion fail with GRUB_ERR_BAD_DEVICE.
> > 
> > > +
> > > +  net = grub_efi_open_protocol (dev->efi_handle, &net_io_guid,
> > > +                         GRUB_EFI_OPEN_PROTOCOL_BY_EXCLUSIVE);
> > > +  if (! net)
> > > +    /* This should not happen... Why?  */
> > > +    return GRUB_ERR_BAD_DEVICE;
> > > +
> > > +  if (net->mode->state == GRUB_EFI_NETWORK_STOPPED
> > > +      && efi_call_1 (net->start, net) != GRUB_EFI_SUCCESS)
> > > +    return GRUB_ERR_BAD_DEVICE;
> > > +
> > > +  if (net->mode->state == GRUB_EFI_NETWORK_STOPPED)
> > > +    return GRUB_ERR_BAD_DEVICE;
> > > +
> > > +  if (net->mode->state == GRUB_EFI_NETWORK_STARTED
> > > +      && efi_call_3 (net->initialize, net, 0, 0) != GRUB_EFI_SUCCESS)
> > > +    return GRUB_ERR_BAD_DEVICE;
> > > +
> > > +  dev->mtu = net->mode->max_packet_size;
> > > +
> > > +  dev->txbufsize = ALIGN_UP (dev->mtu, 64) + 256;
> > > +  dev->txbuf = grub_zalloc (dev->txbufsize);
> > > +  if (!dev->txbuf)
> > > +    {
> > > +      grub_print_error ();
> > > +      return GRUB_ERR_BAD_DEVICE;
> > > +    }
> > > +  dev->txbusy = 0;
> > > +
> > > +  dev->rcvbufsize = ALIGN_UP (dev->mtu, 64) + 256;
> > > +
> > > +  grub_memcpy (dev->default_address.mac,
> > > +        net->mode->current_address,
> > > +        sizeof (dev->default_address.mac));
> > > +
> > > +  dev->efi_net = net;
> > > +  return GRUB_ERR_NONE;
> > > +}
> > > +
> > > +static void
> > > +close_card (struct grub_net_card *dev)
> > > +{
> > > +  if (dev->txbuf)
> > > +    {
> > > +      grub_free (dev->txbuf);
> > > +      dev->txbuf = NULL;
> > > +    }
> > > +
> > > +  if (dev->rcvbuf)
> > > +    {
> > > +      grub_free (dev->rcvbuf);
> > > +      dev->rcvbuf = NULL;
> > > +    }
> > > +
> > > +  /* FIXME do we need to close Simple Network Protocol here? */
> > 
> > The question from me is why not? :)
> > 
> > If we don't close it, the consequence might prevent other application
> > (for eg, the chainloaded one from grub) from using managed network
> > protocol or related one ?
> > 
> > The rest of the patch looks good to me and a lot better than my
> > workaround patch. Thanks for you work on this.
> > 
> > I can give this patch a test to see if it fixed the slowness issue I
> > have also experienced in OVMF for IPv4 networking and also together with
> > my net_bootp6 patch.
> 
> The patch works well with IPv4 as expected, I verfied it to fix the
> slowness problem.
> 
> But with IPv6, there were some problems found.
> 
> 1. It failed with "packet too big" in grub_net_send_ip6_packet. The
> reason is the card not opened at that time so that the mtu is unset
> (zero). That makes the packet size testing against mtu "too big".
> 
> The call stack looks like.
> 
> grub_net_link_layer_resolve grub_net_icmp6_send_request
> grub_net_send_ip_packet grub_net_send_ip6_packet
>     
> Meanwhile the IPv4 call stack:
> 
> grub_net_link_layer_resolve grub_net_arp_send_request
> send_ethernet_packet
> 
> It calls send_ethernet_packet which do not have the mtu size check and
> also the open the card device if not opened.
> 
> 2. I tried to get it going by adding card open to
> grub_net_send_ip6_packet (), but experienced another problem as the link
> layer resolve timeout due to the network interface's hardware mac
> address unset (inf->hwaddress.mac). After some debugging I realized
> that's in my net_bootp6 patch, the network interface is added during
> handling of cached dhcpv6 reply packets using the hardware mac provided
> by the card instance, which is again not yet opened.:(
> 
> The reason why IPv4 works is because it doesn't use card instance's mac
> address but mac_addr from the bootp's ack packet.
> 
> It's possble to find mac_addr from the DHCPv6 Reply packets, but it's
> unreliable you will always have one. I think it's related to what DUID
> type is using in the CLIENT_ID option. The spec defines
> 
> DUID-LLT  1 DUID-EN   2 DUID-LL   3
> 
> With DUID-LLT, DUID-LL you can read the mac_addr (and type) but with
> DUID-EN you are out of luck. The final round made me surrender is that
> the cached reply packet in UEFI, the DUID seems to be undefined type
> "4"..
> 
> 3. Even I can add the card open earler before hadling the dhcpv6
> packets, it will freeze at grub_efi_open_protocol if the option in use
> is GRUB_EFI_OPEN_PROTOCOL_BY_EXCLUSIVE. I was actually using
> GRUB_EFI_OPEN_PROTOCOL_GET_PROTOCOL in the entire test and am running
> out of idea how to deal with that diversity.


Hi Andrey,

I checked other grub_net_card implementation (emucard, grub_pxe_card,
ofnet). The mtu and default_adree are determined during the device
enumeration but not open. I think it would be better to follow them as
those properties are needed quite early and that may also fix the
problem #1 and #2.

For #3, I'm not trying to get the log from OVMF ..

Thanks,
Michael



reply via email to

[Prev in Thread] Current Thread [Next in Thread]