qemu-devel

Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net


From: Yuri Benditovich
Subject: Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
Date: Wed, 4 Nov 2020 13:49:05 +0200



On Wed, Nov 4, 2020 at 4:08 AM Jason Wang <jasowang@redhat.com> wrote:

On 2020/11/3 6:32 PM, Yuri Benditovich wrote:
>
>
> On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com
> <mailto:jasowang@redhat.com>> wrote:
>
>
>     On 2020/11/3 2:51 AM, Andrew Melnychenko wrote:
>     > The basic idea is to use eBPF to calculate the hash and steer packets in TAP.
>     > RSS (Receive Side Scaling) is used to distribute network packets
>     > to guest virtqueues by calculating a packet hash.
>     > eBPF RSS allows us to use RSS with vhost TAP.
>     >
>     > This set of patches introduces the usage of eBPF for packet steering
>     > and RSS hash calculation:
>     > * RSS (Receive Side Scaling) is used to distribute network packets to
>     > guest virtqueues by calculating a packet hash
>     > * eBPF RSS is supposed to be faster than the existing 'software'
>     > implementation in QEMU
>     > * Additionally, it adds support for using RSS with vhost
>     >
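
For readers less familiar with RSS, here is a minimal sketch of the hash-and-steer step described above. It assumes the Toeplitz hash that RSS conventionally uses and a simple modulo lookup in an indirection table. The names demo_toeplitz_hash and demo_rss_queue are hypothetical and are not taken from the patches; the eBPF program in the series may organize this differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash: for every set bit of the input (most significant bit
 * first), XOR in the 32-bit window of the key that starts at that bit
 * position. The key must be at least 4 bytes longer than the input.
 */
static uint32_t demo_toeplitz_hash(const uint8_t *key, size_t key_len,
                                   const uint8_t *data, size_t data_len)
{
    assert(key_len >= data_len + 4);

    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];
    uint32_t hash = 0;

    for (size_t i = 0; i < data_len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit)) {
                hash ^= window;
            }
            /* Slide the window left by one bit, pulling in the next key bit. */
            window = (window << 1) | ((uint32_t)(key[i + 4] >> bit) & 1u);
        }
    }
    return hash;
}

/*
 * Steering step (assumption: plain modulo over the indirection table;
 * a real device may use a mask on a power-of-two table instead).
 */
static uint16_t demo_rss_queue(uint32_t hash, const uint16_t *table,
                               size_t table_len)
{
    assert(table_len > 0);
    return table[hash % table_len];
}
```

The input to the hash would typically be the packet's addresses and ports concatenated in network byte order; the returned value is the index of the virtqueue that receives the packet.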
>     > Supported kernels: 5.8+
>     >
>     > Implementation notes:
>     > The Linux TAP TUNSETSTEERINGEBPF ioctl is used to set the eBPF program.
>     > eBPF support is added to QEMU directly through a system call; see
>     > bpf(2) for details.
>     > The eBPF program is part of QEMU and is presented as an array of bpf
>     > instructions.
>     > The program can be recompiled with the provided Makefile.ebpf (you need
>     > to adjust 'linuxhdrs'),
>     > although that is not required to build QEMU with eBPF support.
>     > Changes are added to virtio-net and vhost; eBPF RSS is used primarily.
>     > 'Software' RSS is used in the case of hash population and as a
>     > fallback option.
>     > For vhost, the hash population feature is not reported to the guest.
>     >
>     > Please also see the documentation in PATCH 6/6.
>     >
>     > I am sending these patches as an RFC to initiate discussion and get
>     > feedback on the following points:
>     > * Fallback when eBPF is not supported by the kernel
>
>
>     Yes, and it could also be due to a lack of CAP_BPF.
>
>
>     > * Live migration to a kernel that doesn't have eBPF support
>
>
>     Is there anything that needs special treatment here?
>
> Possible case: rss=on, vhost=on, source system with kernel 5.8
> (everything works) -> dest. system with kernel 5.6 (bpf does not work):
> the adapter functions, but the steering does not use the proper queues.


Right, I think we need to disable vhost on dest.


Is it acceptable to disable vhost at the time of migration?
 
>
>
>
>     > * Integration with current QEMU build
>
>
>     Yes, a question here:
>
>     1) Any reason for not using libbpf? E.g., it is already shipped with some
>     distros.
>
>
> We intentionally do not use libbpf, as it is present only on some distros.
> We can switch to libbpf, but this will disable bpf if libbpf is not
> installed.


That's better I think.

We think the preferred way is to have the eBPF code built into QEMU (not distributed as a separate file).

Our initial idea was to avoid libbpf because doing so:
1. Does not create an additional dependency at build time or at run time
2. Gives us a smaller footprint for the loadable eBPF blob inside QEMU
3. Does not add too much code to QEMU

We can switch to libbpf, but in this case:
1. The presence of the dynamic library is not guaranteed on the target system
2. The static library is large
3. libbpf uses an eBPF ELF, which is significantly bigger than just the array of instructions (maybe we can reduce the ELF to some suitable size and still have it built in)
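
To illustrate the footprint point, here is a minimal sketch of a program embedded as a plain instruction array. The struct is a local mirror of the kernel's 8-byte struct bpf_insn layout, and the two-instruction program (r0 = 0; exit) is only a placeholder, not the RSS program from the patches. The names demo_bpf_insn, demo_prog, demo_prog_insn_count and demo_prog_size_bytes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the 8-byte eBPF instruction layout (assumption: matches
 * struct bpf_insn from <linux/bpf.h>; mirrored here to stay self-contained). */
struct demo_bpf_insn {
    uint8_t code;        /* opcode */
    uint8_t dst_reg : 4; /* destination register */
    uint8_t src_reg : 4; /* source register */
    int16_t off;         /* signed offset */
    int32_t imm;         /* signed immediate */
};

static_assert(sizeof(struct demo_bpf_insn) == 8,
              "an eBPF instruction is 8 bytes");

/* Placeholder program built into the binary as data. */
static const struct demo_bpf_insn demo_prog[] = {
    { .code = 0xb7, .dst_reg = 0, .imm = 0 }, /* BPF_ALU64|BPF_MOV|BPF_K: r0 = 0 */
    { .code = 0x95 },                         /* BPF_JMP|BPF_EXIT: return r0 */
};

/* Number of instructions, as a loader would pass it to bpf(2). */
static size_t demo_prog_insn_count(void)
{
    return sizeof(demo_prog) / sizeof(demo_prog[0]);
}

/* Total footprint of the embedded blob in bytes. */
static size_t demo_prog_size_bytes(void)
{
    return sizeof(demo_prog);
}
```

With this representation the blob costs exactly 8 bytes per instruction, while an ELF object additionally carries section headers, symbols and relocation data that a loader has to parse.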

Please let us know whether you still think libbpf is better and why. 

Thanks



>     2) It would be better if we can avoid shipping bytecodes
>
>
>
> This creates new dependencies: llvm + clang + ...
> We would prefer byte code and ability to generate it if prerequisites
> are installed.


It's probably ok if we treat the bytecode as a kind of firmware.

But in the long run, it's still worthwhile to consider that the qemu source is
used for development, and llvm/clang should be a common requirement for
generating eBPF bytecode for the host.


>
>
>     > * Additional usage for eBPF for packet filtering
>
>
>     Another interesting topic is implementing MAC/VLAN filters. And in the
>     future, I plan to add MAC-based steering. All of these could be done via
>     eBPF.
>
>
> No problem, we can cooperate if needed
>
>
>     >
>     > Known issues:
>     > * hash population not supported by eBPF RSS: 'software' RSS used
>
>
>     Is this because there's no way to write to the vnet header in
>     steering BPF?
>
> Yes. We plan to submit kernel changes so that it cooperates with BPF and
> populates the hash; this work is in progress.


That would require a new type of eBPF program and may need some work on
the verifier.


We may need to allow loading an additional program type in tun.c, not only the socket filter (to use bpf_set_hash).
Also, vhost and tun in the kernel need to be aware of the header extension for hash population.
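
As a rough sketch of what the header extension looks like: the layout below follows, as far as I understand it, the virtio-net hash report extension, where the 12-byte v1 header grows to 20 bytes. The structs are local mirrors with hypothetical demo_ names, and demo_populate_hash is only an illustration of the fields a producer would fill in, not kernel or QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the virtio-net v1 header; multi-byte fields are
 * little-endian on the wire. */
struct demo_vnet_hdr_v1 {
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
    uint16_t num_buffers;
};

/* Extended header carrying the calculated hash and its type. */
struct demo_vnet_hdr_v1_hash {
    struct demo_vnet_hdr_v1 hdr;
    uint32_t hash_value;
    uint16_t hash_report;
    uint16_t padding;
};

static_assert(sizeof(struct demo_vnet_hdr_v1) == 12, "v1 header is 12 bytes");
static_assert(sizeof(struct demo_vnet_hdr_v1_hash) == 20,
              "header with hash report is 20 bytes");

/* Hash report types (subset; values as in the virtio specification). */
enum {
    DEMO_HASH_REPORT_NONE  = 0,
    DEMO_HASH_REPORT_TCPv4 = 2,
};

/* Store the hash fields into a raw header buffer in little-endian order,
 * independent of host endianness. buf must hold at least 20 bytes. */
static void demo_populate_hash(uint8_t *buf, uint32_t hash, uint16_t report)
{
    size_t v = offsetof(struct demo_vnet_hdr_v1_hash, hash_value);
    size_t r = offsetof(struct demo_vnet_hdr_v1_hash, hash_report);

    buf[v + 0] = (uint8_t)(hash & 0xff);
    buf[v + 1] = (uint8_t)((hash >> 8) & 0xff);
    buf[v + 2] = (uint8_t)((hash >> 16) & 0xff);
    buf[v + 3] = (uint8_t)((hash >> 24) & 0xff);
    buf[r + 0] = (uint8_t)(report & 0xff);
    buf[r + 1] = (uint8_t)((report >> 8) & 0xff);
}
```

The point for tun and vhost is the size change: every consumer that assumes a 12-byte header has to learn that 8 more bytes may follow it.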
 
Btw, macvtap still lacks even the steering eBPF program. Would you want
to post a patch to support that?


Probably after we have fully functioning BPF with TAP/TUN.
 

>
>     > as a fallback; also, the hash population feature is not reported to
>     > guests with vhost.
>     > * big-endian BPF support: for now, eBPF is disabled for
>     > big-endian systems.
>
>
>     Are there any blockers for this?
>
>
> No, can be added in v2


Cool.

Thanks


>
>     Just some quick questions after a glance at the code. Will go through
>     it tomorrow.
>
>     Thanks
>
>
>     >
>     > Andrew (6):
>     >    Added SetSteeringEBPF method for NetClientState.
>     >    ebpf: Added basic eBPF API.
>     >    ebpf: Added eBPF RSS program.
>     >    ebpf: Added eBPF RSS loader.
>     >    virtio-net: Added eBPF RSS to virtio-net.
>     >    docs: Added eBPF documentation.
>     >
>     >   MAINTAINERS                    |   6 +
>     >   configure                      |  36 +++
>     >   docs/ebpf.rst                  |  29 ++
>     >   docs/ebpf_rss.rst              | 129 ++++++++
>     >   ebpf/EbpfElf_to_C.py           |  67 ++++
>     >   ebpf/Makefile.ebpf             |  38 +++
>     >   ebpf/ebpf-stub.c               |  28 ++
>     >   ebpf/ebpf.c                    | 107 +++++++
>     >   ebpf/ebpf.h                    |  35 +++
>     >   ebpf/ebpf_rss.c                | 178 +++++++++++
>     >   ebpf/ebpf_rss.h                |  30 ++
>     >   ebpf/meson.build               |   1 +
>     >   ebpf/rss.bpf.c                 | 470 ++++++++++++++++++++++++++++
>     >   ebpf/trace-events              |   4 +
>     >   ebpf/trace.h                   |   2 +
>     >   ebpf/tun_rss_steering.h        | 556 +++++++++++++++++++++++++++++++++
>     >   hw/net/vhost_net.c             |   2 +
>     >   hw/net/virtio-net.c            | 120 ++++++-
>     >   include/hw/virtio/virtio-net.h |   4 +
>     >   include/net/net.h              |   2 +
>     >   meson.build                    |   3 +
>     >   net/tap-bsd.c                  |   5 +
>     >   net/tap-linux.c                |  19 ++
>     >   net/tap-solaris.c              |   5 +
>     >   net/tap-stub.c                 |   5 +
>     >   net/tap.c                      |   9 +
>     >   net/tap_int.h                  |   1 +
>     >   net/vhost-vdpa.c               |   2 +
>     >   28 files changed, 1889 insertions(+), 4 deletions(-)
>     >   create mode 100644 docs/ebpf.rst
>     >   create mode 100644 docs/ebpf_rss.rst
>     >   create mode 100644 ebpf/EbpfElf_to_C.py
>     >   create mode 100755 ebpf/Makefile.ebpf
>     >   create mode 100644 ebpf/ebpf-stub.c
>     >   create mode 100644 ebpf/ebpf.c
>     >   create mode 100644 ebpf/ebpf.h
>     >   create mode 100644 ebpf/ebpf_rss.c
>     >   create mode 100644 ebpf/ebpf_rss.h
>     >   create mode 100644 ebpf/meson.build
>     >   create mode 100644 ebpf/rss.bpf.c
>     >   create mode 100644 ebpf/trace-events
>     >   create mode 100644 ebpf/trace.h
>     >   create mode 100644 ebpf/tun_rss_steering.h
>     >
>

