XDP submodule
The NFB driver supports XDP as an alternative to the NDP API and DPDK. The XDP submodule is under development, and the NFB driver does not compile with it by default.
How to enable
In order to compile with the XDP submodule, the variable CONFIG_NFB_XDP = y has to be set in drivers/Makefile.conf.
# drivers/Makefile.conf
CONFIG_MODULES = y
CONFIG_NFB = m
CONFIG_NFB_XVC = m
# uncomment to enable XDP functionality
CONFIG_NFB_XDP = y
Next, the driver needs to be configured and built:
$ cd drivers; ./configure; make; cd ..
Then, enable the driver by setting the module parameter xdp_enable=yes when inserting the module.
$ sudo rmmod nfb
$ sudo insmod drivers/kernel/drivers/nfb/nfb.ko xdp_enable=yes
To see whether the driver initialized successfully, check the system log. The nfb_xdp: Successfully attached message should be present. In case of errors, you can try rebooting the design by running nfb-boot -F<design slot number>.
$ dmesg
...
[ 4933.960744] nfb 0000:03:00.0: FDT loaded, size: 16384, allocated buffer size: 65536
[ 4933.961000] nfb 0000:03:00.0: unable to enable MSI
[ 4933.961311] nfb 0000:03:00.0: nfb_mi: MI0 on PCI0 map: successfull
[ 4933.964620] nfb 0000:03:00.0: nfb_boot: Attached successfully
[ 4934.024800] nfb 0000:03:00.0: nfb_ndp: attached successfully
[ 4934.024884] nfb 0000:03:00.0: nfb_qdr: Attached successfully (0 QDR controllers)
[ 4934.028473] nfb 0000:03:00.0: nfb_xdp: Successfully attached
[ 4934.028478] nfb 0000:03:00.0: successfully initialized
[ 4934.028685] pcieport 0000:00:02.0: restoring errors on PCI bridge
[ 4934.028688] pcieport 0000:00:02.0: firmware reload done
...
Lastly, add the XDP netdev with the help of the nfb-dma tool.
To add a single XDP netdevice with all queues open:
$ sudo nfb-dma -N nfb_xdp add,id=1
How to test
Basic XDP vs AF_XDP / XDP zero-copy mode overview
Basic XDP
When the XDP driver is inserted and the XDP network interfaces are created via nfb-dma, the queues on those network interfaces are started.
In this mode, the driver acts as a regular network driver. When an XDP program is loaded onto the interface, the driver executes it on each received packet and handles the result accordingly.
In this mode, the driver allocates its own memory for the packet buffers using the Page Pool API.
This mode only extends the basic network driver functionality by running the XDP program.
AF_XDP / XDP zero-copy
AF_XDP mode requires a specialized userspace app. This app allocates the memory block used for XDP buffers, called UMEM, and four XDP descriptor rings to go with it. The app then registers those objects with the kernel through system calls and opens the AF_XDP socket.
For an AF_XDP application to work, the XDP program has to be loaded into the kernel. This is usually done from the C code as a part of the app initialization process. The XDP program will work in tandem with the app in the userspace. The main purpose of this XDP program is to return an XDP_REDIRECT action onto the AF_XDP socket opened by the userspace application. The reference to the socket is passed to the XDP program from the userspace by using an eBPF map of type BPF_MAP_TYPE_XSKMAP.
For a more complete explanation, please refer to the docs: https://docs.kernel.org/networking/af_xdp.html
These steps are usually done indirectly by using a helper library like libxdp.
The driver then has to switch the network channel (AF_XDP works with RX/TX queue pairs) chosen by the userspace application to AF_XDP mode. This is done on the fly, and it basically means stopping the DMA channel, mapping the memory from the userspace, and starting the DMA channel again.
In this mode, the driver works with memory allocated by userspace, and the userspace app has exclusive control over the network channel it works with.
When the AF_XDP socket is closed, the channel is switched back to the normal XDP mode.
While running in this mode, true zero-copy operation is achieved as long as the XDP program passes the packets to userspace through the AF_XDP socket. XDP actions other than XDP_REDIRECT to userspace will be handled by copying.
XDP quickstart
This section covers the bare minimum needed to compile and load an XDP program onto an interface.
XDP return actions
The XDP program returns an XDP action for each received packet. Those actions are:
- XDP_PASS - passes the packet to the network stack
  - copy for both XDP and AF_XDP
- XDP_TX - transmits the packet from the same interface it was received on
  - zero-copy with XDP
  - copy for AF_XDP (the correct way to do this is through a redirect to the userspace app)
- XDP_DROP - drops the packet
  - recycles the buffer for both XDP and AF_XDP
- XDP_REDIRECT - redirects the packet
  - onto a network interface - copy
  - to a different CPU - for load-balancing purposes
  - to userspace - this is the AF_XDP zero-copy operation
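To make the actions concrete, here is a minimal sketch (not taken from the NFB sources; the program and section names are illustrative) of an XDP program choosing between two of these actions: it passes IPv4 frames to the network stack and drops everything else.

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int xdp_pass_ipv4(struct xdp_md *ctx)
{
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;

    // Bounds check required by the eBPF verifier
    if ((void *)(eth + 1) > data_end)
        return XDP_DROP;

    // Pass IPv4 to the network stack, drop everything else
    if (eth->h_proto == bpf_htons(ETH_P_IP))
        return XDP_PASS;

    return XDP_DROP;
}

char _license[] SEC("license") = "GPL";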
XDP attach modes
An XDP program can be attached in different ways, which has performance implications.
- Generic XDP (Generic mode, SKB mode) - The XDP program is run after the driver passes the packet into the network stack.
This makes the XDP execution independent of driver support, but it brings little to no performance benefit. This mode should be used only for testing purposes.
- Native XDP (Native mode) - The XDP program is run as the first thing on the receive path, and the driver handles the XDP actions.
This is what should be used and is the main goal of this XDP kernel module.
- Offloaded XDP - The XDP program is loaded and executed on the network card.
This is not supported by the NDK.
Prerequisites
$ sudo dnf install clang libbpf libbpf-devel libxdp libxdp-devel xdp-tools
clang
The eBPF compiler is implemented as an LLVM backend; clang is used as the frontend
libbpf
Library for writing eBPF (XDP is a subset of eBPF) programs in C; https://github.com/libbpf/libbpf
libxdp
Library that simplifies writing AF_XDP programs and contains helpers for attaching XDP programs to the interface; https://github.com/xdp-project/xdp-tools/blob/main/lib/libxdp/README.org
xdp-tools
Collection of tools for loading and testing XDP programs
The simplest XDP program
A program that drops all packets:
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp_drop")
int xdp_drop_prog(struct xdp_md *ctx)
{
    return XDP_DROP;
}
Compiling the XDP program
$ clang -O2 -g -Wall -target bpf -c xdp_drop.c -o xdp_drop.o
Loading the XDP program
Easiest way is to use
xdp-loader
program, which comes as a part ofxdp-tools
.libxdp
contains helper functions for loading XDP programs from C code and interacting with eBPF maps.
Listing the interfaces and loaded programs:
$ sudo xdp-loader status
Loading the program in native mode:
$ sudo xdp-loader load IFNAME xdp_drop.o
Loading the program in generic mode:
$ sudo xdp-loader load -m skb IFNAME xdp_drop.o
Unloading the program:
$ sudo xdp-loader unload IFNAME --all
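The same can be done from C with libxdp; a minimal sketch (the helper name, interface name, and object file are placeholders), attaching in native mode:

#include <net/if.h>
#include <stdio.h>
#include <xdp/libxdp.h>

// Hypothetical helper: load an XDP object file and attach it in native mode
static int attach_native(const char *ifname, const char *obj_path)
{
    int ifindex = if_nametoindex(ifname);
    struct xdp_program *prog = xdp_program__open_file(obj_path, NULL, NULL);

    if (!ifindex || libxdp_get_error(prog))
        return -1;
    if (xdp_program__attach(prog, ifindex, XDP_MODE_NATIVE, 0)) {
        fprintf(stderr, "Failed to attach the program to %s\n", ifname);
        return -1;
    }
    return 0;
}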
AF_XDP quickstart
This section explains the basics of writing an AF_XDP program using libxdp in the form of short code snippets.
For an example of how to work with the NDK, look at the ndp-tool application located in the swbase repository:
https://github.com/CESNET/ndk-sw/tree/main/tools/ndptool
The interesting files are xdp_read.c for RX, which is the simplest example of a libxdp application, and xdp_common.c for the NDK FPGA sysfs API, which can be used to easily map the firmware queue indexes to netdev channel indexes.
The XDP program
The libxdp library ships a default XDP program for AF_XDP operation; this program is loaded and the reference to the AF_XDP socket is placed into the BPF_MAP_TYPE_XSKMAP automatically, unless the XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD flag is set.
This simplifies the setup. If you would like to create your own XDP program for AF_XDP operation, for example to preprocess packets before redirecting them to userspace, please follow the libxdp README.
You can check the default program here: https://github.com/xdp-project/xdp-tools/blob/main/lib/libxdp/xsk_def_xdp_prog.c
The userspace application
Loading the program, putting the socket reference into the BPF_MAP_TYPE_XSKMAP
This is taken care of by libxdp; if you want to use a custom XDP program, please follow the libxdp README. A sketch of the custom-program path follows.
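A minimal sketch of that path (assuming the fd of your program's BPF_MAP_TYPE_XSKMAP is available as xsks_map_fd, and xsk is the socket created later in this section):

// Inhibit the default program load in the socket configuration...
struct xsk_socket_config cfg = {
    .rx_size = NUM_FRAMES,
    .tx_size = NUM_FRAMES,
    .libbpf_flags = XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD, // skip the default XDP program
    .bind_flags = XDP_ZEROCOPY,
};
// ...and after xsk_socket__create() succeeds, insert the socket
// into the XSKMAP of your own program manually:
if (xsk_socket__update_xskmap(xsk, xsks_map_fd))
    fprintf(stderr, "Failed to insert the socket into the XSKMAP\n");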
Creating UMEM
The userspace app has to allocate a large block of memory called UMEM.
It is standard that one frame (here corresponding to one packet) is of the size PAGE_SIZE.
The function posix_memalign() ensures that each frame is located on its own page.
It is possible to allocate DPDK-style hugepages via mmap and the MAP_HUGETLB flag (a sketch follows the configuration example below), but this is not used in the example and requires reserving the hugepages beforehand.
The UMEM configuration can be skipped by passing NULL as the last argument of the xsk_umem__create() function, but the default ring size is 2048 descriptors, which can lead to performance issues; from testing, it is good to have as many descriptors as there are UMEM frames.
// UMEM configuration
struct umem_info *uinfo = &params->queue_data_arr[i].umem_info;
uinfo->size = NUM_FRAMES * pagesize;
uinfo->umem_cfg.comp_size = NUM_FRAMES;
uinfo->umem_cfg.fill_size = NUM_FRAMES;
uinfo->umem_cfg.flags = 0;
uinfo->umem_cfg.frame_headroom = 0;
uinfo->umem_cfg.frame_size = pagesize;
// UMEM memory area
if (posix_memalign(&uinfo->umem_area, pagesize, uinfo->size)) {
    fprintf(stderr, "Failed to allocate umem buffer for queue %d\n", params->queue_data_arr[i].eth_qid);
    ...
}
// Registering UMEM
if ((ret = xsk_umem__create(&uinfo->umem, uinfo->umem_area, uinfo->size, &uinfo->fill_ring, &uinfo->comp_ring, &uinfo->umem_cfg))) {
    fprintf(stderr, "Failed to create umem for queue %d; ret: %d\n", params->queue_data_arr[i].eth_qid, ret);
    ...
}
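For completeness, a sketch of the hugepage-backed allocation mentioned above (not used in the example; the hugepages must be reserved beforehand, e.g. through /proc/sys/vm/nr_hugepages):

#include <sys/mman.h>

// Back the UMEM with hugepages instead of posix_memalign()
void *umem_area = mmap(NULL, uinfo->size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
if (umem_area == MAP_FAILED) {
    // handle the allocation failure
}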
Address stack
The userspace app needs to keep track of which frames are in its possession. The address in this context means an offset into the UMEM.
In this example, keeping track of free frames is done through a stack of addresses.
The full code is located in xdp_common.h.
struct addr_stack {
    uint64_t addresses[NUM_FRAMES];
    unsigned addr_cnt;
};

// Get an address from the stack
static inline uint64_t alloc_addr(struct addr_stack *stack) {
    if (stack->addr_cnt == 0) {
        fprintf(stderr, "BUG: out of addresses\n");
        exit(1);
    }
    uint64_t addr = stack->addresses[--stack->addr_cnt];
    stack->addresses[stack->addr_cnt] = 0;
    return addr;
}

// Put an address back onto the stack
static inline void free_addr(struct addr_stack *stack, uint64_t address) {
    if (stack->addr_cnt == NUM_FRAMES) {
        fprintf(stderr, "BUG: counting addresses\n");
        exit(1);
    }
    stack->addresses[stack->addr_cnt++] = address;
}

// Fill the stack with addresses into the UMEM
static inline void init_addr(struct addr_stack *stack, unsigned frame_size) {
    for (unsigned i = 0; i < NUM_FRAMES; i++) {
        stack->addresses[i] = i * frame_size;
    }
    stack->addr_cnt = NUM_FRAMES;
}
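Typical usage of the stack (a sketch; pagesize is the UMEM frame size from the configuration above):

struct addr_stack stack;
init_addr(&stack, pagesize);        // fill the stack once at startup
uint64_t addr = alloc_addr(&stack); // pop a free frame to hand to the kernel
free_addr(&stack, addr);            // push it back once the packet is processed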
NDK FPGA sysfs interface
To use the XDP driver, you first need to add the XDP interface via the nfb-dma tool.
This creates an issue when writing applications, because the firmware DMA queues can be mapped to any number of XDP interfaces.
The AF_XDP socket needs to know the name of the interface and the index of the queue in the context of the interface.
For this purpose, the XDP driver makes this information available via the sysfs interface.
When the XDP driver is loaded, the /sys/class/nfb/nfb#/nfb_xdp/ directory is made available.
Inside this directory:
channel_total - tells you how many XDP channels there are; this information is parsed from the device tree at driver startup.
eth_count - tells you how many physical interfaces there are; this information is parsed from the device tree at driver startup.
channel# - directories that correspond to the XDP channels; these contain files describing the state of the channel:
├── status - 1 if the queue was assigned to an interface via nfb-dma, else 0
├── ifname - name of the interface the channel was mapped to via nfb-dma, else 'NOT_OPEN'
└── index - index in the context of the interface the channel was mapped to via nfb-dma, else -1
Example:
We have NDK-MINIMAL firmware with 16 DMA queue pairs. We add one XDP interface which uses firmware DMA queue pairs 5 and 6.
The dmesg command says nfb 0000:21:00.0 enp33s0: renamed from nfb0x1 because predictable network interface naming is enabled:
$ sudo nfb-dma -N nfb_xdp add,id=1 -i5-6
Then the simplified structure looks like this:
$ tree
.
├── channel0
│ ├── ifname NOT_OPEN
│ ├── index -1
│ └── status 0
...
├── channel5
│ ├── ifname enp33s0
│ ├── index 0
│ └── status 1
├── channel6
│ ├── ifname enp33s0
│ ├── index 1
│ └── status 1
├── channel7
│ ├── ifname NOT_OPEN
│ ├── index -1
│ └── status 0
...
├── channel_total 16
├── cmd
└── eth_count 2
By reading the files we can get the correct values for opening the AF_XDP socket.
The DMA queue pair 5 is mapped to enp33s0 as the first channel (index 0).
xsk_socket__create(..., "enp33s0", 0, ..., ..., ..., ...)
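A minimal sketch of reading these values from C (the helper name is hypothetical; error handling is simplified):

#include <limits.h>
#include <stdio.h>
#include <string.h>

// Look up the netdev name and the in-interface queue index for the given
// nfb device and XDP channel using the sysfs layout described above
static int xdp_channel_lookup(int nfb_id, int channel,
                              char *ifname, int len, int *index)
{
    char path[PATH_MAX];
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/class/nfb/nfb%d/nfb_xdp/channel%d/ifname", nfb_id, channel);
    if (!(f = fopen(path, "r")))
        return -1;
    if (!fgets(ifname, len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    ifname[strcspn(ifname, "\n")] = '\0';
    if (!strcmp(ifname, "NOT_OPEN"))
        return -1; // channel not assigned to any interface

    snprintf(path, sizeof(path),
             "/sys/class/nfb/nfb%d/nfb_xdp/channel%d/index", nfb_id, channel);
    if (!(f = fopen(path, "r")))
        return -1;
    if (fscanf(f, "%d", index) != 1)
        *index = -1;
    fclose(f);
    return *index >= 0 ? 0 : -1;
}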
To delete the XDP interface:
$ sudo nfb-dma -N nfb_xdp del,id=1
Creating socket
The ifname and queue_id in the context of the XDP interface can be acquired via sysfs.
The XSK configuration can be skipped by passing NULL as the last argument of the xsk_socket__create() function, but the default ring size is 2048 descriptors, which can lead to performance issues; from testing, it is good to have as many descriptors as there are UMEM frames.
// XSK configuration
struct xsk_info *xinfo = &params->queue_data_arr[i].xsk_info;
xinfo->xsk_cfg.rx_size = NUM_FRAMES;
xinfo->xsk_cfg.tx_size = NUM_FRAMES;
xinfo->xsk_cfg.bind_flags = XDP_ZEROCOPY;
// The name of the interface together with the queue_id (in the context of the interface)
// can be best acquired via the sysfs interface of the NDK platform
strcpy(xinfo->ifname, params->queue_data_arr[i].ifname);
xinfo->queue_id = params->queue_data_arr[i].eth_qid;
if ((ret = xsk_socket__create(&xinfo->xsk, xinfo->ifname, xinfo->queue_id, uinfo->umem, &xinfo->rx_ring, &xinfo->tx_ring, &xinfo->xsk_cfg))) {
    fprintf(stderr, "Failed to create xsocket for queue %d; ret: %d\n", params->queue_data_arr[i].eth_qid, ret);
    ...
}
The main loop
The fill part (from userspace to kernel):
- xsk_ring_prod__reserve() reserves slots on the fill ring
- xsk_ring_prod__fill_addr() is used to place the frames into the ring
- xsk_ring_prod__submit() signals that the filling is done and the kernel now owns the frames
The receive part (from kernel to userspace):
- xsk_ring_cons__peek() returns the number of packets ready to be processed and gives ownership to userspace
- xsk_ring_cons__rx_desc() holds the information about a packet (its length and the address of the data start)
- xsk_ring_cons__release() returns the RX ring slots back to the kernel so they can be filled with new packets
// The receive loop
while (run) {
    ...
    // Fill rx
    unsigned rx_idx = 0;
    unsigned fill_idx = 0;
    unsigned reservable = xsk_prod_nb_free(fill_ring, stack.addr_cnt);
    reservable = reservable < stack.addr_cnt ? reservable : stack.addr_cnt;
    if (reservable > burst_size) {
        unsigned reserved = xsk_ring_prod__reserve(fill_ring, reservable, &fill_idx);
        for (unsigned i = 0; i < reserved; i++) {
            *xsk_ring_prod__fill_addr(fill_ring, fill_idx++) = alloc_addr(&stack);
        }
        xsk_ring_prod__submit(fill_ring, reserved);
    }
    ...
    // Receive packets
    cnt = xsk_ring_cons__peek(rx_ring, burst_size, &rx_idx);
    // Process packets
    for (unsigned i = 0; i < cnt; i++) {
        struct xdp_desc const *desc = xsk_ring_cons__rx_desc(rx_ring, rx_idx++);
        packets[i].data = xsk_umem__get_data(xsk_data->umem_info.umem_area, desc->addr);
        packets[i].data_length = desc->len;
        free_addr(&stack, desc->addr);
    }
    // Mark done
    xsk_ring_cons__release(rx_ring, cnt);
}
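The transmit direction works the same way in reverse with the TX and completion rings; a minimal sketch (addr, len, and burst_size follow the naming used above):

// The transmit part: post one frame, then reclaim completed frames
unsigned tx_idx = 0;
if (xsk_ring_prod__reserve(tx_ring, 1, &tx_idx) == 1) {
    struct xdp_desc *desc = xsk_ring_prod__tx_desc(tx_ring, tx_idx);
    desc->addr = addr; // UMEM offset of the frame to send
    desc->len = len;   // packet length
    xsk_ring_prod__submit(tx_ring, 1);
    // Kick the kernel so transmission starts
    sendto(xsk_socket__fd(xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);
}

// Return completed frames to the address stack
unsigned comp_idx = 0;
unsigned done = xsk_ring_cons__peek(comp_ring, burst_size, &comp_idx);
for (unsigned i = 0; i < done; i++)
    free_addr(&stack, *xsk_ring_cons__comp_addr(comp_ring, comp_idx++));
xsk_ring_cons__release(comp_ring, done);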
Common issues
No traffic:
Make sure the firmware interfaces are enabled: nfb-eth -e1
Useful links
Useful AF_XDP-enabled apps
tcpreplay - since release v4.5.1; https://github.com/appneta/tcpreplay
Libxdp README
For the libxdp API, refer to: https://github.com/xdp-project/xdp-tools/tree/main/lib/libxdp