Why eBPF is critical and how it's getting better
“In a cloud environment, we just assume that it’s a given that we have tools that allow us to debug performance on a daily basis,” she said. “But without eBPF, we would have to rely on good old tools like tcpdump and strace, and in turn, those would require a lot more system resources, they would be highly inefficient, leading us to investing a lot of dollars in monitoring the fleet at a high scale in a cloud environment.”
Netflix is both a leading contributor to eBPF and a user of it. It has also built multiple networking tools of its own on top of eBPF. One example is Flow Exporter, a network observability sidecar. A sidecar is a helper container that runs alongside an application's main container. Flow Exporter uses eBPF to collect and process data.
“We collect all of this data and also use it for traffic forecasting and run it through a large ML (machine learning) model, which, in turn, allows us to do interesting things like traffic shaping and dynamically addressing traffic,” she said.
The challenge of dealing with noisy neighbors is familiar to many networking professionals. Netflix is using eBPF to detect noisy neighbor problems, which can take a toll on application performance if not detected and remediated.
Netflix has also developed a tool called bpftop, which provides a real-time view of running eBPF programs and shows stats such as average runtime, events per second and CPU utilization.
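To make those three statistics concrete, here is a minimal Python sketch of how they can be derived from two snapshots of a program's counters. When BPF statistics are enabled, the Linux kernel tracks a cumulative run time (`run_time_ns`) and run count (`run_cnt`) for each eBPF program. The `ProgSample` class and `derive_stats` function are hypothetical names used only for illustration, and the arithmetic is an assumption about how such numbers can be computed, not a description of bpftop's actual code.

```python
from dataclasses import dataclass


@dataclass
class ProgSample:
    """One snapshot of an eBPF program's cumulative counters (hypothetical type)."""
    timestamp_ns: int  # when the snapshot was taken
    run_time_ns: int   # cumulative time the program has spent executing
    run_cnt: int       # cumulative number of times the program has run


def derive_stats(prev: ProgSample, curr: ProgSample) -> dict:
    """Turn two cumulative snapshots into per-interval statistics."""
    interval_ns = curr.timestamp_ns - prev.timestamp_ns
    if interval_ns <= 0:
        raise ValueError("snapshots must be taken at increasing times")

    delta_time_ns = curr.run_time_ns - prev.run_time_ns
    delta_runs = curr.run_cnt - prev.run_cnt

    return {
        # Average time per invocation during the interval
        "avg_runtime_ns": delta_time_ns / delta_runs if delta_runs else 0.0,
        # Invocations per second during the interval
        "events_per_sec": delta_runs * 1e9 / interval_ns,
        # Share of one CPU's time spent in the program, as a percentage
        "cpu_percent": 100.0 * delta_time_ns / interval_ns,
    }


if __name__ == "__main__":
    # Made-up numbers: 1,000 runs totaling 5 ms over a one-second window
    before = ProgSample(timestamp_ns=0, run_time_ns=0, run_cnt=0)
    after = ProgSample(timestamp_ns=1_000_000_000, run_time_ns=5_000_000, run_cnt=1_000)
    print(derive_stats(before, after))
```

Working from deltas rather than raw totals is what makes a live view possible: the totals only ever grow, while the difference between two samples reflects what the program did in the most recent interval.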
Security is another area where Netflix is making use of eBPF as part of its Dropio DDoS (distributed denial of service) mitigation tool. “We invested in building this eBPF-based module that is actually highly efficient in enforcing IP-based rules,” she said.