Academic Salon on High-Performance Ethernet: Host Networking and Monitoring
Organised by the Technical University of Munich
Dates
Wednesday Afternoon, 12 March and Thursday Morning, 13 March 2025
Location
TUM-IAS Institute of Advanced Studies, Lichtenbergstraße 2 a, 85748 Garching near Munich, Germany; also online (hybrid event). Information for remote participants: https://hedgedoc.net.in.tum.de/s/urD1cxHI8
Organizers
Georg Carle, Sebastian Gallenmüller, Andreas Herkersdorf, Jörg Ott
Aims and Scope
The rapid evolution of Ethernet technology has led to unprecedented data transmission speeds, with current data rates of up to 800 Gbit/s. These advances enable higher bandwidth and lower latency, both of which are essential for data-intensive and latency-sensitive applications in machine learning and real-time networked systems.
The physical layer of Ethernet has been widely adopted in both Scale-Out and Scale-Up network architectures. Scale-Out networks, which expand horizontally by adding more interconnected nodes, typically use the full standardized Ethernet stack in data centers and achieve lossless communication through technologies such as Priority Flow Control (PFC). However, as networks grow, bandwidth utilization can degrade due to PFC-induced performance issues such as head-of-line blocking and congestion spreading. To address this challenge, both industry (e.g., the Ultra Ethernet Consortium) and academia are exploring more efficient protocol mechanisms to improve bandwidth utilization in larger-scale networks.
Scale-Up networks, which increase the resources of individual nodes, benefit from the high bandwidth offered by standard Ethernet. However, rising network speeds pose significant challenges for the hosts attached to these networks. Although hardware acceleration in network interface cards (NICs) improves performance, server architectures, the interconnects between processors and devices, and software protocol stacks remain bottlenecks that prevent hosts from exploiting the full potential of modern high-speed Ethernet. This is particularly challenging for modern host architectures with GPU and NPU (Neural Processing Unit) co-processors, which serve a wide range of applications, including AI training, AI inference, AI recommendation, HPC, and other performance-demanding cloud applications.
Conventional software protocol stacks introduce significant latency and processing overhead, limiting their ability to handle high-speed traffic efficiently. One way to mitigate this is to improve the protocols themselves. In addition, kernel offload functionality has emerged as a key technology, allowing certain network processing tasks to be delegated directly to NICs. While this offload capability reduces CPU usage and accelerates data processing, it is not yet fully optimized for fine-grained flow management at ultra-high speeds.
The ability to monitor and analyze individual network flows in real time has also become critical for effective network management, security, and optimization. Existing solutions often fail to balance performance against measurement granularity. The salon will therefore consider approaches that provide fine-grained measurements based on technologies such as eBPF (extended Berkeley Packet Filter), programmable NICs, and kernel offload functionality, with the aim of achieving scalability and precision at ultra-high Ethernet speeds.
Sessions
The planned sessions consist of regular presentations, each followed by Q&A, and panel discussions. Sessions addressing the aims outlined above could be organized around the following topics.
Topics
- Topic: High-Speed Ethernet – Opportunities and Challenges
addressing one or more of the following topics:
- Overview of Ethernet evolution and its progression to 800 Gbit/s and beyond.
- Use cases driving demand for ultra-high-speed networks.
- Challenges posed by current server architectures, NICs, GPU and NPU co-processors, and software protocol stacks.
- Discussion of which aspects of traditional solutions fail to scale to such speeds.
- Topic: Hardware Acceleration in Network Interface Cards (NICs)
addressing one or more of the following topics:
- Modern NIC architectures and their role in high-speed networking.
- Hardware acceleration features such as RDMA, flow classification, and packet filtering.
- Programmable NICs and their monitoring capabilities.
- Challenges in integrating NIC acceleration with existing network and server architectures.
- Topic: Network architectures and acceleration for AI computing
addressing one or more of the following topics:
- Scale-out network architectures and transport design for AI training
- Scale-up networking and protocols for AI applications
- Networking and acceleration for AI
- Topic: Flow Monitoring and eBPF in Efficient Flow Management
addressing one or more of the following topics:
- Applying eBPF technology for real-time, fine-grained flow monitoring.
- Traffic filtering, performance monitoring, and anomaly detection use cases.
- Kernel offload and eBPF with programmable, hardware-accelerated NICs.
- Topic: Building Future-Ready Instrumented Protocol Stacks
addressing one or more of the following topics:
- Designing systems that combine software and hardware capabilities for different applications, including AI training, AI inference, HPC, and other performance-demanding cloud applications.
- Panel discussion on open challenges and areas for further innovation (e.g., programmable data planes, processing of flow monitoring data for automated network and system configuration and management).
Presenters
Planned presenters include:
- Marco Canini (KAUST, SA)
- Marco Chiesa (KTH, SE)
- Jesus Escudero-Sahuquillo (University of Castilla-La Mancha, ES)
- Sebastian Gallenmüller (Technical University of Munich, DE)
- Pedro Javier García (University of Castilla-La Mancha, ES)
- Andreas Herkersdorf (Technical University of Munich, DE)
- Michio Honda (University of Edinburgh, GB)
- Sergio Iserte, Antonio Pena (Barcelona Supercomputing Center, ES)
- Marios Kogias (Imperial College London, GB)
- Leonardo Linguaglossa (Telecom ParisTech, FR)
- Sebastiano Miano (CTO of Path Network)
- Jörg Ott (Technical University of Munich, DE)
- Gabriel Paradzik, Michael Menth (University of Tübingen, DE)
Program
Wednesday Afternoon, 12 March
- 13:00 Arrival (with light lunch)
- 14:00 Session starts
14:00 Session 1
- Session Chair: Georg Carle
- Opening and Introduction
- Andreas Herkersdorf (Technical University of Munich, DE): “NIC Architectures and their Role in High-Speed Networking”
- Sergio Iserte, Antonio Pena (Barcelona Supercomputing Center, ES): “Leveraging SmartNICs in HPC via OpenMP offloading with ODOS”
- Marco Chiesa (KTH, SE): “Breaking Limits: Terabit Speeds on a single CPU server”
15:45 Coffee Break
16:15 Session 2
- Session Chair: Sebastian Gallenmüller
- Marco Canini (KAUST, SA): “Metrics, Mayhem, and Microservices: Taming the Cloud Observability Beast”
- Pedro J. García, Jesus Escudero-Sahuquillo (University of Castilla-La Mancha, ES): “High-Performance Interconnection Networks in the Exascale and AI Era: Challenges and Solutions”
- Sebastiano Miano (CTO of Path Network) - online: “State-Compute Replication: Parallelizing High-Speed Stateful Packet Processing”
- Panel Discussion: “Future-Ready High-Performance Networking and Acceleration”
19:00 Dinner
- Garchinger Augustiner, Freisinger Landstraße 4, 85748 Garching bei München
Thursday Morning, 13 March 2025
09:00 Session 3
- Session Chair: Andreas Herkersdorf
- Jörg Ott (Technical University of Munich, DE): “Towards Multicast for Consensus in (Large-Scale) Distributed Systems”
- Michio Honda (University of Edinburgh, GB) - online: “Designing Transport-Level Encryption for Datacenter Networks”
- Marios Kogias (Imperial College London, GB) - online: “Towards Functional Verification of eBPF Programs”
10:30 Coffee Break
11:00 Session 4
- Session Chair: Jörg Ott
- Leonardo Linguaglossa (Telecom ParisTech, FR): “The Data Uncertainty Principle: Measurement and analysis in high-speed network systems”
- Sebastian Gallenmüller (Technical University of Munich, DE): “High-Performance Packet Processing Experiments”
- Gabriel Paradzik, Michael Menth (University of Tübingen, DE): “Scaling Threat Detection to High Data Rates Using IPFIX”
- Plenary discussion
- Conclusion and wrap-up
13:00 Closing (with light lunch)
14:00 Farewell
Registration
- Registration is required both for joining online and for attending in person. Please note that the number of on-site places is limited. The registration link is: https://forms.gle/TBmmAwgP5Eajt8DZ6
Venue - Institute of Advanced Studies, Garching Campus of TUM
Address: Lichtenbergstraße 2 a, 85748 Garching near Munich, Germany
- Faculty Club, fourth floor
Transport
- Subway U6 - Garching Forschungszentrum
- Arrival by car is possible; free parking on
- Detailed directions (including airport):