Academic Salon on High-Performance Ethernet: Host Networking and Monitoring

Organized by the Technical University of Munich

Dates

Wednesday Afternoon, 12 March and Thursday Morning, 13 March 2025

Location

TUM-IAS Institute of Advanced Studies, Lichtenbergstraße 2 a, 85748 Garching near Munich, Germany

Also online (hybrid event). Information for remote participants: https://hedgedoc.net.in.tum.de/s/urD1cxHI8

Organizers

Georg Carle, Sebastian Gallenmüller, Andreas Herkersdorf, Jörg Ott

Aims and Scope

The rapid evolution of Ethernet technology has led to unprecedented data transmission speeds, with current data rates of up to 800 Gbit/s. These advancements enable higher bandwidths and lower latencies, both of which are needed by data-intensive and latency-sensitive applications in the areas of machine learning and real-time networked systems.

The physical layer of Ethernet has been widely adopted across both Scale-Out and Scale-Up network architectures. Scale-Out networks, which expand horizontally by adding more interconnected nodes, typically use the standardized full Ethernet stack in data centers to support lossless communication. This is achieved through technologies such as Priority Flow Control (PFC). However, as network size increases, the efficiency of bandwidth utilization can degrade due to PFC-induced performance issues. To address this challenge, both industry (e.g., the Ultra Ethernet Consortium) and academia are exploring more efficient protocol mechanisms to enhance bandwidth utilization in larger-scale networks.

Scale-Up networks, which increase the resources of individual nodes, benefit from the high bandwidth offered by standard Ethernet. However, the increasing network speed presents significant challenges to the computers connected via these networks. Although hardware acceleration in network interface cards (NICs) improves performance, server architectures, the interconnects between processors and devices, and software protocol stacks remain bottlenecks that prevent hosts from leveraging the full potential of modern high-speed Ethernet. This becomes particularly challenging in modern host architectures with GPU and NPU (Neural Processing Unit) co-processors, which support a wide range of applications, including AI training, AI inference, AI recommendation, HPC, and other performance-demanding cloud applications.

Conventional software protocol stacks introduce significant latency and processing overheads, limiting their ability to handle high-speed traffic efficiently. One way to mitigate this is to improve the protocols themselves. Another is kernel offload functionality, which has emerged as a key technology for delegating certain network processing tasks directly to NICs. While this offload capability reduces CPU usage and accelerates data processing, it is not yet fully optimized for fine-grained flow management at ultra-high speeds.

The ability to monitor and analyze individual network flows in real time has also become critical for effective network management, security, and optimization. Existing solutions often fail to balance the demands of performance and granularity. To address these challenges, the salon will consider approaches capable of providing fine-grained measurements, based on technologies such as eBPF (extended Berkeley Packet Filter), programmable NICs, and kernel offload functionality, with the goal of achieving both scalability and precision at ultra-high Ethernet speeds.

Sessions

The planned sessions consist of regular presentations followed by Q&A, as well as panel discussions. Sessions addressing the aims outlined above could be organized along the following content grouping.

Topics

Presenters

Planned presenters include

Program

Wednesday Afternoon, 12 March

14:00 Session 1

15:45 Coffee Break

16:15 Session 2

19:00 Dinner

Thursday Morning, 13 March

09:00 Session 3

10:30 Coffee Break

11:00 Session 4

13:00 Closing (with light lunch)

14:00 Farewell

Registration

Venue - Institute of Advanced Studies, Garching Campus of TUM

Address: Lichtenbergstraße 2 a, 85748 Garching near Munich, Germany

Transport