Everyone but Nvidia joins forces for new AI interconnect

The UALink group plans to develop a specification to define a high-speed, low-latency interconnect for scale-up communications between accelerators and switches in AI computing pods. The 1.0 specification will enable the connection of up to 1,024 accelerators within an AI computing pod and allow for direct loads and stores between the memory attached to accelerators, such as GPUs, in the pod, according to the group.

Norrod pointed out that the UALink members are also backers of the Ultra Ethernet Consortium, which was formed to develop technologies aimed at increasing the scale, stability, and reliability of Ethernet networks to satisfy AI’s high-performance networking requirements. The UEC was founded last year by AMD, Arista, Broadcom, Cisco, Eviden, HPE, Intel, Meta, and Microsoft, and it now includes more than 50 vendors. Later this year, it plans to release official specifications that will focus on a variety of scalable Ethernet improvements, including better multi-path and packet delivery options as well as modern congestion and telemetry features.

“And so by coming together, we believe that this promoters group is filling in an important element of future … scaled out AI systems architectures with this pod-level interconnect. And in concert with Ultra Ethernet, [it] will enable systems of hundreds of thousands or millions of accelerators to efficiently work together,” Norrod said.

J Metz, chair of the Ultra Ethernet Consortium, touted opportunities for collaboration among UALink and UEC backers in a statement announcing the new group’s formation: “In a very short period of time, the technology industry has embraced challenges that AI and HPC have uncovered. Interconnecting accelerators like GPUs requires a holistic perspective when seeking to improve efficiencies and performance. At UEC, we believe that UALink’s scale-up approach to solving pod cluster issues complements our own scale-out protocol, and we are looking forward to collaborating together on creating an open, ecosystem-friendly, industry-wide solution that addresses both kinds of needs in the future.”

The UALink Promoter Group expects the 1.0 specification to be ready in the third quarter of this year and to be made available to companies that join the Ultra Accelerator Link (UALink) Consortium. Products could appear next year, with implementation potentially around 2026.


