GigaIO introduces single-node AI supercomputer
Installing and configuring high-performance computing (HPC) systems can be a considerable challenge: it takes skilled IT pros to set up the software stack and optimize it for maximum performance, and it isn't like building a PC with parts bought off NewEgg.
GigaIO, which specializes in infrastructure for AI and technical computing, is looking to simplify the task. The vendor recently announced a self-contained, single-node system with 32 configured GPUs in the box to offer simplified deployment of AI and supercomputing resources.
Up to now, harnessing 32 GPUs required four servers with eight GPUs apiece. That meant latency to contend with, since the servers communicate over networking protocols, and all that hardware consumed floor space.
What makes GigaIO’s device – called SuperNODE – notable is that it offers a choice of GPUs: up to 32 AMD Instinct MI210 GPUs or 24 NVIDIA A100s, plus up to 1PB of storage, all attached to a single off-the-shelf server. The MI210 is a step down in performance from the top-of-the-line (at least for now) MI250 card used in the Frontier exascale supercomputer. It has fewer cores and less memory but is based on the same AMD Instinct GPU architecture.
“AMD collaborates with startup innovators like GigaIO in order to bring unique solutions to the evolving workload demands of AI and HPC,” said Andrew Dieckmann, corporate vice president and general manager of the data center and accelerated processing group at AMD, in a statement. “The SuperNODE system created by GigaIO and powered by AMD Instinct accelerators offers compelling TCO for both traditional HPC and generative AI workloads.”
SuperNODE is built on GigaIO’s FabreX custom fabric technology, a memory-centric fabric that cuts the latency of one server’s system memory communicating with other servers in the system to just 200ns. That allows the FabreX Gen4 implementation to scale up to 512Gbits/sec of bandwidth.
FabreX can connect a variety of resources, including accelerators such as GPUs, DPUs, TPUs, FPGAs, and SoCs; storage devices such as NVMe and PCIe-native storage; and other I/O resources attached to compute nodes. Basically, anything that uses a PCI Express bus can be connected to FabreX for direct device-to-device communication across the same fabric.
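Because fabric-attached devices are presented to the host as PCI Express devices, they show up through the operating system's normal device enumeration. The sketch below is a hedged illustration of that idea, not GigaIO code: it simply walks the standard Linux sysfs PCI tree, which is where any composed GPU, NVMe drive, or other endpoint would appear to the host, assuming the devices enumerate as ordinary local PCIe functions.

```python
# Minimal sketch: list the PCIe devices visible to a Linux host.
# This uses only the standard Linux sysfs interface (/sys/bus/pci/devices);
# it is not a GigaIO- or FabreX-specific API.
from pathlib import Path

PCI_DEVICES = Path("/sys/bus/pci/devices")

def list_pci_devices() -> None:
    """Print the address, vendor/device IDs, and class of each PCIe function."""
    for dev in sorted(PCI_DEVICES.iterdir()):
        vendor = (dev / "vendor").read_text().strip()
        device = (dev / "device").read_text().strip()
        dev_class = (dev / "class").read_text().strip()
        print(f"{dev.name}  vendor={vendor}  device={device}  class={dev_class}")

if __name__ == "__main__":
    list_pci_devices()
```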
SuperNODE has three modes of operation: beast mode, for applications that can take advantage of many or all of the GPUs at once; freestyle mode, where each user gets their own GPU for their own processing; and swarm mode, where applications run across multiple servers.
SuperNODE can run existing applications written with popular AI frameworks such as PyTorch and TensorFlow without requiring modification. It uses Nvidia’s Bright Cluster Manager Data Science software to manage and configure the environment and to handle scheduling and container management.
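To see why no modification is needed, consider how a framework discovers accelerators. The snippet below is a generic, hedged sketch (the model and batch size are placeholders, not GigaIO-provided code): an ordinary PyTorch script just asks how many devices are visible and spreads work across them, so fabric-attached GPUs that appear as local devices are used the same way as GPUs on the motherboard.

```python
# Minimal sketch of an unmodified PyTorch workload, assuming the composed GPUs
# are exposed to the framework as ordinary local CUDA/ROCm devices.
import torch
import torch.nn as nn

def main() -> None:
    n_gpus = torch.cuda.device_count()  # fabric-attached GPUs show up here too
    print(f"visible GPUs: {n_gpus}")

    # Placeholder model and batch; any existing training/inference code works the same way.
    model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10))
    x = torch.randn(256, 1024)

    if n_gpus > 0:
        # DataParallel splits each batch across every visible GPU, a single-process
        # way to use the whole node at once.
        model = nn.DataParallel(model.cuda())
        x = x.cuda()

    out = model(x)
    print(out.shape)  # (256, 10)

if __name__ == "__main__":
    main()
```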
SuperNODE is available now from GigaIO.