Nvidia claims near 50% boost in AI storage speed
Storage is an often-overlooked element of AI, overshadowed by the emphasis on processors, namely GPUs. Large language models (LLMs) measure in the terabytes, and all of that data needs to be moved around to be processed. The faster data can be moved, the better, so that GPUs aren't sitting idle waiting to be fed.
Nvidia says it has tested these Spectrum-X features on its Israel-1 AI supercomputer. The test measured the read and write bandwidth generated by Nvidia HGX H100 GPU server clients accessing storage, first with the network configured as a standard RoCE v2 fabric, and then with Spectrum-X's adaptive routing and congestion control turned on.
Tests were run using a range of GPU server clients, from 40 to 800 GPUs. In every case, the enhanced Spectrum-X networking outperformed the standard configuration, with read bandwidth improving by 20% to 48% and write bandwidth by 9% to 41% over standard RoCE networking, according to Nvidia.
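To make the reported gains concrete, the sketch below shows how such relative improvements are computed from baseline and enhanced measurements. The absolute GB/s figures are made-up placeholders (Nvidia has not published them here); only the percentage formula is meaningful.

```python
# Illustrative calculation of relative bandwidth gains like those Nvidia reports.
# The GB/s values below are hypothetical; the formula is the point.

def improvement_pct(baseline: float, enhanced: float) -> float:
    """Relative gain of `enhanced` over `baseline`, as a percentage."""
    return (enhanced - baseline) / baseline * 100.0

# Hypothetical read bandwidth (GB/s): standard RoCE v2 vs Spectrum-X enabled.
runs = {
    40:  (100.0, 120.0),    # small cluster: a 20% gain
    800: (1000.0, 1480.0),  # large cluster: a 48% gain
}

for gpus, (roce, spectrum_x) in runs.items():
    print(f"{gpus} GPUs: +{improvement_pct(roce, spectrum_x):.0f}% read bandwidth")
```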
Another method for improving efficiency is checkpointing, in which the state of a processing job is saved periodically; if a training run fails for any reason, it can be restarted from the most recent checkpoint rather than from the beginning.
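The checkpointing pattern can be sketched in a few lines. This is a minimal illustration of the general technique, not Nvidia's implementation; the file name, save interval, and the stand-in training step are all assumptions.

```python
# Minimal sketch of periodic checkpointing in a training loop (illustrative only).
import json
import os

CHECKPOINT = "checkpoint.json"  # hypothetical checkpoint file
SAVE_EVERY = 100                # save state every 100 steps (arbitrary interval)

def save_checkpoint(state: dict) -> None:
    # Write to a temp file and atomically rename, so a crash mid-write
    # never leaves a corrupt checkpoint behind.
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CHECKPOINT)

def load_checkpoint() -> dict:
    # Resume from the last saved state if one exists, else start fresh.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"step": 0, "loss": None}

def train(total_steps: int) -> dict:
    state = load_checkpoint()  # picks up where a failed run left off
    for step in range(state["step"], total_steps):
        state["loss"] = 1.0 / (step + 1)  # stand-in for a real training step
        state["step"] = step + 1
        if state["step"] % SAVE_EVERY == 0:
            save_checkpoint(state)
    return state
```

If the process dies at step 250, the next invocation of `train` resumes from step 200, the last multiple of `SAVE_EVERY` that was saved, rather than from step 0.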
Nvidia’s storage partners such as DDN, Dell, HPE, Lenovo, VAST Data, and WEKA will likely support these Spectrum-X features in the future.