Nvidia claims near 50% boost in AI storage speed
Storage is an often-overlooked element of AI, overshadowed by the emphasis on processors, namely GPUs. Large language models (LLMs) measure in the terabytes, and all of that data needs to be moved around to be processed. The faster data can be moved, the better, so that GPUs aren't sitting idle waiting to be fed.
Nvidia says it has tested these Spectrum-X features on its Israel-1 AI supercomputer. The test measured the read and write bandwidth generated by Nvidia HGX H100 GPU server clients accessing storage, first with the network configured as a standard RoCE v2 fabric, and then with Spectrum-X's adaptive routing and congestion control turned on.
Tests were run using a range of GPU server clients, from 40 to 800 GPUs. In every case, the enhanced Spectrum-X networking outperformed the standard configuration, with read bandwidth improving by 20% to 48% and write bandwidth by 9% to 41% over standard RoCE networking, according to Nvidia.
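To make the reported gains concrete, the sketch below shows how such relative improvements are computed from baseline and enhanced measurements. The absolute GB/s figures are made-up placeholders (Nvidia has not published them here); only the percentage formula is meaningful.

```python
# Illustrative calculation of relative bandwidth gains like those Nvidia reports.
# The GB/s values below are hypothetical; the formula is the point.

def improvement_pct(baseline: float, enhanced: float) -> float:
    """Relative gain of `enhanced` over `baseline`, as a percentage."""
    return (enhanced - baseline) / baseline * 100.0

# Hypothetical read bandwidth (GB/s): standard RoCE v2 vs Spectrum-X enabled.
runs = {
    40:  (100.0, 120.0),    # small cluster: a 20% gain
    800: (1000.0, 1480.0),  # large cluster: a 48% gain
}

for gpus, (roce, spectrum_x) in runs.items():
    print(f"{gpus} GPUs: +{improvement_pct(roce, spectrum_x):.0f}% read bandwidth")
```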
Another method for improving efficiency is checkpointing, in which the state of a processing job is saved periodically; if a training run fails for any reason, it can be restarted from the most recent checkpoint rather than from the beginning.
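The checkpointing pattern can be sketched in a few lines. This is a minimal illustration of the general technique, not Nvidia's implementation; the file name, save interval, and the stand-in training step are all assumptions.

```python
# Minimal sketch of periodic checkpointing in a training loop (illustrative only).
import json
import os

CHECKPOINT = "checkpoint.json"  # hypothetical checkpoint file
SAVE_EVERY = 100                # save state every 100 steps (arbitrary interval)

def save_checkpoint(state: dict) -> None:
    # Write to a temp file and atomically rename, so a crash mid-write
    # never leaves a corrupt checkpoint behind.
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CHECKPOINT)

def load_checkpoint() -> dict:
    # Resume from the last saved state if one exists, else start fresh.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"step": 0, "loss": None}

def train(total_steps: int) -> dict:
    state = load_checkpoint()  # picks up where a failed run left off
    for step in range(state["step"], total_steps):
        state["loss"] = 1.0 / (step + 1)  # stand-in for a real training step
        state["step"] = step + 1
        if state["step"] % SAVE_EVERY == 0:
            save_checkpoint(state)
    return state
```

If the process dies at step 250, the next invocation of `train` resumes from step 200, the last multiple of `SAVE_EVERY` that was saved, rather than from step 0.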
Nvidia’s storage partners such as DDN, Dell, HPE, Lenovo, VAST Data, and WEKA will likely support these Spectrum-X features in the future.