Nvidia launches Blackwell GPU architecture
Nvidia kicked off its GTC 2024 conference with the formal launch of Blackwell, its next-generation GPU architecture due at the end of the year.
Blackwell uses a chiplet design, to a point. Whereas AMD’s designs have several chiplets, Blackwell has two very large dies joined into a single GPU by a high-speed interlink running at 10 terabytes per second, according to Ian Buck, vice president of HPC at Nvidia.
Nvidia will deliver three new Blackwell data center and AI GPUs: the B100, B200, and GB200. The B100 has a single processor, the B200 has two GPUs interconnected, and the GB200 features two GPUs and a Grace CPU.
Buck says the GB200 will deliver seven times the inference performance of the Hopper GH200. He also claimed four times the AI training performance of Hopper, 30 times better inference performance overall, and 25 times better energy efficiency. “This will expand AI data center scale to beyond 100,000 GPUs,” he said on a press call ahead of the announcement.
Blackwell has 192GB of HBM3E memory with more than 8TB/sec of bandwidth, plus 1.8TB/sec of bandwidth over its secondary link. Blackwell also supports the company’s second-generation transformer engine, which tracks the accuracy and dynamic range of every layer of every tensor, and of the entire neural network, as computation proceeds.
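To make "tracking dynamic range" concrete, here is a minimal sketch of per-tensor scaling, loosely modeled on the amax-history ("delayed scaling") approach used for FP8 training. It is not Nvidia's implementation: the class name `DynamicRangeTracker`, the 16-step history window, and the choice of FP8 E4M3 (largest finite value 448) as the target format are all illustrative assumptions. The idea is that the engine remembers each tensor's recent absolute maximum and picks a scale factor so the values fill the narrow format's range without overflowing.

```python
from collections import deque

# Largest finite magnitude of the FP8 E4M3 format; assumed target range here.
FP8_E4M3_MAX = 448.0


class DynamicRangeTracker:
    """Hypothetical per-tensor tracker: keeps recent absolute maxima (amax)
    and derives a scale factor that maps the tensor into the FP8 range."""

    def __init__(self, history_len: int = 16, fmt_max: float = FP8_E4M3_MAX):
        self.history = deque(maxlen=history_len)  # rolling window of amax values
        self.fmt_max = fmt_max

    def observe(self, values: list[float]) -> None:
        # Record this step's absolute maximum for the tensor.
        self.history.append(max(abs(v) for v in values))

    def scale(self) -> float:
        # Use the largest amax in the window so recent outliers still fit.
        amax = max(self.history, default=0.0)
        return self.fmt_max / amax if amax > 0 else 1.0

    def quantize(self, values: list[float]) -> tuple[list[float], float]:
        s = self.scale()
        # Scale into range and clamp. Real hardware would also round each
        # value to the nearest representable FP8 number at this point.
        scaled = [max(-self.fmt_max, min(self.fmt_max, v * s)) for v in values]
        return scaled, s


tracker = DynamicRangeTracker()
tracker.observe([0.5, -2.0, 1.0])
scaled, s = tracker.quantize([0.5, -2.0, 1.0])
print(s, scaled)
```

With an observed amax of 2.0, the scale is 448 / 2 = 224, so the largest value lands exactly on the edge of the FP8 range and the smaller values keep as much relative precision as the format allows.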
Blackwell delivers 20 petaflops of FP4 AI performance on a single GPU. FP4, a floating-point format that uses four bits per value, is new to the Blackwell processor; Hopper's smallest floating-point format was FP8. The narrower the format, the faster it can be executed, which is why each doubling of width (FP4 to FP8, FP8 to FP16, and so on up to FP32 and FP64) cuts performance in half. Hopper has 4 petaflops of FP8 AI performance. By that halving rule, Blackwell's 20 FP4 petaflops work out to roughly 10 petaflops at FP8, so Hopper delivers less than half of Blackwell's performance at the same precision.
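The sketch below illustrates both halves of that paragraph. First, it decodes every 4-bit pattern, assuming FP4 uses the E2M1 layout (1 sign bit, 2 exponent bits with bias 1, 1 mantissa bit) defined in the OCP Microscaling (MX) specification; the article does not name the layout, so treat that as an assumption. Second, it applies the article's halving rule of thumb to the 20-petaflop FP4 figure. The function names are hypothetical, and the halving rule is an idealization: actual per-format throughput comes from Nvidia's datasheets.

```python
def decode_e2m1(code: int) -> float:
    """Decode a 4-bit FP4 value, assuming the E2M1 layout (bias 1)."""
    sign = -1.0 if code & 0b1000 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 0b1
    if exp == 0:
        # Subnormal: 0.m * 2^(1 - bias) = man * 0.5
        return sign * man * 0.5
    # Normal: 1.m * 2^(exp - bias)
    return sign * (1 + man / 2) * 2.0 ** (exp - 1)


def throughput_by_precision(fp4_pflops: float) -> dict[str, float]:
    # Rule of thumb from the article: each doubling of bit width halves throughput.
    formats = ["FP4", "FP8", "FP16", "FP32", "FP64"]
    return {fmt: fp4_pflops / 2**i for i, fmt in enumerate(formats)}


# All non-negative values an E2M1 number can take: only eight magnitudes.
print(sorted({decode_e2m1(c) for c in range(8)}))

blackwell = throughput_by_precision(20.0)
hopper_fp8 = 4.0
print(blackwell)
print("Hopper / Blackwell at FP8:", hopper_fp8 / blackwell["FP8"])
```

Under these assumptions, four bits yield only the magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, and 6, which is why the per-tensor scaling performed by the transformer engine matters so much at FP4. The ratio line prints 0.4, matching the "less than half" comparison above.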