Google targets AI inferencing opportunity with Ironwood chip

The gain in performance at the same power draw also makes the chip more cost-effective, since it delivers more capacity per watt, Amin Vahdat, Google's VP of machine learning, systems, and Cloud AI, wrote in a blog post.

An Ironwood pod can also scale up to 9,216 chips, compared with its predecessors: TPU v5p and TPU v4 pods top out at 8,960 and 4,096 chips, respectively.

Ironwood also offers 6x the high-bandwidth memory (HBM) of Trillium, at 192GB per chip, compared with 95GB for TPU v5p and 32GB for TPU v4.

The higher HBM capacity is crucial for processing larger models and datasets, as it reduces the need for frequent data transfers and improves performance, Vahdat wrote.
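To make the capacity figures concrete, here is a back-of-the-envelope sketch in plain Python, using the per-chip capacities quoted above (Trillium's 32GB is inferred from the 6x claim; the 70B-parameter model and bf16 precision are illustrative assumptions, not from the article):

```python
# Per-chip HBM capacities from the article; Trillium's figure is inferred
# from the stated 6x ratio (192GB / 6 = 32GB).
HBM_GB = {"Ironwood": 192, "TPU v5p": 95, "Trillium (inferred)": 32, "TPU v4": 32}

def weights_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate weight footprint in GB (bf16 = 2 bytes per parameter)."""
    return params_billions * bytes_per_param

need = weights_gb(70)  # hypothetical 70B-parameter model served in bf16
for chip, cap in HBM_GB.items():
    verdict = "fits on one chip" if need <= cap else f"needs ~{need / cap:.1f} chips"
    print(f"{chip:>20}: {cap:3d} GB HBM -> {need:.0f} GB of weights, {verdict}")
```

Under those assumptions, a 70B-parameter model's roughly 140GB of weights fits in a single Ironwood chip's HBM, while earlier generations would have to shard the same model across several chips.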

Google has also upgraded Ironwood's HBM bandwidth and Inter-Chip Interconnect (ICI) bandwidth to 4.5x and 1.5x that of Trillium, respectively.

While the improved HBM bandwidth will help run more memory-intensive AI workloads, the faster chip-to-chip links will let work be distributed efficiently across chips when training LLMs or serving inference, Vahdat said.
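The workload distribution Vahdat describes is the kind of thing frameworks such as JAX express directly on TPUs. The following is a minimal, illustrative sketch (not Google's internal code; the matrix sizes are arbitrary, and it also runs on a single CPU device) of sharding a weight matrix across chips so that the compiler turns the resulting cross-chip traffic into collectives carried over the ICI:

```python
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D device mesh over every chip the runtime can see.
devices = mesh_utils.create_device_mesh((jax.device_count(),))
mesh = Mesh(devices, axis_names=("model",))

# Shard the weight matrix row-wise: each chip's HBM holds only its slice.
w = jax.device_put(jnp.ones((4096, 4096)),
                   NamedSharding(mesh, P("model", None)))

@jax.jit
def forward(x, w):
    # The contraction dimension is sharded, so XLA inserts an all-reduce
    # to combine the per-chip partial products; on a TPU pod that
    # collective travels over the inter-chip interconnect.
    return x @ w

x = jnp.ones((8, 4096))
print(forward(x, w).shape)  # (8, 4096), computed across all chips
```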

Ironwood also ships with SparseCore and Pathways. SparseCore is a specialized accelerator for processing "ultra-large" embeddings, while Pathways is Google's proprietary machine learning runtime that enables efficient distributed computing across multiple TPU chips, including across pods.
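For a sense of what SparseCore accelerates, the pattern is a sharded embedding lookup. The sketch below is a generic JAX illustration of that pattern only, not SparseCore's programming interface (the table size and feature IDs are toy values):

```python
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = mesh_utils.create_device_mesh((jax.device_count(),))
mesh = Mesh(devices, axis_names=("table",))

# A (toy-sized) embedding table with rows sharded across chips, the layout
# an "ultra-large" recommender-model table would use.
vocab, dim = 100_000, 128
table = jax.device_put(jnp.zeros((vocab, dim)),
                       NamedSharding(mesh, P("table", None)))

@jax.jit
def lookup(table, ids):
    # Gather the embedding rows for a batch of sparse feature IDs; gathers
    # that cross chip boundaries become inter-chip traffic.
    return jnp.take(table, ids, axis=0)

ids = jnp.array([3, 17, 42, 99_999])
print(lookup(table, ids).shape)  # (4, 128)
```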


