Meta plans the world’s fastest supercomputer for AI
Facebook’s parent company Meta said it is building the world’s fastest AI supercomputer to power the machine-learning and natural language processing behind its metaverse project.
The new machine, called the AI Research SuperCluster (RSC), will contain 16,000 Nvidia A100 GPUs and 4,000 AMD Epyc Rome 7742 processors, organized as 2,000 Nvidia DGX A100 nodes with eight GPUs and two Epyc processors per node. Meta expects to complete construction this year.
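The node counts quoted above are internally consistent, which a quick back-of-the-envelope check confirms (a sketch using only the figures reported in the article):

```python
# Sanity-check the RSC hardware math:
# 2,000 DGX A100 nodes, each with 8 GPUs and 2 Epyc CPUs.
nodes = 2_000
gpus_per_node = 8
cpus_per_node = 2

total_gpus = nodes * gpus_per_node  # matches the quoted 16,000 Nvidia A100 GPUs
total_cpus = nodes * cpus_per_node  # matches the quoted 4,000 AMD Epyc 7742 CPUs

print(total_gpus, total_cpus)  # 16000 4000
```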
RSC is already partially built, with 760 of the DGX A100 systems deployed. Meta researchers have started using it to train large models in natural language processing (NLP) and computer vision, with the goal of eventually training models with trillions of parameters, according to Meta.
“Meta has developed what we believe is the world’s fastest supercomputer. We’re calling it RSC for AI Research SuperCluster, and it’ll be complete later this year. The experiences we’re building for the metaverse require enormous compute power (quintillions of operations/second!) and RSC will enable new AI models that can learn from trillions of examples, understand hundreds of languages, and more,” said CEO Mark Zuckerberg in an emailed statement.
RSC is expected to hit a peak performance of 5 exaFLOPS at mixed precision (FP16 and FP32), which would rocket it to the top of the Top500 supercomputer list, whose current leader peaks at 442 petaFLOPS. It is being built in partnership with Penguin Computing, a specialist in HPC systems.
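The 5-exaFLOPS figure lines up with Nvidia's published per-GPU peak: an A100 delivers roughly 312 teraFLOPS of dense FP16 tensor-core throughput, so 16,000 of them sum to about 5 exaFLOPS. This is a rough theoretical-peak estimate; real mixed-precision workloads achieve well below peak:

```python
# Rough peak-throughput estimate for RSC at FP16.
A100_FP16_TFLOPS = 312  # Nvidia's quoted dense FP16 tensor-core peak per A100
num_gpus = 16_000

# Convert teraFLOPS (1e12) to exaFLOPS (1e18).
peak_exaflops = num_gpus * A100_FP16_TFLOPS * 1e12 / 1e18
print(f"{peak_exaflops:.2f} exaFLOPS")  # 4.99 exaFLOPS, i.e. roughly 5
```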
Meta is not disclosing where the system is located.
“RSC will help Meta’s AI researchers build new and better AI models that can learn from trillions of examples; work across hundreds of different languages; seamlessly analyze text, images, and video together; develop new augmented reality tools; and much more,” Kevin Lee, a technical program manager, and Shubho Sengupta, a software engineer, both at Meta, wrote in a blog post.
“We hope RSC will help us build entirely new AI systems that can, for example, power real-time voice translations to large groups of people, each speaking a different language, so they can seamlessly collaborate on a research project or play an AR game together,” they wrote.
In addition to all of the processing power, RSC has 175 petabytes of Pure Storage FlashArray, 46 petabytes of cache storage, and 10 petabytes of Pure’s object storage equipment.
RSC is estimated to be nine times faster than Meta’s previous research cluster, made up of 22,000 of Nvidia’s older generation V100 GPUs, and 20 times faster than its current AI systems. Meta does not plan to retire the old system.
The company is building learning models for automated, content-centered tasks. It wanted this infrastructure in order to train models with more than a trillion parameters on data sets as large as an exabyte, with the goal of getting its arms around all the content generated on its platform.
“By doing this, we can help advance research to perform downstream tasks such as identifying harmful content on our platforms as well as research into embodied AI and multimodal AI to help improve user experiences on our family of apps. We believe this is the first time performance, reliability, security, and privacy have been tackled at such a scale,” Lee and Sengupta wrote.
Copyright © 2022 IDG Communications, Inc.