Cerebras CEO on DeepSeek: Every time computing gets cheaper, the market gets bigger
![Cerebras CEO Andrew Feldman](https://www.zdnet.com/a/img/resize/a4f57d489c24d65954ec030f266ce5fc9281fff0/2024/08/27/03c3fc66-b1c8-4362-ac5e-7502e28bfe42/cerebras-feldman-2024-large.jpg?auto=webp&fit=crop&height=675&width=1200)
“When you are 50 or 70 times faster than the competition, you can do things they can’t do at all,” says Cerebras CEO Andrew Feldman.
Tiernan Ray/ZDNET
AI computer pioneer Cerebras Systems has been “crushed” with demand to run DeepSeek’s R1 large language model, says company co-founder and CEO Andrew Feldman.
“We are thinking about how to meet the demand; it’s big,” Feldman told me in an interview via Zoom last week.
DeepSeek R1 is heralded by some as a watershed moment for artificial intelligence because the cost of pre-training the model can be as little as one-tenth that of dominant models such as OpenAI’s o1, while delivering results as good or better.
The impact of DeepSeek on the economics of AI is significant, Feldman indicated. But the more profound result is that it will spur even larger AI systems.
Also: Perplexity lets you try DeepSeek R1 without the security risk
“As we bring down the cost of compute, the market gets bigger and bigger and bigger,” said Feldman.
Numerous AI cloud services rushed to offer DeepSeek inference after the AI model became a sensation, including Cerebras but also much larger firms such as Amazon’s AWS. (You can try Cerebras’s inference service here.)
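For a sense of what using such a hosted service involves, here is a minimal sketch of a chat request against an OpenAI-compatible inference endpoint. The base URL and model name are assumptions for illustration, not Cerebras’s confirmed identifiers:

```python
# Minimal sketch: querying a hosted DeepSeek R1 distillation through an
# OpenAI-compatible chat endpoint. The base URL and model name here are
# assumptions for illustration and may not match Cerebras's actual service.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",  # hypothetical model identifier
    messages=[{"role": "user", "content": "Explain chain-of-thought reasoning in one paragraph."}],
)
print(response.choices[0].message.content)
```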
Cerebras’s edge is speed. According to Feldman, running inference on the company’s CS-3 computers achieves output 57 times faster than other DeepSeek service providers.
Cerebras also highlights its speed relative to other large language models. In a demo of a reasoning problem done by DeepSeek running on Cerebras versus OpenAI’s o1 mini, the Cerebras machine finishes in a second and a half, while o1 takes a full 22 seconds to complete the task.
“This speed can’t be achieved with any number of GPUs,” said Feldman, referring to the chips sold for AI by Nvidia, Advanced Micro Devices, and Intel.
The challenge for anyone hosting DeepSeek is that R1, like other so-called reasoning models such as OpenAI’s o1, uses much more computing power when producing output at inference time, making it harder to deliver results to the user promptly.
“A basic GPT model does one inference pass through all the parameters for every word” of input at the prompt, Feldman explained.
“These reasoning models, or chain-of-thought models, do that many times” for each word, “and so they use a great deal more compute at inference time.”
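A back-of-the-envelope calculation shows why that multiplies the bill. The token counts below are invented purely for scale; the rule of thumb that a dense transformer spends roughly two FLOPs per parameter per generated token is standard, though actual figures vary by architecture:

```python
# Back-of-the-envelope inference cost comparison (illustrative numbers only).
# Rule of thumb: a dense transformer spends roughly 2 FLOPs per parameter
# for every token it generates.
PARAMS = 70e9                     # 70-billion-parameter model
FLOPS_PER_TOKEN = 2 * PARAMS

answer_tokens = 200               # tokens in the visible answer
reasoning_tokens = 3_000          # hidden chain-of-thought tokens (assumed)

standard = answer_tokens * FLOPS_PER_TOKEN
reasoning = (answer_tokens + reasoning_tokens) * FLOPS_PER_TOKEN
print(f"standard model:  {standard:.1e} FLOPs")
print(f"reasoning model: {reasoning:.1e} FLOPs ({reasoning / standard:.0f}x)")
```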
Cerebras followed one standard procedure for companies wanting to run DeepSeek inference: download the R1 neural parameters, or weights, from Hugging Face, then use them to train a smaller open-source model, in this case Meta’s Llama 70B, creating a “distillation” of R1.
“We were able to do that extremely quickly, and we were able to produce results that are just plain faster than everybody else — not by a little bit, by a lot,” said Feldman.
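For readers who want to experiment with the same distillation, here is a minimal sketch of loading DeepSeek’s published checkpoint with the Hugging Face transformers library. The repo ID matches DeepSeek’s public release, but the hardware requirement, roughly 140GB of accelerator memory at 16-bit precision, puts it beyond most single machines:

```python
# Sketch: loading DeepSeek's R1-distilled Llama 70B from Hugging Face.
# Requires the transformers and accelerate packages, plus roughly 140 GB
# of accelerator memory at 16-bit precision.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # shard across available GPUs
)

inputs = tokenizer("Why is the sky blue?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```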
Also: I tested DeepSeek’s R1 and V3 coding skills – and we’re not all doomed (yet)
Cerebras’s results with the DeepSeek R1 distilled Llama 70B are comparable to published accuracy benchmarks for the model. Cerebras is not disclosing inference pricing for the model, but said it is “competitively priced, especially for delivering top industry performance.”
DeepSeek’s breakthrough has several implications.
One, it’s a big victory for open-source AI, Feldman indicated, by which he means AI models that post their neural parameters for download. Researchers can replicate many of a new model’s advances when they have access to the weights, even without the source code. Private models such as GPT-4 do not disclose their weights.
“Open source is having its minute for sure,” said Feldman. “This was the first top-flight open-source reasoning model.”
Even as DeepSeek’s economics have stunned the AI world, the advance will spur continued investment in cutting-edge chip and networking technology for AI, said Feldman.
Also: Is DeepSeek’s new image model another win for cheaper AI?
“The public markets have been wrong every single time in the past 50 years,” said Feldman, alluding to the massive sell-off in shares of Nvidia and other AI technology providers. “Every time compute has been made less expensive, they [public market investors] have systematically assumed that made the market smaller. And in every single instance, over 50 years, it’s made the market bigger.”
Feldman cited the example of driving down the price of x86 PCs, which led to more PCs being sold and used. Nowadays, he noted, “You have 25 computers in your house. You have one in your pocket, you’ve got one you’re working on, your dishwasher has one, your washing machine has one, your TVs each have one.”
Not only more of the same, but larger and larger AI systems will be built to get results beyond the reach of commodity AI — a point that Feldman has been making since Cerebras’s founding almost a decade ago.
“When you are 50 or 70 times faster than the competition, you can do things they can’t do at all,” he said, alluding to Cerebras’s CS-3 and its chip, the world’s largest semiconductor, the WSE-3. “At some point, differences in degree become differences in kind.”
Also: Apple researchers reveal the secret sauce behind DeepSeek AI
Cerebras started its public inference service last August, demonstrating speeds much faster than most other providers for running generative AI. It claims to be “the world’s fastest AI inference provider.”
Aside from the distilled Llama model, Cerebras is not currently offering the full R1 for inference because doing so is cost-prohibitive for most customers.
“A 671-billion-parameter model is an expensive model to run,” said Feldman, referring to the full R1. “What we saw with Llama 405B was a huge amount of interest at the 70B node and much less at the 405B node because it was way more expensive. That’s where the market is right now.”
Cerebras does have some customers who pay for the full Llama 405B because “they find the added accuracy worth the added cost,” he said.
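The arithmetic behind that pricing pressure is straightforward: the weights alone scale linearly with parameter count, before counting the key-value cache and activations that serving adds on top. A rough sketch, assuming 16-bit weights throughout (actual deployments often use lower precision, which shrinks but does not erase the gap):

```python
# Rough serving footprint, counting weights only at 16-bit precision.
# The KV cache and activations add more on top of these figures.
BYTES_PER_PARAM = 2  # fp16 / bf16

models = [
    ("Llama 70B", 70e9),
    ("Llama 405B", 405e9),
    ("DeepSeek R1 (full)", 671e9),  # mixture-of-experts: all weights must be
                                    # resident even if few are active per token
]
for name, params in models:
    print(f"{name}: ~{params * BYTES_PER_PARAM / 1e9:,.0f} GB of weights")
```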
Cerebras is also betting that privacy and security are features it can use to its advantage. The initial enthusiasm for DeepSeek was followed by numerous reports of concerns with the model’s handling of data.
“If you use their app, your data goes to China,” said Feldman of the Android and iOS native apps from DeepSeek AI. “If you use us, the data is hosted in the US; we don’t store your weights or any of your information. All that stays in the US.”
Asked about numerous security vulnerabilities that researchers have publicized about DeepSeek R1, Feldman was philosophical. Some issues will be worked out as the technology matures, he indicated.
Also: Security firm discovers DeepSeek has ‘direct links’ to Chinese government servers
“This industry is moving so fast. Nobody’s seen anything like it,” said Feldman. “It’s getting better week over week, month over month. But is it perfect? No. Should you use an LLM [large language model] to replace your common sense? You should not.”
Following the R1 announcement, Cerebras last Thursday announced it has added support for running Le Chat, the AI assistant from French AI startup Mistral. Running Le Chat’s “Flash Answers” feature at 1,100 tokens per second, Cerebras claimed, makes it “10 times faster than popular models such as ChatGPT 4o, Sonnet 3.5, and DeepSeek R1” and “the world’s fastest AI assistant.”