xAI's Grok 3 is better than expected. How to try it for free (before you subscribe)

Elon Musk was an investor in OpenAI when it was founded in 2015. Since then, he’s completely severed his ties with the startup, alleging the company has departed from its original non-profit mission. He created his own AI company, xAI, and with it, a large language model (LLM) called Grok. Now, the company has launched a new model, Grok 3, which is soaring to the top of the chatbot leaderboards.
Grok 3
On Monday, Elon Musk launched xAI’s latest family of AI models, Grok 3, via a live stream. Grok 3 boasts 10 times more training compute than Grok 2, made possible by xAI’s own data center in Memphis, Tenn., which houses 200,000 GPUs.
“We are excited to present Grok 3, which we think is an order of magnitude more capable than Grok 2,” said Musk during the livestream.
The family of models also includes a reasoning model built on Grok 3. Like other reasoning models on the market, including OpenAI’s o1 and o3 models, the Grok 3 Reasoning beta spends more time thinking before it answers in order to produce higher-quality results.
All Grok 3 models are meant to compete with the leading models. Grok 3 takes on OpenAI’s GPT-4o and Google’s Gemini, while Grok 3 Reasoning takes on o3-mini (high), o1, and DeepSeek-R1. Less than 24 hours after launch, xAI’s models are already dominating benchmarks and leaderboards.
Performance
The model’s pre-training ended in early January, and although it is still being trained, Grok 3 has outperformed leading models on several AI benchmarks: AIME ’24, which tests mathematical reasoning; GPQA, which tests expert-level science knowledge in biology, physics, and chemistry; and LiveCodeBench (LCB, Oct-Feb), which tests coding ability.
The Grok 3 Reasoning and Grok 3 mini Reasoning models are still in development, but according to results xAI shared during the live stream, the beta versions of both models performed competitively against o3-mini (high), o1, DeepSeek-R1, and Gemini 2.0 Flash Thinking on AIME, GPQA, and LCB.
Beyond technical benchmarks, Grok 3 also climbed the Chatbot Arena leaderboard. Chatbot Arena is a crowdsourced platform where users chat with two anonymous LLMs side by side, compare their responses, and vote for the better one without knowing which models they are.
BREAKING: @xAI early version of Grok-3 (codename “chocolate”) is now #1 in Arena! 🏆
Grok-3 is:
– First-ever model to break 1400 score!
– #1 across all categories, a milestone that keeps getting harder to achieve
Huge congratulations to @xAI on this milestone! View thread 🧵… https://t.co/p8z8lccNd5
(lmarena.ai, formerly lmsys.org, @lmarena_ai, February 18, 2025)
Before Grok 3’s official launch, an early version of the model ran in the Arena under the codename “chocolate” and placed first in every category, ahead of Gemini, GPT-4o, DeepSeek-R1, and others. It was also the first model to score above 1400 in the Arena.
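For readers wondering what a number like 1400 means: Chatbot Arena turns those blind head-to-head votes into ratings on an Elo-style scale (LMSYS originally used online Elo updates and later moved to a Bradley-Terry model). The Python snippet below is a minimal sketch of a classic Elo update, not LMArena’s actual code; the K-factor of 32 and the example ratings are illustrative assumptions.

```python
# Minimal Elo-style rating sketch. Not LMArena's implementation:
# K = 32 and the sample ratings are illustrative assumptions.


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the standard Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(
    rating_a: float, rating_b: float, a_result: float, k: float = 32.0
) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one blind vote.

    a_result is 1.0 if voters preferred A, 0.0 if they preferred B,
    and 0.5 for a tie.
    """
    delta = k * (a_result - expected_score(rating_a, rating_b))
    # Points gained by one model are lost by the other.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Hypothetical matchup: a 1400-rated model loses to a 1380-rated one.
    leader, challenger = elo_update(1400.0, 1380.0, a_result=0.0)
    print(f"leader: {leader:.1f}, challenger: {challenger:.1f}")
```

Because the higher-rated model was expected to win, losing costs it about 17 points, slightly more than the 16 points an evenly matched loss would cost. Upsets move ratings the most, which is why staying at the top across many votes is hard.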
DeepSearch
To meet the demand for agentic capabilities, xAI also launched DeepSearch, which is similar to OpenAI’s and Google’s deep research features. With DeepSearch, users can ask a question, and Grok will think it through, search the web, show its thinking process as it goes, and then generate a thorough final response with data and tables as necessary. In practice, you can ask it to research a topic, come back 10 minutes later, and find the task completed.
One of the biggest standouts is the ability to scroll through Grok’s thoughts (“reading through the mind of Grok”) and see how it arrived at its final response. This makes the experience more steerable and helps you better understand your results.
How to access
Starting today, some Grok 3 models are available in beta. Grok 3 is available to X Premium+ subscribers, who also get the latest features, higher usage limits, DeepSearch, and advanced reasoning modes, which are enabled by clicking the “Think” or “Big Brain” options.
As TechCrunch spotted, X Premium+ now costs $40 per month, up from $22 before the announcement. Subscribers should update the X app to see the new features.
xAI also unveiled a new subscription tier, SuperGrok. Akin to ChatGPT Pro, it is aimed at power users who want the earliest access to the most advanced capabilities. Its price has not been announced, but expect it to cost a pretty penny: OpenAI’s ChatGPT Pro subscription costs $200 per month.
For the most polished version, Musk encourages users to wait a week, by which point a new voice integration will likely be ready. If you’d rather try Grok 3 for free in the Chatbot Arena and leave it to luck, visit the website, click Arena side-by-side, and enter a sample prompt. Even though the Arena still hosts an early version of Grok 3, it is a powerful model: it reached the top of the leaderboard against the latest versions of its competitors.