Google reveals Gemini 2.5 Flash, its 'most cost-efficient thinking model'



Just weeks after unveiling Gemini 2.5 Pro, Google is on to its next top-performing model. 

On Thursday, April 17, the company released an “early version” of Gemini 2.5 Flash in preview through the Gemini API, available in Google AI Studio and Vertex AI. The model has a knowledge cutoff of January 2025. It accepts text, image, video, and audio prompts, and has a one-million-token context window.


Google says the new version expands on Flash 2.0 with improved reasoning, but “without compromising its renowned speed or cost.” Reasoning models spend more time “thinking” (working through a query) before responding. Compared to earlier models that prioritize speed, the result is more thorough and direct output that, ideally, aligns better with a user’s needs. Models that reason are also better equipped to handle multi-step problems or tasks accurately.

“Gemini 2.5 Flash performs strongly on Hard Prompts in ChatBot Arena, second only to 2.5 Pro,” Google notes in the announcement. 

Referring to the new model as its most cost-efficient, Google notes that 2.5 Flash “allows developers to configure the amount of thinking it does to maximize performance.” This gives developers a “thinking budget,” or the power to pay for reasoning only when they need it most. With reasoning on, the output price jumps from 60 cents to $3.50 per one million tokens.
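
To put that price gap in concrete terms, here is a rough sketch of the arithmetic in Python. It uses only the two output prices quoted above; the `output_cost` helper and the 200,000-token workload are illustrative assumptions, not anything Google provides, and the estimate ignores input-token charges, which are not covered here.

```python
from decimal import Decimal

# Output prices per one million tokens, as quoted in the article (US dollars).
PRICE_PER_MILLION = {
    False: Decimal("0.60"),  # reasoning off
    True: Decimal("3.50"),   # reasoning on
}


def output_cost(output_tokens: int, thinking: bool) -> Decimal:
    """Estimate the output-token charge in US dollars (hypothetical helper)."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return PRICE_PER_MILLION[thinking] * output_tokens / Decimal(1_000_000)


if __name__ == "__main__":
    tokens = 200_000  # illustrative workload, not a figure from Google
    print("reasoning off:", output_cost(tokens, thinking=False))
    print("reasoning on: ", output_cost(tokens, thinking=True))
```

Under these assumptions, 200,000 output tokens would cost roughly 12 cents with reasoning off and 70 cents with it on, which is why the option to switch reasoning off for simple requests matters.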


If developers don’t set a budget, the model gauges each request’s complexity and decides for itself how much thinking is needed. For example, it treats a prompt with minimal reasoning needs, such as “How many states are there in the US?”, differently from a multi-step math problem. Google notes that to replicate Flash 2.0 latency and cost, developers should set the budget to 0.
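
As a sketch of what setting the budget might look like in practice, the Python below builds a JSON request body only and sends nothing. The `build_request` helper is hypothetical, and the field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) are assumptions based on the Gemini API's public REST shape; check them against Google's current documentation before relying on them.

```python
import json
from typing import Any, Optional


def build_request(prompt: str, thinking_budget: Optional[int] = None) -> dict[str, Any]:
    """Build a request body (hypothetical helper; field names are assumptions).

    thinking_budget=None leaves the budget unset, so the model decides for itself.
    thinking_budget=0 asks for no thinking, per Google's note on matching Flash 2.0.
    """
    if thinking_budget is not None and thinking_budget < 0:
        raise ValueError("thinking_budget must be non-negative")

    body: dict[str, Any] = {"contents": [{"parts": [{"text": prompt}]}]}
    if thinking_budget is not None:
        # Assumed field names; verify against the current API reference.
        body["generationConfig"] = {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        }
    return body


if __name__ == "__main__":
    question = "How many states are there in the US?"
    # Simple question: switch thinking off to keep latency and cost down.
    print(json.dumps(build_request(question, thinking_budget=0), indent=2))
    # No budget given: the thinking settings are omitted entirely.
    print(json.dumps(build_request(question), indent=2))
```

Keeping the budget optional mirrors the behavior described above: omit it and the model chooses, or pass 0 to turn reasoning off for requests that do not need it.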


Gemini 2.5 Flash scored 12% on Humanity’s Last Exam (HLE), a newer benchmark designed as an alternative to industry tests that have become too easy for rapidly evolving models. That score beat competitor models, including Claude 3.7 Sonnet and DeepSeek R1, but not OpenAI’s just-launched o4-mini, which scored 14% on the test.

You can try Gemini 2.5 Flash in preview through the Gemini API in Google AI Studio and Vertex AI. 
