Google reveals Gemini 2.5 Flash, its 'most cost-efficient thinking model'

Just weeks after unveiling Gemini 2.5 Pro, Google is on to its next top-performing model.

On Thursday, the company released an “early version” of Gemini 2.5 Flash in preview through the Gemini API, in both Google AI Studio and Vertex AI. The model has a knowledge cutoff of January 2025. It accepts text, image, video, and audio prompts and has a one-million-token context window.

Google says the new version expands on Flash 2.0 with improved reasoning, but “without compromising its renowned speed or cost.” Reasoning models spend more time “thinking,” or interpreting a query, before responding. Ideally, that produces more thorough and direct output that aligns better with a user’s needs than the output of earlier models that prioritize speed. Reasoning models are also better equipped to handle multi-step problems and tasks accurately.

“Gemini 2.5 Flash performs strongly on Hard Prompts in ChatBot Arena, second only to 2.5 Pro,” Google notes in the announcement.

Referring to the new model as its most cost-efficient, Google notes that 2.5 Flash “allows developers to configure the amount of thinking it does to maximize performance.” This gives developers a “thinking budget,” or the power to pay for reasoning only when they need it most. With reasoning on, the output price jumps from $0.60 to $3.50 per one million tokens.
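To put the two output prices in perspective, here is a minimal sketch that estimates output cost with reasoning off and on. It uses only the two per-million-token figures quoted above; the function name, the dictionary keys, and the two-million-token workload are hypothetical and chosen purely for illustration. Input-token pricing is not covered here.

```python
# Output prices per one million tokens, as quoted in the article.
# The dictionary keys are illustrative labels, not API identifiers.
PRICE_PER_MILLION_USD = {"thinking_off": 0.60, "thinking_on": 3.50}


def output_cost_usd(output_tokens: int, thinking: bool) -> float:
    """Estimate the output-token cost in US dollars for a given volume."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    key = "thinking_on" if thinking else "thinking_off"
    return output_tokens * PRICE_PER_MILLION_USD[key] / 1_000_000


if __name__ == "__main__":
    tokens = 2_000_000  # hypothetical output volume
    off = output_cost_usd(tokens, thinking=False)
    on = output_cost_usd(tokens, thinking=True)
    print(f"thinking off: ${off:.2f}, thinking on: ${on:.2f}")
```

At these rates, the same output volume costs a little under six times as much with reasoning on, which is why a per-request budget matters.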

If developers don’t give the model a budget, it determines how much thinking a query needs by evaluating the complexity of the request. For example, it treats a prompt with minimal reasoning needs, like “How many states are there in the US?”, differently from a multi-step math problem. Google notes that to replicate Flash 2.0 latency and cost, developers should set the budget to 0.
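As a rough sketch of the two modes described above, the helper below builds a JSON request body with or without an explicit thinking budget. The helper name `build_request` is hypothetical, and the field names (`contents`, `generationConfig`, `thinkingConfig`, `thinkingBudget`) are an assumption about the Gemini API's REST format that should be checked against Google's current documentation. Nothing is sent over the network.

```python
import json
from typing import Any, Dict, Optional


def build_request(prompt: str, thinking_budget: Optional[int] = None) -> Dict[str, Any]:
    """Build a request body; field names are assumed, not confirmed by the article."""
    body: Dict[str, Any] = {"contents": [{"parts": [{"text": prompt}]}]}
    if thinking_budget is not None:
        if thinking_budget < 0:
            raise ValueError("thinking_budget must be non-negative")
        # An explicit budget caps the reasoning tokens; 0 turns reasoning off.
        body["generationConfig"] = {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        }
    # With no budget set, the model decides how much thinking the query needs.
    return body


if __name__ == "__main__":
    # Budget of 0: aims for Flash 2.0-like latency and cost.
    print(json.dumps(build_request("How many states are there in the US?", 0), indent=2))
    # No budget: the thinking configuration is omitted from the request.
    print(json.dumps(build_request("Solve this multi-step math problem."), indent=2))
```

Omitting the configuration entirely, rather than sending a placeholder value, keeps the "let the model decide" case distinct from an explicit budget of 0.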

Gemini 2.5 Flash scored 12% on Humanity’s Last Exam (HLE), a newer benchmark designed as an alternative to industry tests that have become too easy for rapidly evolving models. That score beat competitor models, including Claude 3.7 Sonnet and DeepSeek R1, but not OpenAI’s just-launched o4-mini, which scored 14% on the test.

You can try Gemini 2.5 Flash in preview through the Gemini API in Google AI Studio and Vertex AI.
