Gemini
Our most intelligent AI models
Gemini 2.5 models reason through their thoughts before responding, which improves performance and accuracy.
Model family
Gemini 2.5 builds on the best of Gemini — with native multimodality and a long context window.
Hands-on with Gemini 2.5
See how Gemini 2.5 uses its reasoning capabilities to create interactive simulations and handle advanced coding tasks.
Adaptive and budgeted thinking
Adaptive controls and adjustable thinking budgets allow you to balance performance and cost.
- Calibrated: The model explores diverse thinking strategies, leading to more accurate and relevant outputs.
- Controllable: Developers have fine-grained control over the model's thinking process, allowing them to manage resource usage.
- Adaptive: When no thinking budget is set, the model assesses the complexity of a task and calibrates the amount of thinking accordingly.
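The difference between a fixed thinking budget and adaptive thinking can be sketched as a request body. This is a minimal sketch, not an official client: the field names `generationConfig`, `thinkingConfig`, and `thinkingBudget` are assumptions based on the public Gemini API's REST format and do not appear on this page, and `build_request` is a hypothetical helper. Check the current API reference before relying on them.

```python
import json
from typing import Any, Dict, Optional


def build_request(prompt: str, thinking_budget: Optional[int] = None) -> Dict[str, Any]:
    """Build a JSON-serializable request body (hypothetical helper).

    If thinking_budget is None, no budget is sent, so the model is left
    to decide how much to think (the "adaptive" behavior).
    Otherwise the budget, in tokens, is attached to the request.
    """
    body: Dict[str, Any] = {"contents": [{"parts": [{"text": prompt}]}]}
    if thinking_budget is not None:
        # Assumed field names; verify against the API reference.
        body["generationConfig"] = {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        }
    return body


if __name__ == "__main__":
    # Adaptive: no budget set.
    print(json.dumps(build_request("Summarize this paragraph."), indent=2))
    # Budgeted: cap thinking at 1024 tokens to trade accuracy for cost.
    print(json.dumps(build_request("Prove the lemma.", thinking_budget=1024), indent=2))
```

The helper only builds the payload and sends nothing, so it can be inspected or logged before being passed to whichever HTTP client or SDK is in use.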
Benchmarks
In addition to its strong performance on academic benchmarks, Gemini 2.5 tops the popular coding leaderboard WebDev Arena.
Category | Benchmark or unit | Gemini 2.5 Flash-Lite Preview 06-17 (non-thinking) | Gemini 2.5 Flash-Lite Preview 06-17 (thinking) | Gemini 2.5 Flash (non-thinking) | Gemini 2.5 Flash (thinking) | Gemini 2.5 Pro (thinking) |
---|---|---|---|---|---|---|
Input price | $/1M tokens (no caching) | $0.10 | $0.10 | $0.30 | $0.30 | $1.25 ($2.50 above 200k tokens) |
Output price | $/1M tokens | $0.40 | $0.40 | $2.50 | $2.50 | $10.00 ($15.00 above 200k tokens) |
Reasoning & knowledge | Humanity's Last Exam (no tools) | 5.1% | 6.9% | 8.4% | 11.0% | 21.6% |
Science | GPQA diamond | 64.6% | 66.7% | 78.3% | 82.8% | 86.4% |
Mathematics | AIME 2025 | 49.8% | 63.1% | 61.6% | 72.0% | 88.0% |
Code generation | LiveCodeBench (UI: 1/1/2025-5/1/2025) | 33.7% | 34.3% | 41.1% | 55.4% | 69.0% |
Code editing | Aider Polyglot | 26.7% | 27.1% | 44.0% | 56.7% | 82.2% |
Agentic coding | SWE-bench Verified, single attempt | 31.6% | 27.6% | 50.0% | 48.9% | 59.6% |
Agentic coding | SWE-bench Verified, multiple attempts | 42.6% | 44.9% | 60.0% | 60.3% | 67.2% |
Factuality | SimpleQA | 10.7% | 13.0% | 25.8% | 26.9% | 54.0% |
Factuality | FACTS grounding | 84.1% | 86.8% | 83.4% | 85.3% | 87.8% |
Visual reasoning | MMMU | 72.9% | 72.9% | 76.9% | 79.7% | 82.0% |
Image understanding | Vibe-Eval (Reka) | 51.3% | 57.5% | 66.2% | 65.4% | 67.2% |
Long context | MRCR v2 (8-needle), 128k (average) | 16.6% | 30.6% | 34.1% | 54.3% | 58.0% |
Long context | MRCR v2 (8-needle), 1M (pointwise) | 4.1% | 5.4% | 16.8% | 21.0% | 16.4% |
Multilingual performance | Global MMLU (Lite) | 81.1% | 84.5% | 85.8% | 88.4% | 89.2% |
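To make the price rows concrete, here is a rough cost estimator built from the listed per-1M-token prices. It is a sketch under stated assumptions, not an official billing formula: the short model keys, the `Pricing` class, and `estimate_cost` are hypothetical names, and the code assumes that the higher Gemini 2.5 Pro rates apply to the whole request once the prompt exceeds 200k tokens. The table does not say whether thinking tokens are billed as output, so the caller decides what to count in `output_tokens`.

```python
from dataclasses import dataclass
from typing import Optional

# Prompt size above which the higher Pro rates are assumed to apply.
LONG_PROMPT_THRESHOLD = 200_000


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per 1M tokens, copied from the table."""
    input_per_m: float
    output_per_m: float
    long_input_per_m: Optional[float] = None   # rate above the threshold, if any
    long_output_per_m: Optional[float] = None


# Hypothetical short keys for the three price tiers in the table.
PRICES = {
    "flash-lite": Pricing(0.10, 0.40),
    "flash": Pricing(0.30, 2.50),
    "pro": Pricing(1.25, 10.00, 2.50, 15.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request (no caching)."""
    p = PRICES[model]
    # Assumption: a long prompt switches both input and output to the higher rate.
    long_prompt = (
        input_tokens > LONG_PROMPT_THRESHOLD and p.long_input_per_m is not None
    )
    in_rate = p.long_input_per_m if long_prompt else p.input_per_m
    out_rate = p.long_output_per_m if long_prompt else p.output_per_m
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    for name in PRICES:
        print(name, round(estimate_cost(name, 50_000, 2_000), 6))
```

Under these assumptions, a 100k-token prompt with 10k output tokens on Pro comes to about $0.225, while a 300k-token prompt with the same output comes to about $0.90.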