Google Unveils Gemini 2.5 Pro: We Tested It With Rubik’s Cube Simulation
If you’ve been following AI lately, you know reasoning is the next big thing. AI models aren’t just completing sentences anymore — they’re solving problems, making decisions, and thinking through complex scenarios. Now, Google’s new Gemini 2.5 Pro has entered the arena, claiming to outthink every other reasoning AI model out there.
So, what’s new in Gemini 2.5 Pro? And how does it actually stack up against other top models like OpenAI’s o3 Mini, DeepSeek R1, and Grok 3 Thinking? I tested them all using real-world prompts.

What is Gemini 2.5 Pro?
Google just unveiled its most powerful AI model — Gemini 2.5 Pro. It’s a reasoning model, so it can solve complex problems by thinking step-by-step to reach a logical answer. It understands multimodal inputs — like text, images, audio, and video. It’s currently available to Gemini Advanced users and free to try in Google’s AI Studio.
Gemini 2.5 Pro scored 18.8% on Humanity’s Last Exam — the highest among all reasoning models without using tools or search. HLE is a rigorous benchmark designed to assess AI models’ expert-level reasoning across various subjects. For context, o3 Mini scored 14%, and DeepSeek R1 scored 8.6%.

Gemini 2.5 Pro also beat other models across multiple benchmarks, with Google claiming it is significantly better at reasoning and coding. In LMArena, where users vote for the better answer, Gemini 2.5 Pro topped the chart with a score of 1,443 — higher than any other AI model out there. The only model that beat it in one test was ChatGPT’s Deep Research model with 26.6%, but that isn’t a reasoning model.
Here’s what you need to get excited about in Gemini 2.5 Pro:

As you can see, the model’s major advantage is coding — especially where logic and multimodal understanding are involved. So, let’s see how it performs in real-world tests compared to other popular reasoning models out there.
Gemini 2.5 Pro vs Other AI Reasoning Models
Since the model is strong in multimodal understanding and coding, I started by testing those areas.
1. Rubik’s Cube Simulation (Code Test)
First, I provided a detailed prompt to create a Rubik’s Cube simulation with scramble and solve options. I asked for it in p5.js without HTML and listed all the features, functions, and technical tools needed to create the animations.
To my surprise, Gemini delivered. While the solve option didn’t work perfectly, I was able to manually rotate the cube, and the scramble option worked as expected.
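To give a sense of what a prompt like this actually exercises, here’s a minimal sketch of the cube-state logic such a simulation needs underneath the graphics. This is plain JavaScript rather than a full p5.js sketch, it is not Gemini’s actual output, and it assumes a standard sticker representation: each face is an array of nine labels, with each side face’s top row bordering the U face.

```javascript
// Minimal Rubik's Cube state sketch (illustrative, not Gemini's output).
// Each face holds 9 sticker labels; "solved" means every face is uniform.
// Only the U (top) turn is shown; a full engine repeats the same pattern
// for the other five faces, and the simplest "solve" button just replays
// the inverse of the recorded scramble.

const FACES = ["U", "D", "F", "B", "L", "R"];

// Build a solved cube: every sticker on a face carries that face's label.
function solvedCube() {
  const cube = {};
  for (const f of FACES) cube[f] = Array(9).fill(f);
  return cube;
}

// Rotate a single face's 9 stickers 90 degrees clockwise.
function rotateFace(face) {
  const [a, b, c, d, e, f, g, h, i] = face;
  return [g, d, a, h, e, b, i, f, c];
}

// Clockwise U turn (viewed from above): spin the U face and cycle the
// top rows (indices 0..2) of the side faces F -> L -> B -> R -> F.
function turnU(cube) {
  const next = { U: rotateFace(cube.U) };
  for (const f of ["D", "F", "B", "L", "R"]) next[f] = cube[f].slice();
  for (let i = 0; i < 3; i++) {
    next.L[i] = cube.F[i];
    next.B[i] = cube.L[i];
    next.R[i] = cube.B[i];
    next.F[i] = cube.R[i];
  }
  return next;
}

// A cube is solved when every face shows a single color.
function isSolved(cube) {
  return FACES.every(f => cube[f].every(s => s === cube[f][0]));
}
```

In an actual p5.js sketch, each sticker would be drawn as a colored quad in `draw()` and turns would be animated over several frames, but the state logic stays the same.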

I also tested the same prompt with the other models, but none of them delivered working results. Frankly, Gemini 2.5 Pro is the first model I’ve seen get simulations and demos like this right — this simply wasn’t possible with any other AI model before.
2. Logic Puzzle (Reasoning Test)
Next, I tested some reasoning-based prompts. Here’s one, a question that has no definitive answer:
Let’s see which model can figure out that this is a paradox. Gemini took just 24 seconds to identify the paradoxical situation with no clear answer. OpenAI’s o3 Mini and Grok both took around 40 seconds and also reached the right conclusion. DeepSeek R1, however, took 434 seconds and got it wrong on the first attempt, though it answered correctly when asked again.

This isn’t just a one-off case; DeepSeek tends to stumble on more complex questions. That said, the overall difference isn’t huge, as all the models reasoned their way to the correct answer in most cases.
3. Physics Problem (Math Test)
Next, I tested all the models with some math problems. o3 Mini has led in math until now; however, Gemini 2.5 Pro now scores higher on math benchmarks. Here is one example:
All models solved this accurately and provided clear, step-by-step explanations. While Gemini leads in math benchmarks, the actual performance gap is minimal — all models handled most problems well.
Gemini 2.5 Pro
Gemini 2.5 Pro is a massive improvement over 2.0 Flash Thinking. In pure reasoning, however, it’s more or less on the same level as models like o3 Mini, Grok 3, or DeepSeek R1. That said, when it comes to multimodal understanding, this model finally delivers much better results. Beyond that, we can now say that Gemini has officially caught up with the other top models when it comes to reasoning.
Ravi Teja KNTS
Tech writer with over 4 years of experience at TechWiser, where he has authored more than 700 articles on AI, Google apps, Chrome OS, Discord, and Android. His journey started with a passion for discussing technology and helping others in online forums, which naturally grew into a career in tech journalism. Ravi’s writing focuses on simplifying technology, making it accessible and jargon-free for readers. When he’s not breaking down the latest tech, he’s often immersed in a classic film – a true cinephile at heart.