Google’s Gemini 3 is finally here, and the results are impressive, especially when it comes to building simple games.
Early benchmarks for Gemini 3 Pro back that up.
For example, it tops the LMArena Leaderboard with a score of 1501 Elo. It also offers PhD-level reasoning, with top scores on Humanity’s Last Exam (37.5% without using any tools) and GPQA Diamond (91.9%).
Real-world results also back up these numbers.
Pietro Schirano, who created MagicPath, a vibe coding tool for designers, says we’re entering a new era with Gemini 3.
In his tests, Gemini 3 Pro created a 3D LEGO editor in one shot. This suggests a single prompt can be enough to create simple games in Gemini 3, which is a big deal if you ask me.
I asked Gemini 3 Pro to create a 3D LEGO editor.
In one shot it nailed the UI, complex spatial logic, and all the functionality. We’re entering a new era. pic.twitter.com/Y7OndCB8CK
— Pietro Schirano (@skirano) November 18, 2025
LLMs have traditionally been bad at building games, but Gemini 3 shows some improvement on that front.
It’s also amazing at games.
It recreated the old iOS game called Ridiculous Fishing from just a text prompt, including sound effects and music. pic.twitter.com/XIowqGt4dc
— Pietro Schirano (@skirano) November 18, 2025
This aligns with Google’s claims that Gemini 3 Pro redefines multimodal reasoning with 81% on MMMU-Pro and 87.6% on Video-MMMU benchmarks.
“It also scores a state-of-the-art 72.1% on SimpleQA Verified, showing great progress on factual accuracy,” Google noted in a blog post.
“This means Gemini 3 Pro is highly capable of solving complex problems across a vast array of topics like science and mathematics with a high degree of reliability.”
Gemini 3 is impressive in my early tests, but adherence remains an issue
I’ve been using Claude Code for a year now, and it’s been a great help with my Flutter/Dart projects.
Gemini 3 is a better model than Claude Sonnet 4.5, but there are some areas where Claude shines.
One of those areas is adherence. So far, no model has come close to Claude Code at following instructions, and Gemini 3 is no exception.
In my experience, Claude Code is also a better CLI than Gemini 3 Pro, which gives it an edge over competitors.
For everything else, Gemini 3 is a better choice, especially if you’ve been using Gemini 2.5 Pro.
If you use LLMs regularly, I’d recommend sticking to Sonnet 4.5 for regular tasks and using Gemini 3 Pro for complex queries.