Google has released Gemma 4, a new family of open models designed specifically for advanced reasoning and agentic workflows. The release includes four sizes — Effective 2B (E2B), Effective 4B (E4B), a 26B Mixture of Experts model, and a 31B Dense model — giving developers options across the performance-to-efficiency spectrum. The 31B model currently ranks third among all open models on the Arena AI text leaderboard, with the 26B MoE variant securing sixth place.
The performance improvements over Gemma 3 are substantial: Google claims Gemma 4 is up to 4x faster than previous versions while using up to 60% less battery on mobile devices. The models accept multimodal input (text, images, and audio) and offer enhanced chain-of-thought reasoning, mathematical problem solving, and image understanding, including OCR and chart analysis. Language support has expanded to more than 140 languages.
For the on-device story, Google simultaneously announced the AICore Developer Preview, which brings Gemma 4 to Android devices via the E2B and E4B variants. The developer preview adds support for tool calling, structured output, system prompts, and thinking mode — capabilities that make on-device agentic workflows practical for the first time. Gemini Nano 4-enabled devices are expected later this year.
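To make the tool calling and structured output features concrete, here is a minimal Python sketch of the dispatch step an on-device agent needs: the model emits a JSON tool request, and the app validates it before running anything. This is an illustration under stated assumptions, not the AICore or Gemma API. The `fake_model` stub, the `{"tool": ..., "arguments": ...}` shape, and both tools are hypothetical.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tools: a real app would wrap device or OS functionality here.
def get_battery_level() -> Dict[str, Any]:
    return {"percent": 82}  # hard-coded placeholder value

def set_timer(minutes: int) -> Dict[str, Any]:
    return {"status": "scheduled", "minutes": minutes}

# Registry mapping tool names the model may request to local functions.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "get_battery_level": get_battery_level,
    "set_timer": set_timer,
}

def fake_model(prompt: str) -> str:
    """Stand-in for a local model call (assumption: no real model is invoked).

    Returns a structured tool request as a JSON string.
    """
    if "timer" in prompt.lower():
        return json.dumps({"tool": "set_timer", "arguments": {"minutes": 5}})
    return json.dumps({"tool": "get_battery_level", "arguments": {}})

def dispatch(raw_output: str) -> Dict[str, Any]:
    """Parse the model's structured output and run the requested tool."""
    try:
        call = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return {"error": f"invalid JSON: {exc.msg}"}
    if not isinstance(call, dict):
        return {"error": "expected a JSON object"}

    name = call.get("tool")
    args = call.get("arguments", {})
    if not isinstance(name, str) or name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    if not isinstance(args, dict):
        return {"error": "arguments must be an object"}

    try:
        return TOOLS[name](**args)
    except TypeError as exc:
        # Raised when the model supplies missing or unexpected arguments.
        return {"error": f"bad arguments: {exc}"}

if __name__ == "__main__":
    print(dispatch(fake_model("Set a timer for five minutes")))
```

Validating the tool name and arguments before execution matters more on-device than in the cloud, since the tools typically touch local state. In an agent loop, the dictionary returned by `dispatch` would be fed back to the model as the tool result.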
For context engineers, Gemma 4 is a sign that open models are catching up to proprietary ones on agentic capability. The combination of strong reasoning, tool calling support, and efficient on-device deployment opens up use cases where the AI must run locally: privacy-sensitive applications, offline-capable agents, and edge computing scenarios where cloud latency is unacceptable.