Update 24 Apr 2026

Google Splits Its TPU Into Two Chips for the First Time — TPU 8t Delivers 12.6 PetaFLOPS for Training While TPU 8i Targets Low-Latency Inference With 80% Better Price-Performance Than Ironwood

Google announced TPU 8t and TPU 8i at Cloud Next on 22 April, the first time the company has purpose-built separate training and inference chips. TPU 8t scales to 9,600 accelerators per superpod with 2.8x faster training than Ironwood, while TPU 8i delivers 80 per cent higher inference performance per dollar with a five-fold reduction in synchronisation latencies.

Google announced the eighth generation of its Tensor Processing Unit at Cloud Next on 22 April, splitting the TPU line into two purpose-built variants for the first time. The TPU 8t is optimised for training and delivers up to 12.6 petaFLOPS of 4-bit floating-point (FP4) compute per chip, with 216 GB of high-bandwidth memory (HBM) at 6.5 TB/s, 128 MB of on-chip SRAM, and 19.2 Tbps of chip-to-chip bandwidth. A single superpod scales to 9,600 accelerators connected via optical-circuit switches, with 2 petabytes of shared HBM. Multiple pods can be linked through Google's Virgo Network, enabling up to 134,000 TPUs per data centre and 1 million across sites. Google claims the TPU 8t trains 2.8 times faster than the previous Ironwood generation and achieves 97 per cent 'goodput', the share of wall-clock time spent on productive training rather than lost to downtime.
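As a rough sanity check on the superpod figures, the per-chip numbers can be multiplied out. The sketch below assumes decimal units (1 PB = 1,000,000 GB) and simple linear scaling across chips; the function names are illustrative, and the compute total is a theoretical peak rather than a delivered rate.

```python
# Back-of-envelope totals from the per-chip figures quoted in the article.
# Assumptions (not from Google): decimal units and perfectly linear scaling.

CHIPS_PER_SUPERPOD = 9_600
HBM_PER_CHIP_GB = 216
FP4_PER_CHIP_PFLOPS = 12.6
GOODPUT = 0.97


def superpod_hbm_pb(chips: int = CHIPS_PER_SUPERPOD,
                    hbm_gb: float = HBM_PER_CHIP_GB) -> float:
    """Total HBM across a superpod, in petabytes (1 PB = 1e6 GB)."""
    return chips * hbm_gb / 1_000_000


def superpod_peak_fp4_eflops(chips: int = CHIPS_PER_SUPERPOD,
                             pflops: float = FP4_PER_CHIP_PFLOPS) -> float:
    """Theoretical peak FP4 compute across a superpod, in exaFLOPS."""
    return chips * pflops / 1_000


def productive_hours(wall_clock_hours: float,
                     goodput: float = GOODPUT) -> float:
    """Hours of productive training for a given wall-clock duration."""
    return wall_clock_hours * goodput


if __name__ == "__main__":
    print(f"HBM per superpod:  {superpod_hbm_pb():.2f} PB")
    print(f"Peak FP4 compute:  {superpod_peak_fp4_eflops():.1f} EFLOPS")
    print(f"Productive hours in a 30-day run: {productive_hours(30 * 24):.1f}")
```

The HBM total comes to about 2.07 PB, which matches the quoted 2 petabytes. At 97 per cent goodput, a 30-day run would lose a little under 22 hours to downtime.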

The TPU 8i is built specifically for inference, prioritising low latency for real-time agentic AI applications. It delivers 10.1 petaFLOPS of FP4 compute with 288 GB of HBM at 8.6 TB/s and 384 MB of on-chip SRAM, triple the SRAM of the training variant, allowing a model's active working set to reside entirely on-chip. The standout feature is a new Collective Acceleration Engine that cuts synchronisation latency five-fold, which is critical for multi-agent systems where dozens of models must coordinate responses in real time. Google claims 80 per cent higher performance per dollar for LLM inference compared with Ironwood, meaning enterprises can serve roughly 1.8 times the users at the same cost. The company is also replacing x86 CPUs with ARM-based Axion processors as TPU hosts.
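The price-performance and latency claims are simple ratios, and working them through shows what they imply. This is a minimal sketch, assuming serving capacity scales linearly with performance per dollar; the baseline user count and baseline latency are hypothetical placeholders, not figures from Google.

```python
# Ratio arithmetic behind the TPU 8i claims quoted in the article.
# Assumption (ours): capacity scales linearly with performance per dollar.

PERF_PER_DOLLAR_GAIN = 0.80   # "80 per cent higher" than Ironwood
SYNC_LATENCY_FACTOR = 5.0     # "five-fold" reduction
SRAM_8I_MB = 384
SRAM_8T_MB = 128


def users_at_same_cost(baseline_users: float,
                       gain: float = PERF_PER_DOLLAR_GAIN) -> float:
    """Users served for the same spend after the perf-per-dollar gain."""
    return baseline_users * (1 + gain)


def cost_saving_at_same_load(gain: float = PERF_PER_DOLLAR_GAIN) -> float:
    """Fraction of spend saved when serving an unchanged load."""
    return 1 - 1 / (1 + gain)


def latency_after_reduction(baseline_latency: float,
                            factor: float = SYNC_LATENCY_FACTOR) -> float:
    """Synchronisation latency after an N-fold reduction (same units in and out)."""
    return baseline_latency / factor


if __name__ == "__main__":
    # 1,000 users and 50 microseconds are hypothetical baselines.
    print(f"Users at same cost: {users_at_same_cost(1_000):.0f}")
    print(f"Saving at same load: {cost_saving_at_same_load():.1%}")
    print(f"Sync latency: {latency_after_reduction(50.0):.1f} us")
    print(f"SRAM ratio (8i / 8t): {SRAM_8I_MB / SRAM_8T_MB:.0f}x")
```

An 80 per cent gain works out to 1.8 times the users for the same spend, a little short of double. Serving an unchanged load would cost about 44 per cent less.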

For context engineers, the decision to split training and inference into separate silicon is the most significant detail. Google's statement, 'forget one chip to rule them all', acknowledges that training frontier models and serving them at scale have fundamentally different compute profiles and call for different hardware. Training demands massive floating-point throughput and interconnect bandwidth across thousands of chips; inference demands low latency, large on-chip SRAM to hold the active working set, and efficient synchronisation for agentic workflows where multiple models collaborate. This split mirrors what developers already know from software architecture: the same system rarely optimises well for both batch processing and real-time serving. Both chips are due to reach general availability later in 2026 and will underpin Google's entire Gemini infrastructure, including the Gemini Enterprise Agent Platform announced at the same conference.


