Update · 8 May 2026

OpenAI Ships Three Real-Time Audio APIs — GPT-Realtime-2 for Voice Reasoning, Realtime-Translate for 70+ Languages, and Realtime-Whisper for Live Transcription

OpenAI launched three new audio APIs on 8 May: GPT-Realtime-2 brings voice-native reasoning to conversational AI, Realtime-Translate offers real-time translation across 70+ languages, and Realtime-Whisper provides production-grade live transcription. Together, the trio positions OpenAI to compete for the fast-growing market of voice-first AI applications.

OpenAI expanded its real-time capabilities on 8 May with the simultaneous launch of three production-ready audio APIs, each targeting a different slice of the voice AI market.

GPT-Realtime-2 is the flagship: a voice-native reasoning model that processes speech directly, without converting it to text first, which enables more natural conversational applications. Cascaded voice pipelines convert speech to text, run the text through a language model, and then synthesise audio output. Realtime-2 instead keeps the pipeline in audio end to end, preserving tone, emotion, and conversational context.

Realtime-Translate supports real-time translation across more than 70 languages, targeting applications from customer support to live event interpretation. The model handles code-switching — conversations that move between languages mid-sentence — which has historically been a weak point for translation APIs.

Realtime-Whisper is the production-grade evolution of OpenAI's Whisper transcription model, optimised for live streaming scenarios with sub-second latency. It includes speaker diarisation (identifying who said what in multi-speaker conversations), punctuation and formatting, and domain-specific vocabulary handling.

For context engineers building voice-enabled applications, the three APIs cover the core pieces of an audio stack: live transcription with Realtime-Whisper, spoken reasoning with Realtime-2, and translation with Realtime-Translate. Developers can mix and match the three APIs or use them independently. Pricing follows OpenAI's standard per-token structure, with audio tokens priced at a premium over text tokens.

The launch puts OpenAI in more direct competition with ElevenLabs (voice synthesis), Deepgram (transcription), and Google's Gemini Live (multimodal conversation) in the fast-growing voice AI market.
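The article gives no request formats or SDK details, so the sketch below does not call any OpenAI API. It is a minimal, hypothetical Python pattern for the "mix and match or use them independently" idea: each service sits behind a plain callable stage, and a pipeline chains whichever subset an application needs. `VoicePipeline`, `Stage`, `Payload`, `fake_transcribe` and `fake_translate` are illustrative names invented for this example, and the stubs return canned text.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for whatever the services exchange (audio frames,
# transcripts, translated speech). Real payload formats are not described here.
Payload = dict
Stage = Callable[[Payload], Payload]


@dataclass
class VoicePipeline:
    """Chains any subset of stages, in order. A real client for Realtime-Whisper,
    GPT-Realtime-2 or Realtime-Translate could be wrapped behind the same
    Stage signature (an assumption, not a documented API)."""
    stages: List[Stage] = field(default_factory=list)

    def then(self, stage: Stage) -> "VoicePipeline":
        # Return a new pipeline instead of mutating, so partial pipelines stay reusable.
        return VoicePipeline(self.stages + [stage])

    def run(self, payload: Payload) -> Payload:
        # Feed each stage's output into the next one.
        for stage in self.stages:
            payload = stage(payload)
        return payload


# Hypothetical stub stages: canned results, no network calls.
def fake_transcribe(p: Payload) -> Payload:
    return {**p, "text": "hola, necesito ayuda"}


def fake_translate(p: Payload) -> Payload:
    lookup = {"hola, necesito ayuda": "hello, I need help"}
    return {**p, "text": lookup.get(p["text"], p["text"]), "lang": p.get("target", "en")}


captions = VoicePipeline().then(fake_transcribe)   # transcription only
translated = captions.then(fake_translate)         # transcription + translation

print(captions.run({"audio": b"..."})["text"])
print(translated.run({"audio": b"...", "target": "en"})["text"])
```

Because `then` returns a new pipeline rather than modifying the existing one, a transcription-only path and a transcription-plus-translation path can share the same first stage. A production integration would replace each stub with a streaming client, which this sketch does not model.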
