OpenAI Unveils Three New Real-Time Voice Models


OpenAI has unveiled three new voice models for real-time speech processing: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. Each is tailored to a specific use case, from conversational reasoning to translation and speech recognition.

Headlining the release is GPT-Realtime-2, a flagship audio model with GPT-5-level reasoning and a larger context window of up to 128,000 tokens. Compared with its predecessor, GPT-Realtime-1.5, it delivers a performance gain of approximately 11%. The model sustains more fluid dialogue: it can interject clarifying phrases, handle several tasks at once, and give updates on the progress of a request.

The model introduces adjustable reasoning levels, from minimal to very high, letting users trade speed against response quality. In live testing at Zillow, GPT-Realtime-2 raised the share of successful calls from 69% to 95%. Pricing remains at $32 per million audio input tokens and $64 per million audio output tokens.
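To make the token pricing concrete, here is a minimal sketch of the arithmetic. It uses only the two rates quoted above; the helper name `audio_token_cost` and the sample token counts are hypothetical, and any other charges that may apply are not modeled.

```python
# Rates quoted in the article, in US dollars per one million audio tokens.
INPUT_USD_PER_MILLION = 32.0
OUTPUT_USD_PER_MILLION = 64.0


def audio_token_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the audio token cost in USD (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Illustrative volumes only: 500,000 input and 250,000 output audio tokens.
    print(audio_token_cost(500_000, 250_000))  # prints 32.0
```

At these rates an output token costs twice as much as an input token, so a workload that speaks more than it listens will be dominated by the output side.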

The second model, GPT-Realtime-Translate, is built for real-time speech translation. It accepts more than 70 input languages and produces 13 output languages while preserving the pace and structure of natural conversation. In tests by BolnaAI, translation error rates for several Indian languages fell by 12.5%. The model costs $0.034 per minute.

The third model, GPT-Realtime-Whisper, performs streaming speech-to-text transcription. It is an evolution of the Whisper family, first introduced in 2022. The model delivers near-real-time transcription and costs $0.017 per minute.
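The per-minute prices for the translation and transcription models lend themselves to a quick estimate. The sketch below assumes billing is simply minutes multiplied by the quoted rate; the function name `per_minute_cost` and the 60-minute example are hypothetical, and actual billing granularity is not stated in the article.

```python
from decimal import Decimal

# Per-minute prices quoted in the article, in US dollars.
USD_PER_MINUTE = {
    "GPT-Realtime-Translate": Decimal("0.034"),
    "GPT-Realtime-Whisper": Decimal("0.017"),
}


def per_minute_cost(model: str, minutes: int) -> Decimal:
    """Estimate the cost in USD for a number of audio minutes (hypothetical helper)."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    # Decimal avoids binary floating-point rounding noise in currency arithmetic.
    return USD_PER_MINUTE[model] * minutes


if __name__ == "__main__":
    # Illustrative case: one hour of audio through each model.
    for name in USD_PER_MINUTE:
        print(name, per_minute_cost(name, 60))
```

Under this assumption, an hour of translation comes to about $2.04 and an hour of transcription to about $1.02, exactly half as much.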

OpenAI states that the new tools are aimed at building voice assistants, next-generation call centers, and simultaneous translation services. All three models are now available to developers via the API and come with built-in content filters.