RealtimeTTS is a low-latency text-to-speech library built for real-time applications such as voice chat with LLMs, assistants, and interactive tools. It is designed around a streaming model: you can feed it text incrementally (for example, as an LLM responds) and hear audio output almost immediately, keeping end-to-end latency low.

The library is engine-agnostic and plugs into a wide range of cloud and local TTS systems, including OpenAI, ElevenLabs, Azure, Coqui, Piper, StyleTTS2, Edge TTS, Google TTS, system TTS and others, so you can swap providers without rewriting your pipeline. Because it supports both internet-based engines and fully local engines, you can trade off privacy, cost, and quality to suit your use case. RealtimeTTS also includes robustness features such as automatic fallbacks when a backend fails, so production systems stay responsive even if one TTS provider is temporarily unavailable.
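The streaming model can be illustrated with a small self-contained sketch (it does not use the RealtimeTTS API; `fake_llm_stream` and the regex-based splitter are hypothetical stand-ins): text chunks arrive incrementally, and each completed sentence is handed off as soon as its terminator appears, instead of waiting for the full response.

```python
import re

def fake_llm_stream():
    # Hypothetical stand-in for incremental LLM output.
    for chunk in ["Hello there", ". How are", " you today", "? Great."]:
        yield chunk

def sentences_from_stream(chunks):
    """Yield complete sentences as soon as their terminator arrives."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Flush every sentence that is already complete.
        while True:
            match = re.search(r"[.!?]", buffer)
            if not match:
                break
            end = match.end()
            yield buffer[:end].strip()
            buffer = buffer[end:]
    if buffer.strip():  # trailing partial sentence, if any
        yield buffer.strip()

for sentence in sentences_from_stream(fake_llm_stream()):
    print(sentence)  # in a real pipeline, synthesis would start here
```

In RealtimeTTS itself the analogous step is feeding a string or generator into an audio stream; the point of the sketch is only why per-sentence hand-off keeps latency low: the first sentence is spoken while the rest of the response is still being generated.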
Features
- Streaming text-to-speech designed for near-instant audio output in real time
- Pluggable support for many TTS backends (OpenAI, ElevenLabs, Azure, Coqui, StyleTTS2, Piper, Edge TTS, Google TTS, system TTS and more)
- Fallback mechanism that switches engines automatically to keep audio output reliable
- Flexible installation extras so you only install the engines and dependencies you actually need
- Sentence tokenization options suitable for multi-language text, plus documentation in multiple languages
- Simple Python API for feeding strings, generators, or character streams directly from LLM output
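The fallback mechanism listed above can be sketched as a simple engine chain (the `CloudEngine` class and `synthesize_with_fallback` helper here are hypothetical stubs, not RealtimeTTS classes): synthesis is attempted on each engine in order, moving to the next only when one fails.

```python
class EngineUnavailable(Exception):
    """Raised by a backend that cannot serve the request."""

class CloudEngine:
    # Hypothetical stub standing in for a real TTS backend.
    def __init__(self, name, healthy):
        self.name, self.healthy = name, healthy

    def synthesize(self, text):
        if not self.healthy:
            raise EngineUnavailable(self.name)
        return f"[{self.name}] {text}"

def synthesize_with_fallback(engines, text):
    """Try each engine in order; fall through to the next on failure."""
    for engine in engines:
        try:
            return engine.synthesize(text)
        except EngineUnavailable:
            continue  # this backend is down, try the next one
    raise RuntimeError("all TTS engines failed")

chain = [CloudEngine("primary", healthy=False),
         CloudEngine("backup", healthy=True)]
print(synthesize_with_fallback(chain, "Hello"))  # served by the backup engine
```

Ordering the chain from preferred to last-resort (for example, a high-quality cloud voice first, a local engine last) is what lets a production system keep speaking when one provider is temporarily down.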