LTX-2 is an open-source multimodal foundation model developed by Lightricks for generating synchronized video and audio from text prompts, images, or other conditioning inputs. Unlike most earlier video generation systems, which produced only silent clips, LTX-2 combines video and audio generation in a single architecture capable of producing coherent audiovisual scenes. The model uses a diffusion-transformer architecture that generates high-fidelity visual frames while simultaneously producing the corresponding audio, such as speech, music, ambient sound, or effects. This unified approach lets creators generate complete multimedia sequences in which motion, timing, and sound are aligned automatically. LTX-2 targets both research and production workflows and can generate high-resolution video clips with precise control over structure, motion, and camera behavior.
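To make the synchronization requirement concrete, consider the bookkeeping any audio-video generator must do: at a given frame rate and audio sample rate, every video frame corresponds to a fixed span of audio samples. The sketch below illustrates that arithmetic only; the function names and the 48 kHz / 24 fps figures are assumptions for illustration, not LTX-2 internals.

```python
def samples_per_frame(sample_rate_hz: int, fps: int) -> float:
    """Number of audio samples that span one video frame."""
    return sample_rate_hz / fps

def frame_for_sample(sample_index: int, sample_rate_hz: int, fps: int) -> int:
    """Video frame index that a given audio sample falls within."""
    return sample_index * fps // sample_rate_hz

# At 48 kHz audio and 24 fps video, each frame spans 2000 audio samples.
print(samples_per_frame(48_000, 24))            # 2000.0
print(frame_for_sample(95_999, 48_000, 24))     # 47 (last sample of frame 47)
print(frame_for_sample(96_000, 48_000, 24))     # 48 (first sample of frame 48)
```

A unified model can keep these two timelines aligned by construction, whereas bolting a separate audio model onto a silent video generator requires aligning them after the fact.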
Features
- Unified audio-video generation using a single multimodal AI model
- Text-to-video, image-to-video, and audio-to-video generation capabilities
- Native synchronized audio including dialogue, music, and ambient sound
- High-resolution video generation with configurable frame rates
- Support for fine-tuning and LoRA training on custom datasets
- Open-source pipelines and inference tools for local or production deployment
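The LoRA training mentioned above rests on a simple idea that can be sketched in a few lines: keep the pretrained weight matrix frozen and learn only a low-rank correction. This is a generic illustration of the LoRA technique with made-up dimensions, not LTX-2's actual training code.

```python
import numpy as np

# LoRA sketch: instead of updating a full weight matrix W (d_out x d_in),
# train a low-rank correction B @ A and add it to the frozen base weight.
rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 6, 2

W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight
A = rng.standard_normal((rank, d_in))    # trainable down-projection
B = np.zeros((d_out, rank))              # trainable up-projection, zero-init

x = rng.standard_normal(d_in)
# Adapted forward pass: base output plus the low-rank update.
y = W @ x + B @ (A @ x)

# With B zero-initialized, the adapter starts as an exact no-op.
print(np.allclose(y, W @ x))  # True
# Trainable parameters: rank*(d_in + d_out) vs d_in*d_out for full fine-tuning.
print(rank * (d_in + d_out), d_in * d_out)  # 28 48
```

The parameter count is why LoRA makes fine-tuning a large video model on custom datasets tractable: only the small `A` and `B` matrices are trained and shipped, while the base weights stay untouched.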