Audio foundation model excelling in audio understanding
Repository for the Qwen2-Audio chat and pretrained large audio language models
Multi-modal large language model designed for audio understanding
Capable of understanding text, audio, vision, and video
Foundational Models for State-of-the-Art Speech and Text Translation
Qwen3-Omni is a natively end-to-end, omni-modal LLM
VMZ: Model Zoo for Video Modeling
Qwen3-ASR is an open-source series of ASR models
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Portuguese ASR model fine-tuned on XLSR-53 for 16 kHz audio input
Russian ASR model fine-tuned on Common Voice and CSS10 datasets