Audio foundation model excelling in audio understanding
Repo for the Qwen2-Audio chat and pretrained large audio language models
Large Audio Language Model built for natural interactions
Speech recognition module for Python
Robust Speech Recognition via Large-Scale Weak Supervision
Multi-modal large language model designed for audio understanding
Speech-to-text, text-to-speech, and speaker recognition
Fast and accurate automatic speech recognition (ASR) for edge devices
Captcha solver extension for humans
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
HTML5 JS recording in MP3, WAV, OGG, WebM, and AMR formats
Speech recognition for your site
A free, open source, and extensible speech-to-text application
Voice Recognition to Text Tool
Capable of understanding text, audio, vision, and video
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Python Audio Analysis Library: Feature Extraction, Classification
Omnilingual ASR: Open-Source Multilingual Speech Recognition
A library for audio and music analysis, feature extraction
Cross-platform AI language practice app
Video translation and dubbing tool powered by LLMs
Data manipulation and transformation for audio signal processing
StreamSpeech is a seamless model for offline speech recognition
Foundational Models for State-of-the-Art Speech and Text Translation
Qwen3-Omni is a natively end-to-end, omni-modal LLM