Audio foundation model excelling in audio understanding
Repo of Qwen2-Audio chat & pretrained large audio language model
Large Audio Language Model built for natural interactions
Speech recognition module for Python
Robust Speech Recognition via Large-Scale Weak Supervision
Multi-modal large language model designed for audio understanding
Speech-to-text, text-to-speech, and speaker recognition
Captcha solver extension for humans
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Speech recognition for your site
A free, open source, and extensible speech-to-text application
Voice Recognition to Text Tool
Capable of understanding text, audio, vision, video
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Python Audio Analysis Library: Feature Extraction, Classification
Cross-platform AI language practice app
A library for audio and music analysis, feature extraction
Video translation and dubbing tool powered by LLMs
StreamSpeech is a seamless model for offline speech recognition
Foundational Models for State-of-the-Art Speech and Text Translation
Data manipulation and transformation for audio signal processing
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Unofficial (Golang) Go bindings for the Hugging Face Inference API
Large language model for livestream sales hosts
Workflow and speech recognition app