A simple, high-quality voice conversion tool focused on ease of use
GUI for a Vocal Remover that uses Deep Neural Networks
A state-of-the-art open visual language model
The most powerful and modular diffusion model GUI, API and backend
GUI Exploration Lab. One of the best GUI agent solutions
Fast stable diffusion on CPU and AI PC
A GUI tool for extracting hard-coded subtitles (hardsubs) from videos
Real-World Centric Foundation GUI Agents
Framework and no-code GUI for fine-tuning LLMs
An open-source end-to-end VLM-based GUI Agent
Generate audiobooks from e-books
UI-TARS-desktop version that can operate on your local personal device
Witness the aha moment of VLM with less than $3
Generate audiobooks from e-books, voice cloning & 1107+ languages
Image polygonal annotation with Python
Agent framework and applications built upon Qwen>=3.0
Clone a voice in 5 seconds to generate arbitrary speech in real-time
GUI/CLI tool for downloading Xiaohongshu
Convert AI papers to GUI
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
The AI toolkit for the AI developer
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Weaving the Digital Agent Galaxy
Enable AI to control your desktop, mobile and HMI devices
A simple screen parsing tool towards a pure-vision-based GUI agent