WhisperJAV is an open-source speech transcription pipeline built to generate subtitles for Japanese adult video content. Audio of this type often has a low signal-to-noise ratio and many non-verbal vocalizations, which standard automatic speech recognition models tend to misinterpret as words, producing inaccurate transcripts. WhisperJAV separates text generation from timestamp alignment: it first generates the transcript, then aligns it to the audio using forced alignment. The framework supports several speech recognition models, including Qwen-based ASR systems and fine-tuned Whisper models trained on domain-specific dialogue.

Features

  • Domain-specific transcription pipeline for Japanese adult video audio
  • Support for multiple ASR models including Qwen3-ASR and anime-whisper
  • Two-stage pipeline separating transcription and timestamp alignment
  • Forced alignment system for precise word-level subtitle timing
  • Multiple processing modes optimized for different audio conditions
  • Configurable sensitivity settings to reduce transcription hallucinations
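
The two-stage design listed above (text first, timing second) can be illustrated with a minimal Python sketch. This is not WhisperJAV's code or API: every name here (`Cue`, `align_proportionally`, `run_pipeline`, `to_srt`) is hypothetical, the transcriber is a caller-supplied stub, and the second stage is a simple proportional-by-length placeholder standing in for a real forced aligner, which would use acoustic evidence rather than character counts.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def align_proportionally(tokens: List[str], start: float, end: float) -> List[Cue]:
    """Stage 2 placeholder (NOT forced alignment): spread tokens over
    [start, end] in proportion to their character counts."""
    tokens = [t for t in tokens if t]  # ignore empty tokens
    total = sum(len(t) for t in tokens)
    if total == 0 or end <= start:
        return []
    cues, cursor = [], start
    for tok in tokens:
        duration = (end - start) * len(tok) / total
        cues.append(Cue(cursor, cursor + duration, tok))
        cursor += duration
    return cues


def run_pipeline(audio: bytes,
                 transcribe: Callable[[bytes], List[str]],
                 start: float, end: float) -> List[Cue]:
    """Stage 1 produces text only; stage 2 assigns timestamps afterwards."""
    tokens = transcribe(audio)                       # stage 1: hypothetical ASR callable
    return align_proportionally(tokens, start, end)  # stage 2: timing


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues: List[Cue]) -> str:
    """Render cues as numbered SRT blocks separated by blank lines."""
    blocks = [
        f"{i}\n{srt_time(c.start)} --> {srt_time(c.end)}\n{c.text}\n"
        for i, c in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)


if __name__ == "__main__":
    # Stub transcriber: ignores the audio and returns fixed tokens.
    fake_asr = lambda audio: ["hello", "there"]
    print(to_srt(run_pipeline(b"", fake_asr, start=0.0, end=2.0)))
```

Keeping the two stages behind separate functions means the ASR model can be swapped without touching the timing logic, which is the property the feature list describes.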

License

MIT License

Additional Project Details

Programming Language

Python

Related Categories

Python Large Language Models (LLM)

Registered

2 days ago