Open Source Audio & Voice Tools

As of June 2026, AIDiveForge tracks 4 open source audio & voice tools. Each project has a verified public source repository, and every listing is checked against the tool's live website and re-checked regularly.

Last updated June 18, 2026 · 4 tools

1. DJ Mix
The application runs two Magenta RealTime 2 model decks locally on Apple Silicon, letting you crossfade, EQ, and cue between AI-generated audio streams in real time. Text prompts steer what each deck generates next; a Pioneer DDJ-FLX4 maps to the full hardware surface if you have one. Stable Audio 3 handles pad generation and finished track renders alongside the live decks. The hard ceiling is the hardware requirement — Apple Silicon only, with roughly 13 GB of model weights to download before you touch anything. Teams on Linux or Windows have no path forward here.
Free · Open Source
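
As an illustration of the crossfading described for DJ Mix, here is a minimal sketch of an equal-power crossfade between two blocks of audio samples. This is a generic, well-known mixing technique and not DJ Mix's actual code; the listing does not say which fade curve the app uses, and the function names `equal_power_gains` and `crossfade` are hypothetical.

```python
import math


def equal_power_gains(position: float) -> tuple[float, float]:
    """Return (gain_a, gain_b) for a crossfader position in [0, 1].

    0.0 is fully deck A, 1.0 is fully deck B. The cosine/sine pair keeps
    gain_a**2 + gain_b**2 equal to 1, so total power stays constant
    across the fade (assumed curve, not taken from DJ Mix).
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0 and 1")
    angle = position * math.pi / 2
    return math.cos(angle), math.sin(angle)


def crossfade(deck_a: list[float], deck_b: list[float], position: float) -> list[float]:
    """Mix two equal-length blocks of mono samples at one fader position."""
    if len(deck_a) != len(deck_b):
        raise ValueError("blocks must have the same length")
    gain_a, gain_b = equal_power_gains(position)
    return [gain_a * a + gain_b * b for a, b in zip(deck_a, deck_b)]
```

A linear fade (gains of `1 - position` and `position`) would be simpler, but it tends to dip in loudness at the midpoint, which is why the equal-power curve is a common choice for DJ-style mixers.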
2. Kami Subs
The pipeline is fixed and local: the browser extension captures tab audio, faster-whisper transcribes it, a translation layer converts it, and the result overlays directly on the video — no API keys, no per-minute billing, no audio leaving the device. It works on YouTube, Twitch, Vimeo, podcasts, and lecture streams, with one hard constraint: DRM-protected content is off-limits. The self-hosted backend means setup requires a working Python environment and a GPU capable of running faster-whisper at acceptable latency — that's a real installation step, not a one-click install. Community activity on the repository is minimal at the time of listing, so expect to self-diagnose when something breaks.
Free · Open Source
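
The transcribe-then-overlay steps in the Kami Subs pipeline can be sketched roughly as follows. This is not Kami Subs' own code: the `faster_whisper` calls (`WhisperModel`, `transcribe`) are from the real library, but the helper names, the model size, and the choice of WebVTT as the caption format are assumptions made for illustration. The translation layer is left out because the listing does not say how it works.

```python
from typing import Iterable, Tuple

# (start seconds, end seconds, text); a hypothetical shape for this sketch
Segment = Tuple[float, float, str]


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(segments: Iterable[Segment]) -> str:
    """Render timed segments as WebVTT cues (assumed overlay format)."""
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text.strip())
        lines.append("")
    return "\n".join(lines)


def transcribe(path: str, model_size: str = "small") -> list[Segment]:
    """Transcribe an audio file locally. Needs `pip install faster-whisper`."""
    # Imported lazily so the formatting helpers work without the dependency.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="auto", compute_type="default")
    segments, _info = model.transcribe(path)
    return [(s.start, s.end, s.text) for s in segments]
```

Calling `to_webvtt(transcribe("clip.wav"))` would produce caption text for a hypothetical local file `clip.wav`; latency depends heavily on the model size and on whether a GPU is available.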
3. Whisper
Whisper solves the transcription bottleneck: turning audio from meetings, interviews, and podcasts into searchable text. It was trained on 680,000 hours of multilingual audio, which makes it robust to accents and background noise. The model code and weights are open source under the MIT license, so you can run it on your own hardware with no per-minute charge. For the hosted option, OpenAI charges $0.006 per minute of audio via API, with a free tier capped at modest monthly usage. The catch is real: heavy users quickly hit rate limits, and the free tier runs out once you scale beyond hobbyist volume. API billing is per minute of audio consumed, not a flat monthly fee.
Free · Open Source
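
To make the per-minute billing concrete, here is a small sketch that estimates hosted API cost from audio duration. It uses only the $0.006 per minute rate quoted in this listing; the function name `api_cost_usd` is hypothetical, and the sketch ignores any free allowance or rate limits because the listing gives no figures for them.

```python
from decimal import Decimal

# USD per minute of audio, as quoted in this listing (may change over time)
PRICE_PER_MINUTE = Decimal("0.006")


def api_cost_usd(audio_minutes) -> Decimal:
    """Estimate the API cost in USD for a given number of audio minutes."""
    minutes = Decimal(str(audio_minutes))
    if minutes < 0:
        raise ValueError("audio_minutes must not be negative")
    # Decimal avoids binary floating-point drift; round to whole cents.
    return (minutes * PRICE_PER_MINUTE).quantize(Decimal("0.01"))
```

At this rate, a one-hour recording costs `api_cost_usd(60)`, which is $0.36, and a 100-hour archive (6,000 minutes) costs $36.00.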
4. Whissle Gateway
Whissle's Stream2Action architecture feeds audio, text, or video through a single-pass discriminative model — META-1 — and returns structured JSON carrying transcription, speaker diarization, emotion, intent, age, gender, and entities simultaneously. The full stack (ASR, LLM, TTS, diarization) runs self-hosted on a single GPU via Docker, which is the core production story here. The cloud API is documented as temporarily down while on-prem infrastructure is reinforced, so teams who need cloud failover have no fallback path right now. Video input is on a stated roadmap; text streaming arrives next. For contact center or privacy-sensitive workloads where you control the hardware, the on-prem path is active — for anything cloud-dependent, you are waiting.
Paid · Open Source

Listings on this page are sourced and verified by the AIDiveForge data pipeline. AIDiveForge is editorially independent — no money changes hands for inclusion.
