---
name: speaker-diarization-cleanup
description: Correct auto-diarized transcripts by merging short speaker swaps and relabeling anonymous speakers from known-voice samples.
title: Speaker Diarization Cleanup
category: voice-audio
difficulty: advanced
icon: 🎚️
input: audio
output: structured-json
phase: post
domain: data
tags: speaker-diarization,audio-processing,transcript-cleanup,voice-embedding,speech-recognition,post-processing,cosine-similarity,speaker-identification,audio-correction,nlp
best_for:
  - Cleaning up automatic speech-to-text transcripts with speaker labels
  - Identifying and relabeling speakers in multi-speaker recordings
  - Removing spurious speaker switches in diarized audio
  - Post-processing interview or meeting transcripts
---

## Description

Takes a diarized transcript (with generic labels such as Speaker 0 and Speaker 1) and a folder of reference audio clips for each known speaker. It fixes two classes of error: spurious short swaps (for example, Speaker 0 appearing for 1.2 seconds in the middle of Speaker 1's turn) and anonymous labels (for example, renaming Speaker 0 to 'Dana' because the voice embedding of that speaker's turns is similar to Dana's reference clip).

## Why it works

Auto-diarization has two predictable failure modes that a post-processing pass can fix without retraining: short spurious swaps (easily filtered by duration) and generic speaker labels (easily fixed by matching against known voices). Both are cheap to detect and cheap to correct.
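
As a rough illustration of how little logic the duration filter needs, here is a minimal Python sketch. The `Turn` dataclass and `merge_short_swaps` function are hypothetical names, not part of any library, and "dominant" is interpreted here as the longer of the two adjacent turns, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str = ""

    @property
    def duration(self) -> float:
        return self.end - self.start


def merge_short_swaps(turns, min_duration=2.0):
    """Relabel turns shorter than min_duration, then join same-speaker runs."""
    relabeled = []
    for i, turn in enumerate(turns):
        speaker = turn.speaker
        if turn.duration < min_duration:
            prev = turns[i - 1] if i > 0 else None
            nxt = turns[i + 1] if i + 1 < len(turns) else None
            # Only neighbours that are themselves long enough can absorb a swap.
            candidates = [n for n in (prev, nxt)
                          if n is not None and n.duration >= min_duration]
            if candidates:
                # Assumption: the "dominant" neighbour is the longer one.
                speaker = max(candidates, key=lambda n: n.duration).speaker
        relabeled.append(Turn(speaker, turn.start, turn.end, turn.text))

    # Coalesce consecutive turns that now share a speaker label.
    merged = []
    for turn in relabeled:
        if merged and merged[-1].speaker == turn.speaker:
            last = merged[-1]
            joined = (last.text + " " + turn.text).strip()
            merged[-1] = Turn(last.speaker, last.start, turn.end, joined)
        else:
            merged.append(turn)
    return merged


turns = [
    Turn("Speaker 1", 0.0, 8.0, "So the plan is"),
    Turn("Speaker 0", 8.0, 9.2, "uh"),
    Turn("Speaker 1", 9.2, 15.0, "to ship on Friday"),
]
cleaned = merge_short_swaps(turns)
```

In this example the 1.2-second Speaker 0 turn is absorbed, leaving a single Speaker 1 turn from 0.0 to 15.0 seconds. A short turn with no sufficiently long neighbour keeps its original label.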

## How it works

1. Parse the diarized transcript into turns.
2. Merge turns shorter than a configurable threshold (default 2s) into the neighboring dominant speaker.
3. For each anonymous speaker label, compute an average voice embedding from their turns.
4. Compute embeddings for each reference clip.
5. Match anonymous labels to reference labels by cosine similarity above a threshold; unmatched speakers keep their numeric label.
6. Emit the cleaned transcript and a diff showing what changed.
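
Steps 3 to 5 can be sketched in plain Python as follows. This is a minimal sketch that assumes the embeddings have already been computed by some voice-embedding model and are available as lists of floats; the function names, the 0.75 threshold, and the reference name 'Lee' are illustrative assumptions rather than values from this skill.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_speakers(turn_embeddings, reference_embeddings, threshold=0.75):
    """Map each anonymous label to the best reference name above threshold.

    turn_embeddings: {label: [embedding, ...]}, one embedding per turn.
    reference_embeddings: {name: embedding}, one embedding per known speaker.
    Labels with no match above the threshold map to themselves.
    """
    mapping = {}
    for label, vectors in turn_embeddings.items():
        centroid = mean_vector(vectors)  # step 3: average over the turns
        best_name, best_score = None, threshold
        for name, ref in reference_embeddings.items():
            score = cosine_similarity(centroid, ref)
            if score > best_score:  # strictly above the threshold
                best_name, best_score = name, score
        mapping[label] = best_name if best_name is not None else label
    return mapping


# Toy 3-dimensional vectors; real voice embeddings have far more dimensions.
turn_embeddings = {
    "Speaker 0": [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]],
    "Speaker 1": [[0.0, 0.2, 1.0]],
}
reference_embeddings = {"Dana": [1.0, 0.0, 0.0], "Lee": [0.0, 1.0, 0.0]}
mapping = match_speakers(turn_embeddings, reference_embeddings)
```

With these toy vectors, Speaker 0 is relabeled as 'Dana' and Speaker 1 keeps its numeric label because no reference clears the threshold. The sketch does not stop two anonymous labels from matching the same reference; a fuller version might resolve such conflicts by keeping only the highest-scoring pair.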