---
name: brand-voice-extractor
description: "Sample existing marketing copy to produce a voice guide — do/don't, vocabulary, rhythm rules — that new drafts can be checked against."
title: Brand Voice Extractor
category: design-media
difficulty: intermediate
icon: 🗣️
input: text
output: structured-json
phase: post
domain: content
tags: brand-voice,copy-analysis,nlp-corpus,style-guide,marketing-copy,linguistic-stats,voice-consistency,writing-checklist,tone-guide,bigram-trigram
best_for:
  - Marketing teams scaling copy production
  - Brand consistency across writers
  - Onboarding new copywriters
  - Auditing off-brand copy
  - Content platforms with multiple contributors
---

## Description

Input: 10-50 pieces of existing brand copy (blog posts, ads, social posts). Output: a voice guide with three parts: Do and Don't examples pulled from the source material, a vocabulary list covering both high-frequency brand words and phrases to avoid, and sentence-rhythm rules (average sentence length, contraction frequency, question-to-statement ratio).

## Why it works

Brand voice guides written from scratch tend to be aspirational, and writers end up ignoring them. A guide derived from corpus statistics of what the brand actually publishes describes the real voice, so new writers can match it and outlier drafts are easy to spot.

## How it works

1. Tokenize the corpus and compute its statistics: sentence length distribution, contraction rate, question rate, top bigrams and trigrams.
2. Cluster sentences by topic and ask the LLM to identify the three "most representative" sentences per cluster. Those become Do examples.
3. Compare the brand corpus against a generic-marketing baseline and flag words that are over-represented (brand vocabulary) or under-represented (words the brand avoids).
4. Emit a voice guide with the stats up top, Do/Don't next, and a checklist at the bottom for new drafts.
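
Steps 1 and 3 can be sketched in plain Python. This is a minimal illustration, not the skill's actual implementation: the function names `corpus_stats` and `vocabulary_skew`, the regex-based sentence splitter, the add-one-smoothed log frequency ratio, and the `threshold` value are all assumptions made for the example. A real pipeline would likely use a proper tokenizer and a much larger baseline corpus.

```python
import math
import re
from collections import Counter

# Assumed, deliberately crude patterns: a sentence is any run of text that
# ends in ., ! or ?; a word is a run of letters with an optional contraction.
SENTENCE_RE = re.compile(r"[^.!?]+[.!?]+")
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Lowercase, normalize curly apostrophes, and split into word tokens."""
    return WORD_RE.findall(text.lower().replace("\u2019", "'"))


def corpus_stats(docs, top_n=3):
    """Step 1: rhythm statistics and top bigrams for a list of documents."""
    sentences = [s.strip() for d in docs for s in SENTENCE_RE.findall(d)]
    token_lists = [tokenize(s) for s in sentences]
    tokens = [t for toks in token_lists for t in toks]
    if not sentences or not tokens:
        raise ValueError("corpus contains no punctuated sentences")
    # Bigrams are counted within a sentence, never across a boundary.
    bigrams = Counter(p for toks in token_lists for p in zip(toks, toks[1:]))
    return {
        "avg_sentence_len": len(tokens) / len(sentences),
        "contraction_rate": sum("'" in t for t in tokens) / len(tokens),
        "question_rate": sum(s.endswith("?") for s in sentences) / len(sentences),
        "top_bigrams": bigrams.most_common(top_n),
    }


def vocabulary_skew(brand_tokens, baseline_tokens, threshold=1.0):
    """Step 3: flag words over- or under-represented versus a baseline.

    Scores each word by the log ratio of its add-one-smoothed relative
    frequency in the brand corpus to that in the baseline (an assumed
    scoring choice; the threshold is arbitrary).
    """
    brand, base = Counter(brand_tokens), Counter(baseline_tokens)
    vocab = set(brand) | set(base)
    n_brand = len(brand_tokens) + len(vocab)
    n_base = len(baseline_tokens) + len(vocab)
    scores = {
        w: math.log(((brand[w] + 1) / n_brand) / ((base[w] + 1) / n_base))
        for w in vocab
    }
    over = sorted((w for w in vocab if scores[w] > threshold),
                  key=lambda w: -scores[w])
    under = sorted((w for w in vocab if scores[w] < -threshold),
                   key=lambda w: scores[w])
    return over, under
```

Calling `corpus_stats(docs)` on the sampled copy gives the numbers for the top of the guide, and `vocabulary_skew(tokenize(brand_text), tokenize(baseline_text))` returns the two word lists for the vocabulary section. Note that the sentence splitter drops trailing text with no terminal punctuation, which matters for headlines and short social posts.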