---
name: quote-attribution-verifier
description: Check every quoted sentence in a draft against the original source to catch misattribution and paraphrasing.
title: Quote Attribution Verifier
category: search-retrieval
difficulty: intermediate
license: Apache-2.0
author: admin
source_url: "https://github.com/stanford-oval/WikiChat"
icon: 🔎
input: text
output: review
phase: post
domain: research
tags: quote-verification,fact-checking,hallucination-detection,source-attribution,document-validation,embedding-similarity,substring-search,paraphrase-detection,misquote-flagging,source-corpus,post-generation-audit,semantic-matching
best_for:
  - Fact-checking long-form articles and reports
  - Academic and research paper validation
  - Editorial review workflows before publication
  - Legal and compliance document auditing
  - Journalistic accuracy verification
---

## Description

Takes a draft document and a source corpus, extracts every quoted passage, and verifies that each one appears in the source, either verbatim or within a paraphrase-acceptance margin. Flags hallucinated quotes and subtle word-swap errors.
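
To make "subtle word-swap errors" concrete, here is a minimal sketch that surfaces the changed words between a draft quote and its nearest source sentence using a word-level diff from Python's standard `difflib`. The function name `word_swaps` and the example sentences are hypothetical and only for illustration; the original tool may report differences in another way.

```python
import difflib


def word_swaps(quoted: str, original: str) -> list[tuple[str, str]]:
    """Return (original_words, quoted_words) pairs where the two texts differ."""
    a, b = original.split(), quoted.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only replaced, inserted, or deleted word runs
    ]


# Hypothetical example: one word was changed between source and draft.
diffs = word_swaps(
    quoted="The results clearly suggest a modest effect.",
    original="The results may suggest a modest effect.",
)
print(diffs)
```

An empty list means the two strings match word for word; each returned pair is a candidate for the "suggested fix" shown to the editor.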

## Why it works

LLMs are notorious for inventing plausible-sounding quotes, and human editors miss the subtle ones (a single word changed, a source slightly misremembered). Verifying *after* drafting is cheaper than preventing these errors at generation time, and it scales to long documents. The verbatim-first, paraphrase-fallback logic matches how fact-checkers actually work: an exact match is strong evidence, while a semantic match is a weaker signal that still warrants a flag.

## How it works

1. Regex-extract every sentence between double quotes or within blockquote markdown.
2. For each quote, run a verbatim substring search across every source in the corpus; if found, mark it verified.
3. For the remainder, compute embedding similarity against every source sentence; a top-1 match above the threshold becomes a "paraphrase candidate" (still flagged).
4. For the remainder (no verbatim match, no embedding match), mark the quote "unsupported".
5. Output a report with line-by-line status and suggested fixes: the nearest real quote from the source, or a recommendation to remove the quote.
6. Optional: rewrite unsupported quotes using an LLM, constrained to use only phrases from the source.
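
Steps 1 through 5 can be sketched in a few dozen lines of Python. This is an illustration under stated assumptions, not the original implementation: the function names, the report fields, and the `0.6` threshold are hypothetical, whitespace is collapsed before the substring check, and a bag-of-words cosine similarity stands in for a real embedding model so the example runs with the standard library alone. The optional LLM rewrite in step 6 is left out.

```python
import math
import re
from collections import Counter

# Step 1 patterns: straight or curly double quotes, and markdown blockquote lines.
QUOTE_RE = re.compile(r'"([^"\n]+)"|\u201c([^\u201d\n]+)\u201d')
BLOCKQUOTE_RE = re.compile(r"^>\s?(.+)$", re.MULTILINE)
SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")


def extract_quotes(draft: str) -> list[str]:
    """Step 1: collect quoted passages from the draft."""
    quotes = [a or b for a, b in QUOTE_RE.findall(draft)]
    quotes += BLOCKQUOTE_RE.findall(draft)
    return [q.strip() for q in quotes if q.strip()]


def collapse_ws(text: str) -> str:
    """Assumption: ignore whitespace differences in the verbatim check."""
    return re.sub(r"\s+", " ", text).strip()


def bow_vector(text: str) -> Counter:
    """Stand-in for an embedding: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def verify_quotes(draft: str, sources: dict[str, str], threshold: float = 0.6) -> list[dict]:
    """Steps 2-5: classify each quote and attach a suggested fix."""
    sentences = [
        (name, s.strip())
        for name, text in sources.items()
        for s in SENTENCE_SPLIT_RE.split(text)
        if s.strip()
    ]
    report = []
    for quote in extract_quotes(draft):
        # Step 2: verbatim substring search across every source.
        hit = next((n for n, t in sources.items() if collapse_ws(quote) in collapse_ws(t)), None)
        if hit is not None:
            report.append({"quote": quote, "status": "verified", "source": hit, "suggestion": None})
            continue
        # Step 3: top-1 similarity against every source sentence.
        qv = bow_vector(quote)
        score, name, nearest = max(
            ((cosine(qv, bow_vector(s)), n, s) for n, s in sentences),
            key=lambda item: item[0],
            default=(0.0, None, None),
        )
        if score >= threshold:
            status, suggestion = "paraphrase_candidate", f"replace with: {nearest}"
        else:
            # Step 4: neither a verbatim nor a similarity match.
            status, suggestion, name = "unsupported", "remove the quote", None
        report.append({"quote": quote, "status": status, "source": name, "suggestion": suggestion})
    return report


# Step 5: a tiny report over hypothetical data.
sources = {"interview": "We shipped the feature in March. The rollout was slower than we had hoped."}
draft = (
    'She said "We shipped the feature in March." '
    'Later she added "The rollout was much slower than we hoped." '
    'The draft also claims "Nobody on the team slept for a week."'
)
report = verify_quotes(draft, sources)
for row in report:
    print(row["status"], "|", row["quote"], "|", row["suggestion"])
```

Swapping `bow_vector` and `cosine` for a sentence-embedding model would bring step 3 in line with the description above; the threshold would then need retuning, since token-overlap scores and embedding scores are not on the same scale.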
