---
name: receipt-ocr-normalizer
description: Extract vendor, line items, tax, and total from a photo of a receipt into a single clean JSON row — ready for expense systems.
title: Receipt OCR Normalizer
category: data-parsing
difficulty: beginner
author: admin
icon: 🧾
input: image
output: structured-json
phase: post
domain: data
tags: ocr,receipt-parsing,json-normalization,arithmetic-validation,vision-llm,expense-extraction,data-cleaning,vendor-deduplication,financial-data,schema-enforcement,error-recovery,manual-review-flagging
best_for:
  - expense report automation
  - receipt digitization and archival
  - accounting system data ingestion
  - fraud detection and variance analysis
---

## Description

Input: a photo or PDF of a receipt. Output: a normalized JSON row with vendor name, ISO date, currency, per-item lines (description, qty, unit price, total), subtotal, tax, tip, and grand total. Monetary amounts are numeric, not strings. Rows whose totals don't add up are flagged instead of being silently saved.
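To make the row shape concrete, here is a minimal sketch of the output as Python dataclasses. The field names (`lines`, `manual_review`, `review_reason`, and so on) and the sample values are assumptions made for illustration; the skill itself only fixes which pieces of information appear, not how they are named.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class LineItem:
    description: str
    qty: float
    unit_price: float
    total: float


@dataclass
class ReceiptRow:
    vendor: str
    date: str        # ISO 8601 date, e.g. "2024-03-15"
    currency: str    # assumed here to be an ISO 4217 code such as "USD"
    lines: List[LineItem]
    subtotal: float
    tax: float
    tip: float
    total: float
    manual_review: bool = False          # hypothetical flag name
    review_reason: Optional[str] = None  # hypothetical field name


# Made-up sample receipt, used only to show the serialized shape.
row = ReceiptRow(
    vendor="Corner Cafe",
    date="2024-03-15",
    currency="USD",
    lines=[LineItem("Latte", 2, 4.50, 9.00), LineItem("Bagel", 1, 3.25, 3.25)],
    subtotal=12.25,
    tax=1.01,
    tip=2.00,
    total=15.26,
)
print(json.dumps(asdict(row), indent=2))
```

Amounts serialize as JSON numbers rather than strings, which is what lets a downstream expense system sum them without parsing.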

## Why it works

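OCR alone gives you a noisy text dump. Most receipts can be reconstructed with high confidence if you enforce two arithmetic constraints: line items must sum to the subtotal, and subtotal plus tax plus tip must equal the grand total. A misread digit in any amount usually breaks at least one of these sums, so using them as a validation step catches most numeric OCR errors automatically. Errors in non-numeric fields such as vendor or date are not covered by this check.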
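The constraint can be sketched as a small checker. This is an illustration under assumptions, not a fixed implementation: the function name `check_arithmetic` is hypothetical, the 0.02 tolerance is the one given under "How it works", and `Decimal` is used here so that binary floating-point rounding does not cause spurious failures.

```python
from decimal import Decimal

TOLERANCE = Decimal("0.02")  # the ± 0.02 allowance from "How it works"


def check_arithmetic(line_totals, subtotal, tax, tip, total):
    """Return a list of failure descriptions; an empty list means consistent."""
    def d(x):
        # Going through str() avoids carrying float representation noise.
        return Decimal(str(x))

    failures = []

    line_sum = sum((d(x) for x in line_totals), Decimal("0"))
    if abs(line_sum - d(subtotal)) > TOLERANCE:
        failures.append(f"sum(lines) = {line_sum}, but subtotal = {d(subtotal)}")

    expected_total = d(subtotal) + d(tax) + d(tip)
    if abs(expected_total - d(total)) > TOLERANCE:
        failures.append(
            f"subtotal + tax + tip = {expected_total}, but total = {d(total)}"
        )
    return failures


# Consistent receipt: no failures.
ok = check_arithmetic([9.00, 3.25], 12.25, 1.01, 2.00, 15.26)

# Same receipt with one line total misread (3.25 read as 8.25):
# only the line-sum check fails.
bad = check_arithmetic([9.00, 8.25], 12.25, 1.01, 2.00, 15.26)
```

Returning descriptions rather than a bare boolean keeps the failing calculation available for a re-prompt or a review reason.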

## How it works

1. Run the image through a vision-capable LLM with a JSON schema that includes every field.
2. Apply the arithmetic checks: `sum(lines) == subtotal ± 0.02` and `subtotal + tax + tip == total ± 0.02`.
3. If the checks fail, re-prompt with the original image plus the failing calculation and ask for the single most likely OCR error.
4. On a second failure, emit the row with a `manual_review` flag and the reason.
5. Normalize the vendor name against a known-vendors table to prevent 'MCDONALDS' / 'Mc Donalds' / 'McDonald's' duplicates.
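The steps above can be sketched as a single control flow. Everything named here is an assumption for illustration: `extract` stands in for whatever vision-LLM call you use (it takes the image and an optional hint and returns a row dict), `KNOWN_VENDORS` is a hypothetical one-entry table, and the key-based vendor matching (lowercase, strip non-alphanumerics) is one simple option rather than the prescribed method.

```python
import re
from decimal import Decimal
from typing import Callable, Optional

TOLERANCE = Decimal("0.02")

# Hypothetical known-vendors table: normalized key -> canonical name.
KNOWN_VENDORS = {"mcdonalds": "McDonald's"}


def arithmetic_failures(row: dict) -> list:
    """Step 2: describe each failed check; empty list means consistent."""
    def d(x):
        return Decimal(str(x))

    failures = []
    line_sum = sum((d(i["total"]) for i in row["lines"]), Decimal("0"))
    if abs(line_sum - d(row["subtotal"])) > TOLERANCE:
        failures.append(f"sum(lines) = {line_sum}, subtotal = {row['subtotal']}")
    expected = d(row["subtotal"]) + d(row["tax"]) + d(row["tip"])
    if abs(expected - d(row["total"])) > TOLERANCE:
        failures.append(f"subtotal + tax + tip = {expected}, total = {row['total']}")
    return failures


def normalize_vendor(name: str) -> str:
    """Step 5: map spelling variants to one canonical vendor name."""
    key = re.sub(r"[^a-z0-9]", "", name.lower())
    return KNOWN_VENDORS.get(key, name)  # unknown vendors pass through unchanged


def process_receipt(
    image: bytes, extract: Callable[[bytes, Optional[str]], dict]
) -> dict:
    row = extract(image, None)                      # step 1
    failures = arithmetic_failures(row)             # step 2
    if failures:                                    # step 3: one re-prompt
        hint = (
            "These checks failed: " + "; ".join(failures)
            + ". Identify the single most likely OCR error and return the corrected row."
        )
        row = extract(image, hint)
        failures = arithmetic_failures(row)
    row["manual_review"] = bool(failures)           # step 4
    if failures:
        row["review_reason"] = "; ".join(failures)
    row["vendor"] = normalize_vendor(row["vendor"])
    return row
```

In this sketch the vendor name is normalized on flagged rows too, so a reviewer sees the canonical name; whether to do that is a design choice the steps leave open.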
