Skip to main content
AIDiveForge AIDiveForge
Visit Spanlens

Get This Tool

License: MIT Any use incl. commercial
Local-run terms: MIT license permits commercial use, modification, and distribution. Users can self-host Spanlens server via Docker Compose or single binary without vendor involvement or license restrictions. Source code is publicly available on GitHub.

Share This Tool

Compare This Tool
📋 Embed this tool on your site

Copy this code to embed a compact tool card:

Spanlens

FreemiumOpen SourceAPISelf-Hosted

Summary

Your OpenAI bill doubled last month and nobody can tell you which endpoint, which model, or which user caused it — that's the problem Spanlens exists to solve.

Spanlens sits in front of your LLM provider via a single baseURL change, recording every call's cost, latency, tokens, and full request-response body with no SDK rewrite required. Agent runs surface as waterfall span trees so you can identify the one step consuming 80% of wall-clock time. The model recommender flags GPT-4o calls that look like classification tasks and shows the cost delta if you swap — with numbers from your own traffic, not benchmarks. The eval and experiment layer lets you replay a fixed dataset across prompt versions before you ship, so quality regressions don't surprise you in production. PII scanning and anomaly detection run at log time, which matters when sensitive data crosses the wire at 3 a.m. with nobody watching.

Bottom line: Pick Spanlens when you need to know exactly why your LLM bill spiked and whether your new prompt version is actually better — but plan a different path if your compliance team requires audit-grade PII redaction rather than flag-and-review.

Pricing Plans

Usage-BasedLast verified 2 days ago
Price
$29/mo
Free Tier
50K requests/month, 60 req/min rate limit, 1 seat, 1 workspace, 14-day log retention

Free

Free

For personal projects and exploration

  • 50K requests/month
  • 60 req/min rate limit
  • 1 seat
  • 1 workspace
  • Unlimited projects
  • 14-day log retention
  • All core features included
  • CSV + JSON export
  • Community support

Team

$149per month

For teams that need full visibility

  • 1M requests/month
  • 1,500 req/min rate limit
  • 10 seats
  • 5 workspaces
  • Unlimited projects
  • 365-day log retention
  • Unlimited alerts
  • Email + Slack notifications
  • Webhooks
  • CSV + JSON export
  • Priority support
  • $5 per 100K extra requests

Enterprise

Custom

For large teams with advanced needs

  • Custom requests/month
  • Custom rate limit
  • Unlimited seats
  • Unlimited workspaces
  • Unlimited projects
  • 365-day log retention
  • Unlimited alerts
  • Email + Slack + Discord
  • Webhooks
  • CSV + JSON export
  • SSO (SAML/Okta)
  • Dedicated support + SLA

View full pricing on spanlens.io →

Pricing may have changed since last verified. Check the official site for current plans.

Community Performance Report Card

No community ratings yet. Be the first to rate this tool!

Best For: Teams building agents or multi-step LLM workflows needing deep tracing, Cost-sensitive workloads where model swaps and caching decisions require evidence, Organizations requiring self-hosted observability for data residency, Development teams iterating prompts and models with quantified evaluation, Production systems needing anomaly detection and PII/injection flagging

Community Benchmarks Community

No community benchmarks yet. Be the first to share a real-world data point.

  • Proxy-layer instrumentation via a single baseURL change, so existing code requires no structural rewrite and every provider call is captured from day one rather than after a manual instrumentation sprint.
  • Per-user and per-route cost attribution, which means you can identify the specific customer or endpoint burning disproportionate budget before it compounds across a billing cycle.
  • Agent waterfall trace trees with critical-path highlighting, so a slow or expensive step in a multi-agent run is pinpointed in seconds instead of reproduced manually in a staging environment.
  • Experiment runner replays a fixed dataset across prompt versions and models with quality, cost, and latency compared side by side, which means you ship with evidence that v8 is better than v7 rather than finding out the hard way in production.
  • Self-hosted deployment via Docker Compose under MIT license, so teams with data residency or audit requirements can run the full platform without sending trace data to a third-party cloud.
  • PII detection is regex-based and runs at log time as a flag — not a pre-storage redaction guarantee. Teams operating under HIPAA or SOC 2 controls where sensitive data must never reach a log store, even briefly, need a dedicated redaction layer upstream of Spanlens or a different architecture entirely.
  • The LLM-as-judge eval scoring is a single 0–1 scalar per response. Teams needing structured, multi-criteria evaluation rubrics — for example, factual accuracy scored separately from tone and policy compliance — hit the ceiling of what the built-in scorer expresses and end up maintaining a custom eval harness alongside Spanlens.
  • At high request volumes where the proxy layer adds measurable latency to every call, teams running latency-sensitive production paths at scale have moved to SDK-side instrumentation tools or full APM platforms with LLM plugins, where the observability path is out of band rather than in the critical path.

Community Reviews

No reviews yet. Be the first to share your experience.

About

Platforms
Node.js, Python, Next.js, Edge, self-hosted
API Available
Yes
Self-Hosted
Yes
Last Updated
2026-06-09T11:11:33.805Z

Best For

Who it's for

  • Teams building agents or multi-step LLM workflows needing deep tracing
  • Cost-sensitive workloads where model swaps and caching decisions require evidence
  • Organizations requiring self-hosted observability for data residency
  • Development teams iterating prompts and models with quantified evaluation
  • Production systems needing anomaly detection and PII/injection flagging

What it does well

  • Cost optimization by identifying and swapping expensive model calls with cheaper alternatives
  • Agent debugging via trace trees to pinpoint which step consumed latency or budget
  • Prompt iteration with quantified quality metrics and side-by-side A/B testing
  • Production monitoring with anomaly alerts and PII detection on sensitive workloads
  • Multi-tenant cost attribution to track per-user or per-route LLM spend

Integrations

OpenAIAnthropicGoogle GeminiAzure OpenAIMistralBedrockVertex AIOllamaLangChainLlamaIndexLangGraphVercel AI SDK

Discussion Community

No discussion yet. Sign in to start the conversation.

Spotted incorrect or missing data? Join our community of contributors.

Sign Up to Contribute

Community Notes & Tips Community

Be the first to contribute. General notes, observations, gotchas, and tips from people who use this tool day-to-day.

Frequently Asked Questions

Is Spanlens free?
Spanlens is a paid tool ($29/mo). No permanent free tier is offered.
Is Spanlens open source?
Yes. Spanlens is open source.
Does Spanlens have an API?
Yes. Spanlens exposes a developer API. See the official documentation at https://spanlens.io for details.
Can I self-host Spanlens?
Yes. Spanlens supports self-hosting on your own infrastructure.
What platforms does Spanlens support?
Spanlens is available on: Node.js, Python, Next.js, Edge, self-hosted.

Hours Saved & ROI Stories Community

Be the first to contribute. Concrete time/cost savings, with context. e.g. "Cut my code review backlog from 4h to 45m per week."

Spanlens

Most LLM cost overruns are invisible until the invoice lands. Spanlens intercepts every OpenAI, Anthropic, and Gemini call at the proxy layer — capturing model, tokens, cost, latency, and full body — then organizes that data into per-request logs, daily rollups, budget alerts, and per-user attribution. The SDK (TypeScript and Python) and a CLI that rewrites your baseURL are the two entry points; the vendor describes the CLI path as requiring one line to instrument existing code.

The differentiating feature is the combination of agent trace trees with an experiment layer that reads from the same span store. A multi-step agent run surfaces as a waterfall with critical path highlighted and per-LLM, per-tool cost attributed at each node — so debugging an 18-second support agent means clicking to the exact summarize step rather than reading logs. That same span data feeds the experiment runner: capture real traffic into a dataset, score responses with an LLM-as-judge (0–1 per version), and replay the dataset across prompt versions and models side by side before deploying. The vendor’s own example shows a prompt-plus-model swap from GPT-4o v7 to GPT-4o-mini v8 yielding a 0.11 quality improvement and a 94% cost reduction on the same 320-case dataset.

Spanlens fits teams that are already running agents or multi-step workflows and need evidence — not intuition — to justify model swaps, catch budget spikes, or sign off on a new prompt version. It is self-hostable via Docker Compose under an MIT license, which the vendor states directly, making it viable for organizations with data residency requirements. The PII scanning is regex-based at log time with API keys auto-masked before storage — adequate for flagging and review workflows, not a substitute for a dedicated data-loss-prevention pipeline if your compliance posture requires guaranteed redaction before data touches any storage layer.

The free hosted tier handles up to 50,000 requests per month; above that, paid cloud tiers apply. The SDK version at time of listing adds Ollama and LangGraph tracing, covering local LLM runs alongside hosted providers.