Get This Tool
Honcho
Summary
Most agent memory systems are glorified key-value stores — they retrieve what you wrote, not what it means, and your context window fills with facts the user mentioned once six weeks ago. Honcho is a reasoning layer that sits between your agent and its memory, built to infer patterns and surface the context that actually matters.
Every message written to Honcho triggers automatic reasoning via the vendor's Neuromancer model, which learns user psychology and behavioral patterns rather than just indexing text. The `context()` call returns a curated summary plus conversation history shaped to a token budget you set — the vendor claims 60–90% token reduction versus naive retrieval. Multi-participant sessions model each peer separately, so a group conversation doesn't collapse everyone's state into one blob. The ceiling appears when you need reasoning beyond user memory — Honcho does not run tasks, make decisions, or coordinate agents; it only informs them. Teams building full autonomous pipelines still wire Honcho into a separate orchestration layer.
Bottom line: Pick Honcho for a personal assistant that should get smarter with every conversation; plan a separate execution layer if your agent needs to do more than remember.
Pricing Plans
Usage-BasedLast verified 2 days ago- Free Tier
- Standard dreaming included with every workspace
HONCHO MEMORY - Ingestion
Store + Neuromancer reasoning
- $2.00/M tokens
- context() with no limits
- ~200ms response time
HONCHO REASONING - Minimal
Basic conclusions u2014 instant
- $0.001 per query
- Single semantic search
- One lookup
HONCHO REASONING - Low
Efficient synthesis u2014 instant
- $0.01 per query
- Conclusions with surrounding context
- Default tier
HONCHO REASONING - Medium
Steerable reasoning u2014 fast
- $0.05 per query
- Multiple searches
- Directed synthesis
HONCHO REASONING - High
Deep synthesis u2014 async
- $0.10 per query
- Multi-pass analysis
- Patterns over time
HONCHO REASONING - Max
Research-grade u2014 async
- $0.50 per query
- Exhaustive full history search
- Quantitative methods
STARTUPS (<$5M RAISED)
Subsidized startup program
- $1,000 in credits
- 12 months subsidized pricing
- Integration support
ENTERPRISE
Custom plans with dedicated support
- Custom pricing
- Forward-deployed engineers
- Dedicated integration and maintenance support
View full pricing on honcho.dev →
Pricing may have changed since last verified. Check the official site for current plans.
Community Performance Report Card
No community ratings yet. Be the first to rate this tool!
Community Benchmarks Community
Sign in to submit a benchmarkNo community benchmarks yet. Be the first to share a real-world data point.
Pros
Sign in to edit- Reasoning-first memory via the Neuromancer model infers behavioral patterns rather than returning raw stored text, so agents stop re-asking questions the user already answered three sessions ago.
- Token budget enforcement on `context()` means you get the 10K tokens that matter instead of dumping 100K of history into every prompt, which keeps per-call costs from compounding as conversation history grows.
- Multi-peer session modeling keeps each participant's state separate, so a group conversation doesn't corrupt individual user context — something flat key-value stores cannot express at all.
- AGPL-3.0 licensing with a self-hosted FastAPI deployment path means teams with data residency requirements can run the full stack on their own infrastructure rather than routing user data through a third-party cloud.
- Provider-agnostic design means swapping the underlying LLM for a cheaper or on-premises model is a configuration change, not a migration — protecting the investment when model pricing shifts.
Cons
Sign in to edit- Honcho is memory infrastructure, not an execution engine — it has no task runner, no branching logic, and no agent coordination. Teams that start with Honcho and then need agents to act on remembered context still build a full orchestration layer on top, at which point Honcho is one dependency among several rather than a standalone solution.
- AGPL-3.0 licensing blocks commercial products from embedding Honcho without open-sourcing their own code or negotiating a separate commercial license. Teams building proprietary SaaS that want to bundle memory infrastructure discover this constraint when legal reviews the dependency, and some switch to MIT-licensed alternatives or vendor-specific memory APIs instead.
- The deeper `.chat()` reasoning tiers carry per-call cost that scales with usage — for high-volume applications making frequent on-demand reasoning calls, cost modeling must happen before production, not after traffic grows.
- Neuromancer, the reasoning model that powers Honcho's memory, is a Plastic Labs proprietary model. Teams that need full auditability of every inference step in memory construction — regulated industries, for instance — cannot inspect or reproduce that reasoning without the vendor's cooperation.
Community Reviews
Sign in to write a reviewNo reviews yet. Be the first to share your experience.
About
- Platforms
- Python and TypeScript SDKs; integrations with Claude Code, OpenCode, Cursor, Hermes Agent, OpenClaw
- API Available
- Yes
- Self-Hosted
- Yes
- Last Updated
- 2026-06-09T05:26:26.124Z
Best For
Who it's for
- Teams building personal AI assistants where relationship deepens with use
- Applications requiring multi-participant sessions with cross-peer modeling
- Projects prioritizing reasoning-first memory over vector similarity search
- Developers wanting self-hostable infrastructure with open-source availability
What it does well
- Building stateful AI agents that remember user preferences and context across sessions
- Multi-agent and multi-peer conversations with separate modeling of each participant
- Extracting and reasoning about user psychology and behavior patterns from conversations
- Reducing token consumption in long-running LLM conversations with intelligent context management
- Personalizing agent responses based on accumulated understanding rather than simple retrieval
Integrations
Discussion Community
Sign in to commentNo discussion yet. Sign in to start the conversation.
Compare Honcho
Spotted incorrect or missing data? Join our community of contributors.
Sign Up to ContributeCommunity Notes & Tips Community
Sign in to contributeBe the first to contribute. General notes, observations, gotchas, and tips from people who use this tool day-to-day.
Frequently Asked Questions
- Is Honcho free?
- Honcho is a paid tool. No permanent free tier is offered.
- Is Honcho open source?
- Yes. Honcho is open source.
- Does Honcho have an API?
- Yes. Honcho exposes a developer API. See the official documentation at https://honcho.dev for details.
- Can I self-host Honcho?
- Yes. Honcho supports self-hosting on your own infrastructure.
- What platforms does Honcho support?
- Honcho is available on: Python and TypeScript SDKs; integrations with Claude Code, OpenCode, Cursor, Hermes Agent, OpenClaw.
Hours Saved & ROI Stories Community
Sign in to contributeBe the first to contribute. Concrete time/cost savings, with context. e.g. "Cut my code review backlog from 4h to 45m per week."
Curated lists that include this category
Persistent agent memory that degrades into irrelevant context dumps is the problem Honcho is designed around. The core workflow is two steps: write messages to Honcho and call `context()` when your agent needs state. Behind that call, the vendor’s Neuromancer reasoning model has already processed incoming messages, extracted behavioral hypotheses, and ranked what to return within your token budget — the vendor states this happens in approximately 200ms, fast enough to run on every turn. The API is model-agnostic, supporting OpenAI, Anthropic, and custom model backends.
The differentiating bet is reasoning over retrieval. Where vector-search memory systems return chunks that are semantically close to the query, Honcho’s docs describe a system that reasons toward conclusions — patterns across interactions, inferences about user psychology, hypotheses tested against new data. The vendor’s own benchmark page reports state-of-the-art scores on LoCoMo (89.9%), LongMem S (90.4%), and BEAM at context lengths up to 10M tokens, with published evals verifiable on GitHub. The `chat()` method provides on-demand deeper reasoning at tiered cost and latency when a single `context()` call isn’t enough.
Honcho fits cleanly into personal assistant applications, coaching tools, and any system where the relationship between user and agent is meant to deepen with use. Multi-peer session support models each participant separately, which the docs describe as enabling cross-peer reasoning without state collisions. The hard boundary: Honcho provides memory infrastructure, not execution. It does not run workflows, branch on conditions, or coordinate agents autonomously — teams that need those capabilities integrate Honcho as one component in a larger stack.
The tool ships as AGPL-3.0 open source with self-hosted deployment via FastAPI, and a managed cloud API at api.honcho.dev for teams that don’t want to run infrastructure. Pre-built integrations exist for Claude Code (persistent memory plugin), OpenClaw (WhatsApp, Telegram, Discord, Slack gateways), and the NousResearch Hermes agent. A CLI and SDK packages for Python and TypeScript are available.
