Supermemory
Summary
Your agent starts every new conversation from scratch, with no memory of what the user prefers, what they corrected last time, or what they asked three sessions ago. Supermemory is the context infrastructure layer that fixes that.
Supermemory wraps memory, retrieval, user profiling, data connectors, and document extraction into one API so your agent doesn't reassemble context from scratch on every request. The retrieval layer claims sub-300ms latency using hybrid search with reranking, and the memory layer maintains a knowledge graph that merges contradictions and evolves facts over time rather than appending chunks blindly. Connectors to Slack, Notion, Drive, Gmail, GitHub, and S3 sync automatically — no ETL pipeline to maintain. The core memory engine is proprietary and hosted-only; self-hosting requires an enterprise agreement, so teams with strict data residency requirements hit a wall before they ship.
Bottom line: Pick this when you need persistent, evolving user profiles and cross-session memory without building the retrieval infrastructure yourself — but plan a different path if your compliance posture requires the memory engine on your own infrastructure.
Hosted & API Pricing
These are the creator's hosted/API options. Self-hosting is not free: it is listed only on the Scale and Enterprise plans.
Memory
Memory graph per user. Auto profiles and fact hierarchies so agents learn in real-time.
- Plain text: $0.005
- Rich content: $0.010
- 2× cheaper than next-best, with better quality
SuperRAG
Multimodal Extraction -> Contextual Chunking -> Retrieval for agents. No embeddings or vectors required.
- Text mode: $0.001
- Rich mode: $0.002
- Available as a filesystem
Search and Traversal
Insanely cheap semantic search and graph traversal against your content.
- Hybrid search: RAG+Memory in one call
- Graph traversal across linked memories
- Configurable filters and re-ranking
- Sub-300ms p50
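To make "hybrid search" concrete, here is a minimal sketch of one common way to merge a keyword ranking with a vector ranking: reciprocal rank fusion (RRF). This is a generic technique chosen for illustration, not Supermemory's documented method. The function name, the document ids, and the constant k=60 are all assumptions.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from a keyword index and a vector index.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that appear in both lists rise to the top, which is the usual motivation for fusing before a more expensive re-ranking step.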
Operations
Additional operations for API calls
- Re-ranking
- Aggregation
- Query rewriting
- Other operations
Pricing may have changed since last verified. Check the official site for current plans.
Pricing Plans
Subscription (last verified 2 days ago)
- Price: $0 - $399+/mo
- Free Tier: $0/mo with $5/mo of usage built in
Free
For builders tinkering, prototypes and sideprojects.
- $5/mo of usage built in
- Hermes Plugin
- Supermemory MCP
- Community support
Pro
For small teams and plugin power-users.
- ~$20/mo of usage built in
- Unlimited storage
- Unlimited users
- Auto top-up available
- Google Drive, Notion & OneDrive connectors
- 2 teammates included
- OpenClaw, Claude Code and other plugins
- Email support
Max
More headroom for developers who need it.
- ~$130/mo of usage built in (6× Pro)
- Unlimited storage
- Unlimited users
- Gmail connector
- Granola connector
- Auto top-up available
- OpenClaw, Claude Code and other plugins
- Priority support
Scale
For teams running production workloads.
- ~$600/mo of usage built in
- Unlimited storage
- Unlimited users
- Up to 10 teammates
- All connectors (Gmail, GitHub, S3, Web Crawler + Pro)
- Auto top-up + spend caps
- Priority support
- SOC 2
- HIPAA BAA
- Self-hosted option
Enterprise
For organizations with committed spend, custom deployments, and security requirements.
- Custom pricing
- Air-gapped self-hosting
- Dedicated managed instance
- SOC 2
- HIPAA
- GDPR
- Custom contracts & DPA
- Unlimited usage
- Dedicated account manager
- Forward Deployed engineer
- 1:1 onboarding & integration
- Uptime SLA
- Priority Slack channel
View full pricing on supermemory.ai →
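As a rough way to read the plan tiers, the sketch below picks the smallest plan whose built-in usage covers an expected monthly usage spend. The allowances are the figures listed above (the Pro, Max, and Scale numbers are approximate in the listing). The function name is hypothetical, and the sketch deliberately ignores subscription price, connectors, and teammate limits, which also differ by plan.

```python
# Built-in monthly usage per plan in USD, as listed above.
# Pro, Max and Scale are marked "~" (approximate) in the listing.
INCLUDED_USAGE = [
    ("Free", 5),
    ("Pro", 20),
    ("Max", 130),
    ("Scale", 600),
]

def smallest_covering_plan(expected_usage_usd):
    """Return the first plan whose built-in usage covers the expected spend.

    Returns None when no listed allowance is large enough; in that case
    auto top-up or an Enterprise contract would be the options to look at.
    """
    for name, included in INCLUDED_USAGE:
        if expected_usage_usd <= included:
            return name
    return None

print(smallest_covering_plan(4))     # within the Free allowance
print(smallest_covering_plan(75))    # above Pro, within Max
print(smallest_covering_plan(1000))  # above every listed allowance
```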
Pros
- Knowledge graph memory that merges facts and resolves contradictions across sessions, which means your agent doesn't tell a user something they already corrected two conversations ago.
- Sub-300ms hybrid search with reranking baked into the retrieval layer, so you avoid building and tuning a separate retrieval pipeline to hit production latency targets.
- Persistent user profiles that carry preference, behavior, and identity context across sessions, which means a support agent or personalized chatbot doesn't reset its understanding of the user on every ticket.
- Real-time connectors to Slack, Notion, Drive, Gmail, GitHub, and S3 with automatic sync, so your agent's memory reflects live changes in the tools your users actually work in — no manual import jobs to maintain.
- Multi-format extraction for PDFs, web pages, images, and audio consolidated into one provider, which means you don't wire together separate parsing services before you can ingest mixed document types.
Cons
- The core memory engine is not self-hostable without an enterprise agreement. Teams with data residency requirements or strict policies against sending user memory to a third-party managed service cannot deploy this in production without negotiating a contract first, and most either wait on procurement or replace the memory layer with a self-managed vector store.
- The knowledge graph and memory update logic are proprietary and closed; when retrieval behaves unexpectedly — returning stale facts or failing to surface a contradiction — there is no source code to inspect. Teams debugging production retrieval issues work from API responses and vendor support, not from the system itself.
- The free tier includes only $5/mo of usage, so a team validating the tool at scale will exhaust it before they have enough production data to make a confident architecture decision, and cost exposure begins before the build is complete.
- Agent frameworks that manage their own memory or context windows require explicit integration work to hand off to Supermemory rather than their native store; teams already deep in a framework with memory primitives — LangGraph, for example — often find the integration layer adds complexity that exceeds the benefit for their specific architecture and abandon Supermemory in favor of the framework's native memory tooling.
About
- Platforms: Cloud-hosted (SaaS); MCP server; Browser plugins (Chrome); IDE integrations (Claude Code, Cursor, VS Code)
- API Available: Yes
- Self-Hosted: No for the standard plans; a self-hosted option is listed only under Scale and Enterprise
- Last Updated: 2026-06-09T06:08:35.038Z
Best For
Who it's for
- Development teams building production AI agents
- Organizations needing low-latency retrieval with personalization
- Teams integrating multiple data sources (Notion, Drive, GitHub)
- Applications requiring user profile continuity across conversations
What it does well
- AI assistants with persistent user context and preference memory
- Personalized chatbots and customer support agents that remember interaction history
- Knowledge base RAG with user-specific context injection
- Coding assistants with memory of developer patterns and codebase context
- Multi-session AI agents that learn and adapt over time
Frequently Asked Questions
- Is Supermemory free?
  There is a Free plan at $0/mo with $5/mo of usage built in. Paid plans bring the range to $0 - $399+/mo.
- Is Supermemory open source?
  Partly. The MCP server and plugins are open source under the MIT license. The core memory engine is proprietary and cloud-hosted.
- Does Supermemory have an API?
  Yes. Supermemory exposes a developer API. See the official documentation at https://supermemory.ai for details.
- When was Supermemory released?
  Supermemory was first released in 2024.
- What platforms does Supermemory support?
  Supermemory is available on: Cloud-hosted (SaaS); MCP server; Browser plugins (Chrome); IDE integrations (Claude Code, Cursor, VS Code).
Most RAG systems store chunks and return chunks. Each session begins cold, with no knowledge of who the user is, what they corrected last time, or which preferences have shifted. Supermemory replaces that append-only pattern with a knowledge graph that merges, contradicts, and forgets facts across sessions — the vendor describes this as ‘continual learning’ rather than retrieval. The core workflow is a single API that handles ingestion, retrieval, and memory updates together, with TypeScript and Python SDKs and an npx setup command the docs surface as the entry point.
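The difference between appending chunks and updating facts can be shown with a toy store. This is a minimal sketch and not Supermemory's engine, which is proprietary: the class name, the (subject, attribute) keying, and the "newest value wins" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class FactStore:
    """Toy memory keyed by (subject, attribute); a newer value supersedes the old one."""
    current: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def remember(self, subject, attribute, value):
        key = (subject, attribute)
        old = self.current.get(key)
        if old is not None and old != value:
            # Contradiction: move the old value to history so it no longer
            # shows up in the live view returned by recall().
            self.history.append((subject, attribute, old))
        self.current[key] = value

    def recall(self, subject):
        """Return only the current facts for one subject."""
        return {attr: val for (subj, attr), val in self.current.items() if subj == subject}

store = FactStore()
store.remember("user_42", "preferred_language", "Python")
store.remember("user_42", "timezone", "UTC")
store.remember("user_42", "preferred_language", "TypeScript")  # later correction

print(store.recall("user_42"))
print(store.history)
```

An append-only store would return both "Python" and "TypeScript" for the same question. Here the corrected value replaces the old one, and the old one stays available only as history.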
The differentiating feature is the Profiles primitive: preference, behavior, and identity context that persists across sessions so the agent knows the person, not just the current conversation thread. Paired with the Filesystem primitive — a POSIX-mountable memory layer where grep becomes semantic search and any file is indexed automatically — agents built on Supermemory can access context the way a developer accesses a filesystem rather than constructing retrieval queries by hand.
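One way to picture what a persisted profile buys an agent is to look at how it would be used at the start of a session. The sketch below assumes the profile has already been fetched as a plain dictionary; the function name, the profile keys, and the prompt wording are hypothetical and do not reflect the Supermemory SDK.

```python
def build_system_prompt(profile, base_prompt="You are a helpful assistant."):
    """Prepend stored profile facts so a new session does not start cold."""
    if not profile:
        return base_prompt
    # Sort keys so the prompt is stable from one session to the next.
    lines = [f"- {key}: {value}" for key, value in sorted(profile.items())]
    return base_prompt + "\nKnown about this user:\n" + "\n".join(lines)

# Hypothetical profile, standing in for whatever a memory service would return.
profile = {"tone": "concise", "editor": "VS Code"}

print(build_system_prompt(profile))
print(build_system_prompt({}))
```

With an empty profile the agent falls back to the base prompt, which is the cold-start behavior that persistent profiles are meant to avoid.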
Supermemory fits teams building production agents that serve returning users: personalized support bots, coding assistants that learn a developer’s patterns, or RAG applications that need user-specific context injected at retrieval time. It does not fit organizations that cannot send user memory data to a managed third-party service. The core memory engine is proprietary and cloud-hosted; the open-source components are the MCP server and plugins, released under MIT. Self-hosting the full engine requires an enterprise agreement — there is no self-hosted community path. Teams that hit that wall typically shift to a combination of a self-hosted vector database and a custom memory management layer, which means rebuilding what Supermemory abstracts.
The Connectors layer syncs Slack, Notion, Drive, Gmail, GitHub, and S3 in real time — the vendor states changes in connected tools are reflected in agent memory without manual imports. The Extractors layer handles PDFs, web pages, images, and audio, with chunking described as meaning-preserving across document boundaries rather than character-count splits. The vendor reports SOC 2 Type II certification and claims throughput exceeding 100 billion tokens per month across the platform.
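The contrast between character-count splits and boundary-aware chunking is easy to demonstrate. The vendor's chunking method is not documented here, so the sketch below uses simple sentence-boundary grouping as a stand-in. The function names, the regular expression, and the sample text are assumptions for illustration.

```python
import re

def chunk_by_sentence(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars where possible.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

def chunk_by_chars(text, size=200):
    """Naive fixed-width split, shown for contrast; it can cut mid-sentence."""
    return [text[i:i + size] for i in range(0, len(text), size)]

text = "Refunds take five days. Contact support for exceptions. Orders ship Monday."

print(chunk_by_sentence(text, max_chars=40))
print(chunk_by_chars(text, size=40))
```

The fixed-width split breaks the second sentence across two chunks, while the sentence-based version keeps each statement whole, so a retrieved chunk can be read on its own.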
