ArXiv Scholar
Pricing
- Model: Free
Summary
Ask an LLM what the latest advances in RAG are and it will cite papers that don't exist, authors who didn't write them, and findings that were never published — ArXiv Scholar exists because that failure mode is unacceptable in a research workflow.
ArXiv Scholar is open-source RAG infrastructure that indexes roughly 5,600 curated AI engineering papers from arXiv and exposes them through a streaming API, so agents and developers can query verified literature instead of relying on a model's training memory. The retrieval pipeline first runs an ML-based router that classifies each query as Direct, Decompose, or HyDE in 1 ms, then runs hybrid dense-plus-sparse search followed by a cross-encoder re-ranker. Every answer ships with real arXiv paper IDs attached. The hard ceiling is the corpus: 5,600 papers covering RAG, LLMs, agents, training, and inference, with nothing outside that domain and nothing beyond what was ingested through the pipeline as of June 2026. The public endpoint is rate-limited to 5 requests per minute per IP, which breaks any agent loop that needs to fire queries in bursts.
Bottom line: Solid choice for an agent or RAG prototype that needs hallucination-free retrieval over AI engineering literature — but the moment your project touches a domain outside that corpus, or your agent needs more than 5 requests per minute, you are building your own pipeline on top of the self-hosted version.
Pros
- Every answer is grounded in real arXiv paper IDs, so the hallucinated-citation failure mode that breaks LLM-powered research assistants does not surface here.
- ML-based query routing classifies incoming questions in 1ms and selects Direct, Decompose, or HyDE paths automatically, which means complex multi-part research questions get decomposed before retrieval instead of returning a single low-precision vector match.
- Hybrid retrieval fuses dense BGE embeddings with BM25 sparse search and a Jina cross-encoder re-ranker, so recall stays high on both keyword-specific queries and semantically fuzzy ones — without requiring the developer to tune separate retrieval modes manually.
- MIT license with a public GitHub repository, so teams that need higher rate limits or want to extend the corpus can self-host and modify the full pipeline without a commercial dependency.
- No authentication required on the public endpoint, so an agent or prototype can start querying the live API immediately without provisioning API keys or managing credentials.
Cons
- The corpus is fixed at roughly 5,600 AI engineering papers across RAG, LLMs, agents, training, and inference. Any query touching other domains like bioinformatics or finance, or even adjacent ML subfields, returns nothing useful, and teams building cross-domain research agents have to build or integrate a separate retrieval system.
- The public endpoint is rate-limited to 5 requests per minute per IP; an agent running a multi-step literature review that fires sequential sub-queries will start queuing or failing at that ceiling, forcing teams to either self-host the full stack or throttle their agent's query rate to the point it defeats the purpose of automation.
- The autonomous agent layer described on the roadmap is marked as planned for Q4 2026 and is not shipped — teams expecting a ready-made research agent on top of this pipeline are building that orchestration layer themselves, which means this is retrieval infrastructure, not a finished agent product.
- The ingestion pipeline ran via Google Colab notebooks against a static pull from arXiv, so the corpus does not update continuously; a team that needs retrieval over papers published after the ingestion run must re-run the pipeline themselves on the self-hosted version — there is no documented automated refresh cadence on the public endpoint.
About
- Platforms: Web API, self-hostable via GitHub
- API Available: Yes
- Self-Hosted: Yes
- Last Updated: 2026-06-18T05:16:00.441Z
Best For
Who it's for
- Researchers needing hallucination-free academic retrieval
- AI agents requiring grounded scientific knowledge
- Developers building RAG systems over domain-specific literature
What it does well
- Retrieving precise AI engineering literature with citations
- Powering autonomous AI agents for research synthesis
- Grounding LLM responses in verified academic papers
- Filtering high-signal papers across RAG, LLMs, agents, training, and inference
Frequently Asked Questions
- Is ArXiv Scholar free?
- Yes — ArXiv Scholar is fully free to use. There is no paid tier.
- Is ArXiv Scholar open source?
- Yes. ArXiv Scholar is open source.
- Does ArXiv Scholar have an API?
- Yes. ArXiv Scholar exposes a developer API. See the official documentation at https://ethereal-agents.space for details.
- Can I self-host ArXiv Scholar?
- Yes. ArXiv Scholar supports self-hosting on your own infrastructure.
- When was ArXiv Scholar released?
- ArXiv Scholar was first released in 2026.
- What platforms does ArXiv Scholar support?
- ArXiv Scholar is available on: Web API, self-hostable via GitHub.
ArXiv Scholar indexes roughly 5,600 AI engineering papers from arXiv and makes them queryable through a FastAPI streaming endpoint hosted on Hugging Face Spaces. A query hits an ML-based router first — classified in 1ms as Direct, Decompose, or HyDE — then moves through LLM-based decomposition for complex questions, hybrid retrieval combining BGE dense embeddings and BM25 sparse search with calibrated score weighting, and finally a Jina cross-encoder re-ranker before the grounded answer streams back via Server-Sent Events with source arXiv IDs attached. No authentication is required for the public endpoint.
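To make the hybrid retrieval step concrete, here is a minimal sketch of weighted score fusion between a dense retriever and BM25. It is an illustration only: the min-max normalization, the `dense_weight` value of 0.6, and the function names `min_max_normalize` and `fuse_scores` are assumptions, not the project's documented calibration or API, and the document IDs are placeholders.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to [0, 1] so dense and sparse scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Every candidate tied: treat them all as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    dense_weight: float = 0.6,  # hypothetical weight, not the project's value
) -> List[Tuple[str, float]]:
    """Combine normalized dense and BM25 scores with a weighted sum."""
    dense_n = min_max_normalize(dense)
    sparse_n = min_max_normalize(sparse)
    fused = {}
    for doc_id in set(dense_n) | set(sparse_n):
        # A document missing from one retriever contributes 0 from that side.
        fused[doc_id] = (
            dense_weight * dense_n.get(doc_id, 0.0)
            + (1.0 - dense_weight) * sparse_n.get(doc_id, 0.0)
        )
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    dense_hits = {"A": 0.9, "B": 0.7, "C": 0.5}   # placeholder cosine scores
    sparse_hits = {"B": 10.0, "D": 4.0}           # placeholder BM25 scores
    for doc_id, score in fuse_scores(dense_hits, sparse_hits):
        print(doc_id, round(score, 3))
```

In this toy run, document B ranks first because both retrievers return it, which is the behavior hybrid search is meant to reward. The fused list would then go to the cross-encoder re-ranker.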
The differentiating engineering choice is the query routing layer. Rather than sending every query through the same retrieval path, the router decides whether a question is simple enough for direct retrieval, complex enough to decompose into atomic sub-queries with metadata filters, or abstract enough to benefit from Hypothetical Document Embeddings. That routing decision happens before any vector search, which means complex research questions get decomposed into sub-queries that match the paper corpus more precisely — rather than returning a single dense retrieval pass against a vague question.
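The routing idea can be sketched as a classify-then-dispatch step. The real router is an ML classifier; the keyword rules in `classify_query` below are a toy stand-in, and `plan_searches`, `decompose`, and `hypothesize` are hypothetical names chosen for illustration rather than anything from the project's codebase.

```python
from enum import Enum
from typing import Callable, List


class Route(Enum):
    DIRECT = "direct"
    DECOMPOSE = "decompose"
    HYDE = "hyde"


def classify_query(query: str) -> Route:
    """Toy stand-in for the ML router: keyword rules, NOT the real classifier."""
    q = query.lower()
    if " and " in q or " vs " in q or "compare" in q:
        return Route.DECOMPOSE  # multi-part question
    if q.startswith(("why", "how might", "what if")):
        return Route.HYDE       # abstract question
    return Route.DIRECT


def plan_searches(
    query: str,
    route: Route,
    decompose: Callable[[str], List[str]],
    hypothesize: Callable[[str], str],
) -> List[str]:
    """Return the text(s) to send to retrieval for the chosen route."""
    if route is Route.DECOMPOSE:
        # In the described pipeline an LLM writes atomic sub-queries.
        return decompose(query)
    if route is Route.HYDE:
        # HyDE searches with a hypothetical answer document, not the question.
        return [hypothesize(query)]
    return [query]


if __name__ == "__main__":
    # Stubs standing in for LLM calls.
    split_on_and = lambda q: [part.strip() for part in q.split(" and ")]
    fake_answer = lambda q: f"A paper abstract answering: {q}"

    for question in (
        "What is speculative decoding?",
        "Compare LoRA and full fine-tuning",
        "Why do long-context models lose information mid-prompt?",
    ):
        route = classify_query(question)
        print(route.value, plan_searches(question, route, split_on_and, fake_answer))
```

The point of the sketch is the ordering: the route is chosen before any vector search runs, so only queries that need decomposition or a hypothetical document pay for the extra LLM call.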
This tool fits tightly inside one lane: an agent or developer that needs grounded answers about RAG architectures, LLM training, inference optimization, or agent design, with zero tolerance for hallucinated citations. It does not fit workflows that need coverage beyond AI engineering literature. The public API’s rate limit of 5 requests per minute per IP is a hard constraint for any agent running a multi-step research loop. Teams that hit that wall are directed toward the self-hosted path via the open-source GitHub repository under the MIT license.
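An agent that stays on the public endpoint has to pace itself on the client side. The sketch below is one way to do that with a sliding-window limiter set to the 5-requests-per-minute ceiling; the class name `SlidingWindowLimiter` and its parameters are illustrative assumptions, and the server's actual windowing behavior is not documented here, so treat the numbers as a conservative guess.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` within any `period`-second window."""

    def __init__(
        self,
        max_calls: int = 5,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()  # timestamps of recent calls

    def _evict(self, now: float) -> None:
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()

    def acquire(self) -> float:
        """Block until a call is allowed; return the seconds spent waiting."""
        waited = 0.0
        now = self._clock()
        self._evict(now)
        while len(self._calls) >= self.max_calls:
            # Wait until the oldest call leaves the window.
            delay = max(self._calls[0] + self.period - now, 0.0)
            self._sleep(delay)
            waited += delay
            now = self._clock()
            self._evict(now)
        self._calls.append(now)
        return waited


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_calls=5, period=60.0)
    for i in range(3):
        limiter.acquire()  # call this before each request to the API
        print(f"request {i + 1} may be sent")
```

Calling `acquire()` before every sub-query keeps a multi-step loop under the limit, at the cost of stalling for up to a minute once five calls have gone out. That stall is the practical reason heavier workloads end up on the self-hosted path.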
The vendor states the codebase uses Qdrant as the vector store with embeddings pushed to Qdrant Cloud during ingestion. Ingestion was run via Google Colab notebooks. An autonomous agent layer and an evaluation benchmark targeting AI engineering tasks are described on the roadmap as planned for Q4 2026 — neither is shipped as of the published roadmap.
