Is ArXiv Scholar free?

Yes. ArXiv Scholar is free to use, and there is no paid tier.

Is ArXiv Scholar open source?

Yes. ArXiv Scholar is open source.

Does ArXiv Scholar have an API?

Yes. ArXiv Scholar exposes a developer API. See the official documentation at https://ethereal-agents.space for details.

Can I self-host ArXiv Scholar?

Yes. ArXiv Scholar supports self-hosting on your own infrastructure.

When was ArXiv Scholar released?

ArXiv Scholar was first released in June 2026.

What platforms does ArXiv Scholar support?

ArXiv Scholar is available as a web API and can be self-hosted from its GitHub repository.

License: MIT (any use, including commercial)

Local-run terms: MIT license permits commercial use, modification, and distribution with attribution.

ArXiv Scholar

Free, Open Source, API, Self-Hosted

Pricing

Model: Free

Summary

Ask a general-purpose LLM about the latest advances in RAG and it may cite papers that don't exist, authors who didn't write them, and findings that were never published. ArXiv Scholar exists because that failure mode is unacceptable in a research workflow.

ArXiv Scholar is open-source RAG infrastructure that indexes roughly 5,600 curated AI engineering papers from arXiv and exposes them through a streaming API, so agents and developers can query verified literature instead of relying on a model's training memory. The retrieval pipeline first runs an ML-based router that classifies each query in about 1 ms as Direct, Decompose, or HyDE, then runs hybrid dense-plus-sparse search followed by a cross-encoder re-ranker. Every answer ships with real arXiv paper IDs attached. The hard ceiling is the corpus: 5,600 papers covering RAG, LLMs, agents, training, and inference, with nothing outside that domain and nothing beyond what was ingested through the pipeline as of June 2026. The public endpoint is rate-limited to 5 requests per minute per IP, which breaks any agent loop that needs to fire queries in bursts.

Bottom line: A solid choice for an agent or RAG prototype that needs hallucination-free retrieval over AI engineering literature. The moment your project touches a domain outside that corpus, or your agent needs more than 5 requests per minute, you are building your own pipeline on top of the self-hosted version.

Inference Engines & Infra RAG Frameworks

Released June 2026

Pros

Every answer is grounded in real arXiv paper IDs, so the hallucinated-citation failure mode that breaks LLM-powered research assistants does not surface here.
ML-based query routing classifies incoming questions in 1ms and selects Direct, Decompose, or HyDE paths automatically, which means complex multi-part research questions get decomposed before retrieval instead of returning a single low-precision vector match.
Hybrid retrieval fuses dense BGE embeddings with BM25 sparse search and a Jina cross-encoder re-ranker, so recall stays high on both keyword-specific queries and semantically fuzzy ones — without requiring the developer to tune separate retrieval modes manually.
MIT license with a public GitHub repository, so teams that need higher rate limits or want to extend the corpus can self-host and modify the full pipeline without a commercial dependency.
No authentication required on the public endpoint, so an agent or prototype can start querying the live API immediately without provisioning API keys or managing credentials.
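
The hybrid retrieval step can be illustrated with a small score-fusion sketch. This is not ArXiv Scholar's code: the min-max normalization, the `dense_weight` parameter, the function names, and the placeholder document IDs are all assumptions chosen to show one common way to combine dense-embedding and BM25 scores that live on different scales.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    dense_weight: float = 0.6,  # hypothetical weight, not the project's value
) -> List[Tuple[str, float]]:
    """Weighted sum of normalized dense and sparse scores, best first."""
    dense_n = min_max_normalize(dense)
    sparse_n = min_max_normalize(sparse)
    fused = {}
    # A document missing from one retriever contributes 0.0 for that side.
    for doc_id in set(dense_n) | set(sparse_n):
        fused[doc_id] = (
            dense_weight * dense_n.get(doc_id, 0.0)
            + (1.0 - dense_weight) * sparse_n.get(doc_id, 0.0)
        )
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Placeholder IDs and scores, purely illustrative.
    dense_hits = {"paper_a": 0.9, "paper_b": 0.5, "paper_c": 0.1}
    bm25_hits = {"paper_b": 12.0, "paper_c": 3.0, "paper_d": 6.0}
    for doc_id, score in fuse_scores(dense_hits, bm25_hits, dense_weight=0.5):
        print(doc_id, round(score, 3))
```

In a pipeline like the one described, the fused list would then go to a cross-encoder re-ranker, which rescores the top candidates against the full query text.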

Cons

The corpus is fixed at roughly 5,600 AI engineering papers across RAG, LLMs, agents, training, and inference. Queries about unrelated domains such as bioinformatics or finance, or even adjacent ML subfields, return nothing useful, so teams building cross-domain research agents have to build or integrate a separate retrieval system.
The public endpoint is rate-limited to 5 requests per minute per IP; an agent running a multi-step literature review that fires sequential sub-queries will start queuing or failing at that ceiling, forcing teams to either self-host the full stack or throttle their agent's query rate to the point it defeats the purpose of automation.
The autonomous agent layer described on the roadmap is planned for Q4 2026 and has not shipped. Teams expecting a ready-made research agent on top of this pipeline must build that orchestration layer themselves: this is retrieval infrastructure, not a finished agent product.
The ingestion pipeline ran via Google Colab notebooks against a static pull from arXiv, so the corpus does not update continuously; a team that needs retrieval over papers published after the ingestion run must re-run the pipeline themselves on the self-hosted version — there is no documented automated refresh cadence on the public endpoint.
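
For agents that stay on the public endpoint, the 5-requests-per-minute ceiling can be respected client-side. The following is a minimal sketch of a sliding-window throttle; the class name and its parameters are hypothetical and nothing here comes from the ArXiv Scholar codebase. The clock and sleep functions are injectable so the behaviour can be checked without waiting a real minute.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowThrottle:
    """Allow at most `max_calls` acquisitions in any `period`-second window."""

    def __init__(
        self,
        max_calls: int = 5,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps: Deque[float] = deque()  # times of recent acquisitions

    def acquire(self) -> float:
        """Block until a slot is free; return the total seconds waited."""
        waited = 0.0
        while True:
            now = self._clock()
            # Forget acquisitions that have left the window.
            while self._stamps and now - self._stamps[0] >= self.period:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                self._stamps.append(now)
                return waited
            # Window is full: sleep until the oldest acquisition expires.
            delay = self.period - (now - self._stamps[0])
            self._sleep(delay)
            waited += delay
```

An agent would call `throttle.acquire()` immediately before each request. A six-step literature review then takes a little over a minute instead of failing on the sixth call, which is the trade-off described in the rate-limit item above.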

About

Platforms: Web API, self-hostable via GitHub
API Available: Yes
Self-Hosted: Yes
Last Updated: 2026-06-18T05:16:00.441Z

Best For

Who it's for

Researchers needing hallucination-free academic retrieval
AI agents requiring grounded scientific knowledge
Developers building RAG systems over domain-specific literature

What it does well

Retrieving precise AI engineering literature with citations
Powering autonomous AI agents for research synthesis
Grounding LLM responses in verified academic papers
Filtering high-signal papers across RAG, LLMs, agents, training, and inference

Integrations

Hugging Face Spaces, Qdrant, FastAPI

ArXiv Scholar indexes roughly 5,600 AI engineering papers from arXiv and makes them queryable through a FastAPI streaming endpoint hosted on Hugging Face Spaces. A query hits an ML-based router first — classified in 1ms as Direct, Decompose, or HyDE — then moves through LLM-based decomposition for complex questions, hybrid retrieval combining BGE dense embeddings and BM25 sparse search with calibrated score weighting, and finally a Jina cross-encoder re-ranker before the grounded answer streams back via Server-Sent Events with source arXiv IDs attached. No authentication is required for the public endpoint.
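
To show what consuming a Server-Sent Events answer stream can look like, here is a minimal parser sketch. The SSE framing (`data:` lines, blank-line event terminator, `:` comment lines) follows the standard format, but the JSON payload shape with `token` and `sources` fields is an assumption made for illustration. The listing does not document the actual event schema or endpoint path, so check the official documentation before relying on either.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete event in an SSE line stream."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is dropped
        if field == "data":
            buffer.append(value)
    # An event with no terminating blank line is discarded.


def collect_answer(lines: Iterable[str]) -> Tuple[str, List[str]]:
    """Join streamed text and gather source IDs (hypothetical field names)."""
    parts: List[str] = []
    sources: List[str] = []
    for payload in iter_sse_data(lines):
        event = json.loads(payload)
        parts.append(event.get("token", ""))
        for source_id in event.get("sources", []):
            if source_id not in sources:
                sources.append(source_id)
    return "".join(parts), sources


if __name__ == "__main__":
    # Hand-written sample stream with a placeholder source ID.
    sample = [
        'data: {"token": "Hybrid "}\n',
        "\n",
        ": keep-alive\n",
        'data: {"token": "search", "sources": ["PLACEHOLDER-ID-1"]}\n',
        "\n",
    ]
    print(collect_answer(sample))
```

With a real HTTP client, `lines` would be the response body iterated line by line. The parser itself has no network dependency, so it can be tested against recorded streams.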

The differentiating engineering choice is the query routing layer. Rather than sending every query through the same retrieval path, the router decides whether a question is simple enough for direct retrieval, complex enough to decompose into atomic sub-queries with metadata filters, or abstract enough to benefit from Hypothetical Document Embeddings. That routing decision happens before any vector search, which means complex research questions get decomposed into sub-queries that match the paper corpus more precisely — rather than returning a single dense retrieval pass against a vague question.
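
The three-way routing idea can be sketched as a dispatch function. The real router is described as an ML classifier; the keyword heuristics below are a deliberately crude stand-in, and `classify`, `plan_searches`, and the `draft_hypothetical` callback are hypothetical names. The sketch also omits the metadata filters mentioned above.

```python
import re
from enum import Enum
from typing import Callable, List


class Route(Enum):
    DIRECT = "direct"
    DECOMPOSE = "decompose"
    HYDE = "hyde"


def classify(query: str) -> Route:
    """Toy heuristic router; a trained classifier would replace this."""
    q = query.lower()
    if re.search(r"\b(compare|versus|vs)\b", q) or " and " in q:
        return Route.DECOMPOSE  # looks like a multi-part question
    if q.startswith(("why ", "how might ", "what if ")):
        return Route.HYDE  # abstract question, search via a drafted passage
    return Route.DIRECT


def plan_searches(query: str, draft_hypothetical: Callable[[str], str]) -> List[str]:
    """Return the text(s) that would be sent to retrieval for this query."""
    route = classify(query)
    if route is Route.DECOMPOSE:
        # Split on simple connectives to get sub-queries.
        parts = re.split(
            r"\s+(?:and|versus|vs)\s+", query.rstrip("?"), flags=re.IGNORECASE
        )
        return [p.strip() for p in parts if p.strip()]
    if route is Route.HYDE:
        # HyDE: retrieve with a drafted hypothetical answer, not the question.
        return [draft_hypothetical(query)]
    return [query]
```

In a real system `draft_hypothetical` would call an LLM to write a short passage that answers the question, and that passage would be embedded for dense search. The point of the sketch is only that the routing decision changes what gets searched before any vector lookup happens.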

This tool fits tightly inside one lane: an agent or developer that needs grounded answers about RAG architectures, LLM training, inference optimization, or agent design, with zero tolerance for hallucinated citations. It does not fit workflows that need coverage beyond AI engineering literature. The public API’s rate limit of 5 requests per minute per IP is a hard constraint for any agent running a multi-step research loop. Teams that hit that wall are directed toward the self-hosted path via the open-source GitHub repository under the MIT license.

The vendor states the codebase uses Qdrant as the vector store with embeddings pushed to Qdrant Cloud during ingestion. Ingestion was run via Google Colab notebooks. An autonomous agent layer and an evaluation benchmark targeting AI engineering tasks are described on the roadmap as planned for Q4 2026 — neither is shipped as of the published roadmap.
