arxiv.org API
Search arXiv research papers by keyword, author, title, or category. Fetch full metadata by arXiv ID. Browse the complete category taxonomy. 3 endpoints.
curl -X GET 'https://api.parse.bot/scraper/9380e1b0-fae2-4340-9056-3d416f86c775/search_papers?query=transformer&max_results=2' \
  -H "X-API-Key: $PARSE_API_KEY"
Search arXiv for research papers using keyword, author, title, and category filters. Supports combining multiple search fields with AND logic. At least one search parameter (query, author, title, or category) must be provided. Returns paginated results.
| Param | Type | Description |
|---|---|---|
| query | string | General keyword search across all fields |
| start | integer | Starting index for pagination (0-based) |
| title | string | Title keywords to search for |
| author | string | Author name to search for |
| sort_by | string | Sort field: relevance, lastUpdatedDate, or submittedDate |
| category | string | arXiv category code (e.g., cs.AI, math.CO, hep-th) |
| sort_order | string | Sort order: descending or ascending |
| max_results | integer | Maximum number of results to return (max 100) |
{
  "type": "object",
  "fields": {
    "papers": "array of paper objects with arxiv_id, title, authors, summary, categories, primary_category, published, updated, pdf_url, abs_url, comment, journal_ref, doi",
    "start_index": "integer pagination offset",
    "total_results": "integer total number of matching papers"
  },
  "sample": {
    "data": {
      "papers": [
        {
          "doi": null,
          "title": "PyramidTNT: Improved Transformer-in-Transformer Baselines with Pyramid Architecture",
          "abs_url": "https://arxiv.org/abs/2201.00978v1",
          "authors": [
            "Kai Han",
            "Jianyuan Guo",
            "Yehui Tang",
            "Yunhe Wang"
          ],
          "comment": "Tech Report",
          "pdf_url": "https://arxiv.org/pdf/2201.00978v1",
          "summary": "Transformer networks have achieved great progress...",
          "updated": "2022-01-04T04:56:57Z",
          "arxiv_id": "2201.00978v1",
          "published": "2022-01-04T04:56:57Z",
          "categories": [
            "cs.CV"
          ],
          "journal_ref": null,
          "primary_category": "cs.CV"
        }
      ],
      "start_index": 0,
      "total_results": 167798
    },
    "status": "success"
  }
}
About the arxiv.org API
The arXiv API covers 3 endpoints that let you search millions of research papers, retrieve detailed metadata for any paper by its arXiv ID, and browse the full category taxonomy across disciplines. The search_papers endpoint accepts keyword, author, title, and category filters simultaneously, returning paginated results with abstracts, authors, PDF links, and category assignments.
Searching Papers
The search_papers endpoint accepts up to four independent search fields — query (general keyword), author, title, and category — combined with AND logic. At least one must be provided. Results are paginated via start (0-based offset) and max_results (up to 100 per call). The total_results field in the response tells you how many papers match overall. Sort options include relevance, lastUpdatedDate, and submittedDate in either direction via sort_order.
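As a rough illustration of the rules above, the following Python sketch builds a search_papers URL and rejects requests that break the documented constraints. The base URL is copied from the curl example on this page. The function name build_search_url, the default of 10 results, and the lower bound of 1 on max_results are assumptions made for this example, not documented behavior.

```python
from urllib.parse import urlencode

# Base URL copied from the curl example on this page.
BASE_URL = "https://api.parse.bot/scraper/9380e1b0-fae2-4340-9056-3d416f86c775"

SEARCH_FIELDS = ("query", "author", "title", "category")
SORT_FIELDS = ("relevance", "lastUpdatedDate", "submittedDate")
SORT_ORDERS = ("ascending", "descending")


def build_search_url(*, start=0, max_results=10, sort_by=None,
                     sort_order=None, **fields):
    """Return a search_papers URL (hypothetical helper, not part of the API).

    The default max_results=10 and the lower bound of 1 are assumptions.
    """
    unknown = set(fields) - set(SEARCH_FIELDS)
    if unknown:
        raise ValueError(f"unknown search fields: {sorted(unknown)}")
    # Drop empty values so only real filters are sent.
    filters = {name: value for name, value in fields.items() if value}
    if not filters:
        raise ValueError("provide at least one of: query, author, title, category")
    if start < 0:
        raise ValueError("start is a 0-based offset and cannot be negative")
    if not 1 <= max_results <= 100:
        raise ValueError("max_results must be between 1 and 100")
    if sort_by is not None and sort_by not in SORT_FIELDS:
        raise ValueError(f"sort_by must be one of {SORT_FIELDS}")
    if sort_order is not None and sort_order not in SORT_ORDERS:
        raise ValueError(f"sort_order must be one of {SORT_ORDERS}")

    params = {**filters, "start": start, "max_results": max_results}
    if sort_by:
        params["sort_by"] = sort_by
    if sort_order:
        params["sort_order"] = sort_order
    return f"{BASE_URL}/search_papers?{urlencode(params)}"


# Two filters in one call; the API combines them with AND logic.
url = build_search_url(category="cs.AI", author="Kai Han",
                       sort_by="submittedDate", sort_order="descending")
```

The resulting URL still needs the X-API-Key header when it is sent, as in the curl example.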
Each paper object in the papers array includes arxiv_id, title, authors (array of names), summary (full abstract text), categories (all assigned category codes), primary_category, published and updated timestamps in ISO 8601, and direct pdf_url and abs_url links.
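The pagination described above can be sketched as a generator that keeps advancing start until total_results is reached. This is a minimal sketch: iter_papers and fake_fetch are hypothetical names, and fake_fetch returns made-up placeholder records shaped like the sample response (a data object holding papers, start_index, and total_results) so the loop can run without a network call or an API key.

```python
from typing import Callable, Iterator


def iter_papers(fetch_page: Callable[[int, int], dict],
                page_size: int = 25) -> Iterator[dict]:
    """Yield every paper object, one page at a time.

    fetch_page(start, max_results) is a caller-supplied function that
    returns one decoded search_papers response.
    """
    start = 0
    while True:
        data = fetch_page(start, page_size)["data"]
        papers = data["papers"]
        yield from papers
        start += len(papers)
        # Stop on an empty page or once every match has been seen.
        if not papers or start >= data["total_results"]:
            break


def fake_fetch(start: int, max_results: int) -> dict:
    """Stand-in for a real request; the records are illustrative only."""
    corpus = [{"arxiv_id": f"2201.0000{i}v1", "title": f"Paper {i}"}
              for i in range(5)]
    return {
        "status": "success",
        "data": {
            "papers": corpus[start:start + max_results],
            "start_index": start,
            "total_results": len(corpus),
        },
    }


ids = [paper["arxiv_id"] for paper in iter_papers(fake_fetch, page_size=2)]
```

In real use, fetch_page would issue the HTTP request with the X-API-Key header and return the decoded JSON body. Each page counts as one API call.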
Fetching a Specific Paper
The get_paper endpoint takes a single arxiv_id and returns the full metadata record for that paper. The ID can be in current format (2301.00001), versioned format (2301.00001v1), or legacy format (hep-th/9901001). The response includes the doi field (string or null), an optional comment field for author-provided notes, and all the same title, abstract, author, category, and link fields returned by search.
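To check an ID before spending a credit on get_paper, the three accepted formats can be told apart with two regular expressions. The patterns below are an assumption based on the examples in this section (four digits, a dot, then four or five digits for current IDs; an archive name, a slash, and seven digits for legacy IDs), and parse_arxiv_id is a hypothetical helper, not part of the API.

```python
import re

# Current scheme, e.g. 2301.00001 or 2301.00001v1 (pattern is an assumption).
_CURRENT = re.compile(r"^(?P<base>\d{4}\.\d{4,5})(?:v(?P<version>\d+))?$")
# Legacy scheme, e.g. hep-th/9901001 (pattern is an assumption).
_LEGACY = re.compile(
    r"^(?P<base>[a-z\-]+(?:\.[A-Z]{2})?/\d{7})(?:v(?P<version>\d+))?$"
)


def parse_arxiv_id(arxiv_id: str):
    """Return (base_id, version_or_None, kind) or raise ValueError."""
    candidate = arxiv_id.strip()
    for kind, pattern in (("current", _CURRENT), ("legacy", _LEGACY)):
        match = pattern.match(candidate)
        if match:
            version = match.group("version")
            return match.group("base"), int(version) if version else None, kind
    raise ValueError(f"not a recognised arXiv ID: {arxiv_id!r}")


current = parse_arxiv_id("2301.00001")
versioned = parse_arxiv_id("2301.00001v1")
legacy = parse_arxiv_id("hep-th/9901001")
```

Splitting off the version number also makes it straightforward to request successive revisions of the same paper by appending v1, v2, and so on to the base ID.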
Category Taxonomy
get_category_taxonomy returns the complete arXiv subject hierarchy. Each entry in the groups array contains a group_name (e.g., Computer Science, Physics, Mathematics, Quantitative Biology, Quantitative Finance) and a categories array of objects with id, name, and description. Pass the optional group parameter to filter down to a single discipline, which is useful when building category pickers or validating category codes before passing them to search_papers.
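Validating category codes against the taxonomy might look like the sketch below. It assumes the decoded payload exposes groups at the top level of the object passed in (the page does not show whether it is wrapped in a data envelope like the search response). SAMPLE_TAXONOMY is a hand-written, heavily truncated stand-in with placeholder descriptions, and both function names are hypothetical.

```python
# Hand-written stand-in for a get_category_taxonomy payload (illustrative only).
SAMPLE_TAXONOMY = {
    "groups": [
        {
            "group_name": "Computer Science",
            "categories": [
                {"id": "cs.AI", "name": "Artificial Intelligence",
                 "description": "..."},
                {"id": "cs.CV", "name": "Computer Vision and Pattern Recognition",
                 "description": "..."},
            ],
        },
        {
            "group_name": "Mathematics",
            "categories": [
                {"id": "math.CO", "name": "Combinatorics", "description": "..."},
            ],
        },
    ]
}


def category_ids(taxonomy: dict) -> set:
    """Collect every category id across all groups."""
    return {
        category["id"]
        for group in taxonomy["groups"]
        for category in group["categories"]
    }


def unknown_categories(codes, taxonomy: dict) -> list:
    """Return the codes that do not appear in the taxonomy, sorted."""
    valid = category_ids(taxonomy)
    return sorted(code for code in codes if code not in valid)


rejected = unknown_categories(["cs.AI", "cs.XX"], SAMPLE_TAXONOMY)
```

Since the taxonomy changes rarely, caching the set of ids locally avoids spending a credit on every validation.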
- Build a literature review tool that queries search_papers by author name and exports abstracts to a spreadsheet
- Monitor a research area by polling search_papers with a category code like cs.AI sorted by submittedDate to surface new papers
- Resolve a list of arXiv IDs to full metadata records including DOI and PDF URL using get_paper in bulk
- Populate a category selector UI from get_category_taxonomy so users can filter searches by valid arXiv subject codes
- Cross-reference papers in a citation graph by fetching doi and abs_url fields from get_paper for each node
- Ingest paper metadata into a RAG pipeline using summary and authors fields from search_papers results
- Track version history of a specific paper by fetching versioned IDs (e.g., 2301.00001v1, 2301.00001v2) via get_paper
| Tier | Price | Credits/month | Rate limit |
|---|---|---|---|
| Free | $0/mo | 100 | 5 req/min |
| Hobby | $30/mo | 1,000 | 20 req/min |
| Developer | $100/mo | 5,000 | 250 req/min |
One credit = one API call regardless of which marketplace API you call. Exceeding the rate limit returns a 429 response. Authenticate with the X-API-Key header.
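One way to cope with 429 responses is to retry with exponential backoff. The sketch below is an assumption about client-side handling, not documented behavior: call_with_backoff is a hypothetical helper, send is any caller-supplied function that performs one request and returns a (status_code, body) pair, and the delay schedule is arbitrary.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    The wait doubles after each 429: base_delay, 2 * base_delay, and so on.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        # No point sleeping after the final failed attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")


# Demonstration with canned responses instead of real HTTP calls.
canned = iter([(429, None), (429, None), (200, {"status": "success"})])
waits = []
result = call_with_backoff(lambda: next(canned), sleep=waits.append)
```

Whether a rejected request still consumes a credit is not stated on this page, so pacing requests to stay under the tier limit (for example, one request every 12 seconds on the Free tier's 5 req/min) is the safer default.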
Does arXiv have an official developer API?
What does `get_paper` return that `search_papers` does not?
get_paper returns two additional fields not present in search results: doi (a DOI string or null) and comment (an author-supplied note string or null, often containing page counts, conference names, or revision notes). Both endpoints return title, authors, summary, categories, timestamps, and URLs.
How does pagination work in `search_papers`?
Use the start parameter (0-based) combined with max_results (maximum 100) to page through results. The response includes total_results so you can compute how many pages exist. For example, to fetch the second page of 25 results, set start=25 and max_results=25.
Does the API return full paper PDFs or citation counts?
Responses include pdf_url (a direct link to the PDF on arXiv) and abs_url (the abstract page URL), but the API does not fetch or return the PDF content itself. Citation counts and reference lists are also not exposed, because they are not part of arXiv's own metadata. You can fork this API on Parse and revise it to add an endpoint that fetches and parses the PDF content or integrates citation data from a source like Semantic Scholar.
Are preprint versions other than the latest accessible?
With get_paper you can pass a versioned ID like 2301.00001v1 or 2301.00001v2 to retrieve metadata for a specific revision. However, search_papers results always reflect the latest version of each paper. Version-specific search filtering is not currently supported. You can fork this API on Parse and revise it to add version-aware search behavior.