Metadata-Version: 2.4
Name: pyopenalex
Version: 0.4.0
Summary: A Pydantic-powered Python client for the OpenAlex API
Project-URL: Homepage, https://github.com/nthomsencph/pyopenalex
Project-URL: Repository, https://github.com/nthomsencph/pyopenalex
Project-URL: Issues, https://github.com/nthomsencph/pyopenalex/issues
Author-email: "Nicolai B. Thomsen" <nbth.msc@cbs.dk>
License-Expression: MIT
License-File: LICENSE
Keywords: academic,api,openalex,pydantic,research,scholarly
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: httpx>=0.27
Requires-Dist: pydantic-settings>=2.0
Requires-Dist: pydantic>=2.0
Requires-Dist: tenacity>=8.0
Description-Content-Type: text/markdown

<p align="center">
  <img src="https://raw.githubusercontent.com/nthomsencph/pyopenalex/main/logo.png" alt="PyOpenAlex" width="320">
</p>

<p align="center">
  A Pydantic-powered Python client for the <a href="https://openalex.org">OpenAlex API</a>.
</p>

OpenAlex is an open catalog of the global research system: 270M+ scholarly works, 90M+ authors, and 100K+ sources. PyOpenAlex gives you typed access to all of it with an API that follows the patterns of FastAPI and Pydantic.

```python
from pyopenalex import OpenAlex, gt

with OpenAlex() as client:
    for work in client.works.filter(cited_by_count=gt(1000), publication_year=2024).limit(10):
        print(work.to_markdown(limit_abstract=150))
```

## Installation

```bash
pip install pyopenalex
```

Requires Python 3.10+.

## Quick Start

```python
from pyopenalex import OpenAlex

client = OpenAlex(api_key="your-key")  # or set OPENALEX_API_KEY env var

# Get a single work by ID
work = client.works.get("W2741809807")
print(work.title)
print(work.doi)
print(work.abstract)  # reconstructed from inverted index

# Search for authors
results = client.authors.search("Einstein").get(5)
for author in results.results:
    print(f"{author.name}: {author.works_count} works")
```

## Entities

PyOpenAlex supports all OpenAlex entity types:

```python
# Core entities
client.works              # Scholarly documents (articles, books, datasets)
client.authors            # Researcher profiles
client.sources            # Journals, repositories, conferences
client.institutions       # Universities, research organizations
client.topics             # Subject classifications
client.keywords           # Extracted keywords
client.publishers         # Publishing organizations
client.funders            # Funding agencies

# Topic hierarchy
client.domains            # Top-level categories (4 total)
client.fields             # Second-level categories (26 total)
client.subfields          # Third-level categories (254 total)

# Reference entities
client.sdgs               # UN Sustainable Development Goals
client.countries          # Countries
client.continents         # Continents
client.languages          # Languages
client.work_types         # Work types (article, book, dataset, etc.)
client.source_types       # Source types (journal, repository, etc.)
client.institution_types  # Institution types (education, company, etc.)
client.licenses           # Open access licenses (CC BY, etc.)
client.awards             # Research grants and funding awards
```

Every entity is a Pydantic model with fully typed fields and convenience aliases:

```python
work = client.works.get("W2741809807")

work.title                              # str | None
work.year                               # alias for publication_year
work.citations                          # alias for cited_by_count
work.authors                            # list of author name strings
work.name                               # alias for display_name
work.abstract                           # reconstructed from inverted index
work.open_access.is_oa                  # bool
work.open_access.oa_status              # str (gold, green, hybrid, bronze, diamond, closed)
work.authorships[0].author.display_name # str | None
work.authorships[0].institutions        # list[DehydratedInstitution]
work.primary_location.source            # DehydratedSource | None
```

The `.name` and `.citations` aliases are available on all entity types.

## Looking Up Entities

### By OpenAlex ID

```python
work = client.works.get("W2741809807")
author = client.authors.get("A5023888391")
```

### By External ID

Works accept DOIs, authors accept ORCIDs, institutions accept ROR IDs:

```python
work = client.works.get("https://doi.org/10.7717/peerj.4375")
author = client.authors.get("https://orcid.org/0000-0001-6187-6610")
institution = client.institutions.get("https://ror.org/0161xgx34")
```

### Batch Lookup

Fetch up to 100 entities at once:

```python
works = client.works.get(["W2741809807", "W2100837269", "W1775749144"])
```

### Random Entity

```python
work = client.works.random()
```

## Finding Works by Name

Look up works by author, institution, source, topic, or funder name. PyOpenAlex handles the two-step ID resolution automatically:

```python
# By author
client.works.by_author("Yann LeCun").sort("cited_by_count", desc=True).get(10)

# By institution
client.works.by_institution("MIT").filter(publication_year=2024).get(10)

# By journal
client.works.by_source("Nature").filter(publication_year=2024).count()

# By topic
client.works.by_topic("machine learning").get(10)

# By funder
client.works.by_funder("NIH").filter(is_oa=True).get(10)
```

## Filtering

Chain `.filter()` calls to narrow results. Multiple filters combine with AND:

```python
results = (
    client.works
    .filter(publication_year=2024, is_oa=True)
    .sort("cited_by_count", desc=True)
    .get(10)
)
```

### Filter Expressions

PyOpenAlex provides expression functions for building filters, similar to how FastAPI uses `Query()`, `Path()`, and `Body()`:

```python
from pyopenalex import gt, lt, ne, or_, between

# Greater than / less than
client.works.filter(cited_by_count=gt(100))
client.works.filter(publication_year=lt(2020))

# Not equal
client.works.filter(type=ne("paratext"))

# OR (up to 100 values)
client.works.filter(doi=or_(
    "https://doi.org/10.7717/peerj.4375",
    "https://doi.org/10.1038/nature12373",
))

# Range
client.works.filter(publication_year=between(2020, 2024))
```

### Nested Filters

Use dicts for dot-notation filter paths. PyOpenAlex flattens them automatically:

```python
# These are equivalent:
client.works.filter(authorships={"institutions": {"id": "I136199984"}})
client.works.filter_raw("authorships.institutions.id:I136199984")
```

### Raw Filters

For full control, pass the filter string directly:

```python
client.works.filter_raw("publication_year:2024,is_oa:true,cited_by_count:>100")
```

## Searching

### Full-Text Search

```python
results = client.works.search("machine learning").get()
```

### Field-Specific Search

```python
results = client.works.search_filter(title="neural networks").get()
```

Search and filters can be combined:

```python
results = (
    client.works
    .search("CRISPR")
    .filter(publication_year=2024, is_oa=True)
    .sort("cited_by_count", desc=True)
    .get()
)
```

## Sorting

```python
# Ascending (default)
client.works.sort("publication_date")

# Descending
client.works.sort("cited_by_count", desc=True)
```

## Field Selection

Request only the fields you need to reduce response size:

```python
results = client.works.select("id", "title", "doi", "cited_by_count").get()
```

## Fetching Results

### Get a specific number of results

Pass a count to `.get()` and PyOpenAlex handles pagination internally:

```python
# Get exactly 10 results
results = client.works.search("CRISPR").get(10)

# Get 250 results (auto-paginates across multiple API calls)
results = client.works.filter(publication_year=2024).get(250)

# Get all matching results
results = client.works.filter(publication_year=2024, type="dataset").get(all=True)

# Default: single page (25 results)
results = client.works.search("CRISPR").get()
```

### Iteration

Iterate over any query and PyOpenAlex handles cursor pagination automatically:

```python
for work in client.works.filter(publication_year=2024, is_oa=True):
    print(work.title)
```

Use `.limit()` to cap the total number of results:

```python
for work in client.works.filter(publication_year=2024).limit(500):
    process(work)
```

### Manual page control

For cases where you need direct control over pagination:

```python
page1 = client.works.filter(publication_year=2024).page(1).per_page(100).get()
page2 = client.works.filter(publication_year=2024).page(2).per_page(100).get()
```

## Counting

Get the total number of matching results without fetching them:

```python
count = client.works.filter(publication_year=2024, is_oa=True).count()
```

## Grouping

Aggregate results by a field:

```python
response = client.works.filter(publication_year=2024).group_by("type").get()
for group in response.group_by:
    print(f"{group.key_display_name}: {group.count}")
```

## Sampling

Get a random sample of results:

```python
results = client.works.sample(100, seed=42).get()
```

## Autocomplete

Fast typeahead search returning up to 10 results:

```python
results = client.institutions.autocomplete("harvard")
for r in results:
    print(f"{r.display_name} ({r.works_count} works)")
```

## Query Reuse

The query builder is immutable. Each method returns a new instance, so you can safely branch from a base query:

```python
base = client.works.filter(publication_year=2024, is_oa=True)

most_cited = base.sort("cited_by_count", desc=True).get(10)
recent = base.sort("publication_date", desc=True).get(10)
count = base.count()
```

## Response Objects

List queries return a `ListResponse` with three parts:

```python
response = client.works.search("CRISPR").get()

response.meta        # Meta: count, page, per_page, cost_usd, ...
response.results     # list[Work]: the entities
response.group_by    # list[GroupByResult]: populated when using group_by
```

## Configuration

### API Key

Set your API key in any of these ways (in order of precedence):

```python
# 1. Constructor argument
client = OpenAlex(api_key="your-key")

# 2. Environment variable
# export OPENALEX_API_KEY=your-key
client = OpenAlex()

# 3. .env file (loaded automatically)
# OPENALEX_API_KEY=your-key
client = OpenAlex()
```

Get a free API key at [openalex.org/settings/api](https://openalex.org/settings/api).

### Other Settings

```python
client = OpenAlex(
    api_key="your-key",
    base_url="https://api.openalex.org",  # default
    timeout=30.0,                          # request timeout in seconds
    max_retries=3,                         # retries on 403/5xx errors
)
```

All settings can be set via environment variables with the `OPENALEX_` prefix:

```bash
export OPENALEX_API_KEY=your-key
export OPENALEX_TIMEOUT=60
export OPENALEX_MAX_RETRIES=5
```

### Context Manager

The client can be used as a context manager to ensure the HTTP connection is closed:

```python
with OpenAlex() as client:
    work = client.works.get("W2741809807")
```

## Markdown Rendering

All entities can render themselves as clean markdown, useful for LLM tool responses and reports:

```python
work = client.works.get("W2741809807")
print(work.to_markdown())
```

Output:

```markdown
## The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles (2018)

- **DOI:** https://doi.org/10.7717/peerj.4375
- **Type:** book-chapter
- **Citations:** 1,163
- **Open Access:** Yes (gold)
- **Source:** PeerJ
- **Authors:** Heather Piwowar, Jason Priem, Vincent Lariviere, ...

Despite growing interest in Open Access (OA) to scholarly literature...
```

Use `limit_abstract` to truncate long abstracts:

```python
work.to_markdown(limit_abstract=150)
```

## Special Endpoints

```python
# Check rate limit status (requires API key)
rl = client.rate_limit()
print(f"Remaining: ${rl.remaining_cost_today_usd}")

# List available changefiles for bulk data sync
dates = client.changefiles()
entries = client.changefile("2026-03-18")

# Download a PDF ($0.01 per download)
client.download_pdf("W2741809807", "/tmp/paper.pdf")
```

## Error Handling

PyOpenAlex raises typed exceptions:

```python
from pyopenalex import (
    AuthenticationError,
    NotFoundError,
    RateLimitError,
    APIError,
)

try:
    work = client.works.get("W0000000000")
except NotFoundError:
    print("Work not found")
except AuthenticationError:
    print("Invalid or missing API key")
except RateLimitError:
    print("Rate limit exceeded")
except APIError as e:
    print(f"HTTP {e.status_code}: {e}")
```

Error responses include the API's own messages, including helpful suggestions like "Did you mean: authorships.author.id?".

- **403** (burst rate limit): retries automatically with exponential backoff
- **429** (daily limit exhausted): fails immediately with an actionable message
- **5xx** (server error): retries automatically with exponential backoff

## Abstract Reconstruction

OpenAlex stores abstracts as inverted indexes. PyOpenAlex reconstructs them for you:

```python
work = client.works.get("W2741809807")
print(work.abstract)  # full abstract text, or None if unavailable
```

## Examples

See [`examples/quickstart.py`](examples/quickstart.py) for a runnable script showcasing core features and [`examples/new_endpoints.py`](examples/new_endpoints.py) for the topic hierarchy, geographic, and special endpoints.

## License

MIT
