Metadata-Version: 2.4
Name: icdlookup
Version: 0.1.0
Summary: Look up ICD codes and map them to descriptions. Supports ICD-10 (extensible to other versions).
Author: icdcodex contributors
License-Expression: MIT
Project-URL: Homepage, https://github.com/yabdulle/icdlookup
Project-URL: Repository, https://github.com/yabdulle/icdlookup
Project-URL: Issues, https://github.com/yabdulle/icdlookup/issues
Keywords: icd,icd-10,medical-coding,healthcare,cms
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Healthcare Industry
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: pandas
Requires-Dist: pandas>=1.5; extra == "pandas"
Provides-Extra: all
Requires-Dist: pandas>=1.5; extra == "all"
Provides-Extra: dev
Requires-Dist: pandas>=1.5; extra == "dev"
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Dynamic: license-file

# icdlookup

A Python package for looking up ICD codes and mapping them to human-readable descriptions.

Ships with a built-in ICD-10-CM 2026 dataset sourced from the official [CMS ICD-10 files](https://www.cms.gov/medicare/coding-billing/icd-10-codes). The hierarchy approach was inspired by [rjake/ICD10-hierarchy](https://github.com/rjake/ICD10-hierarchy).

The package name is version-agnostic — future releases may extend support to ICD-9, ICD-11, and other coding systems.

## Installation

```bash
pip install icdlookup
```

For DataFrame annotation support:

```bash
pip install "icdlookup[pandas]"
```

## Quick Start

### Look up a single code

```python
import icdlookup

icdlookup.lookup("A00.1")
# → "cholera due to vibrio cholerae 01, biovar eltor"
```

### Get full hierarchy detail

```python
icdlookup.lookup("A00.1", detail=True)
# → {
#     "icd10_code": "A00.1",
#     "description": "cholera due to vibrio cholerae 01, biovar eltor",
#     "chapter_desc": "Certain infectious and parasitic diseases (A00-B99)",
#     "section_desc": "Intestinal infectious diseases (A00-A09)",
#     "category_desc": "cholera"
#   }
```

### Search by keyword

```python
icdlookup.search("cholera")
# → [("A00", "cholera"), ("A00.0", "cholera due to ..."), ...]
```

### Annotate a pandas DataFrame

```python
import pandas as pd
import icdlookup

df = pd.DataFrame({"icd_code": ["A00.1", "J06.9", "E11.9"]})
df = icdlookup.annotate(df, code_column="icd_code")
# df now has an "icd_description" column
```

### Use a custom reference CSV

Any CSV with at least `icd10_code` and `description` columns works:

```python
icdlookup.lookup("A00.1", reference="/path/to/custom.csv")
icdlookup.annotate(df, code_column="icd_code", reference="/path/to/custom.csv")
```

## CLI

```bash
# Single lookup
icdlookup A00.1

# Full detail
icdlookup A00.1 --detail

# Search by keyword
icdlookup --search cholera

# Annotate a CSV file
icdlookup --file data.csv --column icd_code --output annotated.csv

# Use custom reference
icdlookup A00.1 --reference /path/to/custom.csv
```

## Design

- **No pandas required for core functions** — `lookup()` and `search()` use only the standard library (`csv` + `gzip`). pandas is only needed for `annotate()`.
- **Case & format insensitive** — lookups strip whitespace, are case-insensitive, and work with or without dots (`A001` and `A00.1` both match).
- **Lazy-loaded & cached** — the built-in CSV is only parsed on first use, then kept in memory for the session.
- **Custom reference support** — pass any CSV with `icd10_code` + `description` columns.
- **48,600+ ICD-10-CM codes** — the bundled 2026 dataset includes full hierarchy info (chapter, section, category descriptions).
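
The normalization and lazy-loading behaviour described above could be sketched roughly as follows. This is not the package's actual implementation: `normalize`, `load_table`, the two-argument `lookup`, and the demo file are hypothetical names used for illustration, and returning `None` for an unknown code is an assumption.

```python
import csv
import gzip
import os
import tempfile
from functools import lru_cache


def normalize(code):
    """Strip whitespace, uppercase, and drop dots so 'a00.1' and 'A001' compare equal."""
    return code.strip().upper().replace(".", "")


@lru_cache(maxsize=None)
def load_table(path):
    """Parse a gzipped CSV once per path; later calls return the cached dict."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as fh:
        return {
            normalize(row["icd10_code"]): row["description"]
            for row in csv.DictReader(fh)
        }


def lookup(code, path):
    """Return the description for a code, or None if it is not in the table."""
    return load_table(path).get(normalize(code))


# Tiny stand-in for the bundled file, holding only the Quick Start example row.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv.gz")
with gzip.open(demo_path, "wt", newline="", encoding="utf-8") as fh:
    writer = csv.writer(fh)
    writer.writerow(["icd10_code", "description"])
    writer.writerow(["A00.1", "cholera due to vibrio cholerae 01, biovar eltor"])

print(lookup(" a001 ", demo_path))
```

Because nothing is read until the first `lookup()` call, importing such a module stays cheap, and `lru_cache` keeps the parsed table in memory for subsequent calls with the same path.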

## Data Source

The bundled dataset is derived from the official [CMS ICD-10-CM 2026 Code Tables](https://www.cms.gov/medicare/coding-billing/icd-10-codes), specifically the tabular XML file. Each code includes:

| Column | Description |
|---|---|
| `icd10_code` | The ICD-10-CM code (e.g. `A00.1`) |
| `description` | Human-readable description |
| `chapter_desc` | Chapter-level description |
| `section_desc` | Section-level description |
| `category_desc` | 3-character category description |
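
To illustrate the column layout, the sketch below reads one row in this schema using only the standard library. `IcdRow` and `category_code` are hypothetical helpers rather than part of the package, and the sample row is the Quick Start example; deriving the category from the first three characters follows the "3-character category" wording in the table.

```python
import csv
import io
from typing import NamedTuple


class IcdRow(NamedTuple):
    icd10_code: str
    description: str
    chapter_desc: str
    section_desc: str
    category_desc: str


def category_code(code):
    """Return the 3-character category prefix, e.g. 'A00' for 'A00.1'."""
    return code.strip().upper().replace(".", "")[:3]


# One row in the documented column layout, taken from the Quick Start example.
sample = io.StringIO(
    "icd10_code,description,chapter_desc,section_desc,category_desc\n"
    'A00.1,"cholera due to vibrio cholerae 01, biovar eltor",'
    '"Certain infectious and parasitic diseases (A00-B99)",'
    '"Intestinal infectious diseases (A00-A09)",cholera\n'
)
rows = [IcdRow(**record) for record in csv.DictReader(sample)]
print(rows[0].section_desc, category_code(rows[0].icd10_code))
```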

## Rebuilding the Dataset

To regenerate the bundled data from the latest CMS release:

```bash
python scripts/build_data.py
```

This downloads the CMS XML zip, parses the hierarchy, and writes the compressed CSV to `src/icdlookup/_data/`.

## Development

```bash
git clone https://github.com/yabdulle/icdlookup.git
cd icdlookup
pip install -e ".[dev]"
pytest
```

## License

MIT
