Metadata-Version: 2.4
Name: scitex-dataset
Version: 0.1.3
Summary: Dataset fetcher for neuroscience research (OpenNeuro, BIDS, etc.)
Project-URL: Homepage, https://github.com/ywatanabe1989/scitex-dataset
Project-URL: Documentation, https://scitex-dataset.readthedocs.io
Project-URL: Repository, https://github.com/ywatanabe1989/scitex-dataset.git
Project-URL: Issues, https://github.com/ywatanabe1989/scitex-dataset/issues
Author-email: Yusuke Watanabe <ywatanabe@scitex.ai>
License-Expression: AGPL-3.0
License-File: LICENSE
Keywords: bids,dandi,datasets,eeg,fmri,mri,neuroscience,nwb,openneuro,physionet
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU Affero General Public License v3
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.10
Requires-Dist: click>=8.0.0
Requires-Dist: httpx>=0.24.0
Provides-Extra: all
Requires-Dist: fastmcp>=2.0.0; extra == 'all'
Requires-Dist: myst-parser>=2.0; extra == 'all'
Requires-Dist: pre-commit>=3.5.0; extra == 'all'
Requires-Dist: pytest-cov>=4.0.0; extra == 'all'
Requires-Dist: pytest>=7.0.0; extra == 'all'
Requires-Dist: ruff>=0.1.0; extra == 'all'
Requires-Dist: scitex>=2.0.0; extra == 'all'
Requires-Dist: sphinx-autodoc-typehints>=1.25; extra == 'all'
Requires-Dist: sphinx-copybutton>=0.5; extra == 'all'
Requires-Dist: sphinx-rtd-theme>=2.0; extra == 'all'
Requires-Dist: sphinx>=7.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: pre-commit>=3.5.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: docs
Requires-Dist: myst-parser>=2.0; extra == 'docs'
Requires-Dist: sphinx-autodoc-typehints>=1.25; extra == 'docs'
Requires-Dist: sphinx-copybutton>=0.5; extra == 'docs'
Requires-Dist: sphinx-rtd-theme>=2.0; extra == 'docs'
Requires-Dist: sphinx>=7.0; extra == 'docs'
Provides-Extra: mcp
Requires-Dist: fastmcp>=2.0.0; extra == 'mcp'
Provides-Extra: scitex
Requires-Dist: scitex>=2.0.0; extra == 'scitex'
Description-Content-Type: text/markdown

# SciTeX Dataset

**Unified access to neuroscience datasets for AI-powered research**

[![PyPI version](https://badge.fury.io/py/scitex-dataset.svg)](https://badge.fury.io/py/scitex-dataset)
[![Tests](https://github.com/ywatanabe1989/scitex-dataset/actions/workflows/test.yml/badge.svg)](https://github.com/ywatanabe1989/scitex-dataset/actions/workflows/test.yml)
[![License: AGPL-3.0](https://img.shields.io/badge/License-AGPL--3.0-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)

SciTeX Dataset provides a unified interface to discover and fetch metadata from major neuroscience data repositories.

Part of [**SciTeX**](https://scitex.ai).

## Data Sources

| Repository | Description | Data Types |
|------------|-------------|------------|
| **OpenNeuro** | Open platform for sharing neuroimaging data | MRI, EEG, MEG, iEEG, PET |
| **DANDI** | BRAIN Initiative data archive | Electrophysiology, Ophys |
| **PhysioNet** | Physiological signal databases | ECG, EEG, clinical data |

## Quick Start

```bash
pip install scitex-dataset
```

### Python API

```python
from scitex_dataset import fetch_all_datasets, format_dataset

# Fetch metadata for up to 10 datasets from OpenNeuro
datasets = fetch_all_datasets(max_datasets=10)

# Format for analysis
for ds in datasets:
    formatted = format_dataset(ds)
    print(f"{formatted['id']}: {formatted['name']} ({formatted['n_subjects']} subjects)")
```
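
Once records are formatted, ordinary Python is enough to narrow them down. The sketch below is a minimal illustration that assumes each formatted record is a dict with the `id`, `name`, and `n_subjects` keys used above. The sample records and the `select_by_subjects` helper are hypothetical placeholders, not part of the package or real OpenNeuro entries.

```python
from typing import Any

# Hypothetical records shaped like the output of format_dataset();
# the IDs, names, and counts are placeholders, not real datasets.
sample_records: list[dict[str, Any]] = [
    {"id": "ds-example-a", "name": "Example EEG study", "n_subjects": 24},
    {"id": "ds-example-b", "name": "Example fMRI study", "n_subjects": 8},
    {"id": "ds-example-c", "name": "Example MEG study", "n_subjects": None},
]


def select_by_subjects(
    records: list[dict[str, Any]], min_subjects: int
) -> list[dict[str, Any]]:
    """Keep records with a known subject count >= min_subjects, largest first."""
    kept = [
        r
        for r in records
        if isinstance(r.get("n_subjects"), int) and r["n_subjects"] >= min_subjects
    ]
    return sorted(kept, key=lambda r: r["n_subjects"], reverse=True)


selected = select_by_subjects(sample_records, min_subjects=10)
for record in selected:
    print(f"{record['id']}: {record['name']} ({record['n_subjects']} subjects)")
```

Records with a missing subject count are dropped rather than treated as zero, so a gap in the metadata is not mistaken for a small study.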

### CLI

```bash
# Fetch OpenNeuro datasets
scitex-dataset openneuro -n 100 -o datasets.json -v

# Search across repositories
scitex-dataset search "epilepsy EEG" --source openneuro

# Database operations
scitex-dataset db init
scitex-dataset db sync openneuro
scitex-dataset db query "modality:eeg"
```

### MCP Server

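SciTeX Dataset includes an **MCP (Model Context Protocol) server** that lets AI agents such as Claude discover and query neuroscience datasets. The package metadata declares an `mcp` extra (`pip install "scitex-dataset[mcp]"`) that pulls in the `fastmcp` dependency.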

```bash
# Add to Claude Code MCP config
scitex-dataset mcp install

# Or run directly
scitex-dataset mcp start
```

**Available MCP Tools:**

| Tool | Description |
|------|-------------|
| `dataset_openneuro_fetch` | Fetch datasets from OpenNeuro |
| `dataset_openneuro_search` | Search OpenNeuro by query |
| `dataset_dandi_fetch` | Fetch datasets from DANDI Archive |
| `dataset_dandi_search` | Search DANDI by query |
| `dataset_physionet_fetch` | Fetch datasets from PhysioNet |
| `dataset_physionet_search` | Search PhysioNet by query |
| `dataset_search` | Unified search across all repositories |
| `dataset_stats` | Get repository statistics |
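
The per-repository tool names in the table follow a `dataset_<source>_<action>` pattern. The sketch below spells out that convention, which may help when a client has to pick a tool for a given repository. The `tool_name` helper is a hypothetical illustration and is not part of the package; it does not call the MCP server.

```python
# Sources and actions as they appear in the tool table above.
SOURCES = ("openneuro", "dandi", "physionet")
ACTIONS = ("fetch", "search")

# Tools in the table that are not tied to a single repository.
CROSS_REPOSITORY_TOOLS = ("dataset_search", "dataset_stats")


def tool_name(source: str, action: str) -> str:
    """Build a per-repository tool name (hypothetical helper)."""
    if source not in SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return f"dataset_{source}_{action}"


# All eight tool names listed in the table, rebuilt from the convention.
all_tools = [tool_name(s, a) for s in SOURCES for a in ACTIONS]
all_tools.extend(CROSS_REPOSITORY_TOOLS)
print(all_tools)
```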

### With SciTeX Session

```python
# Requires the scitex package, e.g. via the `scitex` extra:
#   pip install "scitex-dataset[scitex]"
import scitex as stx
from scitex_dataset import fetch_all_datasets, format_dataset

@stx.session
def main(logger=stx.INJECTED):
    datasets = fetch_all_datasets(max_datasets=100, logger=logger)
    formatted = [format_dataset(ds) for ds in datasets]
    stx.io.save(formatted, "openneuro_datasets.json")
    return 0

if __name__ == "__main__":
    main()
```

## Why SciTeX Dataset?

- **Unified Interface**: One API for OpenNeuro, DANDI, PhysioNet, and more
- **AI-Ready**: MCP server enables LLMs to discover relevant datasets
- **Metadata Focus**: Fast metadata queries without downloading full datasets
- **SciTeX Integration**: Works seamlessly with `@stx.session` for reproducible research

---

<p align="center">
  <a href="https://scitex.ai" target="_blank"><img src="https://raw.githubusercontent.com/ywatanabe1989/scitex-python/main/docs/assets/images/scitex-icon-navy-inverted.png" alt="SciTeX" width="40"/></a>
  <br>
  AGPL-3.0
</p>

<!-- EOF -->
