Metadata-Version: 2.4
Name: adam-thinking
Version: 1.0.0
Summary: thinking - Python library implementing ADAM 1.0.1 analytics framework for biomedical RAG research (currently supports gpt-oss:20b and gpt-oss:120b; future releases will add more LLM backends)
License-Expression: MIT
Project-URL: Repository, https://github.com/melhzy/alzheimers
Project-URL: Bug Tracker, https://github.com/melhzy/alzheimers/issues
Keywords: ADAM,ADAM-1,alzheimer,RAG,retrieval-augmented-generation,biomedical,NLP,LLM,ChromaDB,FAISS,pubmed,gut-microbiome
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.28.0
Requires-Dist: chromadb>=1.5.1
Requires-Dist: pubmed-stream>=0.1.0
Provides-Extra: llm
Requires-Dist: ollama>=0.1.0; extra == "llm"
Provides-Extra: embeddings
Requires-Dist: sentence-transformers>=2.2.0; extra == "embeddings"
Provides-Extra: faiss
Requires-Dist: faiss-cpu>=1.7.0; extra == "faiss"
Requires-Dist: torch>=2.0.0; extra == "faiss"
Requires-Dist: transformers>=4.30.0; extra == "faiss"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0; extra == "dev"
Provides-Extra: all
Requires-Dist: ollama>=0.1.0; extra == "all"
Requires-Dist: sentence-transformers>=2.2.0; extra == "all"
Requires-Dist: faiss-cpu>=1.7.0; extra == "all"
Requires-Dist: torch>=2.0.0; extra == "all"
Requires-Dist: transformers>=4.30.0; extra == "all"
Dynamic: license-file

# ADAM Thinking

> **`thinking`** — Python library implementing the **ADAM 1.0.1** analytics framework
>
> **Biomedical Retrieval-Augmented Generation (RAG) for Alzheimer's disease research.**
>
> Download full-text papers from PubMed Central, embed them with domain-specific biomedical
> vectors, store them in a local ChromaDB, and answer research questions at eight levels of
> evidence depth — from a quick LLM-only response all the way to a 100K-token,
> 25-paper research-grade analysis, including a two-pass Chain of Thought (CoT) pipeline.

> **About ADAM**: ADAM (Alzheimer's Disease Analysis Module) is an analytics framework for biomedical RAG research. ADAM 1.0.1 builds on the published [ADAM 1.0.0](https://pmc.ncbi.nlm.nih.gov/articles/PMC12483529/) framework.

> **LLM Support**: The `thinking` library currently supports **`gpt-oss:20b` (default)** and **`gpt-oss:120b` (advanced)**. Future releases will add support for additional LLM backends.

> **Note:** This library supersedes [`neuromind`](https://pypi.org/project/neuromind/), which is now legacy and will no longer be supported.

[![PyPI](https://img.shields.io/pypi/v/thinking)](https://pypi.org/project/thinking/)
[![Python](https://img.shields.io/pypi/pyversions/thinking)](https://pypi.org/project/thinking/)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)

---

## Table of Contents

1. [Overview](#overview)
2. [Architecture](#architecture)
3. [Prerequisites](#prerequisites)
4. [Installation](#installation)
5. [Configuration](#configuration)
   - [Configure ChromaDB or FAISS](#configure-chromadb-or-faiss)
6. [Quick Start — 3 Lines](#quick-start--3-lines)
7. [Step-by-Step Guide](#step-by-step-guide)
   - [Step 1 — Download Papers](#step-1--download-papers)
   - [Step 2 — Index into ChromaDB](#step-2--index-into-chromadb)
   - [Step 3 — Query with RAG](#step-3--query-with-rag)
   - [Step 4 — Compare All RAG Modes](#step-4--compare-all-rag-modes)
8. [RAG Modes Explained](#rag-modes-explained)
9. [Command-Line Interface (CLI)](#command-line-interface-cli)
10. [Python API Reference](#python-api-reference)
11. [Working with Results](#working-with-results)
12. [Custom Embeddings (MedEmbed)](#custom-embeddings-medembed)
13. [Using an Existing Vector Database](#using-an-existing-vector-database)
14. [Low-Level Building Blocks](#low-level-building-blocks)
15. [Project Structure](#project-structure)
16. [Author & Copyright](#author--copyright)
17. [License](#license)

---

## Overview

`thinking` is a **Python library** that implements the **ADAM analytics framework** for
biomedical RAG research. Its first sub-package, `thinking.alzheimers`, provides an
end-to-end pipeline purpose-built for Alzheimer's disease literature:

| Stage              | What it does                                                          | Key dependency                          |
| ------------------ | --------------------------------------------------------------------- | --------------------------------------- |
| **Download** | Fetch full-text PMC articles via NCBI E-utilities                     | `pubmed-stream`                       |
| **Index**    | Chunk text, embed with MedEmbed / MiniLM, upsert to ChromaDB          | `chromadb`, `sentence-transformers` |
| **Query**    | Retrieve semantically relevant chunks, build context, generate answer | `ollama` / HTTP fallback              |

The design mirrors the `RAG_Comparison_Demo.ipynb` notebook, which benchmarks six
evidence-depth conditions side-by-side on Alzheimer's disease / gut-microbiome
research questions.

---

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                        thinking library                        │
│                                                                 │
│  ┌──────────────┐   ┌──────────────┐   ┌───────────────────┐   │
│  │  downloader  │   │   indexer    │   │    RAGPipeline    │   │
│  │              │   │              │   │                   │   │
│  │ pubmed-stream│──►│ ChromaDB     │──►│ Retriever         │   │
│  │ NCBI ESearch │   │ (persistent) │   │ + OllamaLLM       │   │
│  │ + EFetch     │   │              │   │                   │   │
│  └──────────────┘   └──────────────┘   └───────────────────┘   │
│          ▲                                       │              │
│          │              AlzheimersRAG            ▼              │
│     keyword /           (façade)          RAGResult             │
│     corpus                                .text                 │
│                                           .sources              │
│                                           .summary()            │
└─────────────────────────────────────────────────────────────────┘
```

**RAG data-flow:**

```
User question
      │
      ▼
Retriever.retrieve(top_k)          ← cosine similarity in ChromaDB
      │
      ▼
build_context(token_budget)        ← respects LLM context window
      │
      ▼
OllamaLLM.generate(prompt)         ← local inference via Ollama
      │
      ▼
RAGResult(.text, .sources, .summary())
```
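
The retrieval step in the flow above ranks stored chunks by cosine similarity to the query embedding. The following is a minimal pure-Python sketch of that idea, not the library's `Retriever`; the function names and the toy three-dimensional vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec, chunk_vecs, top_k):
    """Return (chunk_id, similarity) pairs, most similar first."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
toy_chunks = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.7, 0.7, 0.0],
    "chunk-c": [0.0, 0.0, 1.0],
}
top_hits = retrieve_top_k([1.0, 0.1, 0.0], toy_chunks, top_k=2)
print(top_hits)
```

A vector database performs the same ranking with an index instead of a full scan, which is what keeps retrieval fast on large collections.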

---

## Prerequisites

| Requirement                           | Notes                                                |
| ------------------------------------- | ---------------------------------------------------- |
| **Python ≥ 3.8**               | Tested on 3.8 – 3.12                                |
| **[Ollama](https://ollama.com/)**  | Local LLM server — install and run before querying  |
| **An Ollama model**             | e.g. `ollama pull llama3.2` or any compatible model |
| **~2 GB disk**                  | For a small ChromaDB of 500 papers                   |
| **NCBI account** *(optional)* | Free API key for faster downloads                    |

### Install and start Ollama

```bash
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
ollama serve                        # keep running in a separate terminal

# Pull a model (choose one)
ollama pull llama3.2                # general-purpose, fast
ollama pull llama3.1:8b             # larger context
ollama pull gpt-oss:20b             # reasoning backbone (default in this project)
ollama pull gpt-oss:120b            # larger reasoning backbone (requires much more VRAM)
```

### Use Ollama gpt-oss models as backbone LLMs

This project commonly uses `gpt-oss:20b` and `gpt-oss:120b` as backbone models.

| Model            | Recommended minimum hardware                                     | Typical use                                 | Notes                                                                                                                  |
| ---------------- | ---------------------------------------------------------------- | ------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- |
| `gpt-oss:20b`  | 1 GPU with ~16 GB VRAM (or CPU fallback, slower)                 | Default for most AD RAG analysis            | Best balance of speed, cost, and quality                                                                               |
| `gpt-oss:120b` | Multi-GPU setup with large aggregate VRAM (roughly 65 GB or more) | Highest-depth analysis when resources allow | Better reasoning depth, much heavier runtime footprint                                                                 |
| Fallback profile | CPU-only or low-VRAM setup                                       | Basic validation and lightweight testing    | Prefer smaller models (for example `llama3.2`), reduce mode depth (`basic`/`standard`), expect slower generation |

1. Start the Ollama server in one terminal:

```bash
ollama serve
```

2. In a second terminal, confirm your model is available:

```bash
ollama list
```

3. Select the backbone model in Python or CLI:

```python
from thinking.alzheimers import AlzheimersRAG

rag = AlzheimersRAG(model="gpt-oss:20b")
# or
rag = AlzheimersRAG(model="gpt-oss:120b")
```

```bash
alzheimers query "Explain gut-brain-axis mechanisms in AD" \
    --model gpt-oss:20b \
    --mode detailed

# Larger model option
alzheimers query "Explain gut-brain-axis mechanisms in AD" \
    --model gpt-oss:120b \
    --mode detailed
```

> Tip: `gpt-oss:120b` usually needs significantly more GPU memory than `gpt-oss:20b`.
> If generation fails due to resources, switch to `gpt-oss:20b`.

---

## Installation

```bash
# From PyPI (recommended)
pip install thinking

# With the Ollama Python client (enables richer error messages)
pip install "thinking[llm]"

# With biomedical MedEmbed embeddings (higher retrieval quality for AD research)
pip install "thinking[embeddings]"

# With the FAISS backend (adds faiss-cpu, torch, and transformers)
pip install "thinking[faiss]"

# Everything at once
pip install "thinking[all]"
```

> The core package uses `requests` for Ollama HTTP calls, so the LLM works even
> without the optional `ollama` Python package.
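
To make the HTTP path concrete, here is a hedged sketch of a request to Ollama's `/api/generate` REST endpoint. It is not the library's internal code: the helper names are hypothetical, and it uses the standard library's `urllib` instead of `requests` so that it runs without extra dependencies. The defaults mirror values quoted elsewhere in this README.

```python
import json
import urllib.request


def build_generate_request(prompt, model="gpt-oss:20b", host="http://localhost:11434",
                           num_ctx=131_072, temperature=0.3, max_tokens=3000):
    """Build (but do not send) a POST request for Ollama's /api/generate endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,                # ask for one JSON object instead of a stream
        "options": {
            "num_ctx": num_ctx,         # context window in tokens
            "temperature": temperature,
            "num_predict": max_tokens,  # cap on generated tokens
        },
    }
    return urllib.request.Request(
        url=f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=600):
    """Send the request and return the generated text; needs a running `ollama serve`."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


req = build_generate_request("Summarise the amyloid cascade hypothesis.")
print(req.full_url)
```

Calling `send(req)` only succeeds when the Ollama server is running and the model has been pulled.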

---

## Configuration

Three environment variables control the most common settings. Set them once and
every subsequent call picks them up automatically.

```bash
# Required if your knowledge base is not in ./thinking_db
export THINKING_DB_PATH="/path/to/your/chromadb"

# Recommended for faster PMC downloads (3 req/s → 10 req/s)
export NCBI_API_KEY="your_ncbi_api_key"
export NCBI_EMAIL="you@institution.edu"
```

Compatibility notes:

- Legacy imports such as `import neuromind` and `from neuromind.alzheimers import AlzheimersRAG` are still supported.
- Legacy environment variables `NEUROMIND_DB_PATH` and `NEUROMIND_FAISS_DB_PATH` are accepted as fallbacks.
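
The fallback order described above can be sketched as follows. This is an illustration only: `resolve_db_path` is a hypothetical helper, and the assumption that the new variable wins over the legacy one, with `./thinking_db` as the last resort, is inferred from this section and not taken from the library's source.

```python
import os

DEFAULT_DB_PATH = "./thinking_db"


def resolve_db_path(environ=None):
    """Pick the DB path: new variable first, legacy variable second, default last."""
    env = os.environ if environ is None else environ
    for name in ("THINKING_DB_PATH", "NEUROMIND_DB_PATH"):
        value = env.get(name)
        if value:  # ignore unset or empty variables
            return value
    return DEFAULT_DB_PATH


legacy_only = resolve_db_path({"NEUROMIND_DB_PATH": "/data/old_db"})
both_set = resolve_db_path({"THINKING_DB_PATH": "/data/new_db",
                            "NEUROMIND_DB_PATH": "/data/old_db"})
nothing_set = resolve_db_path({})
print(legacy_only, both_set, nothing_set)
```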

Get a free NCBI API key at [https://www.ncbi.nlm.nih.gov/account/](https://www.ncbi.nlm.nih.gov/account/).
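
As a rough feel for what the key buys you, the sketch below computes a lower bound on download time at the two rate limits quoted above. It assumes, purely for illustration, one request per article and no other overhead; real downloads involve search calls, retries, and network latency.

```python
def min_download_seconds(n_requests, requests_per_second):
    """Lower bound on wall-clock time when requests are paced at a fixed rate."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    return n_requests / requests_per_second


without_key = min_download_seconds(750, 3)   # 3 requests per second without a key
with_key = min_download_seconds(750, 10)     # 10 requests per second with a key
print(f"without key: {without_key:.0f}s, with key: {with_key:.0f}s")
```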

### Configure ChromaDB or FAISS

`thinking.alzheimers` supports two vector backends:

- `backend="chroma"` (default): persistent ChromaDB directory
- `backend="faiss"`: FAISS index + SQLite metadata bundle

#### Option A: ChromaDB backend (default)

Use this if your vector DB was built via `rag.index()` or `alzheimers index`.

```bash
# Path to your persistent ChromaDB directory
export THINKING_DB_PATH="/path/to/chromadb"
```

```python
from thinking.alzheimers import AlzheimersRAG

# Reads THINKING_DB_PATH automatically
rag = AlzheimersRAG(backend="chroma")

# Or pass the path explicitly
rag = AlzheimersRAG(backend="chroma", db_path="/path/to/chromadb")
```

#### Option B: FAISS backend

Use this if you already have a FAISS database directory containing:

- `alzheimers_ivfflat.index`
- `id_map.json`
- `metadata.db`

```bash
# Path to your FAISS database directory
export THINKING_FAISS_DB_PATH="/path/to/faiss_db"
```

```python
from thinking.alzheimers import AlzheimersRAG

# Reads THINKING_FAISS_DB_PATH automatically
rag = AlzheimersRAG(backend="faiss")

# Or pass the path explicitly
rag = AlzheimersRAG(backend="faiss", db_path="/path/to/faiss_db")
```

#### Quick verification

```python
result = rag.query("What is the evidence for gut-brain axis changes in AD?", mode="basic")
print(result.summary())
```

If this returns sources and timing, your vector backend is configured correctly.

#### Troubleshooting vector DB setup

- `No sources returned`: verify your DB path and backend (`chroma` vs `faiss`) match the actual database.
- `FAISS file not found`: ensure the FAISS directory contains `alzheimers_ivfflat.index`, `id_map.json`, and `metadata.db`.
- `Empty collection` on Chroma: run indexing first (`rag.index(...)` or `alzheimers index ...`).
- `Cannot connect to Ollama`: start server with `ollama serve`, then re-run your query.
- `Model not found`: run `ollama pull gpt-oss:20b` (or `gpt-oss:120b`) and confirm with `ollama list`.
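
For the `FAISS file not found` case, a small pre-flight check can report which files are missing. This is a sketch with a hypothetical helper name; only the three file names come from this README.

```python
import tempfile
from pathlib import Path

REQUIRED_FAISS_FILES = ("alzheimers_ivfflat.index", "id_map.json", "metadata.db")


def missing_faiss_files(db_dir):
    """Return the required file names that are absent from db_dir."""
    root = Path(db_dir)
    return [name for name in REQUIRED_FAISS_FILES if not (root / name).is_file()]


# Demonstrate on a temporary directory that contains only one of the three files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "id_map.json").write_text("{}", encoding="utf-8")
    missing = missing_faiss_files(tmp)
print(missing)
```

An empty list means the directory has the expected layout; it does not verify that the files are valid.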

---

## Quick Start — 3 Lines

```python
from thinking.alzheimers import AlzheimersRAG

rag = AlzheimersRAG()                                         # reads THINKING_DB_PATH
result = rag.query("What is the gut-brain axis in AD?")       # default: detailed mode
print(result.text)
```

That's it. If you already have an indexed ChromaDB, you can query immediately.

---

## Step-by-Step Guide

### Step 1 — Download Papers

`thinking` fetches full-text articles from **PubMed Central (PMC)** via the
NCBI E-utilities API using `pubmed-stream`.

#### Single keyword search

```python
from thinking.alzheimers import AlzheimersRAG

rag = AlzheimersRAG(db_path="./my_db", model="llama3.2")

stats = rag.download(
    keyword="Alzheimer's disease gut microbiome",
    max_results=100,
    output_dir="./publications",    # JSON files written here
)
print(f"Downloaded {stats.downloaded} articles")
```

#### Full built-in AD corpus (15 topics)

```python
# Downloads across all built-in Alzheimer's research topics:
# amyloid, tau, neuroinflammation, gut-brain axis, biomarkers, etc.
results = rag.download_corpus(
    max_results_per_topic=50,       # 50 × 15 topics = up to 750 papers
    output_dir="./publications",
)
```

#### Standalone function with NCBI credentials

```python
from thinking.alzheimers import download_papers

download_papers(
    keyword="gut microbiome fecal transplant Alzheimer",
    max_results=200,
    output_dir="./publications",
    api_key="YOUR_NCBI_KEY",        # optional
    email="you@example.com",        # optional
)
```

> Articles are saved as individual `.json` files in `output_dir`.
> Re-running is safe — already-downloaded articles are skipped.
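
The skip-on-rerun behaviour can be pictured with the sketch below. It is an assumption-laden illustration, not `pubmed-stream`'s logic: the helper name, the one-file-per-ID naming scheme, and the placeholder IDs are all made up.

```python
import json
import tempfile
from pathlib import Path


def save_new_articles(articles, output_dir):
    """Write each article to <output_dir>/<id>.json unless that file already exists."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written, skipped = 0, 0
    for article_id, record in articles.items():
        target = out / f"{article_id}.json"
        if target.exists():
            skipped += 1  # already on disk from an earlier run
            continue
        target.write_text(json.dumps(record), encoding="utf-8")
        written += 1
    return written, skipped


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder IDs and titles, not real articles.
    batch = {"PMC0000001": {"title": "placeholder A"},
             "PMC0000002": {"title": "placeholder B"}}
    first_run = save_new_articles(batch, tmp)
    second_run = save_new_articles(batch, tmp)
print(first_run, second_run)
```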

---

### Step 2 — Index into ChromaDB

Indexing reads every `.json` file in your publications directory, splits the
full text into overlapping chunks, embeds them, and upserts into ChromaDB.

```python
stats = rag.index(
    papers_dir="./publications",    # where download() wrote files
    chunk_size=2000,                # characters per chunk
    chunk_overlap=200,              # overlap between consecutive chunks
)
print(stats)
# IndexStats(indexed=94, skipped=6, failed=0, chunks=2_341)
print(f"Success rate: {stats.success_rate:.1f}%")
```

> **Note**: the same `embedding_fn` must be used for both indexing and querying.
> By default `thinking` uses MedEmbed (biomedical, 1024-dim) when
> `sentence-transformers` is installed, and falls back to `all-MiniLM-L6-v2`
> otherwise.
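
To show what `chunk_size` and `chunk_overlap` mean in practice, here is a minimal character-based chunker. It is a sketch under the assumption of a plain sliding window; the library's indexer may split on sentence or section boundaries instead.

```python
def chunk_text(text, chunk_size=2000, chunk_overlap=200):
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(5000))  # 5000 characters of digits
chunks = chunk_text(sample)
print([len(c) for c in chunks])
```

With the defaults, each new window starts 1800 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next.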

---

### Step 3 — Query with RAG

```python
result = rag.query(
    question="What evidence links gut microbiome dysbiosis to Alzheimer's pathogenesis?",
    mode="detailed",                # see RAG Modes table below
    max_response_tokens=3000,
    temperature=0.3,
)

# Full answer text
print(result.text)

# One-line summary
print(result.summary())
# → [detailed] 847 words | 15 sources | 69,214 ctx tokens | 58.3s

# Inspect retrieved sources
for i, src in enumerate(result.sources, 1):
    print(f"[{i}] {src.get('pmcid', 'N/A')}  sim={src['similarity']:.3f}")
```

#### Available modes at a glance

```python
# All eight modes work as strings or RAGMode enum values
result = rag.query(question, mode="no_rag")           # LLM only — no retrieval
result = rag.query(question, mode="retrieval_only")   # raw excerpts, no LLM
result = rag.query(question, mode="basic")            # 5 papers, ~20K ctx
result = rag.query(question, mode="standard")         # 10 papers, ~40K ctx
result = rag.query(question, mode="detailed")         # 15 papers, ~70K ctx ⭐
result = rag.query(question, mode="comprehensive")    # 25 papers, ~100K ctx
result = rag.query(question, mode="cot_standard")     # 15 papers, 2-pass CoT, ~70K ctx
result = rag.query(question, mode="cot_detailed")     # 25 papers, 2-pass CoT, ~100K ctx
```

---

### Step 4 — Compare All RAG Modes

`compare_modes()` runs the same question through every mode in one call,
returning a typed `dict[RAGMode, RAGResult]`.

```python
from thinking.alzheimers import AlzheimersRAG, RAGMode

rag = AlzheimersRAG()
question = "What therapeutic strategies target the microbiome in Alzheimer's disease?"

all_results = rag.compare_modes(
    question,
    max_response_tokens=2000,
    temperature=0.3,
)

# Print summary for every mode
for mode, result in all_results.items():
    print(result.summary())

# Example output (six non-CoT modes shown):
# [no_rag]        412 words |  0 sources |        0 ctx tokens |  18.2s
# [retrieval_only]   0 words |  5 sources |        0 ctx tokens |   0.8s
# [basic]         381 words |  5 sources |   19,204 ctx tokens |  22.4s
# [standard]      634 words | 10 sources |   38,771 ctx tokens |  35.1s
# [detailed]      847 words | 15 sources |   69,214 ctx tokens |  58.3s
# [comprehensive] 1203 words | 25 sources |   98,602 ctx tokens |  91.7s

# Run only selected modes
subset = rag.compare_modes(
    question,
    modes=[RAGMode.BASIC, RAGMode.DETAILED, RAGMode.COMPREHENSIVE],
)
```

---

## RAG Modes Explained

| Mode               | Retrieved papers | Context budget | Typical generation time | Best for                                         |
| ------------------ | ---------------: | -------------: | ----------------------: | ------------------------------------------------ |
| `no_rag`         |                0 |             — |               ~15–20 s | Hallucination baseline                           |
| `retrieval_only` |                5 |             — |                   < 1 s | Inspecting raw evidence in the DB                |
| `basic`          |                5 |   ~20 K tokens |               ~20–30 s | Quick grounded answer                            |
| `standard`       |               10 |   ~40 K tokens |               ~30–50 s | Daily clinical queries                           |
| `detailed` ⭐    |               15 |   ~70 K tokens |               ~50–80 s | **Recommended default**                    |
| `comprehensive`  |               25 |  ~100 K tokens |              ~80–120 s | Publications, complex cases                      |
| `cot_standard`   |               15 |   ~70 K tokens |             ~100–160 s | High-stakes clinical reasoning (2-pass CoT)      |
| `cot_detailed`   |               25 |  ~100 K tokens |             ~150–200 s | Research publications, peak quality (2-pass CoT) |

The context budget is enforced against the LLM context window (`num_ctx`, default 131 072 tokens).
Chunks that would overflow the budget are silently dropped.
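
The budget rule can be sketched as a greedy fill. Everything here is illustrative: the four-characters-per-token estimate is a common rule of thumb, not the library's tokenizer, and whether the real `build_context` skips an oversized chunk and continues (as below) or stops at the first overflow is an assumption.

```python
def estimate_tokens(text):
    """Crude estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def build_context(ranked_chunks, token_budget):
    """Greedily keep chunks, in rank order, that still fit in the budget."""
    kept, used = [], 0
    for chunk in ranked_chunks:
        cost = estimate_tokens(chunk)
        if used + cost > token_budget:
            continue  # drop this chunk without raising an error
        kept.append(chunk)
        used += cost
    return kept, used


ranked = ["a" * 400, "b" * 800, "c" * 200]  # 100, 200 and 50 estimated tokens
kept, used = build_context(ranked, token_budget=160)
print(len(kept), used)
```

Because dropped chunks raise no error, comparing `result.context_tokens` and the number of returned sources against the mode's nominal values is one way to notice truncation.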

---

## Command-Line Interface (CLI)

After `pip install thinking`, the `alzheimers` command is available in the active Python environment.

```
Usage: alzheimers <command> [options]
       python -m thinking.alzheimers <command> [options]
```

### `download` — fetch PMC articles

```bash
# Single keyword search
alzheimers download "Alzheimer gut microbiome" --max-results 100

# Full built-in AD corpus (15 topics)
alzheimers download --corpus --max-results 50

# Custom output directory
alzheimers download "neuroinflammation tau" --max-results 200 --output-dir ./papers

# With NCBI credentials for higher rate limits
alzheimers download "amyloid beta" \
    --max-results 500 \
    --api-key YOUR_KEY \
    --email you@institution.edu
```

### `index` — build the vector database

```bash
# Index publications/ into ./thinking_db
alzheimers index ./publications

# Custom DB path and chunk settings
alzheimers index ./publications \
    --db-path ./my_chromadb \
    --chunk-size 1500 \
    --chunk-overlap 150

# Check how many chunks are stored
alzheimers index ./publications --db-path ./my_chromadb --dry-run
```

### `query` — ask a question

```bash
# Default mode (detailed)
alzheimers query "What is the role of tau in Alzheimer's?"

# Choose a specific mode
alzheimers query "Gut microbiome therapies in AD" --mode comprehensive

# Custom DB and model
alzheimers query "APOE4 risk factors" \
    --db-path ./my_chromadb \
    --model llama3.1:8b \
    --mode standard

# All options
alzheimers query "..." \
    --mode detailed \
    --db-path ./thinking_db \
    --model gpt-oss:20b \
    --max-tokens 4000 \
    --temperature 0.2
```

### `compare` — run all modes on one question

```bash
alzheimers compare "What therapeutic strategies target the gut-brain axis in AD?"

# With a specific DB
alzheimers compare "Neuroinflammation biomarkers" --db-path ./my_chromadb
```

---

## Python API Reference

### `AlzheimersRAG` — constructor parameters

```python
from thinking.alzheimers import AlzheimersRAG

rag = AlzheimersRAG(
    db_path="./thinking_db",           # ChromaDB directory
                                        # overridden by THINKING_DB_PATH env var
    collection_name="publications",     # ChromaDB collection name
    model="gpt-oss:20b",                # Ollama model tag
    ollama_host="http://localhost:11434",  # Ollama server URL
    num_ctx=131_072,                    # LLM context window in tokens
    embedding_fn=None,                  # custom ChromaDB EmbeddingFunction
                                        # default: MedEmbed → all-MiniLM-L6-v2
)
```

### Methods

#### `rag.download(keyword, max_results, output_dir, **kwargs)`

Download PMC articles for a keyword search.

| Parameter       | Type    | Default                   | Description                                                                       |
| --------------- | ------- | ------------------------- | --------------------------------------------------------------------------------- |
| `keyword`     | `str` | `"Alzheimer's disease"` | PubMed search query                                                               |
| `max_results` | `int` | `100`                   | Maximum articles to download                                                      |
| `output_dir`  | `str` | `"./publications"`      | Directory for JSON output                                                         |
| `**kwargs`    |         |                           | Forwarded to `pubmed-stream` (`api_key`, `email`, `use_concurrent`, etc.) |

Returns: `DownloadStats` (from `pubmed-stream`)

#### `rag.download_corpus(topics, max_results_per_topic, output_dir, **kwargs)`

Download across multiple AD research topics using the built-in topic list.

| Parameter                 | Type                 | Default                             | Description               |
| ------------------------- | -------------------- | ----------------------------------- | ------------------------- |
| `topics`                | `list[str] \| None` | `None` → use `AD_SEARCH_TERMS` | Topic list                |
| `max_results_per_topic` | `int`              | `100`                             | Papers per topic          |
| `output_dir`            | `str`              | `"./publications"`                | Directory for JSON output |

Returns: `list[DownloadStats]`

#### `rag.index(papers_dir, chunk_size, chunk_overlap)`

Chunk and embed all JSON files in `papers_dir`, upsert into ChromaDB.

| Parameter         | Type    | Default              | Description                      |
| ----------------- | ------- | -------------------- | -------------------------------- |
| `papers_dir`    | `str` | `"./publications"` | Source directory                 |
| `chunk_size`    | `int` | `2000`             | Characters per chunk             |
| `chunk_overlap` | `int` | `200`              | Character overlap between chunks |

Returns: `IndexStats`

```python
stats = rag.index()
print(stats.indexed)       # files successfully indexed
print(stats.skipped)       # files already in DB (unchanged)
print(stats.failed)        # files that raised errors
print(stats.total_chunks)  # total chunks stored in ChromaDB
print(stats.success_rate)  # percentage (0.0 – 100.0)
```

#### `rag.query(question, mode, max_response_tokens, temperature)`

Answer a research question under the specified RAG mode.

| Parameter               | Type              | Default        | Description                              |
| ----------------------- | ----------------- | -------------- | ---------------------------------------- |
| `question`            | `str`           | *(required)* | Research or clinical question            |
| `mode`                | `RAGMode \| str` | `"detailed"` | RAG mode (see table above)               |
| `max_response_tokens` | `int`           | `3000`       | Max tokens for LLM output                |
| `temperature`         | `float`         | `0.3`        | Sampling temperature (0 = deterministic) |

Returns: `RAGResult`

#### `rag.compare_modes(question, modes, max_response_tokens, temperature)`

Run one question through multiple RAG modes.

| Parameter               | Type                     | Default         | Description            |
| ----------------------- | ------------------------ | --------------- | ---------------------- |
| `question`            | `str`                  | *(required)*  | Research question      |
| `modes`               | `list[RAGMode] \| None` | all eight modes | Subset of modes to run |
| `max_response_tokens` | `int`                  | `2000`        | Max tokens per mode    |
| `temperature`         | `float`                | `0.3`         | Sampling temperature   |

Returns: `dict[RAGMode, RAGResult]`

---

## Working with Results

Every `rag.query()` call returns a `RAGResult` dataclass.

```python
result = rag.query("How does gut dysbiosis affect Alzheimer's pathology?", mode="detailed")

# Text response
print(result.text)                  # full LLM answer

# Metadata
print(result.mode)                  # RAGMode.DETAILED
print(result.query)                 # original question string
print(result.context_tokens)        # tokens sent to LLM: e.g. 69_214
print(result.response_words)        # word count of answer: e.g. 847
print(result.retrieve_time)         # seconds for vector search: e.g. 0.82
print(result.generate_time)         # seconds for generation: e.g. 57.5
print(result.total_time)            # retrieve_time + generate_time

# One-line summary
print(result.summary())
# → [detailed] 847 words | 15 sources | 69,214 ctx tokens | 58.3s

# Sources (list of dicts with metadata from ChromaDB)
for i, src in enumerate(result.sources, 1):
    pmcid = src.get("pmcid", "N/A")
    title = src.get("title", "")
    sim   = src.get("similarity", 0)
    print(f"  [{i}] {pmcid}  ({sim:.3f})  {title[:60]}")
```
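
The one-line summary format shown above can be reproduced from the individual fields. The function below is a hypothetical stand-in for `RAGResult.summary()`, written only to show how the pieces map onto the line; the field values are the examples from this section.

```python
def format_summary(mode, response_words, n_sources, context_tokens, total_time):
    """Render the one-line summary from individual result fields."""
    return (f"[{mode}] {response_words} words | {n_sources} sources | "
            f"{context_tokens:,} ctx tokens | {total_time:.1f}s")


# total_time is retrieve_time + generate_time, using the example values above.
line = format_summary("detailed", 847, 15, 69_214, 0.82 + 57.5)
print(line)
```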

### `RAGMode` enum values

```python
from thinking.alzheimers import RAGMode

RAGMode.NO_RAG          # "no_rag"
RAGMode.RETRIEVAL_ONLY  # "retrieval_only"
RAGMode.BASIC           # "basic"
RAGMode.STANDARD        # "standard"
RAGMode.DETAILED        # "detailed"
RAGMode.COMPREHENSIVE   # "comprehensive"
RAGMode.COT_STANDARD    # "cot_standard"  — 2-pass CoT, 15 papers
RAGMode.COT_DETAILED    # "cot_detailed"  — 2-pass CoT, 25 papers
```
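
Accepting either a string or an enum member is commonly done with a string-valued `Enum`. The sketch below uses a stand-in class, `DemoMode`, and a hypothetical `coerce_mode` helper; it is one plausible way to get this behaviour, not necessarily how `RAGMode` is implemented.

```python
from enum import Enum


class DemoMode(str, Enum):
    """Stand-in for RAGMode with the same eight string values."""
    NO_RAG = "no_rag"
    RETRIEVAL_ONLY = "retrieval_only"
    BASIC = "basic"
    STANDARD = "standard"
    DETAILED = "detailed"
    COMPREHENSIVE = "comprehensive"
    COT_STANDARD = "cot_standard"
    COT_DETAILED = "cot_detailed"


def coerce_mode(mode):
    """Accept a DemoMode member or its string value; reject anything else."""
    if isinstance(mode, DemoMode):
        return mode
    try:
        return DemoMode(mode)
    except ValueError:
        valid = ", ".join(m.value for m in DemoMode)
        raise ValueError(f"unknown mode {mode!r}; expected one of: {valid}") from None


print(coerce_mode("basic"), coerce_mode(DemoMode.DETAILED))
```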

### `IndexStats` fields

```python
stats.total_files    # int — total JSON files found
stats.indexed        # int — successfully upserted
stats.skipped        # int — already in DB
stats.failed         # int — errors during processing
stats.total_chunks   # int — chunks in ChromaDB after this run
stats.success_rate   # float — indexed / total_files * 100
```
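
The `success_rate` formula above can be expressed as a derived property. `DemoIndexStats` is a stand-in dataclass for illustration; returning `0.0` when no files were found is an assumption about how the empty case is handled.

```python
from dataclasses import dataclass


@dataclass
class DemoIndexStats:
    """Stand-in for IndexStats with the fields listed above."""
    total_files: int = 0
    indexed: int = 0
    skipped: int = 0
    failed: int = 0
    total_chunks: int = 0

    @property
    def success_rate(self):
        """indexed / total_files * 100, or 0.0 when there were no files."""
        if self.total_files == 0:
            return 0.0
        return self.indexed / self.total_files * 100


demo = DemoIndexStats(total_files=100, indexed=94, skipped=6, failed=0, total_chunks=2341)
print(f"Success rate: {demo.success_rate:.1f}%")
```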

---

## Custom Embeddings (MedEmbed)

By default `thinking` detects whether `sentence-transformers` is installed:

- **Installed** → uses `abhinand/MedEmbed-large-v0.1` (1024-dim, biomedical-grade)
- **Not installed** → falls back to ChromaDB's built-in `all-MiniLM-L6-v2`

To explicitly pass a custom embedding function:

```python
from sentence_transformers import SentenceTransformer
from thinking.alzheimers import AlzheimersRAG

class MedEmbedFn:
    """Wraps MedEmbed as a ChromaDB-compatible embedding function."""
    def __init__(self):
        self._model = SentenceTransformer("abhinand/MedEmbed-large-v0.1")

    def __call__(self, input: list[str]) -> list[list[float]]:
        return self._model.encode(input, normalize_embeddings=True).tolist()

rag = AlzheimersRAG(
    db_path="./my_db",
    embedding_fn=MedEmbedFn(),   # used for both index() and query()
)
```

> **Important**: you must use the **same** embedding function for both `rag.index()`
> and `rag.query()`. Mixing embedding models will produce meaningless similarity scores.
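
One cheap guard against mixing models is to compare vector dimensions before querying. The helper below is hypothetical and not part of `thinking`; the 1024 figure comes from this README, and 384 is the output size of `all-MiniLM-L6-v2`.

```python
MEDEMBED_DIM = 1024  # MedEmbed-large, as stated in this README
MINILM_DIM = 384     # all-MiniLM-L6-v2


def check_embedding_dim(index_dim, query_vector):
    """Raise if the query vector cannot belong to the model that built the index."""
    if len(query_vector) != index_dim:
        raise ValueError(
            f"query embedding has {len(query_vector)} dimensions, "
            f"but the index was built with {index_dim}"
        )


check_embedding_dim(MEDEMBED_DIM, [0.0] * MEDEMBED_DIM)  # same model: passes

try:
    check_embedding_dim(MEDEMBED_DIM, [0.0] * MINILM_DIM)  # mixed models
    mismatch_detected = False
except ValueError as exc:
    mismatch_detected = True
    print(exc)
```

This check catches only models with different output sizes. Two different models with the same dimension would pass it and still give meaningless similarity scores.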

---

## Using an Existing Vector Database

### Existing ChromaDB

If you already have a ChromaDB (e.g. the 917K-chunk research database):

```bash
# Set the path once
export THINKING_DB_PATH="/path/to/existing/chromadb"
```

```python
from thinking.alzheimers import AlzheimersRAG

# Option A — reads THINKING_DB_PATH automatically
rag = AlzheimersRAG()

# Option B — pass explicitly
rag = AlzheimersRAG(db_path="/path/to/existing/chromadb")

# Query immediately — no download or index step needed
result = rag.query("What are the latest probiotic therapies for AD?", mode="comprehensive")
print(result.text)
```

### Existing FAISS database

If your existing vector database is FAISS-based:

```bash
export THINKING_FAISS_DB_PATH="/path/to/existing/faiss_db"
```

```python
from thinking.alzheimers import AlzheimersRAG

# Option A — reads THINKING_FAISS_DB_PATH automatically
rag = AlzheimersRAG(backend="faiss")

# Option B — pass explicitly
rag = AlzheimersRAG(backend="faiss", db_path="/path/to/existing/faiss_db")

result = rag.query("What are the latest probiotic therapies for AD?", mode="comprehensive")
print(result.text)
```

CLI equivalent:

```bash
alzheimers query "What are the latest probiotic therapies for AD?" \
    --backend faiss \
    --db-path /path/to/existing/faiss_db \
    --mode comprehensive
```

---

## Low-Level Building Blocks

For fine-grained control, all components can be used independently:

```python
from thinking.alzheimers import download_papers, download_ad_corpus, AD_SEARCH_TERMS
from thinking.alzheimers import index_directory
from thinking.alzheimers.retriever import Retriever
from thinking.alzheimers.llm import OllamaLLM
from thinking.alzheimers.rag import RAGPipeline
from thinking.alzheimers.types import RAGMode, RAGResult, IndexStats, RetrievalResult

# Download
download_papers(keyword="tau phosphorylation", max_results=50, output_dir="./papers")

# Index
stats = index_directory(
    papers_dir="./papers",
    db_path="./my_db",
    collection_name="publications",
)

# Retrieve without generating
retriever = Retriever(db_path="./my_db", collection_name="publications")
ret: RetrievalResult = retriever.retrieve("gut-brain axis mechanisms", top_k=10)
for doc, meta, sim in zip(ret.documents, ret.metadatas, ret.similarities):
    print(f"{meta.get('pmcid')}  {sim:.3f}  {doc[:100]}")

# Generate without retrieval
llm = OllamaLLM(model="llama3.2", host="http://localhost:11434", num_ctx=131_072)
response = llm.generate("Summarise amyloid cascade hypothesis", max_tokens=500)
print(response.text)

# Combine into a custom pipeline
pipeline = RAGPipeline(retriever=retriever, llm=llm)
result = pipeline.query("What does Lactobacillus do in AD models?", mode=RAGMode.STANDARD)
print(result.summary())
```

---

## Project Structure

```
thinking/                         ← importable Python package
├── __init__.py                    # umbrella re-exports of thinking.alzheimers symbols
├── __main__.py                    # python -m thinking  (help message)
└── alzheimers/                    # sub-package: Alzheimer's disease RAG pipeline
    ├── __init__.py                # AlzheimersRAG façade + public API
    ├── __main__.py                # python -m thinking.alzheimers  (→ CLI)
    ├── cli.py                     # alzheimers CLI: download / index / query / compare
    ├── downloader.py              # PMC download via pubmed-stream + AD_SEARCH_TERMS
    ├── indexer.py                 # text chunking + ChromaDB upsert
    ├── retriever.py               # ChromaDB cosine-similarity retrieval
    ├── llm.py                     # OllamaLLM: Python client + HTTP fallback
    ├── rag.py                     # RAGPipeline: all 8 evidence modes + prompts
    ├── medembed_embedder.py       # MedEmbed ChromaDB embedding function wrapper
    └── types.py                   # RAGMode, RAGResult, IndexStats, RetrievalResult

examples/
└── basic_usage.py                 # self-contained runnable example

RAG_Comparison_Demo.ipynb          # raw pipeline benchmark notebook (low-level)
RAG_Comparison_Demo_thinking.ipynb # same benchmark using the thinking library
pyproject.toml
README.md
LICENSE
```

---

## Author & Copyright

**Author**: Ziyuan Huang

**Copyright © 2026 University of Massachusetts**

- Department of Microbiology
- Department of Emergency Medicine
- Haran Lab
- Bucci Lab
- Microbiology & Microbiome Dynamics AI Hub

---

## License

MIT — see [LICENSE](LICENSE).
