Metadata-Version: 2.4
Name: embed-rerank
Version: 1.1.0
Summary: Single Model Embedding & Reranker API with Apple Silicon acceleration
Project-URL: Documentation, https://github.com/joonsoo-me/embed-rerank#readme
Project-URL: Issues, https://github.com/joonsoo-me/embed-rerank/issues
Project-URL: Source, https://github.com/joonsoo-me/embed-rerank
Author-email: joonsoo-me <bear8203@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: apple-silicon,embeddings,fastapi,mlx,reranking
Classifier: Development Status :: 4 - Beta
Classifier: Framework :: FastAPI
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.13
Requires-Dist: fastapi>=0.104.0
Requires-Dist: httpx>=0.25.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: prometheus-client
Requires-Dist: pydantic-settings>=2.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: python-dotenv
Requires-Dist: python-multipart
Requires-Dist: sentence-transformers>=2.2.0
Requires-Dist: structlog
Requires-Dist: torch>=2.0.0
Requires-Dist: transformers>=4.30.0
Requires-Dist: uvicorn[standard]>=0.24.0
Provides-Extra: dev
Requires-Dist: bandit[toml]>=1.7.5; extra == 'dev'
Requires-Dist: black>=23.0.0; extra == 'dev'
Requires-Dist: flake8>=6.0.0; extra == 'dev'
Requires-Dist: isort>=5.12.0; extra == 'dev'
Requires-Dist: locust>=2.16.0; extra == 'dev'
Requires-Dist: mypy>=1.5.0; extra == 'dev'
Requires-Dist: pre-commit>=3.0.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
Requires-Dist: pytest-mock>=3.11.0; extra == 'dev'
Requires-Dist: pytest>=7.4.0; extra == 'dev'
Requires-Dist: safety>=2.3.0; extra == 'dev'
Requires-Dist: types-requests>=2.31.0; extra == 'dev'
Provides-Extra: mlx
Requires-Dist: mlx-lm>=0.2.0; (sys_platform == 'darwin') and extra == 'mlx'
Requires-Dist: mlx>=0.4.0; (sys_platform == 'darwin') and extra == 'mlx'
Description-Content-Type: text/markdown

# 🔥 Single Model Embedding & Reranking API

<div align="center">
<strong>Lightning-fast local embeddings & reranking for Apple Silicon (MLX-first, OpenAI & TEI compatible)</strong>
<br/><br/>
<a href="https://pypi.org/project/embed-rerank/"><img src="https://img.shields.io/pypi/v/embed-rerank?logo=pypi&logoColor=white" /></a>
<a href="https://pypi.org/project/embed-rerank/"><img src="https://img.shields.io/pypi/dm/embed-rerank?logo=pypi&logoColor=white" /></a>
<a href="https://pypi.org/project/embed-rerank/"><img src="https://img.shields.io/pypi/pyversions/embed-rerank?logo=python&logoColor=white" /></a>
<a href="https://github.com/joonsoo-me/embed-rerank/blob/main/LICENSE"><img src="https://img.shields.io/github/license/joonsoo-me/embed-rerank?logo=opensource&logoColor=white" /></a>
<a href="https://developer.apple.com/silicon/"><img src="https://img.shields.io/badge/Apple_Silicon-Ready-blue?logo=apple&logoColor=white" /></a>
<a href="https://ml-explore.github.io/mlx/"><img src="https://img.shields.io/badge/MLX-Optimized-green?logo=apple&logoColor=white" /></a>
<a href="https://fastapi.tiangolo.com/"><img src="https://img.shields.io/badge/FastAPI-009688?logo=fastapi&logoColor=white" /></a>
</div>

---

## ⚡ Why This Matters

Transform your text processing with **10x faster** embeddings and reranking on Apple Silicon. Drop-in replacement for OpenAI API and Hugging Face TEI with **zero code changes** required.

### 🏆 Performance Comparison

| Operation | This API (MLX) | OpenAI API | Hugging Face TEI |
|-----------|----------------|------------|------------------|
| **Embeddings** | `0.78ms` | `200ms+` | `15ms` |
| **Reranking** | `1.04ms` | `N/A` | `25ms` |
| **Model Loading** | `0.36s` | `N/A` | `3.2s` |
| **Cost** | `$0` | `$0.02/1K` | `$0` |

*Tested on Apple M4 Max*

---

## 🚀 Quick Start

### Option 1: Install from PyPI (Recommended)

```bash
# Install the package
pip install embed-rerank

# Start the server (default port 9000)
embed-rerank

# Or with custom port and options
embed-rerank --port 8080 --host 127.0.0.1

# See all options
embed-rerank --help
```

**CLI Options:**
- `--host`: Server host (default: 0.0.0.0)
- `--port`: Server port (default: 9000)
- `--reload`: Enable auto-reload for development
- `--log-level`: Set log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)

**Testing Options:**
- `--test quick`: Run quick validation tests
- `--test performance`: Run performance benchmark tests  
- `--test quality`: Run quality validation tests
- `--test full`: Run comprehensive test suite
- `--test-url`: Custom server URL for testing
- `--test-output`: Test output directory

**Performance Testing:**
```bash
# Start server in background
embed-rerank --port 8080 &

# Run performance tests
embed-rerank --test performance --test-url http://localhost:8080

# Run comprehensive test suite
embed-rerank --test full --test-url http://localhost:8080

# Stop server
pkill -f embed-rerank
```

**Environment Variables:**
```bash
# Alternative: Use environment variables
export PORT=8080
export HOST=127.0.0.1
embed-rerank
```

### Option 2: From Source

```bash
# 1. Clone and setup
git clone https://github.com/joonsoo-me/embed-rerank.git
cd embed-rerank
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 2. Start server (macOS/Linux)
./tools/server-run.sh

# 3. Test it works
curl http://localhost:9000/health/
```

🎉 **Done!** Visit http://localhost:9000/docs for interactive API documentation.

---

## 🛠 Server Management (macOS/Linux)

```bash
# Start server (background)
./tools/server-run.sh

# Start server (foreground/development)
./tools/server-run-foreground.sh

# Stop server
./tools/server-stop.sh
```

> **Windows Support**: Coming soon! Currently optimized for macOS/Linux.

---

## ⚙️ Configuration

Create `.env` file (optional):

```env
# Server
PORT=9000
HOST=0.0.0.0

# Backend
BACKEND=auto                                   # auto | mlx | torch
MODEL_NAME=mlx-community/Qwen3-Embedding-4B-4bit-DWQ

# Model Cache (first run downloads ~2.3GB model)
MODEL_PATH=                               # Custom model directory
TRANSFORMERS_CACHE=                           # HF cache override
# Default: ~/.cache/huggingface/hub/

# Performance
BATCH_SIZE=32
MAX_TEXTS_PER_REQUEST=100
```

---

### 📂 Model Cache Management

The service automatically manages model downloads and caching:

| Environment Variable | Purpose | Default |
|---------------------|---------|---------|
| `MODEL_PATH` | Custom model directory | *(uses HF cache)* |
| `TRANSFORMERS_CACHE` | Override HF cache location | `~/.cache/huggingface/transformers` |
| `HF_HOME` | HF home directory | `~/.cache/huggingface` |
| *(auto)* | Default HF cache | `~/.cache/huggingface/hub/` |

#### Cache Location Check
``` bash
# Find where your model is cached
python3 -c "
import os
print('MODEL_PATH:', os.getenv('MODEL_PATH', '<not set>'))
print('TRANSFORMERS_CACHE:', os.getenv('TRANSFORMERS_CACHE', '<not set>'))
print('HF_HOME:', os.getenv('HF_HOME', '<not set>'))
print('Default cache:', os.path.expanduser('~/.cache/huggingface/hub'))
"

# List cached Qwen3 models
ls ~/.cache/huggingface/hub | grep -i qwen3 || echo "No Qwen3 models found in cache"
```

---

## 🌐 Three APIs, One Service

| API | Endpoint | Use Case |
|-----|----------|----------|
| **Native** | `/api/v1/embed`, `/api/v1/rerank` | New projects |
| **OpenAI** | `/v1/embeddings` | Existing OpenAI code |
| **TEI** | `/embed`, `/rerank` | Hugging Face TEI replacement |

### OpenAI Compatible (Drop-in)

```python
import openai

client = openai.OpenAI(
    api_key="dummy-key",
    base_url="http://localhost:9000/v1"
)

response = client.embeddings.create(
    input=["Hello world", "Apple Silicon is fast!"],
    model="text-embedding-ada-002"
)
# 🚀 10x faster than OpenAI, same code!
```

### TEI Compatible

```bash
curl -X POST "http://localhost:9000/embed" 
  -H "Content-Type: application/json" 
  -d '{"inputs": ["Hello world"], "truncate": true}'
```

### Native API

```bash
# Embeddings
curl -X POST "http://localhost:9000/api/v1/embed/" 
  -H "Content-Type: application/json" 
  -d '{"texts": ["Apple Silicon", "MLX acceleration"]}'

# Reranking  
curl -X POST "http://localhost:9000/api/v1/rerank/" 
  -H "Content-Type: application/json" 
  -d '{"query": "machine learning", "passages": ["AI is cool", "Dogs are pets", "MLX is fast"]}'
```

---

## 🧪 Testing

### Built-in Performance Testing

```bash
# Quick validation (basic functionality)
embed-rerank --test quick

# Performance benchmark (latency, throughput, concurrency)
embed-rerank --test performance --test-url http://localhost:9000

# Quality validation (semantic similarity, multilingual)  
embed-rerank --test quality --test-url http://localhost:9000

# Full comprehensive test suite
embed-rerank --test full --test-url http://localhost:9000
```

**Test Results Include:**
- 📊 Latency metrics (mean, P95, P99)
- 🚀 Throughput analysis (texts/sec)
- 🔄 Concurrency testing
- 🧠 Semantic similarity validation
- 🌍 Multilingual support testing
- 📈 JSON reports with detailed metrics

### Advanced Testing

```bash
# Comprehensive test suite (shell script)
./tools/server-tests.sh

# Quick health & model loaded info check
curl http://localhost:9000/health/

# Run pytest
pytest tests/ -v
```

---

## 🚀 What You Get

- ✅ **Zero Code Changes**: Drop-in replacement for OpenAI API and TEI
- ⚡ **10x Performance**: Apple MLX acceleration on Apple Silicon  
- 💰 **Zero Costs**: No API fees, runs locally
- 🔒 **Privacy**: Your data never leaves your machine
- 🎯 **Three APIs**: Native, OpenAI, and TEI compatibility
- 📊 **Production Ready**: Health checks, monitoring, structured logging

---

## 📄 License

MIT License - build amazing things with this code!
