Metadata-Version: 2.4
Name: evalytic
Version: 0.3.11
Summary: Visual Generation Quality Evaluation SDK
Project-URL: Homepage, https://evalytic.ai
Project-URL: Documentation, https://docs.evalytic.ai
Project-URL: Repository, https://github.com/evalytic/evalytic
Project-URL: Issues, https://github.com/evalytic/evalytic/issues
Author-email: Evalytic <hello@evalytic.ai>
License-Expression: MIT
License-File: LICENSE
Keywords: ai,evaluation,image,quality,video,vlm
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: click>=8.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: jinja2>=3.0
Requires-Dist: pillow>=10.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: rich>=13.0
Provides-Extra: all
Requires-Dist: fal-client>=0.5.0; extra == 'all'
Requires-Dist: insightface>=0.7; extra == 'all'
Requires-Dist: lpips>=0.1.4; extra == 'all'
Requires-Dist: onnxruntime>=1.16; extra == 'all'
Requires-Dist: pyiqa>=0.1.10; extra == 'all'
Requires-Dist: pytesseract>=0.3.10; extra == 'all'
Requires-Dist: torch>=2.0; extra == 'all'
Requires-Dist: transformers>=4.30; extra == 'all'
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Provides-Extra: generation
Requires-Dist: fal-client>=0.5.0; extra == 'generation'
Provides-Extra: metrics
Requires-Dist: insightface>=0.7; extra == 'metrics'
Requires-Dist: lpips>=0.1.4; extra == 'metrics'
Requires-Dist: onnxruntime>=1.16; extra == 'metrics'
Requires-Dist: pyiqa>=0.1.10; extra == 'metrics'
Requires-Dist: torch>=2.0; extra == 'metrics'
Requires-Dist: transformers>=4.30; extra == 'metrics'
Provides-Extra: ocr
Requires-Dist: pytesseract>=0.3.10; extra == 'ocr'
Description-Content-Type: text/markdown

# Evalytic

**Evals for visual AI.** Automated quality evaluation for AI-generated images and video.

[![PyPI](https://img.shields.io/pypi/v/evalytic)](https://pypi.org/project/evalytic/)
[![Python](https://img.shields.io/pypi/pyversions/evalytic)](https://pypi.org/project/evalytic/)
[![License](https://img.shields.io/pypi/l/evalytic)](https://github.com/evalytic/evalytic/blob/main/LICENSE)

Know if your AI-generated visuals are good before your users tell you they're not.

```bash
pip install evalytic

# Score any image you already have
evaly eval --image output.png --prompt "A sunset over mountains" --yes

# Compare models side by side
evaly bench -m flux-schnell -m flux-dev -m flux-pro \
  -p "A product photo on marble countertop" --yes
```

## What It Does

Evalytic scores AI-generated images using two complementary approaches:

- **VLM Judges** (Gemini, GPT, Claude, Ollama) evaluate semantic dimensions like prompt adherence, text rendering, and identity preservation
- **Local Metrics** (sharpness, CLIP, LPIPS, ArcFace) run on your machine for free, with no API key needed; CLIP, LPIPS, and ArcFace require the `metrics` extra

Use both together or either one alone. Works with any image, from any provider or your own pipeline.
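
To give a feel for what a local metric measures, here is a minimal sketch of one common sharpness measure, the variance of the Laplacian. This README does not specify how Evalytic computes sharpness, so the function name `laplacian_variance` and the choice of technique are illustrative assumptions, not the library's implementation.

```python
from statistics import pvariance


def laplacian_variance(gray: list[list[float]]) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Hypothetical helper: higher values mean more high-frequency detail,
    which usually corresponds to a sharper image.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3 pixels")
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


# A flat image has no edges; a checkerboard is nothing but edges.
flat = [[128.0] * 8 for _ in range(8)]
checker = [[255.0 * ((x + y) % 2) for x in range(8)] for y in range(8)]
```

The input is a grid of grayscale values, which could come from an image converted to mode `"L"` with Pillow. The flat image scores zero and the checkerboard scores far higher, which is the ordering a sharpness metric should produce.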

### Use Cases

- **Model Selection** - Compare models with real prompts, pick the best one for your use case
- **Prompt Optimization** - Measure how well models follow your prompts across dimensions
- **Regression Detection** - Catch quality drops when models or prompts update
- **CI/CD Quality Gate** - Block deploys when image quality falls below threshold

### Key Features

- **7 Semantic Dimensions** - visual_quality, prompt_adherence, text_rendering, input_fidelity, transformation_quality, artifact_detection, identity_preservation
- **Consensus Judging** - Multi-judge scoring with automatic agreement analysis
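
The regression-detection use case comes down to comparing per-dimension scores between a baseline run and a new run. The sketch below shows that comparison in plain Python. The function `find_regressions`, the `tolerance` default, and the example scores are all hypothetical; Evalytic's own report format and score scale are not described in this README.

```python
def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.25,
) -> dict[str, float]:
    """Return {dimension: drop} for scores that fell by more than tolerance."""
    drops: dict[str, float] = {}
    for dimension, old_score in baseline.items():
        if dimension not in current:
            continue  # not scored in the new run, so nothing to compare
        drop = old_score - current[dimension]
        if drop > tolerance:
            drops[dimension] = round(drop, 3)
    return drops


# Illustrative scores only; the dimension names come from the list above.
baseline = {"visual_quality": 4.5, "prompt_adherence": 4.0, "text_rendering": 3.5}
current = {"visual_quality": 4.4, "prompt_adherence": 3.0, "text_rendering": 3.5}
regressions = find_regressions(baseline, current)
```

A small tolerance keeps ordinary judge-to-judge noise from being reported as a regression; the right value depends on how stable your judge is.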

## Quickstart

### 1. Install

```bash
pip install evalytic
```

### 2. See Real Examples (no API key needed)

```bash
evaly demo              # Opens showcase with 4 real benchmark case studies
evaly demo face         # Face identity preservation comparison
evaly demo flagship     # Flux Schnell vs Dev vs Pro cost/quality
```

### 3. Score an Existing Image

```bash
# Local metrics only (free, no API key)
evaly eval --image output.png --prompt "A sunset over mountains" --no-judge

# With VLM judge
export GEMINI_API_KEY=your_gemini_key
evaly eval --image output.png --prompt "A sunset over mountains" --yes
```
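
If you want to call the same command from a Python script, a thin `subprocess` wrapper is one option. This is a sketch that uses only the flags shown above (`--image`, `--prompt`, `--yes`, `--no-judge`); the helper names are hypothetical, and the exact meaning of `--yes` and the command's output format are not documented here, so the wrapper returns only the exit code.

```python
import subprocess


def build_eval_command(image: str, prompt: str, use_judge: bool = True) -> list[str]:
    """Build the argv list for `evaly eval` from the flags shown above."""
    command = ["evaly", "eval", "--image", image, "--prompt", prompt]
    # The judge example above passes --yes; --no-judge limits scoring to
    # local metrics. Treating the two as alternatives is an assumption.
    command.append("--yes" if use_judge else "--no-judge")
    return command


def run_eval(image: str, prompt: str, use_judge: bool = True) -> int:
    """Run the command and return its exit code (needs `evaly` on PATH)."""
    return subprocess.run(build_eval_command(image, prompt, use_judge)).returncode
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain spaces or quotes.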

### 4. Benchmark Models

```bash
export FAL_KEY=your_fal_key

# Text-to-image
evaly bench -m flux-schnell -m flux-dev -m flux-pro \
  -p "A cat sitting on a windowsill" --yes

# Image-to-image
evaly bench -m flux-kontext -m seedream-edit -m reve-edit \
  --inputs product.jpg -p "Place on a marble countertop" --yes

# Metrics only, no VLM judge
evaly bench -m flux-schnell -m flux-dev -p "A cat" --no-judge
```

### 5. Interactive Setup

```bash
evaly init   # Guided setup: use case, API keys, config file
```

## CLI Commands

| Command | Description |
|---------|-------------|
| `evaly init` | Interactive setup wizard |
| `evaly demo` | Browse real benchmark showcases (no API key needed) |
| `evaly bench` | Generate, score, and report in one command |
| `evaly eval` | Score a single image without generation |
| `evaly gate` | CI/CD quality gate with pass/fail exit codes |
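
The idea behind a quality gate is simple: compare scores against a threshold and signal the result through the process exit code so that CI can block the deploy. The sketch below illustrates that logic only. The flags and thresholds that `evaly gate` actually accepts are not listed in this README, so `gate_exit_code` and the 0/1 exit-code convention are assumptions.

```python
def gate_exit_code(scores: dict[str, float], threshold: float) -> int:
    """Return 0 when every score meets the threshold, 1 otherwise."""
    failing = {name: score for name, score in scores.items() if score < threshold}
    for name, score in sorted(failing.items()):
        print(f"FAIL {name}: {score:.2f} < {threshold:.2f}")
    return 1 if failing else 0
```

In a CI script you would pass the result to `sys.exit`, for example `sys.exit(gate_exit_code(scores, 3.5))`, so that a non-zero code fails the pipeline step.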

## Judges

Any image-capable VLM from a supported provider can act as a judge. Select one with `-j`:

```bash
evaly bench -m flux-schnell -p "A cat" -j gemini-2.5-flash            # Default
evaly bench -m flux-schnell -p "A cat" -j openai/gpt-5.2              # OpenAI
evaly bench -m flux-schnell -p "A cat" -j anthropic/claude-sonnet-4-6 # Anthropic
evaly bench -m flux-schnell -p "A cat" -j fal/gemini-2.5-flash        # Via fal.ai (single key)
evaly bench -m flux-schnell -p "A cat" -j ollama/qwen2.5-vl:7b        # Local
```

### Consensus Mode

Use multiple judges for more reliable scores:

```bash
evaly bench -m flux-schnell -p "A cat" \
  --judges "gemini-2.5-flash,openai/gpt-5.2"
```

Two judges score in parallel. If they disagree, a third breaks the tie.
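
The tie-break flow can be sketched as follows. This is an illustration under stated assumptions, not Evalytic's implementation: the `Judge` callable type, the `max_gap` disagreement threshold, and the rule for combining the tie-breaker's score with the primary scores are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable

# Hypothetical judge interface: takes an image path, returns a score.
Judge = Callable[[str], float]


def consensus_score(
    image: str,
    primary: tuple[Judge, Judge],
    tiebreaker: Judge,
    max_gap: float = 1.0,
) -> float:
    """Score with two judges in parallel; consult a third on disagreement."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first, second = pool.map(lambda judge: judge(image), primary)
    if abs(first - second) <= max_gap:
        return mean([first, second])  # judges agree: average them
    third = tiebreaker(image)
    # Assumed rule: keep the primary score closest to the tie-breaker
    # and average the two.
    closest = min((first, second), key=lambda score: abs(score - third))
    return mean([closest, third])
```

Calling the third judge only on disagreement keeps the cost at two judge calls per image in the common case.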

## Optional Extras

```bash
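pip install "evalytic[generation]"  # fal-client for image generation
pip install "evalytic[metrics]"     # CLIP Score + LPIPS + ArcFace (~2GB)
pip install "evalytic[ocr]"         # pytesseract (needs the Tesseract binary installed separately)
pip install "evalytic[all]"         # All of the above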
```

## Configuration

Create `evalytic.toml` in your project root:

```toml
[keys]
fal = "your_fal_key"
gemini = "your_gemini_key"

[bench]
judge = "gemini-2.5-flash"
dimensions = ["visual_quality", "prompt_adherence"]
concurrency = 4

[bench.dimension_weights]
input_fidelity = 0.5
visual_quality = 0.1
```
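
One plausible reading of `dimension_weights` is a weighted mean over the scored dimensions. The sketch below shows that arithmetic. How Evalytic actually combines weights, and what weight an unlisted dimension receives, is not stated in this README, so `weighted_overall` and its `default_weight` parameter are assumptions.

```python
def weighted_overall(
    scores: dict[str, float],
    weights: dict[str, float],
    default_weight: float = 1.0,
) -> float:
    """Weighted mean of dimension scores; unlisted dimensions get default_weight."""
    total_weight = sum(weights.get(name, default_weight) for name in scores)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    weighted_sum = sum(
        score * weights.get(name, default_weight) for name, score in scores.items()
    )
    return weighted_sum / total_weight


# Weights taken from the config above; the scores are illustrative.
weights = {"input_fidelity": 0.5, "visual_quality": 0.1}
overall = weighted_overall({"input_fidelity": 4.0, "visual_quality": 2.0}, weights)
```

With these weights, `input_fidelity` contributes five times as much as `visual_quality`, so the overall score sits much closer to 4.0 than to 2.0.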

## Documentation

Full documentation is available at [docs.evalytic.ai](https://docs.evalytic.ai).

## License

MIT
