Metadata-Version: 2.4
Name: what-changed
Version: 0.2.2
Summary: A universal CLI tool that compares anything and explains what changed and what might break
Author: what-changed contributors
License-Expression: MIT
License-File: LICENSE
Keywords: breaking-changes,cli,compare,diff
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Quality Assurance
Requires-Python: >=3.11
Requires-Dist: deepdiff>=6.0.0
Requires-Dist: networkx>=3.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: rich>=13.0.0
Requires-Dist: textual>=0.40.0
Requires-Dist: typer>=0.9.0
Provides-Extra: all
Requires-Dist: graphql-core>=3.2.0; extra == 'all'
Requires-Dist: jsonschema>=4.0.0; extra == 'all'
Requires-Dist: openapi-spec-validator>=0.7.0; extra == 'all'
Requires-Dist: openpyxl>=3.1.0; extra == 'all'
Requires-Dist: pillow>=10.0.0; extra == 'all'
Requires-Dist: protobuf>=4.0.0; extra == 'all'
Requires-Dist: pymupdf>=1.23.0; extra == 'all'
Requires-Dist: python-docx>=1.1.0; extra == 'all'
Requires-Dist: sqlparse>=0.5.0; extra == 'all'
Requires-Dist: tree-sitter>=0.21.0; extra == 'all'
Provides-Extra: code
Requires-Dist: tree-sitter>=0.21.0; extra == 'code'
Provides-Extra: dev
Requires-Dist: mypy>=1.0.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: docx
Requires-Dist: python-docx>=1.1.0; extra == 'docx'
Provides-Extra: excel
Requires-Dist: openpyxl>=3.1.0; extra == 'excel'
Provides-Extra: graphql
Requires-Dist: graphql-core>=3.2.0; extra == 'graphql'
Provides-Extra: image
Requires-Dist: pillow>=10.0.0; extra == 'image'
Provides-Extra: openapi
Requires-Dist: jsonschema>=4.0.0; extra == 'openapi'
Requires-Dist: openapi-spec-validator>=0.7.0; extra == 'openapi'
Provides-Extra: pdf
Requires-Dist: pymupdf>=1.23.0; extra == 'pdf'
Provides-Extra: proto
Requires-Dist: protobuf>=4.0.0; extra == 'proto'
Provides-Extra: sql
Requires-Dist: sqlparse>=0.5.0; extra == 'sql'
Description-Content-Type: text/markdown

<p align="center">
  <img src="https://img.shields.io/badge/python-3.11+-blue.svg" alt="Python 3.11+">
  <img src="https://img.shields.io/badge/license-MIT-green.svg" alt="MIT License">
  <img src="https://img.shields.io/badge/platform-Windows%20%7C%20macOS%20%7C%20Linux-lightgrey.svg" alt="Cross-platform">
</p>

<h1 align="center">what-changed</h1>

<p align="center">
  <strong>Compare anything. Understand everything. Break nothing.</strong>
</p>

<p align="center">
  A universal CLI tool that compares files, folders, configs, documents, APIs, and dependencies—then explains <b>what changed</b> and <b>what could break</b>.
</p>

<p align="center">
  <img src="https://raw.githubusercontent.com/aayushadhikari7/what-changed/main/demo.gif" alt="what-changed demo" width="800">
</p>

---

## Why what-changed?

Traditional diff tools show you *what* is different. **what-changed** tells you *why it matters*:

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Traditional Diff              │ what-changed                                │
├───────────────────────────────┼─────────────────────────────────────────────┤
│ < host: localhost             │ [X] BREAKING: database.host changed         │
│ > host: prod-db.example.com   │                                             │
│                               │ Impact: Database connection points to prod  │
│                               │ Could break: Local dev environments         │
│                               │ Risk Score: 9/10                            │
└─────────────────────────────────────────────────────────────────────────────┘
```
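
To make the difference concrete, here is a minimal sketch of a structural comparison in plain Python. It is an illustration only, not the tool's actual diff engine; `diff_paths` is a hypothetical helper. It walks two parsed configs and reports changes by dotted path, so reordered keys produce no noise.

```python
from typing import Any, Iterator


def diff_paths(old: Any, new: Any, path: str = "") -> Iterator[tuple[str, str, Any, Any]]:
    """Yield (kind, dotted_path, old_value, new_value) for each difference.

    Hypothetical helper for illustration; assumes dict keys are strings,
    as they are in parsed JSON.
    """
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(old.keys() | new.keys()):
            child = f"{path}.{key}" if path else str(key)
            if key not in new:
                yield ("removed", child, old[key], None)
            elif key not in old:
                yield ("added", child, None, new[key])
            else:
                # Key exists on both sides: recurse into the values.
                yield from diff_paths(old[key], new[key], child)
    elif old != new:
        yield ("changed", path, old, new)


old = {"database": {"host": "localhost", "port": 5432}}
# Same keys in a different order, with only the host value changed.
new = {"database": {"port": 5432, "host": "prod-db.example.com"}}

changes = list(diff_paths(old, new))
for kind, where, before, after in changes:
    print(f"{kind}: {where}: {before!r} -> {after!r}")
```

A line-based diff of the same two files would also flag the moved `port` line; the path-based walk reports only `database.host`.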

### Key Features

| Feature | Description |
|---------|-------------|
| **Smart Detection** | Auto-identifies 25+ file formats |
| **Semantic Diffing** | Understands structure, not just text |
| **Risk Scoring** | Predicts what might break (0-10 scale) |
| **OCR Support** | Extracts text from scanned PDFs & images |
| **Git Integration** | Compare commits, branches, and tags |
| **Interactive TUI** | Explore changes in a terminal UI |
| **CI/CD Ready** | Exit codes for automation pipelines |

---

## Installation

```bash
pip install what-changed
```

### Optional Extras

```bash
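# Full installation with all formats
pip install "what-changed[all]"

# Individual extras (quoted so shells such as zsh don't expand the brackets)
pip install "what-changed[pdf]"      # PDF support (PyMuPDF)
pip install "what-changed[openapi]"  # OpenAPI/Swagger specs
pip install "what-changed[excel]"    # Excel spreadsheets (openpyxl)
pip install "what-changed[image]"    # Image comparison (Pillow)
pip install "what-changed[sql]"      # SQL schema analysis (sqlparse)
pip install "what-changed[docx]"     # Word documents (python-docx)
pip install "what-changed[graphql]"  # GraphQL schemas (graphql-core)
pip install "what-changed[proto]"    # Protocol Buffers (protobuf)
pip install "what-changed[code]"     # Source code parsing (tree-sitter)

# OCR: version 0.2.2 declares no `ocr` extra; install Tesseract separately (see OCR Support)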
```

### From Source

```bash
git clone https://github.com/aayushadhikari7/what-changed
cd what-changed
pip install -e ".[all]"
```

---

## Quick Start

```bash
# Compare any two files
what-changed compare old.json new.json

# Compare directories
what-changed compare ./v1 ./v2

# Show only breaking changes
what-changed compare config.yaml config.prod.yaml --breaking-only

# Interactive graph view
what-changed compare api-v1.yaml api-v2.yaml --graph

# Visual stats dashboard
what-changed compare before/ after/ --stats

# Git integration
what-changed git HEAD~1..HEAD
what-changed git main..feature-branch
```

---

## Supported Formats

### Configuration Files
| Format | Extensions | What's Analyzed |
|--------|------------|-----------------|
| JSON | `.json` | Keys, values, nested structures |
| YAML | `.yaml`, `.yml` | Full YAML including anchors |
| TOML | `.toml` | Sections, tables, arrays |
| INI | `.ini`, `.cfg` | Sections and key-value pairs |
| ENV | `.env`, `.env.*` | Environment variables |

### Documents
| Format | Extensions | What's Analyzed |
|--------|------------|-----------------|
| PDF | `.pdf` | Pages, text, TOC, metadata, **OCR for scanned docs** |
| Markdown | `.md` | Headings, links, code blocks, sections |
| Plain Text | `.txt` | Line-by-line diff |

### API Specifications
| Format | Extensions | What's Analyzed |
|--------|------------|-----------------|
| OpenAPI 3.x | `.yaml`, `.json` | Endpoints, schemas, parameters, responses |
| Swagger 2.0 | `.yaml`, `.json` | Full spec comparison |

### Data Files
| Format | Extensions | What's Analyzed |
|--------|------------|-----------------|
| CSV | `.csv` | Schema, columns, rows, statistics |
| TSV | `.tsv` | Tab-separated values |
| Excel | `.xlsx`, `.xls` | Multiple sheets, cells |

### Infrastructure
| Format | Files | What's Analyzed |
|--------|-------|-----------------|
| Dockerfile | `Dockerfile` | Base image, instructions, ports, health checks |
| Docker Compose | `compose.yaml` | Services, volumes, networks |
| SQL | `.sql` | Tables, columns, indexes, constraints |

### Dependencies
| Ecosystem | Files | What's Analyzed |
|-----------|-------|-----------------|
| Node.js | `package.json` | Dependencies, scripts, versions |
| Python | `requirements.txt`, `pyproject.toml` | Packages, version constraints |
| Ruby | `Gemfile` | Gems, sources |
| Rust | `Cargo.toml` | Crates, features |
| Go | `go.mod` | Modules, versions |

### Media & Archives
| Format | Extensions | What's Analyzed |
|--------|------------|-----------------|
| Images | `.png`, `.jpg`, `.gif`, `.webp` | Dimensions, metadata, **OCR text extraction** |
| Archives | `.zip`, `.tar`, `.tar.gz` | File listing, content changes |

---

## Output Modes

### Human-Readable (Default)
```bash
what-changed compare config.old.json config.new.json
```

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Configuration Comparison                                                    │
│   A: config.old.json                                                        │
│   B: config.new.json                                                        │
└─────────────────────────────────────────────────────────────────────────────┘

[X] 1 breaking  [!] 2 risky  [o] 3 safe

┌─────────── Changes ───────────┐
│ config                        │
│ ├── database (2 changes)      │
│ │   ├── - host <str>          │
│ │   │     localhost           │
│ │   │     -> prod-db.com      │
│ │   └── ~ port <int>          │
│ │         5432 -> 5433        │
│ └── + cache (3 changes)       │
│     └── enabled = true        │
└───────────────────────────────┘
```

### Interactive Graph TUI
```bash
what-changed compare api-v1.yaml api-v2.yaml --graph
```
Navigate changes with keyboard shortcuts:
- `j/k` or arrows: Navigate
- `Enter`: Expand/collapse
- `b`: Show breaking only
- `q`: Quit

### Visual Stats Dashboard
```bash
what-changed compare before/ after/ --stats
```
Shows ASCII charts with change distribution, risk breakdown, and file type statistics.

### JSON Output
```bash
what-changed compare config.json config.prod.json --json
```

### Summary Mode
```bash
what-changed compare src/ dist/ --summary
```

---

## Git Integration

Compare any git references directly:

```bash
# Compare with previous commit
what-changed git HEAD~1..HEAD

# Compare branches
what-changed git main..feature-branch

# Compare tags
what-changed git v1.0.0..v2.0.0

# Compare specific file across commits
what-changed git HEAD~5..HEAD -p src/config.json

# Show only breaking changes
what-changed git main..develop --breaking-only
```
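
For readers scripting around the `git` subcommand, the sketch below shows one way a two-dot ref spec could be split and a file fetched at each ref. This is an assumption about the general approach, not a description of what-changed internals; `parse_ref_spec` and `file_at_ref` are hypothetical names. The only external call is the standard `git show <ref>:<path>` command.

```python
import subprocess


def parse_ref_spec(ref_spec: str) -> tuple[str, str]:
    """Split 'BASE..HEAD' into (base, head); reject anything else."""
    base, sep, head = ref_spec.partition("..")
    # A head starting with '.' means the three-dot form (A...B), which is
    # a different git range and is not handled by this sketch.
    if not sep or not base or not head or head.startswith("."):
        raise ValueError(f"expected BASE..HEAD, got {ref_spec!r}")
    return base, head


def file_at_ref(ref: str, path: str) -> str:
    """Return the contents of `path` as stored at `ref`.

    Must be run inside a git repository; raises CalledProcessError if the
    ref or path does not exist.
    """
    result = subprocess.run(
        ["git", "show", f"{ref}:{path}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


base, head = parse_ref_spec("HEAD~5..HEAD")
print(base, head)
```

With both versions of a file in hand, they can be written to temporary files and passed to `what-changed compare`.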

---

## OCR Support

### Scanned PDFs
Automatically extracts text from scanned PDF documents using Tesseract OCR:

```bash
what-changed compare scanned_v1.pdf scanned_v2.pdf
```

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ PDF Document Comparison                                                     │
│   A: scanned_v1.pdf (OCR detected)                                          │
│   B: scanned_v2.pdf (OCR detected)                                          │
└─────────────────────────────────────────────────────────────────────────────┘

┌────────────────────────────────── page_1  ──────────────────────────────────┐
│ -Invoice #12345                                                             │
│ +Invoice #12345 - REVISED                                                   │
│ -Total: $450                                                                │
│ +Total: $550                                                                │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Images with Text
Compare images and extract text changes:

```bash
what-changed compare screenshot_old.png screenshot_new.png
```

**Requirements:** Install [Tesseract OCR](https://github.com/tesseract-ocr/tesseract) for your platform.

---

## Risk Scoring

Changes are scored 0-10 based on potential impact:

| Score | Level | Examples |
|-------|-------|----------|
| 8-10 | **BREAKING** | Removed endpoints, deleted dependencies, schema type changes |
| 5-7 | **RISKY** | Major version bumps, config value changes, deprecated features |
| 0-4 | **SAFE** | Added optional fields, new dependencies, comment changes |
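
The bands in the table translate directly into a threshold check. The function below is a sketch of that mapping, with `risk_level` as a hypothetical name; only the score ranges come from the table above.

```python
def risk_level(score: int) -> str:
    """Map a 0-10 risk score to the level names used in the table."""
    if not 0 <= score <= 10:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    if score >= 8:
        return "BREAKING"
    if score >= 5:
        return "RISKY"
    return "SAFE"


for score in (10, 7, 2):
    print(score, risk_level(score))
```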

### Common Risk Patterns

| Change Pattern | Score | Why |
|----------------|-------|-----|
| API endpoint removed | 10 | Clients will fail |
| Required field added | 9 | Existing data won't validate |
| Database host changed | 9 | Connection failures |
| Dependency removed | 9 | Runtime errors |
| Major version bump | 7 | Breaking API changes |
| Port number changed | 6 | Connection issues |
| New optional field | 2 | Backward compatible |
| Comment updated | 1 | No runtime impact |

---

## Exit Codes

Exit codes reflect the risk level of the changes found, so CI/CD pipelines can gate on them:

| Code | Meaning | Action |
|------|---------|--------|
| 0 | All changes safe | Deploy freely |
| 1 | Risky changes found | Review recommended |
| 2 | Breaking changes found | Block deployment |
| 3 | Error occurred | Check logs |

```bash
# In CI/CD pipeline
what-changed compare staging.env production.env --breaking-only
if [ $? -eq 2 ]; then
  echo "Breaking changes detected! Blocking deployment."
  exit 1
fi
```
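
The same gate can be written in Python when the pipeline step is a script rather than a shell snippet. This is a sketch under stated assumptions: the exit codes are those in the table above, `decide` and `run_gate` are hypothetical names, and any code outside 0-3 is treated as an error.

```python
import subprocess

# Exit code -> pipeline action, following the table above.
ACTIONS = {0: "deploy", 1: "review", 2: "block", 3: "error"}


def decide(exit_code: int) -> str:
    """Translate a what-changed exit code into a pipeline action."""
    # Unknown codes (for example 127 when the command is not installed)
    # are treated as errors rather than silently allowed through.
    return ACTIONS.get(exit_code, "error")


def run_gate(source_a: str, source_b: str) -> str:
    """Run the comparison and return the action for its exit code."""
    proc = subprocess.run(["what-changed", "compare", source_a, source_b])
    return decide(proc.returncode)


print(decide(2))
```

In a pipeline, a call such as `run_gate("staging.env", "production.env")` would return `"block"` when breaking changes are found, and the script can exit non-zero on `"block"` or `"error"`.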

---

## CLI Reference

```
what-changed compare [OPTIONS] SOURCE_A SOURCE_B

Arguments:
  SOURCE_A    First source (file, directory, or URL)
  SOURCE_B    Second source (file, directory, or URL)

Options:
  -s, --summary        Brief summary only
  -b, --breaking-only  Show only breaking changes
  -j, --json           Output as JSON
  -v, --verbose        Show rule matches and scores
  --no-color           Disable colors
  --graph              Interactive TUI graph view
  --stats              Visual statistics dashboard
  --help               Show help message

what-changed git [OPTIONS] REF_SPEC

Arguments:
  REF_SPEC    Git reference (e.g., HEAD~1..HEAD, main..feature)

Options:
  -p, --path PATH      Compare specific file only
  -b, --breaking-only  Show only breaking changes
  --help               Show help message

what-changed detect FILE

  Detect and display the file type.
```

---

## Programmatic Usage

```python
from what_changed import compare

# Simple comparison
result = compare("old.json", "new.json")

print(f"Breaking changes: {result.breaking_count}")
print(f"Risky changes: {result.risky_count}")
print(f"Safe changes: {result.safe_count}")

for change in result.breaking_changes:
    print(f"  - {change.path}: {change.summary}")
```

### Advanced Usage

```python
from what_changed.normalize import create_default_registry
from what_changed.diff import SemanticDiff
from what_changed.graph import GraphBuilder
from what_changed.rules import create_default_engine
from what_changed.explain import Explainer

# Full pipeline
registry = create_default_registry()
obj_a = registry.normalize("config.old.json")
obj_b = registry.normalize("config.new.json")

differ = SemanticDiff()
diff_result = differ.diff(obj_a, obj_b)

builder = GraphBuilder()
graph = builder.build(diff_result)

engine = create_default_engine()
rule_results = engine.apply(graph)

explainer = Explainer()
for exp in explainer.explain_graph(graph, rule_results):
    print(f"{exp.risk_level.name}: {exp.change_summary}")
```

---

## Edge Case Handling

How what-changed behaves in common edge cases:

| Scenario | Behavior |
|----------|----------|
| Password-protected PDF | Clear error message |
| Malformed JSON/YAML | Parse error with line number |
| Empty files | Shows structure changes |
| Binary files | Falls back to size comparison |
| Missing files | Clear "not found" error |
| Identical files | "No changes detected" |
| 100+ page PDFs | Paginated output (20 items max) |
| Unicode/Emoji content | Full UTF-8 support |
| Files without extension | Attempts content-based detection |
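
Content-based detection for files without an extension can be approximated with magic bytes plus a parse attempt. The sketch below shows the general technique only; `sniff_format` is a hypothetical function, and the formats it checks are a small subset chosen for illustration, not the detector that ships with what-changed.

```python
import json


def sniff_format(data: bytes) -> str:
    """Guess a format label from raw file contents."""
    # Well-known magic numbers come first.
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"PK\x03\x04"):
        return "zip"
    # NUL bytes or invalid UTF-8 suggest binary content.
    if b"\x00" in data[:8192]:
        return "binary"
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "binary"
    # Only try JSON when the text looks like an object or array, so a bare
    # number or quoted word is not misreported as JSON.
    if text.lstrip()[:1] in ("{", "["):
        try:
            json.loads(text)
            return "json"
        except json.JSONDecodeError:
            pass
    return "text"


print(sniff_format(b'{"name": "demo"}'))
print(sniff_format(b"[section]\nkey = value\n"))
```

The second call falls through to `text` because an INI section header starts with `[` but is not valid JSON, which is why the parse attempt is wrapped in a fallback.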

---

## Architecture

```
what-changed/
├── cli/              # Command-line interface (Typer)
├── detect/           # File type detection
├── normalize/        # Format-specific parsers
│   ├── config.py     # JSON, YAML, TOML, INI, ENV
│   ├── pdf.py        # PDF with OCR support
│   ├── openapi.py    # OpenAPI/Swagger
│   ├── data.py       # CSV, Excel
│   ├── docker.py     # Dockerfile, Compose
│   ├── dependencies.py # package.json, requirements.txt
│   ├── image.py      # Images with OCR
│   ├── sql.py        # SQL schemas
│   └── archive.py    # ZIP, TAR
├── diff/             # Semantic diff engine
├── graph/            # Change graph construction
├── rules/            # Risk scoring heuristics
├── explain/          # Human-readable explanations
├── output/           # Renderers (terminal, JSON)
│   └── formats/      # Format-specific renderers
├── tui/              # Interactive terminal UI
└── git/              # Git integration
```

---

## Philosophy

- **Clarity over cleverness** — Readable, maintainable code
- **Extensible by design** — Easy to add formats and rules
- **Deterministic results** — No ML/AI, fully reproducible
- **Offline-first** — No external API calls
- **Cross-platform** — Windows, macOS, Linux

---

## Contributing

```bash
# Clone and install
git clone https://github.com/aayushadhikari7/what-changed
cd what-changed
pip install -e ".[dev,all]"

# Run tests
pytest

# Type checking
mypy what_changed

# Linting
ruff check what_changed
```

---

## License

MIT License — see [LICENSE](LICENSE) for details.

---

<p align="center">
  <sub>Built with care for developers who hate surprises in production.</sub>
  <br>
  <sub>Demo recorded with <a href="https://github.com/aayushadhikari7/termgif">termgif</a></sub>
</p>
