Metadata-Version: 2.4
Name: hld-generator
Version: 0.3.0
Summary: Language-agnostic High-Level Design generator powered by Tree-sitter + LLM
Author: Harsh
License: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: tree-sitter>=0.22.0
Requires-Dist: tree-sitter-python>=0.23.0
Requires-Dist: tree-sitter-javascript>=0.23.0
Requires-Dist: tree-sitter-java>=0.23.0
Requires-Dist: tree-sitter-go>=0.23.0
Requires-Dist: tree-sitter-rust>=0.23.0
Requires-Dist: tree-sitter-typescript>=0.23.0
Requires-Dist: tree-sitter-c>=0.23.0
Requires-Dist: tree-sitter-cpp>=0.23.0
Requires-Dist: tree-sitter-ruby>=0.23.0
Requires-Dist: rich>=13.7.0
Requires-Dist: networkx>=3.2
Provides-Extra: llm
Requires-Dist: anthropic>=0.39.0; extra == "llm"
Requires-Dist: openai>=1.50.0; extra == "llm"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Provides-Extra: all
Requires-Dist: hld-generator[llm]; extra == "all"

# HLD Generator

**Language-agnostic High-Level Design document generator** powered by Tree-sitter + LLM.

Point it at any codebase — Python, JavaScript, TypeScript, Java, Go, Rust, C/C++, Ruby, C#, Kotlin, Swift, and more — and get a complete HLD with architecture diagrams, component breakdowns, and dependency analysis.

## How It Works

```
Codebase → Scanner → Tree-sitter AST Parser → Code Graph → LLM Analysis → HLD Document
                          ↓ (fallback)
                     Regex Parser
```

1. **Scan** — Discovers source files, filters out vendor/generated code
2. **Parse** — Extracts classes, functions, imports, endpoints using Tree-sitter (with regex fallback). Parsing runs in parallel using a thread pool
3. **Graph** — Builds a dependency graph with NetworkX, identifies entry points and hub modules
4. **Analyse** — Sends structured context to an LLM (Claude or GPT) for semantic understanding, or runs static fallback analysis with `--provider none`
5. **Render** — Generates Markdown report, interactive HTML viewer, and/or structured JSON with a React-based viewer
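
The Graph step above can be pictured with a small standard-library sketch. This is not the project's implementation (which uses NetworkX). The module names, the `find_entry_points` and `find_hubs` helpers, and the working definitions (an entry point is a module nothing else imports, a hub is the most-imported module) are all assumptions made for illustration.

```python
# Hypothetical import map: module -> modules it imports.
imports = {
    "cli": ["scanner", "graph", "renderer"],
    "scanner": ["config"],
    "graph": ["config"],
    "renderer": ["config"],
    "config": [],
}


def in_degrees(import_map):
    """Count how many modules import each module."""
    counts = {module: 0 for module in import_map}
    for deps in import_map.values():
        for dep in deps:
            counts[dep] = counts.get(dep, 0) + 1
    return counts


def find_entry_points(import_map):
    """Modules that nothing else imports (assumed definition of an entry point)."""
    counts = in_degrees(import_map)
    return sorted(module for module, n in counts.items() if n == 0)


def find_hubs(import_map, top_n=1):
    """The most-imported modules (assumed definition of a hub)."""
    counts = in_degrees(import_map)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [module for module, n in ranked[:top_n] if n > 0]


print(find_entry_points(imports))  # ['cli']
print(find_hubs(imports))          # ['config']
```

The real tool may weigh other signals (for example `__main__` blocks or framework endpoints) when choosing entry points; the sketch only shows the in-degree idea.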

## Installation

```bash
# Install from PyPI (coming soon) or directly from GitHub
pip install git+https://github.com/harsh-vishnoi/hld-generator.git

# With LLM support (Anthropic Claude / OpenAI GPT)
pip install "hld-generator[llm] @ git+https://github.com/harsh-vishnoi/hld-generator.git"

# For development
git clone https://github.com/harsh-vishnoi/hld-generator.git
cd hld-generator
pip install -e ".[dev,llm]"
```

**Requirements:** Python 3.9+

> **`hld` not found after install?** pip may install the script to a directory not on your PATH (e.g. `~/.local/bin` or `~/Library/Python/3.x/bin`). Either add that directory to your PATH:
> ```bash
> # macOS (replace 3.9 with your installed Python version)
> echo 'export PATH="$PATH:$HOME/Library/Python/3.9/bin"' >> ~/.zshrc && source ~/.zshrc
>
> # Linux
> echo 'export PATH="$PATH:$HOME/.local/bin"' >> ~/.bashrc && source ~/.bashrc
> ```
> Or run via Python directly:
> ```bash
> python -m hld_generator ./my-project --provider none
> ```
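
If you are unsure which directory to add, the following sketch prints where a per-user pip install usually places console scripts on macOS and Linux. It assumes `hld` was installed with the per-user scheme; the `user_scripts_dir` helper is a hypothetical name, and virtual environments or system-wide installs use other locations.

```python
import os
import site


def user_scripts_dir():
    """Likely location of per-user console scripts on macOS and Linux."""
    # site.getuserbase() is e.g. ~/.local on Linux or
    # ~/Library/Python/3.x on macOS framework builds.
    return os.path.join(site.getuserbase(), "bin")


print(user_scripts_dir())
```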

**Updating to latest version:**
```bash
pip install --upgrade git+https://github.com/harsh-vishnoi/hld-generator.git
```

## Quick Start

```bash
# With Anthropic Claude (recommended)
export ANTHROPIC_API_KEY=sk-ant-...
hld ./my-project

# With OpenAI
export OPENAI_API_KEY=sk-...
hld ./my-project --provider openai

# Without LLM (graph-only static analysis, no API key needed)
hld ./my-project --provider none

# Generate interactive HTML viewer
hld ./my-project --html

# Generate JSON data + React viewer
hld ./my-project --json

# Generate all output formats
hld ./my-project --html --json

# Scan a single file
hld ./main.py --provider none

# Custom output directory
hld ./my-project -o ./docs/architecture

# Include test files + verbose logging
hld ./my-project --include-tests -v

# Disable plugin auto-loading
hld ./my-project --no-plugins
```

## Output

By default, the tool generates three files in `./hld_output/`:

| File | Description |
|------|-------------|
| `hld_report.md` | Complete HLD document with overview, components, data flow, tech stack, and architecture diagram |
| `architecture.mmd` | Standalone Mermaid diagram file (renderable in GitHub, VS Code, etc.) |
| `graph_summary.md` | Raw code graph statistics — modules, packages, entry points, hub modules, API endpoints |

### Interactive Viewers

#### HTML Viewer (`--html`)

Generates a **single self-contained HTML file** with an interactive dashboard:

```bash
hld ./my-project --html
open ./hld_output/hld_viewer.html   # macOS; use xdg-open on Linux
```

- Zero dependencies — works offline, just open in a browser
- Tabbed interface: Report view + Architecture diagram
- Mermaid diagrams with pan/zoom
- Dark mode toggle
- Collapsible sections and search

#### React Viewer (`--json`)

Generates structured JSON data **plus a full React-based interactive viewer**:

```bash
hld ./my-project --json
open ./hld_output/viewer.html   # macOS; use xdg-open on Linux
```

| Output File | Description |
|-------------|-------------|
| `hld_data.json` | Raw JSON data (for programmatic use, CI/CD pipelines, external tools) |
| `viewer.html` | Interactive React viewer with data embedded |
| `assets/` | JS/CSS bundles for the viewer |

The React viewer includes:
- **Mind Map** — expandable tree with click-to-expand nodes, side panel details, and keyboard navigation
- **File View** — Mermaid architecture diagram with pan/zoom
- **Package View** — Package-level dependency diagram
- Full keyboard accessibility (arrow keys, Enter/Space, Tab focus trap)
- Dark mode support
- Search with keyboard shortcut (`/`)

> **Note:** The React viewer is bundled with the pip package — no Node.js or npm required.
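
Since `hld_data.json` is meant for programmatic use, a first step in a script or CI job might be to inspect its top-level structure. The sketch below deliberately assumes nothing about the schema, which is not documented here; `top_level_summary` is a hypothetical helper name.

```python
import json
from pathlib import Path


def top_level_summary(json_path):
    """Return {key: type name} for the top level of a JSON document."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        return {"<root>": type(data).__name__}
    return {key: type(value).__name__ for key, value in data.items()}


# Default output location when running `hld ./my-project --json`.
report = Path("hld_output") / "hld_data.json"
if report.exists():
    for key, kind in top_level_summary(report).items():
        print(f"{key}: {kind}")
```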

## CLI Reference

```
hld <target> [options]
```

| Flag | Default | Description |
|------|---------|-------------|
| `target` | *(required)* | Path to a source file or directory to analyse |
| `-o`, `--output` | `./hld_output` | Output directory |
| `--provider` | `anthropic` | LLM provider: `anthropic`, `openai`, `none`, or a plugin-registered name |
| `--model` | auto | LLM model name (defaults: `claude-sonnet-4-20250514` / `gpt-4o`) |
| `--format` | `markdown` | Output format: built-in `markdown` or a plugin-registered renderer name |
| `--html` | off | Generate an interactive HTML viewer (`hld_viewer.html`) |
| `--json` | off | Generate structured JSON (`hld_data.json`) + React viewer (`viewer.html`) |
| `--include-tests` | off | Include test files in the analysis |
| `--max-files` | `500` | Maximum number of files to scan |
| `--max-file-size` | `512000` | Maximum file size in bytes |
| `--plugins-dir` | none | Directory containing plugin `.py` files to load |
| `--no-plugins` | off | Disable all plugin loading (explicit and auto-discovery) |
| `-v`, `--verbose` | off | Enable debug logging |
| `--version` | — | Show version |

**Exit codes:** `0` = success, `1` = no files found, `2` = degraded (LLM failed but fallback output was produced).
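
In a CI script, the exit codes above can be turned into readable messages. This is a minimal sketch: `run_hld` and `describe_exit` are hypothetical helper names, and the call assumes `hld` is on your PATH.

```python
import subprocess

# Meanings taken from the exit-code list above.
EXIT_MEANINGS = {
    0: "success",
    1: "no files found",
    2: "degraded: LLM failed, fallback output was produced",
}


def describe_exit(code):
    """Translate an exit code into a short description."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_hld(target, *extra_args):
    """Run the `hld` CLI and return (exit code, description)."""
    result = subprocess.run(["hld", target, *extra_args])
    return result.returncode, describe_exit(result.returncode)


# Example (requires `hld` on PATH):
# code, meaning = run_hld("./my-project", "--provider", "none")
# print(code, meaning)
```

A pipeline might treat `2` as a warning instead of a failure, since output is still produced in that case.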

**API keys:** Set via the environment variables `ANTHROPIC_API_KEY` or `OPENAI_API_KEY`. The `--api-key` flag is deprecated because command-line arguments are visible in the process list.

## Supported Languages

| Language | Tree-sitter | Regex Fallback | Endpoint Detection |
|----------|:-----------:|:--------------:|:-----------------:|
| Python | yes | yes | Flask, FastAPI, Django |
| JavaScript | yes | yes | Express |
| TypeScript | yes | yes | Express |
| Java | yes | yes | Spring Boot |
| Go | yes | yes | net/http, Gin, Chi |
| Rust | yes | yes | — |
| C | yes | yes | — |
| C++ | yes | yes | — |
| Ruby | yes | yes | — |
| C# | — | yes | — |
| Kotlin | — | yes | — |
| Swift | — | yes | — |
| PHP | — | yes | — |
| Scala | — | yes | — |

Languages without Tree-sitter grammars automatically use the regex fallback parser. Even completely unknown languages get basic extraction via generic patterns.
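
To give a feel for how generic regex extraction can work on a language without a grammar, here is a small sketch. The patterns and the `extract_entities` helper are assumptions for illustration; the project's real patterns live in `parsers/regex_parser.py` and may differ.

```python
import re

# Hypothetical generic patterns, not the project's actual ones.
GENERIC_PATTERNS = {
    "class": re.compile(r"^[ \t]*(?:public\s+|abstract\s+|final\s+)*class\s+(\w+)", re.MULTILINE),
    "function": re.compile(r"^[ \t]*(?:def|func|fn|function|fun)\s+(\w+)", re.MULTILINE),
    "import": re.compile(r"^[ \t]*(?:import|use|require|include)\s+([\w.:/]+)", re.MULTILINE),
}


def extract_entities(source):
    """Return the names each generic pattern finds in the source text."""
    return {kind: pattern.findall(source) for kind, pattern in GENERIC_PATTERNS.items()}


# A Swift-like sample, one of the languages handled by regex fallback.
sample = '''
import Foundation

class Greeter {
    func greet(name: String) -> String {
        return "Hello, " + name
    }
}
'''

print(extract_entities(sample))
# {'class': ['Greeter'], 'function': ['greet'], 'import': ['Foundation']}
```

Patterns like these trade precision for coverage: they miss nested or unusual syntax, which is why Tree-sitter is preferred when a grammar is available.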

## Testing

```bash
# Run the full test suite (170 tests)
python -m pytest tests/ -q

# Run with verbose output
python -m pytest tests/ -v

# Lint check
python -m ruff check hld_generator/
```

You can also run a full end-to-end validation:

```bash
# Static analysis (no API key needed)
hld ./my-project --provider none --json -v

# Verify output files were created
ls -la ./hld_output/

# Open the interactive viewer
open ./hld_output/viewer.html   # macOS; use xdg-open on Linux
```

### Test Coverage

| Suite | Tests | Scope |
|-------|------:|-------|
| `test_quick.py` | 7 | Language map, config defaults, regex parser (Python/JS/Go/Java), file scanner |
| `test_comprehensive_core.py` | — | Config, scanner, regex parser, tree-sitter parser, graph builder, LLM fallback, renderer |
| `test_comprehensive_fixes.py` | — | Audit fixes, plugin system, entry point detection, integration, edge cases |
| `test_json_renderer.py` | 16 | JSON renderer output validation, edge cases, Unicode, graph serialization |
| **Total** | **170** | Full-stack coverage: parsing, graph, analysis, rendering, plugins |

## Architecture

```
hld_generator/
├── __init__.py            # Package version
├── __main__.py            # python -m hld_generator entry point
├── cli.py                 # CLI entry point & pipeline orchestrator
├── config.py              # Configuration, language maps, constants
├── scanner.py             # File discovery & filtering
├── parsers/
│   ├── base.py            # Data structures (ParsedFile, ParsedEntity, ImportInfo)
│   ├── manager.py         # Parser facade (auto-selects tree-sitter or regex, parallel parse_all)
│   ├── tree_sitter_parser.py   # Tree-sitter AST parser
│   └── regex_parser.py    # Regex fallback parser
├── graph.py               # NetworkX dependency graph builder
├── llm.py                 # LLM client (Anthropic + OpenAI + plugin dispatch)
├── fallback.py            # Static fallback analyser (no LLM needed)
├── renderer.py            # Markdown + Mermaid output renderer
├── html_renderer.py       # Self-contained interactive HTML viewer
├── json_renderer.py       # JSON data + React viewer output
├── plugins.py             # Plugin registry & hook system
├── _networkx_shim/        # Lightweight NetworkX fallback for offline use
│   └── __init__.py
└── viewer/                # Bundled React frontend (built from frontend/)
    ├── index.html
    ├── favicon.svg
    └── assets/            # JS/CSS bundles
```

### Frontend Development

The React viewer source lives in `frontend/` and is built with Vite + React + TypeScript + Tailwind CSS. To develop the frontend:

```bash
cd frontend
npm install
npm run dev          # Start dev server at http://localhost:5173
npm run build        # Build for production → dist/
```

After building, return to the repository root and copy the output into the Python package:

```bash
rm -rf hld_generator/viewer/assets
cp -R frontend/dist/* hld_generator/viewer/
```

## Plugin System

HLD Generator supports plugins for custom parsers, LLM providers, renderers, and pipeline hooks.

### Loading Plugins

Plugins are loaded from:
1. `--plugins-dir <path>` — explicitly specified directory
2. `.hld_plugins/` — auto-discovered next to the target directory

Each `.py` file in the plugin directory is loaded automatically (files starting with `_` are skipped).

Use `--no-plugins` to disable all plugin loading.
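
The loading rule described above (every `.py` file, except those starting with `_`) can be sketched with the standard library. This illustrates the rule and is not the project's loader; `discover_plugin_files` and `load_plugin` are hypothetical names.

```python
import importlib.util
from pathlib import Path


def discover_plugin_files(plugins_dir):
    """List .py files in a directory, skipping names that start with '_'."""
    directory = Path(plugins_dir)
    if not directory.is_dir():
        return []
    return sorted(
        path for path in directory.glob("*.py")
        if not path.name.startswith("_")
    )


def load_plugin(path):
    """Import one plugin file as a module; importing runs its registrations."""
    spec = importlib.util.spec_from_file_location(f"hld_plugin_{path.stem}", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Show which files would be picked up from an auto-discovered directory.
print([path.name for path in discover_plugin_files(".hld_plugins")])
```

Because importing a plugin file executes it, only load plugins from directories you trust.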

### Plugin Types

**Custom parser** — add support for a new language:
```python
from hld_generator.plugins import registry

@registry.register_parser("swift")
class SwiftParser:
    def parse_file(self, file_path, language):
        # Return a ParsedFile
        ...
```

**Custom LLM provider** — use a different LLM backend:
```python
from hld_generator.plugins import registry

@registry.register_llm_provider("ollama")
class OllamaProvider:
    def call(self, context: str, system_prompt: str) -> str:
        # Return raw LLM response text
        ...
```

Then use it: `hld ./project --provider ollama`

**Custom renderer** — output in a different format:
```python
from pathlib import Path

from hld_generator.plugins import registry

@registry.register_renderer("html")
class HTMLRenderer:
    def render(self, analysis, code_graph, output_dir) -> list[Path]:
        # Write files and return their paths
        ...
```

Then use it: `hld ./project --format html`

**Pipeline hooks** — modify data between pipeline stages:
```python
from hld_generator.plugins import registry

@registry.register_hook("post_parse")
def enrich(parsed_files):
    # Modify and return parsed_files
    return parsed_files
```

### Available Hook Points

| Hook | Receives | Returns |
|------|----------|---------|
| `pre_scan` | `config` | `config` |
| `post_scan` | `scanned_files` | `scanned_files` |
| `pre_parse` | `scanned_files` | `scanned_files` |
| `post_parse` | `parsed_files` | `parsed_files` |
| `pre_graph` | `parsed_files` | `parsed_files` |
| `post_graph` | `code_graph` | `code_graph` |
| `pre_llm` | `code_graph` | `code_graph` |
| `post_llm` | `analysis` | `analysis` |
| `pre_render` | `analysis, code_graph` | `(analysis, code_graph)` |
| `post_render` | `output_files` | `output_files` |
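
`pre_render` is the only hook in the table that receives two values and returns a tuple. A pass-through hook for it might look like the sketch below. It assumes the hook is called with the two values as positional arguments, and it treats both objects as opaque because their types are not described here. The registration lines are shown as comments so the snippet runs without the package installed.

```python
# In a real plugin file, import the registry and decorate the function:
#   from hld_generator.plugins import registry
#   @registry.register_hook("pre_render")
def log_before_render(analysis, code_graph):
    """Inspect both objects, then hand them back unchanged as a tuple."""
    print(
        f"rendering with analysis={type(analysis).__name__}, "
        f"code_graph={type(code_graph).__name__}"
    )
    return analysis, code_graph
```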

### Other Extension Points

**Custom endpoint patterns:**
```python
from hld_generator.plugins import registry

registry.register_endpoint_pattern(
    r'@MyFramework\.route\("([^"]+)"\)',
    name="MyFramework"
)
```
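
Before registering a pattern, it can help to check it against sample source with plain `re`. The sample code below is made up, and the assumption that the first capture group is used as the endpoint path is not confirmed by this document.

```python
import re

# Same pattern as in the registration example above.
PATTERN = r'@MyFramework\.route\("([^"]+)"\)'

# Made-up source text for a hypothetical framework.
sample = '''
@MyFramework.route("/users")
def list_users(): ...

@MyFramework.route("/users/<id>")
def get_user(id): ...
'''

print(re.findall(PATTERN, sample))  # ['/users', '/users/<id>']
```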

**Custom language file extensions:**
```python
from hld_generator.plugins import registry

registry.register_language(".hx", "haxe")
```

## Extending (Without Plugins)

**Add a new language:**
1. Add extension mapping in `config.py` → `LANGUAGE_MAP`
2. Add regex patterns in `parsers/regex_parser.py` → `_PATTERNS`
3. (Optional) Add tree-sitter grammar to `config.py` → `TREE_SITTER_GRAMMARS` and queries to `tree_sitter_parser.py` → `_QUERIES`

**Add framework endpoint detection:**
Add patterns to `parsers/regex_parser.py` → `_ENDPOINT_PATTERNS`

## License

MIT
