Metadata-Version: 2.4
Name: emdash-ai
Version: 0.1.1
Summary: Graph-based coding intelligence system - The 'Senior Engineer' Context Engine
Author: Em Dash Team
Requires-Python: >=3.10,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Provides-Extra: local
Requires-Dist: astroid (>=3.0.1,<4.0.0)
Requires-Dist: click (>=8.1.7,<9.0.0)
Requires-Dist: gitpython (>=3.1.40,<4.0.0)
Requires-Dist: httpx (>=0.25.0)
Requires-Dist: kuzu (>=0.4.0)
Requires-Dist: loguru (>=0.7.2,<0.8.0)
Requires-Dist: networkx (>=3.2.1,<4.0.0)
Requires-Dist: numpy (>=1.26.0)
Requires-Dist: openai (>=1.0.0)
Requires-Dist: pillow (>=10.0.0,<11.0.0)
Requires-Dist: prompt_toolkit (>=3.0.43,<4.0.0)
Requires-Dist: pydantic (>=2.5.0,<3.0.0)
Requires-Dist: pygithub (>=2.1.1,<3.0.0)
Requires-Dist: python-dotenv (>=1.0.0,<2.0.0)
Requires-Dist: python-louvain (>=0.16,<0.17)
Requires-Dist: rich (>=13.7.0)
Requires-Dist: scipy (>=1.11.4,<2.0.0)
Requires-Dist: sentence-transformers (>=2.2.0)
Requires-Dist: supabase (>=2.0.0)
Requires-Dist: textual (>=0.47.0)
Requires-Dist: tqdm (>=4.66.1,<5.0.0)
Description-Content-Type: text/markdown

# EmDash - The "Senior Engineer" Context Engine

Transform your codebase into a living knowledge graph that combines static analysis (AST), social dynamics (Git history), and graph analytics to provide "senior engineer" level insights.

> **Why "EmDash"?** The em dash (—) has become a telltale signature of AI-generated text—appearing frequently in LLM outputs as a stylistic connector. We embraced this quirk as our name: EmDash is an AI-native tool, built for the era of AI-assisted development, where humans and LLMs collaborate on code.

## What is EmDash?

EmDash moves beyond flat RAG (Retrieval-Augmented Generation) by building a multi-layered knowledge graph that captures:

- **Layer A: Structural Skeleton** - AST analysis of code structure (classes, functions, calls, inheritance)
- **Layer B: Social & Historical Fabric** - Git history showing who changed what and why
- **Layer C: Analytical Overlay** - Graph metrics (PageRank, Betweenness, Clustering) for impact analysis
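The sketch below makes Layer A concrete. It pulls function-to-call edges and class inheritance out of Python source using the built-in `ast` module, which is the kind of structure this layer captures. It is a minimal standard-library illustration, not EmDash's implementation: the package lists `astroid` as a dependency, and `extract_skeleton` is a hypothetical helper. It also keys methods by bare name, so methods with the same name in different classes collide.

```python
import ast


def extract_skeleton(source: str) -> tuple[dict[str, set[str]], dict[str, list[str]]]:
    """Return (calls, bases): function -> called names, class -> base-class names."""
    tree = ast.parse(source)
    calls: dict[str, set[str]] = {}
    bases: dict[str, list[str]] = {}

    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            # Inheritance edges: only plain names like `class B(A)` are captured.
            bases[node.name] = [b.id for b in node.bases if isinstance(b, ast.Name)]
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Keyed by bare name (hypothetical simplification; no class qualification).
            callees = calls.setdefault(node.name, set())
            # ast.walk also descends into nested defs, so their calls are
            # attributed to the enclosing function as well.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call):
                    if isinstance(inner.func, ast.Name):         # foo()
                        callees.add(inner.func.id)
                    elif isinstance(inner.func, ast.Attribute):  # obj.foo()
                        callees.add(inner.func.attr)
    return calls, bases


SAMPLE = '''
class Base:
    def save(self):
        self.validate()

class User(Base):
    def validate(self):
        return check_email(self.email)

def check_email(addr):
    return "@" in addr
'''

calls, bases = extract_skeleton(SAMPLE)
print(calls)
print(bases)
```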

## Installation

### Install globally

Install directly from PyPI without cloning:

```bash
pip install emdash-ai
```

Then navigate to any repository and get started:

```bash
cd your-project

# Start onboarding (recommended for first-time setup)
emdash onboard

# Or jump straight into the coding agent
em
```

To update to the latest version:

```bash
pip install --upgrade emdash-ai
```

---

*Only continue below if you want to contribute or modify EmDash itself.*

### Install from source

```bash
git clone <repo_url>
cd emdash

python3 -m venv venv
source venv/bin/activate

pip install -r requirements.txt
pip install -e .
npm install

cp .env.example .env
# Edit .env to add your credentials
```

**Prerequisites:** Python 3.10+, Node.js 14+, Git

**Optional:** `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, or `FIREWORKS_API_KEY` for LLM features

## Configuration

EmDash works out of the box for basic indexing and querying. For LLM-powered features (agent chat, PROJECT.md generation, specs), you'll need an API key.

### Quick Setup

```bash
# Create a config file with documentation
emdash config init --global

# Edit the file to add your API key
nano ~/.config/emdash/config

# Verify your configuration
emdash config show
```

### Configuration Options

EmDash loads configuration from two locations (in order):
1. `~/.config/emdash/config` - User-level defaults
2. `.env` in the current directory - Project-level overrides

**LLM API Keys** (at least one required for agent features):
- `ANTHROPIC_API_KEY` - Anthropic Claude models
- `OPENAI_API_KEY` - OpenAI GPT models
- `FIREWORKS_API_KEY` - Fireworks AI (default, most affordable)
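The sketch below illustrates the load order and key requirement described above. Values from the project's `.env` override the user-level file, and agent features need at least one of the three LLM keys. It assumes a simple `KEY=VALUE` file format. `parse_env`, `merge_layers`, `load_config`, and `configured_llm_keys` are hypothetical names. EmDash lists `python-dotenv` as a dependency, so its real parsing rules may differ.

```python
from pathlib import Path

# The three LLM keys listed above; agent features need at least one.
LLM_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "FIREWORKS_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines (assumption: no multi-line values or `export`)."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")  # drop surrounding quotes
    return values


def merge_layers(*layers: dict[str, str]) -> dict[str, str]:
    """Later layers win, so project-level values override user-level defaults."""
    merged: dict[str, str] = {}
    for layer in layers:
        merged.update(layer)
    return merged


def load_config(user_file: Path, project_file: Path) -> dict[str, str]:
    """Read whichever of the two files exist, user file first."""
    layers = [parse_env(p.read_text()) for p in (user_file, project_file) if p.is_file()]
    return merge_layers(*layers)


def configured_llm_keys(config: dict[str, str]) -> list[str]:
    """LLM keys that are present and non-empty, in the order listed above."""
    return [key for key in LLM_KEYS if config.get(key)]


if __name__ == "__main__":
    config = load_config(Path.home() / ".config" / "emdash" / "config", Path(".env"))
    if not configured_llm_keys(config):
        print("No LLM API key found; agent features will be unavailable.")
```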

**GitHub Integration** (optional, for PR analysis):
- `GITHUB_TOKEN` - A GitHub personal access token; alternatively, run `emdash auth login` for browser-based auth

## Quick Start

```bash
source venv/bin/activate   # Only needed if you installed from source

# Start onboarding (recommended for first-time setup)
emdash onboard
```

The `onboard` command walks you through the complete setup:
1. **Index the repository** - Parses AST and Git history into the knowledge graph
2. **GitHub authentication** - Sets up a personal access token (PAT) for fetching PRs (optional)
3. **Generate PROJECT.md** - Creates comprehensive documentation for LLM context

After onboarding, start exploring with:
```bash
em                              # Interactive code exploration
```

### Manual Setup (Alternative)

```bash
emdash index https://github.com/user/repo  # Index from GitHub
emdash index /path/to/local/repo           # Index local repo
emdash analyze pagerank --top 20           # Compute analytics
emdash db stats                            # View statistics
```

## CLI Reference

| Command | Description |
|---------|-------------|
| `emdash onboard` | Full setup: index + auth + PROJECT.md |
| `emdash index <repo>` | Index a repository (Python, TypeScript, JavaScript) |
| `emdash analyze <type>` | Run analytics (pagerank, betweenness, community, areas) |
| `emdash query <type>` | Query the graph (find-class, find-function, knowledge-silos) |
| `emdash agent chat` | Interactive LLM chat with graph tools |
| `emdash projectmd --save` | Generate PROJECT.md documentation |
| `emdash spec "feature" --save` | Generate feature specification |
| `emdash implement` | Generate implementation plan from spec |
| `emdash team focus` | LLM summary of team's recent work |
| `emdash review <pr>` | Generate PR review |
| `emdash db stats/clear/test` | Database management |
| `emdash config show` | Show current configuration status |
| `emdash config init` | Create a configuration file with docs |
| `emdash auth login` | Authenticate with GitHub (browser-based) |

## Chat Commands

**In-chat:**
- `reset` - Clear conversation history
- `session` - Show session statistics
- `exit` / `quit` / `q` - Exit

**Slash commands** (type `/` to see all):

| Command | Description |
|---------|-------------|
| `/mcp` | Manage MCP servers - list, add, remove, or describe available servers |
| `/rules` | View custom rules loaded from `.emdash/rules/*.md` files |
| `/add-rule` | Add a new rule based on conversation context |
| `/create-agent` | Create a new custom agent interactively |
| `/agents` | Switch to a custom agent or list available agents |
| `/help` | Show all available commands |

**Custom Rules:** Create markdown files in `.emdash/rules/` to add instructions to the LLM system prompt.
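The README does not say exactly how rule files reach the system prompt. The sketch below shows one possible approach. It assumes each `.emdash/rules/*.md` file is appended under its own heading, in filename order. `load_rules` and `compose_prompt` are hypothetical helpers, not EmDash functions.

```python
from pathlib import Path


def load_rules(rules_dir: Path) -> list[tuple[str, str]]:
    """Return (rule name, markdown body) pairs for every *.md file in rules_dir."""
    if not rules_dir.is_dir():
        return []
    # sorted() makes the order deterministic; EmDash's actual order is not documented.
    return [(path.stem, path.read_text().strip()) for path in sorted(rules_dir.glob("*.md"))]


def compose_prompt(base_prompt: str, rules: list[tuple[str, str]]) -> str:
    """Append each rule under its own heading after the base system prompt."""
    sections = [base_prompt] + [f"## Rule: {name}\n\n{body}" for name, body in rules]
    return "\n\n".join(sections)


if __name__ == "__main__":
    rules = load_rules(Path(".emdash") / "rules")
    print(compose_prompt("You are a coding assistant.", rules))
```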

## Features

### Graph Analytics
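- **PageRank** - Functions that are most heavily called or referenced, with references from other important functions weighted more
- **Betweenness Centrality** - Bridge entities that connect otherwise separate parts of the codebase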
- **Community Detection** - Natural clusters and modules
- **Area Importance** - Hot spots with recent activity
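Layer C metrics are standard graph algorithms. The sketch below illustrates the first one: a pure-Python PageRank computed by power iteration over a call graph. Each edge points from caller to callee, so rank flows toward heavily used functions. This is not EmDash's implementation; the package depends on `networkx`, and `networkx.pagerank` would be the usual choice. The example graph and function names are made up.

```python
def pagerank(graph: dict[str, set[str]], damping: float = 0.85,
             iterations: int = 100, tol: float = 1e-10) -> dict[str, float]:
    """Power-iteration PageRank over a call graph (edges: caller -> callee)."""
    nodes = set(graph) | {callee for callees in graph.values() for callee in callees}
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by dangling nodes (no outgoing calls) is spread evenly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for caller, callees in graph.items():
            if callees:
                share = damping * rank[caller] / len(callees)
                for callee in callees:
                    new_rank[callee] += share
        converged = sum(abs(new_rank[v] - rank[v]) for v in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


call_graph = {
    "main": {"load_config", "run"},
    "load_config": {"log"},
    "run": {"parse", "log"},
    "parse": {"log"},
}

ranks = pagerank(call_graph)
for name, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:12} {score:.3f}")
```

Functions with no outgoing calls, such as `log`, would otherwise leak rank. Their share is therefore redistributed evenly on each iteration, which keeps the scores summing to 1.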

### Senior Engineer Queries
- **Knowledge Silos** - Critical code with a single owner
- **Domain Experts** - Who knows each module best
- **Change Impact** - What will be affected by changes
- **Dead Code** - Functions never called
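Three of these queries reduce to simple operations on commit history and the call graph. The sketch below makes several assumptions:

- Commits arrive as hypothetical `(path, author)` pairs, for example gathered with GitPython, which EmDash depends on.
- A file counts as "critical" here simply by having at least `min_commits` commits.
- The call graph maps each caller to the set of functions it calls.

None of these helpers are EmDash APIs.

```python
from collections import Counter, deque


def knowledge_silos(commits: list[tuple[str, str]], min_commits: int = 3) -> list[str]:
    """Files with at least `min_commits` commits, all from a single author."""
    authors: dict[str, set[str]] = {}
    for path, author in commits:
        authors.setdefault(path, set()).add(author)
    counts = Counter(path for path, _ in commits)
    return sorted(
        path for path, who in authors.items()
        if len(who) == 1 and counts[path] >= min_commits
    )


def change_impact(graph: dict[str, set[str]], changed: str) -> set[str]:
    """Every function that directly or transitively calls `changed`."""
    # Invert caller -> callee edges so we can walk from a function to its callers.
    callers: dict[str, set[str]] = {}
    for caller, callees in graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        for caller in callers.get(queue.popleft(), set()):
            if caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected


def dead_code(graph: dict[str, set[str]], entry_points: set[str]) -> set[str]:
    """Defined functions that nothing calls, excluding known entry points."""
    called = {callee for callees in graph.values() for callee in callees}
    return set(graph) - called - entry_points


commits = [("auth.py", "alice")] * 3 + [("db.py", "alice"), ("db.py", "bob"), ("cli.py", "bob")]
graph = {"main": {"run"}, "run": {"parse"}, "parse": set(), "old_helper": {"parse"}}

print(knowledge_silos(commits))
print(change_impact(graph, "parse"))
print(dead_code(graph, entry_points={"main"}))
```

`dead_code` flags only functions with no callers at all. A function called solely by other dead code slips through; a reachability walk from the entry points would catch it.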

### LLM-Powered Tools
- Interactive chat with 15+ graph exploration tools
- Auto-generated PROJECT.md documentation
- Feature specs from natural language descriptions
- Implementation plans with task breakdowns
- Team focus summaries

## Context Frame & Re-ranker

EmDash maintains a **Context Frame** - a dynamic window of relevant code entities that follows your exploration session.

```
┌─────────────────────────────────────────────────────────────────────┐
│                      CONTEXT FRAME PIPELINE                         │
└─────────────────────────────────────────────────────────────────────┘

  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
  │   Git Diff   │     │    Agent     │     │   Session    │
  │  (modified)  │     │ Exploration  │     │    State     │
  └──────┬───────┘     └──────┬───────┘     └──────┬───────┘
         │                    │                    │
         ▼                    ▼                    ▼
  ┌──────────────────────────────────────────────────────────┐
  │                   CONTEXT PROVIDERS                       │
  │  ┌─────────────────────┐  ┌─────────────────────────┐    │
  │  │ TouchedAreasProvider│  │ ExploredAreasProvider   │    │
  │  │  • Modified files   │  │  • Tool call results    │    │
  │  │  • AST neighbors    │  │  • Search hits          │    │
  │  │  • 2-hop graph walk │  │  • Relevance scores     │    │
  │  └─────────────────────┘  └─────────────────────────┘    │
  └──────────────────────────┬───────────────────────────────┘
                             │
                             ▼
  ┌──────────────────────────────────────────────────────────┐
  │                      RE-RANKER                            │
  │                                                          │
  │   Cross-encoder scores items against current query       │
  │   Keeps top 20 most relevant → LLM system prompt         │
  │                                                          │
  │   Model: mixedbread-ai/mxbai-rerank-xsmall-v1            │
  └──────────────────────────────────────────────────────────┘
```

**TouchedAreasProvider** - Tracks modified code via `git diff` and queries the AST graph for related entities (AST neighbors reached by a 2-hop graph walk)
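The "2-hop graph walk" can be pictured as a breadth-first search that stops two edges away from each modified entity. The sketch below assumes the modified entities have already been mapped from `git diff` output to graph nodes. It also treats edges as undirected, so callers and callees count equally as neighbors. `touched_neighborhood` is a hypothetical name.

```python
from collections import deque


def touched_neighborhood(graph: dict[str, set[str]], modified: set[str],
                         max_hops: int = 2) -> dict[str, int]:
    """Entities within `max_hops` of any modified entity, mapped to hop distance."""
    # Treat edges as undirected: callers and callees are both "related".
    adjacency: dict[str, set[str]] = {}
    for src, dsts in graph.items():
        for dst in dsts:
            adjacency.setdefault(src, set()).add(dst)
            adjacency.setdefault(dst, set()).add(src)

    distance = {node: 0 for node in modified}
    queue = deque(modified)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # don't expand past the hop limit
        for neighbor in adjacency.get(node, set()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


graph = {
    "api.handler": {"service.create"},
    "service.create": {"repo.save"},
    "repo.save": {"db.execute"},
}
print(touched_neighborhood(graph, {"repo.save"}))
```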

**ExploredAreasProvider** - Records entities surfaced by agent tool calls and search hits, along with their relevance scores

**Re-ranker** - Scores the collected items against the current query with a cross-encoder and keeps only the top 20, reducing token usage while improving response quality
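The re-ranking step boils down to "score each (query, item) pair, sort, keep the top 20." The sketch below uses a pluggable scorer. `ContextItem`, `rerank`, and `overlap_score` are hypothetical names. `overlap_score` is a crude word-overlap stand-in for the cross-encoder, so the example runs without downloading a model.

```python
from collections.abc import Callable, Iterable
from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str    # entity identifier, e.g. "auth.login"
    text: str    # snippet or summary shown to the scorer
    source: str  # which provider produced it, e.g. "touched" or "explored"


def rerank(query: str, items: Iterable[ContextItem],
           score: Callable[[str, str], float], top_k: int = 20) -> list[ContextItem]:
    """Score every (query, item) pair and keep the top_k highest-scoring items."""
    # Deduplicate by name (last occurrence wins) so an entity reported by
    # both providers is scored and returned only once.
    unique = {item.name: item for item in items}
    ranked = sorted(unique.values(), key=lambda item: score(query, item.text), reverse=True)
    return ranked[:top_k]


def overlap_score(query: str, text: str) -> float:
    """Stand-in for a cross-encoder: fraction of query words present in the text."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & set(text.lower().split())) / len(query_words)


items = [
    ContextItem("auth.login", "def login user password session", "touched"),
    ContextItem("ui.render", "render html template", "explored"),
    ContextItem("auth.logout", "def logout session", "explored"),
]
for item in rerank("user session login", items, overlap_score, top_k=2):
    print(item.name, item.source)
```

To use a real cross-encoder, the scorer could wrap sentence-transformers (a listed dependency). For example, `CrossEncoder("mixedbread-ai/mxbai-rerank-xsmall-v1").predict(pairs)` scores a list of `(query, text)` pairs. Batching all pairs into one `predict` call is much faster than scoring items one at a time, as the callback above does. How EmDash wires this up internally is not shown here.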

## Development

```bash
pytest                    # Run tests
black emdash/             # Format
mypy emdash/              # Type check
ruff check emdash/        # Lint
```

## License

MIT

