Metadata-Version: 2.4
Name: databridge-discovery
Version: 0.43.0
Summary: DataBridge AI Model Discovery Engine - Automated SQL parsing, CASE extraction, and hierarchy generation
Author: DataBridge AI Team
License: MIT
Requires-Python: >=3.10
Requires-Dist: databridge-core
Requires-Dist: databridge-models
Requires-Dist: networkx>=3.0
Requires-Dist: pandas>=2.0
Requires-Dist: pydantic-settings>=2.0
Requires-Dist: pydantic>=2.0
Requires-Dist: rapidfuzz>=3.0
Requires-Dist: sqlglot>=20.0
Provides-Extra: all
Requires-Dist: chromadb>=0.4; extra == 'all'
Requires-Dist: fastmcp>=0.1; extra == 'all'
Requires-Dist: pytest-asyncio>=0.21; extra == 'all'
Requires-Dist: pytest-cov>=4.0; extra == 'all'
Requires-Dist: pytest>=7.0; extra == 'all'
Requires-Dist: sentence-transformers>=2.2; extra == 'all'
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.21; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Provides-Extra: embeddings
Requires-Dist: chromadb>=0.4; extra == 'embeddings'
Requires-Dist: sentence-transformers>=2.2; extra == 'embeddings'
Provides-Extra: mcp
Requires-Dist: fastmcp>=0.1; extra == 'mcp'
Description-Content-Type: text/markdown

# DataBridge Discovery Engine

Automated SQL parsing, CASE statement extraction, and hierarchy generation for data warehouse modeling.

## Features

- **SQL Parsing**: Multi-dialect SQL parsing using sqlglot (Snowflake, PostgreSQL, T-SQL, MySQL, BigQuery)
- **CASE Extraction**: Automatic extraction of CASE WHEN statements with hierarchy detection
- **Semantic Graph**: Graph-based semantic modeling with NetworkX
- **Entity Detection**: Detects 12 standard entity types (account, cost_center, department, etc.)
- **Librarian Integration**: Direct export to Librarian hierarchy project format

## Installation

```bash
# Basic installation
pip install databridge-discovery

# With embeddings support (quotes keep shells such as zsh from expanding the brackets)
pip install "databridge-discovery[embeddings]"

# With MCP tools
pip install "databridge-discovery[mcp]"

# Full installation
pip install "databridge-discovery[all]"
```

## Quick Start

```python
from databridge_discovery import SQLParser, CaseExtractor, DiscoverySession

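# Example query containing a CASE expression (replace with your own SQL)
sql_query = """
SELECT
    CASE WHEN account_code LIKE '4%' THEN 'Revenue' ELSE 'Other' END AS account_group
FROM gl_entries
"""

# Parse SQL
parser = SQLParser(dialect="snowflake")
ast = parser.parse(sql_query)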

# Extract CASE statements
extractor = CaseExtractor()
cases = extractor.extract(ast)

# Start discovery session
session = DiscoverySession()
session.add_sql_source(sql_query)
session.analyze()

# Get proposed hierarchies
hierarchies = session.get_proposed_hierarchies()
```

## MCP Tools

The library provides 50 MCP tools across 7 phases:

### Phase 1: SQL Parser & Session (6 tools)
- `parse_sql` - Parse SQL and return AST
- `extract_case_statements` - Extract CASE WHEN logic
- `analyze_sql_complexity` - Query complexity metrics
- `start_discovery_session` - Initialize discovery session
- `get_discovery_session` - Get session state
- `export_discovery_evidence` - Export evidence

### Phase 2: Semantic Graph (8 tools)
- `build_semantic_graph` - Build from schema
- `add_graph_relationship` - Add edge
- `find_join_paths` - Find join candidates
- And more...
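
The idea behind join-path discovery on a semantic graph can be sketched with NetworkX alone. This is an illustrative example, not the implementation behind `find_join_paths`; the table names, the `join_on` edge attribute, and the helper `join_path` are all hypothetical.

```python
import networkx as nx

# Tables are nodes; an edge means the two tables can be joined.
# The join_on attribute is an assumed convention for this sketch.
graph = nx.Graph()
graph.add_edge("fact_gl", "dim_account", join_on="account_id")
graph.add_edge("fact_gl", "dim_cost_center", join_on="cost_center_id")
graph.add_edge("dim_cost_center", "dim_department", join_on="department_id")


def join_path(g: nx.Graph, source: str, target: str) -> list:
    """Return the shortest chain of joins as (left_table, right_table, key) tuples."""
    tables = nx.shortest_path(g, source, target)
    return [(a, b, g.edges[a, b]["join_on"]) for a, b in zip(tables, tables[1:])]


path = join_path(graph, "dim_account", "dim_department")
for left, right, key in path:
    print(f"{left} JOIN {right} ON {key}")
```

An undirected graph is used here because a join can be traversed from either side; a directed graph would be the better fit if foreign-key direction matters.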

### Phases 3-7: See full documentation

## Entity Types

The discovery engine detects 12 standard entity types:

| Entity | Description |
|--------|-------------|
| account | GL accounts, chart of accounts |
| cost_center | Cost centers, profit centers |
| department | Organizational departments |
| entity | Legal entities, companies |
| project | Projects, work orders |
| product | Products, SKUs |
| customer | Customers, clients |
| vendor | Vendors, suppliers |
| employee | Employees, workers |
| location | Geographic locations |
| time_period | Time periods, fiscal periods |
| currency | Currencies |
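
This README does not describe how detection works internally. As a rough illustration only, the sketch below maps column names onto the 12 type names from the table using fuzzy string matching from the standard library's `difflib`. The helper `guess_entity_type`, the suffix list, and the 0.6 cutoff are assumptions, not part of the package's API.

```python
from difflib import get_close_matches
from typing import Optional

# The 12 entity type names listed in the table above.
ENTITY_TYPES = [
    "account", "cost_center", "department", "entity", "project", "product",
    "customer", "vendor", "employee", "location", "time_period", "currency",
]

# Assumed naming convention: key and label columns often carry these suffixes.
COMMON_SUFFIXES = ("_id", "_code", "_key", "_name")


def guess_entity_type(column_name: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the closest entity type for a column name, or None if nothing is close."""
    name = column_name.lower()
    for suffix in COMMON_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    matches = get_close_matches(name, ENTITY_TYPES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for column in ("CUSTOMER_ID", "cost_centre_code", "xyz"):
    print(column, "->", guess_entity_type(column))
```

Name matching alone is a weak signal; a real detector would likely also look at column values and relationships before committing to a type.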

## License

MIT
