Metadata-Version: 2.4
Name: databridge-core
Version: 1.0.0
Summary: Upload your Chart of Accounts. Get a production-ready financial hierarchy and dbt models. Zero config.
Project-URL: Homepage, https://github.com/datanexum/databridge-core
Project-URL: Documentation, https://github.com/datanexum/databridge-core#readme
Project-URL: Repository, https://github.com/datanexum/databridge-core
Project-URL: Issues, https://github.com/datanexum/databridge-core/issues
Author-email: DataBridge AI <hello@databridgeai.com>
License-Expression: MIT
License-File: LICENSE
Keywords: csv,data,diff,etl,finance,fuzzy-match,profiling,reconciliation
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Office/Business :: Financial
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.10
Requires-Dist: click>=8.0
Requires-Dist: pandas>=1.5
Requires-Dist: pydantic>=2.0
Requires-Dist: rich>=13.0
Provides-Extra: all
Requires-Dist: pillow>=9.0; extra == 'all'
Requires-Dist: pypdf>=3.0; extra == 'all'
Requires-Dist: pytesseract>=0.3; extra == 'all'
Requires-Dist: rapidfuzz>=3.0; extra == 'all'
Requires-Dist: sqlalchemy>=2.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: build>=1.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1; extra == 'dev'
Provides-Extra: fuzzy
Requires-Dist: rapidfuzz>=3.0; extra == 'fuzzy'
Provides-Extra: ocr
Requires-Dist: pillow>=9.0; extra == 'ocr'
Requires-Dist: pytesseract>=0.3; extra == 'ocr'
Provides-Extra: pdf
Requires-Dist: pypdf>=3.0; extra == 'pdf'
Provides-Extra: sql
Requires-Dist: sqlalchemy>=2.0; extra == 'sql'
Description-Content-Type: text/markdown

# DataBridge Core

**Your finance team just spent 4 hours on VLOOKUP. This takes 5 seconds.**

DataBridge Core is a Python toolkit for data reconciliation, profiling, and ingestion. Compare CSV files, find fuzzy matches, detect schema drift, and clean messy data -- from the command line or Python.

```bash
pip install databridge-core
```

## 5-Second Demo

```bash
# Profile a file
databridge profile sales.csv

# Compare two sources -- find orphans, conflicts, match rate
databridge compare source.csv target.csv --keys id

# Fuzzy match names across systems
databridge fuzzy erp_accounts.csv gl_accounts.csv --column name --threshold 80
```
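
To give a feel for what a cutoff like `--threshold 80` means, here is a minimal sketch of fuzzy name matching on a 0-100 similarity scale. It uses `difflib` from the standard library as a stand-in, so scores may differ from what DataBridge Core reports (its fuzzy extra is listed as using rapidfuzz). The `similarity` and `best_matches` helpers and the sample account names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score, ignoring case and outer whitespace."""
    return 100.0 * SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def best_matches(left, right, threshold=80.0):
    """For each name in `left`, keep its best candidate in `right` if it meets the threshold."""
    matches = []
    for name in left:
        # Pick the highest-scoring candidate, then apply the cutoff.
        candidate = max(right, key=lambda other: similarity(name, other))
        score = similarity(name, candidate)
        if score >= threshold:
            matches.append((name, candidate, round(score, 1)))
    return matches


# Hypothetical account names from two systems.
erp = ["Accounts Receivable", "Cash and Equivalents", "Prepaid Insurance"]
gl = ["Accounts Recievable", "Cash & Equivalents", "Goodwill"]

for name, candidate, score in best_matches(erp, gl):
    print(f"{name!r} ~ {candidate!r} ({score})")
```

Raising the threshold trades recall for precision: near-duplicates with typos still pair up, while unrelated names fall below the cutoff and are left unmatched.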

## Python API

```python
from databridge_core import compare_hashes, profile_data, load_csv

# Profile your data
profile = profile_data("chart_of_accounts.csv")
print(f"{profile['rows']} rows, {profile['columns']} columns")
print(f"Potential keys: {profile['potential_key_columns']}")

# Compare two sources
result = compare_hashes("source.csv", "target.csv", key_columns="account_id")
stats = result["statistics"]
print(f"Match rate: {stats['match_rate_percent']}%")
print(f"Conflicts: {stats['conflicts']}, Orphans: {stats['total_orphans']}")
```
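
The terms in the result above (match rate, conflicts, orphans) can be illustrated with a small standard-library sketch of key-based hash comparison. This shows the general technique under stated assumptions, not DataBridge Core's actual implementation: the `row_hashes` and `compare_rows` helpers, the result keys, and the match-rate definition are all hypothetical.

```python
import csv
import hashlib
import io


def row_hashes(csv_text: str, key: str) -> dict:
    """Map each row's key value to a SHA-256 digest of its non-key values."""
    hashes = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        payload = "|".join(f"{col}={row[col]}" for col in sorted(row) if col != key)
        hashes[row[key]] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return hashes


def compare_rows(source_text: str, target_text: str, key: str) -> dict:
    """Classify keys as matches, conflicts, or orphans by comparing row hashes."""
    source, target = row_hashes(source_text, key), row_hashes(target_text, key)
    shared = source.keys() & target.keys()
    matches = sorted(k for k in shared if source[k] == target[k])
    all_keys = source.keys() | target.keys()
    return {
        "matches": matches,
        # Same key on both sides, but the remaining values differ.
        "conflicts": sorted(k for k in shared if source[k] != target[k]),
        # Keys that appear on only one side.
        "orphans_in_source": sorted(source.keys() - target.keys()),
        "orphans_in_target": sorted(target.keys() - source.keys()),
        # Assumed definition: matching keys as a share of all distinct keys.
        "match_rate_percent": round(100 * len(matches) / len(all_keys), 1) if all_keys else 0.0,
    }


# Hypothetical sample data.
SOURCE = "account_id,name,balance\n1000,Cash,500\n1100,Receivables,250\n1200,Inventory,75\n"
TARGET = "account_id,name,balance\n1000,Cash,500\n1100,Receivables,300\n1300,Prepaid,40\n"

result = compare_rows(SOURCE, TARGET, key="account_id")
print(result)
```

Hashing each row means only one digest per key has to be compared, which keeps the comparison cheap even when rows have many columns.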

## Commands

| Command | Description |
|---------|-------------|
| `databridge profile <file>` | Profile data: structure, quality, cardinality |
| `databridge compare <a> <b> --keys <col>` | Hash comparison: orphans, conflicts, match rate |
| `databridge fuzzy <a> <b> -c <col>` | Fuzzy match columns across two files |
| `databridge diff <a> <b>` | Text diff between two files |
| `databridge drift <old> <new>` | Detect schema drift between CSVs |
| `databridge transform <file> -c <col> --op upper` | Clean a column; available operations: upper, lower, strip, trim, remove_special |
| `databridge merge <a> <b> --keys <col>` | Merge two CSVs on key columns |
| `databridge find "*.csv"` | Find files matching a pattern |
| `databridge parse <text>` | Parse tabular data from messy text |
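
As a rough illustration of what schema drift detection involves, the sketch below compares the header rows of two CSV documents using only the standard library. It is an assumption about the general idea, not the behavior of `databridge drift`: the `detect_drift` helper and its result keys are hypothetical, and a real check would likely also look at column types.

```python
import csv
import io


def header(csv_text: str) -> list:
    """Return the column names from the first row of a CSV document."""
    return next(csv.reader(io.StringIO(csv_text)), [])


def detect_drift(old_text: str, new_text: str) -> dict:
    """Report columns added, removed, or reordered between two CSV headers."""
    old_cols, new_cols = header(old_text), header(new_text)
    common_old = [c for c in old_cols if c in new_cols]
    common_new = [c for c in new_cols if c in old_cols]
    return {
        "added": [c for c in new_cols if c not in old_cols],
        "removed": [c for c in old_cols if c not in new_cols],
        # True when the columns shared by both files appear in a different order.
        "reordered": common_old != common_new,
    }


# Hypothetical old and new versions of the same export.
OLD = "account_id,name,balance\n1000,Cash,500\n"
NEW = "account_id,balance,currency\n1000,500,USD\n"

drift = detect_drift(OLD, NEW)
print(drift)
```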

## Optional Extras

```bash
pip install 'databridge-core[fuzzy]'   # Fuzzy matching (rapidfuzz)
pip install 'databridge-core[pdf]'     # PDF text extraction (pypdf)
pip install 'databridge-core[ocr]'     # OCR text extraction from images (pytesseract, pillow)
pip install 'databridge-core[sql]'     # Database queries (sqlalchemy)
pip install 'databridge-core[all]'     # All runtime extras (fuzzy, pdf, ocr, sql)
pip install 'databridge-core[dev]'     # Development tools (pytest, pytest-cov, ruff, build)
```
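
To check which extras are usable in the current environment, one option is to probe for the underlying packages with `importlib.util.find_spec`. This is a minimal sketch rather than a DataBridge Core feature: the `EXTRA_MODULES` mapping and the `available_extras` helper are hypothetical, and the module names are the usual import names of the packages listed above.

```python
from importlib.util import find_spec

# Extra name -> import names of the packages that extra pulls in.
EXTRA_MODULES = {
    "fuzzy": ["rapidfuzz"],
    "pdf": ["pypdf"],
    "ocr": ["PIL", "pytesseract"],  # Pillow is imported as PIL
    "sql": ["sqlalchemy"],
}


def available_extras() -> dict:
    """Return {extra: bool}, True when every module for that extra can be found."""
    return {
        extra: all(find_spec(module) is not None for module in modules)
        for extra, modules in EXTRA_MODULES.items()
    }


for extra, ok in available_extras().items():
    print(f"{extra}: {'installed' if ok else 'missing'}")
```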

## Built for Finance

DataBridge Core is the open-source foundation of [DataBridge AI](https://github.com/datanexum/databridge-ai) -- a full platform for financial hierarchy management, dbt model generation, and enterprise data reconciliation.

**How it works:** Upload your Chart of Accounts. Get a production-ready financial hierarchy and dbt models. Zero config.

## License

MIT
