Metadata-Version: 2.4
Name: datus-agent
Version: 0.2.0
Summary: AI-powered SQL Agent for data engineering (Compiled Version)
Author-email: Datus Team <harrison.zhao@datus.ai>
Maintainer-email: Datus Team <harrison.zhao@datus.ai>
License: Apache-2.0
Project-URL: Homepage, https://datus.ai/
Project-URL: Documentation, https://github.com/datus-ai/datus-agent#readme
Project-URL: Repository, https://github.com/datus-ai/datus-agent
Project-URL: Bug Tracker, https://github.com/datus-ai/datus-agent/issues
Keywords: sql,ai,agent,database,nlp,natural-language
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: python-dotenv==1.0.0
Requires-Dist: pandas==2.1.4
Requires-Dist: sqlalchemy==2.0.23
Requires-Dist: sqlglot>=26.12.0
Requires-Dist: snowflake-connector-python>=3.6.0
Requires-Dist: pyyaml==6.0.1
Requires-Dist: langsmith>=0.0.77
Requires-Dist: structlog>=23.1.0
Requires-Dist: openai>=1.95.0
Requires-Dist: httpx[socks]==0.27.2
Requires-Dist: tantivy>=0.22.2
Requires-Dist: aiohttp>=3.11.16
Requires-Dist: xlsxwriter>=3.2.2
Requires-Dist: tiktoken>=0.9.0
Requires-Dist: openai-agents==0.2.1
Requires-Dist: pydantic<3.0,>=2.11.7
Requires-Dist: Markdown==3.8
Requires-Dist: lancedb==0.18.0
Requires-Dist: pylance==0.22.0
Requires-Dist: datasets>=3.5.1
Requires-Dist: transformers>=4.51.3
Requires-Dist: sentence-transformers==4.1.0
Requires-Dist: pyarrow<19.0.0
Requires-Dist: rich==14.0.0
Requires-Dist: prompt_toolkit>=3.0.51
Requires-Dist: textual[syntax]==5.1.1
Requires-Dist: anthropic==0.51.0
Requires-Dist: duckdb-engine>=0.17.0
Requires-Dist: duckdb<1.4.0,>=1.3.0
Requires-Dist: snowflake-sqlalchemy>=1.7.3
Requires-Dist: opentelemetry-api>=1.33.1
Requires-Dist: mcp>=1.11.0
Requires-Dist: anyio>=4.9.0
Requires-Dist: torch==2.2.2; sys_platform != "darwin" or platform_machine != "arm64"
Requires-Dist: torch>=2.3.0; sys_platform == "darwin" and platform_machine == "arm64"
Requires-Dist: pymysql>=1.1.1
Requires-Dist: json-repair>=0.47.6
Requires-Dist: fastapi>=0.104.0
Requires-Dist: uvicorn>=0.24.0
Requires-Dist: google-generativeai>=0.8.0
Requires-Dist: pyperclip==1.9.0
Requires-Dist: streamlit>=1.38.0
Dynamic: license-file

## 🎯 Overview

**Datus** is an AI-powered agent that transforms data engineering and metric management into a conversational experience.

![DatusArchitecture](assets/datus_architecture.svg)

With the **Datus Agent** you can:

- **Simplify Data Engineering Development:**
    - Enable data engineers to develop and debug using natural language, reducing entry barriers and increasing productivity.
- **Standardize and Manage Metrics:**
    - Extract and unify metrics consistently, ensuring your BI and AI tools always access accurate and reliable definitions.
- **Self-Improving:**
    - Convert iterative chain-of-thought (CoT) reasoning workflows into structured datasets, enabling supervised fine-tuning (SFT) and reinforcement learning (RL) for ongoing, automatic improvements in model accuracy and performance.


## ✨ Why Choose Datus Agent?

## 🚀 Key Features

### 💬 **Conversational Data Engineering**

- **Natural Language Workflows** - Use `/` to execute complex tasks in plain language
- **Intelligent SQL Generation** - `!gen` creates optimized SQL with `!fix` for instant corrections
- **Live Workflow Monitoring** - `!darun_screen` shows real-time execution status
- **Schema Intelligence** - `!sl` provides smart table and column recommendations

### 📈 **Smart Metrics Management**

- **Automated Metric Generation** - `!gen_metrics` extracts business metrics from your queries
- **Semantic Model Creation** - `!gen_semantic_model` builds comprehensive data models
- **Streaming Analytics** - Real-time metric generation with `!gen_metrics_stream` variants
- **Context-Aware Operations** - `!set` manages different workflow contexts

### 🔄 **Self-Improving AI System**

- **Reasoning Mode** - `!reason` provides step-by-step analysis with detailed CoT for complex problems
- **Standard Log Output** - Comprehensively records the user’s reasoning process to generate high-value data for subsequent model refinement and evolution
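
As a rough illustration of how logged reasoning could become training data, the sketch below flattens a reasoning trace into a prompt/completion pair and serializes it as JSON Lines. The `ReasoningTrace` class, its fields, and the record layout are hypothetical and chosen only for this example; they are not the log format Datus actually writes.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ReasoningTrace:
    """Hypothetical record of one logged reasoning session (not Datus's real schema)."""

    question: str
    steps: list[str] = field(default_factory=list)
    final_sql: str = ""


def to_sft_record(trace: ReasoningTrace) -> dict:
    """Flatten a trace into a prompt/completion pair for supervised fine-tuning."""
    reasoning = "\n".join(f"{i}. {step}" for i, step in enumerate(trace.steps, start=1))
    return {
        "prompt": trace.question,
        "completion": f"{reasoning}\n\nSQL:\n{trace.final_sql}",
    }


def to_jsonl(traces: list[ReasoningTrace]) -> str:
    """Serialize traces as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(to_sft_record(t)) for t in traces)
```

Each line of the resulting text is one self-contained example, which is a common input shape for fine-tuning tools.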


## 💡 Use Cases

Data Pipeline Development

```bash
# Natural language query execution
!reason "create a pipeline that aggregates daily sales by region"

# View recommended tables
!sl
# Schema linking found: sales_data, regions, daily_transactions

# Generate and refine SQL
!gen
# Generated: SELECT region_id, DATE(sale_date) as day, SUM(amount)...

!fix add product category grouping
# Updated SQL with category dimension added

```
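
To make the truncated query above concrete, here is a minimal sketch of the kind of daily sales-by-region aggregation it describes, run against an in-memory SQLite database. The `sales_data` table layout, the sample rows, and the completed `GROUP BY` clause are assumptions for illustration; the SQL Datus generates depends on your schema and dialect.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (region_id INTEGER, sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?)",
    [
        (1, "2024-01-01 09:30:00", 100.0),
        (1, "2024-01-01 17:45:00", 50.0),
        (2, "2024-01-01 12:00:00", 75.0),
        (1, "2024-01-02 08:15:00", 20.0),
    ],
)

# One row per region per calendar day, with the summed sale amount.
DAILY_SALES_SQL = """
SELECT region_id, DATE(sale_date) AS day, SUM(amount) AS total_amount
FROM sales_data
GROUP BY region_id, DATE(sale_date)
ORDER BY day, region_id
"""

rows = conn.execute(DAILY_SALES_SQL).fetchall()
for row in rows:
    print(row)
```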

Metric Standardization

```bash
# Check existing metrics
@subject

# Generate new metrics from analysis
!gen_metrics_stream
# Streaming metric generation...
# ✓ Monthly Active Users (MAU)
# ✓ Average Order Value (AOV)
# ✓ Customer Lifetime Value (CLV)

# Create semantic model
!gen_semantic_model
# Generated comprehensive data model with relationships

```

Intelligent Debugging

```bash
# Start debugging session
!dastart "debug ETL memory error"

# Explore context
@context_screen
# Visual display of current tables, schemas, and resources

# Run reasoning analysis
!reason_stream
# Analyzing: Large dataset (10TB) without partitioning detected
# Suggesting: Date-based partitioning, chunked processing

# Apply fix
!fix implement suggested partitioning strategy
```
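
The suggestion in the example output (date-based partitioning with chunked processing) can be sketched in plain Python. This is a generic illustration rather than code Datus produces: the `chunked` and `totals_by_date` helpers and the `(day, amount)` row shape are hypothetical.

```python
from collections import defaultdict
from itertools import islice
from typing import Iterable, Iterator


def chunked(rows: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` rows without loading everything."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def totals_by_date(rows: Iterable[tuple[str, float]], chunk_size: int = 2) -> dict[str, float]:
    """Accumulate per-day totals one chunk at a time, keyed by the date partition."""
    totals: dict[str, float] = defaultdict(float)
    for chunk in chunked(rows, chunk_size):
        for day, amount in chunk:
            totals[day] += amount
    return dict(totals)
```

Because only one chunk is held in memory at a time, peak memory depends on `chunk_size` and the number of distinct dates, not on the total row count.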

## Get more

* 🚦 [Quick Start](Quickstart.md)
* 🤝 [Contribution](Contribute.md)
* 📝 [Release Notes](Release_notes.md)
* 🌱 [Good First Issue](good_first_issue.md)
* 🏗️ [Architecture](Architecture.md)
