Metadata-Version: 2.4
Name: datanaut
Version: 0.1.2
Summary: A CLI SQL agent that converts natural language to optimized BigQuery SQL using GPT-4
Author-email: Your Name <you@example.com>
License: MIT
Project-URL: Homepage, https://github.com/yourname/datanaut
Project-URL: Issues, https://github.com/yourname/datanaut/issues
Keywords: bigquery,sql,ai,gpt4,cli,nlp,data
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: openai>=1.30.0
Requires-Dist: google-cloud-bigquery>=3.11.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: wcwidth>=0.2.0
Requires-Dist: colorama>=0.4.6
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Dynamic: license-file

<div align="center">

```
  ✦ · ✧ · ✦ · ✧ · ✦ · ✧ · ✦ · ✧ · ✦ · ✧ · ✦ · ✧ · ✦ · ✧

  ██████╗  █████╗ ████████╗ █████╗ ███╗   ██╗ █████╗ ██╗   ██╗████████╗
  ██╔══██╗██╔══██╗╚══██╔══╝██╔══██╗████╗  ██║██╔══██╗██║   ██║╚══██╔══╝
  ██║  ██║███████║   ██║   ███████║██╔██╗ ██║███████║██║   ██║   ██║
  ██║  ██║██╔══██║   ██║   ██╔══██║██║╚██╗██║██╔══██║██║   ██║   ██║
  ██████╔╝██║  ██║   ██║   ██║  ██║██║ ╚████║██║  ██║╚██████╔╝   ██║
  ╚═════╝ ╚═╝  ╚═╝   ╚═╝   ╚═╝  ╚═╝╚═╝  ╚═══╝╚═╝  ╚═╝ ╚═════╝    ╚═╝

  🚀  SQL Explorer v1.0  ·  Navigating the BigQuery galaxy...

  ✦ · ✧ · ✦ · ✧ · ✦ · ✧ · ✦ · ✧ · ✦ · ✧ · ✦ · ✧ · ✦ · ✧
```

**Ask questions in plain English. Get optimized BigQuery SQL instantly.**

[![PyPI version](https://badge.fury.io/py/datanaut.svg)](https://pypi.org/project/datanaut/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

</div>

---

## What is Datanaut?

Datanaut is a CLI agent that converts plain-English questions into optimized BigQuery SQL using GPT-4. You do not need to write SQL yourself: ask a question, and Datanaut handles schema retrieval, query generation, cost estimation, and execution.

```
  🧑‍🚀  You: how many sessions were completed yesterday?

  🗺️   Retrieving relevant schema...
  🤖   Generating SQL with GPT-4...
       GPT-4: Counts completed healing sessions for the previous day.
       Optimizations: partition pruning, specific column selection
  ⚗️   Running BigQuery dry run...

  ╔══════════════════════════════════════════════════════════╗
  ║  >>  MISSION QUERY                                       ║
  ╠══════════════════════════════════════════════════════════╣
  ║  SELECT COUNT(*) AS completed_sessions                   ║
  ║  FROM `wehealanalysis`.`asia_south1_healing`             ║
  ║       .healing_sessions(                                 ║
  ║         DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY),        ║
  ║         DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)         ║
  ║       )                                                  ║
  ║  WHERE completedTimestamp IS NOT NULL                    ║
  ╚══════════════════════════════════════════════════════════╝

  ✅  Estimated scan: 43.2 MB
  🚀  Launch this query? (y/n):
```

---

## Installation

```bash
pip install datanaut
```

---

## Set up in a few minutes

### Step 1 — Authenticate with Google Cloud

```bash
gcloud auth application-default login
```

> Don't have the Google Cloud SDK? Download it from [cloud.google.com/sdk](https://cloud.google.com/sdk)

### Step 2 — Set your OpenAI API key

```bash
datanaut config --api-key sk-...
```

This saves your key globally to `~/.datanaut/config.json`, so you only need to do it once and do not need a `.env` file in each project folder.
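
For readers curious how a global key lookup of this kind might work, here is a minimal sketch. It is not Datanaut's actual implementation: the JSON field name `openai_api_key` and the rule that the `OPENAI_API_KEY` environment variable takes priority are assumptions made for illustration, and the real file layout may differ.

```python
import json
import os
from pathlib import Path
from typing import Optional

# The path is the one named in this README; everything else is assumed.
CONFIG_PATH = Path.home() / ".datanaut" / "config.json"


def load_api_key(config_path: Path = CONFIG_PATH) -> Optional[str]:
    """Return an API key from the environment or the config file, if any."""
    # Assumed behavior: an environment variable, when set, wins.
    env_key = os.environ.get("OPENAI_API_KEY")
    if env_key:
        return env_key
    try:
        data = json.loads(config_path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        # A missing or unreadable config simply means "no key saved yet".
        return None
    if not isinstance(data, dict):
        return None
    # "openai_api_key" is a hypothetical field name.
    return data.get("openai_api_key")
```

Reading the file on every call keeps the sketch simple; a real tool would more likely load the config once at startup.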

> Don't have an OpenAI API key? Get one at [platform.openai.com/api-keys](https://platform.openai.com/api-keys)

### Step 3 — Launch

```bash
datanaut
```

That's it. Start asking questions.

---

## Commands

| Command | Description |
|---|---|
| `datanaut` | Start interactive REPL |
| `datanaut config --api-key sk-...` | Save OpenAI API key globally |
| `datanaut config` | Show current config status |
| `datanaut ask "your question"` | One-shot query without entering REPL |
| `datanaut ask "..." --dry-run` | Generate and validate SQL without executing |
| `datanaut init` | Scaffold `schema.json` in current directory |
| `datanaut test` | Run batch SQL accuracy tests |
| `datanaut --version` | Print version |
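
The one-shot `ask` command can also be driven from a script. The sketch below only assembles and runs the command line shown in the table. The helper names `build_ask_command` and `run_ask` are hypothetical, and nothing is assumed about the format of the text Datanaut prints.

```python
import shutil
import subprocess
from typing import List


def build_ask_command(question: str, dry_run: bool = True) -> List[str]:
    """Build the argv list for `datanaut ask`, as listed in the commands table."""
    argv = ["datanaut", "ask", question]
    if dry_run:
        # --dry-run generates and validates SQL without executing it.
        argv.append("--dry-run")
    return argv


def run_ask(question: str, dry_run: bool = True) -> str:
    """Run the command and return its raw standard output."""
    if shutil.which("datanaut") is None:
        raise FileNotFoundError("datanaut is not on PATH; try `pip install datanaut`")
    result = subprocess.run(
        build_ask_command(question, dry_run),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Defaulting to `dry_run=True` is a deliberate choice here: a script that forgets the flag cannot trigger a billed query by accident.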

---

## Interactive REPL commands

Once inside the REPL, these special commands are available:

| Command | Action |
|---|---|
| `history` | Show previous questions and generated SQL |
| `clear` | Clear conversation memory |
| `exit` | Quit Datanaut |

---

## Example questions you can ask

**Sessions**
```
how many sessions were completed yesterday?
show me total session duration by listener this week
which session types are most popular this month?
what is the average session duration for chat vs call?
```

**Payments & Revenue**
```
total revenue today
show payments by medium for last 7 days
which users made their first payment this week?
what is revenue by UTM campaign this month?
```

**Users**
```
how many new users signed up yesterday?
show me DAU for the last 30 days
which users have been active but never paid?
```

**Listeners**
```
which listeners had the most cancellations last week?
show listener online duration for today
who are the top 10 listeners by session count this month?
```

**Marketing**
```
which ad campaign brought the most users last month?
show revenue per user by ad category this week
what is CTR by marketing channel?
```

---

## Features

- **GPT-4 with function calling** — structured, reliable SQL output every time
- **BigQuery dry run** — validates SQL and estimates cost before execution
- **Cost guardrail** — automatically rejects queries that would scan more than 5 GB
- **Self-correcting retry** — passes BigQuery errors back to GPT-4 for auto-fix (up to 3 attempts)
- **Conversation memory** — refine queries across turns ("now filter by last 7 days", "add region breakdown")
- **Result cache** — identical queries skip BigQuery entirely, zero cost
- **Datanaut theme** — space explorer animations, syntax-highlighted SQL, color-coded cost badges
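
The dry run and cost guardrail listed above can be sketched with the public `google-cloud-bigquery` client. This is an illustration under stated assumptions, not Datanaut's own code: the names `estimate_scan_bytes`, `enforce_scan_limit`, and `ScanLimitExceeded` are hypothetical, and the 5 GB limit is taken to be decimal gigabytes.

```python
from typing import Any

# 5 GB limit from the feature list; decimal units are an assumption.
MAX_SCAN_BYTES = 5 * 1000**3


class ScanLimitExceeded(Exception):
    """Raised when a dry run estimates more bytes than the limit allows."""


def format_bytes(num_bytes: int) -> str:
    """Render a byte count with a decimal unit."""
    value = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if value < 1000:
            return f"{value:.1f} {unit}"
        value /= 1000
    return f"{value:.1f} TB"


def enforce_scan_limit(estimated_bytes: int, limit: int = MAX_SCAN_BYTES) -> int:
    """Return the estimate unchanged, or raise if it exceeds the limit."""
    if estimated_bytes > limit:
        raise ScanLimitExceeded(
            f"query would scan {format_bytes(estimated_bytes)}, "
            f"limit is {format_bytes(limit)}"
        )
    return estimated_bytes


def estimate_scan_bytes(client: Any, sql: str) -> int:
    """Dry-run `sql` on a google.cloud.bigquery.Client and return the estimate."""
    # Imported lazily so the pure helpers above work without the package.
    from google.cloud import bigquery

    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=job_config)  # a dry run reads no data
    return job.total_bytes_processed
```

A caller would chain the two steps, for example `enforce_scan_limit(estimate_scan_bytes(client, sql))`, and only ask for confirmation if no exception is raised.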

---

## How it works

```
Your question
     │
     ▼
Schema retrieval ──── Finds the most relevant table functions from schema.json
     │
     ▼
GPT-4 generation ──── Injects schema + BigQuery rules + your question
     │
     ▼
Dry run ────────────── Validates SQL, estimates bytes scanned, checks cost limit
     │
     ▼
Your confirmation ──── Shows you the SQL and cost before running
     │
     ▼
BigQuery execution ─── Streams results back to your terminal
```
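
The generation and dry-run steps in this pipeline, together with the self-correcting retry of up to 3 attempts, could be sketched as a small loop. The function `generate_with_retry` and its two callables are hypothetical stand-ins: one for the GPT-4 call and one for the BigQuery dry run. How Datanaut actually wires these together is not documented here.

```python
from typing import Callable, Optional, Tuple

MAX_ATTEMPTS = 3  # matches the "up to 3 attempts" figure in the feature list


def generate_with_retry(
    question: str,
    generate_sql: Callable[[str, Optional[str]], str],
    validate_sql: Callable[[str], Optional[str]],
    max_attempts: int = MAX_ATTEMPTS,
) -> Tuple[str, int]:
    """Return (sql, attempts_used) once validation passes.

    generate_sql(question, previous_error) stands in for the model call.
    validate_sql(sql) stands in for the dry run and returns an error
    message, or None when the SQL is valid.
    """
    last_error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        # Feed the previous error back so the model can correct itself.
        sql = generate_sql(question, last_error)
        last_error = validate_sql(sql)
        if last_error is None:
            return sql, attempt
    raise RuntimeError(f"no valid SQL after {max_attempts} attempts: {last_error}")
```

Passing the two steps in as callables keeps the loop testable with stubs, with no network access or credentials required.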

---

## Requirements

| Requirement | Version |
|---|---|
| Python | 3.10+ |
| Google Cloud SDK | Latest |
| GCP access | BigQuery read permissions on `wehealanalysis` |
| OpenAI API key | Any key with GPT-4 access |

---

## Troubleshooting

**`ModuleNotFoundError: No module named 'datanaut'`**
```bash
pip install --upgrade datanaut
```

**`google.auth.exceptions.DefaultCredentialsError`**
```bash
gcloud auth application-default login
```

**`OPENAI_API_KEY is not set`**
```bash
datanaut config --api-key sk-...
```

**Query exceeds 5 GB scan limit**

Add a tighter date range to your question:
```
instead of: "show all sessions"
ask:        "show sessions from last 7 days"
```

---

## License

MIT © 2024 WeHeal
