Metadata-Version: 2.4
Name: QuerySUTRA
Version: 0.1.4
Summary: SUTRA: Structured-Unstructured-Text-Retrieval-Architecture | Natural Language to SQL with MySQL/PostgreSQL export
Home-page: https://github.com/adityabatta/QuerySUTRA
Author: Aditya Batta
Author-email: Aditya Batta <b05aditya@gmail.com>
Project-URL: Homepage, https://github.com/adityabatta/QuerySUTRA
Project-URL: Repository, https://github.com/adityabatta/QuerySUTRA
Project-URL: Bug Reports, https://github.com/adityabatta/QuerySUTRA/issues
Keywords: natural-language,sql,database,query,nlp,ai,openai,mysql,postgresql,pdf,docx,text-retrieval
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: openai>=1.12.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: plotly>=5.0.0
Requires-Dist: matplotlib>=3.7.0
Requires-Dist: PyPDF2>=3.0.0
Requires-Dist: python-docx>=1.0.0
Requires-Dist: openpyxl>=3.1.0
Requires-Dist: sqlalchemy>=2.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Provides-Extra: mysql
Requires-Dist: mysql-connector-python>=8.0.0; extra == "mysql"
Provides-Extra: postgres
Requires-Dist: psycopg2-binary>=2.9.0; extra == "postgres"
Provides-Extra: all
Requires-Dist: mysql-connector-python>=8.0.0; extra == "all"
Requires-Dist: psycopg2-binary>=2.9.0; extra == "all"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# 🚀 QuerySUTRA v0.1.4

## **Structured-Unstructured-Text-Retrieval-Architecture**

**Natural Language to SQL with Cloud Export | PDF, DOCX, TXT Support**

A comprehensive Python library that converts natural language questions into SQL queries, with support for multiple file formats and cloud database export.

---

## ✨ Key Features

✅ **Natural Language to SQL** - Ask questions in plain English  
✅ **Multiple Formats** - CSV, Excel, JSON, SQL, **PDF, DOCX, TXT**, DataFrame  
✅ **Cloud Export** - MySQL, PostgreSQL (local & cloud)  
✅ **Direct SQL** - No API cost option  
✅ **Auto Visualization** - Plotly/Matplotlib charts  
✅ **Interactive Mode** - Prompts the user to choose whether to visualize  
✅ **Complete Backup** - Export to SQLite, SQL, JSON, Excel  
✅ **Jupyter Ready** - Perfect for notebooks  

---

## 📦 Installation

```bash
# Basic installation
pip install QuerySUTRA

# With MySQL support (quotes keep shells such as zsh from expanding the brackets)
pip install "QuerySUTRA[mysql]"

# With PostgreSQL support
pip install "QuerySUTRA[postgres]"

# With all database support
pip install "QuerySUTRA[all]"
```
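
The extras above pull in `mysql-connector-python` and `psycopg2-binary`. A small check like the following can confirm which drivers are importable before calling the export methods. `driver_available` is a hypothetical helper, not part of QuerySUTRA; it uses only the standard library.

```python
import importlib.util


def driver_available(module_name: str) -> bool:
    """Return True if the module can be found on the import path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "mysql") is not installed.
        return False


# Import names of the drivers listed in the package metadata.
print("MySQL driver:", driver_available("mysql.connector"))
print("PostgreSQL driver:", driver_available("psycopg2"))
```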

---

## 🎯 Quick Start

```python
from sutra import SUTRA

# Initialize with OpenAI API key
sutra = SUTRA(api_key="your-openai-key")

# Upload any format
sutra.upload("data.csv")      # CSV
sutra.upload("report.pdf")    # PDF ✨
sutra.upload("doc.docx")      # Word ✨
sutra.upload("data.xlsx")     # Excel
sutra.upload(dataframe)       # DataFrame

# Query with natural language
result = sutra.ask("What are the top 5 products?", viz=True)
print(result.data)

# Export to cloud
sutra.save_to_mysql("localhost", "root", "pass", "mydb")
sutra.save_to_postgres("host", "user", "pass", "db")

# Complete backup
sutra.backup()
```

---

## 📄 Supported File Formats

| Format | Extension | Example |
|--------|-----------|---------|
| CSV | `.csv` | `sutra.upload("data.csv")` |
| Excel | `.xlsx`, `.xls` | `sutra.upload("data.xlsx")` |
| JSON | `.json` | `sutra.upload("data.json")` |
| SQL | `.sql` | `sutra.upload("schema.sql")` |
| **PDF** | `.pdf` | `sutra.upload("report.pdf")` ✨ |
| **Word** | `.docx` | `sutra.upload("document.docx")` ✨ |
| **Text** | `.txt` | `sutra.upload("data.txt")` ✨ |
| DataFrame | `pd.DataFrame` | `sutra.upload(df, name="sales")` |

---

## 🔥 New in v0.1.3

### 1. PDF Support
```python
# Upload PDF files
sutra.upload("annual_report.pdf")

# Query the content
result = sutra.ask("What are the key findings in this report?")
print(result.data)
```

### 2. Word Document Support
```python
# Upload DOCX files with tables
sutra.upload("sales_report.docx")

# Query the data
result = sutra.ask("Show me sales by region", viz=True)
```

### 3. Cloud Database Export

#### MySQL (Local or Cloud)
```python
# Local MySQL
sutra.save_to_mysql("localhost", "root", "password", "mydb")

# AWS RDS MySQL
sutra.save_to_mysql(
    host="mydb.xxxx.us-east-1.rds.amazonaws.com",
    user="admin",
    password="cloudpass",
    database="production"
)

# Google Cloud SQL (placeholder IP; use your instance's public address)
sutra.save_to_mysql(
    host="203.0.113.10",
    user="admin",
    password="pass",
    database="mydb"
)
```

#### PostgreSQL (Local or Cloud)
```python
# Local PostgreSQL
sutra.save_to_postgres("localhost", "postgres", "password", "mydb")

# Heroku PostgreSQL
sutra.save_to_postgres(
    host="ec2-xxx.compute-1.amazonaws.com",
    user="user",
    password="pass",
    database="dbname"
)

# AWS RDS PostgreSQL
sutra.save_to_postgres(
    host="mydb.xxxx.us-west-2.rds.amazonaws.com",
    user="admin",
    password="pass",
    database="prod"
)
```
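
The examples above pass passwords as string literals for brevity. For real deployments it may be safer to read them from environment variables. The sketch below assumes the keyword names `host`, `user`, `password`, and `database` shown above; `load_db_credentials` and the `SUTRA_PG_*` variable names are hypothetical, not part of QuerySUTRA.

```python
import os


def load_db_credentials(prefix: str) -> dict:
    """Read host, user, password and database from PREFIX_* environment variables."""
    fields = ("host", "user", "password", "database")
    creds = {}
    missing = []
    for field in fields:
        var = f"{prefix}_{field.upper()}"
        value = os.environ.get(var)
        if value is None:
            missing.append(var)
        else:
            creds[field] = value
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return creds


# Usage (assumes SUTRA_PG_HOST, SUTRA_PG_USER, SUTRA_PG_PASSWORD and
# SUTRA_PG_DATABASE are set in the environment):
# sutra.save_to_postgres(**load_db_credentials("SUTRA_PG"))
```

Failing early with the list of missing variables tends to be easier to debug than a connection error from the database driver.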

### 4. Complete Export & Backup
```python
# Export entire database
sutra.export_db("backup.db", format="sqlite")
sutra.export_db("dump.sql", format="sql")
sutra.export_db("data.json", format="json")
sutra.export_db("data.xlsx", format="excel")

# Export schema only
sutra.save_schema("schema.sql", format="sql")
sutra.save_schema("schema.json", format="json")
sutra.save_schema("schema.md", format="markdown")

# Complete backup (creates 3 files)
sutra.backup()  # Creates .db, .sql, .json files with timestamp
```
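
One way to sanity-check a SQLite export is to open it with Python's built-in `sqlite3` module and list its tables. The sketch below assumes that `format="sqlite"` writes a standard SQLite file, which this README implies but does not state. `list_tables` is a hypothetical helper, and the demo file is created locally so the snippet runs without QuerySUTRA.

```python
import os
import sqlite3
import tempfile


def list_tables(db_path: str) -> list:
    """Return the names of user tables in a SQLite file, sorted alphabetically."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Stand-in for a real export: build a tiny SQLite file to inspect.
demo_path = os.path.join(tempfile.mkdtemp(), "backup.db")
conn = sqlite3.connect(demo_path)
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.execute("CREATE TABLE inventory (sku TEXT, qty INTEGER)")
conn.commit()
conn.close()

print(list_tables(demo_path))  # ['inventory', 'sales']
```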

---

## 📖 Complete Examples

### Example 1: PDF Analysis
```python
from sutra import SUTRA

sutra = SUTRA(api_key="your-openai-key")

# Upload PDF
sutra.upload("financial_report.pdf")

# View extracted data
sutra.peek(n=10)

# Query the content
result = sutra.ask("What are the total revenues?")
print(result.data)

# Visualize
result = sutra.ask("Show revenue by quarter", viz=True)
```

### Example 2: Multi-Format Analysis
```python
sutra = SUTRA(api_key="your-key")

# Upload multiple formats
sutra.upload("sales.csv")
sutra.upload("report.docx")
sutra.upload("data.xlsx")

# List all tables
print(sutra.tables())

# Query across data
result = sutra.ask("What are total sales?")
print(result.data)
```

### Example 3: Cloud Deployment
```python
# Analyze in Colab/Jupyter
sutra = SUTRA(api_key="your-key")
sutra.upload("local_analysis.csv")

# Query and analyze
result = sutra.ask("Show top performers", viz=True)

# Deploy to production MySQL
sutra.save_to_mysql(
    host="production.mysql.com",
    user="admin",
    password="prod_password",
    database="analytics_db"
)

# Backup everything
sutra.backup("/backups")
```

### Example 4: Direct SQL (No API Cost)
```python
# Execute SQL directly - FREE!
result = sutra.sql("""
    SELECT region, 
           SUM(sales) as total_sales,
           AVG(sales) as avg_sales
    FROM sales_data 
    GROUP BY region
    ORDER BY total_sales DESC
""")

print(result.data)
```
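
To see what this query computes without any API key, the same SQL can be run against an in-memory SQLite database. This is a standalone sketch using only the standard library; the `sales_data` rows are made up for illustration, and it does not show how QuerySUTRA executes SQL internally.

```python
import sqlite3

# Build a throwaway table with illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (region TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?)",
    [("North", 100.0), ("North", 300.0), ("South", 150.0)],
)

# Same aggregate as above: total and average sales per region.
rows = conn.execute(
    """
    SELECT region,
           SUM(sales) AS total_sales,
           AVG(sales) AS avg_sales
    FROM sales_data
    GROUP BY region
    ORDER BY total_sales DESC
    """
).fetchall()
conn.close()

for region, total, avg in rows:
    print(f"{region}: total={total}, avg={avg}")
```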

### Example 5: Interactive Mode
```python
# Ask user for visualization preference
result = sutra.interactive("What are sales trends?")
# Prompts: "Do you want visualization? (yes/no):"

if result.success:
    print(result.data)
```

---

## 🛠️ API Reference

### Initialization
```python
sutra = SUTRA(api_key="your-openai-key", db="sutra.db")
```

### Upload Data
```python
sutra.upload(data, name="table_name")
# data = file path (str) or DataFrame
```

### View Database
```python
sutra.tables()          # List all tables
sutra.schema()          # Show database schema
sutra.peek(n=10)        # Preview data
```

### Query Data
```python
# Direct SQL (no API cost)
result = sutra.sql("SELECT * FROM table", viz=False)

# Natural language (uses API)
result = sutra.ask("question", viz=False)

# Interactive (prompts user)
result = sutra.interactive("question")
```

### Export & Backup
```python
# Export results
sutra.export(dataframe, "output.csv", format="csv")

# Export database
sutra.export_db("backup.db", format="sqlite")

# Save to cloud
sutra.save_to_mysql(host, user, password, database)
sutra.save_to_postgres(host, user, password, database)

# Complete backup
sutra.backup("/backup/path")
```

### QueryResult Object
```python
result.success   # bool - query succeeded
result.sql       # str - generated SQL
result.data      # DataFrame - query results
result.viz       # figure - visualization (if viz=True)
result.error     # str - error message (if failed)
```
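
The fields above suggest a simple pattern: check `success` before touching `data`. The sketch below uses a stand-in `FakeResult` dataclass that mirrors the documented attribute names so it runs without an API key. `describe_result` is a hypothetical helper, and real `data` would be a DataFrame rather than a list.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeResult:
    """Stand-in with the same attribute names as the documented QueryResult."""
    success: bool
    sql: Optional[str] = None
    data: Any = None
    viz: Any = None
    error: Optional[str] = None


def describe_result(result) -> str:
    """Summarize a result, reading `data` only when the query succeeded."""
    if not result.success:
        return f"Query failed: {result.error}"
    return f"Ran: {result.sql} ({len(result.data)} rows)"


ok = FakeResult(success=True, sql="SELECT 1", data=[(1,)])
bad = FakeResult(success=False, error="no such table: sales")
print(describe_result(ok))
print(describe_result(bad))
```

Because `len()` also returns the row count of a DataFrame, the same helper should work unchanged on a real result object, assuming it exposes the attributes listed above.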

---

## 💡 Use Cases

### Data Analysis
```python
sutra.upload("sales_data.csv")
result = sutra.ask("What products have declining sales?", viz=True)
```

### Document Processing
```python
sutra.upload("contract.pdf")
result = sutra.ask("What are the key terms and dates?")
```

### Multi-Source Integration
```python
sutra.upload("sales.csv")
sutra.upload("inventory.xlsx")
sutra.upload("report.docx")
result = sutra.ask("Combine all data sources")
```

### Cloud Migration
```python
# Local analysis
sutra.upload("data.csv")
result = sutra.ask("Analyze trends")

# Deploy to cloud
sutra.save_to_postgres("cloud-db.com", "user", "pass", "prod")
```

---

## 🎨 Features Comparison

| Feature | Available | Cost |
|---------|-----------|------|
| CSV/Excel/JSON Upload | ✅ | Free |
| PDF Upload | ✅ | Free |
| DOCX Upload | ✅ | Free |
| Direct SQL Queries | ✅ | Free |
| Natural Language Queries | ✅ | ~$0.001/query (varies with OpenAI model and pricing) |
| Visualization | ✅ | Free |
| MySQL Export | ✅ | Free |
| PostgreSQL Export | ✅ | Free |
| Backup & Export | ✅ | Free |

---

## 💰 Cost Optimization

```python
# FREE - Direct SQL (no API calls)
result = sutra.sql("SELECT * FROM data WHERE sales > 1000")

# PAID - Natural language (uses OpenAI API)
result = sutra.ask("Show products with sales over 1000")

# Tip: Use direct SQL when you know the query!
```

---

## 🧪 Testing

```bash
# Install
pip install QuerySUTRA

# Test
python -c "from sutra import SUTRA; print('✅ Success!')"
```

---

## 📚 Documentation

- **Full Guide**: See `SUTRA_Complete_Guide.ipynb`
- **Publishing**: See `PUBLISHING_GUIDE.md`
- **Examples**: See `complete_example.py`

---

## 🤝 Contributing

Contributions welcome! The main code is in `sutra/sutra.py` - a single, well-documented file.

---

## 📄 License

MIT License - Free to use in your projects!

---

## 🏆 Why QuerySUTRA?

- **SUTRA** = **S**tructured-**U**nstructured-**T**ext-**R**etrieval-**A**rchitecture
- Single-file design for simplicity
- Built-in error handling through `result.success` and `result.error`
- Cloud-native with export capabilities
- Comprehensive format support (PDF, DOCX, CSV, Excel, JSON)
- Cost-effective with free SQL mode

---

## 🌟 Credits

**Author**: Aditya Batta  
**Version**: 0.1.4  
**License**: MIT  

---

## 📞 Support

- **Issues**: [GitHub Issues](https://github.com/adityabatta/QuerySUTRA/issues)
- **PyPI**: [https://pypi.org/project/QuerySUTRA/](https://pypi.org/project/QuerySUTRA/)

---

**Made with ❤️ for data analysts and developers**

**Start analyzing with natural language today!** 🚀
