Metadata-Version: 2.4
Name: iqtoolkit-analyzer
Version: 0.2.6
Summary: AI-powered multi-database performance analyzer for PostgreSQL and MongoDB with privacy-first intelligence
License-Expression: MIT
License-File: LICENSE
Keywords: postgresql,mongodb,database,performance,optimization,sql,ai,openai,slow-queries,analyzer
Author: Giovanni Martinez
Author-email: gio@iqtoolkit.ai
Maintainer: Giovanni Martinez
Maintainer-email: gio@iqtoolkit.ai
Requires-Python: >=3.11,<4.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: SQL
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Systems Administration
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Provides-Extra: all
Provides-Extra: cloud-ai
Provides-Extra: docs
Requires-Dist: anthropic (>=0.18.0) ; extra == "all"
Requires-Dist: anthropic (>=0.18.0) ; extra == "cloud-ai"
Requires-Dist: boto3 (>=1.42.16) ; extra == "all"
Requires-Dist: boto3 (>=1.42.16) ; extra == "cloud-ai"
Requires-Dist: fastapi (>=0.100.0)
Requires-Dist: google-generativeai (>=0.3.0) ; extra == "all"
Requires-Dist: google-generativeai (>=0.3.0) ; extra == "cloud-ai"
Requires-Dist: mkdocs (>=1.4.0) ; extra == "docs"
Requires-Dist: mkdocs-material (>=9.0.0) ; extra == "docs"
Requires-Dist: mkdocs-mermaid2-plugin (>=1.0.0) ; extra == "docs"
Requires-Dist: ollama
Requires-Dist: openai (>=1.0.0)
Requires-Dist: pandas (>=2.0.0)
Requires-Dist: psycopg (>=3.1.18)
Requires-Dist: pymongo (>=4.6.0) ; extra == "all"
Requires-Dist: python-dotenv (>=0.19.0)
Requires-Dist: pyyaml (>=6.0.0)
Requires-Dist: ruamel.yaml (>=0.17.21)
Requires-Dist: tqdm (>=4.0.0)
Requires-Dist: uvicorn (>=0.23.2)
Project-URL: Homepage, https://github.com/iqtoolkit/iqtoolkit-analyzer
Project-URL: Repository, https://github.com/iqtoolkit/iqtoolkit-analyzer
Description-Content-Type: text/markdown

# 🧠 IQToolkit Analyzer

An intelligent database performance analyzer that uses AI to diagnose slow queries and provide actionable optimization recommendations.

## 🎯 **Current Support: PostgreSQL + MongoDB + 6 AI Providers**
**✅ Production Ready**: PostgreSQL slow query analysis with comprehensive AI-powered recommendations  
**✅ Production Ready**: MongoDB slow query analysis with real-time profiler integration and multi-format reporting  
**✅ Multi-Cloud AI**: Google Gemini, AWS Bedrock, Anthropic Claude, Azure OpenAI, OpenAI, Ollama  
**🚧 Traditional SQL**: MySQL and SQL Server support in v0.4.0 (Q3 2026)

> **🚀 NEW in v0.2.6**: **Database-Direct EXPLAIN Analysis** — Run EXPLAIN against live PostgreSQL databases using IQToolkit config file. No log files needed! Plus contextual AI recommendations that acknowledge efficient queries.

> **🚀 NEW in v0.2.4**: Governance & version-sync patch — simplified PR/commit rules and aligned all version strings to 0.2.4. Multi-cloud AI support from v0.2.3 remains available (Gemini, Bedrock, Claude, Azure, OpenAI, Ollama).

> **🚀 NEW in v0.2.0**: MongoDB support is now fully available! Use `iqtoolkit-analyzer mongodb` to analyze your MongoDB performance with real-time profiler integration, comprehensive indexing recommendations, and multi-format reports.
## 🎯 Current Support
**✅ Production Ready**: PostgreSQL slow query analysis with AI-powered recommendations  
**✅ Production Ready**: MongoDB slow query analysis with real-time profiler integration and multi-format reporting  
**🚧 Planned**: MySQL and SQL Server support (see Roadmap)

> **🚀 Current repo version**: `0.2.6`. MongoDB support, 6 AI providers (Ollama + OpenAI + Gemini + Bedrock + Claude + Azure), and **database-direct EXPLAIN analysis** are available. Analyze PostgreSQL queries without log files!

![Python](https://img.shields.io/badge/python-3.13+-blue.svg)
![License](https://img.shields.io/badge/license-MIT-green.svg)
![AI Providers](https://img.shields.io/badge/AI-6%20Providers-brightgreen.svg)
![Ollama](https://img.shields.io/badge/AI-Ollama%20Default-blue.svg)
![Google](https://img.shields.io/badge/AI-Gemini-4285F4?logo=google&logoColor=white)
![AWS](https://img.shields.io/badge/AI-Bedrock-FF9900?logo=amazonaws&logoColor=white)
![Anthropic](https://img.shields.io/badge/AI-Claude-8A6BFF)
![Azure](https://img.shields.io/badge/AI-Azure%20OpenAI-0078D4?logo=microsoftazure&logoColor=white)
![AI Providers](https://img.shields.io/badge/AI-Ollama%20%26%20OpenAI-blue.svg)
![PostgreSQL](https://img.shields.io/badge/database-PostgreSQL%20Ready-336791?logo=postgresql&logoColor=white)
![MongoDB](https://img.shields.io/badge/database-MongoDB%20Ready-47A248?logo=mongodb&logoColor=white)
![MySQL](https://img.shields.io/badge/database-MySQL%20Planned%20v0.4.0-4479A1?logo=mysql&logoColor=white)
![SQL Server](https://img.shields.io/badge/database-SQL%20Server%20Planned%20v0.4.0-CC2927?logo=microsoftssqlserver&logoColor=white)

![Docker](https://img.shields.io/badge/docker-ready-blue.svg)


## 📚 Table of Contents

- [Overview](#-overview)
- [Key Features](#-key-features)
- [Quick Start](#-quick-start)
  - [Installation](#installation)
  - [Basic Usage](#basic-usage)
    - [Try with Sample Data](#try-with-sample-data)
    - [With Your Own Logs](#with-your-own-logs)
- [Sample Log Files](#-sample-log-files)
  - [Available Sample Files](#available-sample-files)
  - [Why .txt Extension?](#why-txt-extension)
  - [Sample Data Features](#sample-data-features)
  - [Sample Query Types Included](#sample-query-types-included)
- [Project Architecture](#-project-architecture)
  - [Data Flow](#data-flow)
- [Configuration](#-configuration)
  - [PostgreSQL Setup](#postgresql-setup)
  - [Environment Variables](#environment-variables)
  - [Configuration File](#configuration-file)
- [Slow Query Log Setup](#-slow-query-log-setup)
- [Sample Output](#-sample-output)
- [Command Line Options](#-command-line-options)
- [Troubleshooting](#-troubleshooting)
  - [Common Issues](#common-issues)
  - [Log File Locations](#log-file-locations)
- [Development](#-development)
  - [Running Tests](#running-tests)
  - [Code Quality](#code-quality)
    - [What does mypy do?](#what-does-mypy-do)
  - [Testing with Sample Data](#testing-with-sample-data)
  - [What is htmlcov and is it excluded?](#what-is-htmlcov-and-is-it-excluded)
- [System Requirements](#-system-requirements)
  - [Dependencies](#dependencies)
- [License](#-license)
  - [Development Setup](#development-setup)
- [Documentation](#-documentation)
- [Roadmap, Technical Debt & Contributing](#-roadmap-technical-debt--contributing)

## 🎯 Overview

IQToolkit Analyzer automatically analyzes your **PostgreSQL** and **MongoDB** slow query logs and provides intelligent, AI-powered optimization recommendations. It identifies performance bottlenecks, calculates impact scores, and generates detailed reports with specific suggestions for improving database performance.

### 🗄️ **Database & AI Support Status**

**Database Support:**
| Database | Status | Version | Timeline |
|----------|--------|---------|----------|
| **PostgreSQL** | ✅ **Fully Supported** | v0.1.x+ | Available now |
| **MongoDB** | ✅ **Fully Supported** | v0.2.0+ | Available now |
| **MySQL** | 🚧 Planned | v0.4.0 | Q3 2026 |
| **SQL Server** | 🚧 Planned | v0.4.0 | Q3 2026 |

**AI Provider Support:**
| AI Provider | Status | Version | Privacy | Speed | Cost |
|------------|--------|---------|---------|-------|------|
| **Ollama (Local)** | ✅ **Default** | v0.2.2+ | ⭐⭐⭐⭐⭐ | Fast | Free |
| **Google Gemini** | ✅ **Supported** | v0.2.4+ | ⭐⭐⭐ | Fast | ~$0.50/1K |
| **AWS Bedrock** | ✅ **Supported** | v0.2.4+ | ⭐⭐⭐⭐ | Medium | ~$3/1K |
| **Anthropic Claude** | ✅ **Supported** | v0.2.4+ | ⭐⭐⭐⭐ | Medium | ~$3/1K |
| **Azure OpenAI** | ✅ **Supported** | v0.2.4+ | ⭐⭐⭐⭐ | Fast | ~$0.50/1K |
| **OpenAI GPT** | ✅ **Supported** | v0.1.x+ | ⭐⭐⭐ | Fast | ~$0.15/1K |
| AI Provider | Status | Version |
|------------|--------|---------|
| **Ollama (Local/Remote)** | ✅ Supported | See the repo `VERSION` file |
| **OpenAI GPT** | ✅ Supported | v0.1.x+ |

> **📢 Multi-Cloud AI**: Now supporting 6 AI providers! Use Ollama (default) for free local analysis, or switch to cloud providers (Gemini, Bedrock, Claude, Azure) when you have API credits. Perfect for enterprise cloud deployments!

> **v0.1.6 Release Note**: This is the **final v0.1.x release with new features**. It includes comprehensive architecture documentation and prepares the codebase for multi-database support coming in v0.4.0. All references have been updated from "PostgreSQL-specific" to "database log analyzer" to reflect our roadmap for MySQL and SQL Server support. Future v0.1.x releases (v0.1.7+) will contain **bug fixes only** - all new features move to v0.2.0+.


## 🧩 Monorepo Overview

This repository now hosts a modular structure to support future services while keeping development fast:

```
iqtoolkit-analyzer/
├── iqtoolkit_analyzer/      # Current CLI package (to be service-ized)
├── iqtoolkit-contracts/     # Shared Pydantic models (Poetry package)
├── iqtoolkit-iqai/          # AI Copilot service (Poetry package)
├── iqtoolkithub/            # Orchestration gateway (Poetry package)
├── iqtoolkit-deployment/    # Helm charts and deployment assets
└── docs/                    # Documentation and samples
```

See [ROADMAP.md](ROADMAP.md) for the phase-by-phase plan.


### Key Features

- 🔍 **Smart Log Parsing**: 
  - **PostgreSQL**: Extracts slow queries from log files, supports multi-line queries and unusual characters
  - **MongoDB**: Real-time profiler integration for live slow query detection
- 📊 **Impact Analysis**: Calculates query impact using duration × frequency scoring
- 🤖 **AI-Powered Recommendations**: 
  - **6 AI Providers**: Ollama (default), Google Gemini, AWS Bedrock, Anthropic Claude, Azure OpenAI, OpenAI GPT
  - **Privacy Options**: Local Ollama for sensitive data, cloud providers for convenience
  - **Enterprise Ready**: Support for Azure and AWS enterprise deployments
- 📝 **Comprehensive Reports**: 
  - **PostgreSQL**: Detailed Markdown reports with statistics and recommendations
  - **MongoDB**: Multi-format reports (JSON, HTML, Markdown) with collection-level insights
- 📂 **Sample Data Included**: Ready-to-use sample log files for both PostgreSQL and MongoDB
- 🗂️ **Multiple Formats**: 
  - **PostgreSQL**: Plain, CSV, and JSON log formats
  - **MongoDB**: Direct profiler integration with configurable thresholds
- ⚙️ **Config File Support**: 
  - **PostgreSQL**: Use `.iqtoolkit-analyzer.yml` for analysis options
  - **MongoDB**: Use `.mongodb-config.yml` for connection and profiling settings
- 🔒 **Privacy & Flexibility**: 
  - **Local AI**: Ollama for privacy-first analysis (default)
  - **Cloud AI**: 5 cloud providers for enterprise deployments
  - **Your Choice**: Switch providers based on your infrastructure and requirements

## 🤖 AI Provider Setup

IQToolkit Analyzer supports 6 AI providers. Choose based on your needs:

### Ollama (Default - Local & Free)
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull recommended model
ollama pull arctic-text2sql-r1:7b

# Use (no API key needed!)
python -m iqtoolkit_analyzer postgresql your.log --llm-provider ollama
```

### Google Gemini (Cloud - Fast & Affordable)
```bash
# Install dependencies
pip install iqtoolkit-analyzer[cloud-ai]

# Set API key (get from: https://makersuite.google.com/app/apikey)
export GEMINI_API_KEY="your-api-key"

# Use
python -m iqtoolkit_analyzer postgresql your.log --llm-provider gemini
```

### AWS Bedrock (Cloud - Enterprise)
```bash
# Install dependencies
pip install iqtoolkit-analyzer[cloud-ai]

# Configure AWS credentials
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
export AWS_DEFAULT_REGION="us-east-1"

# Use
python -m iqtoolkit_analyzer postgresql your.log --llm-provider bedrock
```

### Anthropic Claude (Cloud - Latest Models)
```bash
# Install dependencies
pip install iqtoolkit-analyzer[cloud-ai]

# Set API key (get from: https://console.anthropic.com/)
export ANTHROPIC_API_KEY="your-api-key"

# Use
python -m iqtoolkit_analyzer postgresql your.log --llm-provider claude
```

### Azure OpenAI (Cloud - Enterprise Azure)
```bash
# Set Azure credentials
export AZURE_OPENAI_API_KEY="your-api-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"

# Use
python -m iqtoolkit_analyzer postgresql your.log --llm-provider azure
```

### OpenAI GPT (Cloud - Classic)
```bash
# Set API key
export OPENAI_API_KEY="your-api-key"

# Use
python -m iqtoolkit_analyzer postgresql your.log --llm-provider openai
```

**💡 Tip**: Check [`demos-youtube/`](demos-youtube/) for complete working examples with each provider!
- 🔧 **Extensible**: Future-ready architecture supports multiple databases and AI providers

## 🚀 Quick Start (Monorepo + Poetry)

> **⚡ Ready to analyze PostgreSQL or MongoDB slow queries right now?** Follow the installation below.  
> **🔮 Planning for MySQL/SQL Server?** [Join the early feedback program](https://github.com/iqtoolkit/iqtoolkit-analyzer/discussions) to shape v0.4.0 development!

### Installation

#### Preferred: Poetry

```bash
git clone https://github.com/iqtoolkit/iqtoolkit-analyzer.git
cd iqtoolkit-analyzer

# Install Poetry (pick one)
## macOS (Homebrew)
brew update && brew install poetry

## macOS/Linux (Official installer)
curl -sSL https://install.python-poetry.org | python3 -
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc && source ~/.zshrc

## Windows (PowerShell)
powershell -ExecutionPolicy Bypass -NoProfile -Command "(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py -"

## Cross-platform (pipx)
pipx install poetry

# Install dependencies for shared/contracts and services
cd iqtoolkit-contracts && poetry install && cd -
cd iqtoolkit-iqai && poetry install && cd -
cd iqtoolkithub && poetry install && cd -

# Analyzer CLI (root package)
poetry install --with dev,test
```

#### AI Provider Setup (Both Options)

**Option A: Ollama (Recommended - Local or remote, private, no API key needed) ⭐**
```bash
# Local setup (see docs/5-minute-ollama-setup.md for details)
curl -LsSf https://ollama.com/install.sh | sh
ollama serve
ollama pull a-kore/Arctic-Text2SQL-R1-7B  # SQL-specialized model (recommended)

# Copy example config and customize
cp .iqtoolkit-analyzer.yml.example .iqtoolkit-analyzer.yml
# Edit: set llm_provider: ollama

# OR use remote Ollama server
export OLLAMA_HOST=http://your-server-ip:11434
# Or add to .iqtoolkit-analyzer.yml:
#   ollama_host: http://your-server-ip:11434
```

**Option B: OpenAI (Cloud, requires API key)**
```bash
export OPENAI_API_KEY="your-openai-api-key-here"
# Config will use OpenAI by default if no .iqtoolkit-analyzer.yml exists
```

> **💡 Tip**: Ollama can run locally or on a remote server—your queries stay within your infrastructure. Perfect for sensitive production data. See [Ollama Setup Guide](docs/ollama-local.md) for local and remote configuration details.

### Basic Usage

#### PostgreSQL Analysis

**🎯 Three Ways to Analyze:**

1. **📊 Analyze Log Files** (Traditional)
```bash
# With Poetry
poetry run python -m iqtoolkit_analyzer postgresql sample_logs/postgresql-2025-10-28_192816.log.txt --output report.md

# Or traditional venv/pip
python -m iqtoolkit_analyzer postgresql sample_logs/postgresql-2025-10-28_192816.log.txt --output report.md
```

2. **🔥 Run Live EXPLAIN Against Database** (NEW!) 
```bash
# Using IQToolkit config file (recommended)
poetry run python -m iqtoolkit_analyzer \
  --config ~/.iqtoolkit/config.yaml \
  postgresql \
  --db-name my_database \
  --sql "SELECT * FROM orders WHERE customer_email = 'user@example.com'" \
  --output analysis.md

# Or from a SQL file
poetry run python -m iqtoolkit_analyzer \
  --config ~/.iqtoolkit/config.yaml \
  postgresql \
  --db-name my_database \
  --query-file slow_query.sql \
  --output analysis.md
```

3. **📄 Analyze Standalone EXPLAIN JSON Files**
```bash
# Analyze pre-generated EXPLAIN output
poetry run python -m iqtoolkit_analyzer postgresql --plan explain_output.json --output report.md
```

#### MongoDB Analysis
```bash
# With Poetry
poetry run python -m iqtoolkit_analyzer mongodb --connection-string "mongodb://localhost:27017" --output ./reports

# With configuration file (Poetry)
poetry run python -m iqtoolkit_analyzer mongodb --config .mongodb-config.yml --output ./reports

# Or traditional venv/pip
python -m iqtoolkit_analyzer mongodb --connection-string "mongodb://localhost:27017" --output ./reports
```

#### Advanced Usage Examples
```bash
# PostgreSQL: Analyze top 5 slowest queries (Poetry)
poetry run python -m iqtoolkit_analyzer postgresql docs/sample_logs/postgresql/postgresql-2025-10-28_192816.log.txt --output report.md --top-n 5

# MongoDB: Generate multiple report formats
poetry run python -m iqtoolkit_analyzer mongodb --connection-string "mongodb://localhost:27017" --output ./reports --format json html markdown

# PostgreSQL: Get more detailed AI analysis
poetry run python -m iqtoolkit_analyzer postgresql docs/sample_logs/postgresql/postgresql-2025-10-28_192816.log.txt --output report.md --max-tokens 200

# MongoDB: Enable verbose debug output
poetry run python -m iqtoolkit_analyzer mongodb --connection-string "mongodb://localhost:27017" --output ./reports --verbose

# Traditional approach for any of the above
python -m iqtoolkit_analyzer postgresql sample_logs/postgresql-2025-10-28_192816.log.txt --output report.md --top-n 5
```

#### With Your Own Logs
```bash
# Basic analysis (Poetry)
poetry run python -m iqtoolkit_analyzer /path/to/your/postgresql.log --output analysis_report.md

# Advanced options (Poetry)
poetry run python -m iqtoolkit_analyzer /path/to/your/postgresql.log \
  --output detailed_report.md \
  --top-n 10 \
  --min-duration 1000 \
  --max-tokens 150 \
  --verbose

# Traditional venv/pip
python -m iqtoolkit_analyzer /path/to/your/postgresql.log --output analysis_report.md
```

## 📂 Sample Log Files

The `docs/sample_logs/` directory contains database slow query log examples for testing and demonstration:

### ✅ **Current Support**
- **PostgreSQL**: Real sample logs from 100M record database operations with authentic slow queries → [View samples](docs/sample_logs/postgresql/)
- **MongoDB**: Complete profiler integration with real-time slow query detection and comprehensive optimization recommendations → [View samples](docs/sample_logs/mongodb/)

### 🚧 **Future Support (v0.4.0 - Q3 2026)**
- **MySQL**: Placeholder directory with configuration examples and feedback collection → [View samples](docs/sample_logs/mysql/)
- **SQL Server**: Placeholder directory with Extended Events samples and configuration → [View samples](docs/sample_logs/sqlserver/)

> 🎯 **Early Feedback Opportunities**: 
> - **MySQL Users**: [Share your slow query log formats and challenges](https://github.com/iqtoolkit/iqtoolkit-analyzer/issues/new?labels=mysql-feedback&title=MySQL%20Requirements)
> - **SQL Server DBAs**: [Tell us about your Extended Events setup and pain points](https://github.com/iqtoolkit/iqtoolkit-analyzer/issues/new?labels=sqlserver-feedback&title=SQL%20Server%20Requirements)

### Available Sample Files

- **`postgresql-2025-10-28_192816.log.txt`**: Contains authentic slow queries from a 100M record database including:
  - **Complex aggregation queries** (15.5+ seconds): Statistical calculations across 40M records
  - **Expensive correlated subqueries** (109+ seconds): Text pattern matching with per-row subqueries  
  - **Mathematical operations with window functions** (209+ seconds): Multiple window functions with trigonometric calculations
  - **Multiple query patterns** that benefit from different optimization strategies (indexes, query rewrites, JOIN optimizations)

### Why `.txt` Extension?

Sample log files use the `.txt` extension instead of `.log` to prevent them from being excluded by `.gitignore` patterns that typically ignore `*.log` files. This ensures the sample data remains available in the repository for testing and demonstration purposes.

### Sample Data Features

- **Real Performance Issues**: Authentic slow queries from actual 100M record database operations
- **Variety of Problems**: Different types of performance bottlenecks (missing indexes, correlated subqueries, expensive window functions)
- **AI-Ready**: Perfect for testing AI recommendation quality with real optimization opportunities
- **Educational**: Great examples for learning PostgreSQL performance optimization techniques
- **Range of Complexity**: From 2-second queries to 209-second extreme cases

### Sample Query Types Included

1. **Aggregation with Mathematical Functions** (15.5s)
   - `AVG`, `STDDEV`, `COUNT` operations on large datasets
   - Range filtering across 40M records
   - Perfect for testing index recommendations

2. **Correlated Subqueries with Pattern Matching** (109s)
   - `LIKE` operations with multiple patterns
   - Correlated subquery executing for each row
   - Demonstrates JOIN optimization opportunities

3. **Window Functions with Mathematical Operations** (209s)
   - Multiple `ROW_NUMBER()`, `RANK()`, `LAG()`, `LEAD()` functions
   - Complex mathematical calculations (`SQRT`, `SIN`, `COS`, `LOG`)
   - Heavy sorting and partitioning operations

## 🏗️ Project Architecture (Monorepo)

```
iqtoolkit-analyzer/
├── iqtoolkit_analyzer/      # Current CLI package (to be service-ized)
├── iqtoolkit-contracts/     # Shared Pydantic models (Poetry package)
├── iqtoolkit-iqai/          # AI Copilot service (Poetry package)
├── iqtoolkithub/            # Orchestration gateway (Poetry package)
├── iqtoolkit-deployment/    # Helm charts and deployment assets
└── docs/                    # Documentation and samples
```

### Data Flow

1. **Parse** → Extract slow queries from database logs (currently PostgreSQL)
2. **Analyze** → Calculate impact scores and normalize queries  
3. **AI Analysis** → Generate optimization recommendations using AI models
4. **Report** → Create comprehensive Markdown analysis report

> See [ROADMAP.md](ROADMAP.md) for milestones and central-plan phases. For the exact current version, see the repo `VERSION` file.


## ⚙️ Configuration

### 🐘 **PostgreSQL Setup**

See the full guide: [docs/getting-started.md](docs/getting-started.md)

Enable slow query logging in your `postgresql.conf`:

```conf
# Log queries taking longer than 1 second
log_min_duration_statement = 1000

# Enable logging collector
logging_collector = on

# Set log directory (relative to data_directory)
log_directory = 'log'

# Log file naming pattern
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'

# What to log
log_statement = 'none'
log_duration = off
```

Or configure dynamically:
```sql
-- Enable for current session
SET log_min_duration_statement = 1000;

-- Enable globally (requires restart)
ALTER SYSTEM SET log_min_duration_statement = 1000;
SELECT pg_reload_conf();
```
## 🐢 Slow Query Log Setup

For a step-by-step guide to enabling slow query logging, running example queries, and analyzing logs, see:

- [docs/getting-started.md](docs/getting-started.md)

This guide covers:
- Editing postgresql.conf
- Session-level logging
- Running example slow queries
- Collecting and analyzing logs with IQToolkit Analyzer

### 🍃 **MongoDB Setup**

MongoDB analysis uses the built-in profiler to collect slow operation data. Enable profiling for your databases:

```javascript
// Enable profiling for operations slower than 100ms
db.setProfilingLevel(2, {slowms: 100})

// Check profiling status
db.getProfilingStatus()

// View recent slow operations
db.system.profile.find().limit(5).sort({ts: -1}).pretty()
```

Create a `.mongodb-config.yml` configuration file:

```yaml
# MongoDB Connection
connection:
  connection_string: "mongodb://localhost:27017"
  connection_timeout_ms: 5000
  
# Performance Thresholds
thresholds:
  slow_threshold_ms: 100.0
  very_slow_threshold_ms: 1000.0
  critical_threshold_ms: 5000.0
  
# Analysis Settings
databases_to_monitor: ["myapp", "analytics"]
exclude_databases: ["admin", "config", "local"]

# Report Settings
reporting:
  formats: ["json", "html", "markdown"]
  include_query_samples: true
  max_query_samples: 5
```

For complete MongoDB setup instructions, see: [docs/mongodb-guide.md](docs/mongodb-guide.md)

### Environment Variables

| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `OPENAI_API_KEY` | OpenAI API key | None | For OpenAI provider |
| `OPENAI_MODEL` | GPT model to use | `gpt-4o-mini` | Optional |
| `OPENAI_BASE_URL` | Custom OpenAI endpoint | `https://api.openai.com/v1` | Optional |

### Configuration File

Create a `.iqtoolkit-analyzer.yml` file to customize behavior:

```yaml
# AI Provider Selection
llm_provider: ollama  # or 'openai'
ollama_model: arctic-text2sql-r1:7b
ollama_host: http://localhost:11434  # optional, for custom hosts

# OpenAI (if using)
openai_api_key: sk-xxx  # optional, can use env var instead
openai_model: gpt-4o-mini

# Analysis Options
log_format: csv
top_n: 10
output: reports/report.md
min_duration: 1000

# LLM Configuration
llm_temperature: 0.3
max_tokens: 300
llm_timeout: 30
```

See [Configuration Guide](docs/configuration.md) for all options and [Ollama Local Setup](docs/ollama-local.md) for local AI setup.

## 📊 Sample Output

```markdown
# Slow Query Analysis Report

## Summary
- **Total queries analyzed**: 8
- **Slow queries found**: 4  
- **Total duration**: 336,175.06 ms
- **Most impactful query**: Mathematical operations with window functions

## Top Slow Queries

### Query #1: Mathematical Operations with Window Functions (Impact Score: 209,297.06)
**Duration**: 209,297.06 ms | **Frequency**: 1 | **First seen**: 2025-10-28 20:04:57

```sql
SELECT id, random_number, random_text, created_at,
    SQRT(ABS(random_number)::numeric) as sqrt_abs_number,
    LOG(GREATEST(random_number, 1)::numeric) as log_number,
    SIN(random_number::numeric / 180000.0 * PI()) as sin_degrees,
    ROW_NUMBER() OVER (ORDER BY random_number) as row_num_asc,
    AVG(random_number) OVER (ROWS BETWEEN 1000 PRECEDING AND 1000 FOLLOWING) as moving_avg
FROM large_test_table 
WHERE random_number BETWEEN 250000 AND 750000
  AND (id % 7 = 0 OR id % 11 = 0 OR id % 13 = 0)
ORDER BY SQRT(ABS(random_number)::numeric) DESC
LIMIT 200;
```

**🤖 AI Recommendation:**
This query suffers from expensive mathematical operations and multiple window functions. Create a composite index on `(random_number, id)` and consider materializing complex calculations. The multiple window functions could be optimized by combining operations. Expected improvement: 70-85% faster execution.

### Query #2: Correlated Subquery with Pattern Matching (Impact Score: 109,234.02)
**Duration**: 109,234.02 ms | **Frequency**: 1 | **First seen**: 2025-10-28 19:31:23

```text
SELECT DISTINCT l1.random_number, l1.random_text, l1.created_at,
    (SELECT COUNT(*) FROM large_test_table l2 WHERE l2.random_number = l1.random_number)
FROM large_test_table l1
WHERE l1.random_text LIKE '%data_555%' OR l1.random_text LIKE '%data_777%'
ORDER BY l1.random_number DESC LIMIT 30;
```

**🤖 AI Recommendation:**
Replace the correlated subquery with a JOIN or window function. Create indexes on `random_text` (consider GIN for pattern matching) and `random_number`. The LIKE operations with leading wildcards are expensive - consider full-text search if applicable. Expected improvement: 60-80% faster execution.
```

## 🔧 Command Line Options

| Option | Description | Default |
|--------|-------------|---------|
| `LOG_FILE` | Path to PostgreSQL log file (optional if using --db-name) | - |
| `--plan` | Treat `LOG_FILE` as an EXPLAIN (ANALYZE, FORMAT JSON) output | - |
| **`--db-name`** | **🔥 Database key from config file (enables live EXPLAIN)** | - |
| **`--sql`** | **🔥 Inline SQL query to run EXPLAIN against (requires --db-name)** | - |
| **`--query-file`** | **🔥 Path to SQL file to run EXPLAIN against (requires --db-name)** | - |
| `--config` | Path to IQToolkit config file (required for --db-name) | `.iqtoolkit-analyzer.yml` |
| `--output`, `-o` | Output report file path | `slow_query_report.md` |
| `--top-n`, `-n` | Number of top queries to analyze | `10` |
| `--min-duration` | Minimum duration (ms) to consider | `1000` |
| `--max-tokens` | Max tokens for AI analysis | `150` |
| `--model` | OpenAI model to use | `gpt-4o-mini` |
| `--verbose` | Enable verbose (debug) output for troubleshooting and progress tracking | - |
| `--help`, `-h` | Show help message | - |

### MongoDB Analysis
```bash
# With Poetry (recommended)
poetry run python -m iqtoolkit_analyzer mongodb [OPTIONS]

# Traditional approach
python -m iqtoolkit_analyzer mongodb [OPTIONS]
```

| Option | Description | Default |
|--------|-------------|---------|
| `--connection-string` | MongoDB connection string | Required |
| `--config`, `-c` | Configuration file path (YAML format) | None |
| `--output`, `-o` | Output directory for reports | `./reports` |
| `--format`, `-f` | Report formats: json, markdown, html | `json` |
| `--databases` | Databases to analyze (comma-separated) | All accessible |
| `--verbose` | Enable verbose (debug) output | - |
| `--help`, `-h` | Show help message | - |

## 🐛 Troubleshooting

### Common Issues

#### PostgreSQL Issues

**"No slow queries found"**
```bash
# Check if log file contains duration entries
grep -i "duration:" your_log_file.log

# Verify PostgreSQL logging is enabled
psql -c "SHOW log_min_duration_statement;"
```

**"Permission denied on log file"**
```bash
# Fix file permissions
chmod 644 /path/to/postgresql.log
```

#### MongoDB Issues

**"Connection failed"**
```bash
# Test MongoDB connection
mongosh "mongodb://localhost:27017" --eval "db.adminCommand('ismaster')"

# Check if profiler is enabled
mongosh "mongodb://localhost:27017/mydb" --eval "db.getProfilingStatus()"
```

**"No profiler data found"**
```bash
# Enable MongoDB profiling for slow operations (>100ms)
mongosh "mongodb://localhost:27017/mydb" --eval "db.setProfilingLevel(2, {slowms: 100})"

# Check system.profile collection
mongosh "mongodb://localhost:27017/mydb" --eval "db.system.profile.count()"
```

#### AI/General Issues

**"OpenAI API Error" (v0.1.x Only)**
```bash
# Verify API key is set
echo $OPENAI_API_KEY

# Test API connectivity
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
     https://api.openai.com/v1/models
```

> **💡 Alternative**: If you prefer local AI processing for privacy, consider waiting for v0.2.0 with Ollama support (Nov 2025 - Q1 2026).

# Or copy to accessible location
cp /var/log/postgresql/postgresql.log ~/my_log.log
```

### Log File Locations

| Installation Method | Typical Log Location |
|-------------------|---------------------|
| **Homebrew (macOS)** | `/opt/homebrew/var/postgresql@*/log/` |
| **Ubuntu/Debian** | `/var/log/postgresql/` |
| **CentOS/RHEL** | `/var/lib/pgsql/*/data/log/` |
| **Docker** | `/var/lib/postgresql/data/log/` |
| **Windows** | `C:\Program Files\PostgreSQL\*\data\log\` |

## 🧪 Development

### Quick Development Setup
```bash
# Clone and setup (prefers Poetry, falls back to venv/pip)
git clone https://github.com/iqtoolkit/iqtoolkit-analyzer.git
cd iqtoolkit-analyzer
make setup

# Or traditional approach (manual)
python -m venv .venv
source .venv/bin/activate
pip install -e .[dev,test]
```

### Running Tests
```bash
make test          # Run all tests
make lint          # Run linting
make format        # Format code

# Traditional approach
pytest tests/ -v
pytest tests/ --cov=iqtoolkit_analyzer --cov-report=html
```

### What is htmlcov and is it excluded?
htmlcov is the folder where the HTML coverage report is generated when you run tests with coverage reporting. In this project:

- How it’s generated:
  - Pytest is configured in pyproject.toml to produce coverage reports, including HTML, via addopts:
    --cov=iqtoolkit_analyzer --cov-report=term-missing --cov-report=html --cov-report=xml
  - The HTML output directory is configured under [tool.coverage.html] as directory = "htmlcov".
  - You’ll typically get it by running make test (which runs pytest with those flags) or pytest ... --cov-report=html.
- Where to view it:
  - Open htmlcov/index.html in your browser to see per-file and line-level coverage.
- Is it excluded from Git?
  - Yes. .gitignore contains htmlcov/ so the generated report is not committed.
- How to clean it up:
  - make clean removes htmlcov/ along with other build/test artifacts.

### Code Quality
```bash
# With Makefile (recommended)
make format        # Format with ruff
make lint          # Lint with ruff + mypy
make validate      # Full validation suite

# Traditional approach
ruff format .
ruff check .
poetry run mypy iqtoolkit_analyzer
```

#### What does mypy do?
Mypy is a static type checker for Python. It analyzes your code without executing it to catch type-related errors early and to make the codebase easier to maintain.

In this repository, mypy helps to:
- Prevent common bugs by verifying function inputs/outputs match their annotations
- Enforce consistent, explicit types (useful in a data-heavy tool like this)
- Improve editor/IDE auto-completion and refactoring safety

How it’s configured here:
- Configuration lives in pyproject.toml under [tool.mypy]
- We enable a relatively strict set of options:
  - disallow-untyped-defs, disallow-incomplete-defs, disallow-untyped-decorators
  - no_implicit_optional, warn_redundant_casts, warn_unused_ignores, warn_no_return, warn_unreachable
  - strict_equality and check_untyped_defs
- Third‑party modules with incomplete type hints (like openai, dotenv) are allowed via ignore_missing_imports overrides.

How to run it:
- Recommended: make lint (runs ruff check then mypy)
- Directly: poetry run mypy iqtoolkit_analyzer

Common fixes:
- Add or refine type hints: parameters, return types, and local variables when useful
- Use Optional[T] (or | None) when something can be None
- Narrow types with isinstance checks before using values
- For one-off unavoidable cases, use a targeted suppression:  # type: ignore[code]

Type stubs:
- If a dependency lacks types, prefer installing its types (e.g., types-pyyaml)
- If none exist, consider adding minimal annotations around your usage or a local stub package later

### Testing with Sample Data
```bash
# Test the parser
python -c "from iqtoolkit_analyzer import parse_postgres_log; print(len(parse_postgres_log('sample_logs/postgresql-2025-10-28_192816.log.txt')))"

# Test full pipeline with sample data
python -m iqtoolkit_analyzer sample_logs/postgresql-2025-10-28_192816.log.txt --output test_report.md

# Verify AI recommendations are generated
grep -A 5 "🤖 AI Recommendation" test_report.md
```

## 📋 System Requirements

- **Python**: 3.11 or higher
- **Memory**: 512MB+ available RAM
- **Storage**: 50MB+ free space
- **Network**: Internet connection for OpenAI API
- **Platforms**: macOS, Linux, Windows


### Dependencies

- `openai>=1.0.0` - OpenAI API client
- `python-dotenv>=0.19.0` - Environment variable management
- `pandas>=2.0.0` - Data analysis and CSV/JSON log support
- `pyyaml>=6.0.0` - YAML config file support
- `tqdm>=4.0.0` - Progress bars for large log analysis
- `pytest`, `pytest-cov` - Testing and coverage (dev)
- `ruff`, `mypy`, `pre-commit` - Code quality (dev)
- `argparse` - Command line parsing (built-in)
- `re`, `json`, `logging` - Standard library modules

## 📄 License

MIT License - see [LICENSE](LICENSE) file for details.

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit changes (`git commit -m 'Add amazing feature'`)
4. Push to branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

### Development Setup (Poetry)
```bash
# Clone your fork
git clone https://github.com/yourusername/iqtoolkit-analyzer.git
cd iqtoolkit-analyzer

# Complete development environment setup
bash scripts/setup-dev-environment.sh

# Install git hooks for automated version management
bash scripts/setup-hooks.sh

# Verify everything works
make check-version
make test
```
## 📈 Roadmap, Technical Debt & Contributing

**Database Support Roadmap:**
- **v0.1.6** (Nov 2025): Final v0.1.x feature release - Documentation & architecture updates 🔒
- **v0.1.7+**: Bug fixes only (feature freeze for v0.1.x branch)
- **v0.2.0** (Nov 2025 - Q1 2026): Configurable AI providers (Ollama default), enhanced config system, EXPLAIN plans, HTML reports 🔧
- **v0.3.0** (Q2 2026): ML/self-learning features 📋
- **v0.4.0** (Q3 2026): **MySQL and SQL Server support** 📋

**AI Provider Evolution:**
- **v0.1.x CURRENT**: OpenAI GPT models only (requires OPENAI_API_KEY)
- **v0.2.0 COMING**: Configurable providers with Ollama as privacy-first default
- **v0.2.0+ FUTURE**: Extensible to Claude, Gemini, custom endpoints

> **⚠️ Privacy Note for v0.1.x**: Current version sends query data to OpenAI's public API. For sensitive production logs, consider waiting for v0.2.0 with local Ollama support.

**When asked about new features**: 
- **For v0.1.x**: "v0.1.6 is the final feature release. New features go to v0.2.0+ roadmap."
- **For MySQL/SQL Server**: "Added to v0.4.0 roadmap (Q3 2026) - we're focusing on perfecting PostgreSQL analysis first with v0.2.0 configurable AI providers."

## 📚 Documentation

For complete documentation and guides, see our [**Documentation Index**](DOCUMENTATION_INDEX.md) 📖

**Quick Links:**
- 🚀 [Getting Started](docs/getting-started.md) - New user tutorial
- 🤝 [Contributing Guide](CONTRIBUTING.md) - How to contribute
- ⚙️ [Configuration](docs/configuration.md) - Setup and config options  
- 💡 [PostgreSQL Examples](docs/pg_examples.md) - Real usage examples
- ❓ [FAQ](docs/faq.md) - Common questions and troubleshooting

## 🤝 Roadmap, Technical Debt & Contributing

- See [ROADMAP.md](ROADMAP.md) for the full project roadmap, timeline, and community requests.
- See [TECHNICAL_DEBT.md](TECHNICAL_DEBT.md) for known limitations and areas for future improvement.
- See [CONTRIBUTING.md](CONTRIBUTING.md) for contribution guidelines and code standards.
- See [VERSION_MANAGEMENT.md](VERSION_MANAGEMENT.md) for automated version synchronization.
- See [BRANCH_PROTECTION.md](BRANCH_PROTECTION.md) for repository governance and branch protection rules.
- See [ARCHITECTURE.md](ARCHITECTURE.md) for detailed system architecture and extension points.

**Made with ❤️ for Database performance optimization**
