Metadata-Version: 2.4
Name: smartkdb
Version: 4.0.2
Summary: SmartKDB – Cognitive & AI-Training-Aware Embedded Database
Author-email: Karrar <alhdrawykrar@gmail.com>
License: MIT License
        
        Copyright (c) 2025 Karrar Team
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/karrar/smartkdb
Project-URL: Bug Tracker, https://github.com/karrar/smartkdb/issues
Keywords: embedded-database,ai-ready,smart-index,local-db,realtime-db
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Database
Classifier: Topic :: Database :: Database Engines/Servers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# SmartKDB 🧠
**The Cognitive, AI-Native Embedded Database for Python.**

[![PyPI version](https://badge.fury.io/py/smartkdb.svg)](https://badge.fury.io/py/smartkdb)
[![Python](https://img.shields.io/pypi/pyversions/smartkdb.svg)](https://pypi.org/project/smartkdb/)

SmartKDB is not just a database; it's a **data engine for the AI era**. It combines a local-first NoSQL store with a "Brain" that learns from your queries, an "Agent" you can chat with, and a "Training Hub" to manage ML datasets.

---

## 📚 Table of Contents
1.  [Quick Start (The 5-Minute Crash Course)](#1-quick-start-the-5-minute-crash-course)
2.  [Core Database Operations (CRUD)](#2-core-database-operations-crud)
3.  [The Query Engine](#3-the-query-engine)
4.  [Cognitive Layer (Chat & Agent)](#4-cognitive-layer-chat--agent)
5.  [AI Training Hub (Datasets & Logs)](#5-ai-training-hub-datasets--logs)
6.  [Vector Search & Embeddings](#6-vector-search--embeddings)
7.  [Architecture & Internals](#7-architecture--internals)

---

## 1. Quick Start (The 5-Minute Crash Course)

Let's build a simple **Hospital System** to demonstrate SmartKDB.

```bash
pip install smartkdb
```

```python
from kdb import SmartKDB

# 1. Initialize the DB (Creates a folder 'hospital.kdb')
db = SmartKDB("hospital.kdb")

# 2. Set Up Security (First run only)
# SmartKDB enforces role-based access control (RBAC), so you must create an admin user.
db.auth.create_user("admin", "admin123", "admin")
db.login("admin", "admin123")

# 3. Create a Table
# Note: SmartKDB is schema-less (NoSQL), but you define the Primary Key (pk)
# and any initial indexes for speed.
patients = db.create_table("patients", pk="id", indexes=["age", "diagnosis"])

# 4. Insert Data
patients.insert({
    "id": "P001",
    "name": "Ahmad",
    "age": 54,
    "diagnosis": "pneumonia",
    "status": "admitted"
})
patients.insert({
    "id": "P002",
    "name": "Sarah",
    "age": 29,
    "diagnosis": "flu",
    "status": "discharged"
})

# 5. Run a Query
# Find all patients older than 40
results = patients.query().where("age", ">", 40).execute()
print(f"Found {len(results)} patients: {results}")
```

---

## 2. Core Database Operations (CRUD)

SmartKDB provides a simple, Pythonic API for all standard operations.

### Create (Insert)
```python
# Returns the inserted document (including the generated PK if missing)
doc = patients.insert({
    "id": "P003",
    "name": "Ali",
    "age": 12,
    "diagnosis": "fracture"
})
```

### Read (Get by ID)
```python
# Fast lookup by Primary Key (served from the in-memory primary-key index)
patient = patients.get("P001")
if patient:
    print(patient["name"])
```
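
The Architecture section lists an in-memory B-tree for primary keys, which gives logarithmic rather than constant-time lookups. As a rough illustration (not SmartKDB's code), a sorted list searched with Python's `bisect` module behaves like an ordered key index: lookups cost O(log n) and keys stay in order, so range scans are cheap. The class `SortedPKIndex` and its `scan` method are hypothetical, and unlike a real B-tree its inserts are O(n) because `list.insert` shifts elements.

```python
import bisect


class SortedPKIndex:
    """Hypothetical ordered primary-key index: sorted keys + binary search."""

    def __init__(self):
        self._keys = []   # kept sorted at all times
        self._docs = []   # _docs[i] belongs to _keys[i]

    def put(self, pk, doc):
        i = bisect.bisect_left(self._keys, pk)          # O(log n) search
        if i < len(self._keys) and self._keys[i] == pk:
            self._docs[i] = doc                          # key exists: replace
        else:
            self._keys.insert(i, pk)                     # O(n) shift in a list
            self._docs.insert(i, doc)

    def get(self, pk):
        i = bisect.bisect_left(self._keys, pk)
        if i < len(self._keys) and self._keys[i] == pk:
            return self._docs[i]
        return None

    def scan(self, lo, hi):
        """All docs with lo <= pk < hi, in key order (what ordering buys you)."""
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_left(self._keys, hi)
        return self._docs[start:end]


index = SortedPKIndex()
index.put("P002", {"id": "P002", "name": "Sarah"})
index.put("P001", {"id": "P001", "name": "Ahmad"})
print(index.get("P001")["name"])   # Ahmad
print(index.get("P999"))           # None
```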

### Update
```python
# Updates specific fields. Merges with existing data.
# Returns the updated document.
updated_doc = patients.update("P001", {
    "status": "discharged",
    "notes": "Recovered fully"
})
```
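
The exact merge rule is not spelled out above. A minimal sketch, assuming a shallow merge (top-level fields in the update replace or extend the stored ones, and nested dicts are replaced wholesale) and assuming the primary key may not be changed by an update; `merge_update` is a hypothetical helper, not part of SmartKDB:

```python
def merge_update(existing, changes, pk="id"):
    """Return a new document where fields in `changes` overwrite or extend `existing`.

    Assumptions: shallow merge, and the primary key is immutable.
    """
    if pk in changes and changes[pk] != existing.get(pk):
        raise ValueError("primary key cannot be changed by update")
    return {**existing, **changes}   # later keys win; `existing` is not mutated


before = {"id": "P001", "name": "Ahmad", "status": "admitted"}
after = merge_update(before, {"status": "discharged", "notes": "Recovered fully"})
print(after)
# {'id': 'P001', 'name': 'Ahmad', 'status': 'discharged', 'notes': 'Recovered fully'}
```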

### Delete
```python
# Removes the record permanently
patients.delete("P003")
```

---

## 3. The Query Engine

SmartKDB's query engine is designed for readability and speed.

### Basic Filtering
```python
# Syntax: .where(field, operator, value)
# Operators: "==", "!=", ">", "<", ">=", "<=", "in", "contains"

# Find patients with flu OR pneumonia
results = (
    patients.query()
    .where("diagnosis", "in", ["flu", "pneumonia"])
    .execute()
)
```
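
One way to picture the operator list is as a table from operator strings to predicate functions. This is a hypothetical sketch: the semantics assumed for `in` (the field value is one of the given options) and `contains` (the field value, a string or list, contains the given item), and the choice that missing fields or mismatched types never match, are guesses rather than documented SmartKDB behavior.

```python
import operator

# Hypothetical mapping from the documented operator strings to predicates.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "in": lambda field_value, options: field_value in options,   # value is one of options
    "contains": lambda field_value, item: item in field_value,   # str/list contains item
}


def matches(doc, field, op, value):
    """Evaluate one where-clause against a document (illustrative semantics)."""
    if field not in doc:
        return False                       # assumption: a missing field never matches
    try:
        return bool(OPS[op](doc[field], value))
    except TypeError:
        return False                       # e.g. comparing a str field with an int


patients = [
    {"id": "P001", "age": 54, "diagnosis": "pneumonia"},
    {"id": "P002", "age": 29, "diagnosis": "flu"},
    {"id": "P003", "age": 12, "diagnosis": "fracture"},
]
hits = [p["id"] for p in patients if matches(p, "diagnosis", "in", ["flu", "pneumonia"])]
print(hits)  # ['P001', 'P002']
```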

### Chaining & Sorting
```python
results = (
    patients.query()
    .where("age", ">", 20)
    .where("status", "==", "admitted")
    .sort_by("age", ascending=False)
    .limit(10)
    .execute()
)
```
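
Method chaining works because each builder method records its step and returns the builder itself. The hypothetical `Query` class below mimics `where`/`sort_by`/`limit`/`execute` over a plain list of dicts. It assumes chained `where` calls are combined with AND, supports only the comparison operators, and expects the sort field to be present in every matching document; it is a sketch, not SmartKDB's query engine.

```python
import operator

_OPS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt,
        "<": operator.lt, ">=": operator.ge, "<=": operator.le}


class Query:
    """Hypothetical, minimal builder mirroring where/sort_by/limit/execute."""

    def __init__(self, docs):
        self._docs = docs
        self._filters = []
        self._sort = None
        self._limit = None

    def where(self, field, op, value):
        self._filters.append((field, _OPS[op], value))
        return self                      # returning self is what enables chaining

    def sort_by(self, field, ascending=True):
        self._sort = (field, ascending)
        return self

    def limit(self, n):
        self._limit = n
        return self

    def execute(self):
        # Assumption: all where() clauses must hold (logical AND).
        rows = [d for d in self._docs
                if all(f in d and op(d[f], v) for f, op, v in self._filters)]
        if self._sort is not None:
            field, ascending = self._sort
            rows.sort(key=lambda d: d[field], reverse=not ascending)
        if self._limit is not None:
            rows = rows[: self._limit]
        return rows


docs = [
    {"id": "P001", "age": 54, "status": "admitted"},
    {"id": "P002", "age": 29, "status": "discharged"},
    {"id": "P004", "age": 67, "status": "admitted"},
]
result = (
    Query(docs)
    .where("age", ">", 20)
    .where("status", "==", "admitted")
    .sort_by("age", ascending=False)
    .limit(10)
    .execute()
)
print([d["id"] for d in result])  # ['P004', 'P001']
```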

### Semantic Query (AI-Powered)
Don't want to write code? Use natural language.
```python
# The DB understands "older than", "active", "limit", etc.
results = db.semantic_query("patients", "patients older than 50 limit 5")
```
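
SmartKDB's natural-language parser is not documented here. As a toy illustration of the idea, a handful of regular expressions can turn phrases such as "older than", "active", and "limit" into structured clauses. The patterns, the `parse` function, and the mappings of "older than" to an `age` field and "active" to `status == "active"` are all hypothetical.

```python
import re

# Hypothetical phrase patterns -> (clause kind, clause payload).
PATTERNS = [
    (re.compile(r"older than (\d+)"), lambda m: ("where", ("age", ">", int(m.group(1))))),
    (re.compile(r"\bactive\b"), lambda m: ("where", ("status", "==", "active"))),
    (re.compile(r"limit (\d+)"), lambda m: ("limit", int(m.group(1)))),
]


def parse(text):
    """Turn a short English request into where-clauses and an optional limit."""
    plan = {"where": [], "limit": None}
    lowered = text.lower()
    for pattern, build in PATTERNS:
        m = pattern.search(lowered)
        if m:
            kind, payload = build(m)
            if kind == "where":
                plan["where"].append(payload)
            else:
                plan["limit"] = payload
    return plan


print(parse("patients older than 50 limit 5"))
# {'where': [('age', '>', 50)], 'limit': 5}
```

Keyword rules like these are brittle; they only recognize phrasings someone anticipated.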

---

## 4. Cognitive Layer (Chat & Agent)

SmartKDB v4 introduces an **internal agent** that monitors your database.

### Chat with your DB
You can ask the database about its own state.
```python
response = db.chat("How many tables do we have?")
print(response['message'])
# Example output: "You have 1 table: patients."

response = db.chat("Which tables are hot?")
print(response['message'])
# Example output: "The 'patients' table is experiencing high read volume."
```

### Predictive Advice
The agent analyzes query patterns and suggests optimizations.
```python
# Ask for advice
advice = db.chat("Do you recommend any indexes?")
print(advice['message'])
# If you query 'status' often but it's not indexed, the Brain will suggest indexing it.
```
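
The README does not describe how the Brain decides what to recommend. A plausible minimal approach, sketched here under that assumption, is to count how often each field appears in filters and flag frequently filtered fields that have no index. `recommend_indexes`, the query-log shape (a list of `(field, op, value)` clauses per query), and the `min_hits=3` threshold are hypothetical.

```python
from collections import Counter


def recommend_indexes(query_log, indexed, min_hits=3):
    """Suggest fields filtered on at least `min_hits` times that lack an index."""
    counts = Counter(field for filters in query_log for field, _op, _value in filters)
    return sorted(f for f, n in counts.items() if n >= min_hits and f not in indexed)


log = [
    [("status", "==", "admitted")],
    [("status", "==", "discharged"), ("age", ">", 40)],
    [("status", "==", "admitted")],
    [("diagnosis", "==", "flu")],
]
print(recommend_indexes(log, indexed={"age", "diagnosis"}))  # ['status']
```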

---

## 5. AI Training Hub (Datasets & Logs)

SmartKDB is built to be the **backend for your AI models**. It manages the chaos of training data.

### Step 1: Create a Dataset
Instead of dumping CSVs, define datasets dynamically from your tables.
```python
# Create a dataset of 'pneumonia' cases for an X-Ray model
db.datasets.create_dataset(
    name="pneumonia_cases",
    table="patients",
    filter_query={"diagnosis": "pneumonia"}
)
```

### Step 2: Define Splits
Reproducibility is key. Define your Train/Validation/Test splits once.
```python
# 70% Train, 15% Validation, 15% Test
db.datasets.define_split("pneumonia_cases", 0.7, 0.15, 0.15)
```
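
How `define_split` assigns records is not documented. One common way to make splits reproducible, shown here as an assumption rather than SmartKDB's method, is to hash each record's primary key into [0, 1) and bucket it by the split fractions: the same record always lands in the same split, and adding new records does not reshuffle existing ones. `assign_split` is a hypothetical helper.

```python
import hashlib
from collections import Counter


def assign_split(pk, train=0.7, val=0.15, test=0.15):
    """Deterministically map a primary key to 'train', 'val', or 'test'."""
    if abs(train + val + test - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")
    digest = hashlib.sha256(pk.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64   # roughly uniform in [0, 1)
    if bucket < train:
        return "train"
    if bucket < train + val:
        return "val"
    return "test"


ids = [f"P{i:04d}" for i in range(1000)]
print(Counter(assign_split(pk) for pk in ids))  # roughly 700 / 150 / 150
```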

### Step 3: Log Training Experiments
Keep your training metrics right next to your data.
```python
# Start a session
session_id = db.training_logger.start_session(
    model_name="xray_v1",
    dataset_name="pneumonia_cases",
    config={"epochs": 10, "lr": 0.001}
)

# Log metrics once per epoch (e.g., inside your PyTorch/TensorFlow training loop).
# The second argument is the epoch (or step) number.
db.training_logger.log_metric(session_id, 1, {"loss": 0.8, "acc": 0.6})
db.training_logger.log_metric(session_id, 2, {"loss": 0.5, "acc": 0.8})

# Finish
db.training_logger.end_session(session_id, "success")
```
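
To make the logging calls concrete, here is a hypothetical in-memory stand-in that mirrors the method names used above (`start_session`, `log_metric`, `end_session`) and shows one plausible shape for a stored session. It is not SmartKDB's implementation; the record fields and the `best_step` helper are assumptions added for illustration.

```python
import time
import uuid


class InMemoryTrainingLog:
    """Hypothetical stand-in showing what a training-session log can hold."""

    def __init__(self):
        self.sessions = {}

    def start_session(self, model_name, dataset_name, config):
        session_id = uuid.uuid4().hex
        self.sessions[session_id] = {
            "model_name": model_name,
            "dataset_name": dataset_name,
            "config": dict(config),
            "started_at": time.time(),
            "metrics": [],          # one entry per logged epoch/step
            "status": "running",
        }
        return session_id

    def log_metric(self, session_id, step, metrics):
        self.sessions[session_id]["metrics"].append({"step": step, **metrics})

    def end_session(self, session_id, status):
        self.sessions[session_id]["status"] = status

    def best_step(self, session_id, metric, higher_is_better=True):
        rows = self.sessions[session_id]["metrics"]
        pick = max if higher_is_better else min
        return pick(rows, key=lambda r: r[metric])["step"]


log = InMemoryTrainingLog()
sid = log.start_session("xray_v1", "pneumonia_cases", {"epochs": 10, "lr": 0.001})
log.log_metric(sid, 1, {"loss": 0.8, "acc": 0.6})
log.log_metric(sid, 2, {"loss": 0.5, "acc": 0.8})
log.end_session(sid, "success")
print(log.best_step(sid, "acc"))                           # 2
print(log.best_step(sid, "loss", higher_is_better=False))  # 2
```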

---

## 6. Vector Search & Embeddings

SmartKDB has a built-in vector store. You don't need a separate vector DB.

```python
# 1. Enable Vector Index on a text field
db.enable_vector_index("patients", "notes")

# 2. Add data. If you hook up your own embedder, vectors are updated automatically;
# otherwise you can push pre-computed vectors manually. Built-in automatic embedding
# is planned for v4.1. In v4 you can store and search pre-computed vectors, or rely
# on the default TF-IDF fallback for text similarity.

# 3. Search
similar_patients = db.vector_search("patients", "patient has breathing issues", "notes")
```
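
The comments above mention a default TF-IDF fallback. The sketch below shows the general technique in pure Python: weight each term by term frequency times a smoothed inverse document frequency, then rank notes by cosine similarity to the query. It illustrates TF-IDF ranking in general, not SmartKDB's code; `tfidf_vectors`, `embed_query`, `cosine`, and `search` are hypothetical names.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}  # smoothed idf
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (cnt / len(toks)) * idf[t] for t, cnt in tf.items()})
    return vectors, idf


def embed_query(text, idf):
    """Weight query terms the same way; terms unseen in the corpus are dropped."""
    toks = [t for t in tokenize(text) if t in idf]
    if not toks:
        return {}
    tf = Counter(toks)
    return {t: (c / len(toks)) * idf[t] for t, c in tf.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(notes, query, top_k=3):
    vectors, idf = tfidf_vectors(notes)
    q = embed_query(query, idf)
    scored = sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [(notes[i], round(score, 3)) for score, i in scored[:top_k]]


notes = [
    "Shortness of breath and persistent cough",
    "Fractured left wrist after a fall",
    "Breathing issues at night, mild fever",
]
print(search(notes, "patient has breathing issues", top_k=1))  # top hit: the "Breathing issues ..." note
```

The first note mentions "breath" but not "breathing", so it scores zero: TF-IDF matches exact tokens only, which is the gap learned embeddings are meant to close.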

---

## 7. Architecture & Internals

For the curious engineer:

*   **Storage**: Append-only log structure (`BlockStorage`). Records are appended rather than overwritten in place, which makes the store resilient to corruption from interrupted writes.
*   **Indexing**: In-memory B-Tree for Primary Keys, Hash Maps for Secondary Indexes.
*   **The Brain**: A JSON-based state machine that tracks query statistics (`kdb_brain.json`).
*   **Concurrency**: Single-writer, Multi-reader (SWMR) model using file locks.
*   **Format**: Data is stored as JSON Lines (largely human-readable) or as binary blocks, depending on configuration.
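
The Storage, Indexing, and Format bullets can be tied together in a small sketch, under assumptions the README does not confirm: each write appends one JSON line, deletes append a tombstone instead of erasing bytes, and the in-memory indexes are rebuilt by replaying the log on open. A plain dict stands in for the primary-key B-tree, a dict of sets serves as the hash-map secondary index, and `AppendOnlyTable` and its record format are hypothetical. File locking for the single-writer model is omitted.

```python
import json
import os
import tempfile
from collections import defaultdict


class AppendOnlyTable:
    """Hypothetical sketch: JSON-lines log + in-memory indexes rebuilt on open."""

    def __init__(self, path, pk="id", indexes=()):
        self.path, self.pk, self.index_fields = path, pk, tuple(indexes)
        self.rows = {}                                   # pk -> latest document
        self.secondary = {f: defaultdict(set) for f in self.index_fields}
        if os.path.exists(path):
            self._replay()

    def _append(self, record):
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
            fh.flush()
            os.fsync(fh.fileno())                        # durable before returning

    def _replay(self):
        with open(self.path, encoding="utf-8") as fh:
            for line in fh:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break                                # torn final write: skip the tail
                self._apply(record)

    def _apply(self, record):
        key = record["pk"]
        old = self.rows.pop(key, None)
        if old is not None:                              # unindex the previous version
            for f in self.index_fields:
                if f in old:
                    self.secondary[f][old[f]].discard(key)
        if record["op"] == "put":
            doc = record["doc"]
            self.rows[key] = doc
            for f in self.index_fields:
                if f in doc:
                    self.secondary[f][doc[f]].add(key)

    def put(self, doc):
        record = {"op": "put", "pk": doc[self.pk], "doc": doc}
        self._append(record)
        self._apply(record)

    def delete(self, key):
        record = {"op": "del", "pk": key}                # tombstone; old bytes stay on disk
        self._append(record)
        self._apply(record)

    def find(self, field, value):
        return sorted(self.secondary[field].get(value, set()))


path = os.path.join(tempfile.mkdtemp(), "patients.jsonl")
t = AppendOnlyTable(path, indexes=["diagnosis"])
t.put({"id": "P001", "diagnosis": "pneumonia"})
t.put({"id": "P002", "diagnosis": "flu"})
t.put({"id": "P002", "diagnosis": "pneumonia"})          # an update is just a new record
t.delete("P001")

reopened = AppendOnlyTable(path, indexes=["diagnosis"])  # rebuild state from the log
print(reopened.find("diagnosis", "pneumonia"))  # ['P002']
print(reopened.find("diagnosis", "flu"))        # []
```

Replaying the log in order means the latest record for each key wins, and a truncated final line from an interrupted write is simply skipped.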

---

**License**: MIT  
**Author**: Alhdrawi
