Metadata-Version: 2.4
Name: gurulearn
Version: 5.0.1
Summary: Comprehensive AI/ML library integrating machine learning, computer vision, audio processing, and conversational AI
Author-email: Guru Dharsan T <gurudharsan123@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/guru-dharsan-git/gurulearn
Project-URL: Documentation, https://github.com/guru-dharsan-git/gurulearn#readme
Project-URL: Repository, https://github.com/guru-dharsan-git/gurulearn
Project-URL: Issues, https://github.com/guru-dharsan-git/gurulearn/issues
Keywords: machine-learning,deep-learning,computer-vision,audio-processing,nlp,ai,image-classification,medical-imaging,chatbot,rag
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Image Processing
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: Typing :: Typed
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy<3,>=1.24
Requires-Dist: pandas>=2.0
Requires-Dist: scipy>=1.10
Requires-Dist: scikit-learn>=1.3
Requires-Dist: matplotlib>=3.7
Requires-Dist: pillow>=10.0
Requires-Dist: joblib>=1.3
Provides-Extra: vision
Requires-Dist: torch>=2.0; extra == "vision"
Requires-Dist: torchvision>=0.15; extra == "vision"
Provides-Extra: audio
Requires-Dist: tensorflow>=2.16; extra == "audio"
Requires-Dist: librosa>=0.10; extra == "audio"
Requires-Dist: seaborn>=0.13; extra == "audio"
Provides-Extra: medical
Requires-Dist: opencv-python-headless>=4.8; extra == "medical"
Provides-Extra: agent
Requires-Dist: langchain>=0.3; extra == "agent"
Requires-Dist: langchain-ollama>=0.3; extra == "agent"
Requires-Dist: langchain-community>=0.3; extra == "agent"
Requires-Dist: faiss-cpu>=1.8; extra == "agent"
Provides-Extra: ml-extra
Requires-Dist: xgboost>=2.0; extra == "ml-extra"
Requires-Dist: lightgbm>=4.0; extra == "ml-extra"
Requires-Dist: plotly>=5.18; extra == "ml-extra"
Provides-Extra: gpu
Requires-Dist: torch>=2.0; extra == "gpu"
Requires-Dist: torchvision>=0.15; extra == "gpu"
Requires-Dist: tensorflow[and-cuda]>=2.16; extra == "gpu"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: black>=24.0; extra == "dev"
Requires-Dist: ruff>=0.3; extra == "dev"
Requires-Dist: mypy>=1.8; extra == "dev"
Requires-Dist: pre-commit>=3.6; extra == "dev"
Provides-Extra: full
Requires-Dist: gurulearn[agent,audio,medical,ml-extra,vision]; extra == "full"
Dynamic: license-file

<p align="center">
  <img src="https://img.shields.io/badge/version-5.0.1-blue.svg" alt="Version">
  <img src="https://img.shields.io/badge/python-3.9+-green.svg" alt="Python">
  <img src="https://img.shields.io/badge/license-MIT-orange.svg" alt="License">
  <img src="https://img.shields.io/badge/typed-yes-brightgreen.svg" alt="Typed">
</p>

# Gurulearn

> **A unified AI/ML toolkit for deep learning, computer vision, audio processing, and conversational AI.**

Built with lazy loading, so `import gurulearn` adds minimal overhead (~0.001s). Ships with type hints; the package classifiers mark this release as Beta.
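
Lazy loading generally means deferring the import of heavy dependencies until an attribute is first used. The sketch below illustrates the idea with a small, hypothetical `LazyNamespace` helper built only on the standard library. It is not gurulearn's actual implementation, and every name in it is an assumption made for illustration.

```python
import importlib


class LazyNamespace:
    """Hypothetical helper: import `module.attr` only on first access."""

    def __init__(self, targets):
        # Maps a public name to a (module path, attribute name) pair.
        self._targets = dict(targets)

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal attribute lookup fails,
        # so each name is resolved at most once thanks to the caching below.
        targets = self.__dict__.get("_targets", {})
        if name not in targets:
            raise AttributeError(name)
        module_path, attr = targets[name]
        value = getattr(importlib.import_module(module_path), attr)
        setattr(self, name, value)  # cache so later lookups skip this method
        return value


# Nothing is resolved until an attribute is touched.
lazy = LazyNamespace({"sqrt": ("math", "sqrt"), "dumps": ("json", "dumps")})
print(lazy.sqrt(16.0))  # first access resolves math.sqrt and caches it
```

The same pattern can be applied at module level with a module-scoped `__getattr__` function, which keeps `import` cheap while still exposing a flat public API.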

---

## 📦 Installation

```bash
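pip install gurulearn                # Core only
pip install "gurulearn[vision]"      # + PyTorch image classification
pip install "gurulearn[audio]"       # + TensorFlow audio recognition
pip install "gurulearn[medical]"     # + OpenCV for CT scan processing
pip install "gurulearn[agent]"       # + LangChain RAG agent
pip install "gurulearn[ml-extra]"    # + XGBoost, LightGBM, Plotly
pip install "gurulearn[full]"        # All of the above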
```
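
To see which optional extras are usable in the current environment, a small check like the one below can help. It is a sketch, not part of gurulearn: the `EXTRA_MODULES` mapping is an assumption derived from the dependency list in the package metadata, translated to import names (for example `opencv-python-headless` is imported as `cv2`).

```python
from importlib.util import find_spec

# Import names for each extra (assumed from the package's dependency list).
EXTRA_MODULES = {
    "vision": ["torch", "torchvision"],
    "audio": ["tensorflow", "librosa", "seaborn"],
    "medical": ["cv2"],
    "agent": ["langchain", "langchain_ollama", "langchain_community", "faiss"],
    "ml-extra": ["xgboost", "lightgbm", "plotly"],
}


def missing_modules(modules):
    """Return the top-level modules from `modules` that cannot be located."""
    return [name for name in modules if find_spec(name) is None]


def extras_status(extra_modules=EXTRA_MODULES):
    """Map each extra to the list of its modules that are not installed."""
    return {extra: missing_modules(mods) for extra, mods in extra_modules.items()}


if __name__ == "__main__":
    for extra, missing in extras_status().items():
        state = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{extra:10s} {state}")
```

`find_spec` only locates a module without importing it, so the check stays fast even when TensorFlow or PyTorch is installed.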

---

## 🖼️ ImageClassifier

PyTorch-based image classification with 9 model architectures.

### Data Loading Options

```python
from gurulearn import ImageClassifier

clf = ImageClassifier()

# Option 1: From directory structure (data/train/{class_name}/*.jpg)
model, history = clf.train(train_dir="data/train", test_dir="data/test")

# Option 2: From CSV file
model, history = clf.train(
    csv_file="data.csv",
    img_column="image_path",
    label_column="class"
)
```
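
The two layouts carry the same information: a class-per-folder tree can be flattened into a CSV with one row per image. The helper below is a hypothetical sketch of that conversion using only the standard library. The column names match the `img_column` and `label_column` values in the example above, and the set of accepted file suffixes is an assumption.

```python
import csv
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed; extend as needed


def index_image_folder(root, csv_path, img_column="image_path", label_column="class"):
    """Write one CSV row per image found under root/{class_name}/."""
    rows = []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for img in sorted(class_dir.iterdir()):
            if img.suffix.lower() in IMAGE_SUFFIXES:
                rows.append({img_column: str(img), label_column: class_dir.name})
    with open(csv_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=[img_column, label_column])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)  # number of images indexed
```

Calling `index_image_folder("data/train", "data.csv")` would then produce a file suitable for the `csv_file=` option, assuming the paths it records are resolvable from where training runs.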

### Training Parameters

```python
model, history = clf.train(
    train_dir="data/train",
    epochs=20,
    batch_size=32,
    model_name="resnet50",     # See models below
    finetune=True,             # Fine-tune all layers
    learning_rate=0.001,
    use_amp=True,              # Mixed precision (GPU)
    save_path="model.pth"
)
```

### Available Models

| Model | Best For | Parameters (approx.) |
|-------|----------|----------------------|
| `simple_cnn` | Small datasets (<1K images) | 3M |
| `vgg16` | General purpose | 138M |
| `resnet50` | Large datasets | 25M |
| `mobilenet` | Mobile deployment | 3.5M |
| `inceptionv3` | Fine-grained | 23M |
| `densenet` | Feature reuse | 8M |
| `efficientnet` | Accuracy/size balance | 5M |
| `convnext` | Modern CNN | 28M |
| `vit` | Vision Transformer | 86M |

### Prediction

```python
# Load saved model
clf.load("model.pth", model_name="resnet50")

# Single image prediction
result = clf.predict("image.jpg", top_k=3)
print(result.class_name)       # "cat"
print(result.probability)      # 0.95
print(result.top_k)            # [("cat", 0.95), ("dog", 0.03), ...]

# From PIL Image
from PIL import Image
result = clf.predict(image=Image.open("image.jpg"))

# Export for production
clf.export_onnx("model.onnx")
```
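
For readers wondering where a list like `result.top_k` comes from, the usual recipe is a softmax over the model's raw scores followed by a sort. The function below is a minimal pure-Python sketch of that recipe. `top_k_predictions` is a hypothetical name, and whether `ImageClassifier` computes its result exactly this way is an assumption.

```python
import math


def top_k_predictions(logits, class_names, k=3):
    """Return the k most probable (class_name, probability) pairs."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up scores for three classes, just to show the output shape.
print(top_k_predictions([4.0, 1.0, 0.5], ["cat", "dog", "bird"], k=2))
```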

---

## 🎵 AudioRecognition

TensorFlow/Keras CNN-LSTM for audio classification.

### Data Loading

```python
from gurulearn import AudioRecognition

audio = AudioRecognition(sample_rate=16000, n_mfcc=20)

# From directory structure (data/{class_name}/*.wav)
# Supports: .wav, .mp3, .flac, .ogg, .m4a
history = audio.audiotrain(
    data_path="data/audio",
    epochs=50,
    batch_size=32,
    augment=True,              # Time stretch, pitch shift, noise
    model_dir="models"
)
```
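
The `augment=True` comment mentions noise as one of the augmentations. As an illustration of what noise augmentation typically involves, the sketch below adds Gaussian noise at a chosen signal-to-noise ratio. It uses plain Python lists for clarity and is not the library's implementation; `add_noise` and its parameters are hypothetical.

```python
import math
import random


def add_noise(samples, snr_db, seed=None):
    """Add Gaussian noise so the result has roughly the requested SNR (dB)."""
    rng = random.Random(seed)
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)  # standard deviation of the noise
    return [s + rng.gauss(0.0, sigma) for s in samples]


# One second of a 440 Hz tone at a 16 kHz sample rate, then a noisy copy.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
noisy = add_noise(tone, snr_db=20, seed=0)
```

Lower `snr_db` values give a harsher augmentation; time stretching and pitch shifting need resampling and are better left to an audio library.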

### Training Output

- `models/audio_recognition_model.keras` - Trained model
- `models/label_mapping.json` - Class labels
- `models/confusion_matrix.png` - Evaluation plot
- `models/training_history.png` - Loss/accuracy curves

### Prediction

```python
# Single file
result = audio.predict("sample.wav", model_dir="models")
print(result.label)            # "speech"
print(result.confidence)       # 0.92
print(result.all_probabilities)  # [0.92, 0.05, 0.03]

# Batch prediction
results = audio.predict_batch(
    ["file1.wav", "file2.wav"], 
    model_dir="models"
)
```

---

## 📊 MLModelAnalysis

AutoML for regression and classification with 10+ algorithms.

### Data Loading

```python
from gurulearn import MLModelAnalysis

ml = MLModelAnalysis(
    task_type="auto",              # "auto", "regression", "classification"
    auto_feature_engineering=True  # Extract date features
)

# From CSV
result = ml.train_and_evaluate(
    csv_file="data.csv",
    target_column="price",
    test_size=0.2,
    model_name=None,               # Auto-select best model
    save_path="model.joblib"
)
```
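
The `auto_feature_engineering` comment refers to extracting date features. A typical version of that step expands a date into numeric parts a model can use. The function below is a hypothetical sketch with the standard library; the exact features gurulearn derives are not documented here and may differ.

```python
from datetime import date


def date_features(iso_date):
    """Expand an ISO date string (YYYY-MM-DD) into numeric features."""
    d = date.fromisoformat(iso_date)
    return {
        "year": d.year,
        "month": d.month,
        "day": d.day,
        "weekday": d.weekday(),  # Monday == 0, Sunday == 6
        "is_weekend": d.weekday() >= 5,
    }


print(date_features("2024-03-15"))
```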

### Available Models

**Regression**: `linear_regression`, `decision_tree`, `random_forest`, `gradient_boosting`, `svm`, `knn`, `ada_boost`, `mlp`, `xgboost`\*, `lightgbm`\*

**Classification**: `logistic_regression`, `decision_tree`, `random_forest`, `gradient_boosting`, `svm`, `knn`, `ada_boost`, `mlp`, `xgboost`\*, `lightgbm`\*

\* Optional dependencies; install them with `pip install "gurulearn[ml-extra]"`.

### Prediction

```python
# Load and predict
ml.load_model("model.joblib")

# From dictionary
prediction = ml.predict({"feature1": 42, "category": "A"})

# From DataFrame
predictions = ml.predict(test_df)

# Compare all models
comparison = ml.compare_models("data.csv", "target", cv=5)
```
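
The `cv=5` argument presumably requests 5-fold cross-validation: the data is split into five parts, and each part serves once as the test set while the rest is used for training. The generator below sketches the index bookkeeping in plain Python. It mirrors the unshuffled partitioning that scikit-learn's `KFold` uses, but `k_fold_indices` itself is a hypothetical helper, not a gurulearn function.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # spread the remainder
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


for train, test in k_fold_indices(12, k=5):
    print(len(train), test)
```

Averaging a model's score over the folds gives a steadier estimate than a single train/test split, which is what makes a comparison across models meaningful.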

---

## 💬 FlowBot

Guided conversation flows with real-time data filtering.

### Data Loading

```python
from gurulearn import FlowBot
import pandas as pd

# From DataFrame
bot = FlowBot(pd.read_csv("hotels.csv"), data_dir="user_sessions")

# From list of dicts
bot = FlowBot([
    {"city": "Paris", "price": "$$$", "name": "Le Grand"},
    {"city": "Tokyo", "price": "$$", "name": "Sakura Inn"}
])
```

### Building Flows

```python
# Add filter steps
bot.add("city", "Select destination:", required=True)
bot.add("price", "Choose budget:")

# Define output columns
bot.finish("name", "price")

# Validate flow
errors = bot.validate()
```

### Processing & Prediction

```python
# Process user input (maintains session state)
response = bot.process("user123", "Paris")

# Response structure while the flow is in progress:
# {
#     "message": "Choose budget:",
#     "suggestions": ["$$$", "$$"],
#     "completed": False
# }

# Final response once every step has been answered:
# {
#     "completed": True,
#     "results": [{"name": "Le Grand", "price": "$$$"}],
#     "message": "Found 1 matching options"
# }

# Async support (await must run inside a coroutine)
async def handle_message():
    return await bot.aprocess("user123", "Paris")

# Export history
history_df = bot.export_history("user123", format="dataframe")
```
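
Conceptually, a guided flow narrows the dataset one answer at a time and offers the values that are still possible for the next step. The function below is a simplified sketch of that idea with no session storage or validation. `run_flow` is a hypothetical name, the hotel rows are made up, and FlowBot's internals may work differently.

```python
def run_flow(records, steps, answers):
    """Apply answers to steps in order; return remaining rows or next suggestions."""
    remaining = list(records)
    for field, answer in zip(steps, answers):
        remaining = [row for row in remaining if row[field] == answer]
    if len(answers) < len(steps):
        next_field = steps[len(answers)]
        # Suggestions are the distinct values still available, in first-seen order.
        suggestions = list(dict.fromkeys(row[next_field] for row in remaining))
        return {"completed": False, "suggestions": suggestions}
    return {"completed": True, "results": remaining}


# Made-up rows for illustration only.
hotels = [
    {"city": "Paris", "price": "$$$", "name": "Le Grand"},
    {"city": "Paris", "price": "$$", "name": "Petit Hotel"},
    {"city": "Tokyo", "price": "$$", "name": "Sakura Inn"},
]
print(run_flow(hotels, ["city", "price"], ["Paris"]))
```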

---

## 🤖 QAAgent

RAG-based question answering with LangChain + Ollama.

### Data Loading

```python
from gurulearn import QAAgent
import pandas as pd

# From DataFrame
agent = QAAgent(
    data=pd.read_csv("docs.csv"),
    page_content_fields=["title", "content"],
    metadata_fields=["category", "date"],
    llm_model="llama3.2",
    embedding_model="mxbai-embed-large",
    db_location="./vector_db"
)

# From list of dicts
agent = QAAgent(
    data=[{"title": "Policy", "content": "..."}],
    page_content_fields="content"
)

# Load existing index (no data needed)
agent = QAAgent(db_location="./existing_db")
```
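
The `page_content_fields` and `metadata_fields` arguments suggest that each record is split into text to embed and metadata to store alongside it. The sketch below shows one plausible way to do that split. `row_to_document`, the `field: value` text format, and the sample row are all assumptions, not QAAgent's documented behavior.

```python
def row_to_document(row, page_content_fields, metadata_fields=()):
    """Split one record into text to embed and metadata to keep with it."""
    if isinstance(page_content_fields, str):  # accept a single field name
        page_content_fields = [page_content_fields]
    text = "\n".join(f"{field}: {row[field]}" for field in page_content_fields)
    metadata = {field: row[field] for field in metadata_fields if field in row}
    return {"page_content": text, "metadata": metadata}


# Made-up record for illustration only.
row = {"title": "Policy", "content": "Example policy text", "category": "billing"}
print(row_to_document(row, ["title", "content"], ["category", "date"]))
```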

### Querying

```python
# Simple query
answer = agent.query("What is the refund policy?")

# With source documents
result = agent.query("What is the refund policy?", return_sources=True)
print(result["answer"])
print(result["sources"])

# Direct similarity search (no LLM)
docs = agent.similarity_search("refund", k=5)

# Interactive mode
agent.interactive_mode()

# Add more documents
agent.add_documents(new_df, "content", ["category"])
```
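
A similarity search ranks stored documents by how close their embedding vectors are to the query's embedding. The sketch below shows the ranking step with cosine similarity over toy vectors. It is an illustration only: a FAISS index typically ranks by L2 distance or inner product, and the function names and vectors here are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_search(query_vec, doc_vecs, k=5):
    """Return the k (doc_id, score) pairs most similar to the query vector."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional "embeddings"; real embedding models produce far longer vectors.
docs = {"refunds": [0.9, 0.1, 0.0], "shipping": [0.1, 0.9, 0.0], "careers": [0.0, 0.1, 0.9]}
print(similarity_search([1.0, 0.2, 0.0], docs, k=2))
```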

---

## 🏥 CTScanProcessor

Medical image enhancement with quality metrics.

### Processing

```python
from gurulearn import CTScanProcessor

processor = CTScanProcessor(
    kernel_size=5,
    clip_limit=2.0,
    tile_grid_size=(8, 8)
)

# Single image - supports .jpg, .png, .dcm, .nii
result = processor.process_ct_scan(
    "scan.jpg",
    output_folder="output/",
    compare=True               # Save side-by-side comparison
)

# Batch processing
results = processor.process_batch(
    input_folder="scans/",
    output_folder="processed/"
)
```

### Quality Metrics

```python
# result.metrics contains:
print(result.metrics.mse)      # Mean Squared Error
print(result.metrics.psnr)     # Peak Signal-to-Noise Ratio (dB)
print(result.metrics.snr)      # Signal-to-Noise Ratio (dB)
print(result.metrics.detail_preservation)  # Percentage
```
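
MSE and PSNR have standard definitions: MSE is the mean of squared pixel differences, and PSNR is `10 * log10(MAX^2 / MSE)` where `MAX` is the largest possible pixel value. The sketch below implements both for flat sequences of pixel values, assuming 8-bit images (`MAX = 255`). The SNR and detail-preservation metrics have several competing definitions, so they are not reproduced here.

```python
import math


def mse(original, processed):
    """Mean squared error between two equally sized pixel sequences."""
    if len(original) != len(processed):
        raise ValueError("images must have the same number of pixels")
    return sum((o - p) ** 2 for o, p in zip(original, processed)) / len(original)


def psnr(original, processed, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(original, processed)
    if error == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / error)


print(psnr([10, 20, 30, 40], [11, 19, 30, 42]))
```

Higher PSNR means the processed image stays closer to the original, so a very aggressive filter will usually score lower than a gentle one.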

### Individual Operations

```python
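import numpy as np

# `image` is assumed to be a scan already loaded as a NumPy array
# (for example via OpenCV); the loading step is not shown here.

# Apply individual filters
sharpened = processor.sharpen(image)
denoised = processor.median_denoise(image)
enhanced = processor.enhance_contrast(image)
bilateral = processor.bilateral_denoise(image)

# Compare quality between the original array and a processed version
metrics = processor.evaluate_quality(image, enhanced)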
```
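
To make the median denoising step concrete: a median filter replaces each pixel with the median of its neighborhood, which removes isolated outliers while keeping edges fairly sharp. The pure-Python sketch below works on a 2D list and clamps coordinates at the borders. It is an illustration of the technique, not the code behind `median_denoise`, which presumably delegates to an optimized library routine.

```python
from statistics import median


def median_filter(image, kernel_size=3):
    """Median-filter a 2D list of pixel values, clamping coordinates at the borders."""
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    height, width = len(image), len(image[0])
    radius = kernel_size // 2
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ]
            row.append(median(window))
        output.append(row)
    return output


# A flat patch with one "salt" pixel in the middle.
patch = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
print(median_filter(patch))
```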

---

## ⚡ Performance

- **Lazy Loading**: ~0.001s import time
- **GPU Auto-Detection**: CUDA for PyTorch/TensorFlow
- **Mixed Precision**: Automatic FP16 on compatible GPUs
- **Batch Processing**: All modules support batch inference
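
GPU auto-detection for PyTorch usually comes down to a single check. The helper below is a minimal sketch that falls back to the CPU when PyTorch is not installed or no CUDA device is visible. `pick_device` is a hypothetical name and not part of gurulearn's API.

```python
def pick_device():
    """Return "cuda" when PyTorch is installed and sees a GPU, otherwise "cpu"."""
    try:
        import torch  # optional dependency (installed by the `vision` extra)
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```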

---

## 📄 License

MIT License - [Guru Dharsan T](https://github.com/guru-dharsan-git)
