Metadata-Version: 2.3
Name: pyquery-polars
Version: 2.1.0
Summary: Enterprise-grade Headless ETL Engine with Interactive UI
Keywords: PyQuery,Polars,ETL,Big Data,Excel,Power BI,Automation,Analytics,Audit
Author: Shan
Author-email: Shan <tksudharshan@gmail.com>
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Information Technology
Classifier: Topic :: Office/Business :: Financial :: Spreadsheet
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: MacOS
Classifier: Natural Language :: English
Requires-Dist: polars>=1.0.0
Requires-Dist: streamlit>=1.30.0
Requires-Dist: fastapi>=0.109.0
Requires-Dist: uvicorn>=0.25.0
Requires-Dist: pydantic>=2.5.0
Requires-Dist: rich>=13.7.0
Requires-Dist: questionary>=2.0.0
Requires-Dist: xlsxwriter>=3.1.0
Requires-Dist: openpyxl>=3.1.0
Requires-Dist: pyarrow>=14.0.0
Requires-Dist: connectorx>=0.3.3
Requires-Dist: fastexcel>=0.16.0
Requires-Dist: python-multipart>=0.0.20
Requires-Dist: matplotlib>=3.9.4
Requires-Dist: seaborn>=0.13.2
Requires-Dist: plotly>=6.5.0
Requires-Dist: statsmodels>=0.14.6
Requires-Dist: scikit-learn>=1.6.1
Requires-Dist: scipy>=1.13.1
Requires-Python: >=3.9
Project-URL: Changelog, https://github.com/tks18/pyquery/releases
Project-URL: Documentation, https://github.com/tks18/pyquery#readme
Project-URL: Homepage, https://github.com/tks18/pyquery
Project-URL: Issues, https://github.com/tks18/pyquery/issues
Project-URL: Repository, https://github.com/tks18/pyquery
Description-Content-Type: text/markdown

<div align="center">

# ⚡ PyQuery: The Main Character of Data Stacks 💫

### _ETL. EDA. ML. Domination._

[![Status](https://img.shields.io/badge/Status-Goated_with_the_Sauce-%23FF0055?style=for-the-badge&logo=appveyor)](https://github.com/tks18/pyquery)
[![Performance](https://img.shields.io/badge/Performance-Infinite_Aura-%2300ffa3?style=for-the-badge&logo=nvidia)](https://pola.rs)
[![Engine](https://img.shields.io/badge/Engine-Polars_Supremacy_🐻‍❄️-%23ffcc00?style=for-the-badge&logo=polars)](https://pola.rs)
[![Vibe](https://img.shields.io/badge/Vibe-Immaculate_✨-%238A2BE2?style=for-the-badge)](https://github.com/tks18/pyquery)
[![Backend](https://img.shields.io/badge/Backend-FastAPI_🚀-%23009688?style=for-the-badge&logo=fastapi)](https://fastapi.tiangolo.com)
[![PyPI Version](https://img.shields.io/pypi/v/pyquery-polars.svg?color=4CAF50&logo=python&logoColor=white)](https://pypi.org/project/pyquery-polars/)
[![Python Versions](https://img.shields.io/pypi/pyversions/pyquery-polars.svg?color=blue)](https://pypi.org/project/pyquery-polars/)
[![License](https://img.shields.io/github/license/tks18/pyquery.svg?color=orange)](LICENSE)

**Stop letting Pandas hold you back. It's giving toxic ex.** 🚩<br>
**PyQuery** is the full-stack data operating system that processes **100GB files** while you doomscroll. No cap. 🧢

[Feature Request](https://github.com/tks18/pyquery/issues) · [Report Bug](https://github.com/tks18/pyquery/issues)

</div>

---

## ⛩️ The Awakening (Lore)

Long ago, the Data World was mid. Analysts lived in fear of the `MemoryError`. They bowed before the single-threaded tyranny of the Old Gods (Pandas). They accepted their fate of freezing screens, crashing kernels, and waiting 4 hours for a simple `groupby`.

**But I refused.**

From the depths of the Rusty abyss, **PyQuery** has awakened. I am not just an ETL tool anymore. I am the entire war room. I am here to **obliterate** your bottlenecks and **ratio** your old benchmarks.

While they study the blade (Excel), I studied the **Lazy Frame**.
While they manage memory, I **devour** it.
While they draw primitive charts, I **simulate the future**.

The age of waiting is over. **Total Domination** is the only metric that matters.

**Welcome to your Villain Arc.** 👹

---

## 💪 The Flex (Why We Are Him)

We built an empire so you can rule yours.

### 🎯 EDA: The Crystal Ball (New!)

> _"Most tools describe the past. PyQuery predicts the future."_

We dropped a massive update. EDA is no longer just "looking at data". It's **hunting**. Check the [**📘 EDA Field Manual**](EDA%20Guide.md) for the full grimoire.

#### 1. 🧠 The Action Engine (ML Strategist)

- **Strategic Brief**: A "Top 3 Insights" card that ranks every signal in your data. It whispers: _"The money is here."_
- **Automated Drivers**: It finds the hidden variables controlling your target.
  - _"Why is Churn high? It's not Price. It's Customer Support Wait Time > 5m."_ -> **Boom. Solved.**

#### 2. 🧪 ML Laboratory (The Brain)

- **Auto-Pilot Mode**: Trains an army of models (Random Forest, Lasso, Ridge) to find the best fit. You sit back and look busy.
- **Clustering (Unsupervised Rizz)**: Finds hidden segments. "Users who buy organic kale also buy crypto." -> **New Marketing Segment Unlocked.** 🔓
- **Anomaly Detection**: Uses Isolation Forests to catch the weirdos and fraudsters fast.
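
The Auto-Pilot idea above (train several models, keep the best) can be sketched with scikit-learn. This is a minimal illustration, not PyQuery's actual implementation: the synthetic data, the hyperparameters, and the choice of cross-validated R² as the score are all assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic regression data stands in for a real dataset (assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# The model families named above; hyperparameters are illustrative only.
candidates = {
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "lasso": Lasso(alpha=0.1),
    "ridge": Ridge(alpha=1.0),
}

# Score each candidate with 3-fold cross-validated R^2 and keep the mean.
scores = {
    name: cross_val_score(model, X, y, cv=3, scoring="r2").mean()
    for name, model in candidates.items()
}

# The winner is simply the candidate with the highest mean score.
best_name = max(scores, key=scores.get)
print(f"best model: {best_name} (mean R^2 = {scores[best_name]:.3f})")
```

Cross-validation keeps the comparison honest: every model is judged on rows it did not train on.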

#### 3. 🎮 Decision Simulator (The Time Machine)

- **"What-If" Sliders**: Change variables in real-time. "If I raise Price by 10% and lower ad spend, do I still profit?"
- **Waterfall Analysis**: The Model breaks down exactly _why_ the prediction changed.
- **Live Digital Twin**: It creates a mathematical clone of your business logic.
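
To make the "What-If" and waterfall idea concrete, here is a small sketch that assumes a linear model, where each feature's contribution to the change is just its coefficient times how far the slider moved. The coefficients and feature names are hypothetical, and PyQuery's simulator may work differently under the hood.

```python
def waterfall(coefficients, baseline, scenario):
    """Return each feature's contribution to the change in a linear prediction."""
    return {
        name: coefficients[name] * (scenario[name] - baseline[name])
        for name in coefficients
    }


# Hypothetical linear profit model: profit = 2.0 * price + 0.5 * ad_spend + const
coefficients = {"price": 2.0, "ad_spend": 0.5}
baseline = {"price": 100.0, "ad_spend": 1000.0}
scenario = {"price": 110.0, "ad_spend": 800.0}  # price +10%, ad spend cut by 200

steps = waterfall(coefficients, baseline, scenario)
total_change = sum(steps.values())

# Print one waterfall bar per feature, then the net effect.
for name, delta in steps.items():
    print(f"{name:>10}: {delta:+.1f}")
print(f"{'total':>10}: {total_change:+.1f}")
```

In this made-up scenario the price bump adds 20.0 while the ad cut removes 100.0, so the net change is -80.0.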

#### 4. 📈 Visuals That Don't Miss

- **Sunburst Charts**: For hierarchy dominance.
- **3D Scatters**: Rotate your data in the 4th dimension.
- **Correlation Heatmaps**: Spot the links instantly.

---

### 🧪 SQL Lab: The Codex (God Mode)

For when the GUI is too easy and you want to flex your raw SQL skills. This isn't SQLite. This is **High-Performance Lazy SQL**.

- **Zero-Lag Querying**: Run `SELECT *` on a **50GB file**? It pulls a preview instantly, because the lazy engine only reads the rows the preview needs. It feels like cheating physics.
- **Materialize**: Execute a complex query, then save it as a new dataset to continue the torture.
- **Schema Explorer**: Searchable view of all your columns. Know your enemy.

---

### 🧹 The Forge (Ruthless ETL)

Messy data is a weakness. We purge it with extreme prejudice.

- **✨ Auto-Typecast**: One click scans `String` columns and forcibly converts them to `Int`, `Float`, or `Date`. It uses regex heuristics to crush inconsistency.
- **🎭 PII Incinerator**: Detects and obfuscates credit cards and social security numbers. Secrets remain secret.
- **🩹 Smart Impute**: We fill the voids. Forward fill, backward fill, or specific value injection. No null survives.

---

## 🗺️ The Roadmap (Manifesting Destiny) 🔮

We aren't stopping here. We are aiming for the moon. 🚀

### Phase 1: Native App Supremacy (Rust + Tauri) 🦀

The browser has limits (RAM, Sandbox). The Native App will have **none**.

- **Hardware Acceleration**: GPU-accelerated plotting. 10 Million points at 144Hz.
- **Zero-Copy**: Access 100GB files directly from disk without loading _anything_.
- **Vibe**: Dark mode by default. OLED black.

### Phase 2: Big Data Devourer ☁️

- **Cloud Connectors**: S3, GCS, Azure. We drink their milkshakes.
- **Distributed Compute**: If one machine isn't enough, we take them all.

---

## 🧾 The Receipts (Benchmarks)

We don't post without proof. We mog the competition.

| Metric            | 🐼 Pandas (Legacy)       | ⚡ PyQuery (Polars)     | The Diff       |
| :---------------- | :----------------------- | :---------------------- | :------------- |
| **Load 10GB CSV** | `MemoryError` (Crash) 💥 | **0.2s** (Lazy Scan) ⚡ | **Infinite**   |
| **Filter Rows**   | 15.4s (Slow)             | **0.5s** (Parallel)     | **30x Faster** |
| **Group By**      | 45s (Painful)            | **2.1s** (Instant)      | **20x Faster** |
| **RAM Usage**     | 12GB+ (Bloated)          | **500MB** (Lean)        | **95% Less**   |

_Benchmarks run on a standard dev laptop. The 0.2s figure is the time to set up the lazy scan, not to read all 10GB. Results may vary but the vibe remains consistent._

---

## 🧠 The Tech Stack (Forbidden Knowledge) 🐐

This isn't just a library. It's a weapon system.

### 1. 🌊 The "Infinite Stream" Glitch (Lazy Execution)

The Old Gods (Pandas) are **Eager**. They try to swallow the ocean (RAM) whole. They choke.
**PyQuery is Lazy.** It waits. It plans.

- **Scan**: "It's a 100GB file. Interesting."
- **Plan**: Filters, joins, math. Nothing executes until the final blow.
- **Stream**: Data flows in chunks. Process. Write. Destroy.
- **Result**: Processing 100GB on a MacBook Air. The laws of physics are optional.

### 2. 🛡️ Type Safety (Absolute Order)

Python is dynamic (chaotic). PyQuery imposes **Order**.

- Every step is backed by a **Pydantic Model**.
- If a `String` tries to infiltrate a `Float` column, it is terminated **before** execution.
- There are no runtime surprises. Only calculated victories.
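
Here is a minimal sketch of what "terminated before execution" can look like with Pydantic. `FillNullsParams` and `validate_step` are hypothetical names for illustration; PyQuery's real models live in its own codebase and may differ.

```python
from typing import Literal, Optional

from pydantic import BaseModel, Field, ValidationError


class FillNullsParams(BaseModel):
    """Hypothetical parameters for a fill-nulls step (not PyQuery's real model)."""

    column: str = Field(min_length=1)
    strategy: Literal["forward", "backward", "value"]
    value: Optional[float] = None


def validate_step(raw: dict) -> Optional[FillNullsParams]:
    """Return validated params, or None if the input is rejected."""
    try:
        return FillNullsParams(**raw)
    except ValidationError as exc:
        # The bad input is caught here, before any data is touched.
        print(f"rejected before execution: {exc.error_count()} error(s)")
        return None


ok = validate_step({"column": "price", "strategy": "value", "value": 0.0})
bad = validate_step({"column": "price", "strategy": "value", "value": "not-a-number"})
```

The first call returns a typed object; the second never reaches the engine because `"not-a-number"` cannot be parsed as a float.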

---

## 🎮 Choose Your Fighter (4 Paths to Power)

We don't limit you. Dominate however you choose.

### 📦 Installation

```bash
pip install pyquery-polars
```

### 1. 🌊 The GUI (God Mode)

For when you want to click things, see pretty charts, and feel like a data scientist in a sci-fi movie.

- **Visual Recipe Builder**: nodes and edges of pure logic.
- **The Action Engine**: AI-driven insights at your fingertips.
- **Native File Picker**: Accessing the local filesystem directly. No barriers.

```bash
pyquery ui
# Launches the Web App on localhost:8501 🚀
```

### 2. 💻 The Interactive CLI (Shadow Mode)

For when you operate in the dark. ☕
This isn't a command line. It's a cockpit.

- **Dynamic Menus**: Use arrow keys to select transforms.
- **Rich Tables**: Beautiful, colorful ASCII dataframes.
- **Validation**: It yells (politely) if you type a string for an integer.

```bash
pyquery interactive
# Enter the Matrix. 🕶️
```

### 3. 🤖 The API (Headless Beast)

Building a machine? Run PyQuery as the engine.

- **Swagger Docs**: Auto-generated at `/docs`.
- **RESTful**: `/load`, `/transform`, `/export` endpoints.
- **Async**: Fire and forget jobs.

```bash
pyquery api
# Serving high-performance ETL over HTTP at localhost:8000 📡
```
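
The request payloads for these endpoints are not documented in this README, so rather than guess at them, the sketch below lists whatever routes a running server advertises. It assumes the server exposes its schema at `/openapi.json`, which is FastAPI's default; `list_routes` and `fetch_schema` are hypothetical helper names, and the sample schema is made up.

```python
import json
from urllib.request import urlopen


def list_routes(schema: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    http_methods = {"get", "post", "put", "patch", "delete"}
    return sorted(
        (method.upper(), path)
        for path, operations in schema.get("paths", {}).items()
        for method in operations
        if method in http_methods
    )


def fetch_schema(base_url: str = "http://localhost:8000") -> dict:
    """Download the schema; /openapi.json is FastAPI's default location."""
    with urlopen(f"{base_url}/openapi.json") as response:
        return json.load(response)


# Offline example with a made-up schema; against a live server use
# list_routes(fetch_schema()) instead.
sample_schema = {"paths": {"/load": {"post": {}}, "/export": {"get": {}}}}
print(list_routes(sample_schema))
```

The same information is browsable at `/docs`; this is just the scriptable version.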

### 4. ⚡ The Batch Runner (Speedrun)

For automation. No interface. Just speed.

```bash
pyquery run -s input.csv -r recipe.json -o output.parquet
# Task complete. ⚡
```
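
To run many batch jobs from a scheduler or script, the command above can be wrapped in Python. This sketch uses only the flags shown above; the helper names and file names are hypothetical, and it assumes the CLI exits with a non-zero code on failure.

```python
import subprocess


def build_run_command(source: str, recipe: str, output: str) -> list[str]:
    """Build the argv for `pyquery run` using the flags shown above."""
    return ["pyquery", "run", "-s", source, "-r", recipe, "-o", output]


def run_batch(source: str, recipe: str, output: str) -> None:
    """Run one batch job; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_run_command(source, recipe, output), check=True)


# Build commands for several inputs without executing them.
jobs = [
    build_run_command(f"{name}.csv", "recipe.json", f"{name}.parquet")
    for name in ("jan", "feb")
]
print(jobs[0])
```

Passing the arguments as a list (not a shell string) avoids quoting problems with odd file names.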

---

## 🧰 The Loadout (Arsenal)

Packed with every tool needed to clear the map.

| Category      | The Tools                                 | Why it slaps                            |
| :------------ | :---------------------------------------- | :-------------------------------------- |
| **Cleaning**  | `Fill Nulls`, `Mask PII`, `Smart Extract` | Turns garbage data into gold. ✨        |
| **Analytics** | `Rolling Agg`, `Time Bin`, `Rank`, `Diff` | High-frequency trading vibes. 📈        |
| **Combining** | `Smart Join`, `Concat`, `Pivot`           | Merge datasets without the headache. 🤝 |
| **Math**      | `Log`, `Exp`, `Clip`, `Date Offset`       | For the scientific girlies. 👩‍🔬          |
| **Text**      | `Slice`, `Case`, `Replace`                | String manipulation on steroids. 💪     |
| **I/O**       | `CSV`, `Parquet`, `Excel`, `JSON`, `IPC`  | Speaks every language. 🗣️               |

---

## 🧑‍💻 Join the Cult (Developer Guide)

You want to contribute? Good. We need strong allies.

### The Blooding (Adding a Transform) 🖐️

#### Backend Implementation

1. **Define Params**: Create a Pydantic model (`src/pyquery_polars/core/params.py`).
2. **Backend Logic**: Write a pure Polars function (`src/pyquery_polars/backend/transforms/`).
3. **Register**: Add your step to `register_all_steps()` in `src/pyquery_polars/backend/engine/registry.py`.

#### Frontend Implementation

1. **Frontend Renderer**: Create a renderer function (`src/pyquery_polars/frontend/steps/`).
2. **Register**: Add your step to `register_frontend()` in `src/pyquery_polars/frontend/registry_init.py`.

It appears in the CLI, API, and UI **automatically**. 🤯

```python
# Only certified ballers contribute code.
# Are you up for it?
```

---

## 📜 License

**GPL-3.0**. Open source forever. 💖

---

<div align="center">

_Made with ☕, 🦀 (Rust), and 💖 by [Sudharshan TK](https://github.com/tks18)_

</div>
