Metadata-Version: 2.3
Name: pldatacli
Version: 0.1.3
Summary: A data analysis cli tool using polars lazyframes
Author: utkarshg1
Author-email: utkarshg1 <gaikwadujg@gmail.com>
Requires-Dist: polars>=1.39.0
Requires-Dist: rich>=14.3.3
Requires-Dist: typer>=0.24.1
Requires-Python: >=3.12
Description-Content-Type: text/markdown

# pldatacli

A simple command-line tool for quick CSV data analysis using Polars, with lazy execution for efficiency.

---

# Tech Stack

* **Polars** – fast DataFrame engine with lazy execution for efficient data processing
* **Typer** – modern CLI framework for building command-line interfaces
* **Rich** – beautiful terminal rendering for clean table output

---

# PyPI Repository

The package is published on PyPI: [https://pypi.org/project/pldatacli/](https://pypi.org/project/pldatacli/)

---

# Installation

* Option 1: Directly with pip
```bash
pip install pldatacli
```
* Option 2: With the uv package manager (requires uv to be installed)
```bash
uv tool install pldatacli
```

---

# Usage

### Basic query command

```bash
pldatacli query FILE [OPTIONS]
```

The examples below use a sample file named `SampleSuperstore.csv`.

---

### Filter rows

Single filter:

```bash
pldatacli query SampleSuperstore.csv \
  --filter "State=Texas"
```

Multiple filters:

```bash
pldatacli query SampleSuperstore.csv \
  --filter "State=Texas" \
  --filter "Category=Furniture"
```
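
To make the `Column=Value` syntax concrete, here is a minimal standard-library sketch of how such filter strings could be interpreted. It is not pldatacli's actual implementation: the helper names `parse_filter` and `apply_filters` are hypothetical, the sample rows are made up, and it assumes that filters compare for exact equality and that multiple filters are combined with AND.

```python
import csv
import io


def parse_filter(spec: str) -> tuple[str, str]:
    """Split a 'Column=Value' string into (column, value). Hypothetical helper."""
    column, sep, value = spec.partition("=")
    if not sep or not column.strip():
        raise ValueError(f"expected 'Column=Value', got {spec!r}")
    return column.strip(), value.strip()


def apply_filters(rows: list[dict[str, str]], specs: list[str]) -> list[dict[str, str]]:
    """Keep rows that match every filter (assumption: filters are ANDed)."""
    conditions = [parse_filter(spec) for spec in specs]
    return [
        row
        for row in rows
        if all(row.get(column) == value for column, value in conditions)
    ]


# Made-up sample data, used only for illustration.
SAMPLE = """State,Category,Profit
Texas,Furniture,10.5
Texas,Technology,20.0
Ohio,Furniture,-3.25
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
matched = apply_filters(rows, ["State=Texas", "Category=Furniture"])
print(matched)
```

Under these assumptions, only the Texas/Furniture row survives both filters.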

---

### Group by columns

Single column:

```bash
pldatacli query SampleSuperstore.csv \
  --groupby Region
```

Multiple columns:

```bash
pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --groupby Category
```

---

### Aggregations

Single aggregation:

```bash
pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --agg "Profit=sum"
```

Multiple aggregations:

```bash
pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --agg "Profit=sum,mean"
```

Multiple columns with aggregations:

```bash
pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --groupby Category \
  --agg "Sales=sum,mean" \
  --agg "Profit=sum"
```
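
The sorting examples in this README refer to an aggregated column as `Profit_sum`, which suggests that each result column is named `<column>_<function>`. The sketch below shows how an `--agg` string could be split and mapped to such names. The helpers `parse_agg` and `output_columns` are hypothetical and only illustrate the naming pattern; they are not taken from the tool's source.

```python
def parse_agg(spec: str) -> tuple[str, list[str]]:
    """Split 'Column=func1,func2' into (column, [func1, func2]). Hypothetical helper."""
    column, sep, funcs = spec.partition("=")
    names = [name.strip() for name in funcs.split(",") if name.strip()]
    if not sep or not column.strip() or not names:
        raise ValueError(f"expected 'Column=func[,func...]', got {spec!r}")
    return column.strip(), names


def output_columns(specs: list[str]) -> list[str]:
    """List result column names, assuming the '<column>_<function>' pattern."""
    return [
        f"{column}_{func}"
        for column, funcs in map(parse_agg, specs)
        for func in funcs
    ]


print(output_columns(["Sales=sum,mean", "Profit=sum"]))
# ['Sales_sum', 'Sales_mean', 'Profit_sum']
```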

---

### Sorting

Single sort (aggregated columns are referenced as `<column>_<function>`, e.g. `Profit_sum`):

```bash
pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --agg "Profit=sum" \
  --sort "Profit_sum:desc"
```

Multiple sorts:

```bash
pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --agg "Profit=sum" \
  --sort "Region:asc" \
  --sort "Profit_sum:desc"
```
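
As a rough illustration of how several `column:direction` keys could be applied in order, the following sketch sorts plain Python dictionaries. The functions `parse_sort` and `sort_rows` are hypothetical, the rows are invented, and the sketch assumes that the first `--sort` option is the primary key and later ones break ties.

```python
def parse_sort(spec: str) -> tuple[str, bool]:
    """Split 'column:asc' or 'column:desc' into (column, descending). Hypothetical helper."""
    column, sep, direction = spec.partition(":")
    direction = direction.strip().lower()
    if not sep or direction not in ("asc", "desc"):
        raise ValueError(f"expected 'column:asc' or 'column:desc', got {spec!r}")
    return column.strip(), direction == "desc"


def sort_rows(rows: list[dict], specs: list[str]) -> list[dict]:
    """Sort by several keys; the first spec is treated as the primary key."""
    result = list(rows)
    # Python's sort is stable, so applying the keys from last to first
    # leaves the first key as the most significant one.
    for column, descending in reversed([parse_sort(spec) for spec in specs]):
        result.sort(key=lambda row: row[column], reverse=descending)
    return result


# Invented rows, used only for illustration.
rows = [
    {"Region": "West", "Profit_sum": 5.0},
    {"Region": "East", "Profit_sum": 9.0},
    {"Region": "East", "Profit_sum": 12.0},
]

for row in sort_rows(rows, ["Region:asc", "Profit_sum:desc"]):
    print(row)
```

With these rows, the two East entries come first (12.0 before 9.0), followed by West.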

---

### Rounding results

Round float columns to 2 decimal places:

```bash
pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --agg "Profit=mean" \
  --round 2
```

Round to a different precision, for example 4 decimal places:

```bash
pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --agg "Profit=mean" \
  --round 4
```
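
Since rounding is described as applying to float columns, the effect on a single result row can be pictured with the small sketch below. The `round_floats` helper and the sample values are hypothetical; the only assumption carried over from the text is that integer and string columns are left untouched.

```python
def round_floats(row: dict, digits: int) -> dict:
    """Round float values only; ints and strings pass through unchanged."""
    return {
        key: round(value, digits) if isinstance(value, float) else value
        for key, value in row.items()
    }


# Invented row, used only for illustration.
row = {"Region": "West", "Quantity": 3, "Profit_mean": 33.849032}

print(round_floats(row, 2))
print(round_floats(row, 4))
```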

---

### Limiting rows

Show the first 5 rows:

```bash
pldatacli query SampleSuperstore.csv \
  --head 5
```

Show the last 10 rows:

```bash
pldatacli query SampleSuperstore.csv \
  --tail 10
```

---

### Full query example

```bash
pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --groupby Category \
  --agg "Profit=sum,mean" \
  --sort "Profit_sum:desc" \
  --head 5 \
  --round 2
```

---

### Schema inspection

Get columns, dtypes, and null counts without processing the full dataset:

```bash
pldatacli schema SampleSuperstore.csv
```

Example output:

```text
LazyFrame Schema
┏━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┓
┃ Column       ┃ Dtype   ┃ Nulls ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━┩
│ Ship Mode    │ String  │     0 │
│ Segment      │ String  │     0 │
│ Country      │ String  │     0 │
│ City         │ String  │     0 │
│ State        │ String  │     0 │
│ Postal Code  │ Int64   │     0 │
│ Region       │ String  │     0 │
│ Category     │ String  │     0 │
│ Sub-Category │ String  │     0 │
│ Sales        │ Float64 │     0 │
│ Quantity     │ Int64   │     0 │
│ Discount     │ Float64 │     0 │
│ Profit       │ Float64 │     0 │
└──────────────┴─────────┴───────┘
Rows: 9994, Columns: 13
```

> ⚡ Tip: Use `schema` before running queries to quickly inspect columns, types, and missing values.

---
