Metadata-Version: 2.4
Name: superstore
Version: 0.3.2
Summary: Data generation
Project-URL: Repository, https://github.com/1kbgz/superstore
Project-URL: Homepage, https://github.com/1kbgz/superstore
Author-email: the superstore authors <dev@1kbgz.com>
License: Apache-2.0
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Programming Language :: Rust
Requires-Python: >=3.10
Requires-Dist: pydantic>=2.0
Provides-Extra: develop
Requires-Dist: asv>=0.6; extra == 'develop'
Requires-Dist: build; extra == 'develop'
Requires-Dist: bump-my-version; extra == 'develop'
Requires-Dist: check-dist; extra == 'develop'
Requires-Dist: cibuildwheel; extra == 'develop'
Requires-Dist: codespell; extra == 'develop'
Requires-Dist: hatch-rs; extra == 'develop'
Requires-Dist: hatchling; extra == 'develop'
Requires-Dist: mdformat; extra == 'develop'
Requires-Dist: mdformat-tables>=1; extra == 'develop'
Requires-Dist: mypy>=1.0; extra == 'develop'
Requires-Dist: pandas; extra == 'develop'
Requires-Dist: pandas-stubs; extra == 'develop'
Requires-Dist: polars; extra == 'develop'
Requires-Dist: pytest; extra == 'develop'
Requires-Dist: pytest-cov; extra == 'develop'
Requires-Dist: ruff; extra == 'develop'
Requires-Dist: twine; extra == 'develop'
Requires-Dist: ty; extra == 'develop'
Requires-Dist: uv; extra == 'develop'
Requires-Dist: virtualenv; extra == 'develop'
Requires-Dist: wheel; extra == 'develop'
Description-Content-Type: text/markdown

# superstore

High-performance synthetic data generation library for testing and development.

[![Build Status](https://github.com/1kbgz/superstore/actions/workflows/build.yaml/badge.svg?branch=main&event=push)](https://github.com/1kbgz/superstore/actions/workflows/build.yaml)
[![codecov](https://codecov.io/gh/1kbgz/superstore/branch/main/graph/badge.svg)](https://codecov.io/gh/1kbgz/superstore)
[![License](https://img.shields.io/github/license/1kbgz/superstore)](https://github.com/1kbgz/superstore)
[![PyPI](https://img.shields.io/pypi/v/superstore.svg)](https://pypi.python.org/pypi/superstore)

## Overview

superstore is a Rust-powered Python library for generating realistic synthetic datasets. It provides:

### Data Generators

| Generator                                 | Description                                | Use Cases                        |
| ----------------------------------------- | ------------------------------------------ | -------------------------------- |
| **[Retail](docs/src/retail.md)**          | Sales transactions, employees              | BI dashboards, forecasting       |
| **[Time Series](docs/src/timeseries.md)** | Financial-style series with regimes, jumps | Quant research, backtesting      |
| **[Weather](docs/src/weather.md)**        | Sensor data with seasonal/diurnal patterns | IoT analytics, anomaly detection |
| **[Logs](docs/src/logs.md)**              | Web server & application logs              | Observability, alerting          |
| **[Finance](docs/src/finance.md)**        | Stock prices, OHLCV, options chains        | Trading systems, risk analysis   |
| **[Telemetry](docs/src/telemetry.md)**    | Machine metrics, anomalies, failures       | DevOps dashboards, ML training   |

### Statistical Tools

| Tool                                           | Description                           | Use Cases                         |
| ---------------------------------------------- | ------------------------------------- | --------------------------------- |
| **[Distributions](docs/src/distributions.md)** | Sample from statistical distributions | Simulation, Monte Carlo           |
| **[Copulas](docs/src/copulas.md)**             | Correlated multivariate data          | Risk modeling, portfolio analysis |
| **[Temporal Models](docs/src/temporal.md)**    | AR, Markov chains, random walks       | Time series simulation            |

### Key Features

- **Rust-powered**: High-performance generation, 10-100x faster than pure Python
- **Flexible output**: pandas DataFrame, polars DataFrame, or Python dicts
- **Configurable**: Pydantic config classes for validated, structured configuration
- **Reproducible**: Seed support for deterministic generation
- **Scalable**: Streaming and parallel generation for large datasets

## Installation

```bash
pip install superstore
```

For development with polars support:

```bash
pip install superstore[develop]
```

## Quick Start

```python
from superstore import superstore, employees, timeseries, weather

# Generate 1000 retail records as a pandas DataFrame
df = superstore(count=1000)

# Generate as polars DataFrame
df_polars = superstore(count=1000, output="polars")

# Generate as list of dicts
records = superstore(count=1000, output="dict")
```

## Reproducibility with Seeds

All data generators support an optional `seed` parameter for reproducible random data generation:

```python
from superstore import superstore, employees, getTimeSeries, machines

# Same seed produces identical data
df1 = superstore(count=100, seed=42)
df2 = superstore(count=100, seed=42)
assert df1.equals(df2)  # True

# Works with all generators
employees_df = employees(count=50, seed=123)
timeseries_df = timeseries(nper=30, seed=456)
weather_df = weather(count=100, seed=789)
machine_list = machines(count=10, seed=321)

# No seed means random data each time
df3 = superstore(count=100)  # Different each call
```

## Development

### Setup

```bash
# Clone the repository
git clone https://github.com/1kbgz/superstore.git
cd superstore

# Install development dependencies
make develop
```

### Building

```bash
# Build Python wheel
make build
```

### Testing

```bash
# Run all tests
make test
```

### Linting

```bash
# Run linters
make lint

# Fix formatting
make fix
```

## Architecture

superstore uses a hybrid Rust/Python architecture:

- **rust/**: Core Rust library with all data generation logic
- **src/**: PyO3 bindings exposing Rust functions to Python
- **superstore/**: Python package with native module

The core data generation is implemented in Rust for performance, with PyO3 providing seamless Python integration. Output format conversion (pandas/polars/dict) happens in the Rust bindings layer.

## License

This library is released under the [Apache 2.0 license](./LICENSE)

> [!NOTE]
> This library was generated using [copier](https://copier.readthedocs.io/en/stable/) from the [Base Python Project Template repository](https://github.com/python-project-templates/base).
