Metadata-Version: 2.1
Name: datapasta
Version: 0.1.1
Summary: Paste data as Python DataFrame definitions
Author-Email: Louis Maddox <louismmx@gmail.com>
License: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python
Project-URL: Bug Tracker, https://github.com/lmmx/datapasta/issues
Project-URL: Homepage, https://github.com/lmmx/datapasta
Requires-Python: >=3.10
Requires-Dist: pyperclip
Provides-Extra: dev
Requires-Dist: pdm-bump>=0.9.10; extra == "dev"
Requires-Dist: pdm>=2.22.3; extra == "dev"
Requires-Dist: pre-commit>=4.1.0; extra == "dev"
Requires-Dist: pytest>=8.3.4; extra == "dev"
Provides-Extra: docs
Requires-Dist: livereload>=2.7.1; extra == "docs"
Requires-Dist: mkdocs-extra-sass-mathshim>=0.1.0; extra == "docs"
Requires-Dist: mkdocs-material[imaging,recommended]>=9.5.2; extra == "docs"
Requires-Dist: mkdocs-section-index>=0.3.8; extra == "docs"
Requires-Dist: mkdocs>=1.5.3; extra == "docs"
Requires-Dist: mkdocstrings[python]>=0.24.0; extra == "docs"
Requires-Dist: ruff>=0.9.5; extra == "docs"
Requires-Dist: urllib3<2; extra == "docs"
Provides-Extra: polars
Requires-Dist: polars>=1.21.0; extra == "polars"
Provides-Extra: polars-lts-cpu
Requires-Dist: polars-lts-cpu>=1.21.0; extra == "polars-lts-cpu"
Provides-Extra: pandas
Requires-Dist: pandas>=2.2.3; extra == "pandas"
Description-Content-Type: text/markdown

# datapasta

[![uv](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json)](https://github.com/astral-sh/uv)
[![pdm-managed](https://img.shields.io/badge/pdm-managed-blueviolet)](https://pdm.fming.dev)
[![PyPI](https://img.shields.io/pypi/v/datapasta.svg)](https://pypi.org/project/datapasta)
[![Supported Python versions](https://img.shields.io/pypi/pyversions/datapasta.svg)](https://pypi.org/project/datapasta)
[![License](https://img.shields.io/pypi/l/datapasta.svg)](https://pypi.python.org/pypi/datapasta)
[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/lmmx/octopolars/master.svg)](https://results.pre-commit.ci/latest/github/lmmx/octopolars/master)

A Python package inspired by the R `datapasta` package for pasting tabular data into DataFrame code. datapasta analyzes clipboard content or text input and generates Python code to recreate the data as a pandas or polars DataFrame.

## Features

- Automatic detection of delimiters (comma, tab, pipe, semicolon, etc.)
- Smart header detection
- Type inference for columns (int, float, boolean, datetime, string)
- Generates code for both pandas and polars DataFrames
- Command-line interface for easy integration with text editors
- Simple API for programmatic use

## Installation

```bash
# Install with pip
pip install datapasta

# With Pandas support
pip install datapasta[pandas]

# With Polars support
pip install datapasta[polars]

# For Polars on older CPUs
pip install datapasta[polars-lts-cpu]
```

> The `polars` dependency is not included in the package by default.
> It is shipped as an optional extra which can be activated by passing it in square brackets.

## Usage

### Command Line

```bash
# Read from clipboard, generate pandas code
datapasta > dataframe_code.py

# Read from clipboard, generate polars code
datapasta --polars > dataframe_code.py

# Read from file instead of clipboard
datapasta --file data.csv > dataframe_code.py

# Specify a separator (otherwise auto-detected)
datapasta --sep "," > dataframe_code.py
```

### Python API

```python
import datapasta

# Read from clipboard and get pandas code
pandas_code = datapasta.clipboard_to_pandas()
print(pandas_code)

# Read from clipboard and get polars code
polars_code = datapasta.clipboard_to_polars()
print(polars_code)

# Convert text directly to DataFrame code
csv_text = """name,age,city
Alice,25,New York
Bob,30,San Francisco
Charlie,35,Seattle"""

pandas_code = datapasta.text_to_pandas(csv_text)
print(pandas_code)
```

## Controlling Header Detection

datapasta attempts to automatically detect whether your data has a header row, but you can override this behavior when needed:

### Command Line

```bash
# Auto-detect headers (default behavior)
datapasta --file data.csv

# Force using the first row as a header
datapasta --file data.csv --header yes

# Force no header (generate column names like col1, col2, etc.)
datapasta --file data.csv --header no
```

### Python API

```python
import datapasta

# Auto-detect headers (default)
code = datapasta.text_to_pandas(text)

# Force using the first row as a header
code = datapasta.text_to_pandas(text, has_header=True)

# Force no header
code = datapasta.text_to_pandas(text, has_header=False)
```

This is particularly useful when:
- The auto-detection logic misidentifies numeric headers as data
- You want to preserve the first row as data but datapasta treats it as a header
- You need consistent column names (col1, col2, etc.) for multiple similar datasets

## Examples

### From a CSV in the clipboard
```
name,age,city
Alice,25,New York
Bob,30,San Francisco
Charlie,35,Seattle
```

datapasta will generate:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["New York", "San Francisco", "Seattle"],
})
```

### From a TSV in the clipboard
```
name	age	city
Alice	25	New York
Bob	30	San Francisco
Charlie	35	Seattle
```

datapasta will generate similar code, automatically detecting the tab delimiter.

### Using in a Jupyter notebook

```python
import datapasta

# Assuming you've copied data to clipboard
code = datapasta.clipboard_to_pandas()
print("Generated code:")
print(code)

# Execute the code to create the DataFrame
exec(code)
# Now 'df' contains your DataFrame
display(df)
```

## How It Works

datapasta works by:

1. Reading text from the clipboard or a file
2. Intelligently guessing the delimiter/separator
3. Detecting if there's a header row
4. Inferring column types (int, float, boolean, datetime, string)
5. Generating code to create a pandas or polars DataFrame

## Project Structure

- `clipboard.py`: Functions for reading from the system clipboard
- `parser.py`: Functions for parsing text data, detecting delimiters, and headers
- `type_inference.py`: Functions for inferring column data types
- `formatter.py`: Functions for generating pandas and polars code
- `main.py`: Main entry points and CLI functionality

## Contributing

Contributions welcome!

1. **Issues & Discussions**: Please open a GitHub issue or discussion for bugs, feature requests, or questions.
2. **Pull Requests**: PRs are welcome!
   - Install the dev extra with `pip install -e ".[dev]"`
   - Run tests with `pytest`
   - Include updates to docs or examples if relevant

## Requirements

- Python 3.10+
- pyperclip (for clipboard access)

## License

This project is licensed under the [MIT License](https://opensource.org/licenses/MIT).

## Credits

Inspired by the R package [datapasta](https://github.com/MilesMcBain/datapasta) by Miles McBain.
