Metadata-Version: 2.4
Name: dataguy
Version: 0.1.2
Summary: Tiny wrapper around pandas with built‑in logging & docs.
Project-URL: Homepage, https://github.com/magistak/llm-data
Project-URL: Documentation, https://dataguy.readthedocs.io
Author-email: István Magyary <magistak@gmail.com>, Sára Viemann <viemannsara@gmail.com>, Bálint Kristóf <balint.kristof.99@gmail.com>
License-File: LICENSE
Requires-Python: >=3.9
Requires-Dist: pandas>=2.0
Description-Content-Type: text/markdown

# DataGuy

**DataGuy** is a Python package that simplifies data science workflows with Large Language Models (LLMs). It provides tools for automated data wrangling, data analysis, and AI-assisted visualization, and is aimed at small-to-medium datasets.

## Features

- **Automated Data Wrangling**: Clean and preprocess your data with minimal effort using LLM-generated code.
- **AI-Powered Data Visualization**: Generate insightful plots and visualizations based on natural language descriptions.
- **Intelligent Data Analysis**: Perform descriptive and inferential analysis with the help of LLMs.
- **Customizable Workflows**: Integrate with pandas, matplotlib, and other Python libraries for seamless data manipulation.
- **Safe Code Execution**: Built-in safeguards aim to ensure that only safe, trusted code is executed.
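
This README does not describe how the code-execution safeguards work internally. As a rough illustration of the general idea only, one common approach is to inspect LLM-generated code statically before running it. The sketch below uses the standard-library `ast` module; the function name `is_probably_safe` and both deny-lists are hypothetical and are not taken from DataGuy's source.

```python
import ast

# Hypothetical deny-lists for illustration; not taken from DataGuy's source.
BLOCKED_NODES = (ast.Import, ast.ImportFrom, ast.Global, ast.Nonlocal)
BLOCKED_NAMES = {"eval", "exec", "open", "compile", "__import__"}


def is_probably_safe(source: str) -> bool:
    """Return False if the code fails to parse or uses a blocked construct."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        # Reject import statements and scope-escaping declarations.
        if isinstance(node, BLOCKED_NODES):
            return False
        # Reject references to risky built-ins.
        if isinstance(node, ast.Name) and node.id in BLOCKED_NAMES:
            return False
        # Reject dunder attribute access such as obj.__class__.
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            return False
    return True


print(is_probably_safe("df = df.dropna()"))           # True
print(is_probably_safe("import os; os.remove('x')"))  # False
```

A static check like this is a filter, not a sandbox: it reduces obvious risks but does not replace reviewing generated code before running it on sensitive data.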

## Installation

Install the package using pip:

```bash
pip install dataguy
```

## Usage

### Getting Started

1. **Import the Package**:
   ```python
   from dataguy import DataGuy
   ```

2. **Initialize a DataGuy Instance**:
   ```python
   dg = DataGuy()
   ```

3. **Load Your Data**:
   ```python
   import pandas as pd
   data = pd.DataFrame({"age": [25, 30, None], "score": [88, 92, 75]})
   dg.set_data(data)
   ```

4. **Summarize Your Data**:
   ```python
   summary = dg.summarize_data()
   print(summary)
   ```

5. **Wrangle Your Data**:
   ```python
   cleaned_data = dg.wrangle_data()
   ```

6. **Visualize Your Data**:
   ```python
   dg.plot_data("age", "score")
   ```

7. **Analyze Your Data**:
   ```python
   results = dg.analyze_data()
   print(results)
   ```
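
In the sample frame from step 3, pandas stores the `None` in `age` as `NaN`, which makes that column `float64`. The pandas-only check below (independent of DataGuy) shows the missing value that a cleaning step would need to handle:

```python
import pandas as pd

data = pd.DataFrame({"age": [25, 30, None], "score": [88, 92, 75]})

# None in a numeric column becomes NaN, so "age" is float64
# while "score" stays an integer column.
print(data.dtypes)

# Count missing values per column.
missing = data.isna().sum()
print(missing)
```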

### Example Workflow

```python
from dataguy import DataGuy
import pandas as pd

# Initialize DataGuy
dg = DataGuy()

# Load data
data = pd.read_csv("path/to/data.csv")
dg.set_data(data)

# Summarize data
summary = dg.summarize_data()
print("Data Summary:", summary)

# Wrangle data
cleaned_data = dg.wrangle_data()

# Visualize data
dg.plot_data("column_x", "column_y")

# Analyze data
analysis_results = dg.analyze_data()
print("Analysis Results:", analysis_results)
```

## Key Methods

- **`set_data(obj)`**: Load data into the `DataGuy` instance. Supports pandas DataFrames, dictionaries, lists, numpy arrays, and CSV files.
- **`summarize_data()`**: Generate a summary of the dataset, including shape, columns, missing values, and means.
- **`wrangle_data()`**: Automatically clean and preprocess the dataset for analysis.
- **`plot_data(column_x, column_y)`**: Create a scatter plot of two columns using matplotlib.
- **`analyze_data()`**: Perform an automated analysis of the dataset, returning descriptive statistics and insights.
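
To make the descriptions of `set_data` and `summarize_data` more concrete, here is a minimal pandas-only sketch of the same ideas: coercing several input types into a DataFrame, then collecting shape, columns, missing values, and means. The helper names `to_dataframe` and `summarize` and the returned dictionary keys are assumptions for illustration; DataGuy's actual implementation and return format may differ.

```python
import numpy as np
import pandas as pd


def to_dataframe(obj) -> pd.DataFrame:
    """Coerce common in-memory inputs or a CSV path into a DataFrame."""
    if isinstance(obj, pd.DataFrame):
        return obj.copy()
    if isinstance(obj, str):
        # Treat strings as CSV file paths (an assumption of this sketch).
        return pd.read_csv(obj)
    if isinstance(obj, (dict, list, np.ndarray)):
        return pd.DataFrame(obj)
    raise TypeError(f"Unsupported input type: {type(obj).__name__}")


def summarize(df: pd.DataFrame) -> dict:
    """Collect shape, column names, missing-value counts, and numeric means."""
    return {
        "shape": df.shape,
        "columns": list(df.columns),
        "missing_counts": df.isna().sum().to_dict(),
        "means": df.mean(numeric_only=True).to_dict(),
    }


df = to_dataframe({"age": [25, 30, None], "score": [88, 92, 75]})
print(summarize(df))
```

For this sample input, the mean of `age` is computed over the two non-missing values, because pandas skips `NaN` by default.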

## Requirements

- Python 3.9 or higher
- Dependencies:
  - pandas
  - numpy
  - matplotlib
  - scikit-learn
  - claudette
  - anthropic

## Contributing

Contributions are welcome! Please submit issues or pull requests via the [GitHub repository](https://github.com/magistak/llm-data).

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Authors

- István Magyary
- Sára Viemann
- Kristóf Bálint

For inquiries, contact: magistak@gmail.com