Metadata-Version: 2.4
Name: schemai
Version: 0.1.0
Summary: Infer Pydantic schemas from JSON and CSV files automatically
Home-page: https://github.com/yourusername/schemai
Author: Your Name
Author-email: Your Name <your.email@example.com>
License: MIT
Project-URL: Homepage, https://github.com/yourusername/schemai
Project-URL: Bug Tracker, https://github.com/yourusername/schemai/issues
Project-URL: Documentation, https://github.com/yourusername/schemai#readme
Project-URL: Source Code, https://github.com/yourusername/schemai
Keywords: pydantic,schema,json,csv,inference,data-engineering
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic>=2.0
Requires-Dist: click>=8.0
Requires-Dist: pandas>=1.3
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: isort>=5.0; extra == "dev"
Requires-Dist: ruff>=0.1; extra == "dev"
Requires-Dist: mypy>=1.0; extra == "dev"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# Schemai - Schema Inference for Pydantic

Automatically infer **production-ready Pydantic models** from JSON and CSV files with a single command.

`schemai` automates a repetitive task that many data engineers face: converting raw JSON and CSV data into validated, typed Python models.

## Features

- **One-Command Schema Inference**: Generate Pydantic models from any JSON or CSV file
- **Type Inference**: Automatically detects integers, floats, strings, booleans, lists, and nested objects
- **Production-Ready**: Generates clean, well-structured Pydantic v2 models
- **CLI + Library**: Use as a command-line tool or import as a Python library
- **Batch Processing**: Process multiple files at once
- **Code Generation**: Export inferred schemas as executable Python code
- **Flexible Output**: Generate models, JSON schema, or Python code

## Installation

Install from PyPI (coming soon) or from source:

```bash
# From PyPI (once published)
pip install schemai

# From source
git clone https://github.com/yourusername/schemai.git
cd schemai
pip install -e .
```

## Quick Start

### CLI Usage

#### Infer schema from JSON file

```bash
schemai infer data.json -n User
```

**Output:**
```python
from pydantic import BaseModel
from typing import Optional

class User(BaseModel):
    name: Optional[str] = None
    age: Optional[int] = None
    email: Optional[str] = None
    is_active: Optional[bool] = None
```

#### Infer schema from CSV file

```bash
schemai infer customers.csv -n Customer
```

#### Save output to file

```bash
schemai infer data.json -n Product -o product_schema.py
```

#### Process multiple files

```bash
schemai batch *.json -o schemas/
```

#### Get file information

```bash
schemai info data.json
```

### Library Usage

```python
from schemai import SchemaInferencer

# Create inferencer
inferencer = SchemaInferencer()

# Infer from JSON
user_model = inferencer.infer_from_json("data.json", model_name="User")

# Infer from CSV (kept in its own variable so it does not overwrite the User model)
customer_model = inferencer.infer_from_csv("customers.csv", model_name="Customer")

# Generate code for the User model
code = inferencer.to_code(user_model, class_name="User")
print(code)

# Use the model
user = user_model(name="Alice", age=30, email="alice@example.com")
print(user.model_dump_json())
```

## Examples

### Example 1: JSON Data

**Input file (users.json):**
```json
{
  "name": "John Doe",
  "age": 30,
  "email": "john@example.com",
  "is_active": true,
  "tags": ["admin", "user"]
}
```

**Command:**
```bash
schemai infer users.json -n User
```

**Generated Model:**
```python
from pydantic import BaseModel
from typing import Optional, List

class User(BaseModel):
    name: Optional[str] = None
    age: Optional[int] = None
    email: Optional[str] = None
    is_active: Optional[bool] = None
    tags: Optional[List[str]] = None
```
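
To make the effect of the `Optional[...] = None` fields concrete, here is a small sketch that copies the generated `User` model from above and validates a few records with Pydantic v2. It relies only on Pydantic's own `model_validate` and `ValidationError`; nothing in it is specific to `schemai`.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class User(BaseModel):
    name: Optional[str] = None
    age: Optional[int] = None
    email: Optional[str] = None
    is_active: Optional[bool] = None
    tags: Optional[List[str]] = None


# A complete record validates and keeps its types.
full = User.model_validate(
    {
        "name": "John Doe",
        "age": 30,
        "email": "john@example.com",
        "is_active": True,
        "tags": ["admin", "user"],
    }
)

# Missing keys are accepted because every field defaults to None.
partial = User.model_validate({"name": "Jane Doe"})

# A value that cannot be converted to the declared type is rejected.
try:
    User.model_validate({"age": "thirty"})
    rejected = False
except ValidationError:
    rejected = True

print(full.tags, partial.age, rejected)
```

Because every field is optional, a record with missing keys still validates. If a key must always be present, tighten that field by hand in the generated code.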

### Example 2: CSV Data

**Input file (products.csv):**
```csv
product_id,name,price,in_stock
1,Widget,19.99,true
2,Gadget,29.99,false
3,Tool,9.99,true
```

**Command:**
```bash
schemai infer products.csv -n Product
```

**Generated Model:**
```python
from pydantic import BaseModel
from typing import Optional

class Product(BaseModel):
    product_id: Optional[int] = None
    name: Optional[str] = None
    price: Optional[float] = None
    in_stock: Optional[bool] = None
```
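
CSV cells arrive as strings, so each column's type has to be recovered from its text. The sketch below shows one way this could work using only the standard library. `infer_cell_type` and `infer_column_types` are hypothetical helpers written for illustration; they are not part of `schemai`, whose actual rules may differ, and they assume every row has as many cells as the header.

```python
import csv
import io


def infer_cell_type(text: str) -> type:
    """Guess the narrowest Python type for one CSV cell."""
    if text.strip().lower() in ("true", "false"):
        return bool
    try:
        int(text)
        return int
    except ValueError:
        pass
    try:
        float(text)
        return float
    except ValueError:
        return str


def infer_column_types(csv_text: str) -> dict:
    """Map each column to one type, widening when rows disagree."""
    reader = csv.DictReader(io.StringIO(csv_text))
    column_types = {}
    for row in reader:
        for column, cell in row.items():
            seen = column_types.get(column)
            found = infer_cell_type(cell)
            if seen is None or seen is found:
                column_types[column] = found
            elif {seen, found} == {int, float}:
                column_types[column] = float  # int and float widen to float
            else:
                column_types[column] = str  # anything else falls back to str
    return column_types


sample = """product_id,name,price,in_stock
1,Widget,19.99,true
2,Gadget,29.99,false
3,Tool,9.99,true
"""
print(infer_column_types(sample))
```

For the `products.csv` sample this yields `int`, `str`, `float`, and `bool` for the four columns, matching the field types in the generated model.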

## Command Reference

### `schemai infer`

Infer a schema from a single file.

```bash
schemai infer FILE [OPTIONS]
```

**Options:**
- `-n, --name TEXT`: Name for the generated model class (default: GeneratedModel)
- `-o, --output PATH`: Output file path (if not provided, prints to stdout)
- `-f, --format [model|code]`: Output format (default: code)
- `--strict`: Enable strict type checking

### `schemai batch`

Process multiple files and generate schemas.

```bash
schemai batch FILES... [OPTIONS]
```

**Options:**
- `-o, --output PATH`: Output directory for generated models
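
A similar batch workflow can be approximated from Python with the library calls shown under Library Usage. The sketch below is an illustration only, not a description of what `schemai batch` does internally: `class_name_from_path` and `infer_all` are hypothetical helpers, and both the file-name-to-class-name rule and the `_schema.py` suffix are assumptions.

```python
import re
from pathlib import Path


def class_name_from_path(path: Path) -> str:
    """Turn 'customer_orders.json' into 'CustomerOrders' (assumed naming rule)."""
    words = re.findall(r"[A-Za-z0-9]+", path.stem)
    name = "".join(word[:1].upper() + word[1:] for word in words)
    # Class names cannot start with a digit (or be empty), so prefix those.
    return name if name[:1].isalpha() else "Model" + name


def infer_all(inferencer, sources, out_dir):
    """Write one <stem>_schema.py per JSON or CSV source and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in map(Path, sources):
        name = class_name_from_path(source)
        suffix = source.suffix.lower()
        if suffix == ".json":
            model = inferencer.infer_from_json(str(source), model_name=name)
        elif suffix == ".csv":
            model = inferencer.infer_from_csv(str(source), model_name=name)
        else:
            continue  # skip unsupported file types
        target = out_dir / f"{source.stem}_schema.py"
        target.write_text(inferencer.to_code(model, class_name=name), encoding="utf-8")
        written.append(target)
    return written
```

With `schemai` installed, a call such as `infer_all(SchemaInferencer(), ["users.json", "products.csv"], "schemas/")` would write `schemas/users_schema.py` and `schemas/products_schema.py`.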

### `schemai info`

Display information about a file's inferred schema.

```bash
schemai info FILE [OPTIONS]
```

**Options:**
- `--sample-rows INTEGER`: Number of rows to sample from a CSV file

## Supported File Formats

- **JSON**: Objects and arrays of objects
- **CSV**: Comma-separated values with header row
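
Both accepted JSON shapes can be treated uniformly by wrapping a single object in a one-element list. The sketch below shows that idea, plus one way to tell which keys appear in every record and which are sometimes missing. The helper names are hypothetical, and this is not necessarily how `schemai` handles arrays.

```python
import json


def load_records(json_text: str) -> list:
    """Return a list of dict records from an object or an array of objects."""
    data = json.loads(json_text)
    records = [data] if isinstance(data, dict) else data
    if not isinstance(records, list) or not all(isinstance(r, dict) for r in records):
        raise ValueError("expected a JSON object or an array of objects")
    return records


def split_fields(records: list) -> tuple:
    """Return (fields present in every record, fields missing from at least one)."""
    all_fields = set().union(*(r.keys() for r in records))
    always = {f for f in all_fields if all(f in r for r in records)}
    return always, all_fields - always


records = load_records('[{"id": 1, "name": "a"}, {"id": 2}]')
always, sometimes = split_fields(records)
print(sorted(always), sorted(sometimes))
```

Fields in the second set are natural candidates for `Optional[...]`, since at least one record omits them.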

## Type Mapping

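`schemai` infers the following Python types. In the examples above, each inferred type is then wrapped in `Optional[...]` with a default of `None`.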

| Example Value | Inferred Python Type |
|---------------|-------------|
| `null`        | `Optional[str]` |
| `true/false`  | `bool`      |
| `123`         | `int`       |
| `123.45`      | `float`     |
| `"text"`      | `str`       |
| `[1, 2, 3]`   | `List[int]` |
| `{...}`       | `dict`      |
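
The table can be read as a function from a decoded JSON value to an annotation string. The following is a minimal sketch of such a function, not `schemai`'s implementation: `annotation_for` is a hypothetical name, and the `List[Any]` fallback for empty or mixed lists is an assumption that the table does not cover.

```python
import json


def annotation_for(value) -> str:
    """Return a type annotation string for one decoded JSON value."""
    if value is None:
        return "Optional[str]"
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "str"
    if isinstance(value, list):
        item_types = {annotation_for(item) for item in value}
        # Empty or mixed lists fall back to List[Any] (an assumption).
        item = item_types.pop() if len(item_types) == 1 else "Any"
        return f"List[{item}]"
    if isinstance(value, dict):
        return "dict"
    raise TypeError(f"unsupported value: {value!r}")


record = json.loads('{"name": "John Doe", "age": 30, "tags": ["admin", "user"], "nick": null}')
print({key: annotation_for(val) for key, val in record.items()})
```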

## Advanced Usage

### Strict Type Checking

```python
from schemai import SchemaInferencer

inferencer = SchemaInferencer(strict=True)
model = inferencer.infer_from_json("data.json")
```

### Generate Code Without Using CLI

```python
from schemai import SchemaInferencer

inferencer = SchemaInferencer()
model = inferencer.infer_from_json("users.json", model_name="User")

# Get Python code
code = inferencer.to_code(model, class_name="User")

# Save to file
with open("user_schema.py", "w", encoding="utf-8") as f:
    f.write(code)
```
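
For a sense of what code generation involves, this sketch renders model source text from a plain mapping of field names to annotation strings, in the style of the generated models shown in the examples. `render_model` is a hypothetical helper that is independent of `schemai`'s `to_code`, and it assumes every field name is already a valid Python identifier.

```python
def render_model(class_name: str, fields: dict) -> str:
    """Render Pydantic model source from a {field_name: annotation} mapping."""
    lines = [
        "from pydantic import BaseModel",
        "from typing import Any, List, Optional",
        "",
        f"class {class_name}(BaseModel):",
    ]
    if not fields:
        lines.append("    pass")
    for name, annotation in fields.items():
        # Avoid double-wrapping annotations that are already Optional.
        if not annotation.startswith("Optional["):
            annotation = f"Optional[{annotation}]"
        lines.append(f"    {name}: {annotation} = None")
    return "\n".join(lines) + "\n"


source = render_model("Product", {"product_id": "int", "price": "float"})
print(source)
```

Returning the source as a string keeps the rendering step separate from file handling, so the same text can be printed, written to disk, or checked in a test.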

## Development

### Setup Development Environment

```bash
# Clone repository
git clone https://github.com/yourusername/schemai.git
cd schemai

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode with dev dependencies
pip install -e ".[dev]"
```

### Running Tests

```bash
pytest
pytest --cov=schemai  # With coverage report
```

### Code Quality

```bash
# Format code
black schemai/

# Sort imports
isort schemai/

# Lint
ruff check schemai/

# Type checking
mypy schemai/
```

## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Run tests and linting
5. Commit your changes (`git commit -m 'Add amazing feature'`)
6. Push to the branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Why Schemai?

Data engineers often spend hours manually writing Pydantic models for new data files. `schemai` automates this tedious process, letting you focus on data transformation and analysis instead of boilerplate model definitions.

**Use cases:**
- Data pipeline setup for new data sources
- API request/response modeling
- Data validation frameworks
- Machine learning data preprocessing
- Rapid prototyping with new datasets

## Roadmap

- [ ] PostgreSQL table schema inference
- [ ] Parquet file support
- [ ] JSON Schema inference
- [ ] Model inheritance and composition
- [ ] Custom validation rules
- [ ] Type refinement with examples
- [ ] Web UI for schema exploration
- [ ] Integration with popular data tools (Airflow, dbt, etc.)

## Support

- 📖 [Documentation](https://github.com/yourusername/schemai#readme)
- 🐛 [Issue Tracker](https://github.com/yourusername/schemai/issues)
- 💬 [Discussions](https://github.com/yourusername/schemai/discussions)

---

Built with ❤️ for data engineers by data engineers.
