Metadata-Version: 2.4
Name: csvalchemy
Version: 0.1.0
Summary: Read, validate, and write CSV files using Pydantic models with dydactic.
License: MIT License
Keywords: csv,validation,orm,model
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: pydantic>=2.9.2
Requires-Dist: dydactic>=0.2.0
Requires-Dist: python-dateutil>=2.8.0
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: mypy>=1.5.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"

# csvalchemy

A Python package for reading and writing CSV files using Pydantic models.

## Overview

csvalchemy provides a clean interface for validating CSV data against Pydantic models,
handling errors gracefully, and writing validated results back to CSV files. It integrates
with [dydactic](https://github.com/eddiethedean/dydactic) for robust validation of data records.

## Features

- **CSV Reading**: Read CSV files and validate each row against Pydantic models
- **Error Handling**: Continue processing even when individual rows fail validation
- **Type Safety**: Full type hints and validation using Pydantic
- **CSV Writing**: Write validated results back to CSV files
- **Integration**: Built on dydactic for reliable validation

## Dependencies

- **Python**: 3.10 or higher
- **pydantic**: >=2.9.2 (Data validation using Python type annotations)
- **dydactic**: >=0.2.0 (Validation engine - requires Python 3.10+)
- **python-dateutil**: >=2.8.0 (DateTime parsing)

## Installation

```bash
pip install csvalchemy
```

## Quick Start

```python
from pydantic import BaseModel
from csvalchemy import read
from io import StringIO

# Define your model
class Person(BaseModel):
    name: str
    age: int
    email: str | None = None

# Sample CSV content
csv_content = """name,age,email
Alice,30,alice@example.com
Bob,25,bob@example.com
Charlie,35,charlie@example.com
"""

# Read and validate CSV
with StringIO(csv_content) as f:
    for result in read(f, Person):
        if result.error:
            print(f"Validation error: {result.error}")
        else:
            print(f"Valid person: {result.result.name}, age {result.result.age}")
```

**Output:**
```
Valid person: Alice, age 30
Valid person: Bob, age 25
Valid person: Charlie, age 35
```

## Examples

### Error Handling

csvalchemy continues processing even when individual rows fail validation:

```python
from pydantic import BaseModel
from csvalchemy import read
from io import StringIO

class Person(BaseModel):
    name: str
    age: int
    email: str | None = None

# CSV with some invalid rows
csv_content = """name,age,email
Alice,30,alice@example.com
Bob,not_a_number,bob@example.com
Charlie,35,charlie@example.com
Diana,not_a_number,diana@example.com
"""

with StringIO(csv_content) as f:
    valid_count = 0
    error_count = 0
    
    # row_number counts data rows (the header is not counted)
    for row_number, result in enumerate(read(f, Person), start=1):
        if result.error:
            error_count += 1
            print(f"Error on row {row_number}: {result.error}")
        else:
            valid_count += 1
            print(f"Valid: {result.result.name}")

    print(f"\nSummary: {valid_count} valid, {error_count} errors")
```

**Output:**
```
Valid: Alice
Error on row 2: 1 validation error for Person
age
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='not_a_number', input_type=str]
    For further information visit https://errors.pydantic.dev/2.12/v/int_parsing
Valid: Charlie
Error on row 4: 1 validation error for Person
age
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='not_a_number', input_type=str]
    For further information visit https://errors.pydantic.dev/2.12/v/int_parsing

Summary: 2 valid, 2 errors
```

### Writing Validated CSV

Write only validated results back to CSV:

```python
from pydantic import BaseModel
from csvalchemy import read
from io import StringIO

class Product(BaseModel):
    id: int
    name: str
    price: float
    in_stock: bool

# Input CSV
input_csv = """id,name,price,in_stock
1,Widget,19.99,True
2,Gadget,29.99,False
3,Invalid,not_a_number,True
4,Thing,39.99,True
"""

# Read and validate, writing valid rows to a new CSV as results are consumed
input_file = StringIO(input_csv)
output_file = StringIO()
writer = read(input_file, Product).csv_writer(output_file)

# Consume writer to trigger CSV writing
for result in writer:
    if result.error:
        print(f"Skipped invalid row: {result.error}")
    else:
        print(f"Wrote: {result.result.name}")

# Show output CSV
output_file.seek(0)
print("\n=== Output CSV ===")
print(output_file.read())
```

**Output:**
```
Wrote: Widget
Wrote: Gadget
Skipped invalid row: 1 validation error for Product
price
  Input should be a valid number, unable to parse string as a number [type=float_parsing, input_value='not_a_number', input_type=str]
    For further information visit https://errors.pydantic.dev/2.12/v/float_parsing
Wrote: Thing

=== Output CSV ===
id,name,price,in_stock
1,Widget,19.99,True
2,Gadget,29.99,False
4,Thing,39.99,True
```
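
The example above only produces output once the writer is iterated. As a rough illustration of that write-on-consume pattern, here is a standard-library sketch. The `write_valid_rows` helper and its `parse` callback are hypothetical and are not part of csvalchemy; whether `csv_writer` is implemented this way is an assumption.

```python
import csv
from io import StringIO


def write_valid_rows(rows, parse, output_file, fieldnames):
    """Hypothetical helper: yield (row, error) per row, writing only valid rows."""
    writer = csv.DictWriter(output_file, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        try:
            parse(row)
        except ValueError as exc:
            yield row, exc          # invalid: reported to the caller, not written
        else:
            writer.writerow(row)    # valid: written just before it is yielded
            yield row, None


rows = [
    {"id": "1", "price": "19.99"},
    {"id": "3", "price": "not_a_number"},
    {"id": "4", "price": "39.99"},
]
output = StringIO()
lazy_writer = write_valid_rows(rows, lambda r: float(r["price"]), output, ["id", "price"])

before = output.getvalue()      # generator has not started, so nothing is written
outcomes = list(lazy_writer)    # consuming the generator performs the writes
after = output.getvalue()
print(after)
```

Because the function is a generator, even the header is written only when iteration starts, which is why the writer must be consumed before reading the output file.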

### Using Validator Directly

Validate records that do not come from a CSV file:

```python
from pydantic import BaseModel
from csvalchemy import Validator
import dydactic.options

class Person(BaseModel):
    name: str
    age: int
    email: str | None = None

# Data not from CSV
records = [
    {"name": "Alice", "age": "30", "email": "alice@example.com"},
    {"name": "Bob", "age": "not_a_number", "email": "bob@example.com"},
    {"name": "Charlie", "age": "35"},
]

# Standard validation
print("=== Using Validator directly ===")
validator = Validator(iter(records), Person)

for result in validator:
    if result.error:
        print(f"Error: {result.error}")
    else:
        print(f"Valid: {result.result.name}, age {result.result.age}")

# Skip invalid records
print("\n=== Using SKIP error option ===")
validator_skip = Validator(
    iter(records),
    Person,
    error_option=dydactic.options.ErrorOption.SKIP
)

valid_results = list(validator_skip)
print(f"Got {len(valid_results)} valid results (invalid ones skipped)")
```

**Output:**
```
=== Using Validator directly ===
Valid: Alice, age 30
Error: 1 validation error for Person
age
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='not_a_number', input_type=str]
    For further information visit https://errors.pydantic.dev/2.12/v/int_parsing
Valid: Charlie, age 35

=== Using SKIP error option ===
Got 2 valid results (invalid ones skipped)
```

## Integration with dydactic

csvalchemy uses [dydactic](https://github.com/eddiethedean/dydactic) as its core validation
engine. The `Validator` and `ValidatorIterator` classes wrap `dydactic.validate()` to provide
a consistent API for CSV data validation.

### How it works

1. **CSV Reading**: `read()` creates a `CSVReaderValidator` that reads CSV rows using Python's `csv.DictReader`
2. **Validation**: Each row is validated using `dydactic.validate()`, which handles Pydantic model validation
3. **Error Handling**: Validation errors are captured without stopping the iteration
4. **Result Mapping**: dydactic's result objects are mapped to csvalchemy's `Result` type for a consistent API
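
The four steps above can be sketched with the standard library alone. This is an illustration of the flow, not csvalchemy's actual implementation: the `Result` dataclass, `validate_rows`, and `parse_person` below are hypothetical stand-ins, and a plain `int()` conversion takes the place of `dydactic.validate()` and Pydantic.

```python
import csv
from dataclasses import dataclass
from io import StringIO
from typing import Any, Optional


@dataclass
class Result:
    """Hypothetical stand-in for csvalchemy's Result type."""
    result: Any = None
    error: Optional[Exception] = None


def validate_rows(csv_file, parse):
    """Yield one Result per CSV row; a failing row never stops iteration."""
    for row in csv.DictReader(csv_file):        # step 1: read rows as dicts
        try:
            yield Result(result=parse(row))     # step 2: validate the row
        except ValueError as exc:               # step 3: capture the error
            yield Result(error=exc)             # step 4: same result type either way


def parse_person(row):
    # Stand-in for model validation: age must parse as an integer.
    return {"name": row["name"], "age": int(row["age"])}


csv_content = "name,age\nAlice,30\nBob,not_a_number\nCharlie,35\n"
results = list(validate_rows(StringIO(csv_content), parse_person))
for item in results:
    if item.error:
        print(f"error: {item.error}")
    else:
        print(f"valid: {item.result}")
```

Wrapping both outcomes in one result type lets callers branch on `error` without a `try`/`except` around the loop.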

### Benefits

- Leverages dydactic's robust validation handling
- Independent validation of each record (errors don't stop processing)
- Type-safe error handling with clear error messages
- Compatible with dydactic's validation strategies
- Configurable error handling (RETURN, RAISE, or SKIP)
- Support for strict validation and attribute-based validation

### Configuration Options

The `Validator` class supports dydactic's configuration options:

- **error_option**: Control how validation errors are handled:
  - `RETURN` (default): Errors are returned in `Result.error`
  - `RAISE`: Exceptions are raised immediately on validation errors
  - `SKIP`: Records with errors are skipped entirely
- **strict**: Enable strict Pydantic validation
- **from_attributes**: Validate from object attributes

Example:

```python
from pydantic import BaseModel
from csvalchemy import Validator
import dydactic.options

class Person(BaseModel):
    name: str
    age: int

records = [
    {"name": "Alice", "age": "30"},
    {"name": "Bob", "age": "invalid"},
    {"name": "Charlie", "age": "35"},
]

# Default: RETURN errors
validator_return = Validator(iter(records), Person)
results_return = list(validator_return)
print(f"RETURN mode: {len(results_return)} results (including errors)")

# SKIP invalid records
validator_skip = Validator(
    iter(records),
    Person,
    error_option=dydactic.options.ErrorOption.SKIP
)
results_skip = list(validator_skip)
print(f"SKIP mode: {len(results_skip)} results (errors skipped)")
```

**Output:**
```
RETURN mode: 3 results (including errors)
SKIP mode: 2 results (errors skipped)
```
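
To make the difference between the three modes concrete, including `RAISE`, which the example above does not exercise, here is a standard-library sketch. The `ErrorOption` enum and `validate_all` function below are hypothetical mirrors of the documented behaviour, not dydactic's code, and `int()` stands in for model validation.

```python
from enum import Enum


class ErrorOption(Enum):
    """Hypothetical mirror of the three documented modes."""
    RETURN = "return"
    RAISE = "raise"
    SKIP = "skip"


def validate_all(records, parse, error_option=ErrorOption.RETURN):
    """Yield (value, error) pairs, handling failures according to error_option."""
    for record in records:
        try:
            value = parse(record)
        except ValueError as exc:
            if error_option is ErrorOption.RAISE:
                raise                   # stop at the first invalid record
            if error_option is ErrorOption.SKIP:
                continue                # drop the invalid record
            yield None, exc             # RETURN: report the error alongside results
        else:
            yield value, None


ages = ["30", "invalid", "35"]

returned = list(validate_all(ages, int))
skipped = list(validate_all(ages, int, ErrorOption.SKIP))

try:
    list(validate_all(ages, int, ErrorOption.RAISE))
    raised = False
except ValueError:
    raised = True

print(len(returned), len(skipped), raised)
```

In this sketch `RAISE` aborts iteration at the first invalid record, so later valid records are never reached. The exception type a real `RAISE` run produces is not stated here, so check dydactic's documentation before catching it.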

## Architecture Notes

### Casting and Validation

csvalchemy provides two approaches to validation:

1. **Full Validation (Recommended)**: Use `Validator` or `read()`, which run dydactic's complete validation pipeline, including its casting functionality. This is the recommended approach for CSV validation.

2. **Standalone Casting**: The `cast.py` module provides casting utilities similar to `dydactic.cast`. This module is kept for:
   - Standalone use cases that don't require full dydactic validation
   - Direct class instantiation without Pydantic models
   - Testing scenarios

Note: The main validation flow uses dydactic's casting internally, so `cast.py` is not used in the primary validation pipeline.
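
As an illustration of what standalone casting into a plain class can look like, here is a minimal sketch. It does not show the API of `cast.py`: the `cast_record` helper and the `Point` class are hypothetical, and the approach only covers annotations whose types can be called on a string, such as `str`, `int`, and `float`.

```python
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class Point:
    label: str
    x: int
    y: float


def cast_record(cls, record):
    """Hypothetical helper: cast each string to its annotated type, then build cls."""
    hints = get_type_hints(cls)
    return cls(**{name: hints[name](value) for name, value in record.items()})


point = cast_record(Point, {"label": "origin", "x": "0", "y": "1.5"})
print(point)
```

A naive cast like this breaks down quickly: `bool("False")` is `True`, and optional or date fields need dedicated parsing. That is one reason the full validation pipeline is the recommended path.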

## Requirements

- Python 3.10+ (required by dydactic)
- See `pyproject.toml` for complete dependency list
