Metadata-Version: 2.4
Name: serpen
Version: 0.1.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Rust
Classifier: Topic :: Software Development :: Build Tools
Classifier: Topic :: Software Development :: Libraries :: Python Modules
License-File: LICENSE
Summary: Python source bundler that produces a single .py file from multi-module projects
Keywords: bundler,python,deployment,pyspark,lambda
Author: Konstantin Vyatkin <tino@vtkn.io>
Author-email: Konstantin Vyatkin <tino@vtkn.io>
License: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/tinovyatkin/serpen
Project-URL: Repository, https://github.com/tinovyatkin/serpen
Project-URL: Documentation, https://github.com/tinovyatkin/serpen#readme
Project-URL: Issues, https://github.com/tinovyatkin/serpen/issues

# Serpen: Python Source Bundler

[![Crates.io](https://img.shields.io/crates/v/serpen.svg)](https://crates.io/crates/serpen)
[![PyPI](https://img.shields.io/pypi/v/serpen.svg)](https://pypi.org/project/serpen/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**Serpen** is a CLI and Python library that produces a single `.py` file from a multi-module Python project by inlining all *first-party* source files. This approach is inspired by JavaScript bundlers and aims to simplify deployment, especially in constrained environments like PySpark jobs, AWS Lambdas, and notebooks.

## Features

- 🦀 **Rust-based CLI** using the RustPython parser (same as Ruff and Pyrefly)
- 🐍 **Python 3.10+** support
- 🌲 **Tree-shaking logic** to inline only the modules that are actually used
- 📦 **Requirements generation** with optional `requirements.txt` output
- 🔧 **Configurable** import classification and source directories
- 🚀 **Fast** and memory-efficient
- 🐍 **Python API** available via maturin packaging

## Installation

### From PyPI (Python Package)

```bash
pip install serpen
```

### From Crates.io (Rust Binary)

```bash
cargo install serpen
```

### From Source

```bash
git clone https://github.com/tinovyatkin/serpen.git
cd serpen
cargo build --release
```

## Quick Start

### Command Line Usage

```bash
# Basic bundling
serpen --entry src/main.py --output bundle.py

# Generate requirements.txt
serpen --entry src/main.py --output bundle.py --emit-requirements

# Verbose output
serpen --entry src/main.py --output bundle.py --verbose

# Custom config file
serpen --entry src/main.py --output bundle.py --config my-serpen.toml
```

### Python API Usage

```python
from serpen import Bundler

bundler = Bundler()
bundler.bundle("src/main.py", "bundle.py", emit_requirements=True)
```

## Configuration

Create a `serpen.toml` file in your project root:

```toml
# Source directories to scan for first-party modules
src = ["src", ".", "lib"]

# Known first-party module names
known_first_party = [
    "my_internal_package",
]

# Known third-party module names
known_third_party = [
    "requests",
    "numpy",
    "pandas",
]

# Whether to preserve comments in the bundled output
preserve_comments = true

# Whether to preserve type hints in the bundled output
preserve_type_hints = true
```

## How It Works

1. **Module Discovery**: Scans configured source directories to discover first-party Python modules
2. **Import Classification**: Classifies imports as first-party, third-party, or standard library
3. **Dependency Graph**: Builds a dependency graph and performs topological sorting
4. **Tree Shaking**: Only includes modules that are actually imported (directly or transitively)
5. **Code Generation**: Generates a single Python file with proper module separation
6. **Requirements**: Optionally generates `requirements.txt` with third-party dependencies
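
The dependency graph, tree shaking, and ordering steps above can be sketched with the Python standard library. This is an illustration of the general technique, not Serpen's Rust implementation: `first_party_imports`, `bundle_order`, and the in-memory `sources` mapping are hypothetical names, and relative imports are ignored for brevity.

```python
import ast
from graphlib import TopologicalSorter


def first_party_imports(source, first_party):
    """Collect the first-party module names imported by a piece of source code."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        found.update(name for name in names if name in first_party)
    return found


def bundle_order(sources, entry):
    """Return the modules reachable from `entry`, dependencies first."""
    graph = {}
    pending = [entry]
    while pending:
        module = pending.pop()
        if module in graph:
            continue
        # Map each module to the first-party modules it depends on.
        graph[module] = first_party_imports(sources[module], sources.keys())
        pending.extend(graph[module])
    # Modules never reached from the entry point are left out (tree shaking).
    return list(TopologicalSorter(graph).static_order())


# Hypothetical project: module name -> source text.
sources = {
    "main": "from utils.helpers import greet\nfrom models.user import User\n",
    "utils.helpers": "import os\n",
    "models.user": "from utils.helpers import greet\n",
    "unused.module": "import utils.helpers\n",
}
order = bundle_order(sources, "main")
print(order)
```

Because the graph is built by walking outward from the entry module, `unused.module` never enters it, and the topological sort places each module after everything it imports.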

## Output Structure

The bundled output follows this structure:

```python
#!/usr/bin/env python3
# Generated by Serpen - Python Source Bundler

# Preserved imports (stdlib and third-party)
import os
import sys
import requests

# ─ Module: utils/helpers.py ─
def greet(name: str) -> str:
    return f"Hello, {name}!"

# ─ Module: models/user.py ─
class User:
    def __init__(self, name: str):
        self.name = name

# ─ Entry Module: main.py ─
from utils.helpers import greet
from models.user import User

def main():
    user = User("Alice")
    print(greet(user.name))

if __name__ == "__main__":
    main()
```

## Use Cases

### PySpark Jobs

Deploy complex PySpark applications as a single file:

```bash
serpen --entry spark_job.py --output dist/spark_job_bundle.py --emit-requirements
spark-submit dist/spark_job_bundle.py
```

### AWS Lambda

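Bundle a Lambda handler and its first-party modules into one file, and emit a `requirements.txt` listing the third-party dependencies:

```bash
serpen --entry lambda_handler.py --output deployment/handler.py --emit-requirements
# Deploy handler.py; install the packages listed in requirements.txt
# into the deployment package or a Lambda layer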
```

### Jupyter Notebooks

Bundle a multi-module analysis into a single file from within a notebook:

```python
# In your notebook
from serpen import Bundler
bundler = Bundler()
bundler.bundle("my_analysis.py", "notebook_bundle.py")
```

## Special Considerations

### Pydantic Compatibility

Serpen preserves class identity and module structure to ensure Pydantic models work correctly:

```python
# Original: models/user.py
class User(BaseModel):
    name: str

# Bundled output preserves __module__ and class structure
```

### Pandera Decorators

Function and class decorators are preserved with their original module context:

```python
# Original: validators/schemas.py
@pa.check_types
def validate_dataframe(df: DataFrame[UserSchema]) -> DataFrame[UserSchema]:
    return df

# Bundled output maintains decorator functionality
```

### Circular Dependencies

Serpen detects circular imports and reports them as errors:

```text
Error: Circular dependency detected involving module: utils.helpers
```
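
As a rough illustration of how an import cycle can be detected, the sketch below uses the standard library's `graphlib`. It is not Serpen's implementation: `find_cycle` and the example graph are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def find_cycle(graph):
    """Return the modules forming an import cycle, or None if the graph is acyclic."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError as error:
        # args[1] holds the cycle as a list whose first and last items are the same.
        return error.args[1]
    return None


# Hypothetical dependency graph: module -> modules it imports.
graph = {
    "main": {"utils.helpers"},
    "utils.helpers": {"models.user"},
    "models.user": {"utils.helpers"},
}
cycle = find_cycle(graph)
if cycle is not None:
    print("Circular dependency detected involving modules:", " -> ".join(cycle))
```

A common way to break such a cycle is to move the shared definitions into a third module that both sides import.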

## Comparison with Other Tools

| Tool | Language | Tree Shaking | PySpark Ready | Type Hints |
|------|----------|--------------|---------------|------------|
| Serpen | Rust | ✅ | ✅ | ✅ |
| PyInstaller | Python | ❌ | ❌ | ✅ |
| Nuitka | Python | ❌ | ❌ | ✅ |
| Pex | Python | ❌ | ❌ | ✅ |

## Development

### Building from Source

```bash
git clone https://github.com/tinovyatkin/serpen.git
cd serpen

# Build Rust CLI
cargo build --release

# Build Python package
pip install maturin
maturin develop

# Run tests
cargo test
```

### Project Structure

```
serpen/
├── src/                    # Rust source code
│   ├── main.rs            # CLI entry point
│   ├── bundler.rs         # Core bundling logic
│   ├── resolver.rs        # Import resolution
│   ├── emit.rs            # Code generation
│   └── ...
├── python/serpen/         # Python package
├── tests/                 # Test suites
│   └── fixtures/          # Test projects
├── docs/                  # Documentation
└── Cargo.toml            # Rust dependencies
```

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- **Ruff**: Import resolution logic inspiration
- **RustPython**: Python AST parsing
- **Maturin**: Python-Rust integration

## Roadmap

- [ ] Source maps for debugging
- [ ] Parallel processing
- [ ] Package flattening mode
- [ ] Comment and type hint stripping
- [ ] Plugin system for custom transformations

---

For more examples and detailed documentation, see the [project README on GitHub](https://github.com/tinovyatkin/serpen#readme).

