Metadata-Version: 2.4
Name: serpen
Version: 0.3.13
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Rust
Classifier: Topic :: Software Development :: Build Tools
Classifier: Topic :: Software Development :: Libraries :: Python Modules
License-File: LICENSE
Summary: Python source bundler that produces a single .py file from multi-module projects
Keywords: bundler,python,deployment,pyspark,lambda
Author: Konstantin Vyatkin <tino@vtkn.io>
Author-email: Konstantin Vyatkin <tino@vtkn.io>
License: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/tinovyatkin/serpen
Project-URL: Repository, https://github.com/tinovyatkin/serpen
Project-URL: Documentation, https://github.com/tinovyatkin/serpen#readme
Project-URL: Issues, https://github.com/tinovyatkin/serpen/issues

# Serpen: Python Source Bundler

[![PyPI](https://img.shields.io/pypi/v/serpen.svg)](https://pypi.org/project/serpen/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**Serpen** is a CLI and Python library that produces a single `.py` file from a multi-module Python project by inlining all *first-party* source files. This approach is inspired by JavaScript bundlers and aims to simplify deployment, especially in constrained environments like PySpark jobs, AWS Lambdas, and notebooks.

## Features

- 🦀 **Rust-based CLI** using the RustPython parser (same as Ruff and Pyrefly)
- 🐍 **Python 3.10+** support
- 🌲 **Tree-shaking logic** to inline only the modules that are actually used
- 🧹 **Unused import trimming** that can also clean up individual Python files as a standalone command
- 📦 **Requirements generation** with optional `requirements.txt` output
- 🔧 **Configurable** import classification and source directories
- 🚀 **Fast** and memory-efficient
- 🐍 **Python API** available via maturin packaging

## Installation

### From PyPI (Python Package)

```bash
pip install serpen
```

### From npm (Node.js CLI)

```bash
# Global installation
npm install -g serpen

# One-time use
npx serpen --help
```

> **🔐 Supply Chain Security**: All npm packages include [provenance attestations](docs/NPM_PROVENANCE.md) for enhanced security and verification.

### From Source

```bash
git clone https://github.com/tinovyatkin/serpen.git
cd serpen
cargo build --release
```

## Quick Start

### Command Line Usage

```bash
# Basic bundling
serpen --entry src/main.py --output bundle.py

# Generate requirements.txt
serpen --entry src/main.py --output bundle.py --emit-requirements

# Verbose output
serpen --entry src/main.py --output bundle.py --verbose

# Custom config file
serpen --entry src/main.py --output bundle.py --config my-serpen.toml
```

## Configuration

Create a `serpen.toml` file in your project root:

```toml
# Source directories to scan for first-party modules
src = ["src", ".", "lib"]

# Known first-party module names
known_first_party = [
    "my_internal_package",
]

# Known third-party module names
known_third_party = [
    "requests",
    "numpy",
    "pandas",
]

# Whether to preserve comments in the bundled output
preserve_comments = true

# Whether to preserve type hints in the bundled output
preserve_type_hints = true
```

## How It Works

1. **Module Discovery**: Scans configured source directories to discover first-party Python modules
2. **Import Classification**: Classifies imports as first-party, third-party, or standard library
3. **Dependency Graph**: Builds a dependency graph and performs topological sorting
4. **Tree Shaking**: Only includes modules that are actually imported (directly or transitively)
5. **Code Generation**: Generates a single Python file with proper module separation
6. **Requirements**: Optionally generates `requirements.txt` with third-party dependencies
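
Steps 3 and 4 can be sketched with the standard library alone. The example below is an illustration under stated assumptions, not Serpen's Rust implementation: the helper names `first_party_imports` and `bundle_order` are hypothetical, modules are given as an in-memory dict of source strings, and relative imports are ignored for brevity.

```python
import ast
from graphlib import TopologicalSorter

def first_party_imports(source, first_party):
    """Return the first-party modules imported by a piece of source code."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        found.update(name for name in names if name in first_party)
    return found

def bundle_order(sources, entry):
    """Order the modules reachable from `entry` so that dependencies come first."""
    graph, pending = {}, [entry]
    while pending:
        module = pending.pop()
        if module in graph:
            continue
        # Only modules reached from the entry point join the graph (tree shaking)
        graph[module] = first_party_imports(sources[module], sources.keys())
        pending.extend(graph[module])
    return list(TopologicalSorter(graph).static_order())

sources = {
    "main": "from utils.helpers import greet\nfrom models.user import User\n",
    "utils.helpers": "import os\n",
    "models.user": "from utils.helpers import greet\n",
    "unused.module": "import sys\n",
}
order = bundle_order(sources, "main")
print(order)  # → ['utils.helpers', 'models.user', 'main']
```

`unused.module` never appears in the result because nothing reachable from `main` imports it, which is the effect the tree-shaking step describes.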

## Output Structure

The bundled output follows this structure:

```python
#!/usr/bin/env python3
# Generated by Serpen - Python Source Bundler

# Preserved imports (stdlib and third-party)
import os
import sys
import requests

# ─ Module: utils/helpers.py ─
def greet(name: str) -> str:
    return f"Hello, {name}!"

# ─ Module: models/user.py ─
class User:
    def __init__(self, name: str):
        self.name = name

# ─ Entry Module: main.py ─
from utils.helpers import greet
from models.user import User

def main():
    user = User("Alice")
    print(greet(user.name))

if __name__ == "__main__":
    main()
```

## Use Cases

### PySpark Jobs

Deploy complex PySpark applications as a single file:

```bash
serpen --entry spark_job.py --output dist/spark_job_bundle.py --emit-requirements
spark-submit dist/spark_job_bundle.py
```

### AWS Lambda

Bundle a Lambda handler's first-party modules into a single file and emit a `requirements.txt` for its third-party dependencies:

```bash
serpen --entry lambda_handler.py --output deployment/handler.py --emit-requirements
# Upload handler.py + requirements.txt to Lambda
```

### Jupyter Notebooks

Bundle a script from inside a notebook using the Python API:

```python
# In your notebook
from serpen import Bundler
bundler = Bundler()
bundler.bundle("my_analysis.py", "notebook_bundle.py")
```

### Code Cleanup

Clean up unused imports in development:

```bash
# Review what imports would be removed
serpen trim src/**/*.py --dry-run

# Clean up entire codebase
find src -name "*.py" -exec serpen trim {} \;
```
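
The idea behind unused import detection can be sketched with Python's `ast` module. This is a simplified illustration, not the logic of `serpen trim`: the helper name `unused_imports` is hypothetical, and the sketch ignores cases such as `__all__` re-exports and names that appear only inside string annotations.

```python
import ast

def unused_imports(source):
    """Return names bound by imports that are never referenced in the source."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports bind unknown names, so skip them
                # `import a.b` binds `a`; `import a.b as c` binds `c`
                imported.add(alias.asname or alias.name.split(".")[0])
    # Every bare identifier in the module, including the base of `os.path`
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)

example = "import os\nimport sys\nfrom typing import List\n\nprint(sys.argv)\n"
print(unused_imports(example))  # → ['List', 'os']
```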

## Special Considerations

### Pydantic Compatibility

Serpen preserves class identity and module structure to ensure Pydantic models work correctly:

```python
# Original: models/user.py
from pydantic import BaseModel

class User(BaseModel):
    name: str

# Bundled output preserves __module__ and class structure
```

### Pandera Decorators

Function and class decorators are preserved with their original module context:

```python
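# Original: validators/schemas.py
import pandera as pa
from pandera.typing import DataFrame

# UserSchema is assumed to be a pandera schema model defined elsewhere in this module
@pa.check_types
def validate_dataframe(df: DataFrame[UserSchema]) -> DataFrame[UserSchema]:
    return df

# Bundled output maintains decorator functionality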
```

### Circular Dependencies

Serpen detects circular imports and reports them as errors:

```text
Error: Circular dependency detected involving module: utils.helpers
```
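
For intuition, cycle detection on an import graph can be reproduced with the standard library's `graphlib`. This is a sketch of the general technique, not Serpen's own detector; the helper name `find_cycle` and the sample graphs are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

def find_cycle(graph):
    """Return the modules forming a cycle, or None if the graph is acyclic."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError as error:
        # args[1] holds the cycle as a list whose first and last nodes match
        return error.args[1]
    return None

# Each key maps a module to the set of modules it imports
cyclic = {"utils.helpers": {"models.user"}, "models.user": {"utils.helpers"}}
acyclic = {"main": {"utils.helpers"}, "utils.helpers": set()}

print(find_cycle(cyclic))
print(find_cycle(acyclic))
```

A cycle like the first graph is usually resolved by moving the shared code into a third module that both sides import.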

## Comparison with Other Tools

| Tool        | Language | Tree Shaking | Import Cleanup | PySpark Ready | Type Hints |
| ----------- | -------- | ------------ | -------------- | ------------- | ---------- |
| Serpen      | Rust     | ✅           | ✅             | ✅            | ✅         |
| PyInstaller | Python   | ❌           | ❌             | ❌            | ✅         |
| Nuitka      | Python   | ❌           | ❌             | ❌            | ✅         |
| Pex         | Python   | ❌           | ❌             | ❌            | ✅         |

## Development

### Building from Source

```bash
git clone https://github.com/tinovyatkin/serpen.git
cd serpen

# Build Rust CLI
cargo build --release

# Build Python package
pip install maturin
maturin develop

# Run tests
cargo test
```

### Project Structure

```text
serpen/
├── src/                    # Rust source code
│   ├── main.rs            # CLI entry point
│   ├── bundler.rs         # Core bundling logic
│   ├── resolver.rs        # Import resolution
│   ├── emit.rs            # Code generation
│   └── ...
├── python/serpen/         # Python package
├── tests/                 # Test suites
│   └── fixtures/          # Test projects
├── docs/                  # Documentation
└── Cargo.toml            # Rust dependencies
```

## Contributing

### Development Setup

```bash
# Clone the repository
git clone https://github.com/tinovyatkin/serpen.git
cd serpen

# Install coverage tooling (requires an existing Rust toolchain via rustup)
rustup component add llvm-tools-preview
cargo install cargo-llvm-cov

# Build Rust CLI
cargo build --release

# Build Python package
pip install maturin
maturin develop

# Run tests
cargo test
```

### Code Coverage

The project uses `cargo-llvm-cov` for code coverage analysis:

```bash
# Generate text coverage report (Istanbul-style)
cargo coverage-text

# Generate HTML coverage report and open in browser
cargo coverage

# Generate LCOV format for CI
cargo coverage-lcov

# Clean coverage data
cargo coverage-clean
```

**Branch Coverage (Experimental)**:

```bash
# Requires nightly Rust for branch coverage
cargo +nightly coverage-branch
```

Coverage reports are automatically generated in CI and uploaded to Codecov. See [`docs/coverage.md`](docs/coverage.md) for detailed coverage documentation.

**Note**: If you see zeros in the "Branch Coverage" column in HTML reports, this is expected with stable Rust. Branch coverage requires nightly Rust and is experimental.

### Contributing Guidelines

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- **Ruff**: Import resolution logic inspiration
- **RustPython**: Python AST parsing
- **Maturin**: Python-Rust integration

## Roadmap

- [ ] Source maps for debugging
- [ ] Parallel processing
- [ ] Package flattening mode
- [ ] Comment and type hint stripping
- [ ] Plugin system for custom transformations

---

For more examples and detailed documentation, visit our [documentation site](https://github.com/tinovyatkin/serpen#readme).

For detailed documentation on the unused import trimmer, see [`docs/unused_import_trimmer.md`](docs/unused_import_trimmer.md).

