MIT License

Copyright (c) 2025 SMDPfier Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

# SMDPfier

Add SMDP-level behavior to Gymnasium environments with options and durations.

## Overview

SMDPfier is a Gymnasium wrapper that adds Semi-Markov Decision Process (SMDP) behavior to an environment. Instead of single primitive actions, the agent chooses Options (sequences of primitive actions), and a duration in integer ticks is attached per option or per primitive action. The wrapper calls `env.step` once per primitive action and reports both the planned and the executed duration, so learners can apply true SMDP discounting (e.g., γ^{ticks}).
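To make the execution model concrete, here is a minimal, self-contained sketch of the idea, not SMDPfier's actual implementation. `CountdownEnv`, `run_option`, the summed reward, the `duration_planned` key, and the rule that executed ticks scale with the number of executed steps are all assumptions made for illustration; only `k_exec` and `duration_exec` are names that appear in this README.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

StepResult = Tuple[Any, float, bool, bool, dict]


class CountdownEnv:
    """Toy stand-in for a Gymnasium env (hypothetical): ends after `horizon` steps."""

    def __init__(self, horizon: int) -> None:
        self.remaining = horizon

    def step(self, action: int) -> StepResult:
        self.remaining -= 1
        return self.remaining, 1.0, self.remaining <= 0, False, {}


def run_option(
    step_fn: Callable[[int], StepResult],
    actions: Sequence[int],
    ticks_per_action: int,
) -> Tuple[Any, float, bool, bool, Dict[str, int]]:
    """Run one option: one step call per primitive action, stopping early if the episode ends."""
    obs: Any = None
    total_reward = 0.0  # assumption: rewards are simply summed over the option
    terminated = truncated = False
    k_exec = 0
    for action in actions:
        obs, reward, terminated, truncated, _ = step_fn(action)
        total_reward += reward
        k_exec += 1
        if terminated or truncated:
            break  # the option was interrupted before all actions ran
    smdp_info = {
        "k_exec": k_exec,
        "duration_planned": ticks_per_action * len(actions),  # hypothetical key
        "duration_exec": ticks_per_action * k_exec,
    }
    return obs, total_reward, terminated, truncated, smdp_info


env = CountdownEnv(horizon=2)
_, total, done, _, smdp = run_option(env.step, [0, 0, 1], ticks_per_action=5)
print(total, done, smdp)
```

In this toy run the episode ends after two of the three planned actions, so the planned and executed durations differ, which is the case the two separate duration fields exist for.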

## Quick Start

```python
import gymnasium as gym
from smdpfier import SMDPfier, Option
from smdpfier.defaults import ConstantOptionDuration

# Create environment
env = gym.make("CartPole-v1")

# Define static options
options = [
    Option(actions=[0, 0, 1], name="left-left-right"),
    Option(actions=[1, 1, 0], name="right-right-left"),
]

# Wrap with SMDPfier
env = SMDPfier(
    env,
    options_provider=options,
    duration_fn=ConstantOptionDuration(10),
    action_interface="index"
)

# Use it
obs, info = env.reset()
action = 0  # Choose first option
obs, reward, terminated, truncated, info = env.step(action)

print(f"Executed {info['smdp']['k_exec']} steps")
print(f"Duration: {info['smdp']['duration_exec']} ticks")
```
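The reported duration is what a learner would plug into the discount. The sketch below shows one way to do that in plain Python; `smdp_q_target` and `discounted_return` are hypothetical helper names, not part of SMDPfier, and the code assumes the reward passed in is whatever the wrapper returned for the whole option.

```python
from typing import Iterable, Tuple


def smdp_q_target(
    reward: float,
    next_value: float,
    duration_ticks: int,
    gamma: float = 0.99,
    terminated: bool = False,
) -> float:
    """One-step SMDP target: reward + gamma**duration * V(next state)."""
    if terminated:
        return reward  # no bootstrapping past a terminal state
    return reward + (gamma ** duration_ticks) * next_value


def discounted_return(transitions: Iterable[Tuple[float, int]], gamma: float) -> float:
    """Return of a trajectory of (reward, duration_ticks) pairs.

    Each reward is discounted by the ticks elapsed before its option started.
    """
    total = 0.0
    elapsed = 0
    for reward, ticks in transitions:
        total += (gamma ** elapsed) * reward
        elapsed += ticks
    return total


# Two options of 10 ticks each, as with ConstantOptionDuration(10) above.
print(discounted_return([(1.0, 10), (1.0, 10)], gamma=0.99))
print(smdp_q_target(reward=1.0, next_value=2.0, duration_ticks=10))
```

With a Quick Start style loop, `duration_ticks` would be read from `info["smdp"]["duration_exec"]` after each `env.step`.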

## Features

- **Flexible Options**: Static sequences or dynamic discovery via callable
- **Two Interfaces**: Index-based (Discrete actions) or direct Option passing
- **Duration Metadata**: Integer ticks for true SMDP discounting
- **Action Masking**: Support for discrete action availability
- **Rich Info**: Detailed execution metadata in `info["smdp"]`
- **Error Handling**: Comprehensive validation and runtime error reporting
- **Continuous Actions**: Full support for continuous action spaces
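As an illustration of how a learner might consume an action mask with the index interface, the following sketch picks only among available option indices. It is generic Python, not SMDPfier code: the mask is assumed to be a sequence of booleans with one entry per option, and how the wrapper actually exposes it is not specified in this README.

```python
import random
from typing import Sequence


def sample_available_option(mask: Sequence[bool], rng: random.Random) -> int:
    """Uniformly sample an option index among those the mask marks as available."""
    valid = [i for i, available in enumerate(mask) if available]
    if not valid:
        raise ValueError("no option is available in this state")
    return rng.choice(valid)


def masked_argmax(values: Sequence[float], mask: Sequence[bool]) -> int:
    """Greedy choice restricted to available options."""
    valid = [i for i, available in enumerate(mask) if available]
    if not valid:
        raise ValueError("no option is available in this state")
    return max(valid, key=lambda i: values[i])


rng = random.Random(0)
mask = [False, True, True]
print(sample_available_option(mask, rng))
print(masked_argmax([5.0, 1.0, 3.0], mask))
```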

## Installation

```bash
pip install smdpfier
```

For development:

```bash
git clone https://github.com/smdpfier/smdpfier.git
cd smdpfier
pip install -e ".[dev]"
```

## Documentation

Full documentation is available at: https://smdpfier.readthedocs.io

## License

Released under the MIT License. See the [LICENSE](LICENSE) file.

## Contributing

Contributions are welcome! Please see our contributing guidelines and make sure the tests, linter, and type checker all pass:

```bash
pytest
ruff check
mypy smdpfier
```
