Skip to content

API Reference

Complete reference for SMDPfier classes, functions, and configurations.

Quick Start Examples

Basic SMDPfier Setup

from smdpfier import SMDPfier, Option
from smdpfier.defaults import sum_rewards
import gymnasium as gym

# Basic setup with static options
env = gym.make("CartPole-v1")
options = [
    Option([0, 1], "left-right"),     # 2 actions = 2 ticks
    Option([1, 0], "right-left")      # 2 actions = 2 ticks
]

smdp_env = SMDPfier(
    env,
    options_provider=options,
    action_interface="index",
    max_options=len(options)
)

Dynamic Options with Built-in Generators

from smdpfier.defaults.options import RandomStaticLen

smdp_env = SMDPfier(
    env,
    options_provider=RandomStaticLen(length=3, num_options=8),
    action_interface="index",
    max_options=8
)

Core Classes

SMDPfier

Primary wrapper class that transforms any Gymnasium environment into an SMDP.

class SMDPfier(gym.Wrapper):
    def __init__(
        self,
        env: gym.Env,
        *,
        options_provider: Callable[[Any, dict], list[Option]] | Sequence[Option],
        reward_agg: Callable[[list[float]], float] = sum_rewards,
        action_interface: Literal["index", "direct"] = "index",
        max_options: int | None = None,
        availability_fn: Optional[Callable[[Any], Iterable[int]]] = None,
        precheck: bool = False,
        rng_seed: int | None = None,
    )

Parameters

Parameter Type Required Default Description
env gym.Env - Base Gymnasium environment
options_provider Callable or Sequence[Option] - Static options or dynamic generator
reward_agg Callable sum_rewards How to aggregate per-step rewards
action_interface "index" or "direct" "index" Action selection interface
max_options int or None None Max options (required for index interface)
availability_fn Callable or None None Action masking function
precheck bool False Validate options before execution
rng_seed int or None None Random seed for reproducibility

Methods

step(action: int | Option) -> tuple[obs, reward, terminated, truncated, info]

Execute an option in the environment.

  • Index interface: action is integer index
  • Direct interface: action is Option object

reset(**kwargs) -> tuple[obs, info]

Reset the environment and return initial observation and info.

close()

Close the environment.

Option

Represents a sequence of primitive actions with metadata.

class Option:
    def __init__(
        self,
        actions: Sequence[Any],
        name: str,
        meta: dict | None = None
    )

Parameters

Parameter Type Required Description
actions Sequence[Any] Sequence of primitive actions
name str Human-readable option name
meta dict or None Additional metadata

Properties

  • actions: The action sequence
  • name: Human-readable name
  • meta: User-defined metadata dictionary
  • id: Stable hash-based identifier (computed from actions + name)

Examples

# Discrete actions
option1 = Option([0, 1, 0], "left-right-left")

# Continuous actions
option2 = Option([[-1.0], [0.5], [2.0]], "continuous-sequence")

# With metadata
option3 = Option(
    actions=[0, 0, 1, 1], 
    name="double-pairs",
    meta={"category": "symmetric", "difficulty": "easy"}
)

print(option1.id)  # Stable hash: "a1b2c3..."

SMDP Info Payload Structure

Every step() call returns comprehensive metadata in info["smdp"]:

{
    "option": {
        "id": "abc123...",              # Stable hash-based ID
        "name": "left-right-left",      # Human-readable name
        "len": 3,                       # Number of primitive actions
        "meta": {"category": "test"}    # User metadata (if any)
    },
    "k_exec": 3,                        # Primitive steps actually executed
    "duration": 3,                      # Duration in ticks (= k_exec)
    "rewards": [1.0, 1.0, 1.0],        # Per-step rewards
    "terminated_early": False,          # Whether episode ended during option
    "action_mask": [1, 1, 0, 1],       # Available option indices (index interface only)
    "num_dropped": 0                    # Options dropped due to overflow (index interface only)
}

Info Fields Detail

Field Type Description
option.id str Stable identifier (hash of actions + name)
option.name str Human-readable option name
option.len int Total number of primitive actions in option
option.meta dict User-defined metadata
k_exec int Number of primitive steps actually executed
duration int Duration in ticks (always equals k_exec)
rewards list[float] Per-primitive-step rewards
terminated_early bool True if episode ended before option completed
action_mask list[int] Binary mask of available options (index interface)
num_dropped int Number of options dropped due to overflow (index interface)

Action Interfaces

Index Interface

Action Interfaces

Index Interface

Transforms the action space to Discrete(max_options) where actions are integer indices.

Configuration:

env = SMDPfier(
    base_env,
    options_provider=options,
    action_interface="index",
    max_options=len(options)  # Required
)

# Usage
action = 1  # Select second option
obs, reward, term, trunc, info = env.step(action)

Features: - Built-in action masking via info["smdp"]["action_mask"] - Overflow handling for dynamic options - Seamless RL algorithm integration

Direct Interface

Allows passing Option objects directly to step().

Configuration:

env = SMDPfier(
    base_env,
    options_provider=options,
    action_interface="direct"
    # No max_options needed
)

# Usage
option = options[1]  # Select option object
obs, reward, term, trunc, info = env.step(option)

Features: - Intuitive option selection - Full control over option choice - Works naturally with continuous actions

Built-in Defaults

Option Generators

RandomStaticLen - Generate random options with fixed length:

from smdpfier.defaults.options import RandomStaticLen

generator = RandomStaticLen(
    length=3,                    # Fixed option length (= 3 ticks)
    action_space_size=4,         # Discrete action space size (auto-detected if None)
    num_options=10,              # Number of options to generate
    rng_seed=42                  # Random seed
)

RandomVarLen - Generate random options with variable length:

from smdpfier.defaults.options import RandomVarLen

generator = RandomVarLen(
    min_length=2,                # Minimum option length (= 2 ticks)
    max_length=5,                # Maximum option length (= 5 ticks)
    action_space_size=4,         # Discrete action space size
    num_options=8,               # Number of options to generate
    rng_seed=42                  # Random seed
)

Reward Aggregation

from smdpfier.defaults import sum_rewards, mean_rewards, discounted_sum

# Sum all per-step rewards (default)
reward_agg = sum_rewards

# Average per-step rewards
reward_agg = mean_rewards

# Discount per-step rewards with γ
reward_agg = discounted_sum(gamma=0.99)

Configuration Patterns

Static Options with Index Interface

options = [
    Option([0, 1], "left-right"),                # 2 ticks
    Option([1, 0], "right-left"),                # 2 ticks
    Option([0, 0, 1], "left-left-right"),        # 3 ticks
]

env = SMDPfier(
    base_env,
    options_provider=options,
    action_interface="index",
    max_options=len(options)
)

Dynamic Options with Masking

def dynamic_generator(obs, info):
    # Generate options based on current state
    if obs[0] > 0:
        return [Option([0], "left"), Option([0, 0], "double-left")]
    else:
        return [Option([1], "right"), Option([1, 1], "double-right")]

def availability_mask(obs):
    # Mask options based on state
    return [0, 1] if obs[1] > 0.5 else [0, 1]  # All available

env = SMDPfier(
    base_env,
    options_provider=dynamic_generator,
    action_interface="index",
    max_options=5,
    availability_fn=availability_mask
)

Continuous Actions with Direct Interface

continuous_options = [
    Option([[-1.0], [0.0]], "left-center"),            # 2 ticks
    Option([[0.5], [1.0]], "gentle-hard-right"),       # 2 ticks
    Option([[-2.0], [2.0], [0.0]], "extreme-swing"),   # 3 ticks
]

env = SMDPfier(
    gym.make("Pendulum-v1"),
    options_provider=continuous_options,
    action_interface="direct"
)

Error Handling

SMDPfier provides detailed error context through specialized exceptions:

from smdpfier.errors import SMDPOptionValidationError, SMDPOptionExecutionError

try:
    obs, reward, term, trunc, info = env.step(action)
except SMDPOptionValidationError as e:
    print(f"Precheck failed for option '{e.option_name}' at step {e.failing_step_index}")
    print(f"Action: {e.action_repr}, State: {e.short_obs_summary}")
except SMDPOptionExecutionError as e:
    print(f"Runtime error for option '{e.option_name}' at step {e.failing_step_index}")
    print(f"Underlying error: {e.base_error}")

See Error Handling for complete details.

Custom Functions

Custom Options Provider

def custom_options_provider(obs, info):
    """Generate options based on observation and info."""
    # Access current state
    position = obs[0]

    # Access action space if needed
    action_space = info.get("action_space")

    # Access action mask if available
    action_mask = info.get("action_mask")

    # Generate options
    options = []
    if position > 0:
        options.append(Option([0, 0], "strong-left"))
    if position < 0:
        options.append(Option([1, 1], "strong-right"))

    return options

Custom Duration Function

def custom_duration_fn(option, obs, info):
    """Compute duration based on option and state."""
    base_duration = len(option.actions) * 2

    # State-dependent adjustment
    if obs[0] > 0.5:  # Far right position
        return base_duration + 3  # Takes longer
    else:
        return base_duration

# Can return scalar (int) or list (list[int])
def per_action_duration_fn(option, obs, info):
    """Return duration for each action."""
    durations = []
    for action in option.actions:
        if action == 0:  # Left action
            durations.append(2)
        else:  # Right action
            durations.append(5)
    return durations

Custom Availability Function

def custom_availability_fn(obs):
    """Return available option indices based on state."""
    position, velocity = obs[0], obs[1]

    available = []

    # Always allow basic options
    available.extend([0, 1])

    # Complex options only when stable
    if abs(velocity) < 0.1:
        available.extend([2, 3, 4])

    return available

Performance Tips

Efficient Option Generation

# Pre-compute static options when possible
static_options = [Option([0, 1], f"option_{i}") for i in range(10)]

# Cache dynamic options when state doesn't change much
class CachedOptionsProvider:
    def __init__(self):
        self._cache = {}

    def __call__(self, obs, info):
        state_key = tuple(obs[:2])  # Use subset of observation as key
        if state_key not in self._cache:
            self._cache[state_key] = generate_options_for_state(obs)
        return self._cache[state_key]

Memory-Efficient Duration Functions

# Use generators for large option sets
def memory_efficient_duration_fn(option, obs, info):
    # Compute duration on demand rather than storing
    return len(option.actions) * compute_action_cost(obs)

Action Masking

Action masking in SMDPfier is handled through the availability_fn parameter and works exclusively with the index interface. See the Masking and Precheck guide for comprehensive examples.

def availability_fn(obs):
    """Return list of available option indices."""
    # Return indices of valid options based on current state
    return [0, 2, 3]  # Options 1 is masked out

env = SMDPfier(
    base_env,
    options_provider=options,
    duration_fn=duration_fn,
    action_interface="index",
    availability_fn=availability_fn
)

# Action mask appears in info
obs, info = env.reset()
mask = info["smdp"]["action_mask"]  # e.g., [1, 0, 1, 1]

See Also: - Duration Guide - Understanding ticks and SMDP discounting - Interface Guide - Choosing index vs direct - Error Handling - Debugging failed options - Examples - Complete working examples