Index vs Direct Interfaces
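SMDPfier provides two interfaces for option selection: the index interface, where each action is an integer index into the option list, and the direct interface, where each action is an Option object. Which one fits depends mainly on whether an RL agent or a script is choosing the options.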
Interface Comparison
| Aspect | Index Interface | Direct Interface |
|---|---|---|
| Action Space | Discrete(max_options) | Same as Option actions |
| Actions | Integer indices (0, 1, 2, ...) | Option objects |
| Best For | Reinforcement Learning | Scripting/Testing |
| Action Masking | Built-in support | Not applicable |
| Dynamic Options | Supported with overflow handling | Straightforward |
| RL Integration | Seamless | Requires adaptation |
| Debugging | Index-based (less intuitive) | Object-based (more intuitive) |
Index Interface
The index interface transforms SMDPfier into a discrete action space where each action is an integer index selecting an available option.
When to Use Index Interface
✅ Choose Index Interface When:
- Training RL agents (most algorithms expect discrete actions)
- You need action masking based on environment state
- Working with dynamic option sets
- Integrating with existing RL frameworks (Stable-Baselines3, RLLib, etc.)
- You want built-in overflow handling for dynamic options
Basic Index Interface Setup
import gymnasium as gym
from smdpfier import SMDPfier, Option
from smdpfier.defaults import ConstantOptionDuration
# Define static options
options = [
    Option([0, 0, 1], "left-left-right"),
    Option([1, 1, 0], "right-right-left"),
    Option([0, 1, 0], "left-right-left"),
]
# Create SMDPfier with index interface
env = SMDPfier(
    gym.make("CartPole-v1"),
    options_provider=options,
    duration_fn=ConstantOptionDuration(5),
    action_interface="index",  # Default
    max_options=len(options),  # Must specify for static options
)
print(f"Original action space: Discrete(2)")
print(f"SMDP action space: {env.action_space}") # Discrete(3)
# Use integer actions
obs, info = env.reset()
action = 1 # Select second option ("right-right-left")
obs, reward, term, trunc, info = env.step(action)
print(f"Executed option: {info['smdp']['option']['name']}")
Action Masking with Index Interface
Action masking allows you to restrict which options are available based on the current environment state.
def cart_availability(obs):
    """Restrict options based on cart position."""
    cart_position = obs[0]
    if cart_position > 0.5:
        return [0, 2]  # Only left-based options when cart is far right
    elif cart_position < -0.5:
        return [1]  # Only right-based options when cart is far left
    else:
        return [0, 1, 2]  # All options available in center

env = SMDPfier(
    gym.make("CartPole-v1"),
    options_provider=options,
    duration_fn=ConstantOptionDuration(5),
    action_interface="index",
    max_options=3,
    availability_fn=cart_availability,
)
obs, info = env.reset()
print(f"Available actions: {info['smdp']['action_mask']}")
# Might show: [1, 1, 1] (all available) or [1, 0, 1] (middle option masked)
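The relationship between the index list returned by an availability function and the 0/1 mask shown above can be sketched in plain Python. This is a minimal illustration that assumes the mask is a list of length max_options with 1 for available indices; the helper names `indices_to_mask` and `sample_masked` are hypothetical and not part of SMDPfier.

```python
import random


def indices_to_mask(available, max_options):
    """Convert a list of available option indices into a 0/1 mask."""
    mask = [0] * max_options
    for i in available:
        if not 0 <= i < max_options:
            raise ValueError(f"index {i} outside [0, {max_options})")
        mask[i] = 1
    return mask


def sample_masked(mask, rng=random):
    """Pick a uniformly random index whose mask entry is 1."""
    valid = [i for i, m in enumerate(mask) if m]
    if not valid:
        raise ValueError("no option is available")
    return rng.choice(valid)


mask = indices_to_mask([0, 2], max_options=3)
print(mask)                 # [1, 0, 1]
print(sample_masked(mask))  # 0 or 2
```

A random policy like `sample_masked` is handy as a baseline: it never selects a masked option, so any invalid-action error during a rollout points at the mask itself rather than the policy.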
Dynamic Options with Index Interface
Dynamic options change based on the current state, requiring careful overflow handling.
from smdpfier.defaults.options import RandomStaticLen
def dynamic_options_generator(obs, info):
    """Generate different options based on cart velocity."""
    velocity = obs[1]
    if abs(velocity) > 0.1:  # Fast movement
        # Short options for quick corrections
        return [
            Option([0], "quick-left"),
            Option([1], "quick-right"),
        ]
    else:  # Slow movement
        # Longer options for building momentum
        return [
            Option([0, 0, 0], "triple-left"),
            Option([1, 1, 1], "triple-right"),
            Option([0, 1, 0], "left-right-left"),
            Option([1, 0, 1], "right-left-right"),
        ]

env = SMDPfier(
    gym.make("CartPole-v1"),
    options_provider=dynamic_options_generator,
    duration_fn=ConstantOptionDuration(3),
    action_interface="index",
    max_options=4,  # Maximum expected options
)
obs, info = env.reset()
# If generator returns 2 options but max_options=4:
# - Actions 0,1 are valid
# - Actions 2,3 are masked out
# - info["smdp"]["action_mask"] = [1, 1, 0, 0]
Overflow Handling
When a dynamic generator returns more than max_options options, SMDPfier applies the truncate policy by default:
# Generator returns 6 options, but max_options=4
available_options = dynamic_options_generator(obs, info)  # Suppose this returns 6 options
# Result: First 4 options are used, 2 are dropped
# info["smdp"]["num_dropped"] = 2
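The padding and truncation behavior described in the last two sections can be mirrored in a few lines of plain Python. This is only a sketch of the described behavior (keep the first max_options entries, mask out unused slots, count what was dropped), not SMDPfier's actual implementation; `fit_to_max_options` is a hypothetical name.

```python
def fit_to_max_options(options, max_options):
    """Truncate an option list to max_options and build the matching 0/1 mask."""
    kept = list(options[:max_options])
    num_dropped = max(0, len(options) - max_options)
    # Slots beyond the kept options are padded with 0 so the mask
    # always has exactly max_options entries.
    mask = [1] * len(kept) + [0] * (max_options - len(kept))
    return kept, mask, num_dropped


# Too many options: the first 4 are kept, 2 are dropped.
kept, mask, dropped = fit_to_max_options(["a", "b", "c", "d", "e", "f"], 4)
print(kept, mask, dropped)  # ['a', 'b', 'c', 'd'] [1, 1, 1, 1] 2

# Too few options: the unused indices are masked out.
kept, mask, dropped = fit_to_max_options(["a", "b"], 4)
print(kept, mask, dropped)  # ['a', 'b'] [1, 1, 0, 0] 0
```

Because truncation keeps the head of the list, a generator that may overflow should return its most important options first.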
Direct Interface
The direct interface allows you to pass Option objects directly to env.step(), providing an intuitive and flexible approach.
When to Use Direct Interface
✅ Choose Direct Interface When:
- Scripting or testing specific option sequences
- Building non-RL controllers or heuristics
- Debugging option behavior
- Prototyping before RL training
- You need full control over option selection
Basic Direct Interface Setup
import gymnasium as gym
from smdpfier import SMDPfier, Option
from smdpfier.defaults import ConstantActionDuration
# Define options
options = [
    Option([0, 0, 1], "left-left-right"),
    Option([1, 1, 0], "right-right-left"),
    Option([0, 1, 0, 1], "alternating"),
]
# Create SMDPfier with direct interface
env = SMDPfier(
    gym.make("CartPole-v1"),
    options_provider=options,
    duration_fn=ConstantActionDuration(2),  # 2 ticks per action
    action_interface="direct",
)
print(f"Action space: {env.action_space}") # Same as original env
# Use Option objects directly
obs, info = env.reset()
option = options[1] # Select "right-right-left"
obs, reward, term, trunc, info = env.step(option)
print(f"Executed {info['smdp']['k_exec']} steps")
print(f"Duration: {info['smdp']['duration_exec']} ticks")
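To make the k_exec and duration fields more concrete, here is a toy sketch of what executing an option amounts to: apply the primitive actions in order and stop early if the episode ends. It uses a stand-in environment rather than SMDPfier or Gymnasium, and it assumes that a per-action duration scales with the number of steps actually executed; the source does not state how SMDPfier computes duration_exec after an early stop, so treat that part as an assumption. `run_option` and `CountdownEnv` are hypothetical names.

```python
def run_option(step_fn, actions, ticks_per_action):
    """Apply primitive actions in order, stopping early if the episode ends.

    Returns (total_reward, k_exec, duration, done).
    """
    total_reward, k_exec, done = 0.0, 0, False
    for action in actions:
        reward, done = step_fn(action)
        total_reward += reward
        k_exec += 1
        if done:
            break
    # Assumption: duration is proportional to the steps actually executed.
    return total_reward, k_exec, k_exec * ticks_per_action, done


class CountdownEnv:
    """Toy environment: reward 1.0 per step, episode ends after `horizon` steps."""

    def __init__(self, horizon):
        self.remaining = horizon

    def step(self, action):
        self.remaining -= 1
        return 1.0, self.remaining <= 0


# Option runs to completion: 3 steps at 2 ticks each.
print(run_option(CountdownEnv(horizon=10).step, [1, 1, 0], ticks_per_action=2))
# (3.0, 3, 6, False)

# Episode ends after 2 steps, so the third action is never applied.
print(run_option(CountdownEnv(horizon=2).step, [1, 1, 0], ticks_per_action=2))
# (2.0, 2, 4, True)
```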
Dynamic Options with Direct Interface
def get_option_for_state(obs):
    """Select option based on current state."""
    cart_position, cart_velocity = obs[0], obs[1]
    if cart_position > 0 and cart_velocity > 0:
        return Option([0, 0], "strong-left")  # Moving right, correct strongly
    elif cart_position < 0 and cart_velocity < 0:
        return Option([1, 1], "strong-right")  # Moving left, correct strongly
    else:
        return Option([0, 1], "gentle-correction")  # Gentle correction

# Simple control loop
obs, info = env.reset()
for step in range(100):
    option = get_option_for_state(obs)
    obs, reward, term, trunc, info = env.step(option)
    if term or trunc:
        break
Continuous Actions with Direct Interface
The direct interface works seamlessly with continuous action spaces:
import gymnasium as gym
from smdpfier import SMDPfier, Option
from smdpfier.defaults import ConstantOptionDuration
# Continuous action options
continuous_options = [
    Option([[-1.0], [0.0], [1.0]], "left-center-right"),
    Option([[0.5], [0.5]], "gentle-right"),
    Option([[-2.0]], "hard-left"),
]
env = SMDPfier(
    gym.make("Pendulum-v1"),
    options_provider=continuous_options,
    duration_fn=ConstantOptionDuration(5),
    action_interface="direct",
)
obs, info = env.reset()
option = continuous_options[0]
obs, reward, term, trunc, info = env.step(option)
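When hand-writing continuous options, it is easy to list a value outside the environment's action bounds. The sketch below is a standalone bounds check in plain Python; it is not SMDPfier's precheck mechanism, and `option_within_bounds` is a hypothetical helper. The bounds used in the example are those of Pendulum-v1, whose torque action lies in [-2.0, 2.0].

```python
def option_within_bounds(actions, low, high):
    """Return True if every component of every primitive action lies in [low, high]."""
    return all(low <= x <= high for action in actions for x in action)


# Pendulum-v1 torque bounds.
LOW, HIGH = -2.0, 2.0

print(option_within_bounds([[-1.0], [0.0], [1.0]], LOW, HIGH))  # True
print(option_within_bounds([[-2.0]], LOW, HIGH))                # True (on the boundary)
print(option_within_bounds([[-2.5]], LOW, HIGH))                # False
```

Running a check like this when the option list is built surfaces an out-of-range value before any episode starts.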
Interface Selection Guide
Choose Index Interface When:
# RL Training
env = SMDPfier(..., action_interface="index", max_options=10)
agent.learn(env)  # Placeholder agent; works with algorithms that accept a Discrete action space
# Action Masking Needed
env = SMDPfier(..., action_interface="index", availability_fn=mask_fn)
# Dynamic Options with Overflow
env = SMDPfier(..., action_interface="index", max_options=20)
# Handles varying option counts gracefully
Choose Direct Interface When:
# Scripted Control
for situation in test_cases:
    option = select_option_for_situation(situation)
    obs, reward, term, trunc, info = env.step(option)
# Debugging Specific Options
problem_option = Option([0, 1, 0], "problematic-sequence")
obs, reward, term, trunc, info = env.step(problem_option)
# Prototyping Before RL
def human_policy(obs):
    return Option([best_action_for(obs)], "human-choice")
Configuration Examples
Index Interface Configuration
# Static options with masking
env = SMDPfier(
    base_env,
    options_provider=static_options,
    duration_fn=ConstantOptionDuration(10),
    action_interface="index",
    max_options=len(static_options),
    availability_fn=masking_function,
    precheck=True,  # Validate options before execution
)
# Dynamic options with overflow handling
env = SMDPfier(
    base_env,
    options_provider=dynamic_generator,
    duration_fn=RandomActionDuration(2, 5),
    action_interface="index",
    max_options=15,  # Allow up to 15 options
    # Overflow: truncate to first 15, record num_dropped
)
Direct Interface Configuration
# Simple direct interface
env = SMDPfier(
    base_env,
    options_provider=option_list,
    duration_fn=ConstantActionDuration(3),
    action_interface="direct",
    # No max_options needed
    # No availability_fn needed
)
Summary
| Use Case | Recommended Interface | Why |
|---|---|---|
| RL Training | Index | Discrete action space expected |
| Action Masking | Index | Built-in masking support |
| Dynamic Options | Index | Overflow handling |
| Scripting/Testing | Direct | Intuitive option passing |
| Debugging | Direct | Clear option identification |
| Continuous Actions | Direct | Natural continuous support |
| Prototyping | Direct | Flexible experimentation |
Most Common Pattern:
1. Start with the direct interface for prototyping and testing
2. Switch to the index interface when training RL agents
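One way to carry a scripted policy across that switch is to have it choose options by name and translate the name into an index. The sketch below uses a small stand-in dataclass instead of smdpfier.Option so that it runs on its own; `NamedOption`, `build_name_index`, and `scripted_policy` are hypothetical names, and the real Option class may expose its name differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamedOption:
    """Stand-in for an option: a sequence of primitive actions plus a name."""
    actions: tuple
    name: str


def build_name_index(options):
    """Map each option name to its position in the option list."""
    index = {}
    for i, opt in enumerate(options):
        if opt.name in index:
            raise ValueError(f"duplicate option name: {opt.name}")
        index[opt.name] = i
    return index


options = [
    NamedOption((0, 0, 1), "left-left-right"),
    NamedOption((1, 1, 0), "right-right-left"),
]
name_to_index = build_name_index(options)


def scripted_policy(cart_position):
    """Toy heuristic that picks an option by name."""
    return "left-left-right" if cart_position > 0 else "right-right-left"


# The same heuristic now yields an integer action for the index interface.
action = name_to_index[scripted_policy(0.3)]
print(action)  # 0
```

With this lookup, the scripted policy can serve as a baseline in the same index-based environment the RL agent trains in.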