Metadata-Version: 2.4
Name: asymmetric-td
Version: 0.1.1
Summary: Asymmetric Temporal Difference Learning for stable deep RL
Author: StarEmporium Enterprise
License: MIT
Project-URL: Homepage, https://github.com/sallahuddin92/mini-chess-zero
Project-URL: Documentation, https://github.com/sallahuddin92/mini-chess-zero/blob/main/docs/walkthrough.md
Project-URL: Repository, https://github.com/sallahuddin92/mini-chess-zero
Project-URL: Bug Tracker, https://github.com/sallahuddin92/mini-chess-zero/issues
Project-URL: Changelog, https://github.com/sallahuddin92/mini-chess-zero/releases
Keywords: reinforcement-learning,deep-learning,dqn,pytorch,stable-training,temporal-difference
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=1.9.0
Requires-Dist: numpy>=1.19.0
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: isort; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Provides-Extra: examples
Requires-Dist: gymnasium>=0.26.0; extra == "examples"
Requires-Dist: matplotlib>=3.0.0; extra == "examples"
Dynamic: license-file

# Mini-Chess Zero

> **Asymmetric TD Learning**: A biologically-inspired approach to stable deep reinforcement learning

[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![PyTorch](https://img.shields.io/badge/PyTorch-1.9+-red.svg)](https://pytorch.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)

## 🎯 Results

| Metric | Before | After |
|--------|--------|-------|
| Win Rate | 3% | **87.5%** |
| Q-Value | 70,615,864 | **9.29** |
| Status | Diverging | **Converged** |

**Key Achievement**: 1,380× reduction in Q-value magnitude relative to the unstabilized VANILLA baseline (12,822 → 9.29, see the ablation study below).

---

## 🧠 Novel Contribution: Asymmetric TD Learning

Inspired by dopamine neuron asymmetry in biological brains:

```python
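import torch
import torch.nn.functional as F

# Positive TD errors: learn cautiously (weight = 0.5)
# Negative TD errors: learn aggressively (weight = 1.5)
td_errors = target - predicted  # assumed definition of the TD error
weights = torch.where(td_errors > 0, torch.full_like(td_errors, 0.5), torch.full_like(td_errors, 1.5))
loss = (weights * F.huber_loss(predicted, target, reduction="none")).mean()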
```

Down-weighting positive TD errors discourages overoptimistic value estimates, while the larger weight on negative errors lets the agent learn quickly from mistakes.
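
To make the effect of the weighting concrete, here is a minimal sketch on a two-sample batch. The function name `asymmetric_td_loss` and its keyword arguments are hypothetical and are not the package's `AsymmetricTDLoss` API; the TD error is assumed to be `target - predicted`, and the weights are detached so they scale the gradient without contributing to it.

```python
import torch
import torch.nn.functional as F


def asymmetric_td_loss(predicted, target, pos_weight=0.5, neg_weight=1.5):
    """Huber loss reweighted by the sign of the TD error (hypothetical helper)."""
    # Assumption: TD error = target - predicted; detach so weights carry no gradient.
    td_errors = (target - predicted).detach()
    weights = torch.where(
        td_errors > 0,
        torch.full_like(td_errors, pos_weight),
        torch.full_like(td_errors, neg_weight),
    )
    # Per-sample Huber loss so each sample can be weighted individually.
    per_sample = F.huber_loss(predicted, target, reduction="none")
    return (weights * per_sample).mean()


# Two predictions with TD errors of equal size but opposite sign.
predicted = torch.tensor([1.0, 1.0], requires_grad=True)
target = torch.tensor([1.5, 0.5])  # TD errors: +0.5 and -0.5

loss = asymmetric_td_loss(predicted, target)
loss.backward()

print(loss.item())      # weighted mean of the two Huber terms
print(predicted.grad)   # the negative-error sample gets the larger gradient
```

With these inputs the gradient on the sample with a negative TD error is three times larger in magnitude than on the sample with a positive one, which is the 1.5 / 0.5 ratio of the weights.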

---

## 📦 Installation

```bash
# Clone repository
git clone https://github.com/sallahuddin92/mini-chess-zero.git
cd mini-chess-zero

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install the asymmetric-td package locally
pip install -e .
```

---

## 🚀 Quick Start

### Train the Agent
```bash
python train_monitor.py
```

### Play Against AI
```bash
python play_human.py
```

### Run Web App
```bash
cd webapp && python api.py
# Open http://localhost:8000
```

### Run Trading Backtest
```bash
cd trading && python backtest.py
```

---

## 📁 Project Structure

```
mini_chess_rl/
├── src/                    # Core RL (87.5% win rate)
│   ├── agent.py            # Stabilized DQN
│   ├── environment.py      # 5x5 Mini-Chess
│   └── replay_buffer.py
├── asymmetric_td/          # Python Package
│   ├── losses/             # AsymmetricTDLoss
│   ├── agents/             # StableDQN
│   └── utils/              # Gradient, soft update
├── webapp/                 # Chess Web App
├── trading/                # Trading Adaptation (+24% return)
├── research/               # Paper, ablation study
└── models/                 # Trained models
```

---

## 📊 Ablation Study

| Configuration | Q-Value | Status |
|---------------|---------|--------|
| FULL | 9.29 | ✅ Stable |
| NO_ATD | 9.23 | ✅ Stable |
| NO_Q_CLIP | 10.82 | ⚠️ Drifting |
| **VANILLA** | **12,822** | 💥 Exploded |

The combined stabilization stack is essential: removing ATD or Q-value clipping alone leaves Q-values near 10, but removing every technique (VANILLA) lets them explode to 12,822.

---

## 🎮 Features

- **Asymmetric TD Learning** - Novel biologically-inspired loss function
- **Gradient Clipping** - Prevents explosion (max_norm=10)
- **Polyak Soft Updates** - Smooth target updates (τ=0.005)
- **Q-Value Clipping** - Bounds estimates ([-10, 10])
- **Reward Centering** - Reduces variance
- **Action Masking** - Legal move enforcement
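
As a rough illustration of how Q-value clipping and action masking from the list above can combine at action-selection time, consider the sketch below. The helper name `masked_greedy_action` and the boolean `legal_mask` layout are assumptions for this example and may differ from what `src/agent.py` does.

```python
import torch


def masked_greedy_action(q_values, legal_mask, q_min=-10.0, q_max=10.0):
    """Pick the highest-valued legal action after clamping Q-values (hypothetical helper)."""
    # Q-value clipping: bound estimates to [q_min, q_max].
    clipped = q_values.clamp(q_min, q_max)
    # Action masking: illegal moves get -inf so argmax can never select them.
    masked = clipped.masked_fill(~legal_mask, float("-inf"))
    return masked.argmax(dim=-1)


# One state with four candidate actions; action 0 has the largest raw Q-value but is illegal.
q = torch.tensor([[25.0, 3.0, -40.0, 7.0]])
legal = torch.tensor([[False, True, True, True]])

action = masked_greedy_action(q, legal)
print(action)  # index of the best legal action
```

Masking with negative infinity rather than a large negative constant keeps the choice correct even when every legal action sits at the lower clipping bound.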

---

## 📈 Neural Network Analysis

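The trained agent learned a piece ranking consistent with classical chess piece values (Queen > Rook > Bishop > Knight > Pawn), with the King valued highest: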

| Piece | Learned Q-Value |
|-------|-----------------|
| King | +3.18 |
| Queen | +2.97 |
| Rook | +2.64 |
| Bishop | +2.26 |
| Knight | +1.98 |
| Pawn | +1.86 |

---

## 📄 Research Paper

See `research/paper/paper_draft.md` for the complete paper:
- **Title**: "Asymmetric Temporal Difference Learning: Biologically-Inspired Stabilization for Deep RL"
- **Key Result**: 1,380× reduction in Q-value magnitude

---

## 🔧 Configuration

Key hyperparameters in `src/agent.py`:

| Parameter | Value | Description |
|-----------|-------|-------------|
| GAMMA | 0.95 | Discount factor |
| LEARNING_RATE | 0.0001 | Adam LR |
| TAU | 0.005 | Soft update rate |
| GRAD_CLIP | 10.0 | Gradient clipping |
| ATD_WEIGHTS | (0.5, 1.5) | Asymmetric weights |
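
One way these values might be wired into a single optimization step is sketched below. The `Hyperparameters` dataclass, the `soft_update` helper, and the placeholder loss are illustrative assumptions, not the contents of `src/agent.py`; only the numeric values come from the table above.

```python
from dataclasses import dataclass
from typing import Tuple

import torch
import torch.nn as nn


@dataclass(frozen=True)
class Hyperparameters:
    """Values copied from the table above; the class itself is hypothetical."""
    gamma: float = 0.95
    learning_rate: float = 1e-4
    tau: float = 0.005
    grad_clip: float = 10.0
    atd_weights: Tuple[float, float] = (0.5, 1.5)


def soft_update(online: nn.Module, target: nn.Module, tau: float) -> None:
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    with torch.no_grad():
        for t_param, o_param in zip(target.parameters(), online.parameters()):
            t_param.mul_(1.0 - tau).add_(o_param, alpha=tau)


hp = Hyperparameters()
online, target = nn.Linear(4, 2), nn.Linear(4, 2)  # stand-ins for the Q-networks
optimizer = torch.optim.Adam(online.parameters(), lr=hp.learning_rate)

# Placeholder loss on random inputs; a real step would use the asymmetric TD loss.
loss = online(torch.randn(8, 4)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
# Gradient clipping with max_norm = GRAD_CLIP before the optimizer step.
torch.nn.utils.clip_grad_norm_(online.parameters(), max_norm=hp.grad_clip)
optimizer.step()
# Move the target network a small step (TAU) toward the online network.
soft_update(online, target, hp.tau)
```

With τ = 0.005 the target network moves only 0.5% of the way toward the online network per step, so the bootstrap targets change slowly.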

---

## 📜 License

MIT License - see [LICENSE](LICENSE) for details.

---

## 🙏 Acknowledgments

- Gardner Mini-Chess variant
- PyTorch team
- Schultz (1997) for dopamine asymmetry research
