Metadata-Version: 2.3
Name: neurojack
Version: 0.1.0
Summary: 
Author: dafaqboomduck
Requires-Python: >=3.10, <3.11
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Requires-Dist: matplotlib (==3.10.5)
Requires-Dist: matplotlib-inline (>=0.1.7,<0.2.0)
Requires-Dist: numpy (==2.0.1)
Requires-Dist: tensorflow (==2.18.0)
Requires-Dist: tensorflow-io-gcs-filesystem (>0.28.0,<0.30.0)
Requires-Dist: tqdm (==4.67.1)
Description-Content-Type: text/markdown

# Blackjack Reinforcement Learning

This project implements a reinforcement learning system to train an agent to play Blackjack. It features a custom environment and two types of Deep Q-Learning (DQN) agents to explore different learning strategies.

---

## 1. `env.py`: The Custom Blackjack Environment

The `CustomBlackjackEnv` class defines a flexible and highly configurable Blackjack game environment. It is built from scratch to allow for specific rule sets and advanced features beyond standard library environments.

### Key Features
- **Customizable Rules**: Supports multiple rule variations, including different numbers of decks, variable payouts, and specific dealer rules.  
- **Card Counting**: Optionally tracks a running count and a true count, enabling more advanced state representations.  
- **State Representation**: The observation space includes:
  - Player’s hand sum  
  - Dealer’s visible card  
  - Usable ace indicator  
  - Optional card counting values (if enabled)  
- **Action Space**: Supports common actions (`Hit`, `Stand`), and optionally `Double Down` or `Split` depending on the rule set.  

---

## 2. `dqn_agent.py`: The Deep Q-Network Agent

The `DQNAgent` implements a Deep Q-Network to learn the optimal strategy for Blackjack.

### How It Works
- **Q-Networks**: Uses two neural networks — a main Q-network for training and a target Q-network for stable updates.  
- **Experience Replay**: Stores past experiences (state, action, reward, next state) in a replay buffer to improve sample efficiency and reduce correlation between updates.  
- **Epsilon-Greedy Policy**: Balances exploration and exploitation by occasionally taking random actions during training.  

---

## 3. `double_dqn_agent.py`: The Double DQN Agent

The `DoubleDQNAgent` extends the standard DQN approach to address overestimation of Q-values, leading to more accurate and stable learning.

### Key Improvements
- **Decoupled Action Selection**: Uses the main Q-network to select actions while the target Q-network evaluates their values.  
- **Stabilized Updates**: This separation reduces bias and improves the accuracy of Q-value estimates.  

---

## Project Structure

- **`NeuroJack.env`** → Custom Blackjack environment  
- **`NeuroJack.dqn_agent`** → Standard DQN agent implementation  
- **`NeuroJack.double_dqn_agent`** → Double DQN agent implementation  

This modular design allows easy switching between different agents and experimentation with various reinforcement learning strategies while keeping the core Blackjack logic consistent.

---

## License

This project is licensed under the [MIT License](LICENSE).  

Copyright (c) 2024 **dafaqboomduck**

You are free to use, modify, and distribute this software in accordance with the terms of the license.  
The software is provided **“as is”**, without warranty of any kind.  

---
