This project implements an AI agent that plays a version of Pong, built with PyTorch and PyGame and trained with reinforcement learning. The agent controls the left paddle and learns to maximize its score over training epochs by interacting with the environment.
Reinforcement Learning Framework:
- The agent uses a policy gradient-based reinforcement learning approach.
- Actions are sampled from a probability distribution generated by the neural network.
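As a minimal sketch of the sampling step, the snippet below draws one action from a probability vector with `torch.distributions.Categorical`. The probability values are made up for illustration; in the project they would come from the policy network.

```python
import torch
from torch.distributions import Categorical

# Hypothetical action probabilities, standing in for the network's output.
probs = torch.tensor([0.2, 0.5, 0.3])

dist = Categorical(probs=probs)
action = dist.sample()            # index of the sampled action
log_prob = dist.log_prob(action)  # log-probability of that action, kept for the policy update

print(action.item(), log_prob.item())
```

Sampling, rather than always taking the most probable action, is what lets the agent keep exploring during training.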
Reward Mechanism:
- +100 points for scoring a goal.
- -500 points for missing the ball.
- +50 points for hitting the ball.
- Small rewards or penalties for moving closer to or away from the ball to encourage strategic positioning.
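The reward rules above could be expressed roughly as follows. This is a sketch, not the project's actual code: the function name, its arguments, and the shaping magnitude of 1.0 are assumptions (the description only says the positioning rewards are "small").

```python
def step_reward(scored, missed, hit, prev_dist, curr_dist, shaping=1.0):
    """Reward for a single frame.

    scored, missed, hit: booleans describing what happened this frame.
    prev_dist, curr_dist: paddle-to-ball distance before and after the move.
    shaping: assumed size of the small positioning reward or penalty.
    """
    reward = 0.0
    if scored:
        reward += 100.0   # scoring a goal
    if missed:
        reward -= 500.0   # missing the ball
    if hit:
        reward += 50.0    # hitting the ball
    # Small shaping term: reward moving closer, penalize moving away.
    if curr_dist < prev_dist:
        reward += shaping
    elif curr_dist > prev_dist:
        reward -= shaping
    return reward


print(step_reward(scored=False, missed=False, hit=True, prev_dist=10, curr_dist=8))
```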
Training Process:
- Rewards are accumulated into discounted returns, so each action is credited with the rewards that follow it, with nearer rewards weighted more heavily than distant ones.
- The loss function uses the log-probabilities of the agent's actions, scaled by the discounted reward, to optimize the policy.
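One way to write the two steps above is sketched below. The function names and the default discount factor of 0.99 are assumptions, and the description does not say whether the project normalizes the returns or averages the loss, so this sketch does neither.

```python
import torch


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every step t."""
    returns = []
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def policy_loss(log_probs, returns):
    """Negative sum of log-probabilities weighted by discounted returns."""
    returns_t = torch.tensor(returns, dtype=torch.float32)
    return -(torch.stack(log_probs) * returns_t).sum()


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

Minimizing this loss raises the probability of actions that were followed by high returns and lowers it for actions followed by low or negative returns.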
Neural Network Design:
- Input: The state of the game (paddle position, ball position, and ball velocity).
- Output: A probability distribution over possible actions (up or down).
- Architecture: Fully connected layers with ReLU activations and a Softmax output layer.
PyGame Visualization:
- Real-time display of the Pong game, including paddle and ball movements.
- Scores for both the agent (left paddle) and the opponent (right paddle) are displayed.
Each game proceeds as follows:
- The game starts, and the agent observes the game state: paddle position, ball position, and ball velocity.
- The agent outputs probabilities for moving the paddle up or down.
- An action is sampled from the probability distribution, and the paddle is updated accordingly.
- Rewards are calculated based on the agent's actions and the game outcome.
- At the end of each game, the agent uses the accumulated rewards to optimize its policy.
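The loop described above can be sketched without any game code. Everything here is a stand-in: `StubPong` is a hypothetical environment that simply ends after a fixed number of frames, `uniform_policy` replaces the neural network, and the order of the five state values is an assumption.

```python
import random

ACTIONS = ("up", "down", "no action")


class StubPong:
    """Hypothetical stand-in for the game: ends after a fixed number of frames."""

    def __init__(self, n_frames=5):
        self.n_frames = n_frames
        self.frame = 0

    def reset(self):
        self.frame = 0
        # Assumed layout: paddle y, ball x, ball y, ball vx, ball vy.
        return (0.5, 0.5, 0.5, 0.01, 0.01)

    def step(self, action):
        self.frame += 1
        state = (0.5, 0.5, 0.5, 0.01, 0.01)
        reward = 0.0  # a real environment would compute the reward here
        done = self.frame >= self.n_frames
        return state, reward, done


def uniform_policy(state):
    """Placeholder policy: equal probability for every action."""
    return [1.0 / len(ACTIONS)] * len(ACTIONS)


def run_episode(env, policy):
    """Play one game and record (state, action, reward) for every frame."""
    state = env.reset()
    trajectory = []
    done = False
    while not done:
        probs = policy(state)
        action = random.choices(range(len(ACTIONS)), weights=probs)[0]
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory


episode = run_episode(StubPong(), uniform_policy)
print(len(episode))  # 5
```

The recorded trajectory is what the end-of-game policy update would consume.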
The agent is implemented as a neural network with the following architecture:
- Input Layer: 5 features (paddle position, ball position, and ball velocity).
- Hidden Layers: Two fully connected layers with ReLU activations.
- Output Layer: A probability distribution over 3 actions (up, down, or no action) using a Softmax function.
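A network matching this description might look like the sketch below. The class name, the hidden width of 64, and the ordering of the five input features are assumptions; only the layer types and the input and output sizes come from the description.

```python
import torch
import torch.nn as nn


class PolicyNetwork(nn.Module):
    """5 inputs -> two hidden ReLU layers -> softmax over 3 actions."""

    def __init__(self, n_inputs=5, n_hidden=64, n_actions=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_inputs, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, n_actions),
            nn.Softmax(dim=-1),  # turn scores into probabilities that sum to 1
        )

    def forward(self, state):
        return self.net(state)


policy = PolicyNetwork()
# Assumed feature order: paddle y, ball x, ball y, ball vx, ball vy.
state = torch.tensor([0.5, 0.3, 0.7, -0.01, 0.02])
probs = policy(state)
print(probs)
```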
The reward system incentivizes the agent to:
- Score points by returning the ball effectively.
- Position itself optimally near the ball to increase its chances of returning it.
- Avoid the penalties incurred by missing the ball or failing to move strategically.
Install Dependencies:
- PyTorch: pip install torch
- PyGame: pip install pygame
Run the Code:
- Execute the Python script to start training and visualizing the Pong game.
Adjust Parameters:
- Modify hyperparameters like learning rate, discount factor, and paddle speed to experiment with different training behaviors.
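One convenient way to keep these settings in a single place is a small dataclass, sketched below. The class name and every default value are placeholders chosen for illustration, not the values used by the project.

```python
from dataclasses import dataclass


@dataclass
class Hyperparameters:
    learning_rate: float = 1e-3     # placeholder optimizer step size
    discount_factor: float = 0.99   # placeholder gamma for discounted returns
    paddle_speed: int = 5           # placeholder pixels moved per frame


default = Hyperparameters()
# Override a single value to try a different training behavior.
short_sighted = Hyperparameters(discount_factor=0.9)
print(default, short_sighted)
```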
The game interface includes:
- A dynamic Pong game environment with moving paddles and a bouncing ball.
- Real-time updates of scores for both the AI agent and the opponent.
Possible future improvements:
- Improve the reward function to encourage more sophisticated strategies.
- Train the agent using more advanced reinforcement learning algorithms like DDPG or PPO.
- Add multiplayer support or implement a more competitive opponent.
