RL Pacman Agents
Q-learning, approximate Q-learning, and experimentation
The Pacman agents project focused on reinforcement learning fundamentals: value estimation, policy improvement, exploration, and approximate feature representations.
Problem / motivation
Pacman is a compact environment for seeing how algorithmic choices affect behavior. Small changes in reward, exploration, or features can completely change an agent's strategy.
Key technical challenges
- Tuning exploration so the agent keeps learning without destabilizing its value estimates (see the epsilon-greedy sketch after this list).
- Designing useful features for approximate Q-learning.
- Interpreting behavior from reward curves and gameplay traces.
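In practice, the first challenge reduced to scheduling a single parameter. Below is a minimal sketch of epsilon-greedy action selection with a decaying epsilon; the `(state, action)`-keyed `q_values` table, the `legal_actions` argument, and the decay constants are illustrative assumptions rather than the project's exact code.

```python
import random
from collections import defaultdict

# Illustrative assumption: tabular Q-values keyed by (state, action),
# defaulting to 0.0 for unseen pairs.
q_values = defaultdict(float)

def epsilon_greedy(state, legal_actions, epsilon):
    """With probability epsilon take a random legal action, else a greedy one."""
    if random.random() < epsilon:
        return random.choice(legal_actions)
    best = max(q_values[(state, a)] for a in legal_actions)
    # Break ties randomly so the agent does not lock onto one action early.
    return random.choice([a for a in legal_actions if q_values[(state, a)] == best])

def decayed_epsilon(episode, start=1.0, end=0.05, decay=0.995):
    """Exponential schedule: explore heavily early, settle near `end` later."""
    return max(end, start * decay ** episode)
```

A schedule like this is the kind of knob the experiments varied: decay too fast and the agent commits to early accidents; too slow and scores stay noisy.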
Architecture / workflow
How the system fits together
- Environment exposes states, legal actions, transitions, and reward signals.
- Q-learning agent updates action-value estimates from observed transitions (see the update sketch after this list).
- Approximate agent replaces table values with feature-weighted estimates.
- Experiment logs compare performance under different parameters.
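The tabular agent's core is the one-step temporal-difference update: Q(s,a) is nudged toward the observed reward plus the discounted value of the best next action. A minimal sketch, reusing the `q_values` table from above; `alpha` (learning rate) and `gamma` (discount) are the parameters varied in the experiment configurations.

```python
def q_update(q_values, state, action, reward, next_state,
             next_legal_actions, alpha=0.1, gamma=0.9):
    """One-step Q-learning:
    Q(s,a) += alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))."""
    # Terminal states have no legal actions and contribute no future value.
    future = max((q_values[(next_state, a)] for a in next_legal_actions),
                 default=0.0)
    td_error = reward + gamma * future - q_values[(state, action)]
    q_values[(state, action)] += alpha * td_error
```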
What I built
- Q-learning and approximate Q-learning agents (the approximate update is sketched after this list).
- Experiment configurations for learning rate, discount, and exploration.
- Analysis of learned behavior and failure modes.
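The approximate agent keeps the same TD error but scores actions as a weighted sum of features, Q(s,a) = sum over i of w_i * f_i(s,a), so learning generalizes across states that share features. The sketch below is an assumption-laden outline: `features(state, action)` stands in for a hypothetical extractor returning a name-to-value dict, and the constructor's `alpha`, `gamma`, and `epsilon` mirror the hyperparameters swept in the experiment configurations.

```python
from collections import defaultdict

class ApproximateQAgent:
    """Linear Q(s, a) = sum of weights[name] * value over extracted features."""

    def __init__(self, features, alpha=0.02, gamma=0.9, epsilon=0.1):
        self.features = features  # hypothetical extractor: (s, a) -> {name: value}
        self.weights = defaultdict(float)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def q_value(self, state, action):
        return sum(self.weights[name] * value
                   for name, value in self.features(state, action).items())

    def update(self, state, action, reward, next_state, next_legal_actions):
        # Same TD error as the tabular update, but credit is assigned
        # to every active feature instead of a single table cell.
        future = max((self.q_value(next_state, a) for a in next_legal_actions),
                     default=0.0)
        td_error = reward + self.gamma * future - self.q_value(state, action)
        for name, value in self.features(state, action).items():
            self.weights[name] += self.alpha * td_error * value
```

This shared-weight structure is also where the new failure modes noted under lessons learned come from: one misleading feature correlation can shift behavior in every state at once.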
Outcomes / metrics
- Hands-on experience implementing reinforcement learning agents end to end.
- Clearer intuition for exploration schedules, reward shaping, and feature design.
Lessons learned
- RL debugging is behavioral: curves help, but watching the policy matters.
- Approximation adds power and new failure modes at the same time.
Screenshots / media
Learning loop: state, action, reward, update, and policy improvement cycle.
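Until the diagram is in place, the cycle it describes can be read as a short training loop. A minimal sketch tying together the earlier pieces (`q_values`, `decayed_epsilon`, `epsilon_greedy`, `q_update`); the environment interface here (`reset`, `step` returning `(next_state, reward, done)`, and a `legal_actions` method) is a hypothetical stand-in for the real Pacman harness.

```python
def train(env, episodes=500):
    """State -> action -> reward -> update, repeated; the greedy policy
    improves as the Q-values it reads from improve."""
    for episode in range(episodes):
        state = env.reset()
        done = False
        while not done:
            eps = decayed_epsilon(episode)
            action = epsilon_greedy(state, env.legal_actions(state), eps)
            next_state, reward, done = env.step(action)
            q_update(q_values, state, action, reward, next_state,
                     env.legal_actions(next_state))
            state = next_state
```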