
RL Pacman Agents

Q-learning, approximate Q-learning, and experimentation

The Pacman agents project focused on reinforcement learning fundamentals: value estimation, policy improvement, exploration, and approximate feature representations.

Problem / motivation

Pacman is a compact environment for seeing how algorithmic choices affect behavior. Small changes in reward, exploration, or features can completely change an agent's strategy.

Key technical challenges

  • Tuning exploration so the agent learns without becoming unstable.
  • Designing useful features for approximate Q-learning.
  • Interpreting behavior from reward curves and gameplay traces.
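The exploration challenge above usually comes down to balancing random and greedy action selection. A minimal epsilon-greedy sketch (the function and argument names here are illustrative, not the project's actual API):

```python
import random

def epsilon_greedy(q_values, legal_actions, epsilon):
    """With probability epsilon take a random legal action (explore),
    otherwise take the highest-valued one (exploit).

    q_values: dict mapping action -> estimated Q(s, a).
    Actions missing from the dict default to 0.0.
    """
    if random.random() < epsilon:
        return random.choice(legal_actions)
    return max(legal_actions, key=lambda a: q_values.get(a, 0.0))
```

Tuning epsilon (often annealed toward zero over training) controls how long the agent keeps sampling suboptimal actions before committing to its current policy.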

Architecture / workflow

How the system fits together

  1. Environment exposes states, legal actions, transitions, and reward signals.
  2. Q-learning agent updates action-value estimates from observed transitions.
  3. Approximate agent replaces table values with feature-weighted estimates.
  4. Experiment logs compare performance under different parameters.
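Step 2 of the workflow can be sketched as a tabular Q-learning update. This is a minimal illustration under assumed names (the class and method signatures are not the project's exact interface):

```python
from collections import defaultdict

class QLearningAgent:
    """Minimal tabular Q-learning sketch (hypothetical interface)."""

    def __init__(self, alpha=0.5, gamma=0.9):
        self.alpha = alpha           # learning rate
        self.gamma = gamma           # discount factor
        self.q = defaultdict(float)  # (state, action) -> value estimate

    def update(self, state, action, next_state, reward, legal_next_actions):
        # Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * [r + gamma * max_a' Q(s',a')]
        best_next = max((self.q[(next_state, a)] for a in legal_next_actions),
                        default=0.0)  # terminal states contribute 0
        sample = reward + self.gamma * best_next
        self.q[(state, action)] = ((1 - self.alpha) * self.q[(state, action)]
                                   + self.alpha * sample)
```

Each observed transition nudges the stored estimate toward the one-step sample; alpha sets how far, and gamma sets how much future reward counts.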

What I built

  • Q-learning and approximate Q-learning agents.
  • Experiment configurations for learning rate, discount, and exploration.
  • Analysis of learned behavior and failure modes.
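The approximate agent mentioned above replaces the Q-table with a linear combination of features, Q(s, a) = sum_i w_i * f_i(s, a). A hedged sketch of the value estimate and weight update (function and feature names are illustrative assumptions, not the project's code):

```python
def approx_q(weights, features):
    """Linear Q estimate: sum of weight * feature over active features."""
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items())

def approx_update(weights, features, reward, gamma, alpha, max_next_q):
    """Update each weight toward the TD target:
    w_i <- w_i + alpha * difference * f_i(s, a)."""
    difference = (reward + gamma * max_next_q) - approx_q(weights, features)
    for name, value in features.items():
        weights[name] = weights.get(name, 0.0) + alpha * difference * value
    return weights
```

Because every weight moves on every update, a single badly scaled feature can destabilize all estimates at once, which is one of the new failure modes approximation introduces.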

Outcomes / metrics

  • Hands-on reinforcement learning implementation experience.
  • Clearer intuition for exploration, reward shaping, and feature design.

Lessons learned

  • RL debugging is behavioral: curves help, but watching the policy matters.
  • Approximation adds power and new failure modes at the same time.

Screenshots / media

Visual evidence placeholders

Replace these panels with screenshots, demos, diagrams, or notebook exports as each artifact becomes ready for publishing.

Media panel placeholder: Learning loop (state, action, reward, update, and policy improvement cycle).