Phase 25: Reinforcement Learning - Start Here

Train agents to make decisions through trial and error. Reinforcement learning (RL) is a key ingredient in AlphaGo, game-playing AIs, and RLHF for large language models (LLMs).

What Is Reinforcement Learning?

Agent observes State → takes Action → receives Reward → updates Policy
                 ↑________________________________________________↓
                                (environment loop)
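A minimal sketch of the loop in the diagram follows. It assumes a hypothetical one-dimensional corridor environment (`CorridorEnv` is illustrative, not a library class) whose `reset`/`step` methods are loosely modeled on, but simpler than, the Gym interface. The agent observes the state and picks an action, and the environment returns the next state and a reward. The policy here is fixed, so the "updates Policy" step is deliberately left out; supplying that step is what the learning algorithms in this phase do.

```python
class CorridorEnv:
    """Hypothetical toy environment: positions 0..length-1, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        """Start a new episode at the left end and return the first observed state."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply action (0 = left, 1 = right); return (next_state, reward, done)."""
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)  # walls clamp movement
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # reward only for reaching the goal
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run the observe -> act -> reward loop once; return (total_reward, steps_taken)."""
    state = env.reset()
    total_reward = 0.0
    for t in range(1, max_steps + 1):
        action = policy(state)                  # agent chooses an action from the observed state
        state, reward, done = env.step(action)  # environment returns next state and reward
        total_reward += reward
        if done:
            return total_reward, t
    return total_reward, max_steps


def always_right(state):
    """A fixed (non-learning) policy, used only for illustration."""
    return 1


print(run_episode(CorridorEnv(), always_right))  # (1.0, 4)
```

Returning the step count alongside the total reward makes it easy to compare policies: a better policy reaches the goal in fewer steps.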

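RL is also the optimization step in RLHF (Reinforcement Learning from Human Feedback): a reward model is trained on human rankings of model outputs, and an RL algorithm, commonly PPO, then fine-tunes the language model to score well under that reward model. RLHF is one of the techniques used to train assistants such as ChatGPT and Claude to be more helpful and safe.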
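Here is a sketch of the reward-model half of RLHF. In the widely used recipe, human labelers compare two responses to the same prompt, and the reward model is trained so that the preferred response scores higher, using the pairwise loss -log σ(r_chosen - r_rejected). The code works on plain floats standing in for a hypothetical reward model's scores. A real implementation would produce these scores with a neural network (for example in PyTorch) and backpropagate through the loss. The RL fine-tuning stage is not shown.

```python
import math


def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise reward-model loss: -log(sigmoid(r_chosen - r_rejected))."""
    margin = r_chosen - r_rejected
    # Numerically stable form of log(1 + exp(-margin)), which equals -log(sigmoid(margin))
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_preference_loss(pairs):
    """Average loss over (score_chosen, score_rejected) pairs from a batch of comparisons."""
    return sum(preference_loss(c, r) for c, r in pairs) / len(pairs)


# Hypothetical reward-model scores for two human-ranked comparisons
pairs = [(2.0, 0.5), (0.3, 1.1)]
print(mean_preference_loss(pairs))
```

In this example, the second pair contributes most of the loss because the rejected response outscores the chosen one. That gap is the signal that pushes the reward model toward the human ranking.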

Notebooks in This Phase

Notebook                                Topic
01_markov_decision_processes.ipynb      MDPs: states, actions, rewards, Bellman equations
02_q_learning.ipynb                     Tabular Q-learning and temporal difference learning
03_deep_q_networks.ipynb                DQN with neural networks (Atari games)
04_policy_based_methods.ipynb           REINFORCE, Actor-Critic, PPO
05_advanced_topics_applications.ipynb   RLHF, multi-agent RL, real-world applications
06_practical_exercises.ipynb            OpenAI Gym environments, hands-on projects
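One way to see the Bellman equations from notebook 01 at work is value iteration. This method repeatedly applies the Bellman optimality backup V(s) ← max_a Σ_s' P(s'|s,a) [R(s,a,s') + γ V(s')] until the values stop changing. The sketch below assumes a hypothetical three-state deterministic MDP (states 0, 1, 2; moving right from state 1 into state 2 pays reward 1 and ends the episode) with γ = 0.9. It updates values in place, a common variant that still converges.

```python
def q_value(outcomes, V, gamma):
    """Expected one-step return of an action: sum of P(s'|s,a) * (R + gamma * V(s'))."""
    return sum(p * (r + gamma * V[s_next]) for p, s_next, r in outcomes)


def value_iteration(transitions, gamma=0.9, tol=1e-10):
    """Apply the Bellman optimality backup in place until no value changes by more than tol."""
    V = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:  # terminal state: no actions, value stays 0
                continue
            best = max(q_value(outcomes, V, gamma) for outcomes in actions.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


def greedy_policy(transitions, V, gamma=0.9):
    """In each non-terminal state, pick the action with the highest one-step lookahead value."""
    return {
        s: max(actions, key=lambda a: q_value(actions[a], V, gamma))
        for s, actions in transitions.items()
        if actions
    }


# Hypothetical MDP: transitions[state][action] = [(probability, next_state, reward), ...]
transitions = {
    0: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 2, 1.0)]},
    2: {},  # terminal goal state
}

V = value_iteration(transitions)
print(V)                              # {0: 0.9, 1: 1.0, 2: 0.0}
print(greedy_policy(transitions, V))  # {0: 'right', 1: 'right'}
```

State 1 is one step from the reward, so V(1) = 1. State 0 is two steps away, so its value is discounted once: V(0) = 0.9 × 1 = 0.9. Storing transitions as (probability, next_state, reward) lists keeps the same code usable for stochastic MDPs.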

Key Algorithms

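Algorithm   Type                                    Use Case
Q-Learning  Value-based (tabular)                   Small, discrete state and action spaces
DQN         Value-based (neural network)            Discrete actions with high-dimensional observations (e.g., Atari frames)
PPO         Policy-based (actor-critic)             A common default for many practical tasks; discrete or continuous actions
SAC         Actor-critic (off-policy, max-entropy)  Continuous control (e.g., robotics)
RLHF        Human feedback (reward model + RL)      Fine-tuning LLMs toward human preferences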
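Tabular Q-learning can be sketched in a few lines. This example assumes a hypothetical five-state corridor (start at state 0, reward 1 for reaching state 4) and illustrative hyperparameters α = 0.1, γ = 0.9, and ε = 0.1. None of these come from the notebooks. The core of the algorithm is the temporal-difference update Q(s,a) ← Q(s,a) + α [r + γ max_a' Q(s',a') - Q(s,a)].

```python
import random

N_STATES = 5      # hypothetical corridor: states 0..4, start at 0, goal at 4
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy dynamics: reward 1.0 and episode end on reaching the goal."""
    nxt = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration; returns the Q table."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon; also pick randomly on ties so the
            # untrained agent (all-zero Q) is not biased toward one action
            if rng.random() < epsilon or Q[state][0] == Q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if Q[state][0] > Q[state][1] else 1
            nxt, reward, done = step(state, action)
            # TD target: bootstrap from max_a' Q(s', a'), except at the terminal state
            target = reward if done else reward + gamma * max(Q[nxt])
            Q[state][action] += alpha * (target - Q[state][action])
            state = nxt
    return Q


Q = q_learning()
greedy = [0 if q[0] > q[1] else 1 for q in Q[:-1]]  # skip the terminal state
print(greedy)  # [1, 1, 1, 1]: move right everywhere
```

Q-learning is off-policy: the greedy policy read from `Q` learns to always move right even though the agent explored with ε-greedy actions. DQN replaces the table `Q` with a neural network, which makes the same update usable for large observation spaces such as Atari frames.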

Prerequisites

  • Neural Networks (Phase 06)

  • Probability and statistics (Phase 03)

  • PyTorch basics

Learning Path

01_markov_decision_processes.ipynb   ← Start here
02_q_learning.ipynb
03_deep_q_networks.ipynb
04_policy_based_methods.ipynb
05_advanced_topics_applications.ipynb
06_practical_exercises.ipynb         ← Build and train agents