📖 How To Use the AI Robot Reinforcement Learning Playground
🎯 Step 1: Understand the Environment
The robot (blue circle) starts at the bottom-left. The goal (green star) is at top-right. Red squares are obstacles. The robot must learn to reach the goal without hitting obstacles.
⚡ Pro Tip: Click anywhere on the canvas to move the goal and create new learning challenges.
⚙️ Step 2: Adjust Reward Rules
Check/uncheck reward rules to shape the robot's behavior. Give higher rewards for desired actions and penalties for undesired ones.
🧠 Step 3: Set Training Parameters
- Learning Rate (α): How quickly the robot adapts to new information
- Exploration Rate (ε): Chance to try random actions vs. using learned knowledge
- Discount Factor (γ): How much the robot values future rewards
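To make the Exploration Rate concrete, here is a minimal sketch of ε-greedy action selection over a Q-table. The names (`ACTIONS`, `choose_action`, the dict-of-dicts Q-table) are illustrative assumptions, not the playground's actual internals:

```python
import random

# Hypothetical Q-table layout: {state: {action: q_value}}.
ACTIONS = ["up", "down", "left", "right"]

def choose_action(q_table, state, epsilon):
    """With probability ε take a random action (explore);
    otherwise take the action with the highest Q-value (exploit)."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)              # exploration
    q_values = q_table.get(state, {a: 0.0 for a in ACTIONS})
    return max(q_values, key=q_values.get)         # learned knowledge
```

With ε = 1.0 every move is random; with ε = 0.0 the robot always follows its current best estimate.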
▶️ Step 4: Start Training
Click "Start Training" to begin. Watch the robot improve over time. The progress bars show learning improvement across episodes.
📊 Learning Concept:
Q(s,a) ← Q(s,a) + α [R + γ max_a' Q(s',a') − Q(s,a)]
// This is the Q-learning update rule, derived from the Bellman optimality equation
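The update rule above can be sketched as a single Python function. The Q-table layout and function name are illustrative assumptions, assuming the same α and γ defaults shown in the parameter list:

```python
def q_update(q_table, state, action, reward, next_state,
             alpha=0.1, gamma=0.9,
             actions=("up", "down", "left", "right")):
    """One Q-learning step: Q(s,a) += α [R + γ max_a' Q(s',a') − Q(s,a)]."""
    # Best estimated value achievable from the next state
    best_next = max(q_table.get(next_state, {}).get(a, 0.0) for a in actions)
    old = q_table.setdefault(state, {}).get(action, 0.0)
    # Move the old estimate toward the observed target by a fraction α
    q_table[state][action] = old + alpha * (reward + gamma * best_next - old)
```

Each call nudges Q(s,a) toward the observed reward plus the discounted value of the best next action.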
❓ Frequently Asked Questions
What is reinforcement learning?
Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by taking actions in an environment to maximize cumulative reward. It's like training a dog with treats – good actions get rewards, bad actions don't.
How does the robot learn to avoid obstacles?
Through trial and error! When the robot hits an obstacle, it receives a negative reward (-5). Over many episodes, it learns which actions lead to obstacles and avoids them. The Q-learning algorithm updates its "knowledge" (Q-values) based on these experiences.
What do the training parameters mean?
Learning Rate (α): How much new information overrides old information. High α = quick adaptation but may be unstable.
Exploration Rate (ε): Probability of taking a random action. Higher ε = more exploration.
Discount Factor (γ): Importance of future rewards. A reward k steps in the future is weighted by γ^k, so with γ=0.9 near-future rewards are valued almost as much as immediate ones.
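A quick way to see what γ = 0.9 means in practice is to print the weight γ^k applied to a reward k steps away:

```python
# Weight applied to a reward k steps in the future under γ = 0.9
gamma = 0.9
for k in range(5):
    print(k, round(gamma ** k, 3))
# 0 1.0
# 1 0.9
# 2 0.81
# 3 0.729
# 4 0.656
```

So a reward one step away keeps 90% of its value, and one four steps away still keeps about two-thirds, which is why the robot plans ahead rather than chasing only immediate rewards.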
Can I see the robot's decision-making process?
Yes! The progress bars show how the total reward per episode improves over time. The "States Explored" counter shows how many different positions the robot has learned about. You can also watch its path change as it learns better routes.
How is this used in real robotics?
Real robots use similar algorithms for navigation, manipulation, and task learning. Companies like Boston Dynamics use RL for walking robots, autonomous vehicles use it for driving policies, and warehouse robots use it for efficient item picking.
Why does the robot sometimes take weird paths?
That's exploration! Early in training, the robot tries random actions to discover the environment. As it learns, it exploits known good paths but still explores occasionally (controlled by ε) to find potentially better routes.
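One common way to get this "explore early, exploit later" behavior is to decay ε between episodes. This is a hypothetical schedule, not necessarily what the playground does; the decay constants are illustrative:

```python
# Hypothetical exploration schedule: start fully random (ε = 1.0),
# shrink ε by 1% per episode, but never drop below a small floor
# so the robot keeps occasionally trying new routes.
epsilon, eps_min, eps_decay = 1.0, 0.05, 0.99

for episode in range(500):
    # ... run one training episode using the current epsilon ...
    epsilon = max(eps_min, epsilon * eps_decay)
```

Early on the robot wanders; after a few hundred episodes ε sits at its floor and the "weird paths" become rare.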