Q-learning
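Q-learning is a model-free reinforcement learning method.
So, why doesn't Q-learning need a model? Good question! Q-learning never tries to learn how the environment itself works, that is, which state follows from which action. Instead, it tries actions (many of them random at first), observes the rewards that follow, and uses them to estimate how valuable each action is in each state. The policy is then simply to pick the action with the highest estimated value. A DQN learns these estimates by minimizing a loss function that measures how far its predictions are from what it actually observed.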
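To make "model-free" a bit more concrete, here is the standard tabular Q-learning update rule. The symbols are not from the text above and are introduced here for illustration: s is the current state, a the action taken, r the reward received, s' the next state, α a learning rate and γ a discount factor.

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```

Note that the update only uses things the agent actually experienced (s, a, r, s'). Nowhere does it need to know the probability of landing in s', which is why no model of the environment is required.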
An example
Let's say we want to let a DQN (Deep Q Network) learn to drive a car. How would we do this?
We'd simulate the environment digitally.
We'd tell the DQN that it loves candy (the reward).
We'd let the DQN do whatever it wants, but give it candy whenever it does something we want it to do.
For the first bunch of training sessions, our network will keep running into trees, rocks and other cars. However, after learning that it gets candy for good driving behaviour, it starts to crash less and less often, knowing that it can get more candy if it behaves well.
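The training process described above can be sketched in a few lines of Python. This is only a rough illustration under some loud assumptions: it uses a plain Q-table instead of a deep network (so it is tabular Q-learning, not a real DQN), and the "road" simulator, its rewards and all names (`step`, `train`, `N_STATES`) are made up for this example.

```python
import random

N_STATES = 5      # hypothetical: positions along a straight road; the last one is the goal
ACTIONS = (0, 1)  # hypothetical: 0 = swerve into a tree, 1 = keep driving forward


def step(state, action):
    """Made-up toy simulator: returns (next_state, reward, done)."""
    if action == 0:
        return state, -1.0, True      # crashed: negative reward, episode over
    next_state = state + 1
    if next_state == N_STATES - 1:
        return next_state, 1.0, True  # reached the goal: candy!
    return next_state, 0.0, False     # still driving, no candy yet


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # One estimated value per (state, action) pair, all starting at zero.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: usually take the best-looking action,
            # but sometimes try a random one to keep exploring.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Q-learning target: reward now plus discounted best value later.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# The learned policy: the highest-valued action in each non-goal state.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

In the first few episodes the agent crashes at every position once, exactly like the car running into trees. After that, the negative values attached to crashing steer it towards driving forward, and the candy at the goal gradually raises the value of the earlier positions too. A DQN follows the same idea but replaces the table with a neural network, which is what makes it usable when there are far too many states to list.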
Congratulations! You just created a DQN with a reward function (the candy) and