Introduction to Reinforcement Learning –
This blog post is an attempt to answer the question: what is reinforcement learning, really? If you have heard all the buzz around reinforcement learning but don’t know much about it, please keep reading. Also, follow the blog if you like my attempt!
We will try to develop an intuition about reinforcement learning. This post targets beginners; however, if you are really interested in RL and want to dive deep, make sure you check out this playlist – Reinforcement Learning Playlist. I will be adding more videos to that playlist.
So if you want a one-liner: Reinforcement Learning is an area of Machine Learning in which we “learn to make a good sequence of decisions!”
Now we have to understand some important keywords here, i.e. Learn, Good and Sequence of decisions. We have to come up with an architecture or system which
- is capable of learning – it finds unknown solutions and makes decisions online (even in unseen circumstances).
- can make a good sequence of decisions – This is different from what RNNs do. (If you are wondering how, one argument is that RL works best in Markov Decision Processes, which don’t rely on history: the present state is enough to predict the future. LSTMs, however, need history. The two solve different problems.) In Reinforcement Learning we have to keep making good decisions if we want to win a game of Chess or Go, and the rewards (positive or negative) can be delayed. You make a move at the start (say, the Ruy Lopez as White or the King’s Indian Defence as Black) and only at the end will you know whether it was a winning move. That wait can be anywhere between 17 and 269 moves (20 hours and 15 minutes)!
Since we are talking about learning, there are broadly two types of learning – active and passive (learning by example). Now a question for you guys – a human can learn without examples of optimal behaviour, right? (Have you seen someone like Sachin Tendulkar, Messi or Federer? Who did they replicate? If they didn’t replicate anyone, is active learning the only way? Or is it passive in the beginning, followed by a switch to active mode?)
So basically RL involves learning only from reward signals, with no supervision, with delayed feedback, and with the caveat that past predictions may affect future decisions! Phew!
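To make "delayed feedback" a little more concrete, here is a tiny sketch of a made-up game in which every move earns a reward of 0 and only the final move reveals a win (+1) or a loss (-1). The function name `play_episode` and the coin-flip outcome are assumptions for illustration only; a real game would decide the result from the moves actually played.

```python
import random


def play_episode(num_moves, rng):
    """Return the per-move rewards of a toy game with a delayed outcome.

    Hypothetical setup: no move is rewarded on its own; the result of the
    whole game (+1 win, -1 loss) only shows up on the last move.
    """
    rewards = []
    for move in range(num_moves):
        is_last_move = move == num_moves - 1
        if is_last_move:
            rewards.append(rng.choice([1, -1]))  # outcome revealed at the end
        else:
            rewards.append(0)  # no feedback yet
    return rewards


rng = random.Random(0)            # fixed seed so the run is repeatable
rewards = play_episode(17, rng)   # 17 moves, like the short game mentioned above
total_reward = sum(rewards)

print(rewards)
print("total reward:", total_reward)
```

The agent sees a long run of zeros and a single number at the end, and it has to work out which of the earlier moves deserve the credit (or the blame).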
Core concepts of Reinforcement Learning –
The following are the core concepts in RL –
- Reward Signal
There is an environment in which the agent (which learns and makes decisions) takes permissible actions to collect the maximum positive reward. Simple!
Note – Actions taken by the agent may change the state of the environment. For example, if you make a move on a chess board then the state of the board has changed; if you punch a boxing bag, however, it returns to its original position.
Agent and Environment
At each time step t the agent:
- receives an observation (or the environment state) and a reward.
- executes an action.
The environment:
- receives the action.
- emits the next observation and reward.
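The back-and-forth between agent and environment described above can be sketched as a simple loop. Everything here is hypothetical and chosen only for illustration: `CorridorEnv` is a made-up five-cell corridor with the goal in the last cell, and `RandomAgent` picks moves at random instead of learning anything.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: cells 0..4 in a row, goal is cell 4."""

    def __init__(self):
        self.position = 0  # the agent starts in the leftmost cell

    def step(self, action):
        # The environment receives an action (-1 = left, +1 = right)...
        self.position = max(0, min(4, self.position + action))
        done = self.position == 4
        reward = 1 if done else 0
        # ...and emits the next observation and reward.
        return self.position, reward, done


class RandomAgent:
    """Hypothetical agent that ignores its inputs and moves at random."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, observation, reward):
        # The agent receives an observation and a reward, then picks an action.
        return self.rng.choice([-1, 1])


def run_episode(env, agent, max_steps=1000):
    """Run the agent-environment loop and record what happened at each step."""
    observation, reward = env.position, 0
    history = []
    for t in range(max_steps):
        action = agent.act(observation, reward)
        observation, reward, done = env.step(action)
        history.append((t, action, observation, reward))
        if done:
            break
    return history


history = run_episode(CorridorEnv(), RandomAgent(seed=0))
print("steps taken:", len(history))
print("last step (t, action, observation, reward):", history[-1])
```

A learning agent would replace the random choice in `act` with something that uses the observations and rewards it has seen so far; the loop itself stays the same.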
If the environment state is fully observable then observation = environment state. When the agent can see only part of the state, the problem is known as a POMDP (Partially Observable Markov Decision Process).
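Here is a small sketch of the difference between the two cases. The `State` fields and the two `observe_*` functions are invented for this example: in the fully observable case the agent sees the whole state, while in the partially observable case it sees only its own position and the goal stays hidden.

```python
from typing import NamedTuple


class State(NamedTuple):
    """Hypothetical environment state for a tiny corridor world."""
    agent_position: int
    goal_position: int


def observe_fully(state):
    # Fully observable: observation = environment state.
    return state


def observe_partially(state):
    # Partially observable: the goal position is hidden from the agent.
    return state.agent_position


state_a = State(agent_position=1, goal_position=4)
state_b = State(agent_position=1, goal_position=0)

# With full observability the two situations look different...
print(observe_fully(state_a) == observe_fully(state_b))
# ...but with partial observability they look identical to the agent.
print(observe_partially(state_a) == observe_partially(state_b))
```

In the partial case the agent cannot tell from a single observation whether it should walk left or right, which is exactly why the present observation alone is no longer enough.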
Reinforcement Learning is not just a subset of Machine Learning or Deep Learning. Deep Reinforcement Learning is a very active and emerging area of research. Organisations like DeepMind and OpenAI are actively involved in this field.
This image is taken from David Silver’s slides.
Reinforcement Learning interests me particularly because it lies at the intersection of mathematics, computer science, psychology and neuroscience (my favourites).
Obviously there are many more things we could discuss: Markov Reward Processes, model-based and model-free agents, etc. However, we will keep those for later posts.
If you want to learn more about RL then check these resources out –
Happy Coding! 🙂