Reinforcement Learning Game

This blog post is about a small script I have written to help you understand the basic concepts of Reinforcement Learning.

Recently, I came across a talk by Richard Sutton at Microsoft titled “Tutorial: Introduction to Reinforcement Learning with Function Approximation”. In this tutorial, he demonstrates a tool written in Common Lisp. I tried to replicate the same game in Python; the code is on GitHub.

In this game, the player is an agent. In the world of reinforcement learning, there are two components: an agent, and an environment with which the agent interacts.


The agent interacts with the environment by taking permissible actions and gets feedback (a reward) in return.

In this game, there are two actions the user (agent) can take: {1, 2}. The environment has two states: {A, B}. The action taken by the user affects the state of the environment.


How to play this game?

The system requirements (as if I have made FIFA 😛) are:

  • python3
  • numpy

Once your system is ready, follow these steps:

> git clone https://github.com/vaibhawvipul/tildy-mdp.git
> cd tildy-mdp
> python3 learn_mdp.py

That’s it!


About the Game –

[Figure: true model of the world]

There are two states, state A and state B. The scenarios in this game are:

  • In state A, action 1 always leaves you in state A and earns a small positive reward.
  • In state A, action 2 has an 80% chance of moving you to state B with a small negative reward; otherwise you stay in A and earn a small positive reward.
  • In state B, action 1 has an 80% chance of moving you to state A with a big positive reward; otherwise you stay in B with a small negative reward.
  • In state B, action 2 has an 80% chance of moving you to state A with a small negative reward; otherwise you stay in B, also with a small negative reward.
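
To make these rules concrete, here is a minimal sketch of the game's dynamics as a lookup table. It is not the code from learn_mdp.py: the names `TRANSITIONS` and `step` are made up for illustration, it uses only Python's standard `random` module, and since the post only says "small" and "big", the reward numbers (+1, -1, +10) are assumed placeholders.

```python
import random

# ASSUMPTION: the post gives no numbers for the rewards, so these
# magnitudes are illustrative placeholders only.
SMALL_POS, SMALL_NEG, BIG_POS = 1.0, -1.0, 10.0

# (state, action) -> list of (probability, next_state, reward)
TRANSITIONS = {
    ("A", 1): [(1.0, "A", SMALL_POS)],
    ("A", 2): [(0.8, "B", SMALL_NEG), (0.2, "A", SMALL_POS)],
    ("B", 1): [(0.8, "A", BIG_POS), (0.2, "B", SMALL_NEG)],
    ("B", 2): [(0.8, "A", SMALL_NEG), (0.2, "B", SMALL_NEG)],
}


def step(state, action, rng=random):
    """Sample one transition and return (next_state, reward)."""
    outcomes = TRANSITIONS[(state, action)]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


# Play a short, hand-picked sequence of actions starting from state A.
rng = random.Random()
state = "A"
for action in [1, 2, 1, 1, 2]:
    next_state, reward = step(state, action, rng)
    print(f"state={state} action={action} -> state={next_state} reward={reward:+.0f}")
    state = next_state
```

Calling `step("A", 2)` repeatedly shows the 80/20 split: most calls land in B with the small negative reward, and the rest stay in A.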

The Optimal Strategy –

This game helps the user understand the exploration vs. exploitation dilemma and the Markov decision process (where the outcome is partly random and partly under the control of the user).
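
One way to watch the exploration vs. exploitation dilemma play out is to let a simple learner play the game. The sketch below is an illustration, not code from the repo: it uses tabular Q-learning with an epsilon-greedy rule, and the reward numbers, the hyperparameters (`alpha`, `gamma`, `epsilon`) and all function names are assumptions.

```python
import random

# ASSUMPTION: reward magnitudes are placeholders; the post gives no numbers.
SMALL_POS, SMALL_NEG, BIG_POS = 1.0, -1.0, 10.0

# (state, action) -> list of (probability, next_state, reward)
TRANSITIONS = {
    ("A", 1): [(1.0, "A", SMALL_POS)],
    ("A", 2): [(0.8, "B", SMALL_NEG), (0.2, "A", SMALL_POS)],
    ("B", 1): [(0.8, "A", BIG_POS), (0.2, "B", SMALL_NEG)],
    ("B", 2): [(0.8, "A", SMALL_NEG), (0.2, "B", SMALL_NEG)],
}


def step(state, action, rng):
    """Sample one transition and return (next_state, reward)."""
    outcomes = TRANSITIONS[(state, action)]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


def q_learning(epsilon, steps=5000, alpha=0.1, gamma=0.9, seed=0):
    """Tabular Q-learning; with probability epsilon the agent explores."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in ("A", "B") for a in (1, 2)}
    state = "A"
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice((1, 2))  # explore: pick a random action
        else:
            # exploit: pick the best-looking action (ties go to action 1)
            action = max((1, 2), key=lambda a: q[(state, a)])
        next_state, reward = step(state, action, rng)
        best_next = max(q[(next_state, a)] for a in (1, 2))
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
    return q


greedy = q_learning(epsilon=0.0)   # never explores
curious = q_learning(epsilon=0.1)  # explores 10% of the time
for name, q in (("greedy", greedy), ("curious", curious)):
    print(name, {k: round(v, 2) for k, v in q.items()})
```

With `epsilon=0.0` the learner takes action 1 in state A on the first step, collects its small positive reward, and never tries anything else, so it never finds out what state B has to offer. With a little exploration it gets to visit B and can learn the value of action 1 there.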

As the section above shows, a good strategy is to remain in state A and keep taking action 1, which always earns a small positive reward.

However, the optimal strategy is to take action 2 when in state A, accepting a small negative reward. This usually moves the state to B, where taking action 1 earns the big positive reward!
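
Here is a quick back-of-the-envelope check of that claim, scoring each strategy by its long-run average reward per step. It is a sketch under assumptions: the reward numbers (+1, -1, +10) are placeholders because the post only says "small" and "big", and `average_reward` is a hypothetical helper that relies on the stationary distribution of a two-state chain.

```python
# ASSUMPTION: reward magnitudes are placeholders; the post gives no numbers.
SMALL_POS, SMALL_NEG, BIG_POS = 1.0, -1.0, 10.0

# (state, action) -> list of (probability, next_state, reward)
TRANSITIONS = {
    ("A", 1): [(1.0, "A", SMALL_POS)],
    ("A", 2): [(0.8, "B", SMALL_NEG), (0.2, "A", SMALL_POS)],
    ("B", 1): [(0.8, "A", BIG_POS), (0.2, "B", SMALL_NEG)],
    ("B", 2): [(0.8, "A", SMALL_NEG), (0.2, "B", SMALL_NEG)],
}


def average_reward(policy):
    """Long-run average reward per step for a policy {state: action}."""

    def leave_prob(state):
        # Probability that one step moves the agent out of `state`.
        outcomes = TRANSITIONS[(state, policy[state])]
        return sum(p for p, nxt, _ in outcomes if nxt != state)

    def expected_reward(state):
        outcomes = TRANSITIONS[(state, policy[state])]
        return sum(p * r for p, _, r in outcomes)

    a, b = leave_prob("A"), leave_prob("B")
    # Stationary distribution of a two-state chain. In this game every
    # action in B leaves B with probability 0.8, so a + b is never zero.
    pi_a = b / (a + b)
    pi_b = a / (a + b)
    return pi_a * expected_reward("A") + pi_b * expected_reward("B")


safe = {"A": 1, "B": 1}   # stay in A and keep taking action 1
risky = {"A": 2, "B": 1}  # pay to reach B, then cash in with action 1
print("safe :", round(average_reward(safe), 3))
print("risky:", round(average_reward(risky), 3))
```

Under these assumed numbers the safe strategy averages 1.0 per step and the risky one about 3.6. With small rewards of +1 and -1, the risky loop only comes out ahead when the big reward is larger than 3.5, so the conclusion depends on how big "big" really is.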



 

I hope this was a fun read! I really enjoyed coding this up! Thanks to Richard Sutton for inspiration.

This GitHub repo is open for PRs.

A couple of ideas:

  1. Make the game remember the total reward earned so that a neural network can be trained on it.
  2. Make the probabilities random so that every time the game is booted up, not even the programmer knows the optimal strategy.
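
For anyone who wants to pick these up, here is one rough way both ideas could look. It is only a sketch, not a patch against the repo: `make_model` and `play` are hypothetical names, the reward numbers are assumed placeholders, and replacing each 80/20 split with a uniformly random one is just one possible reading of "make the probabilities random".

```python
import random

# ASSUMPTION: reward magnitudes are placeholders; the post gives no numbers.
SMALL_POS, SMALL_NEG, BIG_POS = 1.0, -1.0, 10.0


def make_model(rng):
    """Same outcomes as the game, but each 80/20 split becomes p / (1 - p)."""

    def split():
        p = rng.random()
        return p, 1.0 - p

    p_a2, q_a2 = split()
    p_b1, q_b1 = split()
    p_b2, q_b2 = split()
    return {
        ("A", 1): [(1.0, "A", SMALL_POS)],
        ("A", 2): [(p_a2, "B", SMALL_NEG), (q_a2, "A", SMALL_POS)],
        ("B", 1): [(p_b1, "A", BIG_POS), (q_b1, "B", SMALL_NEG)],
        ("B", 2): [(p_b2, "A", SMALL_NEG), (q_b2, "B", SMALL_NEG)],
    }


def play(model, actions, rng, start="A"):
    """Play the given actions; return the transition log and total reward."""
    state, total, log = start, 0.0, []
    for action in actions:
        outcomes = model[(state, action)]
        weights = [p for p, _, _ in outcomes]
        _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
        log.append((state, action, reward, next_state))  # idea 1: remember it
        total += reward
        state = next_state
    return log, total


rng = random.Random()    # unseeded, so every run gets different probabilities
model = make_model(rng)  # idea 2: nobody knows the split in advance
log, total = play(model, [2, 1] * 5, rng)
print(f"total reward: {total:+.0f}")
print(log[:3])
```

The log of (state, action, reward, next_state) tuples is the kind of record a learner could later be trained on.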

Thanks for reading this! Happy coding! 🙂


 

If you liked this post, please share it on Twitter and follow this blog!
