On April 3rd 2018, Data Science Milan organized an event about Reinforcement Learning.
The event was presented by Orobix (an Italian engineering company focused on building artificial intelligence-powered systems) and hosted by Buildo.
Luca Antiga, CEO at Orobix, introduced the basics of Reinforcement Learning.
RL surged in popularity when DeepMind, which wasn't owned by Google yet, published a paper in Nature (https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf).
In the paper, DeepMind coupled reinforcement learning with deep learning. The resulting algorithm matched or exceeded human performance on several Atari games, using only raw pixels as input to devise a strategy; the agent had no prior knowledge of the game rules.
Luca introduced the concepts of Agent, State, Action, Environment and Reward, which are all foundational to the theory of RL.
He then explained the concepts of Markov Decision Process, policy, value function and Q-value function, and how computing optimal policies quickly becomes infeasible, hence the need for function approximation.
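To make those concepts concrete, here is a minimal tabular Q-learning sketch on a toy five-state chain MDP. The environment, hyperparameters and names are illustrative (not from the talk): the agent learns Q-values by bootstrapping, and the greedy policy is then read off the table. In realistic problems the state space is far too large for such a table, which is exactly why function approximation becomes necessary.

```python
import random

# A tiny deterministic MDP: a chain of 5 states, actions 0 (left) / 1 (right).
# Reaching the rightmost state yields reward 1 and ends the episode.
N_STATES, ACTIONS = 5, (0, 1)
GAMMA, ALPHA, EPS = 0.9, 0.5, 0.2

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=300, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            a = rng.choice(ACTIONS) if rng.random() < EPS else max(ACTIONS, key=lambda a: q[s][a])
            s2, r, done = step(s, a)
            # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
            target = r + (0.0 if done else GAMMA * max(q[s2]))
            q[s][a] += ALPHA * (target - q[s][a])
            s = s2
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]
print(policy)  # greedy action per state; the terminal state's entry is arbitrary
```

After training, the greedy policy moves right in every non-terminal state, heading toward the reward.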
For a detailed introduction to the topic, see the following references:
UCL course by David Silver (Google Deepmind):
Richard S. Sutton and Andrew G. Barto textbook:
Daniele Cortinovis, a physicist by training and Data Scientist at Orobix, then gave a great overview of the process of training an agent on classic examples such as the Cart-pole problem, Atari Breakout and Atari Pong, using PyTorch and OpenAI Gym.
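As an illustration of the agent-environment interaction loop those examples rely on, below is a self-contained sketch of one cart-pole episode. To keep it runnable without extra dependencies it hand-rolls the standard cart-pole dynamics and termination thresholds (the same ones Gym's `CartPole-v1` uses); in practice you would instead call `gym.make("CartPole-v1")` and train a PyTorch policy network rather than the hand-coded heuristic used here.

```python
import math, random

# Physical constants of the classic cart-pole task (values as commonly used).
GRAVITY, CART_M, POLE_M, POLE_LEN, FORCE, DT = 9.8, 1.0, 0.1, 0.5, 10.0, 0.02

class CartPole:
    def reset(self, rng):
        # small random initial state: x, x_dot, theta, theta_dot
        self.s = [rng.uniform(-0.05, 0.05) for _ in range(4)]
        return list(self.s)

    def step(self, action):  # action: 0 = push left, 1 = push right
        x, x_dot, th, th_dot = self.s
        force = FORCE if action == 1 else -FORCE
        total_m = CART_M + POLE_M
        cos, sin = math.cos(th), math.sin(th)
        tmp = (force + POLE_M * POLE_LEN * th_dot**2 * sin) / total_m
        th_acc = (GRAVITY * sin - cos * tmp) / (
            POLE_LEN * (4.0 / 3.0 - POLE_M * cos**2 / total_m))
        x_acc = tmp - POLE_M * POLE_LEN * th_acc * cos / total_m
        # Euler integration, one 0.02 s step
        self.s = [x + DT * x_dot, x_dot + DT * x_acc,
                  th + DT * th_dot, th_dot + DT * th_acc]
        # episode ends when the cart leaves the track or the pole tips over
        done = abs(self.s[0]) > 2.4 or abs(self.s[2]) > math.radians(12)
        return list(self.s), 1.0, done  # reward 1 per surviving step

def run_episode(policy, seed=0, max_steps=500):
    rng, env = random.Random(seed), CartPole()
    obs, total = env.reset(rng), 0.0
    for _ in range(max_steps):
        obs, r, done = env.step(policy(obs))
        total += r
        if done:
            break
    return total

# A trivial hand-coded policy: push in the direction the pole is falling.
balance = lambda obs: 1 if obs[2] + obs[3] > 0 else 0
print(run_episode(balance))  # number of steps the pole stayed upright
```

The same `reset`/`step` loop is what a learned agent uses: a trained policy simply replaces the `balance` heuristic with a network mapping observations to actions.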
Author: Fabio Concina
Data Scientist at kwantis