24 November 2017

Developmental or Reinforcement Learning?

Similarities and differences between reinforcement learning and developmental learning


Quite often, when I try to explain developmental learning to people who have some knowledge of AI, I get the usual comment: “oh, but it’s just like reinforcement learning!” Well… no, it’s not. At the same time, I understand why they get confused.


So let’s dive deeper into the similarities and differences between these two AI paradigms.


To start, let’s define both concepts.


Reinforcement learning (RL) is a class of learning algorithms designed to learn which actions to perform in a given situation in order to maximise a cumulative reward over time. It consists of performing experiments in an environment and building strategies, usually called policies, for choosing the next actions based on the rewards received. The basic building blocks of RL algorithms are states (of the environment), actions (that the agent can perform) and rewards (that the agent can get). Over time, the agent adjusts its policy according to the rewards it receives. Reinforcement learning has been successfully applied in many domains such as robot control, agent simulation, physical model simulation, planning, etc. Lastly, RL algorithms are well suited to problems modelled as MDPs (Markov Decision Processes) or even POMDPs (Partially Observable Markov Decision Processes).
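To make the states / actions / rewards vocabulary concrete, here is a minimal sketch of tabular Q-learning, one classic RL algorithm among many. The corridor environment, the `step` function and all parameter values are assumptions made up for this illustration.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells. The agent starts in
# cell 0 and the environment gives a reward of 1.0 when cell 4 is reached.
N_STATES = 5
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Environment dynamics: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a table of action values Q(state, action) from rewards."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy choice: mostly exploit, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice(
                    [a for a in ACTIONS if q[(state, a)] == best]
                )
            next_state, reward, done = step(state, action)
            # The reward comes from the environment and drives the update.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            target = reward + gamma * future
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """The policy: the best known action for each non-terminal state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)])
            for s in range(N_STATES - 1)}


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

Note that the task (reach the last cell) is entirely encoded in the reward returned by `step`: change the reward and the learned policy changes with it.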


Developmental learning (DL) is also an AI paradigm. Developmental agents also have to learn strategies to evolve in an environment and choose their next actions, but they don’t receive a reward from the environment. Instead, they rely on their intrinsic motivation to make decisions. Developmental learning draws inspiration from developmental psychology (Piaget) and relies on important hypotheses, namely: intrinsic motivation of the agent, no access to a full representation of the environment, active perception, and a constructivist approach. DL agents are well suited to non-Markovian processes and are able to perform hierarchical learning of sequences.
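As a rough illustration of what “no reward from the environment” can look like in code, here is a small sketch of an intrinsically motivated agent. It is only one possible design, assumed for this example: the agent attaches its own preferences (called `VALENCE` here) to interactions, the environment returns only an outcome, and the agent remembers what happened after each interaction. The `Corridor` environment and all names and values are hypothetical.

```python
# Hypothetical valences: how much the agent "likes" each (action, outcome)
# interaction. They belong to the agent, not to the environment.
VALENCE = {
    ("forward", "moved"): 5,
    ("forward", "bumped"): -10,
    ("turn", "turned"): -1,
}


class Corridor:
    """Hypothetical environment: it returns an outcome, never a reward."""

    def __init__(self, length=3):
        self.length = length
        self.position = 0
        self.direction = 1

    def enact(self, action):
        if action == "turn":
            self.direction = -self.direction
            return "turned"
        target = self.position + self.direction
        if 0 <= target < self.length:
            self.position = target
            return "moved"
        return "bumped"


class DevelopmentalAgent:
    def __init__(self, valence):
        self.valence = valence
        self.actions = sorted({action for action, _ in valence})
        # (previous interaction, action) -> outcome observed last time
        self.memory = {}
        self.previous = None

    def anticipated_valence(self, action):
        """Valence expected in the current context; 0 if never tried."""
        outcome = self.memory.get((self.previous, action))
        return 0 if outcome is None else self.valence[(action, outcome)]

    def choose(self):
        return max(self.actions, key=self.anticipated_valence)

    def learn(self, action, outcome):
        self.memory[(self.previous, action)] = outcome
        self.previous = (action, outcome)


def run(steps=20):
    """Run the agent and return its activity trace."""
    agent, env = DevelopmentalAgent(VALENCE), Corridor()
    trace = []
    for _ in range(steps):
        action = agent.choose()
        outcome = env.enact(action)  # the agent never sees its position
        agent.learn(action, outcome)
        trace.append((action, outcome))
    return trace


if __name__ == "__main__":
    for interaction in run():
        print(interaction)
```

In this sketch the agent bumps into the wall early on, then settles into a routine that avoids bumping, without ever being told what the “task” is. All we can do is read the trace returned by `run` and interpret the behavior that emerges.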


So, what are the main differences?


– The origin of the reward. In RL, the reward comes from the environment. In DL, the reward comes from the intrinsic motivation of the agent. This makes a huge difference in the way we model the system and the agent. In RL, we have to model the environment so that the agent can learn from it. In DL, on the other hand, we have to carefully design the motivation system of the agent so that it can learn relevant sequences of interactions.


– As a consequence of the previous point, RL agents are designed to operate in a specific environment and won’t be able to process information that is not part of the initial setup. In contrast, DL agents may be less efficient at discovering strategies in basic environments, but they are able to integrate unforeseen information into their learning process. This makes DL agents much better suited to open environments.


– Evaluation. The goal of RL agents is often quite different from that of DL agents. In RL, the agent usually has a task to perform, and the reward provided by the environment is directly linked to the definition of this task. Hence, evaluating RL agents is quite straightforward (convergence, number of iterations, etc.). By contrast, DL agents are, by design, not implemented to solve a specific task… they are designed to build their own representations / understanding of their environments. This makes the evaluation of DL agents more difficult: we can observe the emergence of behaviors and analyse the activity trace of the agent, but we cannot really monitor their ability to solve a specific task.


So, what are the similarities?


Well, in both cases, the agent learns by exploring the environment. In both cases, we need a form of fitness function (whether internal or external to the agent) to make decisions given the “rewards” obtained… and in both cases, we can use similar algorithmic implementations. To put it another way, the spirit is similar, but the way we implement the environment and the motivation of the agent differs quite a lot… and therefore so do the use cases.
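The point about similar algorithmic implementations can be sketched with a single update rule fed by two different feedback functions. Everything here (`update`, `external_reward`, `intrinsic_valence`, the outcome names and the numbers) is a hypothetical illustration, not a reference implementation of either paradigm.

```python
from typing import Callable, Dict

# A feedback function maps an (action, outcome) pair to a number.
Feedback = Callable[[str, str], float]


def external_reward(action: str, outcome: str) -> float:
    """RL flavour: the environment's designer decides what is rewarded."""
    return 1.0 if outcome == "goal" else 0.0


def intrinsic_valence(action: str, outcome: str) -> float:
    """DL flavour: the agent's own motivation system scores interactions."""
    return {"moved": 0.5, "bumped": -1.0}.get(outcome, 0.0)


def update(values: Dict[str, float], action: str, outcome: str,
           feedback: Feedback, alpha: float = 0.5) -> float:
    """Move the estimated value of `action` towards the feedback received."""
    old = values.get(action, 0.0)
    values[action] = old + alpha * (feedback(action, outcome) - old)
    return values[action]


if __name__ == "__main__":
    rl_values: Dict[str, float] = {}
    dl_values: Dict[str, float] = {}
    print(update(rl_values, "forward", "goal", external_reward))
    print(update(dl_values, "forward", "bumped", intrinsic_valence))
```

The learning code is identical in both calls; only the origin of the number passed to it changes, which is exactly where the modelling effort differs between the two paradigms.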


We should not see RL and DL as opposing concepts. Instead, we can see them as complementary. Each one has its own specificities and benefits. Some researchers even implement clever combinations of the two approaches. See [3] or [4] for more information.


If you want to dig deeper into this topic, here are a few references.


[1] Olivier L. Georgeon, Rémi Casado & Laëtitia Matignon (2015). “Modeling Biological Agents Beyond the Reinforcement Learning Paradigm”. International Conference on Biologically Inspired Cognitive Architecture, 6 November 2015, Lyon (France), pp. 17-22. doi: 10.1016/j.procs.2015.12.179. HAL: hal-01251602.


[2] Olivier L. Georgeon & Amélie Cordier (2014). “Inverting the Interaction Cycle to Model Embodied Agents”. Procedia Computer Science, vol. 41, pp. 243-248. doi: 10.1016/j.procs.2014.11.109. HAL: hal-01131263.


[3] Alain Dutech (2012). “Self-organizing developmental reinforcement learning”. International Conference on Simulated Animal Behavior, Odense, Denmark.


[4] Jonathan Mugan & Benjamin Kuipers (2008). “Towards the Application of Reinforcement Learning to Undirected Developmental Learning”. In Schlesinger, M., Berthouze, L., and Balkenius, C. (eds.), Proceedings of the Eighth International Conference on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems. Lund University Cognitive Studies, 139.


