Autonomous design of experiments for learning by experimentation

Prassler, E.; Kahl, B.; Henne, T.; Juarez, A.; Reggiani, Monica

doi:10.1007/978-3-7908-2127-7_13

In Artificial Intelligence, numerous learning paradigms have been developed over the past decades. For embodied and situated agents, the learning goal of the artificial agent is in most cases to "map" or classify the environment and the objects therein [1, 2] in order to improve navigation or the execution of some other domain-specific task. Dynamic environments and changing tasks, however, still pose a major challenge for robotic learning in real-world domains. To adapt its task strategies intelligently, the agent needs cognitive abilities that give it a deeper understanding of its environment and of the effects of its actions. To address this challenge within an open-ended learning loop, the XPERO project (http://www.xpero.org) explores the paradigm of Learning by Experimentation, through which the robot increases its conceptual world knowledge autonomously. In this setting, tasks chosen by an action-selection mechanism are interrupted by a learning loop whenever the robot identifies learning as necessary for solving a task or for explaining its observations. Note that our approach targets unsupervised learning: the agent has no oracle available, nor does it have access to a reward function providing direct feedback on the quality of its learned model, as is the case, e.g., in reinforcement learning. In the following sections we present our framework for integrating autonomous robotic experimentation into such a learning loop. Section 1 explains the modules for stimulation and design of experiments and how they interact. Section 2 describes our implementation of these modules and how we applied them to a real-world scenario to gather target-oriented data for learning conceptual knowledge. There we also indicate how this goal-oriented data generation enables machine learning algorithms to revise the failed prediction model.
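The control flow described above, where a task chosen by action selection is interrupted by a learning loop once the robot's prediction fails, can be illustrated with a minimal sketch. This is not the XPERO implementation. Everything in it is a hypothetical stand-in: `PredictionModel`, `run_with_learning_loop`, `design_experiments`, `world`, and `tolerance` are illustrative names. The "learning" step merely memorises observed outcomes, and we assume that stimulation is triggered when the model makes no prediction or its prediction error exceeds a tolerance.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Observation = Tuple[str, float]  # (action, observed outcome); hypothetical encoding


@dataclass
class PredictionModel:
    """Hypothetical prediction model: a lookup table from action to expected outcome."""
    table: Dict[str, float] = field(default_factory=dict)

    def predict(self, action: str) -> Optional[float]:
        return self.table.get(action)

    def revise(self, data: List[Observation]) -> None:
        # Stand-in for a real machine-learning step: memorise the observed outcomes.
        for action, outcome in data:
            self.table[action] = outcome


def run_with_learning_loop(
    tasks: List[str],                                # assumed output of action selection
    world: Callable[[str], float],                   # executes an action, returns its outcome
    model: PredictionModel,
    design_experiments: Callable[[str], List[str]],  # hypothetical experiment designer
    tolerance: float = 1e-6,
) -> List[Tuple[str, str]]:
    log: List[Tuple[str, str]] = []
    for task in tasks:
        predicted = model.predict(task)
        observed = world(task)
        # Assumed stimulation criterion: no prediction, or prediction error above tolerance.
        if predicted is None or abs(predicted - observed) > tolerance:
            # Interrupt the task and run the learning loop:
            # design experiments, execute them, and revise the failed model.
            experiments = design_experiments(task)
            data = [(action, world(action)) for action in experiments]
            model.revise(data)
            log.append(("learned", task))
        else:
            log.append(("ok", task))
    return log


# Toy usage: the world maps each action to a distance moved (hypothetical data).
outcomes = {"push_box": 1.0, "push_ball": 3.0}
model = PredictionModel({"push_box": 1.0})
log = run_with_learning_loop(
    ["push_box", "push_ball", "push_ball"],
    outcomes.__getitem__,
    model,
    design_experiments=lambda task: [task],
)
print(log)
```

Here the first `push_ball` triggers the learning loop because the model has no prediction for it, and the second `push_ball` is then predicted correctly. A real system would replace `revise` with a learner that generalises across actions and objects, and `design_experiments` with a module that chooses informative experiments. Both are deliberately trivial here so that the loop structure stays visible.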