State Prediction for Development of Helping Behavior in Robots

2015

Proceedings of the HRI 2015 Workshop on Cognition: A Bridge between Robotics and Interaction, pp. 11-12, March 2015
Human Robot Interaction 2015, Portland, OR, USA

Jimmy Baraglia∗, Yukie Nagai, Minoru Asada
Osaka University, Department of Adaptive Machine System
2-1 Yamadaoka, Suita, Osaka, Japan
∗email: jimmy.baraglia@ams.eng.osaka-u.ac.jp

ABSTRACT

Robots are increasingly expected to develop abilities through interaction with their environment rather than being programmed to execute a specific behavior. In our previous studies, we proposed a robotic model for the emergence of helping behavior based on the minimization of the prediction-error. Our hypothesis, different from traditional emotion contagion models, suggests that minimizing the difference (or prediction-error) between the prediction of others' future action and the current observation can motivate infants to help others. Despite promising results, we observed that the prediction of others' actions generated strong perspective differences, which ultimately diminished the helping performance of our robotic system. To solve this issue, we propose to predict the effects of actions instead of predicting the actions per se. Such an ability to predict the environmental state has been observed in young infants and seems promising for improving the performance of our robotic system.

1. INTRODUCTION

Young infants, from the beginning to the middle of their second year of life, are able to altruistically help others with no expectation of future rewards [7, 5, 4]. Traditional approaches suggest that an early form of empathy, or emotional contagion, is the primary behavioral motivation for young infants to act altruistically [7, 2, 3]. Yet recent experiments tend to show that a more general source of motivation prompts infants to help others achieve their unfulfilled goals [4].

To better understand the origin of altruistic behavior and to program this ability into robots, we developed a hypothesis for the emergence of altruistic behavior in which infants are not motivated to help others by emotional contagion, but by the need to minimize the prediction-error (hereafter PE) between others' predicted future actions and current observations [1]. Although our results provided substantial evidence that PE minimization could be used as a behavioral motivation for robots to help others, computing PE based on action prediction could not resolve the differences between the robot's own perspective and that of others. Therefore, our robotic system failed to reliably achieve the expected helping behavior.

To solve this new issue, we must change the way our robot perceives others' actions and the consequences of these actions on the environment. Warneken and Tomasello [7] showed that infants from 14 months of age could help others by handing an out-of-reach object directly to them, with almost no cases where infants kept the object. This seems to indicate that infants prefer to perform actions that help achieve others' goals, rather than imitating the predicted actions. Furthermore, other evidence strongly suggests that infants, already from the age of 3 to 5 months, represent actions in terms of goals, that is, in terms of the relation between actors and objects [6, 8].
Taken together, this evidence indicates that infants predict the goal of observed actions rather than the actions themselves. Our model therefore needs to predict the future goal, or targeted state, of an action and to estimate PE when the state is not achieved as predicted. Consequently, PE will be minimized when the goal is reached either by others or by the robot, regardless of the means.

The rest of this paper is organized as follows: first, each module of our model is briefly described, then the expected results are presented. Finally, a conclusion based on our previous results and evidence from the literature is given.

2. ROBOTIC MODEL

Our robotic model is a continuation of the work presented by Baraglia et al. [1]. This model consists of five modules and tries to minimize PE by executing actions in the environment to reach a predicted state. The details of each module are presented in the following sections.

2.1 Scene recognition

The scene recognition module recognizes the environment's state, including objects and others. An important point here is that others are not differentiated from objects; instead, they are detected as parts of the environment. The recognized signals were chosen based on the developmental studies previously presented [6, 8].

2.2 Action-state memory

The action-state memory is built as a Markov decision process (hereafter MDP) based on the robot's own experience of executing actions. When an action performed by the robot changes the environment's state, the action and the new state are memorized. As we assumed that others are not differentiated from the environment, the system's own experience can be generalized for the recognition of the environment's state. For instance, in Fig. 1 A, the robot experienced putting two objects close to each other and can generalize this experience to recognize the state of OH and O1 in Fig. 1 B.

Figure 1: Example of action-state memory. A: The system updates its action-state memory by experiencing the action "Moving an object O2 toward another object O1". B: The system generalizes its memory to other objects and recognizes the current state of OH and O1, namely S1, highlighted in green.

2.3 State prediction

The state prediction module estimates the future state based on the current observation and using the action-state memory. The prediction is applied to all the states recognized by the scene recognition module, and the targeted goal is predicted as the possible future state with the highest probability. In Fig. 1 B, the recognized state is S1, thus the predicted state would be the future state with the highest probability, here S2.

2.4 Estimation of prediction-error

The estimation of prediction-error module estimates PE between the current state of the environment and the future state predicted by the state prediction module. If the predicted state is not achieved within a predicted duration, PE increases accordingly.

2.5 Minimization of prediction-error

The minimization of prediction-error module tries to minimize PE when its value becomes larger than a predefined threshold. Using the action-state memory and the predicted future state, the system performs an action to minimize PE. For example, in Fig. 1 B, if the predicted state is S2, the system will perform the actions Ai and Ai+1, namely "move OH toward O1" and "touch OH with O1", to reach S2.

3. EXPECTED RESULTS

Our previous results presented in [1] showed that estimating PE based on the prediction of actions caused strong perspective biases. For instance, if the experimenter was attempting to grasp a ball but failed while reaching, our robotic model predicted the next action as being "grasping" and performed the same action to minimize PE. This action was successful from the robot's perspective, but failed to help the experimenter and could not replicate the behavior observed in infants.

However, if the future state of the environment is predicted instead of the action, we can expect that the minimization of PE will lead to a behavior that is helpful from the experimenter's perspective. Indeed, when observing others failing to achieve an action, the robot will first recognize the current state of the environment. It will then predict the future state based on its own experience, and finally perform an action that can achieve the predicted state and minimize PE.

4. CONCLUSIONS

To solve the perspective difference, we hypothesized that our system should predict the targeted goal (or state) of an action instead of predicting the future action. By generalizing its own experience to the recognition of objects' states in the scene, our robot is then able to minimize PE by performing an action that achieves the predicted state, regardless of the perspective differences. Such an approach is strongly supported by developmental studies, and its benefits for the helping performance of our robotic system seem promising. Future experiments will test our assumption and determine whether state prediction can indeed improve the emergence of altruistic behavior.

5. REFERENCES

[1] J. Baraglia, Y. Nagai, and M. Asada. Prediction error minimization for emergence of altruistic behavior. 4th International Conference on Development and Learning and on Epigenetic Robotics, pages 281–286, Oct. 2014.
[2] F. B. M. de Waal. Putting the altruism back into altruism: the evolution of empathy. Annual Review of Psychology, 59:279–300, Jan. 2008.
[3] J. Decety and M. Svetlova. Putting together phylogenetic and ontogenetic perspectives on empathy. Developmental Cognitive Neuroscience, 2(1):1–24, Jan. 2012.
[4] B. Kenward and G. Gredebäck. Infants help a non-human agent. PLoS ONE, 8(9):e75130, Jan. 2013.
[5] H. Over and M. Carpenter. Eighteen-month-old infants show increased helping following priming with affiliation: Research report. Psychological Science, 20(10):1189–1193, Oct. 2009.
[6] J. A. Sommerville, A. L. Woodward, and A. Needham. Action experience alters 3-month-old infants' perception of others' actions. Cognition, 96(1):B1–11, May 2005.
[7] F. Warneken and M. Tomasello. Helping and cooperation at 14 months of age. Infancy, 11(3):271–294, 2007.
[8] A. L. Woodward. Infants' grasp of others' intentions. Current Directions in Psychological Science, 18(1):53–57, Feb. 2009.
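
The loop described in Sections 2.2 to 2.5 (memorize own action-state transitions, predict the most probable future state, estimate PE, act once PE exceeds a threshold) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `ActionStateMemory`, `prediction_error` and `THRESHOLD` are hypothetical, simple frequency counts stand in for the MDP, and the linear growth of PE after the predicted duration is an assumption, since the paper only states that PE "increases accordingly".

```python
from collections import defaultdict


class ActionStateMemory:
    """Frequency counts of (action, next state) per state, from own experience.

    Assumption: counts stand in for the MDP transition probabilities.
    """

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def record(self, state, action, next_state):
        # Memorize the action and the new state it produced.
        self.counts[state][(action, next_state)] += 1

    def predict_state(self, state):
        """Return (most probable future state, its probability)."""
        totals = defaultdict(int)
        for (_, next_state), n in self.counts[state].items():
            totals[next_state] += n
        if not totals:
            return None, 0.0
        best = max(totals, key=totals.get)
        return best, totals[best] / sum(totals.values())

    def action_towards(self, state, goal):
        """Return the most often experienced action leading from state to goal."""
        candidates = {a: n for (a, s), n in self.counts[state].items() if s == goal}
        return max(candidates, key=candidates.get) if candidates else None


def prediction_error(current, predicted, elapsed, expected_duration):
    """Assumed form: zero until the predicted duration passes, then linear growth."""
    if predicted is None or current == predicted:
        return 0.0
    return max(0.0, (elapsed - expected_duration) / expected_duration)


THRESHOLD = 1.0  # hypothetical value for the predefined threshold

memory = ActionStateMemory()
memory.record("S1", "move OH toward O1", "S2")
memory.record("S1", "move OH toward O1", "S2")
memory.record("S1", "move OH away from O1", "S0")

# The robot observes S1 and predicts the most probable future state.
goal, probability = memory.predict_state("S1")

# The predicted state has still not been reached after 6 s (2 s were expected).
pe = prediction_error("S1", goal, elapsed=6.0, expected_duration=2.0)

# Act only when PE exceeds the threshold, using an action known to reach the goal.
action = memory.action_towards("S1", goal) if pe > THRESHOLD else None
print(goal, round(probability, 2), pe, action)
```

Because the selected action is chosen by the state it is expected to produce rather than by copying the observed action, the sketch reflects the paper's point that PE can be minimized regardless of who performs the action.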