Proceedings of the HRI 2015 Workshop on
Cognition: A Bridge between Robotics and Interaction
pp. 11-12, March 2015
State Prediction for Development of Helping Behavior in Robots
Jimmy Baraglia∗, Yukie Nagai, Minoru Asada
Osaka University, Department of Adaptive Machine Systems
2-1 Yamadaoka, Suita, Osaka, Japan
∗email: jimmy.baraglia@ams.eng.osaka-u.ac.jp
ABSTRACT
Robots are less and less often programmed to execute specific behaviors; instead, they develop abilities through interaction with their environment. In our previous studies, we proposed a robotic model for the emergence of helping behavior based on the minimization of prediction error. Our hypothesis, in contrast to traditional emotional contagion models, suggests that minimizing the difference (or prediction error) between the prediction of others' future actions and the current observation can motivate infants to help others. Despite promising results, we observed that the prediction of others' actions generated strong perspective differences, which ultimately diminished the helping performance of our robotic system. To solve this issue, we propose to predict the effects of actions instead of predicting the actions per se. Such an ability to predict the environmental state has been observed in young infants and seems promising for improving the performance of our robotic system.
1. INTRODUCTION
Young infants, from the beginning to the middle of their second year of life, are able to altruistically help others with no expectation of future rewards [7, 5, 4]. Traditional approaches suggest that an early form of empathy, or emotional contagion, is the primary motivation for young infants to act altruistically [7, 2, 3]. Yet, recent experiments tend to show that a more general source of motivation prompts infants to help others achieve their unfulfilled goals [4]. To better understand the origin of altruistic behavior and to program this ability into robots, we developed a hypothesis for the emergence of altruistic behavior in which infants are motivated to help others not by emotional contagion, but by the drive to minimize the prediction error (hereafter PE) between others' predicted future actions and current observations [1]. Although our results provided strong evidence that PE minimization can serve as a
behavioral motivation for robots to help others, computing PE based on action prediction could not resolve the differences between the robot's own perspective and that of others. Therefore, our robotic system failed to reliably achieve the expected helping behavior. To solve this new issue, we must change the way our robot perceives others' actions and the consequences of these actions on the environment. Warneken and Tomasello [7] showed that infants from 14 months of age could help others by handing an out-of-reach object directly to them, with almost no cases where infants kept the object. This seems to indicate that infants prefer to perform actions that help achieve others' goals, rather than imitating the predicted actions. Furthermore, other evidence strongly suggests that infants, already from the age of 3 to 5 months, represent actions in terms of goals, that is, the relation between actors and objects [6, 8].
Based on this evidence, it seems clear that infants predict the goal of observed actions rather than the actions themselves. Our model thus needs to predict the future goal, or targeted state, of an action and to estimate PE when the state is not achieved as predicted. Consequently, PE will be minimized when the goal is reached, either by others or by the robot, regardless of the means. The rest of this paper is organized as follows: first, each module of our model is briefly described; then, the expected results are presented; finally, a conclusion based on our previous results and evidence from the literature is given.
2. ROBOTIC MODEL
Our robotic model is a continuation of the work presented by Baraglia et al. [1]. It consists of five modules and minimizes PE by executing actions in the environment to reach a predicted state. The details of each module are presented in the following sections.
2.1 Scene recognition
The scene recognition module recognizes the environment's state, including objects and other agents. An important point here is that others are not differentiated from objects; instead, they are detected as parts of the environment. The recognized signals were chosen based on the developmental studies cited above [6, 8].
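As an illustration, a state could be encoded as the set of spatial relations between detected entities. The following Python sketch shows one such encoding; the Entity structure, the distance threshold, and the "close"/"far" relations are our own illustrative assumptions, as the exact representation is not specified here.

from dataclasses import dataclass
from itertools import combinations
import math

@dataclass(frozen=True)
class Entity:
    name: str  # e.g., "O1" or "OH"; other agents are treated like any object
    x: float
    y: float

def recognize_state(entities, close_threshold=0.1):
    # Encode the scene as the set of pairwise spatial relations.
    state = set()
    for a, b in combinations(entities, 2):
        distance = math.hypot(a.x - b.x, a.y - b.y)
        relation = "close" if distance < close_threshold else "far"
        state.add((a.name, relation, b.name))
    return frozenset(state)

# Example: a hand OH far away from an object O1.
print(recognize_state([Entity("OH", 0.0, 0.0), Entity("O1", 0.5, 0.2)]))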
2.2 Action-state memory
The action-state memory is built as a Markov decision process (hereafter MDP) based on the robot's own experience of executing actions. When an action performed by the robot changes the environment's state, the action and the new state are memorized. As we assume that others are not differentiated from the environment, the system's own experience can be generalized to the recognition of the environment's state. For instance, in Fig. 1 A, the robot experienced putting two objects close to each other and can generalize this experience to recognize the state of OH and O1 in Fig. 1 B.

Figure 1: Example of action-state memory. A: the system updates its action-state memory by experiencing the action "Moving an object O2 toward another object O1". B: the system generalizes its memory to other objects and recognizes the current state of OH and O1, namely S1, highlighted in green.
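To make this concrete, such a memory could be stored as MDP-style transition counts, as in the minimal sketch below; the class and method names are hypothetical, and only the overall structure follows the description above.

from collections import defaultdict

class ActionStateMemory:
    def __init__(self):
        # counts[state][action][next_state] = times this transition was experienced
        self.counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

    def memorize(self, state, action, next_state):
        # Store a self-experienced transition (state, action) -> next_state.
        self.counts[state][action][next_state] += 1

    def transition_prob(self, state, action, next_state):
        # Empirical probability of reaching next_state by doing action in state.
        total = sum(self.counts[state][action].values())
        return self.counts[state][action][next_state] / total if total else 0.0

memory = ActionStateMemory()
memory.memorize("S1", "move O2 toward O1", "S2")  # the experience of Fig. 1 A
print(memory.transition_prob("S1", "move O2 toward O1", "S2"))  # 1.0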
2.3 State prediction
The state prediction module estimates the future state based on the current observation, using the action-state memory. The prediction is applied to all the states recognized by the scene recognition module, and the targeted goal is predicted as the possible future state with the highest probability. In Fig. 1 B, the recognized state is S1; thus, the predicted state is the future state with the highest probability, here S2.
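A minimal sketch of this selection rule, assuming a flat transition table with invented probabilities and action names, could be:

# transitions[(state, action)] = {next_state: empirical probability}
transitions = {
    ("S1", "move OH toward O1"): {"S2": 0.8, "S1": 0.2},
    ("S1", "move OH away from O1"): {"S1": 0.6, "S0": 0.4},
}

def predict_goal_state(current_state):
    # The targeted goal is the future state with the highest probability.
    best_state, best_prob = None, 0.0
    for (state, _action), outcomes in transitions.items():
        if state != current_state:
            continue
        for next_state, prob in outcomes.items():
            if next_state != current_state and prob > best_prob:
                best_state, best_prob = next_state, prob
    return best_state, best_prob

print(predict_goal_state("S1"))  # ('S2', 0.8), as in Fig. 1 B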
2.4 Estimation of prediction-error
The estimation of prediction-error module estimates PE
between the current state of the environment and the future
state predicted by the state prediction module. If the predicted state is not achieved within a predicted duration, PE
increases accordingly.
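Only the fact that PE increases is specified here; the linear growth law in the following sketch is therefore purely our assumption, shown to make the mechanism concrete.

def estimate_pe(current_state, predicted_state, elapsed, predicted_duration):
    # No error while the prediction is fulfilled or still within its deadline.
    if current_state == predicted_state or elapsed <= predicted_duration:
        return 0.0
    # PE grows with the delay beyond the predicted duration (our assumption).
    return (elapsed - predicted_duration) / predicted_duration

print(estimate_pe("S1", "S2", elapsed=3.0, predicted_duration=2.0))  # 0.5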
2.5 Minimization of prediction-error
The minimization of prediction-error module tries to minimize PE when its value becomes larger than a predefined threshold. Using the action-state memory and the predicted future state, the system performs an action to minimize PE. For example, in Fig. 1 B, if the predicted state is S2, the system will perform the actions Ai and Ai+1, namely "move OH toward O1" and "touch OH with O1", to reach S2.
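One way to realize this step is a breadth-first search over the action-state memory for the shortest action sequence reaching the predicted state, sketched below under the assumptions of a deterministic transition table and a hypothetical intermediate state S1b between S1 and S2.

from collections import deque

# Most likely outcome of each known (state, action) pair.
transitions = {
    ("S1", "move OH toward O1"): "S1b",
    ("S1b", "touch OH with O1"): "S2",
}

def plan_actions(current_state, goal_state):
    # Breadth-first search for the shortest action sequence to the goal.
    queue = deque([(current_state, [])])
    visited = {current_state}
    while queue:
        state, actions = queue.popleft()
        if state == goal_state:
            return actions
        for (s, action), next_state in transitions.items():
            if s == state and next_state not in visited:
                visited.add(next_state)
                queue.append((next_state, actions + [action]))
    return None  # the goal is not reachable from the robot's experience

pe, threshold = 0.5, 0.3
if pe > threshold:  # act only when the prediction error is large enough
    print(plan_actions("S1", "S2"))  # ['move OH toward O1', 'touch OH with O1']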
3. EXPECTED RESULTS
Our previous results presented in [1] showed that estimating PE based on the prediction of actions caused strong perspective biases. For instance, if the experimenter was attempting to grasp a ball but failed during the reaching, our robotic model predicted the next action as being "grasping" and performed the same action to minimize PE. This action was successful from the robot's perspective, but failed to help the experimenter and could not replicate the behavior observed in infants. However, if the future state of the environment is predicted instead of the action, we can expect the minimization of PE to lead to a behavior that is helpful from the experimenter's perspective. Indeed, when observing others failing to achieve an action, the robot will first recognize the current state of the environment. It will then predict the future state based on its own experience, and finally perform an action that can achieve the predicted state and minimize PE.

4. CONCLUSIONS
To solve the perspective difference, we hypothesized that our system should predict the targeted goal (or state) of an action instead of predicting the future action. By generalizing self-experience to the recognition of objects' states in the scene, our robot is then able to minimize PE by performing an action that achieves the predicted state, regardless of the perspective differences. Such an approach is strongly supported by developmental studies, and its benefits for the helping performance of our robotic system seem promising. Future experiments will test our assumption and determine whether state prediction can indeed improve the emergence of altruistic behavior.
5. REFERENCES
[1] J. Baraglia, Y. Nagai, and M. Asada. Prediction error minimization for emergence of altruistic behavior. In 4th International Conference on Development and Learning and on Epigenetic Robotics, pages 281–286, Oct. 2014.
[2] F. B. M. de Waal. Putting the altruism back into altruism: the evolution of empathy. Annual Review of Psychology, 59:279–300, Jan. 2008.
[3] J. Decety and M. Svetlova. Putting together phylogenetic and ontogenetic perspectives on empathy. Developmental Cognitive Neuroscience, 2(1):1–24, Jan. 2012.
[4] B. Kenward and G. Gredebäck. Infants help a non-human agent. PLoS ONE, 8(9):e75130, Jan. 2013.
[5] H. Over and M. Carpenter. Eighteen-month-old infants show increased helping following priming with affiliation. Psychological Science, 20(10):1189–1193, Oct. 2009.
[6] J. A. Sommerville, A. L. Woodward, and A. Needham. Action experience alters 3-month-old infants' perception of others' actions. Cognition, 96(1):B1–B11, May 2005.
[7] F. Warneken and M. Tomasello. Helping and cooperation at 14 months of age. Infancy, 11(3):271–294, 2007.
[8] A. L. Woodward. Infants' grasp of others' intentions. Current Directions in Psychological Science, 18(1):53–57, Feb. 2009.