Abstract— In order for robots to act as valuable assistants for non-expert users, they need to be able to learn new abilities and do so through natural methods of communication. Furthermore, it is often desirable that tasks be learned quickly without having to provide multiple demonstrations. Training should also be conducted in such a way that the user has a clear understanding of the manner in which environmental features affect the behavior of the learned activity, so that execution behavior is predictable.

We present an interactive framework for teaching a robot the flow of an activity composed of elements from a set of primitive behaviors and previously trained activities. Conditional branching and looping, order-independent activity execution, and contingency (or interrupt) actions can all be captured by our activity structures. Additional convenience functionality to aid in the training process is also provided.

By providing a natural method of communicating production rules analogous to rigid programming structures, well-defined tasks can be trained easily. We demonstrate our task training procedure on a mobile robot.

I. INTRODUCTION

In the future, robots will inevitably be employed as assistants or team partners. However, if such robots are ever to gain widespread and long-term acceptance, they will need to be capable of not only learning new tasks, but also learning them from non-expert users.

We previously introduced a method for task training via dialog and demonstration in [11]. Therein we described a collaborative natural language procedure for constructing tasks from a set of primitive behaviors and/or previously trained tasks, which in turn could be used to build other tasks. This modular task architecture supports an expanding repertoire of abilities. Different training modes enable different features, such as the ability to attach locational context to a given command, reducing the explanatory responsibilities of the human trainer. Preconditions on task actions serve as a failure-handling mechanism that appropriately directs task flow should an action fail. The robot also engages the human in a verification dialog to resolve ambiguities in task flow and, in so doing, brings about mutual understanding of the task representation.

This understanding can be desirable, sometimes essential, in situations where the time or opportunity to provide multiple demonstrations and/or make corrections through practice trials is unavailable, and the chance of the robot exhibiting unexpected behavior due to conditions unencountered during training is unacceptable. The training dialog described herein enables the human to quickly construct rigidly formulated tasks where the features that affect task flow must be explicitly conveyed and not inferred. Additionally, because tasks are symbolically referenced with natural language labels, they are transferable across heterogeneous robots that share the same or a similar primitive behavior set.

In this paper, we present enhancements and modifications to this task training technique, including the ability to capture conditional looping so that repetitive, or cyclic, tasks can be created. Interrupt events that may occur at any point during a task can also be specified to trigger contingency actions. Additionally, a new construct called a todolist has been added, which permits order-independent activity execution. Moreover, tasks can now be trained "on the fly" — that is, while training another task that uses them — to support a top-down design approach while still permitting the bottom-up construction of tasks. Furthermore, locational context is no longer inferred automatically, as this is not always desirable. However, location-specific actions can still be specified explicitly with a simple grounding utterance.

II. RELATED WORK

Robot task learning and programming by demonstration (PBD) have been explored by several groups. In [1], [2], and [12], robots learn actuator trajectories or control policies from user task demonstrations. In [13], a task is built using gestures by discerning which primitive actions, from a base set of capabilities, can be combined to conduct the task demonstrated.

Our method has the ability to discern, to a limited extent, which primitive actions should be combined to execute a given task by way of inferring locational context on actions. We note, however, that this is not the main focus of our work, nor is it meant for deriving low-level control strategies. It is primarily a method by which the control flow of a task, using primitive actions and previously learned tasks, can be communicated through a training procedure employing natural interaction, thereby converging to mutual task understanding for both robot and user.

Our work is largely inspired by [8] and [7]. In [8], a mobile robot is joysticked through multiple demonstrations of a task, from which it generates a generalized task representation in the form of a directed acyclic graph (DAG). The task is then pruned down to a linear sequence through teacher feedback in the form of verbal cues over multiple practice trials.

In [7], a stationary humanoid robot that understands some
speech, though it is itself unable to speak, learns tasks by communicating through gestures and facial expressions. Our approach employs a similar turn-taking framework for instruction and task refinement, but we endow the robot with the capability of speech, which we believe conveys the robot's understanding of the task more directly and guides the human more effectively in resolving ambiguities. In this way, we obviate the need to refine a learned task through practice.

Similar dialog-driven interaction mechanisms have been developed in the area of plan recognition, though primarily in the Human-Computer Interaction (HCI), as opposed to Human-Robot Interaction (HRI), domain. A plan recognition algorithm is introduced in [10] and [6], where characteristics of the collaborative setting are exploited to reduce the amount of input required of the user. This recognition strategy, however, requires some prior knowledge in the form of SharedPlans (mutually believed goals, actions, and intentions) and a set of recipes (action plans for achieving goals). This work differs from ours in that its goal is to help the user accomplish tasks according to perceived intent, whereas we are striving to teach a robot new tasks. Our approach could potentially be used instead to build the recipes necessary for this plan recognition method to work.

In [9], an augmentation-based learning approach is described. The task structure, including conditional branching and looping, is inferred from user demonstration. Manual edits can also be made to fix incorrect task structures and constrain the induction procedure on subsequent demonstrations. Again, this approach is explored in the software application domain, and there is no effort to conduct a collaborative discourse with the user for natural interaction. Additionally, in our work, branching and looping structures are explicitly and quickly communicated by the user rather than being inferred over multiple demonstrations.

A multi-modal interface for programming tasks is described in [4] that additionally allows the user to control task priority during execution. Instruction-Based Learning [5] is similar to our work in that it uses a base set of behaviors associated with natural language symbolic labels and a modular architecture for symbolic tasks.

None of these works, however, describes the ability to convey branching or looping flow constructs within the task structure that are conditioned on explicitly communicated features. Nor do they address the issue of structuring tasks for activities that need not be executed in the order in which they were communicated. This severely limits robustness and the types of tasks that can be trained. Through speech, one can very compactly formulate instructions for execution based on detectable environmental states. No intention beliefs are maintained that may result in unexpected behavior during execution; rather, by engaging the user in a true spoken dialog, we can quickly train tasks with clearly defined execution flow that is necessarily understood by the user.

III. SYSTEM OVERVIEW

Figure 1 depicts a simple overview of the system architecture we employ. Within the top-level behavior is an Activity Selector that, upon parsing given speech commands, places the appropriate activities in the Activity Repertoire onto the Current Activity List for execution.

Fig. 1. CMAssist software architecture

An activity is the encompassing term for behaviors, tasks, and todolists, which are described in more detail in Section IV. The Activity Repertoire is the collective map that associates natural language symbolic labels with known activities. For example, "Go to" in the phrase "Go to the door" maps directly to the navigation behavior, which would be put onto the Current Activity List with the location parameter "door". An activity-building behavior can also add new activities to the Activity Repertoire, as will be shown in Section V-A.

Though the various activity types have differing internal structures, they are all executed through the same function form, where the inputs are the sensors and a command object, and the outputs are an integer status flag and a new command object:

(status, command) = Activity(sensors, command)

The sensors object gives an activity module access to sensory data, while command is an object that can be modified by an activity to store actuator commands, such as motor velocities or speech output. A single command object is passed through each of the activities in the Current Activity List so that commands requested by activities of lower priority are visible to higher-priority activities. Activities can take this information into account when actuator commands need to be overridden. For example, when the obstacle avoidance behavior needs to decide whether to veer left or right to circumvent an obstacle, it can check the command object to see in which direction the navigation behavior was trying to drive the robot and choose to go in a similar direction. The main execution loop then involves processing all of the activities in the Current Activity List with the given sensory data. When the last activity on the Current Activity List is completed, status is routed back to the Activity Selector, which determines whether behaviors need to be removed from the Current Activity List. The command object is processed to drive the actuators.

The Activity Selector is triggered on speech input and is responsible for inserting commanded activities, removing conflicting ones, and removing completed or failed activities.
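To make the execution model concrete, the following is a minimal sketch of the common activity interface and the main execution loop. The names (Activity, run_cycle, the status constants, activity_selector.update) are our own illustrative assumptions, not the CMAssist implementation.

SUCCESS, FAILURE, RUNNING = 1, -1, 0

class Activity:
    """Common callable interface shared by behaviors, tasks, and todolists."""
    def __call__(self, sensors, command):
        # Mirrors the paper's form: (status, command) = Activity(sensors, command).
        # An activity may modify command with actuator requests such as
        # motor velocities or speech output.
        raise NotImplementedError

def run_cycle(current_activity_list, sensors, command, activity_selector):
    # Process every activity with the same sensor data; a single command
    # object flows through the list so that higher-priority activities can
    # see, and if necessary override, lower-priority requests.
    status = RUNNING
    for activity in current_activity_list:
        status, command = activity(sensors, command)
    # The status of the last activity is routed back to the Activity
    # Selector, which removes completed or failed activities.
    activity_selector.update(status, current_activity_list)
    return command  # processed downstream to drive the actuators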
IV. ACTIVITY STRUCTURES

A. Behaviors

A behavior maps low-level sensory data to actuator trajectories in order to accomplish some high-level goal(s). The robot is assumed to be preprogrammed with some basic set of behaviors. For a mobile platform, these primitive skills might include obstacle avoidance and high-level navigation capabilities.

B. Tasks

The basic building block of a task is the task item (Figure 2). A task item consists of three main components: a (potentially empty) precondition list, an activity and a list of execution parameters, and a pointer list to subsequent task items. The precondition list contains the conditions that must be satisfied before the action can be executed. There are two types of preconditions: enabling and permanent. Enabling preconditions are evaluated only once, before the task item's activity is executed. Permanent preconditions are monitored continuously for as long as the activity is being executed. As previously mentioned, an activity can refer to a behavior, a previously trained task, or a todolist. Depending on the completion status of the activity (i.e., success or failure), the associated link is followed to the next task item to be executed.

Fig. 2. Task item
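As a concrete illustration, a task item can be represented as a small record. This is a sketch under our own naming assumptions (TaskItem, on_success, on_failure), not the authors' actual data structure.

class TaskItem:
    def __init__(self, activity, params=None, enabling=None, permanent=None):
        self.enabling = enabling or []    # preconditions checked once, before starting
        self.permanent = permanent or []  # preconditions monitored while running
        self.activity = activity          # behavior, previously trained task, or todolist
        self.params = params or []        # execution parameters
        self.on_success = None            # next task item if the activity succeeds
        self.on_failure = None            # next task item if the activity fails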
A task, then, is a temporally ordered sequence of task items captured in a directed graph structure. Tasks can represent simple linear sequences such as in Figure 3(a): here, the robot executes Task items 1 through N in order. Tasks can also represent conditional branching, as shown in Figure 3(b). Depending on the evaluation of <condition>, either Task item 2a or Task item 2b will be executed, followed by whichever task items follow it until the branches reconnect at Task item N. Cyclic tasks can be represented by loops, as shown in Figure 3(c). For as long as <condition> is true, Task item 2 and the subsequent task items inside the loop are executed. This is made possible by applying the while-condition as a permanent precondition on all task items inside the loop.

Fig. 3. Task flow structures: (a) Linear, (b) Conditional branching, (c) Conditional looping
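A conditional loop of the kind in Figure 3(c) can be sketched with the TaskItem record above by tagging every item in the loop body with the while-condition as a permanent precondition and adding a back edge. The helper name make_loop is our own; for brevity the sketch omits routing to the task item that follows the loop (the "After that" branch) once the condition stops holding.

def make_loop(condition, body_items):
    # The while-condition becomes a permanent precondition on every
    # task item inside the loop, so execution abandons the loop body
    # as soon as the condition no longer holds.
    for item in body_items:
        item.permanent.append(condition)
    # Chain the body in order and add the back edge that closes the cycle.
    for prev, nxt in zip(body_items, body_items[1:]):
        prev.on_success = nxt
    body_items[-1].on_success = body_items[0]
    return body_items[0]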
For some tasks it may be necessary to execute contingency activities, such as when some event occurs that requires special action and the current task to be put on hold. Rather than inserting if and while statements throughout the task, the user can optionally specify contingency event-action pairs that are checked for the duration of the task execution. Unlike the previous conditional constructs, a contingency plan is not represented within the directed graph itself but is an attribute of the task structure. Each task has an associative structure that maps an interrupt event k to an action tuple (a, r), where a is the activity to execute when k is true and r is a boolean value determining whether or not the original task should be resumed when either k is no longer true or a has completed.
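In the same illustrative spirit, the contingency plan can be kept as a dictionary on the task object and polled once per execution cycle. The Task class, the event.is_true interface, and check_contingencies are assumptions for the sketch, not the paper's API.

class Task:
    def __init__(self, first_item):
        self.first_item = first_item
        # interrupt event k -> (activity a, resume-original-task? r)
        self.contingencies = {}

def check_contingencies(task, sensors):
    # Poll every registered interrupt event each cycle; if one fires,
    # return the contingency activity and whether to resume the
    # original task afterwards.
    for event, (activity, resume) in task.contingencies.items():
        if event.is_true(sensors):
            return activity, resume
    return None, False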
C. Todolists

Todolists are a special type of activity that allows the user to specify a list of items that are to be executed in no particular order. These todolist items, as with task items, can refer to any activity: behaviors, tasks, and other todolists. There is nothing unique about the structure of a todolist: it is simply a list of disconnected activities that, unlike tasks, cannot capture conditional branching and looping. It is rather the manner in which a todolist is executed that distinguishes it from the other activities, enabling it to accomplish unordered tasks as people do on a daily basis.

We currently employ a round-robin execution scheme where we iteratively loop through the list and attempt each item until it has either completed successfully or failed maxNumTries times, where maxNumTries is specified during training.

Clearly, some optimal scheduling strategy to minimize failed attempts could be applied here, taking into account information like estimated todolist item durations and reasons for past failures. Item priority could be an additional constraint that such a strategy might take into account. This is beyond the scope of this work, where we simply provide a construct in which order-independent execution of activities is made possible.
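A minimal sketch of this round-robin scheme follows, assuming the SUCCESS flag introduced earlier and a caller-supplied run(item) helper that executes one todolist item to completion and returns its status.

def run_todolist(items, max_num_tries, run):
    # Round-robin over the items until each one has either succeeded or
    # failed max_num_tries times (max_num_tries is set during training).
    failures = {id(item): 0 for item in items}
    pending = list(items)
    while pending:
        for item in list(pending):
            status = run(item)
            if status == SUCCESS:
                pending.remove(item)
            else:
                failures[id(item)] += 1
                if failures[id(item)] >= max_num_tries:
                    pending.remove(item)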
V. TRAINING

The basic idea behind the training approach we employ is to allow the user to convey production rules primarily through speech. Each recognized user utterance is mapped to one of three things: (1) an activity in the Activity Repertoire that is to be appended to the current activity structure, (2) a control structure that affects where and how subsequent activities are appended to the current activity structure, or (3) a "special" command, such as a question that the user might ask during the training procedure.

Throughout the training procedure, the robot responds with an affirmative "ok" after every user utterance to indicate understanding. The robot will also ask the user questions about parameters that were not defined when the user has finished training, thus guiding the user through dialog towards a well-defined activity structure.
A. Training Tasks

Task training is itself a behavior that can be invoked in one of two modes: dialog-only and dialog-and-observation. The former is invoked with the keyphrase "When I say T" and the latter with "Let me show you what to do when I say T", where T is the name of the task to be trained and is typically an imperative statement. In dialog-only mode, all commands must be issued to the robot verbally. In dialog-and-observation mode, the robot invokes its following behavior so that it remains in the vicinity of the human trainer as he moves around the environment. In this manner, the robot can interpret deictic utterances like "come here". In previous work [11], this mode was used to automatically attach locational context to each command given by the user. In an effort to provide a framework for the training of more general tasks — where it is not necessarily appropriate to assume that actions should be executed where they were demonstrated — locational contexts are no longer assumed but can be easily and naturally anchored to subsequent commands with the "come here" phrase.

Task flow control is communicated with the keyphrases summarized in Table I. An example of a user utterance that creates a conditional branching structure (Figure 3(b)) is "If you see Kevin, say 'Hi Kevin'. Otherwise, say 'Where is Kevin?' before looking for Paul". The resulting task would cause the robot to say either "Hi Kevin" or "Where is Kevin?" depending on whether Kevin was detected. It would then begin the activity called looking for Paul.
TABLE I: TASK FLOW COMMANDS

"If <condition>": appends a conditional node to the task graph; subsequent commands are added to the True branch.
"Otherwise": causes subsequent commands to be added to the False branch of the current if node.
"before": connects the True and False branches of the current if node with the following command (ends the if block).
"While <condition>": appends a conditional node to the task graph; subsequent commands are added to the True branch and preconditioned on <condition>.
"After that": routes execution flow back to the current while node and appends subsequent commands to the False branch (ends the while loop).
"Meanwhile if <condition>": adds a contingency event to the task object and maps it to the next activity command; if <condition> becomes true at any point during task execution, the specified activity is executed.
"Exit task": appends a node that exits the task with a success flag.
"The task has failed": appends a node that exits the task with a fail flag.

A contingency action is attached with the "Meanwhile if" keyphrase. Mapping the event of seeing Paul to a sing a song action, for example, would begin the sing a song task if Paul was detected at any time during the task (i.e., while dancing or charging batteries) and would continue to do so until the sing a song task completed or Paul was no longer visible. During training, the robot also asks the user whether it should resume the original task after executing the contingency action.

Special utterances can be used to indicate that the task should be exited. "Exit task" and "The task has failed" create task items that, when executed, will terminate the task, the first with a success flag and the latter with a failure flag. (The task exits with a success flag by default even when "Exit task" is not said.) This is particularly useful when tasks are used in a todolist, where the return status indicates whether a todolist item should be reattempted or not.

As can be seen, this approach to task training places more of the design burden on the user than some of the PBD techniques mentioned in Section II, but it comes with the added benefit of increased mutual task understanding between the user and robot and consequently more predictable execution behavior. Also, tasks cannot be overfit to training-set conditions because task flow depends on explicitly specified features. Moreover, the natural interaction framework allows for quick and easy construction of tasks.

Figure 4 shows a simple schematic for this Task-building behavior, in which the speech input is processed by the Speech Parser. Therein, we first check whether the utterance is a special command, such as those shown in Table II. If it is not, we check whether it is a flow control command and add nodes to, or update pointers in, the task under construction as appropriate. If it is not that either, we check whether it corresponds to an activity that already exists in the Activity Repertoire; if so, we add a task item containing the activity to the task under construction. Finally, if the user has ended the task training sequence, the robot engages the human in a verification dialog to confirm the task description by reading it back and to acquire any additional information that might be necessary, such as what to do when an if condition does not hold and the otherwise case was not specified, before saving the task to the Activity Repertoire. The command object passed out of the Speech Parser contains speech output commands as well as any motor commands set by the Follow behavior.
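The parsing cascade described above can be summarized in a few lines. The function and data-structure choices below (a keyphrase tuple, a dictionary-backed repertoire, a list-backed task) are our own placeholders rather than the actual CMAssist Speech Parser interface, and they reuse the hypothetical TaskItem record sketched earlier.

FLOW_KEYPHRASES = ("if ", "otherwise", "before", "while ",
                   "after that", "meanwhile if", "exit task",
                   "the task has failed")

def parse_training_utterance(utterance, task, repertoire, special_commands):
    # Dispatch order follows the schematic of Figure 4: special command,
    # then flow control keyphrase (Table I), then a known activity label.
    text = utterance.lower()
    if text in special_commands:          # training helper query (Table II)
        return "special", text
    if text.startswith(FLOW_KEYPHRASES):  # flow control command
        return "flow", text
    activity = repertoire.get(text)       # known activity label, e.g. "go to the door"
    if activity is not None:
        task.append(TaskItem(activity))   # append to the task under construction
        return "activity", activity
    return "unknown", text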
TABLE II: TRAINING HELPER FUNCTIONS

Fig. 9. Execution transcript for the Patrol the lab task

(a) Execution: Drive around lab. (b) Execution: Interrupt (Kevin).

"Let's make a todolist called talk about the robots"
<Ok, let's begin>
"Give the QRIO tour"
"Give the AIBO tour"
"Is that understood?"
<Yes, you said give the QRIO tour. Give the AIBO tour. Is this correct?>
"Yes"
<How many times shall I attempt each item before giving up?>
"Two"
<Ok. Todolist training is complete. Thank you.>

Fig. 12. Training transcript for the Talk about the robots task

Fig. 13. Training transcript for the Give the lab tour task
B. Give the lab tour

The second task is Give the lab tour, where the robot acts as a tour guide by taking the visitors to each exhibit and then asking a human expert to talk about it in more detail. The task involves greeting visitors at the door, executing its subtasks Give the QRIO tour and Give the AIBO tour, and then saying goodbye at the center of the lab. The task is similar to the one described in [11] but includes two main differences: (1) the phrase "The task has failed" is used in the subtasks, and (2) the subtasks are contained in a todolist. This enables the robot to recognize when a task on the todolist has failed and to reattempt it at a later time. The transcript for the training of the task Give the QRIO tour is shown in Figure 11. The task Give the AIBO tour is trained similarly except that it invokes the help of Paul, instead of Kevin, to talk about the AIBOs. The training transcripts for the todolist Talk about the robots and the top-level task Give the lab tour are shown in Figures 12 and 13, respectively. The visualization of the Give the lab tour task is shown in Figure 14.

Fig. 14. Give the lab tour task training

The execution of the Give the lab tour task is transcribed in Figure 15 and visualized in Figure 16. After greeting the visitors at the door, the robot begins the Talk about the robots todolist and goes first to the QRIOs, where it finds that Kevin is not there and consequently the Give the QRIO tour task fails (Figure 16(a)). It then travels to the AIBOs, where it sees Paul and requests that he speak about them (Figure 16(b)). When done, Paul tells the robot he is finished, and the robot decides to retry the failed items in the todolist, i.e., Give the QRIO tour. This time Kevin is there and the task is completed successfully (Figure 16(c)). The tour is completed when the robot goes to the lab center and thanks the visitors for coming (Figure 16(d)).
0   "Give the lab tour"
    Goto the door
    <Welcome to the multi-robot lab>
11  Goto the QRIOs
    <These are the QRIOs. Unfortunately, Kevin is not around to talk about them.>
33  Goto the AIBOs
49  <These are the AIBOs. Paul, could you please talk about the AIBOs?>
    Paul talks about the AIBOs and then tells the robot that he is finished.
    Goto the QRIOs
78  <These are the QRIOs. Kevin, could you please talk about the QRIOs?>
    Kevin talks about the QRIOs and then tells the robot that he is finished.
    Goto the lab center
106 <Thank you for visiting>

Fig. 15. Execution transcript for the Give the lab tour task

(a) Give QRIO tour failed (no Kevin). (b) Giving AIBO tour. (c) Giving QRIO tour (Kevin is there now). (d) "Thank you for visiting"

Fig. 16. Give the lab tour task execution

VIII. SUMMARY AND FUTURE WORK

We have presented an enhanced task training procedure that permits the user to easily communicate a rich set of task flow structures. Through dialog and observation of the user as he moves around, this framework allows for natural methods of conveying rigid production rules to construct these flow structures when training a task.

There still remain some avenues to explore in giving our robot system true utility as a personal assistant. Indeed, our work is complementary to much of the work described in Section II and could potentially be augmented with the trajectory learning and task generalization techniques described therein.

Learning of new objects, people, and locations, which was investigated to some extent in [3], would be a capability worth integrating into our system so that tasks can be conditioned on new features. This would involve improved spatial reasoning and deictic expression comprehension, which would be useful in enhancing the dialog-and-observation mode of training. It would also be appropriate to symbolically parameterize tasks so that they are more generalized. The task Give the AIBO tour could then use the same code as the Give the QRIO tour task, only it would use and be conditioned on different feature parameters. This would not only decrease training times, but would also require fewer resources due to code sharing.

REFERENCES

[1] D. Bentivegna, C. Atkeson, and G. Cheng. Learning from observation and practice at the action generation level. In IEEE International Conference on Humanoid Robots, Karlsruhe and Munich, Germany, September/October 2003.
[2] S. Calinon and A. Billard. Incremental learning of gestures by imitation in a humanoid robot. In Proceedings of the 2007 ACM/IEEE International Conference on Human-Robot Interaction, Washington, D.C., March 2007.
[3] A. Haasch, S. Hohenner, S. Huewel, M. Kleinehagenbrock, S. Lang, I. Toptsis, G. A. Fink, J. Fritsch, B. Wrede, and G. Sagerer. BIRON — the Bielefeld robot companion. In Proceedings of the International Workshop on Advances in Service Robotics, pages 27–32, Stuttgart, Germany, May 2004.
[4] S. Iba, C. J. J. Paredis, and P. K. Khosla. Interactive multi-modal robot programming. In Proceedings of the IEEE International Conference on Robotics and Automation, Washington, D.C., May 2002.
[5] S. Lauria, G. Bugmann, T. Kyriacou, and E. Klein. Mobile robot programming using natural language. Robotics and Autonomous Systems, 38(3–4):171–181, 2002.
[6] N. Lesh, C. Rich, and C. Sidner. Using plan recognition in human-computer collaboration. In Proceedings of the Seventh International Conference on User Modelling, Banff, Canada, June 1999.
[7] A. Lockerd and C. Breazeal. Tutelage and socially guided robot learning. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Sendai, Japan, September 2004.
[8] M. Nicolescu and M. Matarić. Natural methods for robot task learning: Instructive demonstration, generalization and practice. In Proceedings of the Second International Joint Conference on Autonomous Agents and Multi-Agent Systems, Melbourne, Australia, July 2003.
[9] D. Oblinger, V. Castelli, and L. Bergman. Augmentation-based learning: combining observations and user edits for programming by demonstration. In Proceedings of the International Conference on Intelligent User Interfaces, pages 202–209, Sydney, Australia, January–February 2006.
[10] C. Rich, C. Sidner, and N. Lesh. Collagen: Applying collaborative discourse theory to human-computer interaction. AI Magazine, Special Issue on Intelligent User Interfaces, November 2001.
[11] P. E. Rybski, K. Yoon, J. Stolarz, and M. Veloso. Interactive robot task training through dialog and demonstration. In Proceedings of the 2007 ACM/IEEE International Conference on Human-Robot Interaction, Washington, D.C., March 2007.
[12] J. Saunders, C. L. Nehaniv, and K. Dautenhahn. Teaching robots by moulding behavior and scaffolding the environment. In Human-Robot Interaction, Salt Lake City, Utah, March 2006.
[13] R. M. Voyles, J. D. Morrow, and P. K. Khosla. Towards gesture-based programming: Shape from motion primordial learning of sensorimotor primitives. Robotics and Autonomous Systems, 22:361–375, November 1997.