Abstract—The home environment introduces many challenges for robots to operate, for the home is much more unstructured than the environments in which industrial or commercial robots are found. This paper looks at a path planning problem and at how to dynamically tune PID controllers that are to be used in the home environment. The method used for tuning is Learning Automata, specifically Finite Action Learning Automata (FALA) for the prediction of the presence of people and a game of Continuous Action Learning Automata (CALA) for the derivation of PID controllers. Results show that the proposed method efficiently derives better controllers for path planning when compared to a PID controller derived with a classical method. Furthermore, the method used to find acceptable waypoints shows that the robot is able to approximate the location of people in a home-care application.

Index Terms—Home Robotics; Game Theory; Learning Automata; Machine Learning; FALA; CALA

I. INTRODUCTION

Industrial robotics is a very established field, and the field of commercial robotics is also maturing at a rapid rate [1]. However, with the exception of some toy robots and robotic floor cleaners, the field of home robotics is still understudied and not very well understood. This paper focuses on two problems that arise with a home robot and on how those problems can be solved with existing machine learning techniques.

The home presents a very unstructured and dynamic environment. The home is an unstructured environment because of the humans and animals that occupy it. Indoor robots are required to interact closely with and around unpredictable people and animals. In industrial and commercial robotic systems, designers have more control over how humans interact with a robot and over the types of interactions that are likely to occur. For example, a section of a factory might be blocked off to human access, or every factory worker knows that the robots follow lines painted on the floor of the factory. In commercial applications, the robot at the front of the store is designed to greet people and provide assistance, while the robot near the checkout is designed to assist with checkout activities (cashier, bagger, gift wrapper, etc.). In a typical home, however, within a very short amount of time a perfectly straight and clear path between two points can become cluttered, making a known path either unusable or adding a significant amount of extra distance, which means extra resources would need to be spent in moving between two points. The unpredictable and dynamic nature of a household can make it difficult for autonomous indoor robots to select the right paths to maximize resources.

Homes are also far more customized than commercial spaces and industrial areas. This manifests itself in that most rooms within a home are different and can be made of different materials. Very few homes have just one type of flooring; in most cases there is a mixture of carpet, tiles, and wood. This imposes problems when trying to design a robot. For example, when trying to design a wheeled robot for the home environment, the range of the coefficient of friction is going to be very large. Having to deal with a large range in the coefficient of friction means that one needs to have a low center of gravity and to reserve resources such as battery power, which makes the robot less efficient [2].

The constraints, restraints, and the environment in which home robots must operate are different from those of industrial and commercial robotic applications. Therefore, many of the solutions that were developed for industrial and commercial robots are either not relevant, not efficient, or need to be applied in a completely different manner to be of use in home robotics.

The paper is organized as follows. In Section II, we formulate the two problems we wish to solve in home robotics. Section III explores which machine learning techniques should be used to solve the problems. Section IV provides a discussion of how game theory is applied to the solutions. Section V discusses the design of the robot for the home environment. In Section VI, we discuss the experiments developed to prove that our design solves the home environment problem. Finally, Section VII concludes the paper.

II. A HOME ROBOTICS APPLICATION

The Turtlebot was selected as the home robotics platform for its relatively low cost. A Turtlebot consists of open source software and commercial off-the-shelf components, namely an iRobot Create and a Microsoft Kinect camera system. Another advantage of using the Turtlebot is that there is already an advanced 3-dimensional simulation environment called Gazebo, which has an advanced physics engine and simulates the real world to a very high degree. An empty Gazebo environment was used for all simulations in this paper.

Moreover, a Turtlebot could easily be modified to be used as a serving robot, for, since it has a flat top, it could be used
Figure 1. FALA Algorithm

of one continuous variable. We could do this with a FALA controller, but we would have to discretize the range of possibilities. This may not be feasible in many situations due to issues of discretization: the grid might be too coarse, or, as it gets finer, the number of values might become too large, causing the system to slow down considerably. A way to solve this is to use a different type of automaton, the CALA.

The concept of CALA was introduced by Santharam [8]. A CALA controller works in the continuous space of actions represented by the real line. CALA work the same way as FALA do, but the probability distribution is a Gaussian whose mean and standard deviation are updated whenever reinforcement is received from the environment. Like all LA, CALA are very good at dealing with noisy inputs.

CALA learn the value of x which maximizes a function f(x), where f(x) is unknown and only noisy values of the function are given [5]. For example, at every time step k the CALA chooses a real number x(k) at random, basing its choice on its current action probability distribution, the normal distribution N(μ(k), φ(σ(k))), where μ(k) is the mean and σ(k) is the standard deviation at time step k. The function φ puts a lower bound on how small the standard deviation can be so that it never equals zero. The standard deviation can never equal zero because this would mean that learning could not occur and the reinforcement equations would be undefined.

The CALA then obtains a response from the environment, which is interpreted as reinforcements β_x(k) and β_μ(k) for the chosen action x(k) and the current mean μ(k), respectively. The CALA then updates μ(k) and σ(k) with the following learning algorithms:

\mu(k+1) = \mu(k) + \lambda \, \frac{\beta_x(k) - \beta_\mu(k)}{\phi(\sigma(k))} \cdot \frac{x(k) - \mu(k)}{\phi(\sigma(k))}   (3)

\sigma(k+1) = \sigma(k) + \lambda \, \frac{\beta_x(k) - \beta_\mu(k)}{\phi(\sigma(k))} \left[ \left( \frac{x(k) - \mu(k)}{\phi(\sigma(k))} \right)^2 - 1 \right] - \lambda K \left( \sigma(k) - \sigma_L \right)   (4)

where λ is the learning parameter (0 < λ < 1), K is a large positive constant, and σ_L is the lower bound on σ [5]. This process then repeats itself at each time step until μ(k) does not change appreciably and σ(k) approaches σ_L.
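To make the update rule concrete, the following Python sketch (not from the paper) runs equations (3) and (4) for a single CALA on a toy objective; the constants LAMBDA, K, and SIGMA_L and the noisy function f(x) are illustrative assumptions only.

    import random

    # Illustrative constants; the paper does not report the values it used.
    LAMBDA = 0.01     # learning parameter, 0 < lambda < 1
    K = 5.0           # large positive constant
    SIGMA_L = 0.01    # lower bound on sigma

    def phi(sigma):
        # Keeps the effective standard deviation away from zero.
        return max(sigma, SIGMA_L)

    def noisy_f(x):
        # Stand-in for the unknown objective, observed only through noise.
        return -(x - 2.0) ** 2 + random.gauss(0.0, 0.1)

    def cala_step(mu, sigma):
        x = random.gauss(mu, phi(sigma))   # choose x(k) ~ N(mu(k), phi(sigma(k)))
        beta_x = noisy_f(x)                # reinforcement for the chosen action
        beta_mu = noisy_f(mu)              # reinforcement for the current mean
        z = (x - mu) / phi(sigma)
        scale = (beta_x - beta_mu) / phi(sigma)
        new_mu = mu + LAMBDA * scale * z                              # equation (3)
        new_sigma = (sigma + LAMBDA * scale * (z * z - 1.0)
                     - LAMBDA * K * (sigma - SIGMA_L))                # equation (4)
        return new_mu, max(new_sigma, SIGMA_L)

    mu, sigma = 0.0, 1.0
    for k in range(5000):
        mu, sigma = cala_step(mu, sigma)
    print(mu, sigma)   # mu should approach the maximizer of f (here, x = 2)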
One of the differences between CALA and FALA is that CALA need two reinforcement signals (β_μ(k) and β_x(k)) per cycle, while FALA only require one. However, some CALA algorithms set β_μ to zero and still obtain valid results, although the algorithm then takes much longer to converge.

One drawback of using CALA is that one can only find a local maximum and not the global maximum. Thus, by choosing a larger value for the initial variance, CALA often converge to much better local maxima. Since the path planner on the Turtlebot has a discrete set of actions, a FALA controller can best decide which area to move to, whereas a CALA controller is best suited to finding the gains of the PID controllers that are used to steer and control the speed of the Turtlebot. However, there are six gains associated with those PID controllers; how will the CALA agents work together to calculate all six?

IV. GAME THEORY

Game theory can be defined as "the study of mathematical models of conflict and cooperation between intelligent rational decision-makers" [9]. Game theory tries to mimic the decentralized nature of human behaviour. This would allow a group of relatively simple robots, or LA agents, to accomplish far more sophisticated tasks than they could achieve by themselves.

All games consist of at least three items: a players' set, P; an action set, A_i; and a utility function, U_i. The players' set contains all the participants of the game, i.e., P = {p_1, p_2, ..., p_n}. The action set lists all the actions available to a specific player, A_i = {a_i^1, a_i^2, ..., a_i^m}, where a_i^1 is the first action available to player i. The utility function is a function that each player tries to maximize at each time step (it is also referred to as a reward function). As well, the action set may be considered the strategy set, since actions are chosen in accordance with a strategy that is designed to maximize some reward.

A game G, then, can be expressed as a triplet

G = (P, A, U)   (5)

As discussed in Section III, LA agents can be very effective at learning environmental parameters. However, in many real-world robotic examples, more than one parameter must be learned. Luckily, LA agents can be grouped so that each individual LA agent tries to solve its own objective while also contributing to a complex global objective. These groups of LA agents are called a team or a game.

In order to keep computational costs low, most LA are relatively simple but very effective and efficient [5]. Therefore, game theory can be used to combine LA agents so that robots exhibit complex behaviors while being implemented on relatively computationally inexpensive hardware. Since the 1990s, game-theoretical problems have been solved with FALA agents [10, 11]. Recently, FALA games have been used to tune quadrotors [12], to get quadrotors to cooperate in building structures [7], and to learn a strategy in the pursuer-evader game [13].
A game of FALA consists of a set of the FALA quadruples presented in (1):

G = {Γ_1, Γ_2, ..., Γ_r}   (6)

In such a game, each player, or FALA, simultaneously chooses an action based on its own current probability distribution. The state of the team then changes within the environment and an individual reinforcement is sent to each automaton [14]. By using LA games, Nash equilibria can be learned by repeated plays of the game, thus solving multi-criteria optimization problems [5]. As well, LA are completely decentralized, so there is no requirement for the players to exchange information, or even to know about the other players or whether they are part of a game, which makes designing the game and the controllers considerably easier. CALA agents can also be used in LA games, especially in common payoff games. If the learning step-size λ is sufficiently small, then a CALA team converges to a modal point. CALA, then, can be efficiently used to solve optimization problems over regression functions in noisy environments, with applications including signal processing and neural networks. For example, one can use CALA to perform backpropagation for learning the weights in feedforward neural networks when the training samples are noisy [5].
In this paper, PID controllers for the speed and steering of the Turtlebot were tuned via a CALA game. The players' set consists of the six gains for the two PIDs, namely P = {P_u, I_u, D_u, P_z, I_z, D_z}, where u refers to the robot's speed and z to its orientation. The Euclidean distance between the desired position and the actual position is the common payoff to all CALA, which means that only one critic is needed for both steering and speed.

Therefore, each CALA agent randomly selects a number that is used as a gain, based on the mean and standard deviation of its probability distribution. The only time the standard deviation of any CALA agent changes is when the team of CALA agents produces a positive effect in helping the Turtlebot achieve its goal of minimizing the distance between the actual and desired points. Figure 2 is a graphical depiction of the two PID controllers in a CALA game. As one can see from the figure, all six CALA agents get the same feedback (reward) from the environment.

Figure 2. Six player CALA Game
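As a rough illustration of the arrangement in Figure 2, the sketch below (an assumption-laden stand-in, not the authors' code) creates six CALA agents, one per gain in P; each samples a gain from its own Gaussian, and all six receive the same noisy payoff from a single critic, here a placeholder for the negative Euclidean tracking error produced by a simulated lap.

    import math
    import random

    LAMBDA, K, SIGMA_L = 0.002, 5.0, 0.01   # assumed learning constants

    def phi(s):
        return max(s, SIGMA_L)

    class CALA:
        """One player of the common-payoff game: learns a single PID gain."""
        def __init__(self, mu=0.5, sigma=1.0):
            self.mu, self.sigma, self.x = mu, sigma, mu

        def sample(self):
            self.x = random.gauss(self.mu, phi(self.sigma))
            return self.x

        def update(self, beta_x, beta_mu):
            z = (self.x - self.mu) / phi(self.sigma)
            scale = (beta_x - beta_mu) / phi(self.sigma)
            self.mu += LAMBDA * scale * z
            self.sigma += LAMBDA * scale * (z * z - 1.0) - LAMBDA * K * (self.sigma - SIGMA_L)
            self.sigma = max(self.sigma, SIGMA_L)

    players = {name: CALA() for name in ("Pu", "Iu", "Du", "Pz", "Iz", "Dz")}

    # Hypothetical "good" gains; in the real system the payoff would come from
    # driving the Turtlebot along the course and measuring tracking error.
    TARGET = {"Pu": 1.2, "Iu": 0.1, "Du": 0.05, "Pz": 2.0, "Iz": 0.2, "Dz": 0.1}

    def common_payoff(gains):
        # Single critic: negative distance from the (stand-in) optimal gains, plus noise.
        err = math.sqrt(sum((gains[n] - TARGET[n]) ** 2 for n in gains))
        return -err + random.gauss(0.0, 0.05)

    for k in range(20000):
        beta_x = common_payoff({n: p.sample() for n, p in players.items()})
        beta_mu = common_payoff({n: p.mu for n, p in players.items()})
        for p in players.values():
            p.update(beta_x, beta_mu)    # every player sees the same reward

    print({n: round(p.mu, 2) for n, p in players.items()})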
V. DESIGN OF THE ROBOT

The robot for the home environment discussed in this paper should be completely autonomous and be used to move from a food and drink preparation area (its base) to where people are most likely to congregate, and then return to its base.

At a high level, the path planner uses a FALA controller to select, from a series of predefined points, a goal point to move to; once at that point, the robot surveys the area for a human. If the robot finds a human, it gives itself a reward and updates its probability distribution; if no human is at the point, the reinforcement is zero, the distribution is left unchanged, and the robot returns to base.
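A minimal sketch of this high-level loop is given below, assuming the linear reward-inaction (L_R-I) update named later in this section; the step size ALPHA is an assumed value, and the person-detection routine is replaced by a coin flip using the occupancy probabilities quoted in the experiments (50%, 30%, 20%).

    import random

    ALPHA = 0.05                   # L_R-I step size (assumed value)
    POINTS = [2, 3, 4]             # candidate waypoints; point 1 is the base
    prob = [1.0 / 3.0] * 3         # initial action probabilities (equiprobable)
    P_HUMAN = [0.50, 0.30, 0.20]   # chance of a person at each point (experiment values)

    def survey(i):
        # Stand-in for the robot surveying the selected point for a person.
        return random.random() < P_HUMAN[i]

    for trial in range(100):
        i = random.choices(range(len(POINTS)), weights=prob)[0]  # chosen at the base
        # ... navigate to POINTS[i], look for a person, drive back to the base ...
        if survey(i):
            # Reward: shift probability mass toward the successful action.
            prob = [p + ALPHA * (1.0 - p) if j == i else p - ALPHA * p
                    for j, p in enumerate(prob)]
        # Inaction: on failure the distribution is left untouched.

    print([round(p, 3) for p in prob])   # still sums to one; biased toward point 2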
At a lower level, the Turtlebot reads its sensors at 5 Hz (every 0.2 s), which gives it feedback on how its actions have affected the environment. The robot therefore operates in discrete time steps, k. Before the Turtlebot leaves the base, the path planner builds a route that contains the desired (x, y, z) coordinates for every time step. It builds these coordinates with the assumption that the robot moves at a constant speed of 0.4 m/s. The Turtlebot uses two PID controllers to control its speed and its orientation. These PID controllers have CALA agents that update the individual gains of the PIDs based on a critic mechanism that uses the Euclidean distance between the desired and actual coordinates.
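For the low-level loop, a simplified sketch of the two PID controllers and the Euclidean-distance critic follows; the unicycle kinematics and the placeholder gains are assumptions standing in for the Gazebo model and the CALA-supplied gains.

    import math

    DT = 0.2   # 5 Hz control period, as stated above

    class PID:
        def __init__(self, kp, ki, kd):
            self.kp, self.ki, self.kd = kp, ki, kd
            self.integral, self.prev_err = 0.0, 0.0

        def step(self, err):
            self.integral += err * DT
            deriv = (err - self.prev_err) / DT
            self.prev_err = err
            return self.kp * err + self.ki * self.integral + self.kd * deriv

    # Placeholder gains; in the paper these come from the CALA game.
    speed_pid = PID(1.0, 0.0, 0.1)
    heading_pid = PID(2.0, 0.0, 0.2)

    def control_step(pose, target):
        """One 0.2 s step toward the current desired coordinate."""
        x, y, theta = pose
        dx, dy = target[0] - x, target[1] - y
        dist = math.hypot(dx, dy)                 # Euclidean distance: the critic signal
        heading_err = math.atan2(dy, dx) - theta
        heading_err = math.atan2(math.sin(heading_err), math.cos(heading_err))
        v = speed_pid.step(dist)                  # commanded forward speed
        w = heading_pid.step(heading_err)         # commanded turning rate
        # Crude unicycle update standing in for the Turtlebot/Gazebo dynamics.
        new_pose = (x + v * math.cos(theta) * DT,
                    y + v * math.sin(theta) * DT,
                    theta + w * DT)
        return new_pose, dist

    pose = (0.0, 0.0, 0.0)
    for k in range(100):
        pose, dist = control_step(pose, (2.0, 1.0))
    print(round(dist, 3))   # remaining distance after 20 s of simulated motion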
Figure 3 is a block diagram of the Turtlebot describing how a FALA controller is used to select a path and a CALA controller tunes the PID controllers. The simulation environment was free of obstacles so that the Turtlebot could make mistakes and learn from them. The environment consisted of three points to move to and a base (Figure 4), and the tuning of the CALA PID gains consists of the Turtlebot traveling the same course {1 → 2 → 3 → 4 → 1 → · · ·} until all the gains converge. The path planning was performed by a FALA agent that uses a linear reward-inaction algorithm to update the probability distribution. The FALA agent works as follows. Initially, points 2, 3, and 4 have the same starting probability of a human being present (p_i = 1/3, ∀i ∈ {1, 2, 3}). While at the base (point 1 in Figure 4), the FALA agent chooses at random one point to navigate to. The path planner then builds a list of the desired coordinates and sends the list to the CALA controller. Once at the selected point, the robot checks to see whether a human is present and then updates its probability vector for the action associated with that point. The linear reward-inaction algorithm guarantees that the probabilities add up to one. If no person is present at the point, the probability distribution is not updated. The Turtlebot then turns around and returns to the
Table I. RESULTS
people, a very important ability for robots in home environments. For instance, in a house reception people tend to group around certain points of the environment (depending on the position of the furniture, for example). The probabilities of the robot finding people at each one of the points of Figure 4 were set to the following: point 2 = 50%, point 3 = 30%, and point 4 = 20%. Point 1 is considered to be the base and no person is allowed in its vicinity.

At the beginning of the learning procedure each point is represented with a probability equal to 33%, i.e., the points are equiprobable. After letting the robot execute 100 trials, the probabilities of the robot moving to each one of the points were 50.203% for point 2, 29.524% for point 3, and 19.273% for point 4. The learned values clearly identify, and converge to, the actual probability distribution. Thus the FALA controller is an effective way to ensure that the Turtlebot is using its resources effectively.

We do not claim that LA are the best or only method to solve home robotics problems; instead, our goal is to provide a proof of concept that shows high- and low-level design decisions which can lead to successful home robotic applications.

VII. CONCLUSION AND FUTURE WORK

Predictions for the growth of the home robotics market have been put forward for the past decade or so [16]; however, these predictions have not come to pass. Home automation presents many unconstrained challenges to robotics researchers, and there are many new and existing machine learning algorithms and methodologies that can be applied to these areas.

Over the past decades LA have shown that they can deal with unconstrained challenges in a very efficient manner. However, a constraint on LA agents is that they are relatively simplistic. By using game theory we can get groups of LA to exhibit complex behavior. This is significant because home robotic designers now have an option to use algorithms that are tolerant of noise and have low computational complexity.

This paper focused on how to apply RL and, more specifically, LA to two problems that the home environment presents. Simulations were offered to demonstrate the feasibility of implementing these algorithms. CALA and FALA were efficiently used to solve the problem of tuning controllers for robots as well as to infer the behavior of humans.

However, the solutions outlined must be extended if they are to be used in real robotic deployments. One area that needs to be attacked is intelligent obstacle avoidance, in which LA could also be used. Furthermore, a better understanding of the limitations of FALA and CALA in deriving controllers needs to be developed. This could imply the derivation of more advanced controllers such as Model Predictive Controllers (MPC).

REFERENCES

[1] T. Deyle, "Why indoor robots for commercial spaces are the next big thing in robotics," IEEE Spectrum, 2017.
[2] B. Armstrong-Hélouvry, P. Dupont, and C. Canudas de Wit, "A survey of models, analysis tools and compensation methods for the control of machines with friction," Automatica, vol. 30, no. 7, pp. 1083–1138, Jul. 1994.
[3] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 3rd ed. Upper Saddle River, NJ, USA: Prentice Hall Press, 2009.
[4] A. R. N. Ravari and H. D. Taghirad, "A novel hybrid fuzzy-PID controller for tracking control of robot manipulators," in 2008 IEEE International Conference on Robotics and Biomimetics, Feb 2009, pp. 1625–1630.
[5] M. A. L. Thathachar and P. S. Sastry, Networks of Learning Automata: Techniques for Online Stochastic Optimization. Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2003.
[6] Y. Guo, H. Ge, Y. Yan, Y. Huang, and S. Li, "A novel continuous action-set learning automaton algorithm," in Proceedings of the 1st International Congress on Signal and Information Processing, Networking and Computers (ICSINC 2015), 2015.
[7] S. R. B. dos Santos, S. Givigi, C. Nascimento, J. Fernandes, L. Buonocore, and A. de Almeida Neto, "Iterative decentralized planning for collective construction tasks with quadrotors," J. Intell. Robotics Syst., vol. 90, no. 1-2, pp. 217–234, 2018.
[8] G. Santharam, P. Sastry, and M. Thathachar, "Continuous action set learning automata for stochastic optimization," Journal of the Franklin Institute, vol. 331, no. 5, pp. 607–628, 1994.
[9] R. B. Myerson, Game Theory: Analysis of Conflict, 1st ed. Harvard University Press, 1997.
[10] J. Miller, "The coevolution of automata in the repeated prisoner's dilemma," Journal of Economic Behavior and Organization, vol. 29, no. 1, pp. 87–112, 1996.
[11] P. S. Sastry, V. V. Phansalkar, and M. A. L. Thathachar, "Decentralized learning of Nash equilibria in multi-person stochastic games with incomplete information," IEEE Transactions on Systems, Man, and Cybernetics, vol. 24, no. 5, pp. 769–777, May 1994.
[12] P. T. Jardine, S. N. Givigi, and S. Yousefi, "Experimental results for autonomous model-predictive trajectory planning tuned with machine learning," in 2017 Annual IEEE International Systems Conference (SysCon), April 2017.
[13] S. N. Givigi and H. M. Schwartz, "Decentralized strategy selection with learning automata for multiple pursuer-evader games," Adaptive Behavior, vol. 22, no. 4, pp. 221–234, 2014.
[14] S. R. B. dos Santos, S. N. Givigi, and C. L. Nascimento, "Autonomous construction of multiple structures using learning automata: Description and experimental validation," IEEE Systems Journal, vol. 9, no. 4, pp. 1376–1387, Dec 2015.
[15] J. Kober, J. A. Bagnell, and J. Peters, "Reinforcement learning in robotics: A survey," The International Journal of Robotics Research, vol. 32, no. 11, pp. 1238–1274, 2013.
[16] B. Gates, "A robot in every home," Scientific American, vol. 296, pp. 58–65, 2007.