Abstract—The home environment introduces many challenges for robots to operate, for the home is much more unstructured than the environments in which industrial or commercial robots are found. This paper looks at a path planning problem and at how to dynamically tune PID controllers that are to be used in the home environment. The method used for tuning is Learning Automata, specifically Finite Action Learning Automata (FALA) for the prediction of the presence of people and a game of Continuous Action Learning Automata (CALA) for the derivation of PID controllers. Results show that the proposed method efficiently derives better controllers for path planning when compared to a PID controller derived with a classical method. Furthermore, the method used to find acceptable waypoints shows that the robot is able to approximate the location of people in a home-care application.

Index Terms—Home Robotics; Game Theory; Learning Automata; Machine Learning; FALA; CALA

I. INTRODUCTION

Industrial robotics is a very established field, and the field of commercial robotics is also maturing at a rapid rate [1]. However, with the exception of some toy robots and robotic floor cleaners, the field of home robotics is still understudied and not very well understood. This paper focuses on two problems that arise with a home robot and on how those problems can be solved with existing machine learning techniques.

The home presents a very unstructured and dynamic environment. The home is an unstructured environment because of the humans and animals that occupy it. Indoor robots are required to interact closely with and around unpredictable people and animals. In industrial and commercial robotic systems, designers have more control over how humans interact with a robot and over the types of interactions that are likely to occur. For example, a section of a factory might be blocked off to human access, or every factory worker knows that the robots follow lines painted on the floor of the factory. In commercial applications, the robot at the front of the store is designed to greet people and provide assistance, while the robot near the checkout is designed to assist with checkout activities (cashier, bagger, gift wrapper, etc.). In a typical home, however, within a very short amount of time a perfectly straight and clear path between two points can become cluttered, making a known path either unusable or adding a significant amount of extra distance, which means extra resources would need to be spent in moving between two points. The unpredictable and dynamic nature of a household can make it difficult for autonomous indoor robots to select the right paths to maximize resources.

Homes are also far more customized than commercial spaces and industrial areas. This manifests itself in that most rooms within a home are different and can be made of different materials. Very few homes have just one type of flooring; in most cases there is a mixture of carpet, tiles, and wood. This imposes problems when trying to design a robot. For example, when trying to design a wheeled robot for the home environment, the range of the coefficient of friction is going to be very large. Having to deal with a large range in the coefficient of friction means that one needs to have a low center of gravity and to reserve resources such as battery power, which makes the robot less efficient [2].

The constraints, restraints, and the environment in which home robots must operate are different from those of industrial and commercial robotic applications. Therefore, many of the solutions that were developed for industrial and commercial robots are either not relevant, not efficient, or need to be applied in a completely different manner to be of use in home robotics.

The paper is organized as follows. In Section II, we formulate the two problems we wish to solve in home robotics. Section III explores which machine learning techniques should be used to solve the problems. Section IV provides a discussion of how game theory is applied to the solutions. Section V discusses the design of the robot for the home environment. In Section VI, we discuss the experiments developed to prove that our design solves the home environment problem. Finally, Section VII concludes the paper.

II. A HOME ROBOTICS APPLICATION

The Turtlebot was selected as the home robotics platform for its relatively low cost. A Turtlebot consists of open source software and commercial off-the-shelf components, namely an iRobot Create and a Microsoft Kinect camera system. Another advantage of using the Turtlebot is that there is already an advanced 3-dimensional simulation environment called Gazebo, which has an advanced physics engine and simulates the real world to a very high degree. An empty Gazebo environment was used for all simulations in this paper.

Moreover, a Turtlebot could easily be modified to be used as a serving robot, for, since it has a flat top, it could be used
Figure 1. FALA Algorithm

of one continuous variable. We could do this with a FALA controller, but we would have to discretize the range of possibilities. This may not be feasible in many situations due to issues of discretization: the grid might be too coarse, or, as it gets finer, the number of values might become too large, causing the system to slow down considerably. A way to solve this is to use a different type of automaton, the CALA.

The concept of CALA was introduced by Santharam [8]. A CALA controller works in the continuous space of actions represented by the real line. CALA work the same way as FALA do, but the probability distribution is a Gaussian whose mean and standard deviation are updated whenever reinforcement is received from the environment. Like all LA, CALA are very good at dealing with noisy inputs.

CALA learn the value of x which maximizes a function f(x), where f(x) is unknown and only noisy values of the function are given [5]. For example, at every time step k the CALA chooses a real number x(k) at random, basing its choice on its current action probability distribution, the normal distribution N(μ(k), φ(σ(k))), where μ(k) is the mean and σ(k) is the standard deviation at time step k. The function φ puts a lower bound on how small the standard deviation can be so that it never equals zero. The standard deviation can never equal zero because this would mean that learning could not occur and the reinforcement equations would be undefined.

The CALA then obtains a response from the environment, which is interpreted as reinforcements β_x(k) and β_μ(k) for the chosen action x(k) and the current mean μ(k), respectively. The CALA then updates μ(k) and σ(k) with the following learning algorithms:

\mu(k+1) = \mu(k) + \lambda \, \frac{\beta_x(k) - \beta_\mu(k)}{\phi(\sigma(k))} \cdot \frac{x(k) - \mu(k)}{\phi(\sigma(k))}   (3)

\sigma(k+1) = \sigma(k) + \lambda \, \frac{\beta_x(k) - \beta_\mu(k)}{\phi(\sigma(k))} \left[ \left( \frac{x(k) - \mu(k)}{\phi(\sigma(k))} \right)^2 - 1 \right] - \lambda K \left( \sigma(k) - \sigma_L \right)   (4)

where λ is the learning parameter (0 < λ < 1), K is a large positive constant, and σ_L is the lower bound on σ [5]. This process then repeats itself at each time step until μ(k) does not change appreciably and σ(k) approaches σ_L.
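To make the update rule concrete, the following Python sketch (not from the paper) runs equations (3) and (4) for a single CALA on a toy objective; the constants LAMBDA, K, and SIGMA_L and the noisy function f(x) are illustrative assumptions only.

    import random

    # Illustrative constants; the paper does not report the values it used.
    LAMBDA = 0.01     # learning parameter, 0 < lambda < 1
    K = 5.0           # large positive constant
    SIGMA_L = 0.01    # lower bound on sigma

    def phi(sigma):
        # Keeps the effective standard deviation away from zero.
        return max(sigma, SIGMA_L)

    def noisy_f(x):
        # Stand-in for the unknown objective, observed only through noise.
        return -(x - 2.0) ** 2 + random.gauss(0.0, 0.1)

    def cala_step(mu, sigma):
        x = random.gauss(mu, phi(sigma))   # choose x(k) ~ N(mu(k), phi(sigma(k)))
        beta_x = noisy_f(x)                # reinforcement for the chosen action
        beta_mu = noisy_f(mu)              # reinforcement for the current mean
        z = (x - mu) / phi(sigma)
        scale = (beta_x - beta_mu) / phi(sigma)
        new_mu = mu + LAMBDA * scale * z                              # equation (3)
        new_sigma = (sigma + LAMBDA * scale * (z * z - 1.0)
                     - LAMBDA * K * (sigma - SIGMA_L))                # equation (4)
        return new_mu, max(new_sigma, SIGMA_L)

    mu, sigma = 0.0, 1.0
    for k in range(5000):
        mu, sigma = cala_step(mu, sigma)
    print(mu, sigma)   # mu should approach the maximizer of f (here, x = 2)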
One of the differences between CALA and FALA is that CALA need two reinforcement signals (β_μ(k) and β_x(k)) per cycle, while FALA only require one. However, some CALA algorithms set β_μ to zero and still obtain valid results, although the algorithm then takes much longer to converge.

One drawback of using CALA is that one can only find a local maximum and not the global maximum. Thus, by choosing a larger value for the initial variance, CALA often converge to much better local maxima. Since the path planner on the Turtlebot has a discrete set of actions, a FALA controller can best decide which area to move to, whereas a CALA controller is best suited to finding the gains of the PID controllers that are used to steer and control the speed of the Turtlebot. However, there are six gains associated with those PID controllers; how will the CALA agents work together to calculate all six?

IV. GAME THEORY

Game theory can be defined as "the study of mathematical models of conflict and cooperation between intelligent rational decision-makers" [9]. Game theory tries to mimic the decentralized nature of human behaviour. This would allow a group of relatively simple robots, or LA agents, to accomplish far more sophisticated tasks than they could achieve by themselves.

All games consist of at least three items: a players' set, P; an action set, A_i; and a utility function, U_i. The players' set contains all the participants of the game, i.e., P = {p_1, p_2, ..., p_n}. The action set lists all the actions available to a specific player, A_i = {a_i^1, a_i^2, ..., a_i^m}, where a_i^1 is the first action available to player i. The utility function is a function that each player tries to maximize at each time step (it is also referred to as a reward function). As well, the action set may be considered the strategy set, since actions are chosen in accordance with a strategy that is designed to maximize some reward.

A game G, then, can be expressed as a triplet

G = (P, A, U)   (5)

As discussed in Section III, LA agents can be very effective at learning environmental parameters. However, in many real-world robotic examples, more than one parameter must be learned. Luckily, LA agents can be grouped so that each individual LA agent tries to solve its own objective while also contributing to a complex global objective. These groups of LA agents are called a team or a game.

In order to keep computational costs low, most LA are relatively simple but very effective and efficient [5]. Therefore, game theory can be used to combine LA agents so that robots exhibit complex behaviors while being implemented on relatively computationally inexpensive hardware. Since the 1990s, game-theoretical problems have been solved with FALA agents [10, 11]. Recently, FALA games have been used to tune quadrotors [12], to get quadrotors to cooperate in building structures [7], and to learn a strategy in the pursuer-evader game [13].
A game of FALA consists of a set of the FALA quadruples presented in (1):

G = {Γ_1, Γ_2, ..., Γ_r}   (6)

In such a game, each player, or FALA, simultaneously chooses an action based on its own current probability distribution. The state of the team then changes within the environment and an individual reinforcement is sent to each automaton [14]. By using LA games, Nash equilibria can be learned by repeated plays of the game, thus solving multi-criteria optimization problems [5]. As well, LA are completely decentralized, so there is no requirement for the players to exchange information, or even to know about the other players or whether they are part of a game, which makes designing the game and the controllers considerably easier. CALA agents can also be used in LA games, especially in common payoff games. If the learning step-size λ is sufficiently small, then a CALA team converges to a modal point. CALA, then, can be efficiently used to solve optimization problems over regression functions in noisy environments, with applications including signal processing and neural networks. For example, one can use CALA to perform backpropagation for learning the weights in feedforward neural networks when the training samples are noisy [5].
In this paper, PID controllers for the speed and steering of the Turtlebot were tuned via a CALA game. The players' set consists of the six gains for the two PIDs, namely P = {P_u, I_u, D_u, P_z, I_z, D_z}, where u refers to the robot's speed and z to its orientation. The Euclidean distance between the desired position and the actual position is the common payoff to all CALA, which means that only one critic is needed for both steering and speed.

Therefore, each CALA agent randomly selects a number that is used as a gain, based on the mean and standard deviation of its probability distribution. The only time the standard deviation of any CALA agent changes is when the team of CALA agents produces a positive effect in helping the Turtlebot achieve its goal of minimizing the distance between the actual and desired points. Figure 2 is a graphical depiction of the two PID controllers in a CALA game. As one can see from the figure, all six CALA agents get the same feedback (reward) from the environment.

Figure 2. Six player CALA Game
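As a rough illustration of the arrangement in Figure 2, the sketch below (an assumption-laden stand-in, not the authors' code) creates six CALA agents, one per gain in P; each samples a gain from its own Gaussian, and all six receive the same noisy payoff from a single critic, here a placeholder for the negative Euclidean tracking error produced by a simulated lap.

    import math
    import random

    LAMBDA, K, SIGMA_L = 0.002, 5.0, 0.01   # assumed learning constants

    def phi(s):
        return max(s, SIGMA_L)

    class CALA:
        """One player of the common-payoff game: learns a single PID gain."""
        def __init__(self, mu=0.5, sigma=1.0):
            self.mu, self.sigma, self.x = mu, sigma, mu

        def sample(self):
            self.x = random.gauss(self.mu, phi(self.sigma))
            return self.x

        def update(self, beta_x, beta_mu):
            z = (self.x - self.mu) / phi(self.sigma)
            scale = (beta_x - beta_mu) / phi(self.sigma)
            self.mu += LAMBDA * scale * z
            self.sigma += LAMBDA * scale * (z * z - 1.0) - LAMBDA * K * (self.sigma - SIGMA_L)
            self.sigma = max(self.sigma, SIGMA_L)

    players = {name: CALA() for name in ("Pu", "Iu", "Du", "Pz", "Iz", "Dz")}

    # Hypothetical "good" gains; in the real system the payoff would come from
    # driving the Turtlebot along the course and measuring tracking error.
    TARGET = {"Pu": 1.2, "Iu": 0.1, "Du": 0.05, "Pz": 2.0, "Iz": 0.2, "Dz": 0.1}

    def common_payoff(gains):
        # Single critic: negative distance from the (stand-in) optimal gains, plus noise.
        err = math.sqrt(sum((gains[n] - TARGET[n]) ** 2 for n in gains))
        return -err + random.gauss(0.0, 0.05)

    for k in range(20000):
        beta_x = common_payoff({n: p.sample() for n, p in players.items()})
        beta_mu = common_payoff({n: p.mu for n, p in players.items()})
        for p in players.values():
            p.update(beta_x, beta_mu)    # every player sees the same reward

    print({n: round(p.mu, 2) for n, p in players.items()})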
V. DESIGN OF THE ROBOT

The robot for the home environment discussed in this paper should be completely autonomous and be used to move from a food and drink preparation area (its base) to where people are most likely to congregate, and then return to its base.

At a high level, the path planner uses a FALA controller to select, from a series of predefined points, a goal point to move to; once at that point, the robot surveys the area for a human. If the robot finds a human, it gives itself a reward and updates its probability distribution; if no human is at the point, the reinforcement is zero, the distribution is left unchanged, and the robot returns to base.
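A minimal sketch of this high-level loop is given below, assuming the linear reward-inaction (L_R-I) update named later in this section; the step size ALPHA is an assumed value, and the person-detection routine is replaced by a coin flip using the occupancy probabilities quoted in the experiments (50%, 30%, 20%).

    import random

    ALPHA = 0.05                   # L_R-I step size (assumed value)
    POINTS = [2, 3, 4]             # candidate waypoints; point 1 is the base
    prob = [1.0 / 3.0] * 3         # initial action probabilities (equiprobable)
    P_HUMAN = [0.50, 0.30, 0.20]   # chance of a person at each point (experiment values)

    def survey(i):
        # Stand-in for the robot surveying the selected point for a person.
        return random.random() < P_HUMAN[i]

    for trial in range(100):
        i = random.choices(range(len(POINTS)), weights=prob)[0]  # chosen at the base
        # ... navigate to POINTS[i], look for a person, drive back to the base ...
        if survey(i):
            # Reward: shift probability mass toward the successful action.
            prob = [p + ALPHA * (1.0 - p) if j == i else p - ALPHA * p
                    for j, p in enumerate(prob)]
        # Inaction: on failure the distribution is left untouched.

    print([round(p, 3) for p in prob])   # still sums to one; biased toward point 2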
At a lower level, the Turtlebot reads its sensors at 5 Hz (every 0.2 s), which gives it feedback on how its actions have affected the environment. The robot therefore operates in discrete time steps, k. Before the Turtlebot leaves the base, the path planner builds a route that contains the desired (x, y, z) coordinates for every time step. It builds these coordinates with the assumption that the robot moves at a constant speed of 0.4 m/s. The Turtlebot uses two PID controllers to control its speed and its orientation. These PID controllers have CALA agents that update the individual gains of the PIDs based on a critic mechanism that uses the Euclidean distance between the desired and actual coordinates.
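For the low-level loop, a simplified sketch of the two PID controllers and the Euclidean-distance critic follows; the unicycle kinematics and the placeholder gains are assumptions standing in for the Gazebo model and the CALA-supplied gains.

    import math

    DT = 0.2   # 5 Hz control period, as stated above

    class PID:
        def __init__(self, kp, ki, kd):
            self.kp, self.ki, self.kd = kp, ki, kd
            self.integral, self.prev_err = 0.0, 0.0

        def step(self, err):
            self.integral += err * DT
            deriv = (err - self.prev_err) / DT
            self.prev_err = err
            return self.kp * err + self.ki * self.integral + self.kd * deriv

    # Placeholder gains; in the paper these come from the CALA game.
    speed_pid = PID(1.0, 0.0, 0.1)
    heading_pid = PID(2.0, 0.0, 0.2)

    def control_step(pose, target):
        """One 0.2 s step toward the current desired coordinate."""
        x, y, theta = pose
        dx, dy = target[0] - x, target[1] - y
        dist = math.hypot(dx, dy)                 # Euclidean distance: the critic signal
        heading_err = math.atan2(dy, dx) - theta
        heading_err = math.atan2(math.sin(heading_err), math.cos(heading_err))
        v = speed_pid.step(dist)                  # commanded forward speed
        w = heading_pid.step(heading_err)         # commanded turning rate
        # Crude unicycle update standing in for the Turtlebot/Gazebo dynamics.
        new_pose = (x + v * math.cos(theta) * DT,
                    y + v * math.sin(theta) * DT,
                    theta + w * DT)
        return new_pose, dist

    pose = (0.0, 0.0, 0.0)
    for k in range(100):
        pose, dist = control_step(pose, (2.0, 1.0))
    print(round(dist, 3))   # remaining distance after 20 s of simulated motion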
Figure 3 is a block diagram of the Turtlebot describing how a FALA controller is used to select a path and a CALA controller tunes the PID controllers. The simulation environment was free of obstacles so that the Turtlebot could make mistakes and learn from them. The environment consisted of three points to move to and a base (Figure 4), and the tuning of the CALA PID gains consists of the Turtlebot traveling the same course {1 → 2 → 3 → 4 → 1 → · · ·} until all the gains converge. The path planning was performed by a FALA agent that uses a linear reward-inaction algorithm to update the probability distribution. The FALA agent works as follows. Initially, points 2, 3, and 4 have the same starting probability of a human being present (p_i = 1/3, ∀i ∈ {1, 2, 3}). While at the base (point 1 in Figure 4), the FALA agent chooses at random one point to navigate to. The path planner then builds a list of the desired coordinates and sends the list to the CALA controller. Once at the selected point, the robot checks to see whether a human is present and then updates its probability vector for the action associated with that point. The linear reward-inaction algorithm guarantees that the probabilities add up to one. If no person is present at the point, the probability distribution is not updated. The Turtlebot then turns around and returns to the
Table I. RESULTS
people, a very important ability for robots in home environments. For instance, in a house reception people tend to group around certain points of the environment (depending on the position of the furniture, for example). The probabilities of the robot finding people at each one of the points of Figure 4 were set to the following: point 2 = 50%, point 3 = 30%, and point 4 = 20%. Point 1 is considered to be the base and no person is allowed in its vicinity.

At the beginning of the learning procedure each point is represented with a probability equal to 33%, i.e., the points are equiprobable. After letting the robot execute 100 trials, the probabilities of the robot moving to each one of the points were 50.203% for point 2, 29.524% for point 3, and 19.273% for point 4. The learned values clearly identify, and converge to, the actual probability distribution. Thus the FALA controller is an effective way to ensure that the Turtlebot is using its resources effectively.

We do not claim that LA are the best or only method to solve home robotics problems; instead, our goal is to provide a proof of concept that shows high- and low-level design decisions which can lead to successful home robotic applications.

VII. CONCLUSION AND FUTURE WORK

Predictions for the growth of the home robotics market have been put forward for the past decade or so [16]; however, these predictions have not come to pass. Home automation presents many unconstrained challenges to robotics researchers, and there are many new and existing machine learning algorithms and methodologies that can be applied to these areas.

Over the past decades LA have shown that they can deal with unconstrained challenges in a very efficient manner. However, a constraint on LA agents is that they are relatively simplistic. By using game theory we can get groups of LA to exhibit complex behavior. This is significant because home robotic designers now have an option to use algorithms that are tolerant of noise and have low computational complexity.

This paper focused on how to apply RL and, more specifically, LA to two problems that the home environment presents. Simulations were offered to demonstrate the feasibility of implementing these algorithms. CALA and FALA were efficiently used to solve the problem of tuning controllers for robots as well as to infer the behavior of humans.

However, the solutions outlined must be extended if they are to be used in real robotic deployments. One area that needs to be attacked is intelligent obstacle avoidance, in which LA could also be used. Furthermore, a better understanding of the limitations of FALA and CALA in deriving controllers needs to be developed. This could imply the derivation of more advanced controllers such as Model Predictive Controllers (MPC).

REFERENCES

[1] T. Deyle, "Why indoor robots for commercial spaces are the next big thing in robotics," IEEE Spectrum, 2017.
[2] B. Armstrong-Hélouvry, P. Dupont, and C. Canudas de Wit, "A survey of models, analysis tools and compensation methods for the control of machines with friction," Automatica, vol. 30, no. 7, pp. 1083–1138, Jul. 1994.
[3] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 3rd ed. Upper Saddle River, NJ, USA: Prentice Hall Press, 2009.
[4] A. R. N. Ravari and H. D. Taghirad, "A novel hybrid fuzzy-PID controller for tracking control of robot manipulators," in 2008 IEEE International Conference on Robotics and Biomimetics, Feb 2009, pp. 1625–1630.
[5] M. A. L. Thathachar and P. S. Sastry, Networks of Learning Automata: Techniques for Online Stochastic Optimization. Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2003.
[6] Y. Guo, H. Ge, Y. Yan, Y. Huang, and S. Li, "A novel continuous action-set learning automaton algorithm," in Proceedings of the 1st International Congress on Signal and Information Processing, Networking and Computers (ICSINC 2015), 2015.
[7] S. R. B. dos Santos, S. Givigi, C. Nascimento, J. Fernandes, L. Buonocore, and A. de Almeida Neto, "Iterative decentralized planning for collective construction tasks with quadrotors," J. Intell. Robotics Syst., vol. 90, no. 1-2, pp. 217–234, 2018.
[8] G. Santharam, P. Sastry, and M. Thathachar, "Continuous action set learning automata for stochastic optimization," Journal of the Franklin Institute, vol. 331, no. 5, pp. 607–628, 1994.
[9] R. B. Myerson, Game Theory: Analysis of Conflict, 1st ed. Harvard University Press, 1997.
[10] J. Miller, "The coevolution of automata in the repeated prisoner's dilemma," Journal of Economic Behavior and Organization, vol. 29, no. 1, pp. 87–112, 1996.
[11] P. S. Sastry, V. V. Phansalkar, and M. A. L. Thathachar, "Decentralized learning of Nash equilibria in multi-person stochastic games with incomplete information," IEEE Transactions on Systems, Man, and Cybernetics, vol. 24, no. 5, pp. 769–777, May 1994.
[12] P. T. Jardine, S. N. Givigi, and S. Yousefi, "Experimental results for autonomous model-predictive trajectory planning tuned with machine learning," in 2017 Annual IEEE International Systems Conference (SysCon), April 2017.
[13] S. N. Givigi and H. M. Schwartz, "Decentralized strategy selection with learning automata for multiple pursuer-evader games," Adaptive Behavior, vol. 22, no. 4, pp. 221–234, 2014.
[14] S. R. B. dos Santos, S. N. Givigi, and C. L. Nascimento, "Autonomous construction of multiple structures using learning automata: Description and experimental validation," IEEE Systems Journal, vol. 9, no. 4, pp. 1376–1387, Dec 2015.
[15] J. Kober, J. A. Bagnell, and J. Peters, "Reinforcement learning in robotics: A survey," The International Journal of Robotics Research, vol. 32, no. 11, pp. 1238–1274, 2013.
[16] B. Gates, "A robot in every home," Scientific American, vol. 296, pp. 58–65, 2007.