Multi-Agent Machine Learning: A Reinforcement Approach
Ebook · 423 pages · 3 hours

About this ebook

The book begins with a chapter on traditional methods of supervised learning, covering recursive least squares learning, mean square error methods, and stochastic approximation. Chapter 2 covers single-agent reinforcement learning; topics include learning value functions, Markov games, and TD learning with eligibility traces. Chapter 3 discusses two-player games, including two-player matrix games with both pure and mixed strategies; numerous algorithms and examples are presented. Chapter 4 covers learning in multi-player games, stochastic games, and Markov games, focusing on learning in multi-player grid games (two-player grid games, Q-learning, and Nash Q-learning). Chapter 5 discusses differential games, including multi-player differential games, the actor-critic structure, adaptive fuzzy control and fuzzy inference systems, the pursuit-evasion game, and the game of defending a territory. Chapter 6 discusses new ideas on learning within robotic swarms and the innovative idea of the evolution of personality traits.

• Framework for understanding a variety of methods and approaches in multi-agent machine learning.

• Discusses reinforcement learning methods, including several forms of multi-agent Q-learning.

• Applicable to research professors and graduate students studying electrical and computer engineering, computer science, and mechanical and aerospace engineering.

Language: English
Publisher: Wiley
Release date: Aug 26, 2014
ISBN: 9781118884485

    Book preview

    Multi-Agent Machine Learning - H. M. Schwartz

    Preface

    For a decade I have taught a course on adaptive control. The course focused on the classical methods of system identification, using such classic texts as Ljung [1, 2]. It addressed traditional methods of model reference adaptive control and nonlinear adaptive control using Lyapunov techniques. However, the theory had become out of sync with current engineering practice. As such, my own research and the focus of the graduate course changed to include adaptive signal processing, incorporating adaptive channel equalization and echo cancellation using the least mean squares (LMS) algorithm. The course name likewise changed, from Adaptive Control to Adaptive and Learning Systems. My research was still focused on system identification and nonlinear adaptive control with application to robotics. However, by the early 2000s, I had started to work with teams of robots. It was now possible to use handy robot kits and low-cost microcontroller boards to build several robots that could work together. The graduate course in adaptive and learning systems changed again; the theoretical material on nonlinear adaptive control using Lyapunov techniques was reduced and replaced with ideas from reinforcement learning. A whole new range of applications developed: the teams of robots had to learn to work together and to compete.

    Today, the graduate course focuses on system identification using recursive least squares techniques, some model reference adaptive control (still using Lyapunov techniques), adaptive signal processing using the LMS algorithm, and reinforcement learning using Q-learning. The first two chapters of this book present these ideas in abridged form, but in sufficient detail to demonstrate the connections among the available learning algorithms: how they are the same, and how they are different. Other texts cover this material in detail [2–4].

    The research then began to focus on teams of robots learning to work together. The work examined robots working together in search-and-rescue applications and in securing critical infrastructure and border regions. It also began to focus on reinforcement learning and multiagent reinforcement learning. The robots are the learning agents. How do children learn to play tag? How do we learn to play football, or how do police work together to capture a criminal? What strategies do we use, and how do we formulate these strategies? Why can I play touch football with a new group of people, quickly assess everyone's capabilities, and then take a particular strategy in the game?

    As our research team delved further into the ideas associated with multiagent machine learning and game theory, we discovered that the published literature covered many ideas but was poorly coordinated and lacked focus. Although there are a few survey articles [5], they do not give sufficient detail to appreciate the different methods. The purpose of this book is to introduce the reader to a particular form of machine learning. The book focuses on multiagent machine learning, but it is tied together by the central theme of learning algorithms in general. Learning algorithms come in many different forms, yet they tend to share a similar approach. We will present the differences and similarities of these methods.

    This book is based on my own work and the work of several doctoral and master's students who have worked under my supervision over the past 10 years. In particular, I would like to thank Prof. Sidney Givigi. Prof. Givigi was instrumental in developing the ideas and algorithms presented in Chapter 6. The doctoral research of Xiaosong (Eric) Lu has also found its way into this book; the work on guarding a territory is largely based on his doctoral dissertation. Other graduate students who helped me in this work include Badr Al Faiya, Mostafa Awheda, Pascal De Beck-Courcelle, and Sameh Desouky. Without the dedicated work of this group of students, this book would not have been possible.

    H. M. Schwartz

    Ottawa, Canada

    September, 2013

    References

    [1] L. Ljung, System Identification: Theory for the User. Upper Saddle River, NJ: Prentice Hall, 2nd ed., 1999.

    [2] L. Ljung and T. Soderstrom, Theory and Practice of Recursive Identification. Cambridge, Massachusetts: The MIT Press, 1983.

    [3] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, Massachusetts: The MIT Press, 1998.

    [4] K. J. Åström and B. Wittenmark, Adaptive Control. Boston, Massachusetts: Addison-Wesley Longman Publishing Co., Inc., 2nd ed., 1994. ISBN 0201558661.

    [5] L. Buşoniu, R. Babuška, and B. De Schutter, "A comprehensive survey of multiagent reinforcement learning," IEEE Trans. Syst. Man Cybern. Part C, vol. 38, no. 2, pp. 156–172, 2008.

    Chapter 1

    A Brief Review of Supervised Learning

    There are a number of algorithms that are typically used for system identification, adaptive control, adaptive signal processing, and machine learning. These algorithms all have particular similarities and differences, but they all need to process some type of experimental data. How we collect and process the data determines the most suitable algorithm to use. In adaptive control, there is a device referred to as the self-tuning regulator. In this case, the algorithm measures the states as outputs, estimates the model parameters, and outputs the control signals. In reinforcement learning, the algorithms process rewards, estimate value functions, and output actions. Although one may refer to the recursive least squares (RLS) algorithm in the self-tuning regulator as a supervised learning algorithm and to reinforcement learning as an unsupervised learning algorithm, the two are very similar. In this chapter, we present a number of well-known baseline supervised learning algorithms.

    1.1 Least Squares Estimates

    The least squares (LS) algorithm is a well-known and robust algorithm for fitting experimental data to a model. The first step is for the user to define a mathematical structure or model that he/she believes will fit the data. The second step is to design an experiment to collect data under suitable conditions; suitable conditions usually means the operating conditions under which the system will typically operate. The next step is to run the estimation algorithm, which can take several forms, and, finally, to validate the identified or learned model. Let us look at the case of the classical two-dimensional linear regression fit that we are all familiar with:

    1.1  $y(n) = a\,x(n) + b$

    This is a simple linear regression model, where the input is the sampled signal $x(n)$ and the output is $y(n)$. The model structure defined is a straight line; therefore, we are assuming that the data collected will fit a straight line. This can be written in the form

    1.2  $y(n) = \phi^T(n)\,\theta$

    where $\phi(n) = \begin{bmatrix} x(n) & 1 \end{bmatrix}^T$ and $\theta = \begin{bmatrix} a & b \end{bmatrix}^T$. How one chooses $\phi(n)$ determines the model structure, and this reflects how one believes the data should behave. This is the essence of machine learning, and virtually all university students will at some point learn the basic statistics of linear regression. Behind the computations of the linear regression algorithm is the scalar cost function, given by

    1.3  $J(\hat{\theta}) = \frac{1}{2}\sum_{n=1}^{N}\left(y(n) - \phi^T(n)\,\hat{\theta}\right)^2$

    The term $\hat{\theta}$ is the estimate of the LS parameter $\theta$. The goal is for the estimate $\hat{\theta}$ to minimize the cost function $J(\hat{\theta})$. To find the optimal value of the parameter estimate $\hat{\theta}$, one takes the partial derivative of the cost function $J$ with respect to $\hat{\theta}$ and sets this derivative to zero. Therefore, one gets

    1.4  $\frac{\partial J}{\partial \hat{\theta}} = -\sum_{n=1}^{N}\phi(n)\left(y(n) - \phi^T(n)\,\hat{\theta}\right)$

    Setting $\partial J / \partial \hat{\theta} = 0$, we get

    1.5  $\sum_{n=1}^{N}\phi(n)\,y(n) = \left[\sum_{n=1}^{N}\phi(n)\,\phi^T(n)\right]\hat{\theta}$

    Solving for $\hat{\theta}$, we get the LS solution

    1.6  $\hat{\theta} = \left[\sum_{n=1}^{N}\phi(n)\,\phi^T(n)\right]^{-1}\sum_{n=1}^{N}\phi(n)\,y(n)$

    where the inverse of $\sum_{n=1}^{N}\phi(n)\,\phi^T(n)$ exists. If the inverse does not exist, then the system is not identifiable. For example, if in the straight-line case one only had a single point, or the same point measured over and over again, then the data would not span the two-dimensional space and the inverse would not exist: one needs at least two independent points to draw a straight line. The matrix $\sum_{n=1}^{N}\phi(n)\,\phi^T(n)$ is referred to as the information matrix and is related to how well one can estimate the parameters. The inverse of the information matrix is the covariance matrix, and it is proportional to the variance of the parameter estimates. Both of these matrices are positive definite and symmetric; these are very important properties, which are used extensively in analyzing the behavior of the algorithm. In the literature, one will often see the covariance matrix referred to as $P$. Setting Eq. (1.4) to zero, we can write it in the form

    1.7  $\sum_{n=1}^{N}\phi(n)\left(y(n) - \phi^T(n)\,\hat{\theta}\right) = 0$

    and one can define the prediction errors as

    1.8  $\varepsilon(n) = y(n) - \phi^T(n)\,\hat{\theta}$

    The term within brackets in Eq. (1.7) is known as the prediction error or, as some people refer to it, the innovations. The term $\varepsilon(n)$ represents the error in predicting the output of the system. In this case, the output term $y(n)$ is the correct answer, which is what we want to estimate. Since we know the correct answer, this is referred to as supervised learning. Notice from Eq. (1.7) that the sum of the prediction errors times the data vectors is equal to zero. We then say that the prediction errors are orthogonal to the data, or that the prediction errors lie in the null space of the data. In simplistic terms, this means that, if one has chosen a good model structure $\phi(n)$, then the prediction errors should appear as white noise. Always plot the prediction errors as a quick check to see how good your predictor is. If the errors appear to be correlated (i.e., not white noise), then you can improve your model and get a better prediction.
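    As a concrete illustration of Eqs. (1.1)–(1.8), the following minimal sketch (not from the book; the synthetic data and the true values $a = 2$, $b = -1$ are our own assumptions) fits a straight line by LS and checks the orthogonality condition:

```python
import numpy as np

# Synthetic data for the straight-line model y(n) = a*x(n) + b (Eq. 1.1).
rng = np.random.default_rng(0)
N = 100
x = rng.uniform(-5.0, 5.0, N)
y = 2.0 * x - 1.0 + 0.1 * rng.standard_normal(N)  # white measurement noise

# Each row of Phi is phi(n)^T = [x(n), 1] (Eq. 1.2).
Phi = np.column_stack([x, np.ones(N)])

# theta_hat = (Phi^T Phi)^{-1} Phi^T y (Eq. 1.6); lstsq solves the same
# normal equations in a numerically stable way.
theta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("estimated [a, b]:", theta_hat)             # close to [2.0, -1.0]

# Prediction errors (Eq. 1.8); the orthogonality condition says Phi^T eps = 0.
eps = y - Phi @ theta_hat
print("Phi^T eps (should be ~0):", Phi.T @ eps)
```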

    One does not typically write the linear regression in the form of Eq. (1.2); instead, one adds a white noise term, and the linear regression takes the form

    1.9  $y(n) = \phi^T(n)\,\theta + e(n)$

    where $e(n)$ is a white noise term. Equation (1.9) can represent an infinite number of possible model structures. For example, let us assume that we want to learn the dynamics of a second-order linear system or the parameters of a second-order infinite impulse response (IIR) filter. Then we could choose the second-order model structure given by

    1.10
    $$y(n) = -a_1\,y(n-1) - a_2\,y(n-2) + b_1\,u(n-1) + b_2\,u(n-2) + e(n)$$

    Then the model structure would be defined in $\phi(n)$ and $\theta$ as

    1.11
    $$\phi(n) = \begin{bmatrix}-y(n-1) & -y(n-2) & u(n-1) & u(n-2)\end{bmatrix}^T, \qquad \theta = \begin{bmatrix}a_1 & a_2 & b_1 & b_2\end{bmatrix}^T$$

    In general, one can write an arbitrary $n$th-order autoregressive exogenous (ARX) model structure as

    1.12
    $$y(n) = -a_1\,y(n-1) - \cdots - a_{n_a}\,y(n-n_a) + b_1\,u(n-1) + \cdots + b_{n_b}\,u(n-n_b) + e(n)$$

    and $\phi(n)$ takes the form

    1.13
    $$\phi(n) = \begin{bmatrix}-y(n-1) & \cdots & -y(n-n_a) & u(n-1) & \cdots & u(n-n_b)\end{bmatrix}^T$$

    One then collects the data from a suitable experiment (easier said than done!) and computes the parameters using Eq. (1.6). The vector $\phi(n)$ can take many different forms; in fact, it can contain nonlinear functions of the data, for example, logarithmic or square terms, and it can have different delay terms. To a large degree, one can use one's professional judgment as to what to put into $\phi(n)$. One will often write the data in matrix form, in which case the data matrix is defined as

    1.14
    $$\Phi = \begin{bmatrix}\phi^T(1)\\ \phi^T(2)\\ \vdots\\ \phi^T(N)\end{bmatrix}$$

    and the output vector as

    1.15  $Y = \begin{bmatrix}y(1) & y(2) & \cdots & y(N)\end{bmatrix}^T$

    Then one can write the LS estimate as

    1.16  $\hat{\theta} = \left(\Phi^T\Phi\right)^{-1}\Phi^T Y$

    Furthermore, one can write the prediction errors as

    1.17  $E = Y - \Phi\,\hat{\theta}$

    We can also write the orthogonality condition as $\Phi^T E = 0$.
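    A minimal sketch of the matrix-form estimator of Eqs. (1.14)–(1.16), applied to the second-order ARX model of Eq. (1.10); the particular system coefficients and noise level are assumptions chosen for illustration:

```python
import numpy as np

# Assumed true system: y(n) = 1.5 y(n-1) - 0.7 y(n-2) + u(n-1) + 0.5 u(n-2) + e(n),
# i.e. theta = [a1, a2, b1, b2] = [-1.5, 0.7, 1.0, 0.5] in the sign convention of Eq. (1.10).
rng = np.random.default_rng(1)
N = 500
u = rng.standard_normal(N)             # persistently exciting input
e = 0.05 * rng.standard_normal(N)      # white noise term e(n) (Eq. 1.9)
y = np.zeros(N)
for n in range(2, N):
    y[n] = 1.5 * y[n-1] - 0.7 * y[n-2] + 1.0 * u[n-1] + 0.5 * u[n-2] + e[n]

# Rows of Phi are phi(n)^T = [-y(n-1), -y(n-2), u(n-1), u(n-2)] (Eq. 1.11).
Phi = np.column_stack([-y[1:N-1], -y[0:N-2], u[1:N-1], u[0:N-2]])
Y = y[2:N]                             # output vector (Eq. 1.15)

theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)   # Eq. (1.16)
print("estimated [a1, a2, b1, b2]:", theta_hat)       # ~[-1.5, 0.7, 1.0, 0.5]
```

    Here `np.linalg.solve` evaluates Eq. (1.16) without explicitly forming the matrix inverse, which is the usual numerical practice.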

    The LS method of parameter identification or machine learning is very well developed, and there are many properties associated with the technique. In fact, much of the work in statistical inference is derived from the few equations described in this section, and they are the starting point of many scientific investigations, including work in the social sciences.

    1.2 Recursive Least Squares

    The LS algorithm has been extended to the RLS algorithm. In this case, the parameter estimate is developed as the machine collects the data in real time. In the previous section, all the data was collected first, and then the parameter estimates were computed on the basis of Eq. (1.6). The RLS algorithm is derived by assuming a solution to the LS algorithm and then adding a single data point. The derivation is shown in Reference [1]. In the RLS implementation, the cost function takes a slightly different form. The cost function in this case is

    1.18  $J(\hat{\theta}) = \frac{1}{2}\sum_{n=1}^{N}\lambda^{N-n}\left(y(n) - \phi^T(n)\,\hat{\theta}\right)^2$

    where $0 < \lambda \leq 1$. The term $\lambda$ is known as the forgetting factor. This term places less weight on older data points; as such, the resulting RLS algorithm is able to track changes in the parameters. Once again, taking the partial derivative of $J$ with respect to $\hat{\theta}$ and setting the derivative to zero, we get

    1.19
    $$\hat{\theta}(N) = \left[\sum_{n=1}^{N}\lambda^{N-n}\,\phi(n)\,\phi^T(n)\right]^{-1}\sum_{n=1}^{N}\lambda^{N-n}\,\phi(n)\,y(n)$$

    The forgetting factor should be set close to 1. If one sets the forgetting factor near 0.95, then old data is forgotten very quickly; the rule of thumb is that the estimate of the parameters $\hat{\theta}$ is approximately based on the most recent $1/(1-\lambda)$ data points (about 20 points for $\lambda = 0.95$).

    The RLS algorithm is as follows:

    1.20
    $$\begin{aligned}
    \varepsilon(n) &= y(n) - \phi^T(n)\,\hat{\theta}(n-1)\\
    K(n) &= \frac{P(n-1)\,\phi(n)}{\lambda + \phi^T(n)\,P(n-1)\,\phi(n)}\\
    \hat{\theta}(n) &= \hat{\theta}(n-1) + K(n)\,\varepsilon(n)\\
    P(n) &= \frac{1}{\lambda}\left[P(n-1) - \frac{P(n-1)\,\phi(n)\,\phi^T(n)\,P(n-1)}{\lambda + \phi^T(n)\,P(n-1)\,\phi(n)}\right]
    \end{aligned}$$

    One implements Eq. (1.20) by initializing the parameter estimate vector $\hat{\theta}(0)$ to the user's best initial estimate of the parameters, which is often simply zero. The covariance matrix $P(0)$ is typically initialized to a relatively large diagonal matrix, which represents the initial uncertainty in the parameter estimates.

    One can implement the RLS algorithm as in Eq. (1.20), but the user should be careful that the covariance matrix $P$ remains positive definite and symmetric. If the $P$ matrix, because of accumulated numerical error in the repeated RLS updates, ceases to be positive definite and symmetric, then the algorithm will diverge. There are a number of well-developed algorithms to ensure that the $P$ matrix remains positive definite. One can use a square-root approach, whereby the $P$ matrix is factored into its Cholesky factorization or the $UD$ factorization. Such methods are described in Reference [1].
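    As a simple runtime diagnostic (our own sketch; the square-root and $UD$ implementations described in [1] go further and propagate the factors directly), one can check both properties after each update:

```python
import numpy as np

def check_covariance(P, tol=1e-8):
    """Verify that the RLS covariance P is still symmetric positive definite.

    np.linalg.cholesky raises LinAlgError when P is not positive definite,
    which is exactly the numerical failure mode described above.
    """
    if not np.allclose(P, P.T, atol=tol):
        raise ValueError("P has lost symmetry")
    np.linalg.cholesky(P)  # raises np.linalg.LinAlgError if not positive definite
```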

    Let us examine Eq. (1.20) and notice that the update to the parameter estimate is the previous estimate plus a gain $K(n)$ times the current prediction error. We will see this structure in almost every algorithm described in machine learning. In this case, we have an actual correct answer, the measurement $y(n)$, and we call such algorithms supervised learning.
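    A minimal sketch of Eq. (1.20) in code (our own illustration, not taken from the book; the re-symmetrization step is a simple complement to the square-root methods mentioned above):

```python
import numpy as np

def rls_update(theta, P, phi, y, lam=0.99):
    """One step of the RLS algorithm of Eq. (1.20) with forgetting factor lam."""
    eps = y - phi @ theta                 # prediction error epsilon(n)
    denom = lam + phi @ P @ phi           # scalar: lambda + phi^T P phi
    K = (P @ phi) / denom                 # gain vector K(n)
    theta = theta + K * eps               # parameter update
    P = (P - np.outer(K, phi @ P)) / lam  # covariance update
    P = 0.5 * (P + P.T)                   # re-symmetrize (numerical hygiene)
    return theta, P

# Usage: theta(0) is the best prior guess (often zero); P(0) is a large
# diagonal matrix reflecting high initial parameter uncertainty.
theta = np.zeros(2)
P = 1e4 * np.eye(2)
rng = np.random.default_rng(2)
for n in range(200):
    x = rng.uniform(-5.0, 5.0)
    phi = np.array([x, 1.0])              # straight-line regressor, as before
    y = 2.0 * x - 1.0 + 0.1 * rng.standard_normal()
    theta, P = rls_update(theta, P, phi, y)
print("RLS estimate of [a, b]:", theta)   # close to [2.0, -1.0]
```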

    1.3 Least Mean Squares

    In the field of signal processing, there are a few commonly used techniques to model or characterize the dynamics of a communications channel and then compensate for the effects of the channel on the signal. These techniques are referred to as channel equalization and echo cancellation. There are numerous books on adaptive signal processing and adaptive filtering [2]. Most of these techniques use the least mean squares (LMS) approach to identify the coefficients of a model of the channel. Once again, as in the LS and RLS algorithms, we must choose an appropriate model structure to define the communication channel dynamics. In the field of signal processing, one would typically use what is known as a finite impulse response (FIR) filter as the underlying model structure that describes the system. To maintain consistency with the previous section, one can write the channel dynamics as

    1.21
    $$y(n) = \sum_{k=0}^{M-1} w_k\,x(n-k) = \mathbf{w}^T\mathbf{x}(n)$$

    where $y(n)$ is the output of the filter, or the communications channel, at time step $n$; $w_k$, $k = 0, \ldots, M-1$, are the filter coefficients (tap weights) that we want to estimate or learn; and $x(n)$ is the input signal, with $\mathbf{x}(n) = \begin{bmatrix}x(n) & x(n-1) & \cdots & x(n-M+1)\end{bmatrix}^T$. Typically, the signal $x(n)$ is the communication signal that we want to recover from the output signal $y(n)$. We define an error signal

    1.22  $e(n) = d(n) - \mathbf{w}^T\mathbf{x}(n)$

    where $d(n)$ is the desired signal. This is the same signal as the prediction error in Eq. (1.8). The LMS algorithm defines the cost function as the expected value of the squared prediction error:

    1.23  $J = E\left[e^2(n)\right]$

    We can write the squared error term as

    1.24
    $$e^2(n) = d^2(n) - 2\,d(n)\,\mathbf{x}^T(n)\,\mathbf{w} + \mathbf{w}^T\mathbf{x}(n)\,\mathbf{x}^T(n)\,\mathbf{w}$$

    We take the expectation and get

    1.25
    $$E\left[e^2(n)\right] = E\left[d^2(n)\right] - 2\,E\left[d(n)\,\mathbf{x}^T(n)\right]\mathbf{w} + \mathbf{w}^T E\left[\mathbf{x}(n)\,\mathbf{x}^T(n)\right]\mathbf{w}$$

    We then define the variance $\sigma_d^2 = E\left[d^2(n)\right]$ as the mean square power, the cross-correlation vector as $\mathbf{p} = E\left[d(n)\,\mathbf{x}(n)\right]$, and the information matrix, which is almost the same matrix as in Section 1.1, as $\mathbf{R} = E\left[\mathbf{x}(n)\,\mathbf{x}^T(n)\right]$. If the system statistics are stationary, that is, the statistics do not change, then the terms $\sigma_d^2$, $\mathbf{p}$, and $\mathbf{R}$ are constants, and the cost function, as a function of $\mathbf{w}$, has the shape of a bowl. The cost function $J(\mathbf{w})$ can be written as

    1.26  $J(\mathbf{w}) = \sigma_d^2 - 2\,\mathbf{p}^T\mathbf{w} + \mathbf{w}^T\mathbf{R}\,\mathbf{w}$

    Once again, as in Eq. (1.4), to find the optimal parameter estimate $\mathbf{w}$ that minimizes the cost function, we take the partial derivative of the cost function $J$ with respect to $\mathbf{w}$ and determine the value of $\mathbf{w}$ that sets the partial derivative to zero. We can take the partial derivative of $J$ as

    1.27  $\frac{\partial J}{\partial \mathbf{w}} = \frac{\partial \sigma_d^2}{\partial \mathbf{w}} - 2\,\frac{\partial \left(\mathbf{p}^T\mathbf{w}\right)}{\partial \mathbf{w}} + \frac{\partial \left(\mathbf{w}^T\mathbf{R}\,\mathbf{w}\right)}{\partial \mathbf{w}}$

    We then compute the partial derivative for each of the terms on the right-hand side of Eq. (1.27). Taking each term separately, we get

    1.28
    $$\frac{\partial \sigma_d^2}{\partial \mathbf{w}} = 0, \qquad \frac{\partial \left(\mathbf{p}^T\mathbf{w}\right)}{\partial \mathbf{w}} = \mathbf{p}, \qquad \frac{\partial \left(\mathbf{w}^T\mathbf{R}\,\mathbf{w}\right)}{\partial \mathbf{w}} = 2\,\mathbf{R}\,\mathbf{w}$$

    Substituting into Eq. (1.27), we get

    1.29  $\frac{\partial J}{\partial \mathbf{w}} = -2\,\mathbf{p} + 2\,\mathbf{R}\,\mathbf{w}$

    Solving for $\mathbf{w}$, we get the solution for the optimal parameter estimate as

    1.30  $\mathbf{w}_o = \mathbf{R}^{-1}\,\mathbf{p}$

    Equation (1.30) is the well-known Wiener solution. However, the Wiener solution in Eq. (1.30) requires the computation of the inverse of the matrix $\mathbf{R}$, which can be large. Notice the similarity between the Wiener solution and the LS solution in Eq. (1.6). Let us say that we want to estimate the expectations in Eq. (1.25); then we would compute the sample averages

    1.31  $\hat{\mathbf{R}} = \frac{1}{N}\sum_{n=1}^{N}\mathbf{x}(n)\,\mathbf{x}^T(n), \qquad \hat{\mathbf{p}} = \frac{1}{N}\sum_{n=1}^{N} d(n)\,\mathbf{x}(n)$

    Substituting these values into Eq. (1.30), we get the LS solution given by Eq. (1.6). In essence, the Wiener solution and the LS solution are the same.
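    A short sketch of this equivalence (the 3-tap FIR channel, its coefficients, and the noise level are assumptions for illustration): estimate $\hat{\mathbf{R}}$ and $\hat{\mathbf{p}}$ from data as in Eq. (1.31), form the Wiener solution of Eq. (1.30), and compare it with the direct LS solution:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 2000, 3
w_true = np.array([0.8, -0.4, 0.2])    # assumed 3-tap channel
x = rng.standard_normal(N)

# Rows of X are x(n)^T = [x(n), x(n-1), x(n-2)].
X = np.column_stack([x[2:], x[1:-1], x[:-2]])
d = X @ w_true + 0.05 * rng.standard_normal(N - 2)

R_hat = (X.T @ X) / len(X)             # sample estimate of R (Eq. 1.31)
p_hat = (X.T @ d) / len(X)             # sample estimate of p (Eq. 1.31)
w = np.linalg.solve(R_hat, p_hat)      # Wiener solution (Eq. 1.30)
w_ls, *_ = np.linalg.lstsq(X, d, rcond=None)  # LS solution (Eq. 1.6)
print("Wiener:", w)                    # both are ~[0.8, -0.4, 0.2]
print("LS:    ", w_ls)
```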

    In the world of signal processing, and in particular adaptive signal processing, the processing speed is very important. Furthermore, the model structure used in adaptive signal processing, especially for communication applications, can have many parameters. It would not be unusual to have hundreds or even thousands of terms in the $\mathbf{w}$ vector, in which case the $\mathbf{R}$ matrix becomes prohibitively large to invert in Eq. (1.30). As such, a gradient steepest descent method is normally implemented. This is a very common technique throughout the fields of engineering and is very similar to the well-known Newton-Raphson method of finding the roots of various functions. The steepest descent method is an iterative method. The idea is to start with an initial guess of the parameter values; often one will simply choose zero. In the lexicon of signal processing, one refers to the parameters as tap weights. Then one iteratively adjusts the parameters so as to move down the cost surface along the negative gradient. Let us say that the current estimate of the parameter vector is $\mathbf{w}(n)$; then we compute the next value of the parameter vector as

    1.32  $\mathbf{w}(n+1) = \mathbf{w}(n) - \mu\,\nabla J(n)$

    where $\mu$ is a small positive step size and $\nabla J(n)$ is the gradient, given by the derivative of the cost function with respect to the parameter estimate vector $\mathbf{w}$, as defined in Eq. (1.29). Then, substituting for $\nabla J(n)$ in Eq. (1.32), we get

    1.33  $\mathbf{w}(n+1) = \mathbf{w}(n) - \mu\left(2\,\mathbf{R}\,\mathbf{w}(n) - 2\,\mathbf{p}\right)$

    In recursive form, it is written as

    1.34  $\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu\left(\mathbf{p} - \mathbf{R}\,\mathbf{w}(n)\right)$

    We can also write Eq. (1.34) in the form

    1.35  $\mathbf{w}(n+1) = \left(\mathbf{I} - 2\mu\,\mathbf{R}\right)\mathbf{w}(n) + 2\mu\,\mathbf{p}$

    where $\mathbf{I}$ is the identity matrix. One may recognize, from systems theory, that if the eigenvalues of $\left(\mathbf{I} - 2\mu\,\mathbf{R}\right)$ lie inside the unit circle, then the recursion in Eq. (1.35) converges to the Wiener solution of Eq. (1.30).
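    A closing sketch of the steepest descent recursion (our own illustration; the step-size bound $0 < \mu < 1/\lambda_{\max}(\mathbf{R})$ is the standard consequence of the eigenvalue condition above):

```python
import numpy as np

# Steepest descent per Eq. (1.34): w(n+1) = w(n) + 2*mu*(p - R w(n)).
# Per Eq. (1.35), convergence requires the eigenvalues of (I - 2*mu*R)
# to lie inside the unit circle, i.e. 0 < mu < 1/lambda_max(R).
rng = np.random.default_rng(4)
N, M = 2000, 3
w_true = np.array([0.8, -0.4, 0.2])       # same assumed 3-tap channel
x = rng.standard_normal(N)
X = np.column_stack([x[2:], x[1:-1], x[:-2]])
d = X @ w_true + 0.05 * rng.standard_normal(N - 2)
R = (X.T @ X) / len(X)
p = (X.T @ d) / len(X)

mu = 0.5 / np.linalg.eigvalsh(R).max()    # safely inside the stability bound
w = np.zeros(M)                           # initial tap weights
for _ in range(500):
    w = w + 2.0 * mu * (p - R @ w)        # move down the bowl along -grad J
print("steepest descent:", w)             # converges to the Wiener solution
```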
