1118 Natl Sci Rev, 2020, Vol. 7, No.



Special Topic: Games in Control Systems

Game theory, learning, and control systems
Jeff S. Shamma
Game theory is the study of interacting decision makers, whereas control systems involve the design of intelligent decision-making devices.
When many control systems are interconnected, the result can be viewed through the lens of game theory. This article discusses both
long-standing connections between these fields and new connections stemming from emerging applications.

Game theory is the study of interacting decision makers [1], i.e. settings in which the quality of an actor's decision depends on the decisions of others. In commuting, the congestion experienced on a road depends on a vehicle's path, as well as the paths taken by other vehicles. In auctions, the outcome depends on one's own bid as well as the bids of others. In competitive markets, market share depends on both a firm's pricing as well as the pricing of its competitors.

While game theory traditionally has been studied within the realm of mathematical social sciences, there are also strong ties to control systems [2,3]. A longstanding connection is the setting of zero-sum, or minimax, games. In zero-sum games, there are two players, and a benefit to one player is a detriment to the other. A classical example is pursuit-evasion games [4]. A common perspective in control systems is that one player is the controller and the opposing player is an adversarial environment, e.g. exogenous disturbances [5] or model misspecification [6]. The controller seeks to optimize a specified performance objective, whereas the adversarial environment seeks to reduce achieved performance. There has been renewed interest in zero-sum games in the area of security [7,8], where security measures are to be taken against a variety of adversarial attacks ranging from intrusion to data corruption to privacy violation.

Another area of interest is in distributed or networked control systems [9], motivated by applications such as power networks, transportation networks, and multi-robot systems. In such settings, illustrated in Fig. 1, there is a large number of decision making components where no single actor has full information on the state of the environment or full authority over the decisions over the network. A representative application is the smart grid [10], where a distributed network of prosumers make decisions on production, consumption, and storage of energy in response to evolving demand and environmental conditions.

Figure 1. (a) A traditional control system architecture with centralized information and authority. (b) A distributed or networked control system architecture with multiple interacting decision makers.

A more recent connection between game theory and control systems is in the area of game-theoretic learning [11,12]. To set up the discussion, we first define a (non-cooperative) game by (i) a set of players; (ii) for each player, a set of actions; and (iii) for each player, a utility function that quantifies a player's satisfaction with the collective actions of all players. More formally, we can write the utility function of the $i$th player as $U_i(a_1, a_2, \dots, a_i, \dots, a_n)$, where $(a_1, \dots, a_n)$ is the action profile of $n$ players, $a_i$ is the action of the $i$th player, and $U_i(\cdot)$ is a real-valued function where the $i$th player prefers the action profile $a = (a_1, \dots, a_n)$ over $a' = (a_1', \dots, a_n')$ whenever $U_i(a) > U_i(a')$ (i.e. larger utility is better).

An important concept in game theory is the Nash equilibrium, which is an action profile $(a_1^*, \dots, a_n^*)$ such that for any player $i$ and every alternative action $a_i'$,

$$U_i(a_1^*, \dots, a_i^*, \dots, a_n^*) \ge U_i(a_1^*, \dots, a_i', \dots, a_n^*),$$

i.e. each player's action is optimal with respect to the actions of other players.

Nash equilibrium is an example of a solution concept for a game, which is a proposed outcome given the specification of the elements of a game. Such an interpretation can be problematic. A game may have multiple Nash equilibria, resulting in an issue of non-uniqueness. Another lingering question is how agents might reach a Nash equilibrium, especially given that agents have limited knowledge about the utility functions of other agents or even observations of the actions of other agents. Indeed, even computing a Nash equilibrium can have intractable computational complexity [13]. Nonetheless, Nash equilibrium is widely used as representative of the outcome of a game-theoretic model.

The study of game-theoretic learning partially addresses these issues by shifting the discussion away from Nash equilibrium and towards how players might reach a Nash equilibrium through some sort of online or adaptive learning process. Such learning processes evolve over stages, e.g. $t = 0, 1, 2, \dots$, and can be represented as

$$a_i(t) \sim \mathrm{LR}[I_i(t); U_i(\cdot)],$$

where the action of the $i$th player at stage $t$ is determined by the learning rule, $\mathrm{LR}[\cdot]$, that acts on the information available to player $i$ up to stage $t$, $I_i(t)$, as well as $U_i(\cdot)$, the specific utility function of player $i$. A learning rule may be stochastic,

