

1118 Natl Sci Rev, 2020, Vol. 7, No. 7 PERSPECTIVES

INFORMATION SCIENCE

Special Topic: Games in Control Systems


Game theory, learning, and control systems
Jeff S. Shamma
Game theory is the study of interacting decision makers, whereas control systems involve the design of intelligent decision-making devices.
When many control systems are interconnected, the result can be viewed through the lens of game theory. This article discusses both longstanding connections between these fields and new connections stemming from emerging applications.

Downloaded from https://academic.oup.com/nsr/article/7/7/1118/5610794 by guest on 24 May 2024


Game theory is the study of interacting decision makers [1], i.e. settings in which the quality of an actor's decision depends on the decisions of others. In commuting, the congestion experienced on a road depends on a vehicle's path, as well as the paths taken by other vehicles. In auctions, the outcome depends on one's own bid as well as the bids of others. In competitive markets, market share depends on both a firm's pricing and the pricing of its competitors.

While game theory traditionally has been studied within the realm of mathematical social sciences, there are also strong ties to control systems [2,3]. A longstanding connection is the setting of zero-sum, or minimax, games. In zero-sum games, there are two players, and a benefit to one player is a detriment to the other. A classical example is pursuit-evasion games [4]. A common perspective in control systems is that one player is the controller and the opposing player is an adversarial environment, e.g. exogenous disturbances [5] or model misspecification [6]. The controller seeks to optimize a specified performance objective, whereas the adversarial environment seeks to reduce achieved performance. There has been renewed interest in zero-sum games in the area of security [7,8], where security measures are to be taken against a variety of adversarial attacks ranging from intrusion to data corruption to privacy violation.

Another area of interest is in distributed or networked control systems [9], motivated by applications such as power networks, transportation networks, and multi-robot systems. In such settings, illustrated in Fig. 1, there is a large number of decision-making components where no single actor has full information on the state of the environment or full authority over the decisions over the network. A representative application is the smart grid [10], where a distributed network of prosumers makes decisions on production, consumption, and storage of energy in response to evolving demand and environmental conditions.

[Figure 1. (a) A traditional control system architecture with centralized information and authority; the panel depicts an actuate-decide-sense loop between the control system and its environment. (b) A distributed or networked control system architecture with multiple interacting decision makers sharing an environment.]

A more recent connection between game theory and control systems is in the area of game-theoretic learning [11,12]. To set up the discussion, we first define a (non-cooperative) game by (i) a set of players; (ii) for each player, a set of actions; and (iii) for each player, a utility function that quantifies a player's satisfaction with the collective actions of all players. More formally, we can write the utility function of the ith player as U_i(a_1, a_2, ..., a_i, ..., a_n), where (a_1, ..., a_n) is the action profile of n players, a_i is the action of the ith player, and U_i(·) is a real-valued function, where the ith player prefers the action profile a = (a_1, ..., a_n) over a' = (a'_1, ..., a'_n) whenever U_i(a) > U_i(a') (i.e. larger utility is better).

An important concept in game theory is the Nash equilibrium, which is an action profile (a_1*, ..., a_n*) such that for any player i and every alternative action a_i,

    U_i(a_1*, ..., a_i*, ..., a_n*) ≥ U_i(a_1*, ..., a_i, ..., a_n*),

i.e. each player's action is optimal with respect to the actions of other players.

Nash equilibrium is an example of a solution concept for a game, which is a proposed outcome given the specification of the elements of a game. Such an interpretation can be problematic. A game may have multiple Nash equilibria, resulting in an issue of non-uniqueness. Another lingering question is how agents might reach a Nash equilibrium, especially given that agents have limited knowledge about the utility functions of other agents or even observations of the actions of other agents. Indeed, even computing a Nash equilibrium can have intractable computational complexity [13]. Nonetheless, Nash equilibrium is widely used as representative of the outcome of a game-theoretic model.

The study of game-theoretic learning partially addresses these issues by shifting the discussion away from Nash equilibrium and towards how players might reach a Nash equilibrium through some sort of online or adaptive learning process. Such learning processes evolve over stages, e.g. t = 0, 1, 2, ..., and can be represented as

    a_i(t) ~ LR[I_i(t); U_i(·)],

where the action of the ith player at stage t is determined by the learning rule, LR[·], that acts on the information available to player i up to stage t, I_i(t), as well as U_i(·), the specific utility function of player i. A learning rule may be stochastic, wherein the action is a randomized outcome according to a probability distribution generated by the learning rule.
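The objects just defined lend themselves to a compact illustration. The sketch below is not from the article; the coordination-game payoffs, function names, and the particular learning rule are all illustrative assumptions. It encodes a two-player game through its utility functions, checks the Nash equilibrium inequality by brute force, and runs one simple stochastic instance of the LR[I_i(t); U_i(·)] template, in which each action is sampled with probability proportional to the cumulative utility it has earned.

```python
# Illustrative sketch (not from the article): a finite two-player game,
# a brute-force Nash equilibrium check, and a stochastic learning rule.
import itertools
import random

# Utility functions U_i(a_1, a_2) for a two-player coordination game.
ACTIONS = [0, 1]
def U1(a1, a2):
    return 1.0 if a1 == a2 else 0.0
def U2(a1, a2):
    return 1.0 if a1 == a2 else 0.0
UTILITIES = [U1, U2]

def is_nash(profile):
    """Check U_i(a*) >= U_i(a_i, a*_{-i}) for every player i and deviation a_i."""
    for i, Ui in enumerate(UTILITIES):
        for ai in ACTIONS:
            deviated = list(profile)
            deviated[i] = ai
            if Ui(*deviated) > Ui(*profile):
                return False
    return True

nash_profiles = [p for p in itertools.product(ACTIONS, repeat=2) if is_nash(p)]
print(nash_profiles)  # the two coordinated profiles: [(0, 0), (1, 1)]

# Learning rule a_i(t) ~ LR[I_i(t); U_i(.)]: here I_i(t) is the cumulative
# utility each of player i's actions has earned so far, and the next action is
# sampled with probability proportional to that cumulative utility.
def learning_rule(cumulative):
    weights = [1e-3 + c for c in cumulative]  # small floor keeps exploration alive
    r, acc = random.random() * sum(weights), 0.0
    for a, w in enumerate(weights):
        acc += w
        if r <= acc:
            return a
    return len(weights) - 1

random.seed(0)
scores = [[0.0, 0.0], [0.0, 0.0]]  # cumulative utility per player, per action
for t in range(2000):
    a = [learning_rule(scores[i]) for i in range(2)]
    for i, Ui in enumerate(UTILITIES):
        scores[i][a[i]] += Ui(*a)

# Each player's empirically best action after repeated play.
final = tuple(max(ACTIONS, key=lambda x: scores[i][x]) for i in range(2))
print(is_nash(final))  # True: play locks onto one coordinated equilibrium
```

Because a player's score for an action grows only on stages where both players coordinate on it, repeated play concentrates probability on one of the two equilibria, a small preview of the equilibrium-selection and convergence questions that game-theoretic learning studies.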
For example, in reinforcement learning, an action is selected with a probability that is proportional to the cumulative utility that it has garnered in the past.

There is a very interesting and complicating factor that distinguishes game-theoretic learning from other learning formulations such as reinforcement learning. An implicit assumption in learning is that there is a stationary environment, and so, over time, one can determine which actions are more effective. However, in game-theoretic learning, the environment comprises other learning agents. Learning in the presence of other learners results in a non-stationary environment from the perspective of any individual agent. Indeed, depending on both the structure of the underlying game and the specific learning rule, outcomes can range from convergence to a Nash equilibrium (or other solution concepts, most notably correlated equilibrium [12]), to preferential selection of some Nash equilibria over others, to non-convergence and even chaotic behavior. This notion of embedding learning agents in a common environment has recently gained significant attention in the context of training neural networks through so-called generative adversarial networks [14].

The game-theoretic learning framework leads to two significant connections to control systems. First, game-theoretic learning offers an approach to designing distributed control systems [9], as illustrated in Fig. 1, where the components are programmable engineered devices. An example is in area coverage problems [15], where mobile sensors are to explore an unknown environment. The main idea is to view each component as a player in a game. The system designer must endow these artificial players with both incentives (i.e. utility functions) and adaptive control laws (i.e. learning rules) that induce a desirable collective behavior through local interactions that respect the underlying distributed decision architecture.

A second connection is that control theory offers new insights into the analysis of game-theoretic learning. A learning rule is a type of dynamical system, and so interacting learning agents constitute a feedback interconnection of dynamical systems, with special structures emerging from game-theoretic learning. Recent work includes exploiting underlying passivity properties associated with game-theoretic learning rules (see the tutorial paper [16] and references therein).

These research directions represent complementary paradigms for game-theoretic learning. In the first, game-theoretic learning is used as a prescriptive approach to programming engineered devices. In the second, game-theoretic learning is a descriptive approach to modeling evolving human decision making. Going forward, there is a significant opportunity for game theory and control systems research that blends these two perspectives.

In the emerging area of cyber-physical-social systems, the distributed decision architecture in Fig. 1 is a possible mix of both programmable devices and human decision makers. An application area is in smart cities [17], where (i) human drivers may share the road with autonomous vehicles; (ii) human users must be incentivized into participating in energy demand response while being monitored by IoT devices; and (iii) humans and robots interact in unstructured environments, such as in assistive robotics. In such applications, the perspectives of both game theory and control theory come together, where game theory models interactive decision making and control systems methods address evolutionary dynamics while mitigating the uncertainty inherent in human decision making.

FUNDING

This work was supported by funding from King Abdullah University of Science and Technology (KAUST).

Conflict of interest statement. None declared.

Jeff S. Shamma
Computer, Electrical and Mathematical Science and Engineering Division, King Abdullah University of Science and Technology, Saudi Arabia
E-mail: jeff.shamma@kaust.edu.sa

REFERENCES

1. Osborne M and Rubinstein A. A Course in Game Theory. Cambridge: MIT Press, 1994.
2. Başar T and Olsder G. Classics in Applied Mathematics: Dynamic Noncooperative Game Theory. Philadelphia: Society for Industrial and Applied Mathematics, 1999.
3. Jonathan PH. IEEE Contr Syst Mag 2017; 37: 5-8.
4. Isaacs R. Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization. New York: Wiley, 1965.
5. Başar T and Bernhard P. Systems and Control: Foundations and Applications. H-infinity-Optimal Control and Related Minimax Design Problems. Basel: Birkhäuser, 1991.
6. Bernhard P. Robust control and dynamic games. In Başar T and Zaccour G (eds.). Handbook of Dynamic Game Theory. Cham: Springer, 2016, 1-30.
7. Tambe M. Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned. Cambridge: Cambridge University Press, 2011.
8. Manshaei MH, Zhu Q and Alpcan T et al. ACM Comput Surv 2013; 45: 25.
9. Marden JR and Shamma JS. Game theory and distributed control. In Young H and Zamir S (eds.). Handbook of Game Theory with Economic Applications, Vol. 4. Amsterdam: Elsevier, 2015, 861-99.
10. Saad W, Han Z and Poor HV et al. IEEE Signal Process Mag 2012; 29: 86-105.
11. Young H. Arne Ryde Memorial Lectures: Strategic Learning and Its Limits. Oxford: Oxford University Press, 2004.
12. Hart S. Econometrica 2005; 73: 1401-30.
13. Daskalakis C, Goldberg PW and Papadimitriou CH. SIAM J Comput 2009; 39: 195-259.
14. Salimans T, Goodfellow I and Zaremba W et al. Improved techniques for training GANs. In Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook: Neural Information Processing Systems Foundation, 2016, 2234-42.
15. Cortes J, Martinez S and Karatas T et al. IEEE Trans Robot Autom 2004; 20: 243-55.
16. Park S, Martins N and Shamma J. From population games to payoff dynamics models: a passivity-based approach. In 2019 IEEE 58th Conference on Decision and Control (CDC). Piscataway: IEEE, 2019, 6584-601.
17. Cassandras CG. Engineering 2016; 2: 156-8.

National Science Review 7: 1118-1119, 2020
doi: 10.1093/nsr/nwz163
Advance access publication 4 November 2019
