DISTRIBUTED MULTI-AGENT BASED TRAFFIC
MANAGEMENT SYSTEM
Balaji Parasumanna Gokulan
B.E., University of Madras
A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF
PHILOSOPHY
DEPARTMENT OF ELECTRICAL AND COMPUTER
ENGINEERING
NATIONAL UNIVERSITY OF SINGAPORE
2011
ACKNOWLEDGEMENTS
First and foremost, I would like to express my deepest gratitude to my supervisor, Dr. Dipti Srinivasan, without whose guidance, support, and encouragement it would have been impossible for me to finish this work. I would like to thank Dr. Lee Der-Horng and Dr. P. Chandrashekar for their help and guidance during my research work.
I would also like to thank all my colleagues in the lab for making it an ideal environment in which to perform research. My special thanks go to Mr. Seow Hung Cheng, who took extra effort to ensure that all the facilities, equipment and software were available to us at all times.
My stay in Singapore would not have been fun-filled without my friends. Some friends who deserve a special mention are Vishal Sharma, Krishna Agarwal, Krishna Mainali, R. P. Singh, Sahoo Sanjib Kumar, D. Shyamsundar, Raju Gupta, J. Sundaramurthy, Anupam Trivedi and Atul Karande. The fun-filled discussions ranging from politics to movies at the Technoedge canteen every evening, the intense tennis sessions, and the joint music lessons we had together will stay a sweet memory for my entire lifetime.
I would like to thank my wife Soumini for her patience and support during the final thesis writing phase. My acknowledgement would be incomplete without a special mention of my parents and sister; I am greatly indebted to them for their support and the unconditional love they showered on me during my entire PhD studies.
Last but not least, I gratefully acknowledge the financial support offered by the National University of Singapore during the course of my postgraduate studies.
TABLE OF CONTENTS
ABSTRACT vii
LIST OF FIGURES ix
LIST OF TABLES xii
LIST OF DEFINITIONS xiii
LIST OF ABBREVIATIONS xiv
1 Introduction 1
1.1 Brief overview of multi-agent systems ..................................................4
1.2 Main objectives of the research .............................................................6
1.3 Main contributions .................................................................................6
1.4 Structure of dissertation .........................................................................8
2 Distributed multi-agent system 10
2.1 Notion of multi-agent system ..............................................................10
2.1.1 Multi-agent system ...................................................................15
2.2 Classification of multi-agent system ...................................................19
2.2.1 Agent taxonomy .......................................................................19
2.3 Overall agent organization ..................................................................21
2.3.1 Hierarchical organization .........................................................22
2.3.2 Holonic agent organization ......................................................24
2.3.3 Coalitions ..................................................................................25
2.3.4 Teams ........................................................................................27
2.4 Communication in multi-agent system ...............................................29
2.4.1 Local communication ...............................................................29
2.4.2 Blackboards ..............................................................................30
2.4.3 Agent communication language ..............................................31
2.5 Decision making in multi-agent system ..............................................36
2.5.1 Nash equilibrium ......................................................................39
2.5.2 The iterated elimination method ..............................................40
2.6 Coordination in multi-agent system ....................................................40
2.6.1 Coordination through protocol .................................................42
2.6.2 Coordination via graphs ...........................................................44
2.6.3 Coordination through belief models ........................................45
2.7 Learning in multi-agent system ...........................................................45
2.7.1 Active learning .........................................................................46
2.7.2 Reactive learning ......................................................................47
2.7.3 Learning based on consequences .............................................48
2.8 Summary ..............................................................................................51
3 Review of advanced signal control techniques 52
3.1 Classification of traffic signal control methods ..................................52
3.1.1 Fixed time control ....................................................................52
3.1.2 Traffic actuated control ............................................................54
3.1.3 Traffic adaptive control ............................................................57
3.1.3a SCATS/GLIDE .............................................................59
3.1.3b SCOOT .........................................................................62
3.1.3c MOTION .......................................................................64
3.1.3d TUC ...............................................................................65
3.1.3e UTOPIA/SPOT .............................................................67
3.1.3f OPAC .............................................................................69
3.1.3g PRODYN ......................................................................71
3.1.3h RHODES .......................................................................71
3.1.3i Hierarchical Multiagent System (HMS) .......................73
3.2 Summary ..............................................................................................78
4 Design of proposed multi-agent architecture 79
4.1 Proposed agent architecture .................................................................79
4.2 Data collection module ..................................................................................82
4.3 Communication module ................................................................................85
4.4 Decision module ............................................................................................88
4.5 Knowledge base and data repository module ................................................88
4.6 Action implementation module .....................................................................89
4.7 Backup module ..............................................................................................90
4.8 Summary .......................................................................................................90
5 Design of hybrid intelligent decision systems 91
5.1 Overview of type-2 fuzzy sets .............................................................91
5.1.1 Union of fuzzy sets ..................................................................96
5.1.2 Intersection of fuzzy sets .........................................................96
5.1.3 Complement of fuzzy sets .......................................................97
5.1.4 Karnik-Mendel algorithm for defuzzification .........................97
5.1.5 Geometric defuzzification .......................................................98
5.2 Appropriate situations for applying type-2 FLS ..........................................100
5.3 Classification of the proposed decision systems .........................................101
5.4 Type-2 fuzzy deductive reasoning decision system ....................................101
5.4.1 Traffic data inputs and fuzzy rule base ..............................................102
5.4.2 Inference engine ................................................................................107
5.5 Geometric fuzzy multi-agent system ...........................................................110
5.5.1 Input fuzzifier ....................................................................................110
5.5.2 Inference engine ................................................................................114
5.6 Symbiotic evolutionary type-2 fuzzy decision system ................................118
5.6.1 Symbiotic evolution ..........................................................................120
5.6.2 Proposed symbiotic evolutionary GA decision system .....................123
5.6.3 Crossover ...........................................................................................129
5.6.4 Mutation ............................................................................................129
5.6.5 Reproduction .....................................................................................130
5.7 Q-learning neuro-type2 fuzzy decision system ............................................131
5.7.1 Proposed neuro-fuzzy decision system .............................................133
5.7.2 Advantages of QLT2 decision system ...............................................138
5.8 Summary ......................................................................................................138
6 Simulation platform 140
6.1 Simulation test bed .......................................................................................140
6.2 PARAMICS .................................................................................................143
6.3 Origin-Destination matrix ............................................................................144
6.4 Performance metrics ....................................................................................148
6.4.1 Travel time delay ...............................................................................148
6.4.2 Mean speed ........................................................................................149
6.5 Benchmarks .................................................................................................150
6.6 Summary ......................................................................................................151
7 Results and discussions 152
7.1 Simulation scenarios ....................................................................................152
7.1.1 Peak traffic scenario ..........................................................................153
7.1.2 Events ................................................................................................153
7.2 Six hour, two peak traffic scenario ..............................................................154
7.3 Twenty four hour, two peak traffic scenario ................................................163
7.4 Twenty four hour, eight peak traffic scenario ..............................................170
7.5 Link and lane closures .................................................................................177
7.6 Incidents and accidents ................................................................................179
7.7 Summary ....................................................................................................183
8 Conclusions 185
8.1 Overall conclusions .....................................................................................185
8.2 Main contributions ......................................................................................187
8.3 Recommendation for future research work .................................................188
LIST OF PUBLICATIONS 191
REFERENCES 192
ABSTRACT
Traffic congestion is a major recurring problem in many countries of the world, driven by increased urbanization and the availability of affordable vehicles. Congestion can be dealt with in a number of ways: increasing the capacity of the roads, promoting alternative modes of transportation, or making more efficient use of the existing infrastructure. Among these, the most feasible option is to improve the usage of existing roads. Adjusting the green time of signals to allow more vehicles to cross an intersection has been the most widely accepted method of relieving congestion. Green time essentially dictates the period during which vehicles are allowed to cross an intersection, thereby avoiding conflicting vehicle movements and improving safety at the intersection.
Conventional traffic signal control methods have shown limited success in optimizing signal timings due to the lack of accurate mathematical models of traffic flow at an intersection and the uncertainties associated with traffic data. Traffic flow refers to the number of vehicles crossing an intersection every hour. The traffic environment is dynamic, and the traffic signal timings at one intersection influence the traffic flow rate at the connected intersections. This necessitates the use of hybrid computational intelligence models to predict the traffic flow and the influence of the neighbouring intersection signals on the green signal timings. Increased communication overheads, reliability issues, data mining, and real-time control requirements limit the use of centralized traffic signal control. These limitations are overcome by distributed traffic signal control. However, a major disadvantage of distributed signal control is the partial view held by each computing entity involved in the calculation of green time at an intersection. To improve the global view, communication and learning capabilities need to be incorporated into each computing entity so that it can create a model of the neighbouring computing entities. Multi-agent systems provide such a distributed architecture with learning and communication capabilities.
In this dissertation, a distributed multi-agent architecture capable of learning from the traffic environment and communicating with the neighbouring intersections is developed. Four computational intelligence decision systems with different internal architectures were developed. The first two approaches were offline-trained methods using deductive reasoning. The third approach was based on an online batch learning method that co-evolves the membership functions and rule base of a type-2 fuzzy decision system. The fourth decision system is an online, shared-reward Q-learning based neuro-type-2 fuzzy network.
The performance of the proposed multi-agent based traffic signal controls was evaluated for different traffic simulation scenarios using a simulated urban road traffic network of Singapore. Comparative analysis against the benchmark traffic signal controls, the Hierarchical Multi-agent System (HMS) and GLIDE (Green Link Determining), indicated considerable improvement in travel time delay and mean speed of vehicles when using the proposed multi-agent based traffic signal control methods.
LIST OF FIGURES
Figure 1.1: Typical three phase traffic signal cycle time indicating phase splits and
right of way ...................................................................................................................2
Figure 2.1: Typical Building Blocks of an Autonomous Agent ..................................15
Figure 2.2: Classification of a multi agent system based on different attributes .........21
Figure 2.3: A hierarchical agent architecture ..............................................................23
Figure 2.4: An example of superholon with nested holon resembling the hierarchical
multi agent system ......................................................................................................25
Figure 2.5: Coalition multi agent architecture with overlapping group................... ....27
Figure 2.6: Team based multi agent architecture with a partial view of the other agent
teams ..........................................................................................................................28
Figure 2.7: Message passing communication between agents .....................................30
Figure 2.8a: Blackboard communication between agents........................................ ....31
Figure 2.8b: Blackboard communication using remote communication between
agents ..........................................................................................................31
Figure 2.9: KQML – Layered language structure.................................................... ....35
Figure 2.10: Payoff matrix for the prisoner's dilemma problem .................................38
Figure 2.11: Modified payoff matrix for the prisoner's dilemma problem ..................40
Figure 3.1: Architecture of hierarchical multi agent system .......................................74
Figure 3.2: Internal neuro-fuzzy architecture of the decision module in zonal control
agent..............................................................................................................................76
Figure 4.1: Overall structure of the proposed multi agent system ...............................80
Figure 4.2: Internal structure of the proposed multi agent system ..............................81
Figure 4.3: Induction loop detectors at intersection ....................................................82
Figure 4.4: Working of induction loop detectors .........................................................82
Figure 4.5: FIPA query protocol .................................................................................87
Figure 4.6: Typical communication flow between agents at traffic intersection .........88
Figure 5.1: Block Diagram of Type-2 fuzzy sets ........................................................92
Figure 5.2: Type-1 fuzzy Gaussian membership function ...........................................93
Figure 5.3: Type-2 fuzzy Gaussian membership function with fixed mean and varying
sigma ...........................................................................................................................94
Figure 5.4: Ordered coordinates geometric consequent set showing two of the closed
polygons .....................................................................................................................99
Figure 5.5: Block diagram of T2DR multi-agent weighted input decision system ...103
Figure 5.6: Antecedent and consequent membership function ..................................104
Figure 5.7: GFMAS agent architecture .....................................................................110
Figure 5.8: Block diagram of geometric type-2 fuzzy system ..................................112
Figure 5.9: Fuzzified antecedents and consequents in a GFMAS .............................113
Figure 5.10: Rule base for the GFMAS signal control .............................................115
Figure 5.11: Geometric defuzzification process based on Bentley-Ottman plane
sweeping algorithm ...................................................................................................118
Figure 5.12: Block diagram of symbiotic evolution: complete solution obtained by
combining partial solutions .......................................................................................122
Figure 5.13: A representation of the islanded symbiotic evolutionary algorithm
population .................................................................................................................124
Figure 5.14: A block diagram representation of the symbiotic evolution in the
proposed symbiotic evolutionary genetic algorithm .................................................125
Figure 5.15: Structure of the chromosome for membership function cluster island ..126
Figure 5.16: Structure of chromosome of the rule base cluster island .......................127
Figure 5.17: Structure of the proposed neuro-type2 fuzzy decision system (QLT2) 135
Figure 5.18: Structure of type-2 fuzzy system with modified type reducer ...............137
Figure 6.1: Layout of the simulated road network of Central Business District in
Singapore ..................................................................................................................142
Figure 6.2: Screenshot of PARAMICS modeller software ........................................144
Figure 6.3: Snapshot of SCATS traffic controller and the controlled intersection ....145
Figure 6.4: Origin-Destination matrix indicating trip counts ....................................146
Figure 6.5: Traffic release profile for a three hour single peak traffic simulation ....147
Figure 6.6: Profile demand editor for a twenty four hour eight peak traffic simulation
scenario .....................................................................................................................148
Figure 7.1: Vehicle release profile for a six hour, two peak traffic scenario ............154
Figure 7.2: Mean travel time delay of vehicles for six hour, two peak traffic scenario
................................................................................................................................. ...160
Figure 7.3: Average speed of vehicle inside the network for six hour, two peak traffic
scenario ....................................................................................................................161
Figure 7.4: Total number of vehicles inside the road network for a six hour, two peak
traffic ........................................................................................................................162
Figure 7.5: Actual mean speed of vehicle inside the road network ...........................162
Figure 7.6: Vehicle release traffic profile for twenty four hour, two peak traffic
scenario ....................................................................................................................164
Figure 7.7: Total mean delay of vehicles for twenty four hour, two peak traffic
scenario ....................................................................................................................164
Figure 7.8: Average speed of vehicles inside the network for twenty four hour, two
peak traffic scenario .................................................................................................165
Figure 7.9: Vehicles inside the network for a twenty four hour, two peak traffic
simulation scenario ...................................................................................................166
Figure 7.10: Twenty four hour, eight peak traffic release profile .............................170
Figure 7.11: Total mean delay experienced for a twenty four hour, eight peak traffic
scenario .....................................................................................................................176
Figure 7.12: Mean speed of vehicles for a twenty four hour, eight peak traffic scenario
.. .................................................................................................................................176
Figure 7.13: Number of vehicles inside the network for a twenty four hour, eight peak
traffic scenario ..........................................................................................................177
Figure 7.14: Two lane closure – Mean travel time delay of vehicles ........................178
Figure 7.15: Single lane closure – Mean travel time delay of vehicles .....................179
Figure 7.16: Single incident simulation – Multiple peak traffic scenario ..................181
Figure 7.17: Two incidents simulation – Multiple peak traffic scenario ...................181
LIST OF TABLES
Table 5.1: Mapping of flow and neighbour state inputs to consequents weighting
factor output ...............................................................................................................105
Table 5.2: Mapping of flow and queue input to consequents green time output ......106
Table 7.1: Mean travel time delay and speed of vehicles for a six hour, two peak
traffic scenario ..........................................................................................................155
Table 7.2: Total number of vehicles inside the network at the end of each hour of
simulation for a six hour, two peak traffic scenario ..................................................157
Table 7.3: Standard deviation and confidence interval of the mean travel time delay
for six hour, two peak traffic scenario ......................................................................159
Table 7.4: Percentage improvement over HMS signal control ..................................163
Table 7.5: Comparison of mean delay, speed and number of vehicles for twenty four
hour, two peak traffic scenario ..................................................................................167
Table 7.6: Percentage improvement of travel time delay and speed over HMS control
for twenty four hour, two peak traffic scenario .........................................................168
Table 7.7: Standard deviation and confidence interval for a twenty four hour, two
peak traffic mean travel time delay ...........................................................................169
Table 7.8: Travel time delay of vehicles at the end of peak period for twenty four
hour, eight peak traffic scenario ................................................................................171
Table 7.9: Total mean speed of vehicle inside the network for twenty four hour, eight
peak traffic scenario ..................................................................................................172
Table 7.10: Vehicles inside the network for twenty four hour, eight peak traffic
scenario .....................................................................................................................172
Table 7.11: Standard deviation and confidence interval of travel time delay for twenty
four hour, eight peak traffic simulation ....................................................................174
Table 7.12: Percentage improvement of travel time delay and mean speed over HMS
signal control .............................................................................................................175
Table 7.13: Comparison of the proposed signal control methods with HMS in terms
of computation and communication ........................................................................182
LIST OF DEFINITIONS
Green time: Duration of time during which vehicles in a lane are allowed to cross an intersection.
Phase: A unique set of traffic signal movements, where a movement is controlled by a number of traffic signal lights that change colour at one time.
Cycle: The time required for one full cycle of signal indications, given in seconds.
Cycle length: Time taken to complete all phases at an intersection. Cycle length includes the green time, amber time, and all-red time of every phase in use at the intersection.
Right of way: Lanes with green signals that allow the flow of vehicles.
Split: Total time allocated to each phase in a cycle, composed of green time, amber (yellow) time, and all-red time.
Offset: Time lag between the start of green time in a phase of signals at nearby connected intersections, allowing free flow of vehicles without facing any red signal.
Saturation flow: The maximum number of vehicles from a lane group that would pass through the intersection in one hour under the prevailing traffic and roadway conditions if the lane group were given a continuous green signal for that hour.
Delay: The total stopped time per vehicle for each lane in the road traffic network.
LIST OF ABBREVIATIONS
AI: Artificial Intelligence
MAS: Multi-Agent System
HMS: Hierarchical Multi-agent System
GLIDE: Green Link Determining system
T2DR: Type-2 fuzzy Deductive Reasoning decision system
GFMAS: Geometric Fuzzy Multi-Agent System
QLT2: Q-Learning neuro-Type2 fuzzy decision system
QLT1: Q-Learning neuro-Type1 fuzzy decision system
SET2: Symbiotic Evolutionary Type-2 fuzzy decision system
GAT2: Genetic Algorithm tuned Type-2 fuzzy decision system
SCATS: Sydney Coordinated Adaptive Traffic System
SCOOT: Split Cycle Offset Optimization Technique
FIPA: Foundation for Intelligent Physical Agents
ACL: Agent Communication Language
RL: Reinforcement Learning
CHAPTER 1
INTRODUCTION
Traffic congestion is a major recurring problem in many countries due to the increased level of urbanization and the availability of cheaper vehicles. One option for reducing congestion is to construct new infrastructure to accommodate the increased vehicle count; however, this is often infeasible in developing countries where space is a major constraint. The more feasible option is to improve the usage of the existing roads through optimization of traffic signal timings. This can alleviate the congestion experienced at intersections by evenly distributing the travel delay among all vehicles, thereby reducing the travel time of vehicles inside the road network and providing a temporal separation for vehicles with the right of way on a link.
A traffic signal controls the movement of traffic by adjusting the split of each phase within the total cycle time and by modifying the offset. Split refers to the total time allocated to each phase in a cycle; right of way refers to the lanes given a green signal and allowed to move during a specific phase; and offset is the time lag between the start of green time at successive intersections, which is required to ensure free flow of vehicles (progression) with minimum waiting time along a specific direction. The breakdown of a three-phase cycle at an intersection is shown in Figure 1.1 to elucidate the terms split, phase, cycle length, offset, progression and right of way.
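The quantities just defined are related by simple arithmetic, which the sketch below makes explicit; all the phase durations, link length and speed are invented for illustration and do not come from the thesis.

```python
# Arithmetic relating split, cycle length, and offset for a
# three-phase signal. All numbers are illustrative only.

# Split of each phase = green + amber + all-red clearance (seconds).
splits = {
    "phase_1": 25 + 3 + 2,   # 30 s
    "phase_2": 20 + 3 + 2,   # 25 s
    "phase_3": 15 + 3 + 2,   # 20 s
}

# Cycle length is the time needed to serve every phase once.
cycle_length = sum(splits.values())  # 75 s

# Offset between two coordinated intersections: delay the downstream
# green so a platoon leaving the upstream stop line at free-flow speed
# arrives just as its phase turns green (ideal progression).
link_length_m = 400.0        # distance between the intersections
free_flow_speed_mps = 13.9   # roughly 50 km/h
offset_s = link_length_m / free_flow_speed_mps

print(cycle_length, round(offset_s, 1))
```

In practice the offset is also taken modulo the common cycle length, since the progression pattern repeats every cycle.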
[Figure: three signals plotted against the distance between intersections, with cycle length, offset, and the vehicle progression path marked.]
Figure 1.1. Typical three phase traffic signal cycle time indicating phase splits and
right of way
Traffic signal timing optimization, i.e., split adjustment to change the green time of a phase, maximizes the throughput of vehicles at the controlled intersection and helps maintain the degree of saturation of all the links connected to the intersection without compromising the safety of vehicles inside the road network. Computing an optimal value of green time for a phase is an extremely complex task, as the signal timings at an intersection affect the traffic flow at the connected intersections.
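The degree of saturation mentioned above has a standard definition: the flow on a link divided by the capacity that its green split makes available. A minimal sketch, with invented numbers:

```python
# Degree of saturation of a link under a given green split
# (standard definition; the numbers below are illustrative only).

def degree_of_saturation(flow_vph, saturation_flow_vph, green_s, cycle_s):
    """flow / capacity, where capacity = saturation flow * (g / C)."""
    capacity_vph = saturation_flow_vph * (green_s / cycle_s)
    return flow_vph / capacity_vph

# A link carrying 600 veh/h, saturation flow 1800 veh/h,
# 30 s of green in a 90 s cycle:
x = degree_of_saturation(600, 1800, 30, 90)
print(round(x, 2))  # 1.0 means the allocated green is fully used
```

Values well below 1 indicate wasted green time; values at or above 1 indicate queues that the green cannot clear, which is what split adjustment tries to balance across links.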
Early traffic signal control schemes were typically designed for isolated intersections, as these form the basic components of a road traffic network and can be easily modelled. Based on the type of control used, traffic signal controls can be classified into three types:
Pre-timed or fixed control
Traffic responsive control
Traffic adaptive control
One of the first mathematical models for calculating green times, with the objective of reducing the average delay experienced by vehicles inside a road network, was proposed in [1] and formed the basis for fixed-time traffic signal control. The green time of each phase was calculated offline using historical traffic flow patterns collected from urban arterial roads. A controller designed in this way cannot handle sudden deviations of the traffic from the pattern used to calculate the green times. Further, offline estimation methods are prone to losses when switching between signal plans, especially under rapidly changing traffic.
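As a concrete illustration of such fixed-time design, the cycle length and green splits can be computed with Webster's classical formulas. The sketch below is illustrative only: the lost-time value and flow ratios are assumed example inputs, and the exact model in [1] may differ in detail.

```python
def webster_timings(flow_ratios, lost_time_per_phase=4.0):
    """Webster-style fixed-time signal settings.

    flow_ratios: critical flow ratio y_i = demand / saturation flow for each
                 phase (dimensionless); their sum must be below 1.
    Returns (cycle_length, effective_green_times) in seconds.
    """
    L = lost_time_per_phase * len(flow_ratios)   # total lost time per cycle
    Y = sum(flow_ratios)
    if Y >= 1.0:
        raise ValueError("intersection oversaturated: sum of flow ratios >= 1")
    cycle = (1.5 * L + 5.0) / (1.0 - Y)          # Webster's optimal cycle length
    # Split the effective green (cycle minus lost time) in proportion to demand.
    greens = [(y / Y) * (cycle - L) for y in flow_ratios]
    return cycle, greens

# Example: a three-phase intersection with assumed flow ratios.
cycle, greens = webster_timings([0.30, 0.25, 0.15])
```

Because the flow ratios come from historical averages, the resulting plan is exactly the kind of static setting that cannot react to sudden demand changes.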
To overcome these limitations, traffic-responsive methods that change the signal timings based on the traffic experienced at the intersection were introduced. Though these controls reduced congestion compared with fixed-time control, their performance is affected by the inability to foresee traffic conditions, by faulty sensors, and by environmental conditions.
Traffic-adaptive methods are intelligent traffic signal control methods with the ability to predict traffic flow and adjust the timings online. Based on the type of architecture used, traffic-adaptive methods can be classified into three types:
Centralized control
Semi-distributed control
Distributed control
Centralized traffic signal controls determine the network-wide signal timings at a central location. The traffic data collected at each intersection is sent to a central server that computes the timings required at each intersection for the specific traffic flow experienced there. Centralized control requires a large amount of traffic data to be communicated from the intersections to the control centre, greatly increasing the communication overhead. Further, the raw data sent from the intersections must be sorted and ordered for the phase-timing calculation, increasing the computational overhead. Performance is also affected by traffic data loss and by noise added to the data.
Semi-distributed traffic signal controls improved reliability by using a hierarchical structure. Though the communication cost is lower than in centralized control, it remains substantially high, and as the traffic network grows the control becomes complex and difficult to handle.
In distributed traffic signal control, the traffic signal at each intersection is controlled by its own computing entity. The signal timings for the intersection are computed autonomously using local data collected from the sensors connected to the intersection. However, the restricted range of these sensors limits the traffic view available to each computing entity. To improve the global traffic view and the performance of the signal control, the controllers need to learn, communicate, and adapt dynamically. This requirement is satisfied by multi-agent systems with hybrid computational intelligent decision systems and communication capabilities. Computational intelligence methods are required because only approximate mathematical models of traffic flow at an arterial intersection are available.
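A minimal sketch of such a locally sensing intersection agent is given below. The occupancy-based proportional rule is an illustrative stand-in, assumed for this sketch only, for the computational-intelligence decision systems developed in later chapters.

```python
class IntersectionAgent:
    """Minimal distributed signal agent: adjusts its green time using
    local detector data only, with no central server involved."""

    def __init__(self, min_green=10.0, max_green=60.0):
        self.green = 30.0            # current green time (s) for the active phase
        self.min_green = min_green
        self.max_green = max_green

    def step(self, occupancy):
        """occupancy: fraction in [0, 1] from local detectors on the approach."""
        # Map congestion level to a target green, then move smoothly toward it:
        # lengthen green when the approach is congested, shorten when underused.
        target = self.min_green + occupancy * (self.max_green - self.min_green)
        self.green += 0.5 * (target - self.green)
        return self.green
```

A real agent would replace the proportional rule with a learned or fuzzy decision system, and would exchange its state with neighbouring agents to compensate for the limited sensor view.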
1.1. BRIEF OVERVIEW OF MULTI-AGENT SYSTEMS
An agent can be viewed as a self-contained, concurrently executing thread of control that encapsulates some state and communicates with its environment, and possibly with other agents, through some form of message passing [2]. Agent-based systems offer advantages where independently developed components must interoperate in a heterogeneous environment, e.g., the Internet. They are increasingly applied in a wide range of areas including telecommunications, business process modelling (BPM), computer games, distributed system control, and robotic systems. A significant advantage of agent systems over simple distributed problem solving is that the environment is an integral part of the agent.
Multi-agent systems (MAS) form a branch of distributed artificial intelligence that emphasizes the joint behaviour of agents with some degree of autonomy and the complexities arising from their interactions. Multi-agent systems allow the subproblems of a constraint satisfaction problem to be subcontracted to different problem-solving agents with their own individual interests and goals. This increases the speed of operation, creates parallelism, and reduces the risk of system collapse due to single-point failure. Though a generalized multi-agent platform could be used for solving different problems, it is common practice to design a tailor-made multi-agent architecture according to the application. Multi-agent systems can synergistically combine various computational intelligence techniques, attaining superior performance by uniting the advantages of each technique in a single framework. MAS also provide an extra degree of freedom to model the behaviour of the system as competitive or coordinating, with each approach having its own merits and demerits.
1.2. MAIN OBJECTIVES OF THE RESEARCH
The main objective of this dissertation is to develop a new distributed traffic signal control architecture, based on multiple interacting autonomous agents, that provides effective strategies for online optimization of signal timings in arterial road traffic networks.
A further objective is to develop an effective distributed online and batch learning method for optimizing the signal phase timings and adapting the rule base, by integrating well-known computational intelligence techniques into the agent decision system. In doing so, this dissertation also seeks to create a useful generalized multi-agent system for solving problems similar to distributed traffic signal control.
Apart from the objectives related to MAS and traffic signal control, this dissertation also seeks to develop an efficient computational intelligence method of type-reduction to reduce the complexity associated with the type-2 fuzzy inference mechanism.
1.3. MAIN CONTRIBUTIONS
The main contributions of this research are in the conceptualization, development and application of a distributed multi-agent architecture to the urban traffic signal timing optimization problem. The significant contributions on the design front are as follows.
The development of a generalized distributed multi-agent framework with
hybrid computational intelligent decision making capabilities for
homogeneous agent structure.
The development of deductive reasoning method for the construction of
membership functions, rule base of type-2 fuzzy sets and calculating the
level of cooperation required between agents.
The development of cooperation strategies in multi-agent system through
internal belief model by incorporating communicated neighbour agent
status information.
The development of symbiotic evolutionary learning method for
coevolving membership functions and rule base for the type-2 fuzzy
decision system.
The development of modified Q-learning technique with shared reward
values for solving distributed urban traffic signal control problem.
The development and relocation of the modified type-reducer using neural
networks to reduce the computational complexity associated with sorting
and defuzzification process in interval type-2 fuzzy sets.
The development of traffic simulation scenarios to test the reliability and
responsiveness of the developed traffic signal controls.
The developed multi-agent decision system produced promising results in experiments conducted on a simulated road traffic network under different traffic simulation scenarios.
1.4. STRUCTURE OF DISSERTATION
The dissertation consists of eight chapters and is organized as follows:
Chapter 1 gives a brief introduction to the background of the traffic control problem and multi-agent systems, the research objectives, and the main contributions.
Chapter 2 provides a detailed discussion of distributed multi-agent systems. It provides a classification of multi-agent systems based on the overall agent architecture. The merits and demerits of the various architectures are discussed, followed by a description of the communication and coordination techniques used in multi-agent systems. It also provides a brief overview of the learning techniques used for evolving the agents to better adapt to changes in the environment.
Chapter 3 describes the various problems associated with urban traffic signal control and some of the promising solutions to these problems. A brief overview of the various traffic signal timing optimization methods and their workings is presented. The benchmark traffic signal optimization methods (Hierarchical multi-agent system (HMS) and Green Link Determining system (GLIDE)) used for validating the proposed agent-based traffic control system are discussed.
Chapter 4 introduces the proposed distributed multi-agent architecture for urban traffic signal timing optimization. The internal structure of the agents and the functionality of each block in an agent are discussed in detail.
Chapter 5 introduces four different types of decision systems used in the proposed multi-agent based traffic signal control. A brief overview of type-2 fuzzy sets and the symbiotic evolutionary genetic algorithm is presented. The design of the decision systems based on deductive reasoning, symbiotic evolutionary learning, and Q-learning is presented in detail, and the advantages and disadvantages of the proposed decision systems are highlighted.
Chapter 6 describes in detail the modelling of a large, complex urban traffic network, the Central Business District of Singapore, using the PARAMICS Modeller software. Details of creating the origin-destination matrix used for trip assignment and routing of vehicles inside the simulated road network from the collected data are presented. This chapter also describes the use of the profile editor to create the traffic release pattern for simulation runs, and details the performance metrics used to evaluate the proposed multi-agent systems.
Chapter 7 details the various simulation scenarios used to test the proposed multi-agent systems. The travel time delay and speed of vehicles inside the road network for various traffic scenarios, using the different multi-agent decision control strategies, are compared. A detailed analysis of the results and the improvements achieved using the proposed signal controls over the benchmark traffic controllers is presented.
Chapter 8 concludes the thesis and provides recommendations for future research
work.
CHAPTER 2
DISTRIBUTED MULTI-AGENT SYSTEMS
In the previous chapter, a brief introduction to the traffic signal timing optimization problem and the suitability of distributed control methods for solving it was presented. In order to construct an efficient distributed autonomous multi-agent traffic signal control system with all the required functionalities, it is essential to identify the proper architecture, communication protocol, coordination mechanism, and learning method to be used.
This chapter provides a detailed review of distributed multi-agent systems: their architecture, taxonomy, decision making, communication requirements, coordination techniques, and learning methods. This forms the basis for the proper design, conceptualisation and implementation of multi-agent systems for real-world applications. The chapter also discusses in detail the advantages and disadvantages of various multi-agent architectures and their implementation methodologies, and highlights the significant contributions made by researchers in this field.
2.1 NOTION OF MULTI-AGENT SYSTEMS
Distributed artificial intelligence (DAI) is a subfield of artificial intelligence [3] that has gained considerable importance due to its ability to solve complex real-world problems. The primary focus of research in distributed artificial intelligence has been in three areas: parallel AI, distributed problem solving (DPS), and multi-agent systems (MAS). Parallel AI primarily refers to methodologies that allow classical AI techniques [4-10] to be applied to distributed hardware architectures such as multiprocessor or cluster-based computing.
The main aim of parallel AI is to develop parallel computer architectures, languages and algorithms to increase the speed of operation. Parallel AI is primarily directed towards solving the performance problems of AI systems, not towards conceptual advances in understanding the nature of reasoning and intelligent behaviour among groups of agents. Distributed problem solving is similar to parallel AI but considers methodologies for solving a problem by sharing resources and knowledge between a large number of cooperating modules known as "computing entities". In distributed problem solving, the communication between computing entities and the quantity of information shared are predetermined and embedded in the design of the computing entity. Distributed problem solving is rigid due to these embedded strategies and consequently offers little or no flexibility.
In contrast to distributed problem solving, multi-agent systems (MAS) [11-13] deal with the behaviour of the computing entities available to solve a given problem. Multi-agent research is concerned with coordinating intelligent behaviour among all agents: methodologies to coordinate knowledge, goals, skills and plans jointly to solve a problem. In a multi-agent system each computing entity is referred to as an agent. A MAS can be defined as a network of individual agents that share knowledge and communicate with each other in order to solve a problem that is beyond the scope of a single agent. It is imperative to understand the characteristics of the individual agent or computing entity in order to distinguish a simple distributed system from a multi-agent system.
A system with one agent is usually referred to as a conventional artificial intelligence technique, while a system with multiple agents is called an artificial society. Since distributed systems involve multiple agents, the main issues and foundations of distributed artificial intelligence are the organisation, coordination and cooperation [14] between the agents.
Multi-agent systems lie at the confluence of a wide variety of research disciplines and technologies, notably artificial intelligence, object-oriented programming, human-computer interfaces, and networking [15, 16]. Some of the technologies that have influenced the development of multi-agent systems are as follows:
Database and knowledge-base technology
Concurrent computing
Cognitive sciences
Computational linguistics
Econometric models
Biological immune systems
As a result of such diverse contributions, the agent and multi-agent systems paradigm is diluted across a multitude of perspectives. Researchers in the field of artificial intelligence have so far failed to agree on a consensus definition of the word "agent". The first and foremost reason for this is the universality of the word: it cannot be owned by a single community. Secondly, agents can take many physical forms, from robots to computer networks. Thirdly, the application domains of agents are so varied that they are impossible to generalize. Researchers have used terms like softbots (software agents), knowbots (knowledge agents) and taskbots (task-based agents), depending on the application domain where the agents are employed [17]. The most widely accepted definition of an agent is that of Russell and Norvig, who define an agent as a flexible autonomous entity capable of perceiving the environment through the sensors connected to it and acting on the environment through actuators. This definition does not cover the entire range of characteristics that an agent should possess. Sycara [15] presented some of the most important characteristics that define an agent, which are as follows.
Situatedness: This refers to the interaction of an agent with the environment through the use of sensors, and the resultant actions of the actuators. The environment in which an agent is present is an integral part of its design. All inputs are received directly as a consequence of the agent's interactions with its environment. The agents act directly upon the environment through the actuators and do not serve merely as meta-level advisors. This attribute differentiates agent systems from expert systems, where the decision-making node or entity suggests changes through a middle agent without directly influencing the environment.
Autonomy: This can be defined as the ability of an agent to choose its actions independently, without external intervention by other agents in the network (in the case of multi-agent systems) or human interference. This attribute protects the internal states of an agent from external influence. It also isolates an agent from instability caused by external disturbances.
Inferential capability: The ability of an agent to work on abstract goal specifications, such as deducing an observation by generalizing information. This could be done by mining relevant information from the available data.
Responsiveness: The ability to perceive the condition of the environment and respond in a timely fashion to any changes in it. This property is of critical importance in real-time applications.
Pro-activeness: Agents must exhibit opportunistic, goal-directed behaviour rather than merely responding to specific changes in the environment. Agents must also have the ability to adapt to changes in a dynamic environment.
Social behaviour: Even though an agent's decisions must be free from external intervention, it must still be able to interact with external sources when the need arises to achieve a specific goal. It must also be able to share its knowledge to help other agents (in a MAS) solve a specific problem. That is, agents must be able to learn from the experience of other communicating entities, which may be humans, other agents in the network, or statistical controllers.
Apart from the above-mentioned properties, some other important characteristics are mobility, temporal continuity, veracity, collaborative behaviour and rationality. If an agent satisfies only some of the above properties, such as autonomy, social ability, reactivity and pro-activeness, it is said to exhibit a weak notion of agency [18].
For an agent to have a strong notion of agency, it is additionally required to conceptualise or implement concepts more applicable to humans, such as knowledge, belief, intention, obligation or emotion. Another way of giving agents human-like attributes is to represent them visually as animated characters in applications involving human-machine interaction. The strong notion of agency tends to be the intersection of all aspects of the different fields that influence multi-agent systems.
[Figure: an agent's reasoning/inference engine, which may be reactive, draws on a belief model, goals, history and utility; perception and communication feed the engine, which outputs an action]
Figure 2.1. Typical building blocks of an autonomous agent
It is, however, extremely difficult to characterize agents based only on these properties. The characterization of an agent must also be based on the complexity involved in the design, the function performed, and the rationality exhibited. The typical building blocks of an autonomous agent are shown in Figure 2.1.
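The building blocks of Figure 2.1 can be sketched as a minimal class. The method names and the simple goal-selection rule below are illustrative assumptions for this sketch, not a standard agent API.

```python
class AutonomousAgent:
    """Sketch of Figure 2.1: perception updates a belief model; a reasoning
    step consults beliefs, goals and history to choose an action."""

    def __init__(self, goals):
        self.beliefs = {}        # belief model: the agent's view of the world
        self.history = []        # record of past beliefs/action pairs
        self.goals = goals       # goal names, each scored in the belief model

    def perceive(self, percept):
        self.beliefs.update(percept)      # fold new observations into beliefs

    def reason(self):
        # Reactive placeholder: pursue the goal whose believed satisfaction
        # is lowest. A real agent would use an inference engine here
        # (fuzzy rules, Q-learning, utility maximization, ...).
        return min(self.goals, key=lambda g: self.beliefs.get(g, 0.0))

    def act(self):
        action = self.reason()
        self.history.append((dict(self.beliefs), action))
        return action
```

The separation of perception, beliefs and reasoning is what lets the decision mechanism be swapped without changing the rest of the agent.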
2.1.1 Multi-agent System
A multi-agent system (MAS) is an extension of basic agent technology. A definition of a multi-agent system can be obtained by extending the definition of distributed problem solvers [19]: a loosely coupled network of autonomous agents that work together as a society, aiming to solve problems that would generally be beyond the problem-solving capability of an individual agent. According to [20], the characteristics of a multi-agent system are:
Each agent has incomplete information or capabilities for solving the overall problem to be tackled by the system, and thus has a very limited viewpoint.
Lack of global control - the behaviour of the system is influenced by the collective behaviour of individual agents' actions and their experiences.
Decentralization of resources.
Multi-agent systems have been widely adopted in many application domains because of the benefits they offer. Some of the advantages of using MAS technology in large systems [21] are the following:
An increase in the speed and efficiency of operation due to parallel computation and asynchronous operation.
A graceful degradation of the system when one or more of the agents fail, thereby increasing the reliability and robustness of the system.
Scalability and flexibility - agents can be introduced dynamically into the environment.
Reduced cost - individual agents cost much less than a centralized architecture.
Reusability - agents have a modular structure and hence can be easily reused in other systems, or upgraded more easily than a monolithic system.
Though multi-agent systems offer features beyond those of single-agent systems, they also present some critical challenges, some of which are highlighted below.
Environment: In a multi-agent system, the action of an agent modifies not only its own environment but also that of its neighbours. This necessitates that each agent predict the actions of the other agents in order to decide on the optimal, goal-directed action. This type of concurrent learning could result in unstable behaviour and can possibly cause chaos. The problem is further complicated if the environment is dynamic; in such conditions, each agent needs to differentiate between effects caused by the actions of other agents and variations in the environment itself.
Perception: In a distributed multi-agent system, the agents are scattered over the environment. Each agent has a limited sensing capability because of the limited range and coverage of the sensors connected to it. This limits the view of the environment available to each agent. Therefore, decisions based on the partial observations made by each agent could be sub-optimal, which in turn affects the global objective.
Abstraction: In a single-agent system, it is assumed that an agent knows its entire action space, and the mapping from state space to action space can be learned from the agent's own experience. In a MAS, no single agent experiences all of the states. To create such a map, an agent must be able to learn from the experience of other agents with similar capabilities or decision-making powers. For cooperating agents with similar goals, this can be done by creating communication channels between the agents. For competing agents, sharing information is not possible, as each agent tries to increase its own chance of winning. It is therefore essential to quantify how much of the local information and capabilities of other agents must be known in order to create an improved model of the environment.
Conflict resolution: Conflicts stem from the lack of a global view available to each agent. An action selected by one agent to modify a specific internal state may be ineffective for another agent. Under these circumstances, information on the constraints, action preferences and goal priorities of agents must be shared to improve cooperation. A major problem is knowing when to communicate this information, and to which agents.
Inference: In a single-agent system, inferences can be drawn relatively easily by mapping the state space to the action space through trial and error. In a MAS this is difficult, as the environment is being modified by multiple agents that may or may not interact with each other. Further, a MAS may consist of heterogeneous agents, i.e., agents having different goals and capabilities. Instead of cooperating, the agents might compete with each other for a resource. This necessitates identifying a suitable inference mechanism, according to the capabilities of each agent, to achieve a globally optimal solution.
It is not necessary to use multi-agent systems for all applications. Specific application domains that require interaction among different people or organizations with conflicting or common goals can take advantage of MAS in their design.
2.2 CLASSIFICATION OF MULTI-AGENT SYSTEM
Classifying MAS is a difficult task, as it can be done based on several different attributes such as architecture [22], learning [23-25], communication [22] and coordination [26]. A general classification encompassing most of these features is shown in Figure 2.2.
2.2.1 Agent Taxonomy
Based on the internal architecture of the individual agents in a multi-agent system, it
may be classified into two types: a homogeneous structure or a heterogeneous structure.
1) Homogeneous structure
In a homogeneous architecture, all agents in the multi-agent system have a similar internal architecture. Internal architecture refers to the local goals, sensor capabilities, internal states, inference mechanism and possible action states [27]. The agents differ only in their physical location and the part of the environment where the specified action is implemented. Each agent receives its inputs from a different part of the environment. There may be overlap in the sensor inputs received, though in a typical distributed environment such overlap is rarely present [28].
2) Heterogeneous structure
In a heterogeneous architecture, the agents may differ in their ability, structure, or functionality [29]. Based on the dynamics of the environment and the location of a particular agent, the actions it chooses may differ [30] from those of an agent with the same functionality located in a different part of the environment. Heterogeneous architectures help in modelling applications much closer to the real world [31]. Each agent can have different local goals that may contradict the objectives of other agents. A typical example of this can be seen in the predator-prey game [31], where both the prey and the predator are modelled as agents whose objectives are likely to be in direct contradiction to one another.
[Figure: classification tree of agent systems. Architecture - internal structure: homogeneous or heterogeneous. Agent characteristics - reasoning (reactive or consequence-based), perception (complete or partial), learning (fixed or adaptive), communication (protocols such as KAOS and FIPA; local or multi-agent), action, and mobility. Network organization - hierarchy, holonic, coalition, or team. Negotiation method - blackboard, broker, or mediator. Goals - single or multiple.]
Figure 2.2. Classification of a multi-agent system based on the use of different attributes
2.3 OVERALL AGENT ORGANIZATION
Classification of the multi-agent system based on the organisational paradigm gives great insight into the strengths and weaknesses of the various types of agent organization. Based on the organisational structure, multi-agent systems can be classified into four major categories, namely:
Hierarchical
Holonic
Coalitions
Teams
2.3.1 Hierarchical organization
Hierarchical organization [32] is one of the earliest organizational designs in multi-agent systems and has been applied to a large number of distributed problems. In a hierarchical agent architecture, the agents are arranged in a typical tree-like structure, and agents at different levels of the tree have different levels of autonomy. Data from the lower levels of the hierarchy typically flow upwards to agents at higher levels, while control or supervisory signals flow from higher to lower levels [33]. Figure 2.3 shows a typical three-level hierarchical multi-agent architecture; the flow of control signals is from higher- to lower-priority agents.
According to the distribution of control between the agents, a hierarchical architecture can be further classified as a simple or a uniform hierarchy.
Simple hierarchy: In a simple hierarchy [34], decision-making authority is bestowed on a single agent at the highest level of the hierarchy. The problem with a simple hierarchy is that a single-point failure of the agent at the highest level may cause the entire system to fail.
Uniform hierarchy: In a uniform hierarchy, authority is distributed among the various agents in order to increase efficiency and fault tolerance in the event of single- or multi-point failures. Decisions are made only by agents with the appropriate amount of information, and are sent up the hierarchy only when there is a conflict of interest between agents at different levels.
An example of a uniform hierarchical multi-agent system applied to an urban traffic signal control problem is provided in [33]. The objective is to provide distributed control and computation of traffic signal timings, so as to reduce the total delay experienced by vehicles in the road network. In [32], a three-level hierarchical multi-agent system (HMS) was developed. At the lowest level of the hierarchy are the intersection agents: each signal is modelled as an agent and decides its actions autonomously. The zonal agents sit one level above the intersection agents, and each communicates with a group of intersections. Zonal agents in turn communicate with a central supervisory regional agent, which occupies the top of the hierarchy. Each intersection decides its optimal green time based on the local information it collects. Agents at higher levels of the hierarchy modify the decisions of lower-level agents if there is a conflict of interest, or if the overall delay experienced by a group of intersections increases due to a selected action. Here, overall control is uniformly distributed among the agents. The disadvantage of a uniform hierarchy lies in deciding the amount and type of information to be transmitted to agents at higher levels, a non-trivial problem that becomes more complicated as the network size increases.
[Figure: agents A1-A10 arranged in a three-level tree, with A1 at level 1, A2-A4 at level 2, and A5-A10 at level 3]
Figure 2.3. A Hierarchical Agent Architecture
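The escalation principle of a uniform hierarchy, where local proposals stand unless a higher-level conflict arises, can be sketched as follows. The conflict test (a zonal green-time budget) and the rescaling override are illustrative assumptions for this sketch, not the HMS algorithm of [32, 33].

```python
def resolve_hierarchy(intersection_greens, zonal_limit=120.0):
    """Uniform-hierarchy sketch: intersection agents propose green times
    autonomously; the zonal agent intervenes only when the joint proposal
    conflicts (here: total green in the zone exceeds a capacity budget)."""
    total = sum(intersection_greens)
    if total <= zonal_limit:
        return intersection_greens           # no conflict: local decisions stand
    scale = zonal_limit / total              # zonal override: rescale proposals
    return [g * scale for g in intersection_greens]
```

The key property illustrated is that control is exercised only on conflict, so most decisions remain local and communication up the hierarchy stays small.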
2.3.2 Holonic agent organization
A 'holon' is a stable, coherent, fractal structure that consists of several holons as its substructure and is itself part of a larger framework. The concept of a holon was proposed by Arthur Koestler [35] to explain the social behaviour of biological species. The hierarchical structure of the holon and its interactions have since been used to model large organizational behaviours in manufacturing and business domains [36-38].
In a holonic multi-agent system, an agent that appears as a single entity may be composed of many sub-agents bound together by commitments. The sub-agents are not bound by hard constraints or pre-defined rules but through commitments: relationships agreed to by all of the participating agents inside the holon.
Each holon appoints or selects a head agent that communicates with the environment or with other agents located in it. Selection of the head agent is usually based on resource availability, communication capability and the internal architecture of each agent. In a homogeneous multi-agent system the selection can be random, using a rotation policy similar to those employed in distributed wireless sensor networks. In a heterogeneous architecture, head selection is based on the capability of each agent. The holons formed may group further, in accordance with the benefits foreseen in forming a coherent structure, into superholons. Figure 2.4 shows a superholon formed by grouping two holons: agents A1 and A4 are the heads of the holons and communicate with agent A7, the head of the superholon. The architecture appears similar to a hierarchical organization; however, in a holonic architecture, cross-tree interactions and overlapping group formations are allowed.
The superiority of the holonic multi-agent organization and the performance
improvement achieved using holonic grouping was demonstrated in [38]. The
abstraction of the internal working structure of holons provides an increased degree
of freedom in selecting behaviour. A major disadvantage [39] is the lack of a model
or knowledge of the internal architecture of the holons, which makes it difficult for
other agents to predict the resulting actions of the holons.
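The round-robin head rotation mentioned above for homogeneous holons can be sketched as follows. The class and method names are illustrative assumptions, not part of the thesis.

```python
from itertools import cycle

class Holon:
    """A homogeneous holon whose head role is rotated round-robin,
    similar to cluster-head rotation in wireless sensor networks."""

    def __init__(self, members):
        self.members = list(members)
        self._rotation = cycle(self.members)
        self.head = next(self._rotation)   # initial head selection

    def rotate_head(self):
        # Periodically hand the head role to the next member.
        self.head = next(self._rotation)
        return self.head
```

In a heterogeneous holon, `rotate_head` would instead rank members by capability rather than cycling.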
Figure 2.4. An example of Superholon with Nested Holons resembling the Hierarchical MAS
2.3.3 Coalitions
In coalition architecture, a group of agents come together for a short time to increase
the utility or performance of the individual agents in a group. The coalition ceases to
exist when the performance goal is achieved. Figure 2.5. shows a typical coalition
multi-agent system. The agents forming the coalition may have either a uniform or a
hierarchical architecture. Even when using a uniform architecture, it is possible to
have a leading agent act as a representative of the coalition group. The overlap of
agents among coalition groups is allowed, as this increases the common knowledge
within the coalition group and helps in the construction of belief models. However,
the presence of overlapping agents increases the complexity of computing the
negotiation strategy. Coalition architecture is difficult to maintain in a dynamic
environment because the performance of a coalition group shifts over time; it may be
necessary to regroup agents in order to maximize the system performance.
Theoretically, forming a single group consisting of all the agents in the environment
will maximize the performance of the system. This is because each agent has access
to all the information and resources necessary to calculate the conditions for optimal
action. However, forming such a coalition is practically infeasible due to constraints
on communication and resources.
The number of coalition groups created must be minimized in order to reduce the cost
associated with creating and dissolving a coalition group. The group formation may
be pre-defined based on a threshold set for performance measure or alternatively
could be evolved online.
In [40], a coalition multi-agent architecture for urban traffic signal control was
developed. Each intersection was modelled as an agent with capability to decide the
optimal green time required for that intersection. A distributed neuro-fuzzy inference
engine was used to compute the level of cooperation required and the members of the
dynamically formed coalition group.
The coalition groups reorganize and regroup dynamically with respect to the changing
traffic input pattern. A disadvantage is the increased computational complexity
involved in creating ensembles or coalition groups, although a coalition MAS may
deliver better short-term performance than other agent architectures [41].
Figure 2.5. Coalition multi agent architecture using overlapping groups
2.3.4 Teams
Team MAS architecture [42] is similar to coalition architecture in design except that
the agents in a team work together to increase the overall performance of the group
rather than each working as individual agents. The interactions between agents within
a team can be quite arbitrary, and the goals or the roles assigned to each of the agents
can vary with time based on improvements resulting from the team performance.
Reference [43] deals with a team-based multi-agent architecture in a partially
observable environment: teams that cannot communicate with each other were
proposed for Arthur's bar problem. Each team decides whether to attend a bar by
means of predictions based on its previous behavioural pattern and the crowd level
experienced, which is the reward or utility received for the specific period of time.
Based on the observations made in [43], it can be concluded that a large team size is
not beneficial under all conditions.
Consequently some compromise must be made between the amount of information,
number of agents in the team and the learning capabilities of the agents.
Large teams offer better visibility of the environment and a larger amount of relevant
information; however, learning, or incorporating the experiences of individual agents
into a single team framework, is adversely affected. A smaller team size offers faster
learning but results in sub-optimal performance due to a limited view of the
environment. A tradeoff between learning and performance must therefore be made
in selecting the optimal team size, which raises the computational cost well beyond
that experienced in a coalition multi-agent architecture. Figure 2.6 shows a typical
team-based architecture with a partial view: teams 1 and 3 can see each other, but
not teams 2 and 4, and vice versa. The internal behaviour of the agents and their
roles are arbitrary and vary between teams, even with a homogeneous agent structure.
Figure 2.6. Team based multi agent architecture with a partial view of the other teams
Variations and constraints on aspects of the four agent architectures mentioned earlier
in this chapter can form other architectures such as federations, societies and
congregations. Most of these architectures are inspired by behavioural patterns in
governments, institutions and large industrial organizations. A detailed description of
these architectures, their formation and characteristics may be found in [42].
2.4
COMMUNICATION IN MULTI-AGENT SYSTEMS
Communication is one of the crucial components of multi-agent systems and needs
careful consideration: unnecessary or redundant inter-agent communication can
increase cost and cause instability. Communication in a multi-agent system can be
classified into two types, based on the architecture of the agent system and on the
type of information communicated between the agents. In [22], the various issues with
homogeneous and heterogeneous MAS architectures were described and
demonstrated using predator/prey and robotic soccer games. On the basis of the
information communicated between the agents [44], MAS can be classified as using
local communication (message passing) or network communication (blackboards).
Mobile communication falls into the class of local communication.
2.4.1 Local communication
Local communication lacks both memory and an intermediate communication medium
to store information and act as a facilitator. The term message passing emphasizes
the direct communication between agents. Figure 2.7 shows the structure of
message-passing communication between agents. In this type of communication the
information flow is bidirectional; it creates a distributed architecture and reduces the
bottleneck caused by the failure of central agents. This type of communication has
been used in [33, 45, 46].
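A minimal sketch of the direct message-passing style described above; the `Agent` class and its methods are illustrative assumptions, not an API from the cited works.

```python
class Agent:
    """An agent communicating by direct, bidirectional message passing;
    no central mediator or shared memory is involved."""

    def __init__(self, name):
        self.name = name
        self.inbox = []

    def send(self, recipient, content):
        recipient.inbox.append((self.name, content))   # direct delivery

    def receive(self):
        return self.inbox.pop(0) if self.inbox else None

a1, a2 = Agent("A1"), Agent("A2")
a1.send(a2, "request: extend green phase")
sender, content = a2.receive()
a2.send(a1, "acknowledged: " + content)
```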
2.4.2 Blackboards
Another way of exchanging information between agents is through Blackboards [47].
Agent-based blackboards, like federation systems, use grouping to manage the
interactions between agents. In blackboard communication, a group of agents share a
data repository used for efficient storage and retrieval of data actively shared between
the agents. The repository can hold both design data and control knowledge, and is
accessible to the agents. The type of data that can be accessed by an agent can be
controlled through the use of a control shell, which acts as a network interface that
notifies the agent when relevant data is available in the repository. The
control shell can be programmed to establish different types of coordination among
the agents. Neither the agent groups nor the individual agents in the group need to be
physically located near the blackboards. It is possible to establish communication
between various groups by remote interface communication. The major issue is the
loss of critical information due to the failure of blackboards. This could render the
group of agents useless depending on the information stored in the specific
blackboard. However, it is possible to establish some redundancy and share resources
between various blackboards. Figure 2.8a. shows a single blackboard with the group
of agents associated with it. Figure 2.8b. shows blackboard communication between
two different agent groups and also the location of facilitator agents in each group.
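The blackboard pattern described above, with a control shell notifying subscribed agents when relevant data is posted, can be sketched as follows. All names are illustrative assumptions.

```python
from collections import defaultdict

class Blackboard:
    """Shared repository; the control shell notifies subscribed agents
    when data on a topic they registered for is posted."""

    def __init__(self):
        self.data = {}
        self._subscribers = defaultdict(list)   # topic -> callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def post(self, topic, value):
        self.data[topic] = value
        for notify in self._subscribers[topic]:  # control-shell notification
            notify(topic, value)

seen = []
board = Blackboard()
board.subscribe("queue-length", lambda topic, value: seen.append((topic, value)))
board.post("queue-length", 42)
```

A redundant deployment would replicate `post` calls across several `Blackboard` instances, which mitigates the single-point-of-failure issue noted above.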
Figure 2.7. Message Passing Communication between agents
Figure 2.8a. Blackboard type communication between agents
Figure 2.8b. Blackboard communication using remote communication between agent groups
2.4.3 Agent Communication Language
An increase in the number of agents and the heterogeneity of the group necessitates a
common framework to help in proper interaction and information sharing. This
common framework is provided by the agent communication languages (ACL). The
elements that are of prime importance in the design of ACL were highlighted in [48,
49].
An agent communication language provides the necessary interaction format
(protocol) that can be understood by all participating agents. It also provides a
shared ontology, so that a communicated message has the same meaning in all
contexts, and allows agent-independent semantics. To perform their tasks effectively,
agents depend heavily on expressive communication with other agents: to perform
requests, to propagate information about capabilities, and to negotiate. Designing a
proper communication language faces two major problems:
Inconsistencies in the use of syntax or vocabulary: the same words could have
entirely different or even conflicting meanings for different agents
Incompatibilities between different programs using different words or
expressions to convey the same information
There are two popular approaches to the design of an agent communication language:
the procedural approach and the declarative approach. In the procedural approach,
communication between agents is modelled as the sharing of procedural directives,
which could be task-specific working instructions or the general working mechanism
of the agent. Scripting languages such as Java, TCL, AppleScript and Telescript are
commonly used in the procedural approach.
The major disadvantage of the procedural approach is the need for information about
the recipient agent, which in most cases is unknown or only partially known; a wrong
model assumption may have a destructive effect on the performance of the agents.
The second major concern is the merging of shared procedural scripts into a single
relevant executable script for the agent. Owing to these disadvantages, the
procedural approach is not the preferred method for designing an agent
communication language.
In the declarative approach, the agent communication language is based on the
sharing of declarative statements that specify definitions, assumptions, assertions,
axioms, etc. For a proper design, the declarative statements must be sufficiently
expressive to encompass a wide variety of information; this increases the scope of
the agent system and avoids the need for specialized methods to pass certain
functions. The declarative statements must also be short and precise, since an
increase in their length raises the cost of communication between agents and the
probability of information corruption, and simple enough that no high-level language
is needed to encode them. To meet these requirements of a declarative-approach
ACL, the ARPA knowledge sharing effort devised an agent communication language.
The ACL designed consists of three parts [49]: a vocabulary, an "inner language"
and an "outer language". The inner language is responsible for translating the
communicated information into a logical form understood by all agents. There is
still no consensus on a single language, and many inner-language representations
such as KIF (Knowledge Interchange Format) [50], KRSL and LOOM are available.
The linguistic representations created by these inner languages are concise,
unambiguous and context-independent; the receivers must derive from them the
original logical form. For each linguistic representation, the ACL maintains a large
vocabulary repository. A good ACL keeps this repository open-ended so that
modifications and additions can be made to include increased functionality. The
repository must also maintain multiple ontologies, whose usage depends on the
application domain.
Knowledge Interchange Format [51] is one of the best-known inner languages; it is
an extension of First-Order Predicate Calculus (FOPC). The information that can be
encoded using KIF includes simple data, constraints, negations, disjunctions, rules
and meta-level information that aids the final decision process. KIF alone is not
sufficient for information exchange, as much implicit information must be embedded
so that the receiving agent can interpret it with minimal knowledge of the sender's
structure; this is difficult to achieve because the packet size grows with the amount
of embedded information. To overcome this bottleneck, high-level languages that
utilize the inner language as their backbone were introduced. These high-level
languages make the information exchange independent of the content syntax and
ontology. One well-known outer language in this category is KQML (Knowledge
Query and Manipulation Language) [52]. A typical information exchange between
two agents using the KQML and KIF agent communication languages is as follows.
(ask :content (geolocation lax (?long ?lat))
     :language KIF
     :ontology STD_GEO
     :from location_agent
     :to location_server
     :label Query)          ; query identifier

(tell :content "geolocation(lax, [55.33,45.56])"
      :language standard_prolog
      :ontology STD_GEO)
KQML is conceived as both a message format and a message-handling protocol to
facilitate smooth communication between agents. From the example above, it can be
seen that KQML consists of three layers (Figure 2.9): a communication layer, which
carries the origin and destination agent information and a query label or identifier; a
message layer, which specifies the function to be performed (in the example, the first
agent asks for a geographic location and the second agent replies to the query); and
a content layer, which provides the details necessary to perform the specific query.
Figure 2.9. KQML - Layered language structure
In KQML, the communication layer is low-level and packet-oriented; a stream-oriented
approach is yet to be developed. The communication streams could be built on
TCP/IP, RDP, UDP or any other packet communication medium. The content layer
specifies the language to be employed by the agent. It should be noted that agents
can use different languages to communicate with each other, with interpretation
performed locally using higher-level languages.
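As a rough illustration of composing such performatives programmatically, the helper below renders a KQML-style message from keyword fields; the `kqml` function itself and any field names beyond `:content`, `:language` and `:ontology` are assumptions for illustration.

```python
def kqml(performative, **fields):
    """Render a KQML-style performative as an s-expression string."""
    body = " ".join(f":{name} {value}" for name, value in fields.items())
    return f"({performative} {body})"

# Mirrors the geolocation query shown earlier in this section.
msg = kqml("ask",
           content="(geolocation lax (?long ?lat))",
           language="KIF",
           ontology="STD_GEO")
```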
2.5
DECISION MAKING IN MULTI-AGENT SYSTEM
Multi-agent decision making differs from single-agent decision making. The
uncertainty associated with the effect of a specific action on the environment, and the
dynamic variation of the environment as a result of the actions of other agents, make
multi-agent decision making a difficult task. Usually, decision making in MAS is
treated as finding a joint action, or equilibrium point, that maximizes the reward
received by every agent participating in the decision-making process, and it can
typically be modelled using game theory. The strategic game is the simplest form of
this decision-making process: every agent chooses its action at the beginning of the
game, and all agents execute their chosen actions simultaneously.
A strategic game [53] consists of:
A set of players - in the multi-agent scenario, the agents are assumed to be the players
For each player, a set of actions
For each player, preferences over the set of action profiles
There is a payoff associated with each combination of action values of the
participating players. The payoff function is assumed to be predefined and known in
the case of a simple strategic game. It is also assumed that the actions of all agents
are observable and are common knowledge available to all agents. A solution to a
specific game is a prediction of its outcome under the assumption that all
participating agents are rational.
The prisoner's dilemma is a classic example for demonstrating the application of
game theory to decision making involving multiple agents. The prisoner's dilemma
problem can be stated as:
“Two suspects involved in the same crime are interrogated independently. If both
prisoners confess to the crime, each of them will spend three years in prison. If only
one of the prisoners confesses, the confessor goes free while the other spends four
years in prison. If neither confesses, each will spend a year in prison.”
This scenario can be represented as a strategic game.
Players: the two suspects involved in the crime
Actions: each agent's set of actions is {Not confess, Confess}
Preferences: the ordering of the action profiles for agent 1, from best to worst, is
{Confess, Not confess}, {Not confess, Not confess}, {Confess, Confess} and
{Not confess, Confess}. A similar ordering can be given for agent 2.
A payoff matrix that represents the particular preferences of the agents needs to be
created. A simple payoff assignment for agent 1 is
u1{Confess, Not confess} = 3,
u1{Not confess, Not confess} = 2,
u1{Confess, Confess} = 1,
u1{Not confess, Confess} = 0.
Similarly, the utility or payoff for agent 2 can be represented as
u2{Not confess, Confess} = 3,
u2{Not confess, Not confess} = 2,
u2{Confess, Not confess} = 0,
u2{Confess, Confess} = 1.
The reward or payoff received by each agent for choosing a specific joint action can
be represented in matrix form, called a payoff matrix. The problem depicts a
scenario where the agents gain if they cooperate with each other, but each also has
the possibility of going free by confessing. The problem can be represented as a
payoff matrix as shown in Figure 2.10. In this case it can be seen that the action
"Confess" is strictly dominating: a strictly dominating action of an agent always
yields a higher payoff for that agent, irrespective of the other agents' actions.
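The strict-dominance check can be made concrete with the payoffs above; the dictionary keys follow the text's {agent 1 action, agent 2 action} convention, and the helper function is an illustrative assumption.

```python
ACTIONS = ["Not confess", "Confess"]
# (agent 1 action, agent 2 action) -> (u1, u2), restating the payoffs above.
PAYOFF = {
    ("Not confess", "Not confess"): (2, 2),
    ("Not confess", "Confess"):     (0, 3),
    ("Confess",     "Not confess"): (3, 0),
    ("Confess",     "Confess"):     (1, 1),
}

def strictly_dominates(a, b):
    """True if agent 1's action a yields a higher payoff than b
    against every possible action of agent 2."""
    return all(PAYOFF[(a, other)][0] > PAYOFF[(b, other)][0]
               for other in ACTIONS)
```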
Each cell lists (u1, u2) for agent 1's row action against agent 2's column action:

                         Agent 2
                    Not Confess   Confess
Agent 1 Not Confess     2,2         0,3
        Confess         3,0         1,1

Figure 2.10. Payoff matrix for the Prisoner's Dilemma Problem
However, there can be variations to the prisoner's dilemma problem by introducing an
altruistic preference while still calculating the payoff of the actions. Under this
circumstance, there is no action strictly dominated by the other.
2.5.1 Nash equilibrium
To obtain the best solution based on the constructed payoff matrix, the most common
solution concept employed is the Nash equilibrium [54], which can be stated as follows:
A Nash equilibrium is an action profile a* with the property that no player i can do
better by choosing an action different from a*_i, given that every other player j
adheres to a*_j.
In the most idealistic conditions, where the components of the game are drawn
randomly from a collection of populations of agents, a Nash equilibrium corresponds
to a steady-state value. A strategic game always has a Nash equilibrium (possibly in
mixed strategies), but it is not necessarily unique. Examining the payoff matrix in
Figure 2.10 shows that {Confess, Confess} is the Nash equilibrium for this problem:
given that agent 2 chooses to confess, agent 1 is better off confessing than not
confessing, and by a similar argument for agent 2, {Confess, Confess} is a Nash
equilibrium. In particular, the incentive to get a free ride by confessing eliminates
any possibility of selecting a mutually desirable outcome of the type {Not confess,
Not confess}. If the payoff matrix is modified to add value based on trust or a
reward that creates altruistic behaviour and a feeling of indignation, then the subtle
balance shifts and the problem has multiple Nash equilibrium points, as shown in
Figure 2.11.
Each cell again lists (u1, u2) for agent 1's row action against agent 2's column action:

                         Agent 2
                    Not Confess   Confess
Agent 1 Not Confess     2,2        -2,-1
        Confess        -1,-2        1,1

Figure 2.11. Modified Payoff matrix for the Prisoner's Dilemma Problem
In the modified prisoner's dilemma problem, no single action always dominates and
multiple equilibrium points exist. To obtain a solution for this type of problem,
coordination between the agents is essential.
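A brute-force pure-strategy equilibrium search over both payoff matrices can be sketched as follows; the payoff dictionaries restate Figures 2.10 and 2.11, and the function is an illustrative assumption.

```python
from itertools import product

ACTIONS = ["Not confess", "Confess"]
# (agent 1 action, agent 2 action) -> (u1, u2), from Figures 2.10 and 2.11.
ORIGINAL = {("Not confess", "Not confess"): (2, 2), ("Not confess", "Confess"): (0, 3),
            ("Confess", "Not confess"): (3, 0), ("Confess", "Confess"): (1, 1)}
MODIFIED = {("Not confess", "Not confess"): (2, 2), ("Not confess", "Confess"): (-2, -1),
            ("Confess", "Not confess"): (-1, -2), ("Confess", "Confess"): (1, 1)}

def nash_equilibria(payoff):
    """Return all action profiles from which no player gains by unilateral deviation."""
    eqs = []
    for a1, a2 in product(ACTIONS, repeat=2):
        u1, u2 = payoff[(a1, a2)]
        if (all(payoff[(d, a2)][0] <= u1 for d in ACTIONS) and
                all(payoff[(a1, d)][1] <= u2 for d in ACTIONS)):
            eqs.append((a1, a2))
    return eqs
```

Running the search on `ORIGINAL` yields the single equilibrium {Confess, Confess}, while `MODIFIED` yields two equilibria, matching the discussion above.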
2.5.2 The iterated elimination method
The solution to the prisoner's dilemma problem can also be obtained using the
iterated elimination method [55]. In this method, strictly dominated actions are
iteratively eliminated until no action remains strictly dominated. The iterated
elimination method assumes that all agents behave rationally and will never choose
a strictly dominated action. This method is weaker than the Nash equilibrium, as it
finds the solution by means of an algorithm, and it fails when no strictly dominated
actions are available in the solution space. This limits the applicability of the
method in multi-agent scenarios, where mostly weakly dominated actions are
encountered.
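The elimination loop can be sketched for the two-player case; the payoff convention matches the earlier matrices ((u1, u2) indexed by (agent 1 action, agent 2 action)), and the helper names are assumptions.

```python
def payoff_for(payoff, player, mine, other):
    """Look up a player's payoff; keys are (agent 1 action, agent 2 action)."""
    key = (mine, other) if player == 0 else (other, mine)
    return payoff[key][player]

def iterated_elimination(actions1, actions2, payoff):
    """Repeatedly delete strictly dominated actions until none remain."""
    own_sets = [list(actions1), list(actions2)]
    changed = True
    while changed:
        changed = False
        for player in (0, 1):
            own, other = own_sets[player], own_sets[1 - player]
            for a in list(own):
                # a is strictly dominated if some b beats it against
                # every surviving action of the opponent.
                if any(all(payoff_for(payoff, player, b, o) >
                           payoff_for(payoff, player, a, o) for o in other)
                       for b in own if b != a):
                    own.remove(a)
                    changed = True
    return own_sets
```

On the original prisoner's dilemma payoffs, the loop leaves only {Confess} for both agents; on the modified payoffs it deletes nothing, illustrating the failure mode noted above.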
2.6
COORDINATION IN MULTI-AGENT SYSTEM
Coordination is a central issue in the design of multi-agent systems. Agents are
seldom stand-alone systems; usually more than one agent works in parallel to
achieve a common goal. When multiple agents are employed to achieve a goal, it is
necessary to coordinate or synchronize their actions to ensure the stability of the
system. Coordination between agents increases the chances of attaining an optimal
global solution. In [56], the major reasons necessitating coordination between agents
were highlighted:
To prevent chaos and anarchy
To meet global constraints
To utilize distributed resources, expertise and information
To prevent conflicts between agents
To improve the overall efficiency of the system
Coordination can be achieved by applying constraints on the joint action choices of
each agent, or by utilizing information collated from neighbouring agents to compute
an equilibrium action point that effectively enhances the utility of all participating
agents. Applying constraints on the joint actions requires extensive knowledge of the
application domain, which may not be readily available. The alternative requires
each agent to select the proper action based on the computed equilibrium point;
however, the payoff matrix needed to compute the utility of all action choices might
be difficult to determine, since its dimension grows exponentially with the number of
agents and the available action choices. This may create a bottleneck when
computing the optimal solution.
The problem of dimensional explosion can be solved by dividing the game into a
number of sub-games that can be solved more effectively. A simple mechanism to
reduce the number of action choices is to apply constraints, or assign roles, to each
agent: once a specific role is assigned, the number of permitted action choices is
reduced, which simplifies the computation of the payoff matrix. This approach is of
particular importance in distributed coordination mechanisms. In centralized
coordination techniques it is not a major concern, as it is possible to construct belief
models for all agents, and the payoff matrix can be computed centrally and
communicated to all agents as a shared resource.
Centralized coordination is adapted from the basic client/server model. Most
centralized coordination techniques use blackboards to exchange information: a
master agent schedules all the connected agents, which read and write information
from and to the central information repository. Some commonly adopted
client/server models are KASBAH [57] and MAGMA [58]; these models use a
global blackboard to achieve the required coordination. A disadvantage of
centralized coordination is the disintegration of the system due to a single-point
failure of the repository or the mediating agent. Further, the use of centralized
coordination contradicts the basic assumption of DAI [56].
2.6.1 Coordination through protocol
A classic coordination technique among agents in a distributed architecture is the
communication protocol. The protocol is usually written in a high-level language
and specifies the method of coordination between the agents as a series of task and
resource allocation methods. The most widely used protocol is the Contract Net
Protocol [59], which facilitates distributed control of cooperative task execution.
The protocol specifies the information to be communicated between the agents and
the format of information dissemination. A low-level communication language such
as KIF that can handle the communication streams is assumed to be available. The
protocol engages the agents in negotiation to arrive at an appropriate solution. The
negotiation process must adhere to the following characteristics:
Negotiation is a local process between agents and it involves no central control
Two way communication between all participating agents exists
Each agent makes its evaluation based on its own perception of the
environment
The final agreement is made through a mutual selection of the action plan
Each agent assumes the role of manager or contractor as necessary. The manager
breaks a larger problem into smaller sub-problems and finds contractors that can
perform these functions effectively. A contractor can itself become a manager and
decompose its sub-problem further, so as to reduce computational cost and increase
efficiency. The manager contracts with a contractor through a process of bidding:
the manager specifies the type of resource required and a description of the problem
to be solved, and agents that are free or idle and have the required resources submit
bids indicating their capabilities. The manager agent then evaluates the received
bids, chooses an appropriate contractor agent and awards the contract. If no suitable
contracting agent is available, the manager agent waits for a pre-specified period
before rebroadcasting the contract to all agents. The contracting agent may negotiate
with the manager agent, seeking access to a particular resource as a condition for
accepting the contract.
The FIPA (Foundation for Intelligent Physical Agents) model [60] is the best
example of an agent platform that utilizes the contract net protocol to achieve
coordination between agents. FIPA is a model developed to standardize agent
technology, and it has its own ACL (Agent Communication Language) that serves as
the backbone for the high-level contract net protocol.
A disadvantage of protocol-based coordination is the assumption that cooperative
agents exist. The negotiation strategy is passive and involves no punitive measures
to force an agent to adopt a specific strategy. Usually a common strategy is achieved
through iterative communication, in which the negotiation parameters are modified
progressively to reach equilibrium; this makes the contract net protocol
communication-intensive.
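The bidding cycle described above can be sketched as follows; the cost-based award rule, class names and task format are illustrative assumptions rather than the normative protocol.

```python
class Contractor:
    def __init__(self, name, resources, cost):
        self.name, self.resources, self.cost = name, set(resources), cost
        self.busy = False

    def bid(self, task):
        # Only free agents holding the required resource submit a bid.
        if not self.busy and task["resource"] in self.resources:
            return {"contractor": self, "cost": self.cost}
        return None

def award_contract(task, contractors):
    """Manager side: broadcast the task, collect bids, award the cheapest."""
    bids = [bid for c in contractors if (bid := c.bid(task)) is not None]
    if not bids:
        return None  # the manager would wait and rebroadcast later
    winner = min(bids, key=lambda b: b["cost"])["contractor"]
    winner.busy = True
    return winner.name
```

A winning contractor could in turn call `award_contract` on a decomposed sub-task, mirroring the contractor-becomes-manager recursion described above.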
2.6.2 Coordination via graphs
Coordination graphs were introduced in [61] as a framework for solving large-scale
distributed coordination problems. In coordination graphs, each problem is
subdivided into smaller problems that are easier to solve. The main assumption is
that the global payoff can be expressed as a linear combination of the local payoffs
of the sub-games. Based on this assumption, algorithms such as the variable
elimination method can compute the optimal joint action by iteratively eliminating
agents and creating new conditional functions that give the maximal value an agent
can achieve given the actions of the other agents on which it depends. The joint
action is only known after the entire computation completes; the computation scales
with the number of agents and available action choices, which is a concern in
time-critical processes. An alternative method using max-plus, which reduces the
required computation time, was used in [58] to achieve coordination in a multi-agent
system applied to urban traffic signal control.
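A two-agent instance of the elimination step can be sketched as follows: agent 2 is eliminated by tabulating its best reply to each action of agent 1, after which agent 1 maximizes the induced conditional function. The signal-phase payoffs are invented for illustration.

```python
ACTIONS = ["green", "red"]
# Local payoff f(a1, a2) for one edge of the coordination graph
# (opposite phases at neighbouring intersections pay off; invented numbers).
f12 = {("green", "red"): 4, ("red", "green"): 4,
       ("green", "green"): 0, ("red", "red"): 1}

# Eliminate agent 2: record its conditional best reply per action of agent 1,
# and the value that reply achieves.
best_reply = {a1: max(ACTIONS, key=lambda a2: f12[(a1, a2)]) for a1 in ACTIONS}
cond_value = {a1: f12[(a1, best_reply[a1])] for a1 in ACTIONS}

# Agent 1 maximizes the induced function; the joint action is read back last,
# which is why the result is only known once the whole computation finishes.
a1_star = max(ACTIONS, key=cond_value.get)
joint_action = (a1_star, best_reply[a1_star])
```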
2.6.3 Coordination through belief models
In scenarios where time is of critical importance, coordination through protocols
fails when an agent holding the specific resource needed to solve a sub-problem
rejects the bid. In such scenarios, agents with an internal belief model of the
neighbouring agents can still solve the problem. The internal belief model can either
be evolved by observing the variation in the dynamics of the environment, or
developed from heuristic knowledge and domain expertise. When the internal model
is evolved, the agent has to be intelligent enough to differentiate between changes in
its environment due to the actions of other agents and natural variations occurring in
the environment. In [28], a heuristics-based belief model was employed to create
coordination between agents and to effectively change the green time. In [62],
evolutionary methods combined with neural networks were employed to dynamically
compute the level of cooperation required between the agents, based on the internal
state model of the agents; the internal state model was updated using reinforcement
learning methods. A disadvantage of belief-model-based coordination is that an
incorrect model could cause chaos through the actions selected.
2.7
LEARNING IN MULTI-AGENT SYSTEM
Learning in an agent can be defined as building or modifying the belief structure
based on the knowledge base, the input information available and the consequences
or actions needed to achieve the local goal [63]. Based on this definition, agent
learning can be classified into three types:
Active learning
Reactive learning
Learning based on consequence
In active and reactive learning, updating the belief part of the agent is given
preference over an optimal action selection strategy, since a better belief model
increases the probability of selecting an appropriate action.
2.7.1 Active learning
Active learning can be described as the process of analysing observations to create a
belief, or internal model, of the situated agent's environment. The active learning
process can be performed using a deductive, inductive or probabilistic reasoning
approach.
In the deductive learning approach, the agent draws a deductive inference to explain
a particular instance or state-action sequence using its knowledge base. Since the
result is implied by, or deduced from, the existing knowledge base, the information
learnt by each agent is not new, but it is a useful inference. The local goal of each
agent can form part of the knowledge base. The deductive learning approach usually
disregards the uncertainty or inconsistency associated with the agent's knowledge,
which makes it unsuitable for real-time applications.
In the inductive learning approach, the agent learns from observations of state-action pairs, viewed as instantiations of some underlying general rules or theories, without the aid of a teacher or a reference model. Inductive learning is effective when
the environment can be presented in terms of some generalized statements. Well
known inductive learning approaches utilize the correlation between the observations
and the final action space to create the internal state model of the agent. The
functionality of inductive learning may be enhanced if the knowledge base is used as
a supplement to infer the state model. The inductive learning approach suffers at the
beginning of operation as statistically significant data pertaining to the agent may not
be available.
The probabilistic learning approach is based on the assumption that the agent
knowledge base or the belief model can be represented as probabilities of occurrence
of events. The agent's observation of the environment is used to predict the internal
state of the agent. One of the best-known examples of probabilistic learning is the application of Bayes' theorem, according to which the posterior probability of an event is determined by the prior probability of that event and the likelihood of its occurrence. The likelihood can be calculated from observations of samples collected from the environment, and the prior can be updated using the posterior computed in the previous time step of the learning process.
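The recursive Bayesian update described above can be sketched as follows; the hypothesis labels and probability values are illustrative assumptions, not taken from the thesis.

```python
def bayes_update(prior, likelihood):
    """One step of a recursive Bayesian update over discrete hypotheses.

    prior      -- dict: hypothesis -> prior probability
    likelihood -- dict: hypothesis -> P(observation | hypothesis)
    Returns the posterior, which becomes the prior of the next step.
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())          # P(observation)
    return {h: p / evidence for h, p in unnormalized.items()}

# Illustrative example: an agent's belief that a link is "congested"
belief = {"congested": 0.5, "free": 0.5}
# Likelihood of observing a long queue under each hypothesis (assumed values)
obs_model = {"congested": 0.8, "free": 0.2}

for _ in range(3):                                 # three identical observations
    belief = bayes_update(belief, obs_model)
```

After three consistent observations, the belief concentrates sharply on the "congested" hypothesis, illustrating how the posterior of one step becomes the prior of the next.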
In a multi-agent scenario where the action of one agent influences the state of another agent, applying the probabilistic learning approach is difficult. This stems from the need to know the joint probability over the actions and state spaces of the different agents, which becomes hard to estimate in practice and computationally infeasible as the number of agents grows. A further limitation is the limited number of sample observations available to estimate the correct trajectory.
2.7.2 Reactive learning
The process of updating a belief without actual knowledge of what needs to be learnt or observed is called reactive learning. This method is particularly useful
when the underlying model of the agent or the environment is not clearly known.
Reactive learning can be seen in agents which utilize connectionist systems such as
neural networks. Neural networks depend on the mechanism which maps the inputs to
output data samples using inter-connected computational layers. Learning is
performed by adjusting the synaptic weights between the layers. In [64], reactive multi-agent feed-forward neural networks were used and their application to the identification of non-linear dynamic systems was demonstrated. In [65], many other reactive learning methods, such as accidental learning, go-with-the-flow, channel multiplexing and shopping-around approaches, are discussed. Most of these methods are rarely employed in real application environments because they are designed exclusively for a specific application domain.
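As a minimal illustration of this kind of connectionist weight adjustment, the sketch below trains a single linear neuron with the delta rule; the target mapping, learning rate and sample points are arbitrary illustrative choices, not the networks of [64].

```python
# A single linear neuron trained by the delta rule: the agent's "belief"
# lives entirely in the synaptic weights, adjusted reactively after each
# observed input/output sample (here the target mapping is y = 2x + 1).
w, b = 0.0, 0.0
lr = 0.1
samples = [(x, 2.0 * x + 1.0) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]

for _ in range(200):              # repeated presentation of the samples
    for x, target in samples:
        y = w * x + b             # forward pass
        err = target - y          # output error
        w += lr * err * x         # synaptic weight adjustment
        b += lr * err             # bias adjustment
```

The weights converge to the underlying mapping without the agent ever holding an explicit model of what is being learnt, which is the defining feature of reactive learning.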
2.7.3 Learning based on consequences
Learning methods presented in the previous sections were concerned with
understanding the environment based on the belief model update and analysis of
patterns in sample observations. This section deals with learning methods based on evaluating the goodness of the selected action, which may be performed using reinforcement learning methods.
Reinforcement learning is a way of programming the agents using reward and
punishment scalar signals without specifying how the task is to be achieved. In
reinforcement learning, the behaviour of the agent is learnt through trial and error
interaction with the dynamic environment without an external teacher or supervisor
that knows the right solution. Conventionally, reinforcement learning methods are
used when the action space is small and discrete. Recent developments in
reinforcement learning techniques have made it possible to use the methods in
continuous and large state-action space scenarios too. Examples of applications using
reinforcement learning techniques in reactive agents are given in [66, 67].
In reinforcement learning [68], the agent attempts to maximize the discounted scalar reward received from the environment over a finite period of time. To represent this, the agent is modelled as a Markov Decision Process, defined by:
- A discrete set of states s \in S
- A discrete set of actions a \in A
- A state transition probability p(s' \mid s, a)
- A reward function R : S \times A \to \mathbb{R}
The discounted return can be written as R = \sum_{t=0}^{\infty} \gamma^t R(s_t, a_t), where \gamma \in [0, 1) is the discount factor. The objective is to maximize this return for a given policy. A policy \pi is a mapping from states to actions. The optimal value of a specific state s is the maximum expected discounted future reward received by following a stationary policy, and can be written as

V^*(s) = \max_{\pi} E\left[ \sum_{t=0}^{\infty} \gamma^t R(s_t, a_t) \mid s_0 = s, a_t = \pi(s_t) \right]        (2.1)
The expectation operator averages over the state transitions. In a similar manner, the Q-value can be written as

Q^*(s, a) = \max_{\pi} E\left[ \sum_{t=0}^{\infty} \gamma^t R(s_t, a_t) \mid s_0 = s, a_0 = a \right]        (2.2)
The optimal policy can then be determined as the arg max of the Q-value. To compute the optimal value function and the Q-value, the Bellman equations (2.3) and (2.4) are used. The solution to the Bellman equation can be obtained by recursively computing the values using dynamic programming techniques. However, the transition probabilities are difficult to obtain, so the solution is instead obtained iteratively using the temporal-difference error between values of successive iterations, as shown in (2.5) and (2.6).
V^*(s) = \max_{a} \left[ R(s, a) + \gamma \sum_{s'} p(s' \mid s, a) V^*(s') \right]        (2.3)

Q^*(s, a) = R(s, a) + \gamma \sum_{s'} p(s' \mid s, a) V^*(s')        (2.4)

V(s) \leftarrow V(s) + \alpha \left[ r + \gamma V(s') - V(s) \right]        (2.5)

Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]        (2.6)
The solution of (2.6) is referred to as the Q-learning method. The Q-value computed for each state-action pair is stored in a Q-map and used to update subsequent Q-values; based on the Q-values, appropriate actions are selected. The major disadvantage is that a trade-off between exploration and exploitation must be determined. To build an efficient Q-map, it is essential to compute the Q-values corresponding to all state-action pairs, and convergence is guaranteed only if every state-action pair is visited an infinite number of times (a theoretical requirement).
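A minimal sketch of the Q-learning update (2.6) with an epsilon-greedy exploration/exploitation strategy is given below; the toy chain environment and parameter values are illustrative assumptions, not the traffic problem of this thesis.

```python
import random

random.seed(42)

# Tabular Q-learning, i.e. update (2.6), on a toy 4-state chain MDP:
# state 3 is terminal with reward 1; action 0 moves left, action 1 right.
N_STATES, ACTIONS = 4, (0, 1)
alpha, gamma, epsilon = 0.5, 0.9, 0.3
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Deterministic transition; reward 1 on reaching the goal state."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

for _ in range(500):                                   # learning episodes
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy: the exploration/exploitation trade-off
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        td_target = r + gamma * max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += alpha * (td_target - Q[(s, a)])   # update (2.6)
        s = s2

# Greedy policy read off the learned Q-map: move right toward the goal
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)])
          for s in range(N_STATES - 1)}
```

The learned Q-values converge toward the discounted optimal values (here 0.81, 0.9 and 1.0 along the chain), and the greedy policy extracted from the Q-map selects the reward-maximizing action in every state.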
In single-agent RL, the methodologies and convergence properties are well defined and proven. In a distributed MAS, reinforcement learning faces a combinatorial explosion as the state and action spaces grow. Another major concern is that information must be passed between the agents for effective learning. In [69], a distributed value function based on RL is described: the value functions are shared among the agents, thereby increasing the global view available to each agent. For effective distributed learning through a shared value function, all agents must have a similar architecture, and the inputs and state spaces must be similar for all agents. A complete survey of reinforcement learning can be found in [70].
2.8 SUMMARY
In this chapter, a detailed survey of the existing agent architectures, taxonomy, communication requirements, coordination mechanisms, decision making and learning in multi-agent systems applied to a wide range of applications has been presented. The insights derived from this survey of existing design methodologies will be useful in conceptualizing and implementing an effective distributed multi-agent system for the complex urban traffic signal control problem.
CHAPTER 3
REVIEW OF ADVANCED SIGNAL CONTROL
TECHNIQUES
In order to construct an efficient distributed multi-agent based traffic signal control
system, it is essential to review the existing traffic control methods, their advantages
and disadvantages. This chapter presents a classification of the existing traffic signal
control methods and provides a detailed review of the working of various existing
traffic control methods. The chapter also details the working mechanism of the
benchmark traffic signal controls HMS and GLIDE used for evaluating the
performance of the proposed multi-agent based traffic signal controls.
3.1 CLASSIFICATION OF TRAFFIC SIGNAL CONTROL METHODS
Traffic signal controls can be classified into three types based on the type of
architecture used to obtain the green time required for each phase in a cycle.
- Fixed-time control
- Traffic-actuated control
- Traffic-adaptive control

3.1.1 Fixed-time control
In fixed-time control, the duration and the order of all green phases are pre-fixed.
Fixed-time control assumes that traffic patterns can be predicted accurately from historical data. Because the traffic situation changes over time, such a controller usually employs a clock to replace one fixed-time plan with another. As fixed-time controllers do not
require traffic detectors to be installed at the intersection, the construction cost is
much lower than with traffic actuated and traffic-adaptive control. The main
drawback of fixed-time control is that it is not capable of adapting itself to the real
time traffic patterns as it is based only on historical data. Historical data is often not
representative of the current situation as:
- Traffic arrives at the intersection randomly, which makes it impossible to predict the traffic demand accurately.
- Traffic demand changes and shifts over a long period of time, leading to “aging” of the optimized settings.
- Traffic demand may change due to drivers' response to the new optimized signal settings.
- Events, accidents, and other disturbances may disrupt traffic conditions in a non-predictable way.
In fixed-time control, the signal cycle is divided over the various phases according to
historical traffic volumes. As a consequence of the time needed to clear the intersection when changing phases and for traffic to start up, a fixed amount of time
during the signal cycle can be considered lost and cannot be effectively used for
traffic flow. The amount of time lost (per hour) increases when the duration of the
signal cycle is chosen shorter. Intersections with a shorter signal cycle therefore
would have a lower overall capacity. However, longer signal cycles also lead to
longer waiting times and longer queues leading to saturation of the links. This
necessitates fixing the maximum and minimum bounds for green time and cycle
length. To find optimal values for the cycle length and the green durations of the separate phases, Webster derived a formula utilizing the flow rates experienced on each lane of the link. Based on how the green time is computed, fixed-time
control can be classified into two types:
- Progression-based methods: maximize the bandwidth of the progression, e.g. PASSER (Progression Analysis and Signal System Evaluation Routine) [71], MAXBAND [72] and MULTIBAND [73]
- Disutility-based methods: minimize performance measures such as overall travel time delay and number of stops, e.g. TRANSYT-7F [74] and SYNCHRO [75]
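The Webster result mentioned above is the standard formula C0 = (1.5L + 5) / (1 - Y), where L is the total lost time per cycle and Y is the sum of the critical flow ratios y_i = q_i / s_i; the effective green (C - L) is then divided in proportion to the y_i. A sketch, with an illustrative two-phase example:

```python
def webster_cycle(lost_time, flow_ratios):
    """Webster's optimum cycle length C0 = (1.5 L + 5) / (1 - Y).

    lost_time   -- total lost time per cycle L, in seconds
    flow_ratios -- critical flow ratio y_i = q_i / s_i for each phase
    """
    Y = sum(flow_ratios)
    if Y >= 1.0:
        raise ValueError("intersection is oversaturated (Y >= 1)")
    return (1.5 * lost_time + 5.0) / (1.0 - Y)

def green_splits(cycle, lost_time, flow_ratios):
    """Divide the effective green time (C - L) in proportion to y_i / Y."""
    Y = sum(flow_ratios)
    return [(y / Y) * (cycle - lost_time) for y in flow_ratios]

# Illustrative two-phase intersection: L = 10 s, y1 = 0.3, y2 = 0.2
C0 = webster_cycle(10.0, [0.3, 0.2])         # (15 + 5) / (1 - 0.5) = 40 s
greens = green_splits(C0, 10.0, [0.3, 0.2])  # 18 s and 12 s of effective green
```

This captures the trade-off discussed above: as Y approaches 1 the required cycle length grows without bound, which is why maximum and minimum bounds on green time and cycle length must be fixed.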
3.1.2 Traffic-actuated control
For vehicle-actuated and traffic-actuated control, detectors are needed to obtain information about the actual traffic situation; the detectors most frequently used in practice are inductive loop detectors. To decide whether it is efficient to terminate the green phase, the traffic-actuated controller should be able to determine whether the last vehicle of the queue that built up at the stop line during the red phase has passed. This is done by measuring the gap between vehicles: if the gap is larger than a threshold maximum gap, the control program decides to terminate the green phase. Additionally, many traffic-actuated controllers extend the green time to ensure that the green phase is terminated safely and comfortably.
These extensions continue until the intervals between vehicles are long enough for the signals to decide that it would be more efficient to terminate the current green phase, or until a pre-specified maximum green time has been reached. There are four major regions that need to be monitored: Zone 1, Zone 2, the option zone and the comfort zone. Zones 1 and 2 are very close to the stop line of the intersection: Zone 1 is a 3 m region close to the stop line, followed by Zone 2, which extends a further 20 m beyond the first vehicle after Zone 1. The option zone is the overlapping region extending beyond Zone 2; if vehicles are present within this region, the green time of the phase can be effectively terminated. In this region the traffic flow is considerably slower and varies with traffic conditions. The comfort zone extends beyond the option zone, where the vehicle flow is steady and is not influenced much unless there is a queue build-up. The presence of vehicles in any of these regions can cause extension of the green time, but the priority differs between regions.
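The gap-out logic described above can be sketched as follows; the timing parameters (minimum green, extension, maximum gap) are illustrative values, not standard settings.

```python
def actuated_green(headways, min_green, max_green, max_gap, extension):
    """Sketch of green-phase termination by gap-out or max-out.

    headways -- observed time gaps (s) between successive arriving vehicles
    Returns (green_time, reason).
    """
    green = min_green
    for gap in headways:
        if green >= max_green:
            return max_green, "max-out"      # pre-specified maximum reached
        if gap > max_gap:
            return green, "gap-out"          # queue has cleared; terminate
        green += extension                   # extend green for this vehicle
    return min(green, max_green), "demand served"

# Illustrative call: tight 1.5 s headways, then a 4 s gap exceeding the
# 3 s maximum gap, so the controller gaps out after three extensions.
g, why = actuated_green([1.5, 1.5, 1.5, 4.0], min_green=8, max_green=40,
                        max_gap=3.0, extension=2.0)
```

Zone-based priority (Zone 1/2 versus option and comfort zones) could be layered on top by weighting the extension per region; the sketch keeps a single extension for clarity.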
Coordination between traffic-actuated controllers is achieved on the basis of the same principles as coordination between fixed-time controllers. To ensure that traffic-actuated controllers return to the coordinated phase in time, a mechanism must be in place to force non-coordinated phases to terminate. Two types of force-off modes are used: floating and fixed force-offs. The primary difference between these modes is the manner in which the excess time from one non-coordinated phase is used by another. The non-coordinated phases can gap out if they have detectors and are operated in an actuated manner. The force-off point for each non-coordinated phase is the point in the cycle where the respective phase must terminate to ensure that the controller returns to the coordinated phase at the proper time in the cycle [76].
- Floating force-off: In floating force-off mode, the duration of the non-coordinated phases is limited to the splits that were programmed in the controller. As a consequence, floating force-off does not allow time from phases with excess capacity to be used by a phase with excess demand. Phases that start earlier, because of excess capacity on phases earlier in the cycle, will be forced to terminate before their force-off point, resulting in an early return to the coordinated phases. Consider a four-phase signal as an example: if the green time required by phase 1 is much lower than its programmed split, phase 1 terminates early, its unused green time is not made available to the other phases, and each subsequent phase starts earlier than scheduled, so the next cycle also begins early.
- Fixed force-off: Fixed force-off, on the other hand, allows the transfer of excess capacity from one phase to a subsequent phase with excess demand. This means that phases with excess demand will terminate at their force-off point irrespective of when they start. The controller only allows the use of excess unused capacity and ensures that coordinated operations are not disrupted.
Some of the advantages and disadvantages of fixed force-off are:
- Fixed force-off allows better utilization of the time available from phases operating below capacity by phases whose excess demand varies in a cyclic manner. This is the case when phases earlier in the phasing sequence operate below capacity more often than phases later in the sequence.
- Fixed force-off minimizes the early return to coordinated phases, which can be helpful in a network that has closely spaced intersections. An early return to the coordinated phase at a signal can cause the platoon to start early and reach the downstream signal before the onset of the coordinated phase, resulting in poor vehicle progression.
- Minimizing the early return to the coordinated phase can also be a major disadvantage. Under congested conditions on arterial roads, an early return helps clear the queues of the coordinated phases; suppressing it can therefore cause significant disruption to coordinated operations and dispersion of the platoons. This disadvantage can be overcome by adjusting the splits and/or offsets at the intersection to minimize disruption.
Overall, fixed force-off has the potential to improve signal operations through better utilization of any excess capacity. However, it is only beneficial if the phases that are more likely to be below capacity occur earlier in the phasing sequence, allowing maximum utilization of the available green time.
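The difference between the two force-off modes can be illustrated with a small sketch; the split, force-off point and early start time are assumed values, in seconds from the start of the cycle.

```python
def phase_end(start, split, force_off_point, mode):
    """End time of a non-coordinated phase within the cycle.

    floating -- the phase may run at most its programmed split
    fixed    -- the phase runs until its fixed force-off point, so slack
                left by earlier phases can be absorbed by later ones
    """
    if mode == "floating":
        return min(start + split, force_off_point)
    return force_off_point                    # mode == "fixed"

# Phase programmed for a 20 s split with a force-off point at t = 30 s.
# An earlier phase gapped out, so this phase starts early, at t = 5 s.
floating_end = phase_end(5, 20, 30, "floating")   # 25 s: early return
fixed_end = phase_end(5, 20, 30, "fixed")         # 30 s: absorbs excess time
```

Under floating force-off the phase still ends after its 20 s split, returning early to the coordinated phase; under fixed force-off it runs until its force-off point and can use the 5 s of slack.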
3.1.3 Traffic-adaptive control
Traffic load is highly dependent on parameters such as time, day, season, weather,
and unpredictable situations such as accidents, special events, or construction
activities. These factors are taken into account by a traffic-adaptive control system, so
that bottlenecks and delays can be prevented. Adaptive traffic control systems
continuously sense and monitor traffic conditions and adjust the timing of traffic
signals accordingly. Adaptive systems, like SCOOT (Split, Cycle and Offset Optimization Technique) and SCATS (Sydney Coordinated Adaptive Traffic System), have been around since the mid-1970s and have proven their worth in various places around the world. Using real-time traffic information, an adaptive system can
continuously update signal timings to fit the current traffic demand. The aging of
traffic signal plans, with a gradual degradation of performance as traffic patterns drift
away from those in place during implementation, is well documented [77]. Many
agencies have no program for monitoring the applicability of signal timing to the
current traffic patterns, and it is not uncommon to find agencies that have not re-timed
their signals in years. The benefits of an adaptive signal control system are apparent: both traffic operations and staff can be made more efficient, since better performance can be obtained with the same level of effort [78].
Adaptive traffic control systems are often categorized according to their generation.
First-generation traffic-adaptive systems employ a library of pre-stored signal control
plans, which are developed off-line on the basis of historical traffic data. Plans are
selected on the basis of the time of day and the day of the week, directly by the
operator, or by matching from an existing library the plan best suited to recently
measured traffic conditions. First generation traffic-adaptive systems are often
referred to as traffic-responsive signal control. A limitation of traffic-responsive
signal control is that by the time the system responds, the registered traffic conditions
that triggered the response may have become obsolete or could have varied
significantly. Second-generation traffic-adaptive systems use an on-line strategy that implements signal timing plans based on real-time surveillance data and predicted values. The optimization process can be repeated every five to fifteen minutes; however, to avoid transition disturbances, new timing plans cannot be implemented more than once every ten minutes.
Third generation traffic-adaptive systems are similar to the second-generation
systems, but differ with respect to the frequency with which the signal timing plans
are revised. The third generation of control allows the parameters of the signal plans
to change continuously in response to real-time measurement of traffic variables,
which allows for “acyclic” operation. Generally speaking, given time-varying
unpredictable demand patterns, a traffic-adaptive system should be able to outperform
a fixed-time or actuated system. The margin of improvement demonstrated by a traffic-adaptive system over a fixed-time or traffic-actuated system cannot easily be compared to that determined for another adaptive system, as it is strongly related to the network geometry and traffic demand chosen in the benchmark study. For a fair comparison, the systems should be benchmarked in the same test environment, with an equal amount of optimization effort invested by knowledgeable people. The systems described in this section are from the proven second generation (SCATS/GLIDE, SCOOT, MOTION) and from the younger third generation (OPAC, PRODYN, RHODES, UTOPIA/SPOT, TUC, HMS).
3.1.3a SCATS/GLIDE
SCATS (Sydney Coordinated Adaptive Traffic System) [79] was developed in the early 1970s by the Roads and Traffic Authority of New South Wales, Australia. The system utilizes a distributed, three-level, hierarchical architecture employing a central computer, regional computers, and local intelligent controllers to perform large-scale network control. The regional computers can execute adaptive control strategies without any aid from the central computer, which only monitors system performance and equipment status. This control structure enables SCATS to expand easily, making it suitable for controlling a traffic area of any size.
SCATS employs a strategic optimization algorithm and a tactical control technique to
perform system-wide optimization. The optimization philosophy contains four major
modules:
- Cycle length optimizer
- Split optimizer
- Internal offset optimizer
- Linking offset optimizer
SCATS selects combinations of cycle, splits and offset from predetermined sets of
parameters with few on-line calculations. Maximum freedom consistent with good
coordination is given to local controllers to act in the traffic-actuated mode. The
system is designed to automatically calibrate itself on the basis of data received,
minimizing the need for manual calibration and adjustment. For control purposes, the
total system is divided into a large number of comparatively small subsystems varying
from one to ten intersections. As far as possible, the subsystems are chosen so that
they can be run independently for many traffic conditions. For each subsystem,
minimum, maximum, and geometrically optimum cycle lengths are specified.
To coordinate larger groups of signals, subsystems can link together to form larger systems operating on a common cycle length. Linking plans manage the linking between subsystems; when a number of subsystems are linked, the common cycle time becomes that of the subsystem with the longest cycle time. The combination of subsystem plans, link plans between subsystems, variable cycle length, and variation of offsets provides a virtually unlimited number of operating plans. Four background plans are also stored in the database for each subsystem. The cycle length and the appropriate plan are selected independently of each other to meet the traffic demand. For this purpose, a number of detectors in the subsystem area are defined as strategic detectors; these are stop-line detectors at key intersections. Various system factors are calculated from the strategic detector data and used to decide whether the current cycle and plan should remain the same or be changed. Strategic options of minimum delay, minimum stops, or maximum throughput can be selected for operation; these options can be permanent or can change dynamically at threshold levels of traffic activity. Four modes of operation are included in SCATS:
- Masterlink Operation: This is the normal mode of operation, which provides integrated traffic-responsive operation. There are two levels of control in this mode: strategic and tactical. Strategic control determines the best signal timings for the areas and subareas, based on average prevailing traffic conditions. Tactical control is concerned with the control of the individual intersections within the constraints imposed by the strategic control; this lower-level control deals with the termination of unnecessary green phases when the demand is below average. The basic traffic measurement used by SCATS for strategic control is a measure analogous to the degree of saturation on each approach. This measure is used to determine cycle length, splits, and the direction and magnitude of offset.
- Flexilink Operation: In the event of failure of a regional computer or loss of communications, the local controllers can revert to a form of time-based coordination. In this mode, adjacent signals are synchronized by reference to the power mains frequency or an accurate clock, and signal timing plans are selected by time of day. The local controller operates under a vehicle-actuated or a fixed-time control system.
- Isolated Operation: In this mode, the controller operates under independent vehicle actuation or a fixed-time control system.
- Flash Operation: This is a manual mode in which normal automatic operation is overridden. It incorporates a flashing yellow display for the major approaches and a flashing red display for the minor approaches.
SCATS has also been widely used in several cities in Australia, New Zealand, USA,
China, Singapore, Philippines, and Ireland. GLIDE (Green Link Determining) is a
version of SCATS adapted according to the traffic network structure and requirements
in Singapore.
3.1.3b SCOOT
SCOOT (Split, Cycle, and Offset Optimization Technique) [80, 81] was initiated by the British Transport and Road Research Laboratory (TRRL) in the 1970s, with its first commercial system installed in 1980. SCOOT is a centralized system based on a traffic model with an optimization algorithm adapted for on-line application. Optimization takes place by incrementally updating a fixed-time plan. The benefit of this approach is that changes are gradual: the transition is less disruptive and less prone to overreacting than the transition between distinct plans typical of a time-of-day scheme. SCOOT performs optimization at three levels: split, cycle and offset.
SCOOT measures vehicles with detectors located upstream of the stop line and predicts the number of vehicles arriving at the intersection from the flow information they collect. The difference between the predicted arrival count and the actual departure count gives the number of vehicles in the queue. The predicted flow profile and traffic count are estimated for each cycle from a combination of the vehicles approaching, the time to clear the queue, and the impact of offset and split adjustments. The split optimizer in SCOOT evaluates the projected arrival and departure profiles every second. A few seconds before a change of signals, the system adds up the delay from all movements that will end or begin at that change. This delay is compared with the delay calculated for the change of signals occurring either a few seconds earlier or later, and the balance of movements that provides the least delay is implemented.
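The queue bookkeeping from arrival and departure profiles described above can be sketched as follows; the per-second counts are illustrative values, not SCOOT's internal data.

```python
def queue_profile(arrivals, departures):
    """Per-second queue estimate: cumulative arrivals minus departures,
    never allowed below zero (detector-based bookkeeping as in the text)."""
    queue, profile = 0, []
    for a, d in zip(arrivals, departures):
        queue = max(0, queue + a - d)
        profile.append(queue)
    return profile

# Illustrative counts per second: 3 s of red (no departures), then green
# discharging at 2 veh/s until the queue clears.
arrivals   = [1, 2, 1, 1, 0, 0]
departures = [0, 0, 0, 2, 2, 2]
q = queue_profile(arrivals, departures)   # queue builds to 4, then clears
```

Delay for a candidate signal change can then be approximated by summing such a queue profile over time, which is the quantity the split optimizer compares for earlier versus later changes.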
The offset optimizer operates on each node pair and searches for the best offset timing to improve traffic progression on the basis of the cyclic profile. Based on the profile measured in the previous cycle, the offset optimizer minimizes the delay for all movements of the intersection by incrementing or decrementing the current offset by a few seconds. With the offset optimizer, green waves can be imposed along a coordinated signal-controlled corridor. After this offset adjustment, the split optimizer may further adjust the signal timings based on the traffic actually approaching the stop line at that time.
The cycle optimizer looks at the saturation levels of all intersection movements once
each cycle-control period. At critical intersections with low reserve capacity, the cycle
optimizer will extend the length of the cycle. It does so in different increments of time
(e.g., 4, 8, or 16 seconds) depending on the current cycle length. If an intersection is
operating below capacity, the cycle optimizer will reduce the length of the cycle.
SCOOT has been widely used in several cities in UK, USA, Canada, China, South
Africa, Cyprus, Pakistan, United Arab Emirates, Chile, and Spain.
3.1.3c MOTION
MOTION (Method for Optimization of Traffic signals In Online-controlled Network)
[82, 83] is a traffic signal control strategy developed by Siemens, Germany. The
system operates on three functional levels: on the strategic level, every 5, 10 or 15
minutes (cycle time, average green time distribution, basic stage sequences and
network coordination); on the tactical level, every 60 to 90 seconds (cycle, current
stage sequence); and on the operational level, every second (green time modification).
Starting with the dominant traffic stream through the network, a grid of green waves
is constructed, taking into account modelled (or if available, measured) platoons in the
links. For each intersection, the optimum sequence of stages is identified, and the
basic split of green times is fixed.
Depending on the remaining spare time per intersection, and on the constraints of the
optimized offsets, a certain amount of bandwidth is available for the subsequent local
optimization. Optimization normally aims at minimizing delays and stops in the
network. In the final step the decision is made to change the signal programs at the
intersections. To avoid frequent minor changes, changes are only implemented if
calculation determines a significant improvement in the overall optimization
objective. Depending on the type of local controller and on the local control method
used, the signal programs are then converted and implemented. To avoid severe
disruptions in traffic flow due to the plan switch, a smooth (gliding) transition from
the running to the new plan is performed. Until the next optimization run of the
network model, the local controllers operate on their own and modify their plan
according to the local situation, but always staying within the given bandwidth.
3.1.3d TUC
TUC (Traffic-responsive Urban Control) [84] employs a store-and-forward based
approach to road traffic control, which introduces a model simplification that enables
the mathematical description of the traffic flow process without the use of discrete
variables. This opens the way to the application of a number of highly efficient
optimization and control methods (such as linear programming, quadratic
programming, nonlinear programming, and multi-variable regulators), which, in turn,
allow for coordinated control of large-scale networks in real-time, even under
saturated traffic conditions.
The critical simplification, introduced when modelling the outflow of a stream, is the assumption of a continuous (uninterrupted) outflow from each network link as long as there is sufficient demand. The consequences of this simplification are:
- The time step t of the discrete-time representation cannot be shorter than the cycle time C; hence real-time decisions cannot be taken more frequently than once every cycle.
- The oscillations of vehicle queues in the links due to green/red commutations are not described by the model.
- The effect of offsets for consecutive intersections cannot be described by the model.
Despite these consequences, the appropriate use of store-and-forward models may lead to efficient coordinated control strategies for large-scale networks. The three main modules of TUC are the split, cycle, and offset control modules, which allow real-time control of green times, cycle times and offsets. The basic methodology employed for split control in TUC is the formulation of the urban traffic control problem as a Linear-Quadratic (LQ) optimal control problem based on a store-and-forward type of mathematical model.
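The store-and-forward link model underlying this formulation can be sketched as a single queue update per control interval; the numerical values below are illustrative assumptions, not TUC parameters.

```python
def store_and_forward_step(x, inflow, sat_flow, green, cycle, T):
    """One control-interval update of the store-and-forward link model:
        x(k+1) = x(k) + T * (q_in - q_out)
    The outflow is the saturation flow scaled by the green ratio (the
    continuous-outflow simplification), capped by what is actually
    queued plus what arrives during the interval."""
    outflow = min(sat_flow * green / cycle, x / T + inflow)
    return x + T * (inflow - outflow)

# Illustrative link: 20 veh queued, inflow 0.4 veh/s, saturation flow
# 0.5 veh/s, 40 s of green in an 80 s cycle, control interval T = 80 s
x_next = store_and_forward_step(20.0, 0.4, 0.5, 40.0, 80.0, 80.0)
```

Because the queue update is linear in the green times, stacking such equations over all links yields the linear state-space model on which the LQ split controller operates; note that the time step T cannot be shorter than the cycle, as stated above.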
The control objective is to minimize the risk of oversaturation and queue spill-back,
and this is achieved through the appropriate manipulation of the green splits at
signalized junctions for given cycle times and offsets. Longer cycle times typically
increase the capacity of the junction as the proportion of the lost time caused by
switching signals becomes accordingly smaller. Longer cycle times may however
increase vehicle delays at undersaturated junctions with longer waiting times during
the red phase. The objective of cycle control is to increase the capacities of the
junctions as much as necessary to limit the maximum observed saturation level in the
network.
Within TUC this objective is achieved through the application of a simple feedback algorithm that uses the maximum observed saturation levels of a pre-specified
time. Offset control is achieved through the application of a decentralized feedback
control law that modifies the offsets of the main stages of successive junctions along
arterials, so as to create “green waves” when possible, taking into account the possible
existence of vehicle queues. To implement a new offset in TUC, a transient cycle time
is temporarily implemented at all but the first junction along an arterial road. The
transient cycle time is implemented one single time, after which all the junctions
along the arterial road are coordinated according to the new offset.
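The cycle-control feedback described above can be illustrated with a small sketch: average the saturation levels of the most heavily loaded fraction of links and adjust the common cycle time in proportion to the error. This is only a hedged illustration — the function name, gain, target saturation and bounds are invented here and are not TUC's published parameters.

```python
def tuc_cycle_update(current_cycle, link_saturations, top_fraction=0.2,
                     target=0.8, gain=40.0, c_min=40.0, c_max=120.0):
    """Illustrative TUC-style cycle feedback (parameter values invented).

    Average the saturation levels of the most-loaded fraction of links
    and nudge the network cycle time up or down proportionally.
    """
    ranked = sorted(link_saturations, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    sigma = sum(ranked[:k]) / k                     # max observed loading
    new_cycle = current_cycle + gain * (sigma - target)
    return min(c_max, max(c_min, new_cycle))        # keep within bounds
```

A positive error (observed loading above target) lengthens the cycle to raise junction capacity, mirroring the reasoning in the text; an undersaturated network shortens it.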
3.1.3e UTOPIA/SPOT
UTOPIA/SPOT (Urban Traffic Optimization by Integrated Automation/Signal
Progression Optimization Technology) [85] is a traffic signal control strategy
developed by Mizar Automazione in Turin, Italy. UTOPIA/SPOT calculates optimal
control strategies for subareas of the network, with each subarea having the same
cycle length. While operating, the system maintains a historical database of measured
flows, turning percentages, saturation flows, and cycles in use.
The system utilizes a distributed, two-level, hierarchical system employing a central,
area-level computer, and intersection, local-level computers to perform large-scale
network control. SPOT is a fully distributed, traffic-adaptive signal control system,
which operates by performing a minimization of local factors such as delays, stops,
excess capacities of links, stops by public or special vehicles, and pedestrian waiting
times. With each repetition, all SPOT units exchange information on the traffic state
and preferential policies with their neighbouring SPOT units. This permits the
application of look-ahead (each SPOT unit receives realistic arrival predictions from
upstream intersections) and strong interaction (each controller considers, in the local
optimization, the adverse effects that it could have on downstream intersections). Data
is exchanged with neighbouring intersections every few seconds.
As each SPOT-unit communicates with surrounding units, the system can be
programmed to prioritize public transport and emergency vehicles by giving early
warning of these vehicles or by allowing them to be quickly cleared through the
intersection. SPOT can also prioritize traffic on the basis of adherence to timetables,
number of passengers, etc. SPOT allows a staged system implementation over time
starting with a few intersections. It can be implemented without a central computer for
small systems of typically six intersections or less. However, for larger intersection
networks, the UTOPIA central PC-based control system should be added.
At the area level, the UTOPIA-module provides a mechanism to handle critical
situations in the form of two actions that a signal controller may request of adjacent
signal controllers. Thus, a controller may cope with congestion by requesting that a
downstream signal increase throughput or that an upstream controller decrease
demand. These requests are realized by respectively relaxing or tightening green time
constraints. At the area level, the UTOPIA model can:
- Analyze area-wide traffic data and make predictions for main street flows over time
- Apply its internal macroscopic model to the entire area network and traffic counts
- Optimize the total travel time subject to constraints on average speed and saturation flows
SPOT has been used in several cities in Italy, The Netherlands, USA, Sweden,
Norway, Finland, Denmark, and the UK.
3.1.3f OPAC
The OPAC (Optimized Policies for Adaptive Control) algorithm [86-88] has gone
through several development cycles ranging from OPAC-I through OPAC-VFC.
OPAC maintains the specified phase order. For uncongested networks, OPAC uses a
local level of control at the intersection to determine the phase on-line, and a network
level of control for synchronization, which is provided either by fixed-time plans
(obtained off-line), or a “virtual fixed cycle”. A virtual fixed cycle is a cycle that
although fixed between intersections (to enable synchronization), is determined online (hence virtual). Predictions are based on detectors located approximately 10-15
seconds upstream. After the initial 10-15 seconds, a model predicts traffic patterns
(typically 60 seconds).
OPAC breaks the signal optimization problem into sub problems using dynamic
programming, an approach that leads to a more efficient computation. At the same
time it determines a virtual cycle. These are implemented for a time-step (roll period)
of about 2-5 seconds. The length of the virtual cycle is varied according to the needs
of either the critical intersection or the majority of intersections. The virtual cycle is
allowed to change by typically one second per cycle. Within this limitation, OPAC
provides local coordination by considering flows into and out of an intersection in
selecting its offset and phase lengths. The congestion control process in OPAC
generally attempts to maximize throughput, by selecting the phase that will allow the
maximum number of vehicles to pass the intersection. OPAC does this by considering
saturation flows and space available to store vehicles on each link. The first step of
congestion control involves determining the next phase given that there is not a
critical link that is on the verge of or currently experiencing spill-back. On the basis of
these calculations, the algorithm determines whether it is necessary to revisit the
timings at neighbouring intersections in light of throughput constraints that their
physical queues impose on each other's effective service rates.
OPAC-I assumes an infinite horizon and uses dynamic programming to optimize the
performance index. OPAC-I cannot be implemented on-line in real-time because of
the extensive time required to compute the optimal settings. OPAC-II used an optimal
sequential constraint search (OSCO) to calculate the total delay for all possible phase
switching options. The optimal solution was determined as the phase switching that
produces the lowest total delay values, and OPAC-II was found to derive solutions
with performance indexes within 10% of those generated with OPAC-I. Although
OPAC-II was faster than OPAC-I, it still suffered from the need for vehicle arrival
information for the entire planning stage, which was 50-100 seconds in length.
OPAC-III was the first version of OPAC that featured the rolling horizon approach
and was developed at first for a simple two-phase intersection, but later extended to
an eight-phase intersection, which allowed phase skipping. OPAC-VFC added the algorithm used to coordinate adjacent signals.
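The rolling-horizon idea can be made concrete with a toy search in the spirit of OPAC-II/III (not the actual OSCO implementation): enumerate the time step at which the current phase switches, score each candidate over a short horizon by accumulated queue (a delay proxy), and implement only the head of the best plan before the horizon rolls forward. All names and parameter values below are illustrative.

```python
def opac_switch_decision(arrivals, queue, sat_flow=1.0, horizon=None,
                         min_green=2):
    """Toy rolling-horizon phase-switch search (illustrative only).

    `arrivals[t]` is the predicted arrival count in step t for the
    approach currently served.  Each candidate switch time is scored by
    the total queued-vehicle-steps over the horizon; only the first
    decision would be implemented before the horizon rolls forward.
    """
    horizon = len(arrivals) if horizon is None else horizon
    best_switch, best_delay = None, float("inf")
    for switch in range(min_green, horizon + 1):
        q, delay = queue, 0.0
        for t in range(horizon):
            q += arrivals[t]
            if t < switch:                  # green: serve at saturation flow
                q = max(0.0, q - sat_flow)
            delay += q                      # vehicles still waiting this step
        if delay < best_delay:
            best_switch, best_delay = switch, delay
    return best_switch, best_delay
```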
3.1.3g PRODYN
PRODYN (Programmation Dynamique) [89, 90] is a real-time traffic control
algorithm, which was developed by the Centre d'Etudes et de Recherches de Toulouse (CERT), France. PRODYN evolved through two stages of development: two-level hierarchical control (PRODYN-H) and then decentralized control (PRODYN-D). The former offers the best results; however, its applicability is restricted due to the
complex computations involved and the network size (limited to about 10
intersections). The latter, on the other hand, alleviates those limitations. Two
approaches have been studied for PRODYN-D: no exchange (PRODYN-D1) versus
exchange (PRODYN-D2) of information between the intersections. At the
intersection level, the optimization model's aim is to minimize delay by using
improved forward dynamic programming with constraints on maximum and
minimum greens.
At the network level, the network coordination optimization is performed by a
decentralized control structure. The procedure includes:
- Simulating a specific intersection's output for each time step as soon as the intersection controller finishes its optimization over the time horizon
- Sending the simulation output to each downstream intersection controller
- Using the output messages from upstream controllers at the next time step to forecast arrivals
3.1.3h RHODES
RHODES (Real-Time, Hierarchical, Optimized, Distributed, and Effective System)
[91] is a hierarchical control system that uses predictive optimization, allowing
intersection and network levels of control. RHODES includes a main controller, a
platoon simulator (APRES-NET [92]), a section optimizer (REALBAND [93]), an
individual vehicle simulator (PREDICT [92]), and a local optimizer (COP [94]).
RHODES requires upstream detectors for each approach to the intersections in the
network. RHODES also can use stop-line detectors to calibrate saturation flow rates
and to improve traffic queue estimates. RHODES is entirely based on dynamic
programming, and it formulates a strategy that makes phase switching decisions based
on vehicle arrival data. The design of RHODES is based on dividing the traffic
control problem into sub problems by use of a network hierarchy. The sub problems
include the network-loading problem, the network flow control problem, and the
intersection control problem.
At the top of the hierarchy is the network-loading problem. At this level, link loads
and the prediction of the trends in the change of loads from real-time data are
estimated. RHODES uses this information pro-actively to predict future platoon sizes
near the boundaries of the system. The middle level consists of the network flow
problem and involves the selection of signal timing to optimize the overall flow of
vehicles in the network. The decisions are made in this level every 200-300 seconds.
A platoon prediction logic model called REALBAND is used at this level. Network
optimization is also established at this level and its results are used as constraints for
the decision made in the next level. The lowest level of the control strategy is the one
at the intersection and it is responsible for making the final second-by-second
decisions regarding traffic signal operation. This level uses two sublevels of logic.
The first is the Link Flow Prediction Logic which uses data from detectors on the
approach of each upstream intersection, together with information on the traffic state
and planned phase timings for the upstream intersection, to estimate vehicle arrivals at
the intersection being optimized. The other level is the Controlled Optimization of
Phases (COP), which uses the information from the network flow problem, in addition
to the results from the link prediction logic, to determine whether the current phase
should be extended or terminated.
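A drastically simplified version of COP's extend/terminate choice can be sketched as a one-step comparison: weigh the vehicles that an extension would serve (plus a switching penalty for lost start-up time) against the cost of keeping the cross-street waiting. The real COP solves this by dynamic programming over the horizon, so the function below is only an illustrative stand-in with invented names and penalty values.

```python
def cop_extend_or_terminate(pred_green, pred_cross, queue_green, queue_cross,
                            sat_flow=2, lost_time_penalty=4.0):
    """Illustrative one-step extend/terminate test (not the actual COP).

    `pred_green` / `pred_cross` are predicted arrivals for the next step
    on the served approach and the cross-street respectively.
    """
    served_if_extend = min(sat_flow, queue_green + pred_green)
    cross_cost = queue_cross + pred_cross           # waiting if we extend
    gain = served_if_extend + lost_time_penalty     # switching loses start-up time
    return "extend" if gain >= cross_cost else "terminate"
```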
3.1.3i Hierarchical multi-agent system (HMS)
The hierarchical multi-agent signal control model is designed in a hierarchical manner to provide different levels of control for the traffic network. The architecture consists of three layers, with each layer having a different internal structure and composition. The lowest layer consists of intersection controller agents (ICA) that control individual, pre-assigned intersections in the traffic network. The middle layer consists of zone controller agents (ZCA) that control several pre-assigned ICAs. The highest layer consists of one regional controller agent (RCA) that controls all the ZCAs.
Figure 3.1. Architecture of the hierarchical multi-agent system
The three-layered multi-agent architecture [45] is shown in Figure 3.1. The problem
of real-time network-wide signal control is divided into several sub-problems of
different scales and magnitude. Individual agents from each layer of the multi-agent
architecture are tasked to manage the respective sub-problems according to their
position in the hierarchy. Each agent is a concurrent logical process capable of
querying, directly interacting with the environment (e.g., sensors and agents in the
lower hierarchy within its control) and making decisions autonomously. The agents in each layer decide the appropriate local policies, which indicate the green signal timings necessary for each phase, and the levels of cooperation (indicated by a cooperative factor) that they deem appropriate based on the conditions of the intersection, or set of intersections, under their jurisdiction. Besides having higher-level traffic network parameters as inputs to their decision-making process, the higher-level agents also obtain the cooperative factors recommended by their lower-level agents as inputs (Figure 3.1 shows that the intersection cooperative factors recommended by the lower-level ICAs are part of the inputs of a ZCA).
Based on these inputs, the decision-making process of the higher level agents may
present a set of higher-level policies that are different from those policies
recommended by their lower level agents or they may choose to follow the lower
level policies. The policy repository is a dynamic database for storing all the policies
recommended by the controller agents of all levels at the end of each evaluation
period. The end of an evaluation period is indicated when all the intersections have
finished their current signal phases. After each period, the previously recommended
policies are updated with a new set of policies. The policy repository then performs
arbitration and conflict resolution for the entire set of recommended policies.
The arbitration process gives priority to higher-level policies. However, since one of
the outputs, namely the cooperative factor, of the lower-level agents is a part of the
inputs to higher-level agents (as mentioned earlier), the lower-level decisions affect
directly the outcomes of the higher-level agent's decision-making process. As such,
lower-level policies are not completely neglected by the arbitration process. The
final policies decided for each of the agents at the different levels of the hierarchy are stored in the policy interpreter for future use. The function of the policy interpreter is to translate the chosen set of policies into actions, which may result in the adjustment of various signal-timing parameters, such as phase length, cycle time and the direction of offset coordination, for the affected intersections.
Figure 3.2. Internal neuro-fuzzy architecture of the decision module in zonal control
agent
Each Intersection Control Agent (ICA) takes in the lane-specific occupancy, flow and
rate of change of flow of the different intersection approaches as input. The
occupancy, flow and rate of change of flow are computed from the data collected by
loop detectors placed at the stop line of the intersection in the lanes with right of way
during the specific phase in progress. Based on the collected data, the green time for a specific phase and the intersection cooperation factor are computed. In order to quantify the traffic conditions of the intersections in a zone, the neuro-fuzzy decision module of the ZCA, shown in Figure 3.2, takes in each intersection's representative occupancy, flow and rate of change of flow as its inputs.
The fuzzy sets of occupancy, flow and rate of change of traffic volume have three
linguistic labels, namely high, medium and low to describe the respective degrees of
membership (Gaussian membership function). Besides these inputs, the ZCA also
takes in the intersection cooperative factors recommended by the respective ICAs
under its control (to reflect the level of cooperation each ICA sees fit for its own
intersection, all of which are within the zone controlled by the ZCA). The antecedents
of the fuzzy rules are defined by properly linking the nodes in the second layer to
those in the third layer. The third layer fires each rule based on the T-norm fuzzy
operation, implemented using the minimum operator. Nodes in the third layer define
the degree of current traffic loading of the zone (i.e., high, medium and low loads)
and the level of cooperation needed for the intersections within the zone (i.e., high,
medium and low degrees of cooperation). The nodes in the fourth layer represent the
various consequents that correspond to the fuzzy rules in the decision module.
For the signal policy inference engine, the consequents consist of the various signal
improvement/control policies. For the cooperation factor inference engine, the
consequents consist of the various possible levels of cooperation. Since some of the
fuzzy rules share the same consequent, the S-norm fuzzy operation is used to integrate
and reduce the number of rules. For this research, the S-norm fuzzy operation is
implemented using the MAX operator.
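The inference chain described above — Gaussian fuzzification, the minimum operator as T-norm for rule firing, and the maximum operator as S-norm over rules sharing a consequent — can be sketched as follows. The membership centres and widths, the input scaling and the rule base are all invented for illustration.

```python
import math

def gauss(x, mean, sigma):
    """Gaussian membership grade."""
    return math.exp(-((x - mean) ** 2) / (2 * sigma ** 2))

def zone_load_inference(occupancy, flow, rate):
    """Sketch of ZCA-style inference: Gaussian fuzzification, min as
    T-norm for rule firing, max as S-norm to merge rules sharing a
    consequent.  Inputs are assumed scaled to [0, 1]; all membership
    parameters and rules are invented.
    """
    labels = {"low": 0.0, "medium": 0.5, "high": 1.0}   # label centres
    mu = {name: {lab: gauss(val, c, 0.2) for lab, c in labels.items()}
          for name, val in [("occ", occupancy), ("flow", flow), ("rate", rate)]}
    # Several antecedents may share one consequent (load level of the zone)
    rules = {
        "high_load":   [("high", "high", "high"), ("high", "high", "medium")],
        "medium_load": [("medium", "medium", "medium")],
        "low_load":    [("low", "low", "low")],
    }
    strength = {}
    for consequent, antecedents in rules.items():
        fired = [min(mu["occ"][a], mu["flow"][b], mu["rate"][c])
                 for a, b, c in antecedents]             # T-norm: min
        strength[consequent] = max(fired)                # S-norm: max
    return strength
```

A defuzzification step (the fifth layer in the text) would then turn these rule strengths into a crisp policy and cooperative factor.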
Finally, the fifth layer performs the defuzzification process in order to obtain crisp values corresponding to the chosen signal policy and cooperative factor (i.e., outputs of
the neuro-fuzzy decision module for each agent). The architecture of the decision
module for the ICA and RCA is largely similar to the one described for the ZCA. The
main difference lies in the inputs and the hierarchical nature of the overall multi-agent
architecture. The RCA can overrule the decisions of all agents at lower levels of the hierarchy.
The flow of control is from top to bottom, and no sideways communication between agents of the same hierarchy level exists. This effectively reduces the amount of information to be communicated between agents, which in turn reduces the effects of information corruption, noise addition, or the loss of data arising from communication link failure. However, the data mining requirements are substantially inflated: it is not an easy task to identify the representative input data required by agents at higher levels of the hierarchy, and the complexity further increases with growing network size.
3.2 SUMMARY
This chapter presented a comprehensive review of the existing traffic signal control techniques and highlighted the advantages and disadvantages of the various traffic-adaptive control strategies. The review clearly indicates the lack of a distributed intelligent traffic signal control capable of optimizing the green time while minimizing the total travel-time delay experienced by the vehicles inside the road network. The insights derived from previous research will be useful in designing a multi-agent based distributed traffic signal control with effective communication and learning capabilities.
CHAPTER 4
DESIGN OF PROPOSED MULTI-AGENT
ARCHITECTURE
In this chapter, the proposed multi-agent architecture is presented in detail. Based on a detailed study of the advantages and disadvantages of the various agent architectures presented in Chapter 2, a connected distributed architecture with local communication through message passing between agents is proposed. For the proper functioning of any multi-agent system, it must be designed in a modular fashion with each module performing a specific task. The basic functional aspects that need to be present in a multi-agent architecture are identified, and the required interactions between the different modules within the agent are explained in detail.
4.1 PROPOSED AGENT ARCHITECTURE
The proposed multi-agent traffic signal control system is a distributed system with
autonomous agents capable of interacting with each other and deciding the optimal
action without an external supervisory system. The overall structure of the proposed
multi-agent architecture is shown in Figure 4.1. It consists of nine distributed agents
communicating with their immediate neighbours to perform the desired action.
The proposed agent system does not have a specific overall agent management structure except for the directory service, as all the agents have a homogeneous structure and equal decision-making capabilities. The directory service stores each agent's attributes, behaviours, and information pertaining to the neighbouring intersections, such as the number of links and the number of lanes. The structure of each individual agent is shown in Figure 4.2. In order to control the traffic flow in an urban road network, each intersection is controlled by an autonomous agent.
Each intersection agent is composed of six modules or functional blocks operating
concurrently. They are as follows:
- Data collection module
- Communication module
- Decision module
- Knowledge base and data repository module
- Action implementation module
- Backup module
Figure 4.1. Overall structure of the proposed multi-agent architecture
Figure 4.2. Internal structure of the proposed multi-agent system
Each module performs a specific task and stores the output in a buffered memory
thereby creating a concurrent or parallel environment. The architecture and the
decision system of the individual modules can vary in accordance with the requirements of the application. The functionality of each module is described in detail in the following sections.
4.2 DATA COLLECTION MODULE
Each agent receives the data required for its functioning from the induction loop detectors placed near the stop line of the intersection. A snapshot of the intersection
showing the induction loop detector is shown in Figure 4.3.
Figure 4.3. Induction loop detectors at an intersection
Figure 4.4. Working of induction loop detectors
The induction loop detector placed in the road detects the presence of a vehicle using the change in the magnetic field caused by the metallic parts of the vehicle. Using this information, the roadside electronic devices can compute the number of vehicles that have crossed the intersection, the amount of time a vehicle occupies the link, etc. The data collection module interacts with these electronic devices to gather the traffic information. The traffic data used by the designed agent system are flow, queue, rate of change of flow, and the ratio of the green time at neighbouring intersections to the maximum permissible green time. Not all of the input data is used in every designed agent system; the choice of inputs depends on the decision system employed.
The flow rate of vehicles in a link is calculated based on the difference between the
upstream arrival flow rate and downstream departure flow rate for each lane in the
link. Upstream refers to the intersection that releases the vehicles and downstream is
the intersection that receives the vehicles. The data is sampled every five to ten seconds. Increasing the sampling frequency would enable the collection of a larger amount of data that can be used to construct a realistic traffic arrival distribution and increase the possibility of noise compensation by averaging. Mathematically, the flow value
calculated can be expressed as

f = \frac{T_s}{g_p} \sum_{i=1}^{n} \sum_{j=1}^{l_i} \left( f_{ij}^{up} - f_{ij}^{down} \right)    (4.1)

where g_p is the current green time for the phase, n is the number of links having right of way during the phase, l_i is the number of lanes in link i, T_s is the sampling period, and f_{ij}^{up} and f_{ij}^{down} are the upstream and downstream vehicle flow rates for lane j of link i respectively.
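The flow input described above (upstream minus downstream flow rate per lane, accumulated over the links with right of way and scaled by the sampling period and the phase green time) can be sketched as follows; the function and variable names are illustrative.

```python
def phase_flow(green_time, sample_period, up, down):
    """Flow input for one phase (illustrative names).

    `up[i][j]` / `down[i][j]` are upstream and downstream flow rates for
    lane j of link i, over the links having right of way during the phase.
    """
    diff = sum(u - d
               for link_up, link_down in zip(up, down)
               for u, d in zip(link_up, link_down))
    return (sample_period / green_time) * diff
```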
The queue information is calculated as the number of vehicles waiting in lanes which had the right of way during the phase that just ended. The queue value gives only the remaining vehicle count and does not take into consideration the vehicles added after the end of a phase. As it is essential to calculate the green time based on the most congested lane in a link, the maximum value of the queue formed is used as the input data. Equation (4.2) shows the mathematical expression for obtaining the maximum queue over the links:
Q_{max} = \max_{i,j} \left( q_{ij} + c_{ij}^{up} - c_{ij}^{down} \right)    (4.2)

where Q_{max} is the maximum queue value at the intersection, and c_{ij}^{up}, c_{ij}^{down} and q_{ij} represent the upstream traffic count, the downstream traffic count and the previously stored queue for lane j of link i respectively.
It is assumed that the arrival rate of vehicles can be approximated by a uniform distribution, as the average value is calculated from data sampled at high frequency, and can adequately compensate for the queue increase at the end of each phase.
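The maximum-queue input can be sketched in the same way: for every lane of every link that had right of way, the residual queue is the previously stored queue plus the upstream count minus the downstream count, and the largest lane value is used. Names below are illustrative.

```python
def max_queue(up_counts, down_counts, prev_queues):
    """Maximum residual queue over lanes (illustrative names).

    Each argument is a list of per-lane lists for the links that had
    right of way during the phase that just ended.
    """
    return max(q + u - d
               for link_u, link_d, link_q in zip(up_counts, down_counts, prev_queues)
               for u, d, q in zip(link_u, link_d, link_q))
```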
The rate of change of flow is another input that is used in a few of the proposed agent architectures. The rate of change is computed as the difference between the average flow value computed at decision time instance t for a specific link and lane and the flow used at decision time instance t-1:

\Delta f_t = \sum_{i=1}^{n} \sum_{j=1}^{l_i} \left( f_{t,ij} - f_{t-1,ij} \right)    (4.3)

where \Delta f_t is the rate of change in flow at decision instance t, n is the number of links, l_i is the number of lanes connected to link i, and f_{t,ij} and f_{t-1,ij} are the flow values computed at decision time instances t and t-1. The rate of change in flow gives an indication of the variation in the vehicle count and helps in differentiating a saturated network from an unsaturated one. The sign of the rate of change also indicates whether the vehicle count has increased or decreased from the previous evaluation period.
The final input used is the information communicated by the neighbouring intersection. Its value is computed as the ratio of the green time for the phase in progress at the neighbouring intersection to the maximum permissible green time. Usually the maximum permissible green time is fixed at forty seconds, as given in the Highway Capacity Manual (HCM).
4.3 COMMUNICATION MODULE
The communication module is responsible for multiple functions. The communication
module interacts with the agent directory service to collect the information regarding
the agents that are connected to the current agent through incoming and outgoing
links and their corresponding configuration. The communication module receives the
neighbour status data or the reward value computed by the neighbouring intersection. The communication module follows some of the guidelines of the FIPA (Foundation for Intelligent Physical Agents) protocol [60]. The communication is performed as a broadcast. Each agent checks the agent ID in the transmitted information and, if it matches the ID of one of the connected agents (queried from the directory service), the information is stored in a buffer with a time stamp for use at a later time.
The communication module also communicates the state of the traffic of the
intersection as neighbourhood status data or reward value to the adjacent agents as a
broadcast.
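The broadcast-and-filter behaviour described above can be sketched as a small buffer class; the class and method names are invented, and the FIPA message structure is reduced to a plain payload for illustration.

```python
import time
from collections import deque

class CommBuffer:
    """Sketch of the broadcast filtering described in the text.

    Every agent hears every broadcast; a message is kept (time-stamped)
    only if its sender is a neighbour according to the directory service.
    """
    def __init__(self, neighbour_ids, maxlen=50):
        self.neighbour_ids = set(neighbour_ids)    # from directory service
        self.buffer = deque(maxlen=maxlen)         # bounded, oldest dropped

    def on_broadcast(self, sender_id, payload, now=None):
        if sender_id in self.neighbour_ids:
            stamp = now if now is not None else time.time()
            self.buffer.append((stamp, sender_id, payload))
            return True
        return False                               # ignore non-neighbours
```

The time stamps let the asynchronous decision module align data from different neighbours later, as the text requires.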
The module is responsible for coordination and cooperation between the agents. In the absence of communication between agents, the limited sensing capability of the sensors associated with each agent provides only a partial view of the environment. This can cause the selection of actions that produce higher delays in the long run because of the correlated agent environment. The communication module serves to enhance the view of each agent by sharing information with other agents, improving the selected action policy so as to approach a globally optimal solution.
The communication module works in an asynchronous manner, therefore necessitating the use of buffer memory to store the data, which can later be used for synchronization of information in the decision module. A typical communication between the agents is shown in Figure 4.6. The dotted arrows indicate the data received by the agent from the agents connected to the incoming links, and the solid arrows indicate data transmitted to the other agents. Apart from the above-mentioned functionalities, the communication module also interacts with the directory service to receive the information that is essential for computing the green time. This is helpful in informing the agents of any variations required in the signal plans or phases, or of reductions in the capacity of the infrastructure.
Figure 4.5. FIPA Query protocol [60]
Figure 4.6. Typical communication flow between agents at traffic intersection
4.4 DECISION MODULE
The decision module is responsible for computing the optimal value of green time for
each phase of signal at the intersection. The module uses the information received
from the data collection module and the communication module to perform its action.
The decision module could be designed using a type-2 fuzzy or a neuro-type-2 fuzzy
system. The details of the proposed decision architecture are provided in Chapter 5.
4.5 KNOWLEDGE BASE AND DATA REPOSITORY MODULE
The knowledge base and data repository module is used to store the rule base and the data collected from the road networks, both to design the fuzzy decision system and to perform online training of the designed decision system. The knowledge base update is performed whenever there is a variation in the rule base, or otherwise at a fixed interval of 30 minutes used to gather the data.
The knowledge update is performed in an asynchronous manner. Irrespective of whether the decision module is currently executing the chosen decisions or performing its computations, the knowledge update is performed at pre-specified intervals. The knowledge base is also responsible for storing the signal plans calculated using Webster's formula based on the historical traffic flow pattern. In case of failure to receive data from the road networks, these signal plans are used by the backup module.
4.6 ACTION IMPLEMENTATION MODULE
The action implementation module adjusts the green time of the phase in each cycle
of a traffic signal control at an intersection based on the optimized green time
computed by the decision making module. The action implementation module ensures
the completion of all phases in a cycle before adjusting the green time so that vehicles
in lanes that do not have right of way during a specific phase do not experience delay
because of the extension of the green time. In case of failure of the decision module for any reason, the module communicates with the backup module to fetch the signal plan corresponding to the time of day. In short, the action implementation module serves as the main communication interface between the traffic signal and the agent. It also allows a central control to override the signal timings in emergency situations, allowing manual control of the signals.
4.7 BACKUP MODULE
The backup module sets the signal control operation to pre-timed mode in the event of input failure for ten consecutive cycles or a stationary action policy for twenty consecutive cycles. The backup module queries the knowledge base and data repository module for the appropriate signal plans based on the time of day. The signal plans stored in the knowledge base are computed based on Webster's formula. The procedure for obtaining the lost time and the calculation of the split timings used in Webster's signal time calculation are detailed in [29].
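The pre-timed plans mentioned above follow Webster's classic result: the optimal cycle is C0 = (1.5L + 5)/(1 - Y), where L is the total lost time per cycle and Y the sum of the critical flow ratios, and the effective green C0 - L is split in proportion to each phase's flow ratio. A sketch (the cycle bounds and function name are illustrative):

```python
def webster_timings(lost_time, flow_ratios, min_cycle=40.0, max_cycle=120.0):
    """Webster's optimal cycle and green splits.

    lost_time: total lost time L per cycle (s); flow_ratios: critical
    flow ratio y_i for each phase.  C0 = (1.5 L + 5) / (1 - Y), and the
    effective green C0 - L is split in proportion to y_i / Y.
    """
    Y = sum(flow_ratios)
    if Y >= 1.0:
        raise ValueError("intersection oversaturated (Y >= 1)")
    c0 = (1.5 * lost_time + 5.0) / (1.0 - Y)
    c0 = min(max_cycle, max(min_cycle, c0))        # practical bounds
    greens = [(y / Y) * (c0 - lost_time) for y in flow_ratios]
    return c0, greens
```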
The backup system returns control to the agent once changes in the decision value are detected. This ensures a smooth flow of traffic to the maximum possible extent, except during network saturation or full-capacity flow periods, without adversely affecting the network.
4.8
SUMMARY
The chapter covered the basic building blocks used to construct the proposed multi-agent architecture. It also provided a clear overview of the functionality of the individual blocks, the flow of control and data communication between modules, and the need for modules to cooperate with other functional blocks to control traffic in an urban road network.
CHAPTER 5
DESIGN OF HYBRID INTELLIGENT DECISION
SYSTEMS
In this chapter, the design, architecture and construction of the proposed decision systems are discussed in detail. The chapter also presents an overview of the type-2 fuzzy sets and symbiotic evolutionary learning techniques used in this thesis, their advantages, and their application to the urban traffic signal timing optimization problem.
5.1.
OVERVIEW OF TYPE-2 FUZZY SETS
Fuzzy set theory, or fuzzy logic, was first proposed by Zadeh [95-97]. It was an attempt to represent the uncertainties and vagueness associated with linguistic expressions both quantitatively and qualitatively. The transition from crisp set theory to fuzzy theory was made to accommodate and exploit the generality of fuzzy theory and its ability to replicate real-world scenarios to a large extent. When there is no fuzziness involved in the definition of a particular class or cluster of objects, the membership function reduces to a simple two-valued characteristic function: the input or object is assigned a value of one if it belongs to the class and zero otherwise. This is not true in real-world applications, where a certain level of abstraction and fuzziness is associated with the membership values assigned to each input. Fuzzy logic provides an abstraction similar to human thinking and accounts for this fuzziness.
Figure 5.1. Block diagram of type-2 fuzzy sets
The preliminary fuzzy systems, or type-1 fuzzy systems, were developed based on the assumption that a crisp membership grade is available over the entire universe of discourse of the input. In many applications, determining a crisp membership value is a difficult task, as an input could be perceived differently by different individuals. For example, a traffic flow value of 250 veh/hr could be assigned a membership value of 0.7 in the class low by one individual and a value of 0.4 by another. Type-2 fuzzy sets were introduced to handle such uncertainties in interpretation: they are characterized by membership grades that take a range of values rather than a single crisp value. Based on this discussion, a type-2 fuzzy set can be written mathematically as

\tilde{A} = \{((x, u), \mu_{\tilde{A}}(x, u)) \mid \forall x \in X, \forall u \in J_x \subseteq [0, 1]\}    (5.1)

where 0 \le \mu_{\tilde{A}}(x, u) \le 1. This definition differs from that of a type-1 fuzzy set, shown in (5.2), of which it can be viewed as an extension:

A = \{(x, \mu_A(x)) \mid \forall x \in X\}    (5.2)

where 0 \le \mu_A(x) \le 1. A type-2 fuzzy set has an additional dimension associated with its membership value \mu_{\tilde{A}}(x). In simple terms, for a type-1 fuzzy set, when x = x', the membership function \mu_A(x') is a single crisp value, whereas for a type-2 fuzzy set \mu_{\tilde{A}}(x') provides multiple values between a lower and an upper bound.
A diagrammatic representation of a type-2 fuzzy membership function is shown in Figures 5.2 and 5.3. When the membership function \mu_{\tilde{A}}(x, u) is represented in three-dimensional space, the primary membership J_x is the vertical slice of the membership function at x, and the membership grade u is a crisp value within that slice. The secondary membership function for the input x = x' can be written as in (5.3):

\mu_{\tilde{A}}(x = x', u) \equiv \mu_{\tilde{A}}(x') = \int_{u \in J_{x'}} f_{x'}(u)/u,    J_{x'} \subseteq [0, 1]    (5.3)

where f_{x'}(u) is the amplitude of the secondary membership function and its value lies between [0, 1], both ends included. This clearly indicates that a secondary grade f_{x'}(u) is associated with each primary grade u.
Figure 5.2. Type-1 Gaussian membership function
Figure 5.3. Type-2 fuzzy membership function with fixed mean and varying sigma
values
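A membership function of the kind shown in Figure 5.3 (fixed mean, standard deviation varying between a lower and an upper value) yields an interval of primary grades for each input. A minimal sketch, with illustrative parameter values chosen here for demonstration only:

```python
import math

def it2_gaussian_grade(x, mean, sigma_lo, sigma_hi):
    """Interval membership grade of a Gaussian type-2 MF with fixed mean
    and sigma varying in [sigma_lo, sigma_hi] (cf. Figure 5.3)."""
    g = lambda s: math.exp(-0.5 * ((x - mean) / s) ** 2)
    # Away from the mean, a larger sigma gives a larger grade, so the
    # smaller sigma yields the lower bound and the larger the upper bound.
    return g(sigma_lo), g(sigma_hi)

# Illustrative flow input of 250 veh/hr against an assumed cluster centre
lo, hi = it2_gaussian_grade(250.0, mean=200.0, sigma_lo=40.0, sigma_hi=80.0)
```

The pair `(lo, hi)` is the footprint-of-uncertainty interval for that input; at `x = mean` both bounds equal 1 and the interval collapses to a point.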
The main problem with type-2 fuzzy logic in its generalized form is the amount of computation required to convert the type-2 fuzzy set to a type-1 fuzzy set before computing the crisp output value. The computational complexity is greatly reduced by using interval type-2 fuzzy sets [98], a special form of type-2 sets in which the secondary membership function value is set to unity. This assumption greatly facilitates the use of type-2 fuzzy sets in practical applications. As not all input uncertainties are known a priori, it is impossible to associate a proper secondary membership value with each input. The assumption of a unity secondary membership grade therefore reduces the computational requirements while maintaining the properties of type-2 fuzzy sets. Mathematically, interval type-2 fuzzy sets can be written as shown in (5.4) by modifying the general expression of type-2 fuzzy sets in (5.1).
\tilde{A} = \int_{x \in X} \int_{u \in J_x \subseteq [0,1]} 1/(x, u) = \int_{x \in X} \left[\int_{u \in J_x \subseteq [0,1]} 1/u\right] \Big/ x    (5.4)
where J_x is called the primary membership of x. The uncertainty about the fuzzy set is conveyed by the union of all the primary memberships and is called the footprint of uncertainty (FOU) [99].
The FOU is bounded by the upper and lower membership functions and can be represented as given in (5.5):

FOU(\tilde{A}) = \bigcup_{x \in X} [\underline{\mu}_{\tilde{A}}(x), \overline{\mu}_{\tilde{A}}(x)]    (5.5)
The integration symbol often used in the representation of type-2 fuzzy sets does not denote actual mathematical integration; it represents a continuous universe of discourse. For a discrete universe of discourse, the summation symbol is used. Most real-world applications, however, have a continuous universe of discourse, and it is general practice to use the integration symbol.
The complexity, as well as the uniqueness, of type-2 fuzzy logic lies in the presence of the FOU. The footprint of uncertainty is usually obtained by blurring the edges of a type-1 fuzzy membership function. It is possible to represent a type-2 fuzzy set using a large number of embedded type-1 fuzzy sets. The embedded fuzzy sets are formed such that at each input x, only a single value of the primary membership grade between the upper and lower bounds is used to construct the embedded set.
These embedded fuzzy sets increase the computational requirements. Even if a computationally inexpensive method such as the centroid or centre-of-sets method is used to obtain a crisp value, a huge computational burden is incurred, as the centroid values of a large number of embedded type-1 fuzzy sets must be computed to obtain an average value. Reductions in the computational requirement have been achieved using methods such as the iterative procedure [100], the uncertainty bounds method [101], or the geometric properties [102] of the membership function.
5.1.1.
Union of fuzzy sets
Based on the available set-theoretic operators, the union operator for type-2 fuzzy sets can be chosen according to the application requirements. The t-conorm operator is usually used in fuzzy logic; the advantages of using the t-conorm operator are given in [103]. Usually the maximum function is used as the t-conorm to obtain the union of fuzzy sets. The union of two type-2 fuzzy sets is as shown in (5.6):

\mu_{\tilde{A}_1 \cup \tilde{A}_2}(x) = \int_{u \in J_x} \int_{w \in J_x} (f_x(u) \star g_x(w))/v,    x \in X    (5.6)

where v = u \vee w, \vee denotes the maximum t-conorm operator and \star denotes the minimum or product t-norm. The union operator enumerates all possible maximum values of the primary membership grades of the two sets and extracts the minimum of the secondary membership grades.
5.1.2.
Intersection of fuzzy sets
The set-theoretic operator commonly used for performing the intersection operation is the product t-norm. The operation is performed as shown in (5.7):

\mu_{\tilde{A}_1 \cap \tilde{A}_2}(x) = \int_{u \in J_x} \int_{w \in J_x} (f_x(u) \star g_x(w))/v,    x \in X    (5.7)

where v = u \star w and \star denotes the product t-norm operator. The meet, or intersection, operator effectively enumerates the minimum values of the primary and secondary membership grades of all the fuzzy sets.
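For interval type-2 sets, where every secondary grade is unity, the join (5.6) and meet (5.7) reduce to simple interval operations on the primary-grade bounds. A sketch assuming the maximum t-conorm and the product t-norm (function names are illustrative):

```python
def join(a, b):
    """Union of two interval membership grades (lower, upper):
    elementwise maximum t-conorm (cf. (5.6) with unity secondary grades)."""
    return (max(a[0], b[0]), max(a[1], b[1]))

def meet(a, b):
    """Intersection of two interval grades using the product t-norm (cf. (5.7))."""
    return (a[0] * b[0], a[1] * b[1])

# Example interval grades of one input in two fuzzy sets
low = (0.4, 0.7)
medium = (0.2, 0.5)
print(join(low, medium))   # -> (0.4, 0.7)
print(meet(low, medium))   # approximately (0.08, 0.35)
```

The same elementwise pattern is what the inference engines later in this chapter apply to rule firing intervals.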
5.1.3.
Complement of fuzzy sets
The complement of a fuzzy set is obtained by subtracting the primary membership grade from unity. The complement operation is as shown in (5.8):

\mu_{\neg\tilde{A}}(x) = \int_{u \in J_x} f_x(u)/(1 - u),    x \in X    (5.8)

where \neg denotes the negation operator.
5.1.4.
Karnik Mendel [KM] algorithm for defuzzification
To perform defuzzification and obtain a crisp output value, different methods such as centre of sets, centroid, centre-of-sums and height type-reduction can be used. Defuzzification of a type-2 fuzzy set requires computing the average of the centroid values of all the embedded type-1 fuzzy sets that can be drawn. This is computationally intensive and prevents the use of type-2 fuzzy sets in real-world applications. To circumvent this limitation, Karnik and Mendel designed an iterative algorithm [100] that obtains the crisp value in at most M iterations, where M is the number of rules in the rule base. This is achieved by computing the left and right end points, the switch points at which the computation shifts from the lower membership function values to the upper membership function values and vice versa.
The KM algorithm for computing the right end point y_r is:
1. Without any loss of generality, assume the pre-computed consequent right end points y_r^i are arranged in ascending order, y_r^1 \le y_r^2 \le \dots \le y_r^M, where M is the number of rules.
2. Compute y_r = \sum_{i=1}^{M} f^i y_r^i / \sum_{i=1}^{M} f^i by initially setting the right end point firing strengths as f^i = (\underline{f}^i + \overline{f}^i)/2, where \underline{f}^i and \overline{f}^i are the lower and upper firing strengths of rule i.
3. Find k (1 \le k \le M-1) such that y_r^k \le y_r \le y_r^{k+1}.
4. Compute y_r' = \sum_{i=1}^{M} f^i y_r^i / \sum_{i=1}^{M} f^i with f^i = \underline{f}^i for i \le k and f^i = \overline{f}^i for i > k, and let y_r'' = y_r'.
5. If y_r'' \ne y_r then go to Step 6, else stop and set y_r^* = y_r''.
6. Set y_r equal to y_r'' and return to Step 3.
The same method is employed to compute the left end point of the consequent. The defuzzified output of an IT2 fuzzy logic system can be calculated as the average of y_l and y_r, as shown in (5.9):

y = (y_l + y_r)/2    (5.9)
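The iteration above can be sketched directly for two or more rules. The function below computes the right end point y_r; it is a sketch of the standard procedure, not the thesis code:

```python
def km_right_endpoint(y_r_pts, f_lower, f_upper, max_iter=100):
    """Karnik-Mendel iteration for the right end point y_r (a sketch).

    y_r_pts: consequent right end points per rule;
    f_lower/f_upper: lower and upper firing strengths per rule.
    """
    # Step 1: sort the end points ascending, keeping strengths aligned
    order = sorted(range(len(y_r_pts)), key=lambda i: y_r_pts[i])
    y = [y_r_pts[i] for i in order]
    fl = [f_lower[i] for i in order]
    fu = [f_upper[i] for i in order]
    # Step 2: initialize firing strengths at the interval midpoints
    f = [(a + b) / 2.0 for a, b in zip(fl, fu)]
    yr = sum(fi * yi for fi, yi in zip(f, y)) / sum(f)
    for _ in range(max_iter):
        # Step 3: find the switch point k with y[k] <= yr <= y[k+1]
        k = max(i for i in range(len(y) - 1) if y[i] <= yr)
        # Step 4: lower strengths up to k, upper strengths beyond k
        f = fl[:k + 1] + fu[k + 1:]
        yr_new = sum(fi * yi for fi, yi in zip(f, y)) / sum(f)
        if abs(yr_new - yr) < 1e-12:   # Step 5: converged
            return yr_new
        yr = yr_new                    # Step 6: iterate
    return yr
```

The left end point y_l is computed analogously, with the upper strengths used for i \le k and the lower strengths for i > k; the crisp output then follows from (5.9).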
5.1.5.
Geometric defuzzification
A geometric type-2 fuzzy set can be defined as a collection of polygons in three-dimensional space whose edges form triangles [104]. In geometric defuzzification, the type-reducer and the defuzzifier are combined into a single functional block, thereby reducing the computational requirements associated with the type-reducer and directly providing the crisp output from the type-2 fuzzy consequent sets. The centroid of the geometric type-2 fuzzy consequent can be calculated as the centre of the geometric shape of the final consequent set obtained using the Bentley-Ottmann plane sweep algorithm [105] after the calculation of the firing levels of all the rules. The plane sweep is performed by ordering the discretized coordinate points of the consequent type-2 fuzzy set, and the corresponding centroid (5.10) is calculated from the closed polygon formed by the edges of the ordered coordinates with the origin:

centroid = \sum_{i=0}^{n-1} (x_i + x_{i+1})(x_i y_{i+1} - x_{i+1} y_i) \Big/ 3\sum_{i=0}^{n-1} (x_i y_{i+1} - x_{i+1} y_i)    (5.10)

Figure 5.4 shows a consequent type-2 fuzzy set in which the lower and upper coordinates are arranged in proper ascending and descending order. The two shaded regions in Figure 5.4 indicate the triangles formed by using two adjacent points on the consequent and the origin to construct the final closed polygon.
Figure 5.4. Ordered coordinates geometric consequent set showing two of the closed
polygons
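Equation (5.10) is the standard centroid formula for a closed polygon. A minimal sketch of the x-coordinate computation over ordered vertices:

```python
def polygon_centroid_x(pts):
    """x-coordinate of the centroid of a closed polygon given as an
    ordered list of (x, y) vertices, as in (5.10)."""
    n = len(pts)
    num = den = 0.0
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]           # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0           # signed area term x_i*y_{i+1} - x_{i+1}*y_i
        num += (x0 + x1) * cross
        den += cross
    return num / (3.0 * den)

# Unit square: centroid x-coordinate should be 0.5
print(polygon_centroid_x([(0, 0), (1, 0), (1, 1), (0, 1)]))
```

In the geometric defuzzifier, the vertices would be the ordered lower and upper coordinates of the clipped consequent set, and the returned value is the crisp green time.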
5.2.
APPROPRIATE SITUATIONS FOR APPLYING TYPE-2 FLS
The most appropriate situations in which a type-2 FLS performs best are highlighted in [106]. They can be summarized as follows:
- Non-stationary noise is associated with the sensor measurements and cannot be fully expressed mathematically.
- The stochastic data-generating mechanism cannot be correctly approximated by mathematical distribution functions such as the Gaussian or Poisson distribution.
- The knowledge base used to construct the rule base for the fuzzy logic system is mined from a series of if-then questionnaires put to experts.
All of these properties exist in urban vehicular traffic.
The induction loop detectors used are easily affected by the prevailing external environmental conditions. This causes loss of data, improper detection of vehicles and improper classification of vehicle lengths, leading to calculated vehicle counts higher than the actual values. This can be treated as random noise added to the input data, varying with the condition of the sensor and the environment.
The traffic release pattern is highly stochastic in nature. The exact number of vehicles that will enter a section of the network is not easily predictable, as it depends to a great extent on the drivers' behaviour and their need for travel. Additionally, the presence of traffic signals produces a pseudo-random effect, mainly due to the formation of platoons of vehicles at intersections.
Traffic at an intersection can be easily defined using linguistic rules, and it is much easier to define the control sequence as a set of rules. The rules are usually constructed using expert opinion and become prone to error as the number of links, lanes and phases at an intersection increases.
All of the above-mentioned properties of the traffic system make it an ideal candidate for type-2 fuzzy system based control.
5.3.
CLASSIFICATION OF THE PROPOSED DECISION SYSTEMS
Four different decision systems are proposed in this work. The first two decision systems (T2DR and GFMAS) are designed based on heuristics and deductive reasoning. The third decision system (SET2) is based on a stochastic symbiotic evolutionary learning approach that adapts the parameters of the type-2 fuzzy system; this decision system uses an online batch learning approach. The fourth decision system is a Q-learning based neuro-type-2 fuzzy system (QLT2), an online adaptive system in which the parameters of the neuro-fuzzy system are adapted by back-propagation using an error value computed with the reinforcement Q-learning technique.
5.4.
TYPE-2 FUZZY DEDUCTIVE REASONING DECISION SYSTEM
The first heuristics-based type-2 fuzzy decision system designed is the T2DR (Type-2 fuzzy Deductive Reasoning) system. The block diagram of the decision system is shown in Figure 5.5, and details of its design are described in the following sections.
5.4.1.
Traffic data inputs and fuzzy rule base
The T2DR decision system uses the flow, queue and communicated neighbour status data to compute the green time. The data collection module provides the necessary data to the decision system. The first step performed inside the decision module is the fuzzification of the crisp inputs received from the data collection module. The queue and flow data are clustered into three regions, and the universe of discourse of the inputs is decided using the hotspot and saturation flow rate parameters. Hotspot refers to the condition of a link whose queue length exceeds a threshold number of vehicles, beyond which the link can be classified as in a high congestion state. A vehicle count of thirteen was used as the threshold value to classify a link as a hotspot [62]. Saturation flow rate refers to the equivalent hourly rate at which vehicles can traverse an intersection under prevailing conditions, assuming a constant green indication at all times and no lost time, expressed in vehicles per hour or vehicles per hour per lane. A saturation flow rate of 2400 vehicles per hour was used to normalize the flow data.
Figure 5.5. Block diagram of T2DR multi-agent weighted input decision system
The membership grade for each clustered input was assigned using a Gaussian function with overlapping regions. The mean values of the Gaussian functions represent the centres of the clusters formed, and the values were chosen to be close to those used in [62]. The upper and lower boundary standard deviation values were calculated based on the flow rates and queues experienced at different time periods and at the same period of the day across a week. The fuzzified antecedents and consequents of the fuzzy rules are shown in Tables 5.1 and 5.2.
The consequents were designed as type-2 fuzzy sets to account for the uncertainties associated with the scaling factor and the signal timings. In this work, the type-2 membership functions were developed based on heuristics and the type-1 fuzzy membership functions used previously in [45]. The type-1 fuzzy membership function in [44] was used as the base and blurred to create the lower and upper membership grades. The antecedent and consequent membership functions used are shown in Figure 5.6. The maximum green time allocated for a phase is fixed at 60 seconds, even though the recommended value is 40 seconds, to provide more degrees of freedom for the signal control; a minimum green time of 10 seconds is imposed to avoid short phases, switching losses and increased travel time delay.
Figure 5.6. Antecedent and consequent fuzzy membership functions
Selection of appropriate rules and mapping of the antecedents to consequents is an important objective. In general practice, the rules are created by trial and error based on expert knowledge; a type-2 fuzzy system produces better results than a type-1 fuzzy system under these conditions [106]. The structure of the developed rule base is as shown below:
Level of cooperation decision fuzzy system - If {flow is high} and {Neighbour status is low} then {Level of cooperation is low}
Green time decision fuzzy system - If {Queue is low} and {flow is high} then {Green time is high}
Table 5.1. Mapping of flow and neighbour state inputs to the consequent weighting factor output

Flow \ Neighbour state | low    | medium | high
sparse                 | low    | low    | low
low                    | low    | low    | low
medium                 | low    | medium | medium
high                   | medium | high   | high
Table 5.2. Mapping of flow and queue inputs to the consequent green time output

Flow \ Queue | low    | medium | high
low          | sparse | low    | low
medium       | high   | medium | medium
high         | high   | high   | high
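The mapping of Table 5.2 can be encoded as a simple lookup. The entries below follow the table as reconstructed above, and the function name is illustrative:

```python
# Green time consequent class for each (flow, queue) pair, per Table 5.2
GREEN_TIME_RULES = {
    ("low", "low"): "sparse",  ("low", "medium"): "low",      ("low", "high"): "low",
    ("medium", "low"): "high", ("medium", "medium"): "medium", ("medium", "high"): "medium",
    ("high", "low"): "high",   ("high", "medium"): "high",     ("high", "high"): "high",
}

def green_time_class(flow, queue):
    """Consequent green time class for fuzzified flow and queue labels."""
    return GREEN_TIME_RULES[(flow, queue)]
```

In the actual system these labels index type-2 consequent sets rather than crisp values; the lookup only illustrates the antecedent-to-consequent mapping.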
Tables 5.1 and 5.2 show the rule bases for the belief (level of cooperation) module and the fuzzy decision module. The initial rule base was designed based on the understanding of [107] in creating a rule base from data. The rule base was fine-tuned and validated using a small section of the network with six intersections. The initial origin-destination matrix, or traffic demand, was modified from the original matrix created from actual data to avoid saturation of the network, as suggested in [108]. The rules were initially modelled for the fuzzy decision module without any communication between the agents. The rule base was validated by comparing the total mean delay experienced by a vehicle inside the network against the HMS traffic signal controller; a detailed explanation of the Hierarchical Multi-agent System (HMS) was provided in Chapter 3. The rule base with the smallest travel time delay in comparison to HMS was used in this work. The same procedure was repeated for the agent system with communication capabilities to develop the belief model, which decides the level of cooperation. The initial rule base was developed based on intuition and a basic understanding of the relationship between traffic flow and queue.
5.4.2.
Inference Engine
Inference is the key component of a fuzzy system. The antecedent membership functions are used to calculate the firing level for each rule in the rule base, which is then applied to the consequent fuzzy sets. The outputs from the belief and fuzzy decision systems for the l-th fuzzy rule can be written as in (5.11) and (5.13).
(5.11)
(5.12)
The weight parameter for the flow input was calculated based on the instantaneous flow value and the communicated neighbour status data. The max function was used to aggregate the communicated neighbour status data, to enable coordination with respect to the most congested neighbour. The computed weight parameter is inversely proportional to the level of cooperation among the agents. A large weight value causes the flow input to the decision-making fuzzy system of the inference engine to have a higher membership value, as shown in (5.12). This makes the output of the fuzzy system more strongly influenced by the queue input and less dependent on neighbouring agents.
The flow input was used for deciding the level of cooperation, as it is an averaged value giving the overall traffic situation of the neighbours, calculated from the upstream, downstream and queue spill-back inputs. The instantaneous queue value was aggregated using the max function, as it indicates the current traffic condition at an intersection and ensures that the green time is calculated with respect to the most congested lane.
(5.13)
where the left-hand sides of (5.11) and (5.13) are the outputs of the two fuzzy systems, and the type-2 membership functions of the queue, flow and neighbour status antecedent inputs appear as the antecedent terms for the l-th rule. Since an interval type-2 fuzzy system is used for determining the optimized green time, the firing level obtained is also expressed as an interval set:
F^l(x) = [\underline{f}^l(x), \overline{f}^l(x)]    (5.14)
The lower and upper bounds of the firing level of the level-of-cooperation type-2 fuzzy system of the inference engine can be written as in (5.15) and (5.16) respectively:

\underline{f}^l = \underline{\mu}_{fl}(x_{fl}) \star \underline{\mu}_N(x_N)    (5.15)

\overline{f}^l = \overline{\mu}_{fl}(x_{fl}) \star \overline{\mu}_N(x_N)    (5.16)

In a similar fashion, the lower and upper bounds of the decision-making fuzzy system can be written as in (5.17) and (5.18):

\underline{f}^l = \underline{\mu}_Q(x_Q) \star \underline{\mu}_{fl}(x_{fl})    (5.17)

\overline{f}^l = \overline{\mu}_Q(x_Q) \star \overline{\mu}_{fl}(x_{fl})    (5.18)

where \underline{\mu} and \overline{\mu} denote the lower and upper membership functions of the corresponding antecedents and \star is the t-norm.
The outputs from the fuzzy system for all the rules can be aggregated together as shown in (5.19) and (5.20):

\underline{f}(x) = \max_{l=1,\dots,m} \underline{f}^l(x)    (5.19)

\overline{f}(x) = \max_{l=1,\dots,m} \overline{f}^l(x)    (5.20)

where m refers to the number of rules in the decision-making type-2 fuzzy system. The Karnik-Mendel algorithm was used to convert the type-2 fuzzy outputs into a type-1 fuzzy output. The crisp output is calculated as the average of the right and left end points computed using (5.21) and (5.22) respectively:

y_r = \sum_{i=1}^{m} f^i y_r^i \Big/ \sum_{i=1}^{m} f^i    (5.21)

y_l = \sum_{i=1}^{m} f^i y_l^i \Big/ \sum_{i=1}^{m} f^i    (5.22)

y = (y_l + y_r)/2    (5.23)
In computing the end points, the left end point is minimized and the right end point is maximized. As the number of rules is restricted to a fixed value, the number of iterations required to compute the left and right end points is bounded by that value.
The final green time output is applied to the traffic signal through the action implementation module. The action implementation module has the additional task of checking the cycle time and adjusting the values proportionally for each phase, as the maximum cycle length allowed is restricted to 120 seconds.
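The proportional adjustment described above can be sketched as follows. The lost-time handling is an assumption, since the text only specifies the 120-second cycle cap:

```python
def enforce_cycle_limit(green_times, lost_time=0.0, max_cycle=120.0):
    """Proportionally rescale phase green times when the resulting cycle
    (sum of greens plus an assumed fixed lost time) would exceed the cap."""
    cycle = sum(green_times) + lost_time
    if cycle <= max_cycle:
        return list(green_times)          # within limit: leave unchanged
    scale = (max_cycle - lost_time) / sum(green_times)
    return [g * scale for g in green_times]

# Greens of 60+50+40 s with 10 s lost time give a 160 s cycle,
# so each phase is scaled down to fit the 120 s limit.
print(enforce_cycle_limit([60, 50, 40], lost_time=10))
```

Scaling preserves the relative split between phases, which matches the "adjusting the values proportionally" behaviour described for the action implementation module.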
5.5.
GEOMETRIC FUZZY MULTI-AGENT SYSTEM
The computational complexity associated with the T2DR (Type-2 fuzzy Deductive Reasoning) decision system, together with the difficulty of computing the associated weight for the input, necessitated the development of a less complex system with an internal belief model and trapezoidal membership functions, enabling a simpler defuzzification process based on geometric properties. It was also found that not all of the inputs and outputs need to be type-2 fuzzy sets. To satisfy these requirements, a geometric type-2 fuzzy decision process was developed; it is explained in detail in the following sections.
Figure 5.7. GFMAS agent architecture
5.5.1.
Input Fuzzifier
The averaged flow and queue data collected from the road network, together with the communicated data, are passed to the geometric type-2 inference system to calculate the green time required for the phase during the next cycle period. The geometric inference system has the same functional blocks as a type-2 fuzzy system, except that the type-reducer and defuzzifier are merged into a single block called the geometric defuzzifier, as shown in Figure 5.8. The initial step in the inference system is the fuzzification of the inputs, in which a measure of possibility is assigned to each input. Since the fuzzy set used is an interval type-2 fuzzy set, the inputs are assigned upper and lower membership grades based on their membership functions, and the secondary membership grade associated with each primary membership grade is assigned a value of unity. The union of all the membership grades between the upper and lower bounds gives the footprint of uncertainty of the input in the classified region. The membership functions were designed in trapezoidal shape and divided into three regions.
The lower and upper bounds in each region were decided based on the maximum and minimum flow rates experienced at an intersection during the specified time period on weekdays; each input is thereby associated with a range of membership grades rather than a crisp point, thus retaining the fuzziness. In a similar manner, the queue count was designed as a type-2 fuzzy set, and each input was assigned boundary membership grades. The final input, the data received from the communication module, is a type-1 fuzzy set. The fuzzified inputs are shown in Figure 5.9.
Figure 5.8. Block diagram of geometric type-2 fuzzy system
The communication module calculates the percentage of green time allocated to a phase by the signal with respect to the maximum permissible green time. Green time can serve as a direct indication of the traffic state at an intersection if its value is calculated dynamically based on the traffic data and the traffic flowing in the outgoing links. However, the relationship becomes highly non-linear due to the influence of traffic in the incoming links and platoon formation. The level of uncertainty associated with the green timing of a phase can be handled by type-1 fuzzy sets alone, as the only source of uncertainty is communication noise.
Figure 5.9. Fuzzified antecedents and consequents in a GFMAS
Green timing calculated solely from local traffic data would increase congestion levels in the outgoing links. This can cause already congested neighbouring intersections to experience an even higher inflow of vehicles, and the queue in the link can extend beyond the upstream intersection, resulting in deadlocks. Coordination based on communicated congestion data therefore becomes essential during periods of heavy and medium traffic flow. A reduced green time is allocated to a specific phase if the intersection connected to the outgoing link is already congested; this prevents queue spill-back and deadlock formation. This coordination is achieved based on the rule base information and the communicated data of the most congested neighbour.
Since the communicated congestion data is similar to the consequent green time, except for its representation as a percentage, both have a similar classification. The membership functions are designed by dividing the universe of discourse into three equal regions, with the overlap calculated based on Webster's equation. As the coordination is with respect to the maximally congested neighbour, the maximum value of the communicated congestion data from the neighbouring intersections is used as the input. In short, the communication module performs the functions of data reception, transmission of congestion status to neighbours, and data mining.
5.5.2.
Inference engine
The inference engine is the core of the decision system. The inputs and outputs of the type-2 fuzzy inference engine are shown in Figure 5.9. The inference engine calculates the lower and upper firing levels for each rule in the rule base. A total of twenty-seven rules were created from the three inputs, as shown in Figure 5.10. When the communicated congestion data indicates low traffic congestion in the neighbouring intersections, a greedy signal policy is used to calculate the green time with minimal cooperation. For adjacent intersections with medium and high traffic congestion levels, the signal policy is modified to produce a shorter green time than the greedy policy would. This ensures a smaller inflow of vehicles into the already congested intersection, enabling that intersection to clear its traffic faster.
The structure of the rule base is as shown below.
“If Flow input is in Low region and Queue input is in High region and
Communicated data is in Low region then Green time is in High region”
Figure 5.10. Rule base for the GFMAS signal control
The 'AND' operation is performed using the minimum t-norm. Since the geometric inference system is a type-2 fuzzy system, the lower and upper firing levels of the consequent set for each rule are calculated using (5.24) and (5.25) respectively:

\underline{f}^l = \min[\underline{\mu}_F(x_F), \underline{\mu}_Q(x_Q), \mu_C(x_C)]    (5.24)

\overline{f}^l = \min[\overline{\mu}_F(x_F), \overline{\mu}_Q(x_Q), \mu_C(x_C)]    (5.25)

where \underline{\mu}_F and \underline{\mu}_Q are the lower membership functions and \overline{\mu}_F and \overline{\mu}_Q the upper membership functions of the flow and queue antecedents for the l-th rule, and \mu_C is the type-1 membership function of the communicated congestion data. The firing levels of each rule are used to clip the consequent membership functions and derive the output type-2 fuzzy set.
Geometric defuzzification is used to calculate the crisp value of the green time from the derived output type-2 fuzzy set. The geometric properties of the consequent are used to calculate the centroid of the shape. This requires ordering the discretized lower and upper membership functions in ascending and descending order respectively. The coordinates are obtained using the plane sweep algorithm, which gives the non-overlapping edges of the trapezoids and the points of intersection of the different regions of green time in the consequent fuzzy set. Figure 5.11 shows the process of arranging the coordinate points and removing the overlap regions.
Once sorted, triangles can be constructed using two adjacent points on the consequent set and the origin. The average of the centroids of all the triangles constructed on the consequent set equals the geometric centre of gravity of the output type-2 fuzzy set, which is the closed polygon formed by connecting all edges to the origin.
For better accuracy of the calculated centroid, it is essential to discretize the consequent set into a large number of points, increasing the number of polygons used to average the centroid value; the trade-off is the computational cost associated with the process. The signal control decisions are based on traffic in the incoming and outgoing links, but not on the traffic in lanes without right of way. Since the green time for each phase is calculated online at the end of each phase, it is difficult to consider the competing phase timings.
However, the maximum green time limitation imposed on each phase minimizes the possibility of an indiscriminate increase in green time for a phase. The signal control also begins to allocate more time to a phase in a recursive manner once a queue build-up is detected. The traffic in lanes without right of way is thus taken care of with a delay of one cycle period.
Figure 5.11. Geometric defuzzification process based on Bentley-Ottmann plane
sweeping algorithm
5.6.
SYMBIOTIC EVOLUTIONARY TYPE-2 FUZZY DECISION SYSTEM
SET2 stands for symbiotic evolutionary type-2 fuzzy decision system. In the previous
sections, the rule base and the parameters for the creation of the input fuzzifier of the
type-2 fuzzy system in both T2DR and GFMAS were designed based on the heuristics
and the historical traffic data that were obtained when using a fixed-time signal
control. The traffic data of flow and queue therefore does not truly represent the
dynamics of the environment and does not account for the effect of the variation in
signal timing in other connected intersections. To reduce the errors, the decision
system utilized the communicated information of ratio of green time allocated by each
phase in the adjacent intersection with respect to the maximum permissible value.
However, this creates an over-dependency of the decision system on communicated
information. Any absence of the data due to communication failure would affect the
decisions taken using the constructed rule base, in which the communicated data is an
integral part. This would trigger the use of the backup fixed-time plan until the
communication is restored, adversely affecting the performance of the entire system.
To avoid this, it is essential to use the communicated data to build a model of the
system that can effectively take care of the variations in the decisions of adjacent
agents.
In order to address the issues mentioned above, the SET2 decision system uses only
the flow and queue inputs collected from the road network. Apart from these two
inputs, the rate of change in flow between two consecutive evaluation periods is also
used. This input is made possible by storing the average flow experienced during each
phase of the previous cycle in the knowledge base and data repository module. It
gives an indication of the increase or decrease in the value of flow during the
evaluation period. The universe of discourse for this input is normalized to [-1, 1]
using the saturation flow rate: a value of -1 indicates a large decrease in flow
(literally, no vehicle crossed the intersection during the evaluation period), while a
value of +1 indicates a large increase in the vehicle count from the previous
evaluation period. The rate of change in flow is therefore clustered into three
clusters, namely {-ve, normal, +ve}.
rate of change in flow is determined from the flow input received from the loop
detectors placed at the intersection. Therefore, the rate of change in flow also has all
the uncertainties that were associated with the flow input and is represented using
type-2 fuzzy sets.
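The normalization and coarse clustering described above can be sketched as follows. The function names and the cluster thresholds are illustrative assumptions, not values taken from the thesis; only the [-1, 1] normalization by the saturation flow rate and the three linguistic labels come from the text.

```python
def rate_of_change_of_flow(flow_now, flow_prev, saturation_flow):
    """Normalize the change in flow between two consecutive evaluation
    periods into [-1, 1] using the saturation flow rate.
    -1 means a large decrease (no vehicle crossed), +1 a large increase."""
    delta = (flow_now - flow_prev) / saturation_flow
    return max(-1.0, min(1.0, delta))

def cluster_label(rate):
    """Coarse assignment to the three linguistic clusters {-ve, normal, +ve}.
    The +/-0.33 thresholds are illustrative; in the decision system this
    input is actually fuzzified with overlapping type-2 sets."""
    if rate < -0.33:
        return "-ve"
    if rate > 0.33:
        return "+ve"
    return "normal"
```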
In the previous approaches, the parameters of the fuzzy sets and the rule base were
generated based on heuristic data. In order to make the decision system more
responsive to the dynamics of the traffic environment, all the parameters in SET2 are
evolved using online simulation of the network. Batch-mode online learning is
employed, where a full set of training data created dynamically through online
simulation is used for the training. Stochastic search methods such as a traditional
GA could be used for evolving the parameters. However, traditional GA methods suffer
from a lack of diversity and require an extremely high mutation rate to maintain
diversity, which in turn sacrifices exploitation [109].
The use of a single chromosome to represent the entire solution makes it difficult to
evolve all the parameters, and the solution space becomes too large to search.
Therefore, a cooperative co-evolutionary approach is employed to generate the full
solution. In the following section, an introduction to the basics of the symbiotic
evolutionary approach used to generate the decision system is presented.
5.6.1.
Symbiotic evolution
Symbiotic evolution can be defined as a type of co-evolution in which different
individuals explicitly cooperate with each other and rely on the presence of the other
individuals for their existence [110]. Symbiotic evolution is distinctly different
from co-evolutionary GAs, most of which are developed based on the immune system or
fitness sharing [111-113]. The immune algorithm is a global search algorithm based on
the characteristics of the biological immune system. It is an adaptive, distributed,
and parallel intelligent system that has the capability to control a complex system.
The immune system protects living bodies from the invasion of various foreign
substances called antigens and eliminates them. When an antigen is introduced into a
biological species, a specific antibody that is capable of detecting and eliminating
the antigen needs to be produced. This is achieved by forming a group of antibodies
that detects the maximum number of introduced antigens. The antibodies therefore need
to compete with each other to provide the desired result. The principles of the immune
system can easily be adopted into a genetic algorithm with a slight modification to
the parent chromosome: in the modified GA, a chromosome represents a partial solution
instead of the complete solution. A combination of a large number of these partial
solutions then produces the final antibody that maximizes the objective function.
Here, the partial solutions compete with, rather than cooperate with, each other.
Figure 5.12 shows the solution generated by combining a large number of partial
solutions. The partial solutions are referred to as "specializations".
Figure 5.12. Block diagram of symbiotic evolution complete solution obtained by
combining partial solutions
This competitive scheme is not useful in a large number of practical applications. For
example, evolving an entire rule base of a fuzzy system can be cumbersome: when
individual rules are created as species and combined to form a full solution, the
species should not compete with each other but must cooperate to obtain the optimal
final solution.
When a single set of partial solutions or specializations is used to create the final
solution, diversity can be affected to a great extent and a large number of
specializations would get repeated. To avoid this, an approach based on creating a
large number of islands of specialization was introduced in [114]. Each island
maintains a certain type of specialization that is unique to it, and the final
solution is obtained by combining specializations gathered from each of the islands.
Figure 5.13 shows the block diagram of one such islanded evolutionary approach. In
[115-117], the fuzzy rule base was evolved using the islanded approach. Here each
specialization represents a single rule: the parameters of each of the inputs are
encoded to form the partial solution. The number of rules needs to be pre-fixed, and
each rule represents a cluster in the solution space. It is possible that the evolved
solution does not cover the entire solution space if the training data is not
continuous and does not cover the entire state space. Therefore, a modified version of
the islanded symbiotic evolution is proposed in this work.
5.6.2.
Proposed symbiotic evolutionary GA decision system
In the proposed symbiotic evolutionary genetic algorithm, instead of creating each
rule along with its input parameters as a specialization, two islands were created.
The first island represents the membership function values, such as the mean and the
lower and upper sigma values of the Gaussian membership functions, and the second
island represents the rules. A combination of a membership function and the
corresponding rule creates the final solution. The solution set obtained can then be
evaluated as in a traditional GA. Figure 5.14 shows a diagrammatic representation of
creating the full solution using the proposed symbiotic evolutionary approach.
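The two-island evaluation loop, with its shared-fitness credit assignment, can be sketched as below. This is a minimal sketch under stated assumptions: `evaluate` stands in for the traffic-simulation fitness (which in this work comes from a Webster-delay objective), and islands are plain lists of arbitrary specialization objects.

```python
import random

def evolve_generation(mf_island, rule_island, evaluate, n_trials=100):
    """One generation of the two-island cooperative scheme: repeatedly draw
    one membership-function specialization and one rule specialization,
    evaluate the combined decision system, and credit the resulting fitness
    back to both partial solutions.  The fitness of a partial solution is
    the average fitness of the full solutions it appeared in."""
    credits = {id(s): [] for s in mf_island + rule_island}
    for _ in range(n_trials):
        mf = random.choice(mf_island)
        rules = random.choice(rule_island)
        f = evaluate(mf, rules)  # placeholder for the simulated-delay fitness
        credits[id(mf)].append(f)
        credits[id(rules)].append(f)
    # shared fitness per specialization (only for those actually used)
    return {k: sum(v) / len(v) for k, v in credits.items() if v}
```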
Figure 5.13. A representation of the islanded symbiotic evolutionary algorithm
population
Figure 5.14. A block diagram representation of the symbiotic evolution in the
proposed symbiotic evolutionary genetic algorithm
The membership function values have a continuous state space, and hence a real-coded
GA was used. However, a binary-coded GA was used for evolving the rule base, as its
chromosome alleles take only two values (either active or inactive). The structure of
the chromosome used in the membership function island and in the rule island
specialization is shown in Figure 5.15 and Figure 5.16. The parameters evolved are
bounded within [0, 1]. Real-coded GA ensures a continuous state-space representation
and avoids errors arising from discretization of the state space.
The islanded rule specializations are binary coded; a value of one represents that
the particular input is active and will be used in the computation of the output. The
consequent cluster to be used in a particular rule is also evolved and is represented
using two bits. A decimal-equivalent value of zero indicates that the rule is invalid
and will not be used; a value of one indicates the first cluster, two the second
cluster, and three the third cluster of the consequent to be used for the specific
rule. The evolution of the rule base island therefore adapts the number of rules used
in a particular evaluation.
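The decoding of one rule from this chromosome can be sketched as follows. The dictionary layout is an illustrative assumption; only the one-bit-per-input activity flags and the two-bit consequent selector (0 = invalid, 1-3 = cluster index) come from the encoding described above.

```python
def decode_rule(bits):
    """Decode one rule from the binary rule-island chromosome.
    Leading bits: one per input (1 = input active in this rule).
    Last two bits: consequent cluster selector, whose decimal value 0
    marks the rule invalid and 1-3 select the consequent cluster."""
    *input_bits, c_hi, c_lo = bits
    consequent = 2 * c_hi + c_lo
    if consequent == 0:
        return None  # rule switched off; it is skipped in the evaluation
    active_inputs = [i for i, b in enumerate(input_bits) if b == 1]
    return {"inputs": active_inputs, "consequent_cluster": consequent}
```

Because a zero consequent disables the rule entirely, evolving these bits also adapts the effective number of rules, as noted above.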
Figure 5.15. Structure of the chromosome for membership function cluster island
Obtaining the fitness value for an individual specialization is an extremely difficult
task. The two islanded clusters are dependent on each other, and the fitness value is
shared between them. The fitness of each individual specialization is obtained as the
average fitness of the final solutions in which the particular specialization is used.
Apart from this shared fitness value, it is also possible to use other parameters to
evaluate the partial solution.
The fitness function for the full population is computed in terms of the time delay
value using Webster's formula introduced in Chapter 3. Apart from the delay value
computed for the local intersection, the weighted communicated delay information from
the neighbouring agents is also used for computing the fitness value. Equation (5.26)
shows the final fitness function used:

f_P = (1/a) * Σ_{k=1}^{a} [ D(k) + Σ_j w_j D_j(k) ]        (5.26)

where f_P is the fitness value corresponding to the P-th parent in the population, D
is the delay computed at the intersection based on Webster's formula, D_j represents
the delay experienced in adjacent intersection j, and a is the number of cycles during
the current traffic simulation. The delay value obtained from the incoming and
outgoing intersections is weighted by w_j, as only a certain portion of that traffic
enters the intersection. The weight value is computed based on the assumption of an
equal distribution of delay among all the phases of the signal.
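The fitness computation above can be sketched as follows. This is a minimal sketch under stated assumptions: the exact weighting in (5.26) is reconstructed here from the surrounding description (per-cycle local delay plus weighted neighbour delays, averaged over the cycles), and the argument names are illustrative.

```python
def shared_fitness(local_delays, neighbour_delays, weights):
    """Fitness of one parent over a cycles: average over cycles of the local
    Webster delay plus the weighted delays communicated by the adjacent
    intersections.  `neighbour_delays` is one per-cycle delay list per
    neighbour; `weights` holds the corresponding weight for each neighbour."""
    a = len(local_delays)
    total = 0.0
    for k in range(a):
        total += local_delays[k]
        total += sum(w * d[k] for w, d in zip(weights, neighbour_delays))
    return total / a
```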
Figure 5.16. Structure of chromosome of the rule base cluster island
The fitness function of the partial solution of the membership function specialization
is derived as a function of the fitness value of the parent population and a
similarity measure that computes the similarity between the evolved clusters of an
input. The similarity measure between two clusters is obtained by constructing a
triangle between the mean value and the points of intersection of the line drawn at a
membership value of 0.1 (chosen arbitrarily) with the upper and lower membership
functions. The Euclidean distance measure is used for calculating the distance between
the vertex points of the constructed triangles. The equations for computing the
similarity and distance between two triangles A and B are shown in (5.27) and (5.28):

S(A, B) = 1 / (1 + d(A, B))        (5.27)

d(A, B) = Σ_{i=1}^{3} ||a_i - b_i||        (5.28)

where a_i and b_i are the corresponding vertices of triangles A and B.
It can be observed that as the distance between the triangles increases, the
similarity measure decreases. Therefore, by combining (5.26) and (5.27), the fitness
of the l-th individual in the membership function specialization can be written as
shown in (5.29):

f_l = (1/m) Σ_{k=1}^{m} f_{P_k} + S(A, B) + S(B, C) + S(A, C)        (5.29)

where m represents the number of times the specialization was used in a complete
solution and A, B, C represent the triangles drawn corresponding to the clusters of
the input. Evolving the GA to minimize this objective function produces an optimal
overlap of the clusters as well as a lower travel time delay.
Similarly, the fitness function corresponding to the rule specialization can be
written as shown in (5.30):

f_r = (1/m) Σ_{k=1}^{m} f_{P_k} + P        (5.30)

where P is the number of active rules. When the GA is used to minimize this objective
function, the number of rules required and the inputs that need to be active in each
rule to produce the optimal solution are obtained.
5.6.3.
Crossover
The reproduction process itself does not create new individuals; the new population is
created by performing a crossover of two specializations from the same island or
cluster. This increases the exploration of the search space and helps in creating a
new parent population. The parents for the crossover are selected as follows: the
first parent is selected by a binary tournament, and the second parent is selected
randomly from the top half of the parents sorted according to fitness value. A
one-point crossover with a randomly selected crossover point was employed to obtain
the offspring.
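The selection and crossover steps described above can be sketched as follows. This is an illustrative sketch, not the thesis implementation: chromosomes are assumed to be flat lists, fitness is assumed to be minimized (a delay-based objective, so lower is better), and fitness values are looked up by object identity.

```python
import random

def binary_tournament(population, fitness):
    """Pick two candidates at random; the fitter one (lower delay-based
    fitness) wins the tournament."""
    a, b = random.sample(population, 2)
    return a if fitness[id(a)] <= fitness[id(b)] else b

def one_point_crossover(p1, p2):
    """One-point crossover at a randomly chosen cut position."""
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def make_offspring(population, fitness):
    """Parent 1 by binary tournament; parent 2 drawn at random from the top
    half of the population sorted by fitness, as described above."""
    p1 = binary_tournament(population, fitness)
    ranked = sorted(population, key=lambda c: fitness[id(c)])
    p2 = random.choice(ranked[: max(1, len(ranked) // 2)])
    return one_point_crossover(p1, p2)
```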
5.6.4.
Mutation
Mutation is an operator whereby the alleles of the chromosomes are altered randomly.
The real-valued part in the membership function specialization cluster is altered by
adding random Gaussian noise with mean zero and variance one. For the binary rule
specialization part, the binary value is changed from zero to one or vice versa.
Mutation should be used sparingly because it is a random search operator; with high
mutation rates, the algorithm becomes little more than a random search. In our
experiments, a fixed mutation probability of 0.01 was used.
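The two mutation operators can be sketched as below. The clamping of the real-valued genes to [0, 1] follows the parameter bounds stated earlier; everything else matches the description above, with illustrative function names.

```python
import random

MUTATION_RATE = 0.01  # fixed probability used in the experiments

def mutate_real(genes, rate=MUTATION_RATE):
    """Real-coded MF genes: add Gaussian noise (mean 0, variance 1),
    keeping each parameter inside its [0, 1] bounds."""
    return [min(1.0, max(0.0, g + random.gauss(0.0, 1.0)))
            if random.random() < rate else g
            for g in genes]

def mutate_binary(bits, rate=MUTATION_RATE):
    """Binary rule genes: flip zero to one or vice versa."""
    return [1 - b if random.random() < rate else b for b in bits]
```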
5.6.5.
Reproduction
Reproduction is the process by which the generated offspring are copied into the
parent population. The fitness of the evolved offspring is computed and used in the
selection of parents for the next generation. The lower half of the next generation's
parents is selected through binary tournaments, while the top half is taken as the
best half of the parent population sorted by fitness value. This creates elitism, as
the best-performing candidates are always retained.
The termination criterion used was the total number of generations. The main reason
for such a criterion is that each evaluation takes a minimum of 40 seconds and a
generation consists of 100 evaluations (10 parents in each cluster); a single training
run therefore requires a minimum of 100 hours of simulation.
The evolved final network is obtained as the combination of the best parent from each
cluster. This method is based on batch learning, as the fitness value at the
intersection at the end of each evaluation is used rather than the fitness value at
the actual time period where the green time of the traffic signal varies. This is a
major limitation, as the average value does not truly reflect the traffic condition
and can overestimate the performance of a solution that performs badly during peak
traffic periods and extremely well during non-peak hours. The other limitation is the
time taken for learning the parameters; this is serious and prevents real-time
learning without a high-powered multithreaded computing system.
5.7.
Q-LEARNING NEURO-TYPE2 FUZZY DECISION SYSTEM
The SET2 decision system developed in the previous section was used to evolve the
type-2 fuzzy decision system parameters and its rule base through an online batch
learning process. The fuzzy system developed decides the optimal green time for a
phase in a cycle using the flow and queue information collected during the phase in
progress. The traffic flow and queue values in the other phases were not used for
obtaining the green time. This might cause increased delay in links that do not have
right of way during the current phase. Another major shortcoming of the SET2 decision
system is the time required for evolving the type-2 fuzzy network. A better online
method that uses the current rewards received from the environment to evaluate and
modify the network needs to be developed.
Fuzzy logic systems are generally good at modelling controls based on rules mined from
expert opinions. However, they lack the ability to learn. The online learning methods
developed in the recent past mostly modify only the consequent part of the fuzzy
system [118]; the antecedent part and its membership functions are usually fixed or
designed offline. In order to perform online learning of all the parameters, a
neuro-type-2 fuzzy system has been proposed.
The proposed neuro-fuzzy architecture utilizes the advantages offered by neural
networks in learning from data together with the capabilities of a type-2 fuzzy
system. In order to reduce the complexity, Takagi-Sugeno fuzzy sets [119] were
employed instead of the Mamdani representation [120]. In simple terms, instead of
using Gaussian functions to represent the consequent part, crisp singleton constant
values were used. This greatly improves the mapping of antecedents to consequents,
although the interpretation or extraction of rules outside the decision system becomes
difficult.
The general type-2 fuzzy system (Figure 5.1) has a type-reducer before the
defuzzification stage to obtain a crisp value. The type-reducer converts the type-2
fuzzy set into an equivalent type-1 fuzzy set. This is usually performed in an
iterative manner or by using geometric methods. Both methods require the consequents
to be sorted in ascending or descending order. When this type of defuzzification is
used in neural networks, a separate memory is needed to store the correct mapping
(before and after modifying the consequents) [121], and additional overhead is
associated with re-mapping the network. In order to avoid this complexity, the
type-reducer has been moved before the inference engine in the proposed system. The
type-reduction is then performed at the level of the antecedents rather than at the
consequents, which greatly reduces the complexity of constructing the neural network.
The type-reduction is performed as the weighted average of the upper and lower firing
strengths calculated from the membership functions for the specific input. This allows
the final defuzzification to be performed as a weighted average of the firing
strengths instead of using centre-of-sets or centroid methods.
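The two weighted averages described above can be sketched as follows. This is a minimal sketch under stated assumptions: the type-reduction weights are assumed to be normalized by their sum, and the defuzzification assumes crisp Takagi-Sugeno singleton consequents as introduced earlier.

```python
def type_reduce(mu_lower, mu_upper, w_lower, w_upper):
    """Antecedent-level type reduction: collapse the lower and upper
    membership grades of an input into a single firing value as their
    weighted average (the weights sit on the layer-2/layer-3 synapses)."""
    return (w_lower * mu_lower + w_upper * mu_upper) / (w_lower + w_upper)

def defuzzify(firing, consequents):
    """Weighted-average defuzzification over rule firing strengths and
    crisp singleton consequents (no centre-of-sets or centroid step)."""
    total = sum(firing)
    return sum(f * c for f, c in zip(firing, consequents)) / total
```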
Training can be performed in an online manner using Q-learning [122]. The basics of
Q-learning techniques were introduced in Chapter 2. In Q-learning, the best
state-action pair is evolved over time by actively adjusting the actions chosen for a
specific state to maximize the overall scalar reinforcement reward received over time.
In conventional learning techniques, the desired output for a specific input state is
assumed to be known a priori, and the mean squared error is computed as the squared
difference between the actual and desired outputs. In the traffic signal timing
optimization problem, the desired output is not known. The adjustment therefore needs
to be performed using just a scalar reward value that does not indicate the desired
output but is indicative of the performance of the system. Q-learning is useful in
such learning problems.
5.7.1.
Proposed Neuro-fuzzy decision system
The proposed neuro-type-2 fuzzy decision system consists of seven layers, as shown in
Figure 5.17.
Layer 1 is the input layer that takes in the values of flow, queue and rate of change
in flow. Each input is n-dimensional, where n refers to the number of phases.
Layer 2 is the fuzzification layer, where the inputs are clustered into three regions.
Each cluster is modelled using a type-2 Gaussian function with a fixed mean and lower
and upper bound sigma values. Each node in layer 2 therefore produces two outputs: the
lower and upper membership grades corresponding to the input.
Layer 3 is the type reducer (Figure 5.18), where the type-2 fuzzy input is converted
into an equivalent type-1 fuzzy set. The synapses between layer 2 and layer 3 assign
the weights corresponding to the lower and upper membership functions. The output of
layer 3 can be represented as in (5.31):
y_{j,c} = w_{j,c}^{l} · μ_{j,c}^{l} + w_{j,c}^{u} · μ_{j,c}^{u}        (5.31)

where c is the cluster index corresponding to the input, the superscripts l and u
denote the lower and upper bounds, and j denotes the input.
Layer 4 represents the rules used to control traffic; each node in layer 4 represents
a rule. The output of the layer is the product of the firing values from each of the
inputs. Since a product T-norm is used in layer 4, normalization is essential and is
performed in layer 5.
The structure of the rules is significantly different from the rules used in the
previous decision systems, as it includes the inputs for all phases, for example:
"If flow(phase 1) is low and flow(phase 2) is low and ... flow(phase n) is low and
queue(phase 1) is low and ... rate of flow(phase n) is low, then green(phase 1) is
0.125 and ... green(phase n) is 0.250"
The output from layer 5 is the firing value of each rule. However, there is no
corresponding consequent assigned as is done in a Mamdani system.

Figure 5.17. Structure of the proposed neuro-type-2 fuzzy decision system (QLT2)

The weights of the synapses connecting layer 5 with layer 6 represent the consequents.
Layer 6
consists of n nodes, where n corresponds to the number of phases. The output of the
layer is calculated as a weighted average, as shown in (5.32):

g_i = Σ_{k=1}^{m} w_{ik} f_k        (5.32)

where m is the number of rules, f_k is the normalized firing strength of rule k, and i
indexes the outputs of layer 6. The final layer consists of a single node that
provides the Q value as the output. The Q value is represented as in (5.33):

Q = Σ_{i=1}^{n} v_i g_i        (5.33)

where v_i are the output weights.
The values of the mean, the upper and lower bound sigma values, and the other weights
are adjusted using back-propagation of the squared error value. An adaptive learning
rate was used during the back-propagation, as in [123]. The error is calculated using
(5.34):

E = (1/2) (Q_target - Q(s, a))^2        (5.34)

Q_target = R + γ max_{a'} Q(s', a')        (5.35)

where R is the reinforcement received for taking a specific action in a state, γ is
the discount factor, which is chosen as 0.1, Q(s', a') is the Q value for the next
state, and Q(s, a) is the current Q value due to the action chosen in the specific
state. Choosing a low discount factor ensures that the update is based on the latest
computed Q values. It is much easier to compute the maximum Q value if a tabular
structure is maintained. However, since the state and action spaces are continuous,
they cannot easily be represented using tables and need to be approximated using
generalized models such as neural networks. Since the neural network is continuously
trained to give the maximum value of Q, the max operator in (5.35) can be removed, and
the update becomes similar to the SARSA algorithm. An eligibility trace was not
included in this work, as value assignment is difficult.
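The SARSA-like update that results from dropping the max operator can be sketched as follows. The function names are illustrative; the discount factor of 0.1 is the value stated above.

```python
GAMMA = 0.1  # low discount factor, as chosen in the text

def td_error(reward, q_next, q_current, gamma=GAMMA):
    """Temporal-difference error for the SARSA-like update: since the
    network is trained toward the maximum Q, the max over actions is
    dropped and the next state's Q value is used directly."""
    target = reward + gamma * q_next
    return target - q_current

def squared_loss(reward, q_next, q_current, gamma=GAMMA):
    """Squared error that is back-propagated through the neuro-fuzzy
    network to adjust the means, sigmas and weights."""
    return 0.5 * td_error(reward, q_next, q_current, gamma) ** 2
```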
The reward value, or the reinforcement received from the road network at each
evaluation period, is the inverse of the total delay experienced at the intersection,
plus the discounted rewards from the adjacent intersections:

R = 1000 / D + Σ_j α_j R_j^in + Σ_j β_j R_j^out        (5.36)

where R is the reinforcement value at the current intersection and D is the total
delay experienced there; α_j is the ratio of the number of phases during which
vehicles are released into the network to the number of phases at the incoming agent
intersection, and β_j is the ratio of the number of phases in the current intersection
during which vehicles are received to the number of phases at the intersection. The
constant 1000 is used only to scale the number; any positive value could be used here.
Figure 5.18. Structure of type-2 fuzzy system with modified type reducer
The communication between adjacent agents is an essential component. Without
communication of the reward or reinforcement value of the adjacent intersections, the
computed signal green timing would cause higher delay. Communication of the Q-values
could improve the performance of the system to a great extent; however, it is
difficult, as the correct state-action pairs must be known and a large memory would be
required to store all the values. After the initial fine tuning, it is possible to
restart the learning after every hour of simulation to keep the network updated.
5.7.2.
Advantages of QLT2 decision system
1. Better online learning capability than the SET2 decision system.
2. The green times of all phases are computed at the end of each cycle using a
single decision system, instead of multiple fuzzy networks or a fuzzy network applied
sequentially.
3. Fewer evaluations are required than for SET2, therefore requiring considerably
less time to train the network.
5.8.
SUMMARY
In this chapter, four different types of decision systems were presented. Two of the
proposed agent systems were designed based on heuristics and historical traffic
volume, with a rule base developed using a deductive reasoning technique. The T2DR
decision system uses a weighted-input approach for coordination between agents,
together with an iterative Karnik-Mendel defuzzification procedure. GFMAS uses an
internal belief-model-based architecture in which the communicated data is an integral
part of the decision system; its defuzzification is performed by utilizing the
geometric properties of the consequent. The other two decision systems are adaptive in
nature with online learning capability. SET2 uses a symbiotic evolutionary genetic
algorithm to evolve the type-2 fuzzy system parameters and rule base, with a fitness
function shared between neighbouring agents. The QLT2 decision system performs online
adjustment of the weights and parameters of the neuro-fuzzy network using
back-propagation of the temporal difference error in the computed Q-values. The reward
value used to compute the error is obtained as the sum of the discounted rewards from
the intersections connected to the agent.
CHAPTER 6
SIMULATION PLATFORM
This chapter explains in detail the simulation platform used as a test bed to evaluate
the performance of the proposed multi-agent architecture for urban traffic signal
control. The simulation platform models a large, complex and realistic problem that
replicates real-world scenarios efficiently. The chapter details the performance
evaluation metrics used to evaluate the applied control algorithms correctly, and
explains the benchmarks used to compare the proposed multi-agent controllers.
6.1
SIMULATION TEST BED
In order to experiment with different strategies for the application of multi-agent
systems to dynamic traffic management and to examine their applicability, a suitably
designed test bed is an essential requirement. The main requirements of the test bed
developed are the following [124]:
- The multi-agent traffic management system should be easily configurable.
- The logic developed should be easy to interpret in traffic terms.
- It should be possible to create simulated scenarios that best represent real-world
conditions with minimum assumptions.
- It should provide the complexities associated with urban road intersections, such
as the limitations of intersection geometry, vehicle merging, etc.
Isolated intersections are commonly used to evaluate newly developed traffic signal
controls. The main drawbacks of using an isolated intersection are as follows:
- The traffic input does not represent the real traffic flow experienced in urban
road networks.
- The effect of variation in traffic flow in the adjacent network is not captured.
- The effects of queue overflow are not represented.
- Platoon formation is not efficiently reproduced.
These drawbacks critically affect the evaluation of traffic-adaptive signal control
systems, even though isolated intersections are sufficient for evaluating smaller
traffic networks where the variation in traffic is limited. This necessitates the use
of interconnected traffic networks.
Figure 6.1. Layout of the simulated road network of Central Business District in
Singapore
Based on the above requirements, the traffic network selected to evaluate the
performance of the proposed multi-agent architecture is a section of the Central
Business District (CBD) of Singapore, shown in Figure 6.1. This section of the road
network is considered one of the busiest and experiences frequent traffic jams due to
extreme variations in traffic flow during the peak periods and on different days of
the week. The network faces extreme traffic variations due to the presence of a large
number of commercial offices and shopping centres, which cause the vehicle count to
increase considerably during the morning office hours and during evening shopping
periods well beyond office timings. This traffic network is significantly bigger than
those developed in [107, 125-128].
The section of the network consists of one-way links, two-way links, major and minor
priority lanes, signalled right- and left-turning movements, merge lanes, etc., which
makes the CBD the perfect test bed to simulate all of these traffic conditions
efficiently.
6.2
PARAMICS
The network was modelled using version 6.0 of the PARAMICS software. PARAMICS is a
microscopic traffic simulation suite developed by Quadstone Ltd [129]. Figure 6.2
shows a screenshot of the Paramics Modeller software. The Modeller is used to define
the characteristics of the traffic network, its geometry, the amount of traffic to be
simulated, and the maximum capacity of the network. Paramics allows the traffic
process to be simulated at the level of individual vehicles. The functionality of the
Modeller can be further enhanced using a customized API: user-defined plug-ins,
written in C++ and compiled using the Microsoft Visual C++ compiler, allow users to
retrieve traffic simulation information from Paramics and send back control actions.
Critical information such as flow, queue and rate of flow can be obtained via the loop
detectors in real time while the simulation is running. The induction loop detectors
are coded in the simulated traffic network at the stop line of each intersection.
Figure 6.2. Screenshot of Paramics modeller software
6.3
ORIGIN-DESTINATION MATRIX
Details of the simulated traffic network model are as follows:

Total number of nodes:                    130
Total number of links:                    330
Total number of zones:                    23
Total number of loop detectors:           132
Total number of vehicle types simulated:  16
Total number of agents:                   25
In order to reflect the true traffic conditions that exist in the urban traffic network of
Central Business District of Singapore, the traffic information needs to be collected
144
from the real network and converted into suitable origin-destination matrix. The
required data is collected from the Land Transport Authority of Singapore for three
separate days. LTA uses GLIDE which is a modified version of SCATS (Explained in
Chapter 3). A screenshot of SCATS traffic control is shown in Figure 6.3.
Figure 6.3. Snapshot of SCATS traffic controller and the controlled intersection
SCATS assigns a set of pre-determined signal plans to each intersection according to
the traffic condition experienced during the specified time period. The data
collected [108] from LTA therefore consisted of the pre-determined signal plans used for each
intersection (computed using Webster's signal plan formulation), the order
of signal plan usage over 24 hours for a period of three days, the change in
cycle length during the execution of each signal plan, and the total number of vehicles
that crossed each intersection in each lane during the execution of the
specific signal plan. The data is sampled at an interval of 15 minutes, which is the
usual sampling time for the operation of SCATS. The nominal cycle length is
obtained as the average cycle length of the signals during the 15 minute sampling
period.
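The nominal cycle length for one sampling window is thus just the arithmetic mean of the cycle lengths observed in it; the values below are illustrative, not LTA data:

```python
def nominal_cycle_length(cycle_lengths):
    """Average cycle length (seconds) over one 15-minute sampling window."""
    return sum(cycle_lengths) / len(cycle_lengths)

# Cycle lengths observed during one sampling window (illustrative values).
print(nominal_cycle_length([118, 122, 120, 116, 124]))  # 120.0
```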
Figure 6.4. Origin-destination matrix indicating trip counts
Figure 6.5. Traffic release profile for a six hour, two peak simulation
Using the vehicle count data sampled over a period of three days, an origin-destination
matrix is constructed. Road networks possess input zones that act as sources
and sinks for the release and termination of vehicles. The number of vehicles released
from a start zone to an end zone, along with the hourly vehicle
release rate, is specified in the demand profile, which can be adjusted by using the
divisor (Figure 6.4).
The profile of the traffic can be adjusted using the divisor value in the profile editor.
Figure 6.5 shows the typical traffic profile for a six hour, two peak simulation and the
vehicle release pattern. The Modeller software ensures that the number of vehicles
released into the network equals the percentage of vehicles indicated in the
profile (Figure 6.6) from the specified demand matrix. The vehicles are released
randomly even though the vehicle count is pre-fixed for the release period (in our case
it was fixed at 15 minutes to be consistent with the SCATS data sampling period). This
property reflects the stochastic nature of traffic witnessed in the real world.
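This release mechanism can be sketched as follows: the vehicle count per 15-minute period is fixed by the demand matrix, while the individual departure instants within the period are drawn at random. This is a minimal sketch of the idea, not the actual PARAMICS implementation.

```python
import random

def release_times(count, period_start, period_len=15 * 60, seed=None):
    """Draw `count` departure times (seconds) uniformly at random within one
    release period, so the total per period is fixed by the OD matrix while
    individual departures remain stochastic."""
    rng = random.Random(seed)
    return sorted(rng.uniform(period_start, period_start + period_len)
                  for _ in range(count))

# 480 vehicles fixed for the first 15-minute period, released at random instants.
times = release_times(480, period_start=0, seed=42)
print(len(times))  # 480
```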
Figure 6.6. Profile demand editor for a twenty four hour eight peak traffic simulation
6.4
PERFORMANCE METRICS
Performance of the proposed multi-agent signal control is evaluated based on two
measures.
1. Mean delay of vehicles
2. Mean speed of vehicles currently inside the network
6.4.1.
Travel Time Delay
Delay at each signalized intersection is computed as the difference between the actual
travel time of vehicles across the intersection and the travel time in the absence of signal
control. The actual delay is calculated as the sum of the acceleration, deceleration and
stopped-time delay for each vehicle at an intersection. In a microscopic traffic
simulation platform like PARAMICS, the delay value calculated at each intersection
of the network at every time step of the simulation is stored in memory, and the
average time delay TAD is calculated as
T_{AD} = \frac{\sum_{i=1}^{n} T_D}{T_v}    (6.1)
where n is the number of intersections in the road network, TD is the time delay
experienced by vehicles at each intersection and Tv is the total number of vehicles that
entered and left the network during the measurement period. The delay parameter has
been widely adopted for characterizing traffic signal control schemes. In [130], it was
shown that the average queue size at any intersection is directly proportional to
the average delay experienced by a vehicle inside the network, which makes the average
time delay a suitable quantity for classifying the congestion level at an
intersection. Another important work, [131], showed the linear relationship between
detector occupancy and average delay. Moreover, the Highway
Capacity Manual (HCM2000) [132] uses the average control delay incurred by
vehicles at the intersection to classify the level of service offered by each signal
control. All of these works justify the suitability of the average delay
parameter for evaluating the performance of traffic signal control.
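A direct computation of equation (6.1), with illustrative per-intersection delay totals, looks like:

```python
def average_time_delay(intersection_delays, total_vehicles):
    """Equation (6.1): network-wide average delay is the sum of the delay
    accumulated at each of the n intersections, divided by the number of
    vehicles that entered and left the network in the measurement period."""
    return sum(intersection_delays) / total_vehicles

# Three intersections accumulating 1200, 800 and 1000 vehicle-seconds of delay,
# with 50 vehicles completing trips (illustrative numbers).
print(average_time_delay([1200.0, 800.0, 1000.0], total_vehicles=50))  # 60.0
```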
6.4.2.
Mean Speed
The current mean speed of vehicles is the other parameter used to evaluate the performance
of the proposed architecture. It is essential to use the speed parameter along with the
delay parameter to avoid errors caused by inaccuracy in the calculated travel delay
value and in the number of vehicles inside the network [33, 62]. The current mean
speed of vehicles is inversely proportional to delay. These two parameters reflect the overall
traffic condition in the road network and have been adopted in this work.
\bar{v} = \frac{1}{n} \sum_{i=1}^{n} v_i    (6.2)

where \bar{v} is the average vehicle speed inside the network, v_i is the speed of
individual vehicle i, and n is the number of vehicles released into the network.
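Equation (6.2) is simply the arithmetic mean of the individual vehicle speeds, e.g.:

```python
def mean_speed(speeds):
    """Equation (6.2): arithmetic mean of individual vehicle speeds (kmph)."""
    return sum(speeds) / len(speeds)

# Illustrative speeds for three vehicles currently inside the network.
print(mean_speed([20.0, 30.0, 40.0]))  # 30.0
```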
6.5
BENCHMARKS
It is extremely difficult to find the most suitable benchmark for an urban traffic network
with a large number of interconnected intersections. Some of the major factors that
affect the selection of benchmarks are as follows:
- Existing traffic signal control algorithms have been developed for specific traffic networks of varying complexity.
- The traffic patterns on which these networks were tested differ from those experienced in Singapore.
- Most of the controllers were tested only on isolated intersections.
- The internal workings of commercially available traffic controllers like SCATS, SCOOT and GLIDE are proprietary and not publicly known.
- The assumptions used differ, limiting the application of the developed signal controls to this system.
For these reasons, the Hierarchical Multi-agent System (HMS) traffic signal control
is used as a benchmark in this study. The working of the HMS has been explained in
detail in Chapter 3. The HMS was developed specifically for the Central
Business District of Singapore (the same network considered in this study), and
exactly the same traffic scenarios, patterns and software have been used to evaluate its
performance. This removes the need to recode the algorithm, which might
produce performance poorer than intended.
The other benchmark used is the version of GLIDE simulated in [33, 62]. Though it
has been shown in [33] that HMS performs better than the simulated version of GLIDE, a
comparison is still appropriate here, as GLIDE was tested on the same
traffic network as the proposed signal control and for similar traffic patterns.
6.6
SUMMARY
This chapter describes the various implementation details concerning the modelling of
the urban traffic network using the PARAMICS software. It details how the data
collected from the Land Transport Authority of Singapore is converted into an origin-destination matrix that reflects the true traffic pattern experienced in the urban traffic
network. The chapter also details the performance metrics and the benchmarks used to
evaluate performance. The next chapter discusses the simulation results for all the
experiments carried out in this study to evaluate each of the proposed multi-agent systems developed.
CHAPTER 7
RESULTS AND DISCUSSIONS
In this chapter, the various experiments performed to evaluate the
performance of the proposed distributed multi-agent system based traffic signal
controls developed in this study are detailed. Different types of simulation scenarios
were designed, consistent with the tests performed in [108], to effectively test the
robustness and effectiveness of the proposed multi-agent systems. In addition to the tests
in [108], lane closures and incidents were simulated to test the responsiveness and
stability of the proposed multi-agent based traffic signal controls, and the effect of
variation in infrastructure capacity on them. An in-depth analysis of the simulation results
and the reasons for the improved performance of the proposed traffic signal controls over
the benchmark signal controls, HMS (Hierarchical Multi-agent System) and GLIDE
(Green Link Determining), is presented.
7.1.
SIMULATION SCENARIOS
Two types of simulation scenarios were used to evaluate the performance of the
proposed multi-agent based traffic control system. They are as follows:
1. Peak traffic scenarios
2. Events
While 'peak traffic scenarios' simulate the traffic flow pattern observed in urban road
networks, 'events' simulate traffic scenarios where there is a change in the capacity
of the road network.
7.1.1.
Peak Traffic Scenarios
Three types of peak traffic scenarios were used to evaluate the performance of the
multi-agent traffic signal controllers:
- Six hour, two peak simulation
- Twenty four hour, two peak simulation
- Twenty four hour, eight peak simulation
The six hour, two peak traffic simulation is designed to test the response of the
proposed multi-agent traffic signal controls to the peak traffic conditions experienced
during the morning and afternoon periods. This test verifies the
efficiency of the proposed multi-agent systems under high traffic conditions experienced
within a short period of time. The twenty four hour, two peak simulation reflects the true
condition experienced in the Central Business District of Singapore during the morning and
evening periods. This scenario creates an infinite horizon problem that replicates the
increased stress experienced by the road networks during peak periods. The final
traffic condition experimented with is the twenty four hour, eight peak traffic condition.
This is a fictitious scenario in which the urban traffic network is subjected
to extreme stress. Multiple peaks are placed close to each other, creating
an increased level of stress and allowing the response of the proposed
signal controls to dynamic variation in traffic conditions to be tested.
7.1.2.
Events
Two different types of events were simulated to verify the responsiveness of the
proposed multi-agent traffic signal controls and the effect of variation in the capacity of
the road network. They are as follows:
- Link and lane closures
- Incidents and accidents
Link and lane closures simulate conditions where the infrastructure is unavailable
for use due to pre-planned events. Incidents and accidents
simulate conditions that restrict traffic flow for reasons not foreseen
and accounted for during signal timing optimization.
7.2.
SIX HOUR, TWO PEAK TRAFFIC SCENARIO
The two peak simulation is a typical traffic pattern in which the heavy morning and
afternoon traffic conditions are simulated. The traffic release pattern and the number of
vehicles released into the road network during different periods of the simulation are
shown in Figure 7.1.
Figure 7.1. Vehicle release profile for a six hour, two peak traffic scenario
Figure 7.1 shows the approximate count of vehicles released into the network
during the simulation period. The actual traffic release is random and is divided
into fifteen minute periods. The number of vehicles released during each
fifteen minute period is fixed using the origin-destination matrix.
Table 7.1. Mean travel time delay and speed of vehicles for a six hour, two peak
traffic scenario

Control      Total mean delay        Current mean speed
Technique    (sec per vehicle)       (kmph)
             1st peak   2nd peak     1st peak   2nd peak
QLT2         201        209          28         26.5
QLT1         222        218          20.75      23.08
SET2         209        213          23.3       25.16
GAT2         224        221          20.5       21.5
GFMAS        222        219          22.4       14.4
T2DR         224        222          20.32      17.92
HMS          400        470          11.2       12.8
GLIDE        500        600          8          4.8
Table 7.1 shows the comparison of results in terms of travel time delay and mean
speed of vehicles inside the road network for the proposed multi-agent signal controls
and the benchmarks at the end of the peak traffic period. It can be seen that there is a
significant improvement in the delay experienced by vehicles during the peak traffic
time when using the proposed multi-agent based traffic signal controls in comparison to
HMS and GLIDE. For both HMS and GLIDE, the performance
degrades during the second peak period, which can be attributed to the increased settling
time after the end of the first peak period.
Among the proposed traffic signal controls, QLT2 performed the best, followed by the
SET2 signal control. The GAT2 and QLT1 signal controls performed on par with the
heuristics based fuzzy signal controls. Table 7.1 indicates the point value of
delay experienced at the end of the peak traffic period. This value indicates the
performance of the signal control at the start of the increase in vehicle count. For
proper analysis, this value alone might not be sufficient, as the delay is experienced a short
period of time after the increase in traffic count. Hence the main reason for the
increase in travel time delay needs to be analyzed. Travel time delay is directly
proportional to the number of vehicles retained inside the network, and traffic count
information at the end of the peak period alone would not convey much. Hence
an hourly analysis of the number of vehicles released and retained inside the network
is presented in Table 7.2.
Table 7.2. Total number of vehicles inside the network at the end of each hour of simulation for six hour, two peak traffic scenario

Number of vehicles inside network:

Simulation   GLIDE   HMS    GFMAS   T2DR   QLT2   QLT1   SET2   GAT2   Total vehicles
time                                                                   released from start
0:00:00      28      35     47      17     36     38     36     36     50
1:00:00      694     700    650     416    440    577    440    440    5565
2:00:00      1671    1200   373     878    399    383    469    538    14505
3:00:00      877     750    83      106    95     100    69     79     20120
4:00:00      800     710    622     590    589    539    426    440    25735
5:00:00      1850    1000   398     845    394    377    429    512    34625
6:00:00      900     250    103     101    91     86     93     76     40240

% vehicles retained:

Simulation   GLIDE   HMS    GFMAS   QLT2   QLT1   SET2   GAT2
time
0:00:00      56%     70%    94%     72%    76%    72%    72%
1:00:00      12%     13%    12%     8%     10%    8%     8%
2:00:00      12%     8%     3%      3%     3%     3%     4%
3:00:00      4%      4%     0%      0%     0%     0%     0%
4:00:00      3%      3%     2%      2%     2%     2%     2%
5:00:00      5%      3%     1%      1%     1%     1%     1%
6:00:00      2%      1%     0%      0%     0%     0%     0%
Since the entire traffic flow generation mechanism is stochastic in nature, a
comparison between different control strategies is valid only when the number of
vehicles released into the road network at the end of specific time periods remains
close across runs. This condition is satisfied by using a fixed traffic profile and
origin-destination matrix for all simulations. Comparison is made with respect to the
best performing benchmark signal control, HMS.
The results clearly indicate the effectiveness of the proposed multi-agent based signal
controls in clearing traffic at intersections and improving the current mean speed of
the vehicles. To prove the repeatability and robustness of the results, ten different
simulations were conducted for the scenario with different initial random seeds. The
standard deviation of the time delay for the ten simulation runs and the 90%, 95% and 99%
confidence intervals were calculated and are shown in Table 7.3. As can be seen
from Table 7.3, even for the 99% confidence interval, the delay value fluctuation
is restricted to a maximum of ±10.7 seconds. The QLT2 signal controls showed the
lowest standard deviation and confidence interval values, while the T2DR signal controls
showed the highest variation. The main reason for this can be attributed to the
heuristic nature of the T2DR control and the improved learning capability of QLT2. Even
though the QLT1 controls have similar learning ability, the type-1 fuzzy system
reduces the ability of the signal controls.
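The standard deviations and confidence-interval half-widths reported here can be reproduced with the usual normal-approximation formula; the sample delays below are illustrative, not the actual thesis runs.

```python
import math

def confidence_interval(samples, z):
    """Return (sample standard deviation, half-width of the normal-approximation
    confidence interval for the mean); z = 1.645, 1.960, 2.576 for 90/95/99%."""
    n = len(samples)
    mean = sum(samples) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    return sd, z * sd / math.sqrt(n)

# Final mean delays (seconds) from ten runs with different seeds (illustrative).
delays = [201, 205, 198, 203, 207, 199, 202, 204, 200, 206]
sd, ci95 = confidence_interval(delays, z=1.960)
print(round(sd, 2), round(ci95, 2))  # 3.03 1.88
```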
Table 7.3. Standard deviation and confidence interval of the mean travel time delay
for six hour, two peak traffic scenario

Simulation time     1:00:00  2:00:00  3:00:00  4:00:00  5:00:00  6:00:00

GFMAS  S.D          4.4      10.0     12.0     9.7      11.1     5.9
       90% CI       2.3      5.2      6.2      5.0      5.8      3.1
       95% CI       2.7      6.2      7.4      6.0      6.9      3.7
       99% CI       3.6      8.1      9.7      7.9      9.0      4.8
T2DR   S.D          5.2      8.8      13.1     10.1     12.9     7.0
       90% CI       2.7      4.6      6.8      5.3      6.7      3.6
       95% CI       3.2      5.5      8.1      6.3      8.0      4.3
       99% CI       4.2      7.2      10.7     8.2      10.5     5.7
QLT2   S.D          4.4      5.1      4.9      5.8      4.1      3.2
       90% CI       2.3      2.7      2.5      3.0      2.1      1.7
       95% CI       2.7      3.2      3.0      3.6      2.5      2.0
       99% CI       3.6      4.2      4.0      4.7      3.3      2.6
QLT1   S.D          3.7      4.8      5.9      4.8      5.3      4.2
       90% CI       2.3      5.2      6.2      5.0      5.8      3.1
       95% CI       2.7      6.2      7.4      6.0      6.9      3.7
       99% CI       3.6      8.1      9.7      7.9      9.0      4.8
GAT2   S.D          2.9      4.7      6.0      4.6      3.7      3.4
       90% CI       1.5      2.4      3.1      2.4      1.9      1.8
       95% CI       1.8      2.9      3.7      2.9      2.3      2.1
       99% CI       2.4      3.8      4.9      3.7      3.0      2.8
SET2   S.D          3.1      4.7      4.9      3.4      3.7      2.9
       90% CI       1.6      2.4      2.5      1.8      1.9      1.5
       95% CI       1.9      2.9      3.0      2.1      2.3      1.8
       99% CI       2.5      3.8      4.0      2.8      3.0      2.4
Figure 7.2 shows the average value of the time delay experienced by vehicles for a six
hour, two peak traffic profile applied to a traffic network implementing the proposed
multi-agent based traffic signal controls. The best results were obtained with the
multi-agent neuro-type2 fuzzy Q-learning (QLT2) traffic signal control. This behavior
can also be observed in the mean speed of vehicles inside the network, shown in
Figure 7.3. The QLT2 based agent system improved the speed of vehicles during the
peak period, though all controls performed equally well under low traffic conditions.
Figure 7.2. Mean travel time delay of vehicles for six hour, two peak traffic scenario
Figure 7.3. Average speed of vehicles inside the network for six hour, two peak traffic
scenario
For better understanding, a graphical representation of the number of vehicles retained
inside the network at various instants of the simulation is shown in Figure 7.4.
It should be highlighted at this point that the speed comparison shown in
Figure 7.3 is a sampled version of the real speed data; the actual data fluctuates to a
great extent and cannot be presented in a single plot. The actual data is shown in
Figure 7.5.
Figure 7.4. Total number of vehicles inside the road network for six hour, two peak
traffic
Figure 7.5. Actual mean speed of vehicle inside the road network
Table 7.4 shows the percentage improvement of the observed results over the HMS
signal control. The QLT2 signal controls performed the best, followed
closely by the SET2 signal controls. GFMAS and T2DR performed better than HMS,
but their control was not smooth and fluctuated to a large extent, as shown in Figures 7.2
and 7.3. The online trained QLT1 signal control performed well; however, its
performance did not match that of the QLT2 control.
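The percentage improvements tabulated in this chapter follow the usual relative-change formula; a sketch with illustrative numbers (not the tabulated values):

```python
def pct_delay_improvement(proposed, baseline):
    """Percentage reduction in travel time delay relative to the baseline
    (HMS in this chapter's comparisons)."""
    return round(100 * (baseline - proposed) / baseline, 1)

# Illustrative: a proposed control at 190 s/veh against a 400 s/veh baseline.
print(pct_delay_improvement(190, 400))  # 52.5
```

For mean speed the sign of the change flips, since a higher speed is an improvement.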
Table 7.4. Percentage improvement over HMS signal control

Control Technique   Travel time delay   Mean speed
QLT2                52.8                75
QLT1                49.4                52
SET2                51.4                66
GAT2                48.8                58
GFMAS               46                  48
T2DR                45.1                46.9

7.3.
TWENTY FOUR HOUR, TWO PEAK SCENARIO
For the typical twenty four hour scenario with morning and afternoon peaks, ten
simulation runs with different random seeds were conducted for each of the six
proposed multi-agent traffic signal controls. The typical traffic release profile is
shown in Figure 7.6. The average values obtained over these simulation runs were used to
evaluate the performance of the developed signal controllers.
Figure 7.6. Vehicle release traffic profile for twenty four hour, two peak traffic
condition
Time (Minutes)
Figure 7.7. Total mean delay of vehicles for twenty four hour, two peak traffic
condition
Figure 7.7 shows the comparison of the mean travel time delay experienced by
vehicles in response to a twenty four hour, two peak traffic simulation. The QLT2 signal
control performed the best under this simulation scenario. The second best
performance was that of the SET2 signal control; however, during the
second peak period its performance was equal to that of the GAT2 signal control. This
observation is supported by the variation in mean speed of vehicles seen in Figure 7.8.
The variation in speed clearly indicates the superior performance of QLT2 and the low
performance of the T2DR signal control in comparison to all other proposed multi-agent
signal controls.
Time (Minutes)
Figure 7.8. Average speed of vehicles inside the network for twenty four hour, two
peak traffic scenario
As can be seen from Figure 7.9, there is a considerable amount of fluctuation in the
number of vehicles retained inside the road network at the beginning of the second
peak period. This phenomenon was also observed in the traffic simulation
scenario of [108]. It was not seen prominently in HMS, as that signal
control was not able to reduce the number of vehicles after reaching the peak value
and continued to have a flat profile.
Time (Minutes)
Figure 7.9. Vehicles inside the network for a twenty four hour, two peak traffic
simulation scenario
For a more detailed analysis, Table 7.5 shows the comparison of the travel time
delay, speed and number of vehicles retained inside the network at the end of the peak
periods. QLT2 performs considerably well in comparison to the other multi-agent models,
followed closely by SET2. A 47% improvement was observed in the mean time delay
and an 84% improvement in the speed when using QLT2 over the HMS signal control. The
heuristically designed T2DR signal control also produced better results than the online
trained HMS signal control, as can be observed in Table 7.6. The main reason for its
superior performance is that its rule base was specifically selected by
comparing the delay with the HMS signal control. The performance of the SET2 signal
control degraded during the second peak period, mainly because of the traffic
pattern used for training the signal control.
The SET2 signal control was not able to capture the dynamics of the traffic sufficiently.
Further, the green time of each phase was calculated at the end of the phase instead of
at the end of the cycle time. This increases the delay, as vehicles in lanes without right
of way are made to wait for a longer period. It can also be seen that all proposed
controllers performed well without large fluctuations, as the change in traffic count is
slower in this traffic simulation scenario.
Table 7.5. Comparison of mean delay, speed and number of vehicles for twenty four
hour, two peak traffic scenario

Control     Total mean delay       Current mean speed     Vehicles inside
Technique   (sec per vehicle)      (kmph)                 network (Veh)
            1st peak   2nd peak    1st peak   2nd peak    1st peak   2nd peak
GLIDE       600        430         9.6        8           1800       1650
HMS         420        340         12.8       12.8        1200       1050
QLT2        208.5      193.5       28.1       28.7        418        514
QLT1        250.4      207.8       18.6       24.9        744        533
SET2        222.4      198.7       22.9       27.1        463        538
GAT2        229.5      201.2       22.1       28.5        475        550
GFMAS       256.7      212.5       17.8       24.2        703        549
T2DR        274        221         15.8       22.3        905        619
Table 7.6. Percentage improvement of travel time delay and speed over HMS control
for twenty four hour, two peak traffic scenario

Control Technique   Travel time delay   Mean speed
QLT2                47.1                84
QLT1                39.7                52
SET2                44.6                71
GAT2                43.3                59
GFMAS               38.4                41
T2DR                35                  29
The standard deviations and confidence intervals calculated using ten different
random seeds (Table 7.7) were well within permissible limits, indicating good
repeatability and responsiveness to varying traffic loads. The T2DR signal control
performed the worst of all the proposed multi-agent controls, and the
QLT2 signal control performed the best, with a standard deviation of ±2.9 seconds over
the ten simulation runs. A substantial improvement in the
standard deviation values and confidence intervals was observed during the second
peak period, mainly due to the ability of the signal controls to adapt to
repetitive peak traffic conditions.
Table 7.7. Standard deviation and confidence interval for the mean travel time delay
of a twenty four hour, two peak traffic scenario

Control   Statistic   1st peak   2nd peak
T2DR      S.D         5.1        6.2
          90% CI      2.7        3.2
          95% CI      3.2        3.8
          99% CI      4.2        5.1
GFMAS     S.D         4.3        3.2
          90% CI      2.2        1.7
          95% CI      2.7        2.0
          99% CI      3.5        2.6
QLT2      S.D         2.9        2.5
          90% CI      1.5        1.3
          95% CI      1.8        1.5
          99% CI      2.4        2.0
QLT1      S.D         3.9        5.1
          90% CI      2.0        2.7
          95% CI      2.4        3.2
          99% CI      3.2        4.2
SET2      S.D         3.8        2.9
          90% CI      2.0        1.5
          95% CI      2.4        1.8
          99% CI      3.1        2.4
GAT2      S.D         4.6        2.1
          90% CI      2.4        1.1
          95% CI      2.9        1.3
          99% CI      3.7        1.7
7.4.
TWENTY FOUR HOUR, EIGHT PEAK TRAFFIC SCENARIO
The twenty four hour, eight peak scenario is an extreme traffic pattern stress
test used to verify the integrity, robustness and responsiveness of the signal controls when
subjected to repetitive high traffic conditions within a short interval of time. The traffic
release pattern is shown in Figure 7.10.
Figure 7.10. Twenty four hour, eight peak traffic release profile
Table 7.8. Travel time delay of vehicles at the end of each peak period for twenty four
hour, eight peak traffic scenario

Total mean delay (sec per vehicle) in each peak period

Control    1st     2nd     3rd     4th     5th     6th     7th     8th
HMS        400     450     450     450     450     500     650     700
GLIDE      400     500     600     650     800     1500    2300    3200
T2DR       242     223     253     243     247     240     247     242
GFMAS      235     243     239     240     242     242     243     240
QLT2       196     207.4   205.8   207.4   206.7   208.8   210.9   210.1
QLT1       203.2   206.9   208.1   210.7   212.2   212.1   216.3   218.2
SET2       204.8   211.5   213.5   212     211     210.8   211.1   212.5
GAT2       195.7   200.3   208.5   213.4   215     216.8   215.4   214.6
It can be seen from Tables 7.8 and 7.9 that the performance of the HMS and GLIDE signal
controls starts to degrade after the fifth traffic peak period. The main reason for this
degradation is the inability to clear the vehicles present inside the network within the
short duration before the start of the next high traffic period, as can be observed
from Table 7.10. The GLIDE signal control adopts a pre-specified value for the cycle
length during off-peak periods and only changes it to another pre-specified value
during the peak period. This makes the green time allocated to each phase
insufficient to clear the vehicles when the traffic input increases, resulting in queue
spillback. The HMS [14][32] signal control also starts showing degradation in
performance after the sixth peak period, due to the increase in the number of
vehicles waiting for green time.
Table 7.9. Total mean speed of vehicles inside the network for twenty four hour, eight
peak traffic scenario

Mean speed of vehicles (kmph)

Control    1st     2nd     3rd     4th     5th     6th     7th     8th
HMS        11.2    16      22.4    12      9.6     6.4     6.4     6.4
GLIDE      11.2    8       8       8       0       0       0       0
T2DR       20.8    22.5    18.8    23      11.36   23      12.6    23
GFMAS      21.8    27.2    27.1    23      24.2    20.4    20.6    22.7
QLT2       22.5    28.596  29.496  27      27.696  26.496  26.196  29.904
QLT1       24.996  25.896  27.6    21.696  24.804  30.996  22.404  27.396
GAT2       23.904  29.004  21.204  21.9    25.704  30.3    25.296  26.4
SET2       28.296  29.796  23.7    24.996  27.396  23.004  25.104  28.8
Table 7.10. Vehicles inside the network for twenty four hour, eight peak traffic
scenario

Vehicles inside the network at each peak traffic period

Control    1      2      3      4      5      6      7      8
HMS        2000   1500   1400   1800   2500   3300   3000   3100
T2DR       700    890    710    819    853    790    793    808
GFMAS      681    848    701    781    819    754    762    798
QLT2       427    436    373    441    398    379    509    464
QLT1       379    401    387    407    330    481    347    386
SET2       443    466    564    556    507    441    441    394
GAT2       401    450    444    419    427    460    440    444
The degradation in performance can be attributed to the conflict in decisions between
agents of different hierarchies, causing smaller green times to be allocated to congested
intersections. The number of vehicles retained inside the simulated section of the
network therefore increases, as the vehicle input is kept constant based on the OD matrix,
as can be seen in Table 7.10. The results show that the proposed GFMAS signal
control does not undergo any such degradation and performs better than both GLIDE [13]
and HMS [19][32]. All of the proposed multi-agent based signal controls were able to
handle the increased influx of traffic without major degradation in their
performance.
QLT2 was the best performing multi-agent system in this traffic
scenario. Interestingly, the performance of the SET2 signal control started to degrade after the
second peak period and came closer to that of QLT1. On the other hand, the GAT2 signal control
was able to bring down the time delay and improve the speed of vehicles to a great
extent, almost equalling the performance of QLT2.
For a better understanding of the fluctuations and variations across different simulation
runs for the same traffic condition, an analysis of the standard deviation and
confidence interval of the observed time delay for every two hours of simulation is
presented in Table 7.11. Predictably, GFMAS and T2DR showed the largest standard
deviations, indicating that the observed time delay varied considerably with different
starting points.
Table 7.11. Standard deviation and confidence interval of travel time delay for twenty
four hour, eight peak traffic simulation

Simulation time (hours)  2    4    6    8    10   12   14   16   18   20   22   24

GFMAS  S.D      4.5  8.1  7.6  6.8  5.1  6.4  6.2  6.9  7.3  6.8  6.3  6.0
       90% CI   3.3  6.0  5.6  5.0  3.7  4.7  4.5  5.1  5.4  5.0  4.6  4.4
       95% CI   4.0  7.1  6.7  5.9  4.5  5.6  5.4  6.1  6.4  6.0  5.5  5.3
       99% CI   5.2  9.3  8.8  7.8  5.9  7.4  7.1  8.0  8.4  7.9  7.2  6.9
T2DR   S.D      5.2  9.3  8.3  7.2  6.1  7.2  6.1  6.5  7.6  6.9  6.1  5.8
       90% CI   2.7  4.8  4.3  3.8  3.2  3.8  3.1  3.4  4.0  3.6  3.2  3.0
       95% CI   3.2  5.8  5.1  4.5  3.8  4.5  3.7  4.0  4.7  4.3  3.8  3.6
       99% CI   4.2  7.6  6.8  5.9  5.0  5.9  4.9  5.3  6.2  5.6  5.0  4.7
QLT2   S.D      4.2  4.1  5.3  3.9  2.1  2.6  3.8  3.9  5.1  4.7  4.9  4.9
       90% CI   2.2  2.1  2.8  2.0  1.1  1.4  2.0  2.0  2.7  2.4  2.5  2.5
       95% CI   2.6  2.5  3.3  2.4  1.3  1.6  2.4  2.4  3.2  2.9  3.0  3.0
       99% CI   3.4  3.3  4.3  3.2  1.7  2.1  3.1  3.2  4.2  3.8  4.0  4.0
QLT1   S.D      3.9  2.7  3.9  3.4  3.4  4.1  4.8  5.4  2.9  2.8  4.1  3.7
       90% CI   2.0  1.4  2.0  1.8  1.8  2.1  2.5  2.8  1.5  1.5  2.1  1.9
       95% CI   2.4  1.7  2.4  2.1  2.1  2.5  3.0  3.3  1.8  1.7  2.5  2.3
       99% CI   3.2  2.2  3.2  2.8  2.8  3.3  3.9  4.4  2.4  2.3  3.3  3.0
SET2   S.D      2.4  5.2  4.5  3.8  3.3  4.1  2.7  4.3  4.8  4.4  5.1  2.2
       90% CI   1.2  2.7  2.3  2.0  1.7  2.1  1.4  2.2  2.5  2.3  2.7  1.1
       95% CI   1.5  3.2  2.8  2.4  2.0  2.5  1.7  2.7  3.0  2.7  3.2  1.4
       99% CI   2.0  4.2  3.7  3.1  2.7  3.3  2.2  3.5  3.9  3.6  4.2  1.8
GAT2   S.D      3.1  4.9  5.5  5.2  4.8  3.9  3.3  3.3  4.7  4.2  4.2  3.8
       90% CI   1.6  2.5  2.9  2.7  2.5  2.0  1.7  1.7  2.4  2.2  2.2  2.0
       95% CI   1.9  3.0  3.4  3.2  3.0  2.4  2.0  2.0  2.9  2.6  2.6  2.4
       99% CI   2.5  4.0  4.5  4.2  3.9  3.2  2.7  2.7  3.8  3.4  3.4  3.1
On the whole, a 59% improvement in time delay and a 71% improvement in vehicle
speed were observed for the QLT2 signal controls over HMS. The least
improvement was observed for the T2DR controls. It is interesting to note that the GAT2
signal control performed better than the SET2 signal control. Table 7.12 shows the
percentage improvement over the HMS signal control in reducing travel time delay and
increasing the mean speed of vehicles inside the network.
Table 7.12. Percentage improvement of travel time delay and mean speed over HMS
signal control

Control Technique   Travel time delay   Mean speed
QLT2                59.2                71
QLT1                57.3                66
SET2                54.4                62
GAT2                58.2                70
GFMAS               51                  59
T2DR                49.9                57
Figures 7.11 and 7.12 show the average time delay and speed, respectively, over
simulation runs with different random seeds. A distinct lag
between the travel time delay and the improvement in speed can be observed. Further,
QLT2 and GAT2 show a low level of fluctuation in the mean speed of vehicles,
indicating smooth and consistent control. T2DR shows the largest fluctuation and
can be deemed the least performing signal control among the proposed multi-agent
systems.
Figure 7.11. Total mean delay experienced for a twenty four hour, eight peak traffic
scenario
Figure 7.12. Mean speed of vehicles for twenty four hour, eight peak traffic scenario
Figure 7.13. Number of vehicles inside the network for twenty four hour, eight peak
traffic scenario
7.5.
LINK AND LANE CLOSURES
Link and lane closures are of practical importance, as they recreate the conditions of road blockage due to pre-organized or planned events such as the clearance of roadside trees, scheduled maintenance of traffic management equipment like VMS (variable message signs) or ERP (electronic road pricing) units, or special events like Formula One car racing. Planned events reduce the capacity of the traffic infrastructure. The total traffic handling capacity of the road network was reduced by closing down lanes in the road structure. This scenario is analogous to a step input in a control system.
The average travel time delay of vehicles under the multiple-peak traffic scenario with two major lane closures is shown in Figure 7.14. There is a shift in the total travel time delay and an increase in the delay during the start period. The main reason for this is the routing mechanism used in PARAMICS. Routing is performed by calculating the feedback costs associated with each link network-wide at every pre-specified interval (five minutes in this case). This causes the vehicles to opt for major links, thereby achieving better optimization and a lower travel time delay. This particular effect is not prominently witnessed when only a single lane is closed, as can be seen in Figure 7.15. This experiment demonstrates the reliable performance of the proposed signal controls under changes in infrastructure capacity. All the plotted results are the average of ten simulation runs.
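The feedback routing described above can be pictured as a periodic shortest-path recomputation over updated link costs. The following is a minimal sketch using a plain Dijkstra search and made-up link costs; PARAMICS' actual cost model and API are not reproduced here.

```python
import heapq

def shortest_path(links, source, target):
    """Dijkstra over a dict {(a, b): cost}; returns (cost, path)."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
    queue = [(0.0, source, [source])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node in seen:
            continue
        seen.add(node)
        if node == target:
            return cost, path
        for nxt, c in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (cost + c, nxt, path + [nxt]))
    return float("inf"), []

# Hypothetical network: a minor route A->B->D and a major route A->C->D.
links = {("A", "B"): 2.0, ("B", "D"): 2.0, ("A", "C"): 3.0, ("C", "D"): 3.0}
print(shortest_path(links, "A", "D")[1])   # minor route while it is cheap

# At the next five-minute feedback interval, the closed lane raises the
# minor link's cost, and vehicles divert to the major route.
links[("A", "B")] = 10.0
print(shortest_path(links, "A", "D")[1])
```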
Figure 7.14. Two lane closure – Mean travel time delay of vehicles
178
300
Time delay(Seconds)
250
200
150
GFMAS
GAT2
SET2
QLT1
QLT2
100
50
0
0
200
400
600
800
1000
Time (Minutes)
Figure 7.15. Single lane closure – Mean travel time delay of vehicles
7.6.
INCIDENTS AND ACCIDENTS
The incident and accident simulations attempt to replicate random, uncontrollable reductions in infrastructure capacity due to unforeseen events such as a vehicle crash. Incidents were created randomly during the first peak period of the traffic simulation using the incidents file in PARAMICS. The responsiveness of the proposed multi-agent traffic signal controls to such intermittent disturbances, which are analogous to impulse inputs, was studied using this scenario. The detection and clearance of incidents is assumed to be handled by different sub-units of the traffic management system, which currently do not come under the multi-agent signal control architecture.
Figures 7.16 and 7.17 show the scenarios simulated for this study. The incidents were created during the first peak period to examine the response and the time taken to settle down. The incidents were simulated in the link connecting Victoria St with Rochor Road, and in the link connecting Rochor Road to Bencoolen St, as indicated in the network map in chapter 6. It can be observed that two simultaneous incidents at different links degrade the performance of the proposed multi-agent traffic signal controls. Although the incidents were simulated very close to the peak traffic period, the increase in traffic was considerably smaller, and the proposed multi-agent traffic signal controls were able to handle the increased congestion effectively and bring down the average delay experienced by the vehicles.
GFMAS experienced the largest fluctuation in travel time delay, while the QLT2 signal control showed the least variation. The behaviour of all signal controls was in line with expectations from the twenty-four hour, eight-peak traffic simulation scenario; the only difference was the increased traffic experienced at the start period due to the incident. The test conclusively demonstrates the ability of the proposed multi-agent traffic signal controls to handle unexpected changes in the traffic pattern.
Figure 7.16. Single incident simulation – Multiple peak traffic scenario
Figure 7.17. Two incidents simulation – Multiple peak traffic scenario
In conclusion, the QLT2 signal control performed the best under all traffic simulation scenarios, both the regular scenarios and the event-based ones. This performance can be attributed to the online learning feature of the QLT2 signal control. Although QLT1 also has online learning capability, its performance was not equivalent to that of QLT2 because of the absence of a type-2 fuzzy decision system.
Table 7.13. Comparison of the proposed signal control methods with HMS in terms of
computation and communication

Control Method    Computational Complexity    Computational Cost    Communication Overhead
QLT2              Low                         Low                   Low
SET2              High                        Low                   Low
GFMAS             High                        High                  Low
T2DR              High                        High                  Low
The batch-learning multi-agent systems SET2 and GAT2 did not perform as well because their learning was based on average fitness function values computed over a period of three hours. This averaging dilutes the differences in fitness between the various multi-agent signal controls and reduces the ability of the system to handle the variation in traffic during the lower peak period, increasing the baseline travel time delay experienced by the vehicles and shifting the overall value upward. SET2 performed better than GAT2 because of its increased coverage of the state-action space compared to GA. Evolving the membership function parameters and the rule base as two separate entities in SET2 ensures better exploration of the state space, and since the fitness function is shared between the individual populations, the parameters can co-evolve. This is the main reason for the improved performance over HMS.
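The fitness sharing between the two populations can be sketched as follows. This is an illustrative toy, not the SET2 implementation: the encoding, the stand-in fitness function and the population sizes are all assumptions.

```python
import random

random.seed(0)

# Toy populations: membership-function parameters and rule-base variants.
mf_pop = [random.uniform(0.0, 1.0) for _ in range(6)]
rule_pop = [random.uniform(0.0, 1.0) for _ in range(6)]

def evaluate(mf, rule):
    # Stand-in for a simulation run: higher fitness when the two
    # components are well matched.
    return 1.0 - abs(mf - rule)

# Each individual accumulates credit from every pairing it takes part in,
# so its shared fitness reflects how well it cooperates with the partner
# population rather than its merit in isolation.
mf_fit = [0.0] * len(mf_pop)
rule_fit = [0.0] * len(rule_pop)
pairings = 3
for _ in range(pairings):
    partners = random.sample(range(len(rule_pop)), len(mf_pop))
    for i, j in enumerate(partners):
        f = evaluate(mf_pop[i], rule_pop[j])
        mf_fit[i] += f / pairings      # shared credit to the MF individual
        rule_fit[j] += f / pairings    # and to the rule-base individual

print(max(mf_fit), max(rule_fit))
```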
HMS signal control used actor-critic based reinforcement learning. Actor-critic networks are sensitive to exploration and require the selected actions to be randomly perturbed to improve performance; however, the developed system did not have any such feature. The second disadvantage is that the selected policy must be kept fixed for proper learning, whereas the HMS rule base was evolved using a genetic algorithm, removing the reference point for critic learning. In addition, the back-propagation method used in HMS selectively updated only the active rules based on a threshold value, and an improper choice of threshold prevents weight values and parameters from being updated. These are a few of the reasons why the proposed methods were able to perform better than the HMS signal control method. Table 7.13 shows a comparison of the proposed traffic signal controls with HMS in terms of computational complexity, computational cost and communication overhead. QLT2 has the lowest values for all of these parameters, as it uses a single iteration level each time to decide the actions and to learn, whereas SET2 requires separate memory to store the fitness values during the entire period of training pattern inputs.
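The exploration referred to above is commonly realized by perturbing the actor's deterministic output with random noise before applying it. A minimal sketch follows; the Gaussian scheme, the stand-in actor and all the numbers are assumptions for illustration, not the HMS design.

```python
import random

random.seed(1)

def actor(state):
    # Stand-in for the actor network: a deterministic green-time suggestion.
    return 30.0 + 0.5 * state

def explore(action, sigma=2.0, low=10.0, high=60.0):
    """Perturb the actor's action with Gaussian noise, clipped to valid green times."""
    noisy = action + random.gauss(0.0, sigma)
    return min(high, max(low, noisy))

state = 12.0
greedy = actor(state)          # deterministic action: 36 s of green
perturbed = explore(greedy)    # randomly perturbed action used for learning
print(greedy, perturbed)
```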
7.7.
SUMMARY
In this chapter, the various traffic simulation scenarios, as well as the results obtained, have been presented and analyzed in detail. A comparative analysis of the results shows the better performance of the proposed multi-agent traffic signal control methods over the HMS traffic signal control, and the reasons for this marked improvement have been presented and discussed. From the results, it was observed that the online-learning-based systems outperform heuristically designed controls; among the learning systems, QLT2 performed the best. The following chapter will draw a conclusion to this dissertation and explore the open avenues for future research work.
CHAPTER 8
CONCLUSIONS
This chapter concludes the dissertation and provides recommendations for the future
research that could enhance the functionality of the proposed multi-agent system.
8.1.
OVERALL CONCLUSION
The objective of this thesis was to develop a distributed, multi-agent based approach
to traffic signal timing optimization and control. The choice for a distributed approach
was motivated by the fact that a centralized traffic control approach is often not
feasible due to computational complexity, communication overhead, and lack of
scalability.
The creation of a distributed, multi-agent approach requires the subdivision of the traffic control problem into several loosely coupled sub-problems, such that the combination of the solutions of all the sub-problems together provides an approximate solution to the original traffic control problem.
In the multi-agent framework proposed in this dissertation, each agent located at an intersection tries to optimize the green timing of that intersection, rather than of the whole network, with the objective of minimizing the travel time delay and increasing the mean speed of vehicles inside the road network. To perform this, four computational intelligence based decision systems have been proposed, with type-2 fuzzy sets as the main component of the intelligent decision system. The ability of type-2 fuzzy sets to handle the uncertainties associated with the data and the stochasticity of the dynamic environment makes them an ideal candidate for use in traffic signal timing optimization.
Two of the proposed decision systems (T2DR and GFMAS) were designed based on heuristics, and the rule base for the type-2 fuzzy sets was obtained by deductive reasoning. This approach performed reasonably well during high traffic conditions; however, the performance degraded when subjected to high-stress traffic conditions.
The third proposed decision system (SET2) exhibits better adaptation than those designed using heuristic methods. It used an online batch learning method to adapt the parameters of the type-2 fuzzy sets while simultaneously evolving the fuzzy rules. The stochastic optimization technique based on a symbiotic evolutionary genetic algorithm was able to evolve the parameters better than the traditional GA approach, and the cooperative co-evolutionary approach based on fitness sharing between clusters and the neighbouring agents provided better results than GA with fitness sharing.
The last proposed decision system was an online-learning neuro-type-2 fuzzy system whose parameters were adapted every evaluation period, unlike SET2, where the parameters were updated only after the completion of a simulation run. The update is based on the objective of maximizing the overall reward received by an agent, using the back-propagation technique. This method also combined the decision systems for all the phases into a single network, unlike the other three approaches. This considerably improved the performance over all the other proposed multi-agent systems and the benchmark multi-agent system.
8.2.
MAIN CONTRIBUTIONS
The main contributions of this research lie in the conceptualization, development and application of a distributed multi-agent architecture to the urban traffic signal timing optimization problem. The significant contributions made on the design front are as follows.
The development of a generalized distributed multi-agent framework with hybrid computational intelligence based decision-making capabilities for a homogeneous agent structure. The modular concept used in the design allows the reuse of components without major modifications to their internal structure.
The development of a deductive reasoning method for constructing the membership functions and rule base of the type-2 fuzzy sets and for calculating the level of cooperation required between agents. The data were clustered manually, and the rule base created using expert knowledge was fine-tuned through trial and error to achieve a lower travel time delay and an improved mean speed of vehicles inside the road network.
The development of cooperation strategies in the multi-agent system through an internal belief model incorporating communicated neighbour-agent status information. Two different structures were investigated: one with the communicated neighbour status data as an integral part of the decision system, and one with it as an auxiliary input external to the decision system.
The development of a symbiotic evolutionary learning method for co-evolving the membership functions and rule base of the type-2 fuzzy decision system. The general symbiotic evolutionary method was modified to co-evolve the cluster mean and spread along with the number of rules and the significant inputs in each rule. Comparison with genetic algorithm based evolution showed improved performance when the modified symbiotic evolutionary learning was used to evolve the parameters of the type-2 fuzzy sets.
The development of a modified Q-learning technique with shared reward values for solving the distributed urban traffic signal control problem. The general Q-learning method was adapted to a distributed problem by sharing the reward values to improve the global view and prevent premature convergence.
The development and relocation of a modified type-reducer using neural networks to reduce the computational complexity associated with the sorting and defuzzification process in interval type-2 fuzzy sets.
The development of traffic simulation scenarios to test the reliability and responsiveness of the developed traffic signal controls.
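For context, the sorting-based type reduction that a neural type-reducer can replace is typically the iterative Karnik-Mendel procedure. The sketch below is a simplified illustration of one KM end-point computation, not the thesis's modified reducer; the rule centroids and firing strengths are made up.

```python
def km_endpoint(y, f_lower, f_upper, right=True):
    """Karnik-Mendel iteration for one end-point of the type-reduced interval.

    y: consequent centroids sorted in ascending order;
    f_lower / f_upper: lower and upper firing strengths per rule.
    """
    n = len(y)
    f = [(lo + hi) / 2.0 for lo, hi in zip(f_lower, f_upper)]
    while True:
        yk = sum(fi * yi for fi, yi in zip(f, y)) / sum(f)
        # Switch point: rules with centroids at or below yk versus above it.
        k = sum(1 for yi in y if yi <= yk)
        k = min(max(k, 1), n - 1)
        if right:
            new_f = list(f_lower[:k]) + list(f_upper[k:])
        else:
            new_f = list(f_upper[:k]) + list(f_lower[k:])
        if new_f == f:          # firing vector stable -> converged
            return yk
        f = new_f

y = [1.0, 2.0, 3.0]
fl = [0.4, 0.5, 0.3]
fu = [0.8, 0.9, 0.7]
y_l = km_endpoint(y, fl, fu, right=False)
y_r = km_endpoint(y, fl, fu, right=True)
print(y_l, y_r)   # the interval [y_l, y_r]; its midpoint is the crisp output
```

When the lower and upper firing strengths coincide, the procedure collapses to the ordinary type-1 weighted average, which is the sanity check a replacement type-reducer must also satisfy.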
8.3.
RECOMMENDATIONS FOR FUTURE RESEARCH WORK
A considerable amount of work has been done by researchers in the area of multi-agent systems applied to traffic control. However, a solid multi-agent framework with hybrid computational intelligence techniques has not yet been developed; most of the systems developed exhibit only partial or weak agency. Further, the field of multi-agent systems is itself relatively new, with many open avenues for research. Some of the recommendations for future research work are given below.
The proposed multi-agent architecture was designed specifically for the urban traffic signal control problem. However, many other applications, such as network packet routing and ATM networks, are similar to the traffic control problem and have similar restrictions. To use the proposed multi-agent system effectively for such applications, it is essential to generalize the framework and create standard templates that can be easily embedded into custom code.
In this dissertation, the offset timing and the direction of coordination were kept fixed, mainly because network-wide performance information was not available. To improve the performance further, a distributed method for obtaining the offset value must be developed. In HMS, offset adjustment was possible because of the hierarchical nature of the system, in which regional control agents had a better view of a section of the network.
In the proposed multi-agent architecture, the protocol used was similar to the FIPA protocol, but not all of its functionalities were included. For example, service request and acknowledgement were not used, as the agents were homogeneous and had the same functionality, with no delegation of duty to adjacent agents. However, to connect to the legacy systems used in traffic signal control, all the functionalities need to be introduced.
Parallel evaluation of the multiple solutions of an agent should be developed using multithreading. In the current architecture, multithreading or parallelization is applied at the level of the agent and is not used in the internal evaluation. This is essential for testing the multi-agent system in applications with rapidly changing environments.
The Q-learning approach implemented in this study communicated reinforcement or reward values among the agents. The reward is a scalar quantity and provides very little direction towards the optimal solution. Communicating the value function, or Q-values, would improve the performance to a great extent; however, the challenge lies in storing the state-action pair values for the continuous input and performing the update in a distributed manner.
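The shared-reward scheme discussed above can be pictured as a standard tabular Q-update in which an agent's reward is blended with its neighbours' rewards before the temporal-difference step. This is a minimal illustration; the blending weight, state names and all quantities are assumptions, not the thesis's exact formulation.

```python
def q_update(q, state, action, reward, neighbour_rewards, next_state,
             alpha=0.1, gamma=0.9, w=0.5):
    """One Q-learning step using a reward shared with neighbouring agents."""
    # Blend the local reward with the mean neighbour reward to give the
    # agent a partial global view of network performance.
    shared = (1.0 - w) * reward + w * (
        sum(neighbour_rewards) / len(neighbour_rewards))
    best_next = max(q[next_state].values())
    td_target = shared + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]

# Tiny two-state example with two green-time actions per state.
q = {"low": {"short": 0.0, "long": 0.0},
     "high": {"short": 0.0, "long": 0.0}}
v = q_update(q, "high", "long", reward=1.0,
             neighbour_rewards=[0.5, 0.3], next_state="low")
print(round(v, 3))
```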
190
LIST OF PUBLICATIONS
JOURNALS
1. Balaji P.G and D. Srinivasan, “Type-2 fuzzy logic based urban traffic management,”
in Engineering Applications of Artificial Intelligence journal,vol.24, no.1, 2011.
2. Balaji P.G and D. Srinivasan, “Distributed Geometric Fuzzy Multi-agent Urban
Traffic Signal Control,” in IEEE Transactions on Intelligent Transportation Systems,
vol.11, no.3, pp.714-727, 2010.
3. Balaji P.G, X. German and D. Srinivasan, “Urban Traffic Signal Control Using
Reinforcement Learning Agents,” in IET Intelligent Transport Systems, vol.4, no.3,
pp.177-188, 2010.
4. D. Srinivasan, C.W. Chan and Balaji P.G, “Computational intelligence-based
congestion prediction for a dynamic urban street network,” in Neurocomputing,
vol.72, no.10-12, pp. 2710-2716, 2009.
5. Balaji P.G and D.Srinivasan, “Distributed Q-learning neuro-type2 fuzzy system, ”
Submitted in IEEE Transactions on Neural Networks.
6. Balaji P.G and D.Srinivasan, “Modified symbiotic evolutionary learning for type-2
fuzzy system, ” Submitted in International Journal on Fuzzy Systems.
MAGAZINE AND BOOK CHAPTERS
7. Balaji P.G and D. Srinivasan, “Multi-agent system in urban traffic signal control,” in
IEEE Computational Intelligence Magazine,vol.5, no.4,pp.43-51, 2010.
8. Balaji P.G and D. Srinivasan, “An introduction to multi-agent systems,” in
„Innovations in Multi-Agent Systems and Applications’, Studies on Computation
Intelligence, Springer, vol.310, pp.1-27, 2010.
CONFERENCES
9. Balaji P.G and D. Srinivasan, “ Distributed multi-agent type-2 fuzzy architecture for
urban traffic signal control,” IEEE International Conference on Fuzzy Systems, pp.
1627-1632, 2009.
10. Balaji P.G, D. Srinivasan and C.K. Tham, “Coordination in distributed multi-agent
system using type-2 fuzzy decision systems,” IEEE International Conference on
Fuzzy Systems, pp. 2291-2298, 2008.
11. Balaji P.G, G. Sachdeva, D. Srinivasan and C.K. Tham, “Multi-agent system based
urban traffic management,” IEEE Congress on Evolutionary Computation, pp. 1740-1747, 2007.
12. Balaji P.G, D. Srinivasan and C.K. Tham, “Uncertainties reducing techniques in
evolutionary computation,” IEEE Congress on Evolutionary Computation, pp. 556-563, 2007.
191
REFERENCES
[1]
F. Webster, “Traffic Signal Settings,” Road Research Technical Paper, no. 39,
1958.
[2]
B. Logan, and G. Theodoropoulos, “The distributed simulation of multiagent
systems,” Proceedings of the IEEE, vol. 89, no. Copyright 2001, IEE, pp. 17485, 2001.
[3]
N. R. Jennings, K. Sycara, and M. Wooldridge, “A roadmap of agent research
and development,” Autonomous Agents and Multi-Agents Systems, vol. 1, no.
1, pp. 7-38, 1999.
[4]
L. C. Jain, and R. K. Jain, Hybrid Intelligent Engineering Systems, Singapore:
World Scientific Publishing Company, 1997.
[5]
C. Mumford, and L. C. Jain, Computational Intelligence: Collaboration,
Fusion and Emergence: Springer-Verlag, 2009.
[6]
L. C. Jain, M. Sato, M. Virvou et al., Computational Intelligence Paradigms:
Volume 1 - Innovative Applications: Springer-Verlag, 2008.
[7]
L. C. Jain, and P. De Wilde, Practical Applications of Computational
Intelligence Techniques, USA: Kluwer Academic Publishers, 2001.
[8]
L. C. Jain, and N. M. Martin, Fusion of Neural Networks, Fuzzy Logic and
Evolutionary Computing and their applications, USA: CRC Press, 1999.
[9]
H. N. Tedorescu, A. Kandel, and L. C. Jain, Fuzzy and Neuro-Fuzzy Systems
in Medicine, USA: CRC Press, 1998.
[10]
J. Fulcher, and L. C. Jain, Computational Intelligence: A Compendium:
Springer-Verlag, 2008.
[11]
R. Khosla, N. Ichalkaranje, and L. C. Jain, Design of Intelligent Multi-Agent
Systems: Springer-Verlag, 2005.
[12]
G. Resconi, and L. C. Jain, Intelligents Agents : Theory and Applications:
Springer-Verlag, 2004.
[13]
L. C. Jain, Z. Chen, and N. Ichalkaranje, Intelligent Agents and their
Applications: Springer-Verlag, 2002.
[14]
L. Gasser, and M. Huhns, Distributed Artificial Intelligence: Morgan
Kaufmann, 1989.
[15]
K. P. Sycara, “The many faces of agents,” AI magazine, vol. 19, no. 2, pp. 1112, 1998.
[16]
T. Finin, C. Nicholas, and J. Mayfield. "Agent-based information retrieval
tutorial," http://www.csee.umbc.edu/abir/.
192
[17]
H. S. Nwana, “Software agents: an overview,” Knowledge Engineering
Review, vol. 11, no. Copyright 1996, IEE, pp. 205-44, 1996.
[18]
M. Woodridge, and N. R. Jennings, “Intelligent agents theory and practice,”
Knowledge Engineering Review, vol. 10, no. Compendex, pp. 115-115, 1995.
[19]
E. H. Durfee, and V. Lesser, "Negotiating Task ecomposition and Allocation
Using Partial Global Planning," Distributed Artificial Intelligence, L. Gasser
and M. Huhns, eds., pp. 229-244: Morgan Kaufmann, 1989.
[20]
K. P. Sycara, “Multiagent Systems,” AI Magazine, vol. 19, no. 2, pp. 79-92,
1998.
[21]
N. Vlassis, Concise Introduction to Multiagent Systems and Distributed
Artificial Intelligence, San Rafael, CA, USA: Morgan & Calypool, 2007.
[22]
P. Stone, and M. Veloso, “Multiagent systems: a survey from a machine
learning perspective,” Autonomous Robots, vol. 8, no. Copyright 2000, IEE,
pp. 345-83, 2000.
[23]
Z. Ren, and C. J. Anumba, “Learning in multi-agent systems: a case study of
construction claims negotiation,” Advanced Engineering Informatics, vol. 16,
no. Copyright 2004, IEE, pp. 265-75, 2002.
[24]
E. Alonso, M. D'Inverno, D. Kudenko et al., “Learning in multi-agent
systems,” Knowledge Engineering Review, vol. 16, no. Copyright 2002, IEE,
pp. 277-84, 2001.
[25]
C. V. Goldman, "Learning in multi-agent systems : A case study of
construction claim negotiation." p. 1363.
[26]
F. Bergenti, and A. Ricci, "Three approaches to the coordination of multiagent
systems," Proceedings of the ACM Symposium on Applied Computing. pp.
367-373.
[27]
C. H. Tien, and M. Soderstrand, "Development of a micro robot system for
playing soccer games." pp. 149-152.
[28]
P. G. Balaji, and D. Srinivasan, "Distributed multi-agent type-2 fuzzy
architecture for urban traffic signal control," 2009 IEEE International
Conference on Fuzzy Systems (FUZZ-IEEE). pp. 1627-32.
[29]
L. E. Parker, “Heterogeneous multi-robot cooperation,” Massachusetts
Institute of Technology, 1994.
[30]
L. E. Parker, “Life-long adaptation in heterogeneous multi-robot teams:
Response to Continual variation in robot performance,” Autonomous Robots,
vol. 8, no. 3, 2000.
[31]
R. Drezewski, and L. Siwik, "Co-evolutionary multi-agent system with
predator-prey mechanism for multi-objective optimization," Adaptive and
Natural Computing Algorithms. 8th International Conference, ICANNGA
193
2007. Proceedings, Part I (Lecture Notes in Computer Science Vol. 4431). pp.
67-76.
[32]
A. Damba, and S. Watanabe, “Hierarchical control in a multiagent system,”
International Journal of Innovative Computing, Information & Control,
vol. 4, no. Copyright 2009, The Institution of Engineering and Technology,
pp. 3091-100, 2008.
[33]
C. Min Chee, D. Srinivasan, and R. L. Cheu, “Neural networks for continuous
online learning and control,” IEEE Transactions on Neural Networks, vol. 17,
no. Copyright 2006, The Institution of Engineering and Technology, pp. 151131, 2006.
[34]
P. G. Balaji, G. Sachdeva, D. Srinivasan et al., "Multi-agent system based
urban traffic management," 2007 IEEE Congress on Evolutionary
Computation, CEC 2007. pp. 1740-1747.
[35]
A. Koestler, The ghost in the machine, London: Hutchinson Publication
Group, 1967.
[36]
P. Leitao, P. Valckenaers, and E. Adam, "Self-adaptation for robustness and
cooperation in holonic multi-agent systems," Transactions on Large-Scale
Data- and Knowledge-Centered Systems. I, pp. 267-88, Berlin, Germany:
Springer-Verlag, 2009.
[37]
O. Yadgar, S. Kraus, and C. Oritz, "Scaling up distributed sensor networks:
Cooperative large scale mobile agent organizations," Distributed Sensor
Networks : A Multiagent Perspective, pp. 267-288: LNCS 5740, 2003.
[38]
M. Schillo, and F. Klaus, "A taxanomy of autonomy in multiagent
organisation," Autonomy 2003, LNAI 2969, pp. 68-82, 2004.
[39]
L. Bongearts, “Integration of scheduling and control in holonic manufacturing
systems,” Katholieke Universiteit Leuven, 1998.
[40]
D. Srinivasan, and M. Choy, "Distributed Problem Solving using Evolutionary
Learning in Multi-Agent Systems," Advances in Evolutionary Computing for
System Design, Studies in Computational Intelligence L. Jain, V. Palade and
D. Srinivasan, eds., pp. 211-227: Springer Berlin / Heidelberg, 2007.
[41]
M. Van De Vijsel, and J. Anderson, "Coalition formation in multi-agent
systems under real-world conditions," AAAI Workshop - Technical Report. pp.
54-60.
[42]
B. Horling, and V. Lesser, “A survey of multi-agent organizational
paradigms,” Knowledge Engineering Review, vol. 19, no. Copyright 2006,
IEE, pp. 281-316, 2004.
[43]
A. K. Agogino, and K. Tumer, Team Formation in Partially Observable
Multi-Agent Systems, United States, 2004.
[44]
Budianto, An overview and survey on multi-agent system, 2005.
194
[45]
C. Min Chee, D. Srinivasan, and R. L. Cheu, “Cooperative, hybrid agent
architecture for real-time traffic signal control,” IEEE Transactions on
Systems, Man & Cybernetics, Part A (Systems & Humans), vol. 33,
no. Copyright 2003, IEE, pp. 597-607, 2003.
[46]
P. G. Balaji, D. Srinivasan, and T. Chen-Khong, "Coordination in distributed
multi-agent system using type-2 fuzzy decision systems," 2008 IEEE 16th
International Conference on Fuzzy Systems (FUZZ-IEEE). pp. 2291-8.
[47]
S. E. Lander, “Issues in multiagent design systems,” IEEE Expert, vol. 12, no.
Copyright 1997, IEE, pp. 18-26, 1997.
[48]
J.-S. Lin, C. Ou-Yang, and Y.-C. Juan, “Towards a standardised framework
for a multi-agent system approach for cooperation in an original design
manufacturing company,” International Journal of Computer Integrated
Manufacturing, vol. 22, no. Compendex, pp. 494-514, 2009.
[49]
Y. Cengeloglu, “A framework for dynamic knowledge exchange among
intelligent agents,” in AAAI Symposium, Control of the Physical World by
Intelligent Agents, 1994.
[50]
M. Genesereth, and R. Fikes, Knowledge Interchange Format, Version 3.0
Reference Manual, Computer Science Department, Stanford University, USA,
1992.
[51]
M. L. Ginsberg, “Knowledge interchange format: the KIF of death,” AI
Magazine, vol. 12, no. Copyright 1992, IEE, pp. 57-63, 1991.
[52]
T. Finin, R. Fritzson, D. McKay et al., "KQML as an agent communication
language," CIKM 94. Proceedings of the Third International Conference on
Information and Knowledge Management. pp. 456-63.
[53]
A. Greenwald, The search for equilibrium in markov games: Synthesis
Lectures on Artificial Intelligence and Machine Learning, 2007.
[54]
Y. M. Ermol'ev, and S. P. Uryas'ev, “Nash equilibrium in n-person games,”
Cybernetics, vol. 18, no. Copyright 1983, IEE, pp. 367-72, 1982.
[55]
R. Gibbons, “An introduction to applicable game theory,” The Journal of
Economic Perspectives, vol. 11, no. 1, pp. 127-149, 1997.
[56]
H. Nwana, L. Lee, and N. Jennings, "Co-ordination in multi-agent systems,"
Software Agents and Soft Computing Towards Enhancing Machine
Intelligence, Lecture Notes in Computer Science H. Nwana and N. Azarmi,
eds., pp. 42-58: Springer Berlin / Heidelberg, 1997.
[57]
A. Chavez, and P. Maes, "Kasbah: an agent marketplace for buying and
selling goods," Acquisition, Learning and Demonstration: Automating Tasks
for Users. Papers from the 1996 AAAI Symposium (TR SS-96-02). pp. 8-12.
[58]
L. Kuyer, S. Whiteson, B. Bakker et al., "Multiagent reinforcement learning
for Urban traffic control using coordination graphs," Lecture Notes in
195
Computer Science (including subseries Lecture Notes in Artificial Intelligence
and Lecture Notes in Bioinformatics). pp. 656-671.
[59]
R. G. Smith, “The contract net protocol: high level communication and control
in a distributed problem solver,” IEEE Transactions on Computers, vol. C-29,
no. Copyright 1981, IEE, pp. 1104-13, 1980.
[60]
P. D. O'Brien, and R. C. Nicol, “FIPA-towards a standard for software
agents,” BT Technology Journal, vol. 16, no. Copyright 1998, IEE, pp. 51-9,
1998.
[61]
C. Guestrin, S. Venkataraman, and D. Koller, "Context-specific multiagent
coordination and planning with factored MDPs," Proceedings of the National
Conference on Artificial Intelligence. pp. 253-259.
[62]
D. Srinivasan, C. Min Chee, and R. L. Cheu, “Neural networks for real-time
traffic signal control,” IEEE Transactions on Intelligent Transportation
Systems, vol. 7, no. Copyright 2006, The Institution of Engineering and
Technology, pp. 261-72, 2006.
[63]
L. Lhotska, "Learning in multi-agent systems: theoretical issues," Computer
Aided Systems Theory - EUROCAST '97. Selection of Papers from the 6th
International Workshop on Computer Aided Systems Theory. Proceedings. pp.
394-405.
[64]
F. Gomez, J. Schmidhuber, and R. Miikkulainen, "Efficient non-linear control
through neuro evolution." pp. 654-662.
[65]
J. Jiu, Autonomous Agents and Multi-Agent Systems: World Scientific
Publication.
[66]
V. Vassiliades, A. Cleanthous, and C. Christodoulou, "Multiagent
Reinforcement Learning with Spiking and Non-Spiking Agents in the Iterated
Prisoner‟s Dilemma," Artificial Neural Networks – ICANN 2009, Lecture
Notes in Computer Science C. Alippi, M. Polycarpou, C. Panayiotou et al.,
eds., pp. 737-746: Springer Berlin / Heidelberg, 2009.
[67]
T. Gabel, and M. Riedmiller, "On a successful application of multi-agent
reinforcement learning to operations research benchmarks," 2007 First IEEE
International Symposium on Approximate Dynamic Programming and
Reinforcement Learning (IEEE Cat. No.07EX1572). p. 8 pp.
[68]
R. S. Sutton, and A. G. Barto, Reinforcement Learning: An Introduction,
Cambridge, MA: MIT Press.
[69]
J. Schneider, W. Weng-Keen, A. Moore et al., "Distributed value functions,"
Machine Learning. Proceedings of the Sixteenth International Conference
(ICML'99). pp. 371-8.
[70]
L. Busoniu, R. Babuka, and B. De Schutter, “A comprehensive survey of
multiagent reinforcement learning,” IEEE Transactions on Systems, Man and
196
Cybernetics Part C: Applications and Reviews, vol. 38, no. Compendex, pp.
156-172, 2008.
[71]
C. J. Messer, H. E. Haenel, and E. A. Koeppe, A Report on the User's Manual
for Progression Analysis and Signal System Evaluation Routine--Passer II,
United States, 1974.
[72]
E. C. P. Chang, S. L. Cohen, C. Liu et al., “MAXBAND-86. Program for
optimizing left-turn phase sequence in multiarterial closed networks,”
Transportation Research Record, no. Compendex, pp. 61-67, 1988.
[73]
C. Stamatiadis, and N. H. Gartner, “MULTIBAND-96: A program for
variable-bandwidth progression optimization of multiarterial traffic networks,”
Transportation Research Record, no. Compendex, pp. 9-17, 1996.
[74]
C. J. Messer, and M. P. Malakapalli, Applications Manual for Evaluating Two
and Three-Level Diamond Interchange Operations Using Transyt-7F, United
States, 1992.
[75]
C. Sun, and J. Xu, “Study on Traffic Signal Timing Optimization for Single Point Intersection Based on Synchro Software System,” Journal of Highway and Transportation Research and Development, vol. 26, pp. 117-122, 2009.
[76]
S. R. Sunkari, R. J. Engelbrecht, and K. N. Balke, Evaluation of advanced
coordination features in traffic signal controllers, FHWA, September, 2004.
[77]
M. C. Bell, and R. D. Bretherton, "Ageing of fixed-time traffic signal plans."
[78]
K. Fehon, “Adaptive traffic signals are we missing the boat,” in ITE District 6
Annual Meeting, 2004.
[79]
P. R. Lowrie, "The Sydney Coordinated Adaptive Traffic System-principles,
methodology, algorithms," International Conference on Road Traffic
Signalling. pp. 67-70.
[80]
D. I. Robertson, and R. D. Bretherton, “Optimizing networks of traffic signals in real time - the SCOOT method,” IEEE Transactions on Vehicular Technology, vol. 40, no. 1, pp. 11-15, 1991.
[81]
P. B. Hunt, D. I. Robertson, R. D. Bretherton et al., SCOOT-A traffic
responsive method for coordinating signals, TRL, 1981.
[82]
F. Busch, and G. Kruse, "MOTION for SITRAFFIC - A modern approach to
urban traffic control," IEEE Conference on Intelligent Transportation Systems,
Proceedings, ITSC. pp. 61-64.
[83]
C. Bielefeldt, and F. Busch, "MOTION - a new on-line traffic signal network control system," Seventh International Conference on 'Road Traffic Monitoring and Control' (Conf. Publ. No.391), pp. 55-59.
[84]
M. Papageorgiou, "An introduction to signal traffic control strategy TUC."
[85]
V. Mauro, and C. Di Taranto, "UTOPIA [traffic control]," Control,
Computers, Communications in Transportation. Selected Papers from the
IFAC/IFIP/IFORS Symposium. pp. 245-52.
[86]
N. H. Gartner, S. F. Assmann, F. Lasaga et al., “A multi-band approach to arterial traffic signal optimization,” Transportation Research, Part B (Methodological), vol. 25B, pp. 55-74, 1991.
[87]
N. H. Gartner, J. D. C. Little, and H. Gabbay, “Simultaneous optimization of offsets, splits, and cycle time,” Transportation Research Record, pp. 6-15, 1976.
[88]
N. H. Gartner, F. J. Pooran, and C. M. Andrews, "Implementation of the
OPAC adaptive control strategy in a traffic signal network," IEEE Conference
on Intelligent Transportation Systems, Proceedings, ITSC. pp. 195-200.
[89]
J. F. Barriere, J. L. Farges, and J. J. Henry, "Decentralization vs hierarchy in
optimal traffic control," IFAC Proceedings Series. pp. 209-214.
[90]
J. L. Farges, I. Khoudour, and J. B. Lesort, "PRODYN: on site evaluation,"
Third International Conference on Road Traffic Control (Conf. Publ. No.320).
pp. 62-6.
[91]
P. Mirchandani, and L. Head, “A real-time traffic signal control system: Architecture, algorithms, and analysis,” Transportation Research Part C: Emerging Technologies, vol. 9, pp. 415-432, 2001.
[92]
K. L. Head, P. B. Mirchandani, and D. Sheppard, “Hierarchical framework for real-time traffic control,” Transportation Research Record, vol. 1360, pp. 82-88, 1992.
[93]
P. Dell'Olmo, and P. B. Mirchandani, “REALBAND: an approach for real-time coordination of traffic flows on networks,” Transportation Research Record, pp. 106-116, 1995.
[94]
S. Sen, and K. L. Head, “Controlled optimization of phases at an intersection,” Transportation Science, vol. 31, no. 1, pp. 5-17, 1997.
[95]
L. A. Zadeh, “The concept of a linguistic variable and its application to approximate reasoning - I,” Information Sciences, vol. 8, pp. 199-249, 1975.
[96]
L. A. Zadeh, “The concept of a linguistic variable and its application to approximate reasoning - II,” Information Sciences, vol. 8, pp. 301-357, 1975.
[97]
L. A. Zadeh, “The concept of a linguistic variable and its application to approximate reasoning - III,” Information Sciences, vol. 9, pp. 43-80, 1975.
[98]
Q. Liang, and J. M. Mendel, “Interval type-2 fuzzy logic systems: theory and design,” IEEE Transactions on Fuzzy Systems, vol. 8, no. 5, pp. 535-550, 2000.
[99]
N. N. Karnik, J. M. Mendel, and Q. Liang, “Type-2 fuzzy logic systems,” IEEE Transactions on Fuzzy Systems, vol. 7, no. 6, pp. 643-658, 1999.
[100] N. N. Karnik, and J. M. Mendel, “Centroid of a type-2 fuzzy set,” Information Sciences, vol. 132, pp. 195-220, 2001.
[101] H. Wu, and J. M. Mendel, "Introduction to uncertainty bounds and their use in the design of interval type-2 fuzzy logic systems," 10th IEEE International Conference on Fuzzy Systems (Cat. No.01CH37297), pp. 662-665.
[102] S. Coupland, and R. John, “New geometric inference techniques for type-2 fuzzy sets,” International Journal of Approximate Reasoning, vol. 49, pp. 198-211, 2008.
[103] Z. Wang, and J.-x. Fang, “On the direct decomposability of pseudo-t-norms, t-norms and implication operators on product lattices,” Fuzzy Sets and Systems, vol. 158, pp. 2494-2503, 2007.
[104] S. Coupland, and R. John, “Geometric type-1 and type-2 fuzzy logic systems,” IEEE Transactions on Fuzzy Systems, vol. 15, pp. 3-15, 2007.
[105] J. Pach, and M. Sharir, “On vertical visibility in arrangements of segments and the queue size in the Bentley-Ottmann line sweeping algorithm,” SIAM Journal on Computing, vol. 20, pp. 460-470, 1991.
[106] H. Wu, and J. M. Mendel, “Uncertainty bounds and their use in the design of interval type-2 fuzzy logic systems,” IEEE Transactions on Fuzzy Systems, vol. 10, no. 5, pp. 622-639, 2002.
[107] J. Niittymaki, “General fuzzy rule base for isolated traffic signal control - rule formulation,” Transportation Planning and Technology, vol. 24, pp. 227-247, 2001.
[108] M. C. Choy, “Cooperative, hybrid multi-agent system for distributed, real-time traffic signal control,” Department of Electrical and Computer Engineering, National University of Singapore, Singapore, 2006.
[109] D. E. Moriarty, and R. Miikkulainen, "Efficient learning from delayed rewards
through symbiotic evolution," Machine Learning. Proceedings of the Twelfth
International Conference on Machine Learning. pp. 396-404.
[110] D. E. Moriarty, and R. Miikkulainen, “Efficient reinforcement learning through symbiotic evolution,” Machine Learning, vol. 22, pp. 11-32, 1996.
[111] B. H. G. Barbosa, L. T. Bui, H. A. Abbass et al., “The use of coevolution and the artificial immune system for ensemble learning,” pp. 1-13, 2010.
[112] G. P. Figueredo, L. A. V. de Carvalho, and H. J. C. Barbosa, "Coevolutionary
genetic algorithms to simulate the immune system's gene libraries evolution,"
Advances in Natural Computation. First International Conference, ICNC
2005. Proceedings, Part II (Lecture Notes in Computer Science Vol. 3611).
pp. 941-4.
[113] Z. Qiuyong, R. Jing, Z. Zehua et al., "Immune co-evolution algorithm based
on chaotic optimization," 2007 Workshop on Intelligent Information
Technology Application. pp. 149-52.
[114] C.-F. Juang, and C.-T. Lin, "Genetic reinforcement learning through symbiotic
evolution for fuzzy controller design," IEEE International Conference on
Fuzzy Systems. pp. 1281-1285.
[115] Y.-C. Hsu, S.-F. Lin, and Y.-C. Cheng, “Multi groups cooperation based symbiotic evolution for TSK-type neuro-fuzzy systems design,” Expert Systems with Applications, vol. 37, pp. 5320-5330, 2010.
[116] M. Mahfouf, M. Jamei, and D. A. Linkens, "Rule-base generation via
symbiotic evolution for a mamdani-type fuzzy control system," IEEE
International Conference on Fuzzy Systems. pp. 396-399.
[117] F.-z. Yi, H.-z. Hu, and D. Zhou, “Fuzzy controller auto-design based on the symbiotic evolution algorithm,” Systems Engineering and Electronics, vol. 25, pp. 750-753, 2003.
[118] H. B. Kazemian, “Study of learning fuzzy controllers,” Expert Systems, vol. 18, pp. 186-193, 2001.
[119] K. Tanaka, T. Taniguchi, and H. O. Wang, "Generalized Takagi-Sugeno fuzzy
systems: rule reduction and robust control," Ninth IEEE International
Conference on Fuzzy Systems. FUZZ- IEEE 2000 (Cat. No.00CH37063). pp.
688-93.
[120] P. Liu, “Mamdani fuzzy system: Universal approximator to a class of random processes,” IEEE Transactions on Fuzzy Systems, vol. 10, pp. 756-766, 2002.
[121] C. Lynch, H. Hagras, and V. Callaghan, "Using uncertainty bounds in the
design of an embedded real-time type-2 neuro-fuzzy speed controller for
marine diesel engines," IEEE International Conference on Fuzzy Systems. pp.
1446-1453.
[122] C. J. C. H. Watkins, “Learning from delayed rewards,” PhD thesis, University of Cambridge, 1989.
[123] R. A. Jacobs, “Increased Rates of Convergence Through Learning Rate Adaptation,” Neural Networks, vol. 1, pp. 295-307, 1988.
[124] R. T. Van Katwijk, P. Van Koningsbruggen, B. De Schutter et al., "Test bed
for multiagent control systems in road traffic management," Transportation
Research Record. pp. 108-115.
[125] S. Mikami, and Y. Kakazu, "Genetic reinforcement learning for cooperative
traffic signal control," Proceedings of the First IEEE Conference on
Evolutionary Computation. IEEE World Congress on Computational
Intelligence (Cat. No.94TH0650-2). pp. 223-8.
[126] J.-H. Lee, and H. Lee-Kwang, “Distributed and cooperative fuzzy controllers for traffic intersections group,” IEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews), vol. 29, no. 2, pp. 263-271, 1999.
[127] M. B. Trabia, M. S. Kaseko, and M. Ande, “A two-stage fuzzy logic controller for traffic signals,” Transportation Research Part C (Emerging Technologies), vol. 7C, pp. 353-367, 1999.
[128] S. Chiu, and S. Chand, "Self-organizing traffic control via fuzzy logic,"
Proceedings of the 32nd IEEE Conference on Decision and Control (Cat.
No.93CH3307-6). pp. 1897-902.
[129] Quadstone, PARAMICS Modeller v6.0 User Guide and Reference Manual,
Quadstone Ltd, Edinburgh, UK, 2002.
[130] J. D. C. Little, “A Proof for the Queuing Formula: L = λW,” Operations Research, vol. 9, no. 3, pp. 383-387, 1961.
[131] J. R. Peirce, and P. J. Webb, "MOVA control of isolated traffic signals-recent
experience," Third International Conference on Road Traffic Control (Conf.
Publ. No.320). pp. 110-13.
[132] Transportation Research Board, "Highway Capacity Manual," National Research Council, 2000.