$$q = \frac{N}{t}, \qquad (3.1)$$

$$t_{occ(n)} = t_{off(n)} - t_{on(n)}, \qquad (3.2)$$

$$v_n = \frac{L_n + L_d}{t_{occ(n)}}, \qquad (3.3)$$

where,
$q$ = flow (vehicles per second),
$N$ = total number of vehicles,
$t$ = observation time period (seconds),
$t_{occ(n)}$ = individual occupancy time (seconds),
$t_{on(n)}$ = instant of time the vehicle n is detected (seconds),
$t_{off(n)}$ = instant of time the vehicle n exits the detection zone (seconds),
$v_n$ = vehicle speed (feet per second),
$L_n$ = vehicle length (feet), and
$L_d$ = detection zone length (feet).
In the case of dual-loop detectors, flow and occupancy are reported when the vehicle crosses the
first loop of the dual trap. Speed calculations are made when the vehicle passes the second loop,
based on the known distance between the two loops and the time taken to travel from the first
loop to the second loop (Texas Department of Transportation (TxDOT) 2000; Sreedevi and
Black 2001).
Thus, flow and occupancy are calculated in an identical manner to the single-loop detector, as given in equations 3.1 and 3.2. The speed is calculated as follows:
$$v_n = \frac{D}{t_{on(n),B} - t_{on(n),A}}, \qquad (3.4)$$
where,
A = first loop in the dual-loop detector,
B = second loop in the dual-loop detector, and
D = distance from the upstream edge of detection zone A to the upstream edge of
detection zone B (feet).
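For illustration, the following MATLAB sketch implements equations 3.1 through 3.4 for a handful of hypothetical vehicle actuations. All timestamps, lengths, and variable names are assumed values for the sketch; this is not the data-processing code of Appendix D.

    % Sketch of equations 3.1-3.4 (illustrative values; not the Appendix D code).
    t_on  = [2.0 8.5 15.2];          % hypothetical detection instants at loop A (s)
    t_off = [2.4 9.0 15.6];          % corresponding exit instants at loop A (s)
    T     = 20;                      % observation period (s)
    Ld    = 6;                       % detection zone length (ft)
    Ln    = [18 22 55];              % assumed individual vehicle lengths (ft)

    q        = numel(t_on) / T;      % flow, equation 3.1 (veh/s)
    t_occ    = t_off - t_on;         % occupancy times, equation 3.2 (s)
    v_single = (Ln + Ld) ./ t_occ;   % single-loop speed estimate, equation 3.3 (ft/s)

    % Dual-loop speed (equation 3.4): the same vehicles detected at loop B.
    D      = 12;                     % upstream-edge-to-upstream-edge spacing (ft)
    t_on_B = t_on + D ./ v_single;   % hypothetical loop B detection instants (s)
    v_dual = D ./ (t_on_B - t_on);   % speed from the loop-to-loop travel time (ft/s)

Note that the single-loop estimate (equation 3.3) requires an assumed vehicle length, whereas the dual-loop estimate (equation 3.4) does not.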
A local control unit (LCU) accumulates speed, occupancy, and volume from the detector channels, keeps a moving average of these measurements, and sends the data to the traffic management center (TMC) at intervals of 20 to 30 seconds for analysis with a computer-based algorithm (TransGuide Technical Brochure 2000). From this, the flow rate, percent occupancy, and average speed for that particular time interval are calculated. Percent occupancy is a surrogate for density; it is obtained by determining the percent of time a detector is occupied and is calculated as follows (May 1990):
$$O = \frac{1}{t}\left(\sum_{n=1}^{N} t_{occ(n)}\right) \times 100, \qquad (3.5)$$
where,
O = percent occupancy,
$t_{occ(n)}$ = individual occupancy time (seconds),
N = number of vehicles detected, and
t = selected time period (seconds).
Density is calculated from the percent occupancy as
$$k = \frac{52.8\,O}{L_v + L_d}, \qquad (3.6)$$
where,
k = density (vehicles per lane-mile), and
$L_v$ = average vehicle length (feet).

The factor 52.8 arises from the 5,280 feet per mile divided by 100, which converts the percent occupancy into vehicles per lane-mile.
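As a worked illustration of equations 3.5 and 3.6, the sketch below converts assumed individual occupancy times into percent occupancy and then into a density estimate (all values are hypothetical):

    % Percent occupancy (eq 3.5) and density (eq 3.6) for one interval.
    t_occ = [0.35 0.42 0.28 0.50];   % individual occupancy times (s)
    t     = 20;                      % selected time period (s)
    O     = sum(t_occ) / t * 100;    % percent occupancy, eq 3.5

    Lv = 20;                         % assumed average vehicle length (ft)
    Ld = 6;                          % detection zone length (ft)
    k  = 52.8 * O / (Lv + Ld);       % density (veh per lane-mile), eq 3.6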
The San Antonio corridors are equipped with dual inductance loop detectors at approximately
0.5-mile spacing. The loop detectors used at the I-35 site are 6 feet by 6 feet (1.83 m by 1.83 m)
buried 1 inch (2.54 cm) below the road surface, and are centered in each lane. The two loops in
the dual-loop detectors are installed 12 feet (3.66 m) apart longitudinally and are made up of differing numbers of turns to minimize crosstalk. The loop detector signals are sent to LCUs,
where the data are analyzed to determine volume, occupancy, and speed for that 20-second
interval. The LCU also continually checks the loops for long periods of continuous presence
or complete lack of presence, which may indicate loop detector problems. Speed values are
reported when vehicles pass the second loop, and volume and occupancy data are reported from
the first loop detector (TransGuide Technical Brochure 2000).
The format of the raw data collected from the field is shown in Figure 3.4. The first and second
columns pertain to the date and time, respectively. The third column shows the detector number,
which indicates whether the detector is on an exit ramp (EX), an entry ramp (EN),
or on the main lane (L). The lanes are numbered in increasing order from the median to the curb
as L1, L2, L3, etc. The interstate name and mile marker are also provided in the detector
number. The speed, volume, and occupancy values are indicated in the fourth, fifth, and sixth
columns, respectively, for every 20-second period.
02/10/2003 00:00:28 EX1-0035N-166.829 Speed=-1 Vol=000 Occ=000
02/10/2003 00:00:28 EX2-0035N-166.836 Speed=60 Vol=000 Occ=000
02/10/2003 00:00:28 L2-0035S-166.833 Speed=87 Vol=000 Occ=000
02/10/2003 00:00:28 L3-0035N-166.833 Speed=61 Vol=002 Occ=005
02/10/2003 00:00:28 L3-0035S-166.833 Speed=70 Vol=002 Occ=002
02/10/2003 00:00:29 EX1-0035N-168.108 Speed=-1 Vol=000 Occ=000
02/10/2003 00:00:29 EX1-0035S-167.857 Speed=-1 Vol=000 Occ=000
02/10/2003 00:00:29 L1-0035N-167.942 Speed=71 Vol=001 Occ=001
02/10/2003 00:00:29 L2-0035N-167.942 Speed=76 Vol=001 Occ=001
02/10/2003 00:00:29 L3-0035N-167.942 Speed=77 Vol=001 Occ=001
02/10/2003 00:00:29 L4-0035N-167.942 Speed=64 Vol=001 Occ=001
02/10/2003 00:00:30 EN1-0035N-169.306 Speed=-1 Vol=000 Occ=000
02/10/2003 00:00:30 EX1-0035S-169.286 Speed=-1 Vol=000 Occ=000
02/10/2003 00:00:31 EN1-0035N-170.580 Speed=-1 Vol=000 Occ=000
02/10/2003 00:00:31 EX1-0035N-170.148 Speed=-1 Vol=002 Occ=002
02/10/2003 00:00:31 EX1-0035N-170.578 Speed=-1 Vol=000 Occ=000
02/10/2003 00:00:31 EX1-0035S-170.378 Speed=-1 Vol=000 Occ=000
02/10/2003 00:00:31 L2-0035N-170.425 Speed=59 Vol=000 Occ=000
02/10/2003 00:00:31 L2-0035S-170.425 Speed=62 Vol=003 Occ=003
02/10/2003 00:00:31 L3-0035N-170.425 Speed=63 Vol=002 Occ=002
02/10/2003 00:00:31 L4-0035N-170.425 Speed=62 Vol=002 Occ=002
02/10/2003 00:00:32 EN1-0035S-170.917 Speed=-1 Vol=000 Occ=000
02/10/2003 00:00:32 EN1-0035S-170.929 Speed=-1 Vol=000 Occ=000
Fig. 3.4 Raw ILD data
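A record in this format could be parsed as in the MATLAB sketch below. The regular expression and the structure field names are assumptions based on the description above, not the actual Appendix D extraction code.

    % Parse one raw ILD record of the form shown in Figure 3.4.
    line = '02/10/2003 00:00:28 L3-0035N-166.833 Speed=61 Vol=002 Occ=005';
    tok  = regexp(line, ...
        '^(\S+) (\S+) (\S+)-(\S+)-(\S+) Speed=(-?\d+) Vol=(\d+) Occ=(\d+)$', ...
        'tokens', 'once');
    rec.date     = tok{1};               % e.g., 02/10/2003
    rec.time     = tok{2};               % e.g., 00:00:28
    rec.lane     = tok{3};               % L1..L4, EN*, or EX*
    rec.freeway  = tok{4};               % e.g., 0035N (interstate and direction)
    rec.milepost = str2double(tok{5});   % e.g., 166.833
    rec.speed    = str2double(tok{6});   % mph; -1 appears to flag no reading
    rec.volume   = str2double(tok{7});   % vehicles per 20-second period
    rec.occ      = str2double(tok{8});   % percent occupancy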
The ILD data from the TransGuide area used in this dissertation were archived according to the server that processed the data. There were 5 servers reporting data for the selected days, and each of them reported data from approximately 100 detector locations. This came to around 30 MB of files from each server for each day. For a selected location, the size of the data files was approximately 600 KB per day.
3.2.2 Automatic Vehicle Identification (AVI)
Automatic Vehicle Identification refers to technology used to identify a particular vehicle when
it passes a particular point. Automatic vehicle monitoring or AVM involves the tracking of
vehicles at all times. Early development of AVI occurred in the United States (Hauslen 1977;
Roth 1977; Fenton 1980) beginning with an optical scanning system in the 1960s in the railroad
industry to automatically identify the rolling stock. Since then there have been enormous
advances in this area for different applications varying from toll collection systems to advanced
traveler information systems (Scott 1992).
The AVI system needs AVI readers, vehicles that have AVI tags (probe vehicles), and a central
computer system, as shown in Figure 3.5. Tags, also known as transponders, are electronically
encoded with unique identification (ID) numbers. Roadside antennas are located on roadside or
overhead structures or as a part of an electronic toll collection booth. The antennas emit radio
frequency signals within a capture range across one or more freeway lanes. When a probe vehicle enters the antenna's capture range, the radio signal is reflected off the electronic transponder. The reflected signal is slightly modified by the tag's unique ID number. The
captured ID number is sent to a roadside reader unit via coaxial cable and is assigned a time and
date stamp and antenna ID stamp. These bundled data are then transmitted to a central computer
facility via telephone line, where they are processed and stored. Unique probe vehicle ID
numbers are tracked along the freeway system, and the travel time of the probe vehicles is
calculated as the difference between the time stamps at sequential antenna locations (Traffic
Detector Handbook 1991).
AVI systems have the ability to continuously collect large amounts of data with minimal human
resource requirements. The data collection process is mainly constrained by sample size (Traffic
Detector Handbook 1991). Figure 3.6 shows a sample set of raw AVI data that are collected by
the reader. The first column is the AVI reader number. The second column is the anonymous
tag ID of the vehicles. The third column gives the time followed by the date.
Fig. 3.5 AVI conceptual view
(Source: http://www.TransGuide.dot.state.tx.us/)
142 HCTR0092677553...!H$ &00:58:21.63 02/11/03%16-0-06-0
142 OTA.00095021C0...^D$ &00:58:29.44 02/11/03%16-0-12-0
145 ARFWP10647.......... &00:57:30.68 02/11/03%1B-0-03-0
145 ARFWD3018.......... &00:57:30.88 02/11/03%1B-0-03-0
145 ARFWP14316.......... &00:57:30.94 02/11/03%1B-0-01-0
145 DDS0112 &00:58:08.38 02/11/03%1D-1-04-1
145 DDS0223 &00:58:08.73 02/11/03%1D-1-01-1
144 ARFWP14872.......... &00:59:08.08 02/11/03%19-1-01-1
145 DNT.004672118B...^?$ &00:59:06.92 02/11/03%1D-1-08-1
144 ARFWP9898.......... &00:59:08.41 02/11/03%19-1-01-1
144 ARFWP11606.......... &00:59:34.27 02/11/03%19-0-01-0
144 ARFWD248.......... &00:59:34.59 02/11/03%19-0-02-0
144 ARFWP2143.......... &00:59:34.68 02/11/03%19-0-01-0
145 OTA.00074756F8...^D$ &00:59:47.38 02/11/03%1B-0-04-0
144 OTA.00794375F2...^D$ &00:59:44.61 02/11/03%19-1-01-1
137 OTA.00625142E0...^D$ &01:00:05.42 02/11/03%2D-0-04-0
141 OTA.005238872C...^D$ &01:00:09.00 02/11/03%32-0-0B-0
142 ARFWD3624.......... &01:00:18.19 02/11/03%16-0-07-0
Fig. 3.6 Raw AVI data format
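Based on the apparent delimiters in Figure 3.6, a record could be split as sketched below; the delimiter interpretation is an assumption, and the trailing codes after the percent sign are left unparsed.

    % Parse one raw AVI record of the form shown in Figure 3.6 (assumed delimiters).
    line = '142 HCTR0092677553...!H$ &00:58:21.63 02/11/03%16-0-06-0';
    tok  = regexp(line, '^(\d+)\s+(\S+).*&(\S+)\s+(\S+)%', 'tokens', 'once');
    rec.reader = str2double(tok{1});     % AVI reader number
    rec.tagID  = tok{2};                 % anonymous tag ID (trailing marks included)
    rec.time   = tok{3};                 % hh:mm:ss.ss
    rec.date   = tok{4};                 % mm/dd/yy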
The AVI data from the TransGuide site are archived in three different categories every day: the tag archive, the link archive, and the site archive. Each of these files was approximately 1.5 MB per day. The tag archive contains all the vehicle information, as shown in Figure 3.6. One day's tag archive file has data from all 15 AVI stations for 24 hours, giving around 20,000 data points per day. Figure 3.7 shows an AVI antenna in the field.
Fig. 3.7 AVI antennas
(Source: http://www.houstontranstar.org/about_transtar/docs/2003_fact_sheet_2.pdf)
3.3 FIELD DATA COLLECTION
Data for the present study were collected from the TransGuide web site, where the data were
archived for research purposes. TransGuide, the Transportation Guidance System, is San Antonio's
advanced traffic management system (ATMS). TransGuide was designed to provide information
to motorists about traffic conditions such as accidents, congestion, and construction. With the
use of inductance loop detectors, color video cameras, AVI, variable message signs (VMS), lane
control signals (LCS), traffic signals, etc., TransGuide can detect travel time and respond rapidly
to accidents and emergencies (Texas Department of Transportation (TxDOT) 2003). The
specifically stated system goals are to detect incidents within 2 minutes, change all affected
traffic control devices within 15 seconds from alarm verification, allow San Antonio police to
dispatch appropriate response, assure system reliability and expandability, and support future
Advanced Traffic Management System (ATMS) and ITS activities (TransGuide Technical
Brochure 2003). Figure 3.8 shows two examples of the information provided to travelers
through the TransGuide system.
Fig. 3.8 Examples of TransGuide information systems
(Source: http://www.TransGuide.dot.state.tx.us/docs/atms_info.html)
The first phase of the San Antonio TransGuide system became operational on July 26, 1995 and
included 26 miles of downtown freeway. This phase of the TransGuide system includes variable
message signs, lane control signals, loop detectors, video surveillance cameras, and a
communication network covering the 26 instrumented miles. Now operational on 77 miles, the system will eventually cover about 200 miles of freeway. The section on I-35 between New Braunfels Avenue and Walzem Road, which is the test bed selected for the present study, went online in March 2000 (TexHwyMan 2003). Figure 3.9a shows the San Antonio freeway system indicating the location of the test bed. Figure 3.9b shows an enlarged map of the selected test bed, which is detailed in the following section.
Fig. 3.9 a) Map of the freeway system of San Antonio and b) map of the test bed
(Source: http://www.TransGuide.dot.state.tx.us)
(a)
(b)
3.4 TEST BED
The I-35 section was selected based on the availability of the loops and AVI in the same
location. The selected test bed is a three-lane road with on ramps and off ramps in between the
detectors, as shown in Figure 3.10a. The data were analyzed for continuous 24-hour periods for 5 consecutive days, from Monday, February 10, 2003, to Friday, February 14, 2003. For
the study period the data were reported in 20-second intervals. Thus for a 24-hour period,
around 4000 records were available for each of the detectors. A series of five detectors from
stations 159.500 to 161.405 including all the ramps in between was analyzed. AVI data were
also collected from the same section.
The present problem of estimation and prediction of travel time from ILD data using the
suggested model necessitated aggregating the data from all three lanes and analyzing them as a
single lane. The data were aggregated because the travel time estimation model suggested in the
present study is mainly based on the conservation of vehicles principle (see Chapter V for more
details). Although lane-by-lane data were available from the loop detectors, no lane changing
data were available. The constraint related to conservation of vehicles cannot be imposed on
individual lanes due to the lack of lane changing data. The vehicles entering the section of road
under consideration can change lanes before exiting the section. Hence, in addition to lane-by-
lane data, one also needs details on the number of vehicles that changed lanes from and/or
to the lane under consideration. The data used in this dissertation are from ILD, and it is
impossible to get the lane changing details using this technology. Hence, lane changing is not
taken into account while developing the model for the estimation of travel time.
If lane changing data are available from a different data source, such as video data, the models used for the estimation of travel time would need to be modified to take lane changing into account. In the present form, the models do not consider the inflow and outflow
from adjacent lanes by lane changing. Hence, the data from the detectors in different lanes at
each of the detector stations were aggregated together and assumed as a single lane in this
dissertation.
Since this dissertation investigated a series of detectors and analyzed the total inflow and outflow
at each entry-exit pair, data from the ramps in between the entry-exit pair were also needed. The
entrance and exit ramp data were added to the appropriate main lane data. This is required
because the present study uses an input-output analysis, and the ramp data are also part of the
input or output. Also, the volume coming from ramps becomes part of the vehicles in the road
section under consideration and the travel time is affected due to this incoming volume from
ramps. Figure 3.10b shows the accumulation process and the resulting five consecutive detector
locations in the present study.
3.5 PRELIMINARY DATA REDUCTION
The traffic data obtained from loop detectors are used for different applications such as graphical
displays, traffic forecasting programs, and incident detection algorithms. Ensuring the accuracy
of traffic data prior to their use is of utmost importance for the proper functioning of incident
detection algorithms and other condition monitoring applications. Techniques to screen such
data and to remove suspect data have evolved during the last few years and are detailed in
Chapter II.
In the present study, the initial data screening and quality control of detector data were carried
out based on suggestions in previous literature. The methods selected for the preliminary data
screening in the present study are discussed in the sections below.
[Figure: (a) schematic of the northbound three-lane test bed showing detector Stations 1-5 at mileposts 159.500, 159.998, 160.504, 160.892, and 161.405; entrance ramps EN1 (159.506), EN2 (159.960), and EN3 (161.203); exit ramp EX1 (160.625); and AVI readers 42 and 43. (b) The five aggregated detector locations, Location 1 through Location 5, with spacings of 0.498, 0.506, 0.388, and 0.513 miles. Figure not to scale.]
Fig. 3.10 Schematic diagram of the test bed from I-35 N, San Antonio, Texas
3.5.1 Detector Data
There are five servers in the TransGuide center that are dedicated to data storage and data
processing. The data from the sites are sent to any one of the five servers available. To extract
data from a particular detector, the first step is to find out which server processed the selected
detector number. Once that is known, the entire data set is searched and the data corresponding
to the specific detector number and for a specific lane are extracted to a new file. Thus, for a
three-lane roadway, there will be three files corresponding to each of the lanes for one detector
number. MATLAB programs were developed to extract these data and are shown in Appendix
D.
As a first step, extensive quality control and data reduction were performed. The data were
cleaned of unreasonable values of speed, volume, and occupancy, both individually and in
combination. Also, the polling cycle of the data during the data collection period was 20
seconds, but the cycle occasionally skipped to larger intervals. Preliminary data reduction and
quality control were performed when the data had any of the above errors, and they are discussed
in the following sections.
3.5.1.1 Test for Individual Threshold Values
The threshold value test examined speed, volume, and occupancy in each individual record of
the data set. If the observed value was outside the feasible region, that particular value was
discarded and was assumed to be equal to the average of the previous time step and next time
step values. A maximum threshold value of 3000 vehicles per hour per lane was used as the
volume threshold. This value was based on previous studies (Jacobson et al. 1990; Turochy
2000; Turner et al. 2000; Park et al. 2003; Eisele 2001). For speed, a threshold value of 100
mph (160 kmph) was used, and occupancy values exceeding 90% were discarded. Table 3.1
shows the screening rules incorporated in the MATLAB code to identify erroneous data. These
rules were established in previous works (Turner et al. 2000; Park et al. 2003; Eisele 2001;
Brydia et al. 1998). Rules one, two, and three are the thresholds set for individual parameters.
3.5.1.2 Test for Combinations of Parameters
All combinations of one of the three parameters, speed, volume, or occupancy, being zero, with
the other two being non-zero were examined. Similarly, combinations with one being non-zero and the other two being zero were also checked. When such unreasonable combinations were
found, the zero values were replaced with the average of the previous time step and next time
step values.
Rule four in Table 3.1 represents the condition of all traffic parameters being zero. This occurs
when vehicles are either stopped over the loop detectors or when there are no vehicles present in
that time step. This happens mostly due to vehicles not being present during off-peak traffic
conditions in early mornings. These data are removed from the data set so that they will not
affect the average speeds when taking the average of the 2-minute intervals. Rule five identifies
observations when the speed, volume, and occupancy are in the acceptable and expected ranges
for a 20-second period. The remaining rules are used to identify suspicious combinations of
speed, volume, and occupancy and their cause is unknown. The unreasonable observations in
this category were replaced with an average of the previous and next values.
Table 3.1 Screening Rules
SCREENING RULES
Individual tests
1) q > 17 Error
2) v > 100 Error
3) o > 90 Error
Combination tests
4) v = 0, q = 0, o = 0 Discard
5) v = 0 - 100, q = 0 - 17, o = 0 - 90 Accept
6) v = 0, q = 0, o > 0 Error
7) v = 0, q > 0, o > 0 Error
8) v = 0, q > 0, o = 0 Error
9) v > 0, q = 0, o = 0 Error
10) v > 0, q > 0, o = 0 Error
11) v > 0, q = 0, o > 0 Error
q = volume per 20 seconds, v = speed in mph, and o = percent occupancy.
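The sketch below shows one possible implementation of these rules; it mirrors the logic of Table 3.1 and the neighbor-averaging repair described above, but it is not the actual Appendix D code, and for simplicity the neighboring values are used as-is even when they are themselves flagged.

    % Apply the Table 3.1 screening rules to 20-second records (synthetic data).
    v = [55  0 58 120 60  0];                  % speed (mph)
    q = [ 8  0  9  10  0  0];                  % volume (veh/20 s)
    o = [ 6  0  7   8  5  0];                  % percent occupancy

    bad  = (q > 17) | (v > 100) | (o > 90);    % rules 1-3: individual thresholds
    nz   = (v == 0) + (q == 0) + (o == 0);     % number of zero parameters per record
    bad  = bad | (nz == 1) | (nz == 2);        % rules 6-11: inconsistent combinations
    drop = (nz == 3);                          % rule 4: all zero, discard

    for i = find(bad)                          % replace with the neighboring average
        lo = max(i - 1, 1);  hi = min(i + 1, numel(v));
        v(i) = mean(v([lo hi]));  q(i) = mean(q([lo hi]));  o(i) = mean(o([lo hi]));
    end
    v(drop) = [];  q(drop) = [];  o(drop) = [];  % remove the all-zero records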
3.5.1.3 Missing Data
Gold et al. (2001) reported that when the polling cycle is less than 2 minutes, the current
observation contains the sum of the traffic characteristics between the previous and current
observations. This means the volume indicated in the current observation is the sum of the
volume since the previous observation and the speed is the average speed since the previous
observation. Therefore, the current speed can be used for the speed of the previous observation
and half of the volume of the current observation can be used for the volume in the previous
step.
The polling cycle of the San Antonio data for the selected locations during the data collection
dates was 20 seconds, but it was observed that the cycle occasionally skipped to 60 or 120
seconds. When this happened, one of two things had occurred: either the first interval was skipped and all the data were recorded in the next interval, or the first interval's data were missed altogether. The decision between these cases was made based on the magnitude of
the values reported in the interval after the missing interval in comparison to the neighboring
values. In the case of an aggregated interval, the data were split into 20-second intervals, whereas
in the case of missing intervals, the data were imputed with the average of the previous and next
interval data.
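The sketch below illustrates the two cases for a single skipped interval. The dissertation states only that the decision was based on the magnitude of the values after the gap relative to their neighbors; the specific test used here is an assumption.

    % Handle one skipped 20-second polling interval (assumed decision rule).
    qPrev = 40;  qNext = 42;   % volumes bracketing the gap (veh/20 s)
    qAfter = 83;               % first volume reported after the gap
    nMissing = 1;              % number of skipped 20-second slots

    if qAfter > 1.5 * mean([qPrev qNext])      % counts look accumulated over the gap
        % Case 1: data were recorded in the next interval; split them evenly.
        qFilled = repmat(qAfter / (nMissing + 1), 1, nMissing + 1);
    else
        % Case 2: the interval data were lost; impute the neighbor average.
        qFilled = [repmat(mean([qPrev qNext]), 1, nMissing), qAfter];
    end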
Programs were developed in MATLAB to carry out the threshold checking, combination checks,
and imputation. After these corrections, the data were aggregated into 2-minute intervals. Thus,
an original file with data for a 24-hour time period having around 4300 records will be reduced
to 720 records after aggregation. The 2-minute data from different lanes of the same detector
station were added together and assumed as a single detector location as explained earlier.
Subsequently, the entry ramp and exit ramp volume data were added to the appropriate main lane
detectors. Sample distributions of occupancy, speed, and volume as functions of time, after all the quality control and data reduction were carried out, are shown in Figures 3.11, 3.12, and 3.13 for location 2 on February 11, 2003.
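The aggregation from 20-second records into 2-minute intervals (six records per interval) can be sketched as follows, here with synthetic inputs rather than the Appendix D code:

    % Aggregate clean 20-second records into 2-minute intervals (6 records each).
    q = randi([0 17], 720, 1);                 % synthetic 20-s volumes (4 hours)
    v = 55 + 10 * randn(720, 1);               % synthetic 20-s speeds (mph)

    n   = floor(numel(q) / 6) * 6;             % trim to a whole number of bins
    idx = ceil((1:n)' / 6);                    % 2-minute bin index per record
    q2  = accumarray(idx, q(1:n));             % volumes add across the interval
    v2  = accumarray(idx, v(1:n), [], @mean);  % speeds are averaged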
[Figure: occupancy (percent per 3 lanes per 2 minutes) plotted against time of day (hh:mm:ss)]
Fig. 3.11 Occupancy distribution from I-35 site, location 2, on February 11, 2003
[Figure: speed (mph) plotted against time of day (hh:mm:ss)]
Fig. 3.12 Sample speed distribution from I-35 site, location 2, on February 11, 2003
[Figure: volume (vehicles per 2 minutes) plotted against time of day (hh:mm:ss)]
Fig. 3.13 Sample volume distribution from I-35 site, location 2, on February 11, 2003
3.5.2 AVI Data
The AVI data collected from the field for a selected day were first sorted based on the vehicle
identification number. These data were then sorted based on the AVI reader number and the
time stamp. Thus, the times at which a selected vehicle crosses different AVI antennas are grouped
together. MATLAB programs were developed to carry out this sorting and are shown in
Appendix D. After sorting the data, the data quality was checked before carrying out the travel
time calculation. For the AVI data, the quality control mainly included the removal of outliers.
The primary source of these outliers is motorists that are read at the starting station of the
corridor, exit the freeway, and then reenter the freeway later. This provides large outlier
readings of travel time. In the present case, threshold values were determined based on the
length of the section and the minimum and maximum reasonable travel time for that distance.
Also, observations with magnitudes more than four times the mean of the previous 10 observations were considered outliers. However, none of the data considered in this
dissertation showed the presence of outliers. Once the outliers were removed, the link travel time
was calculated by matching unique tag reads recorded by the AVI readers at the start and the end
of the defined AVI links. The travel time was averaged for all the vehicles during the selected
study interval, 2 minutes. Figure 3.14 shows the travel time obtained from AVI on February 11,
2003.
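The tag-matching step can be sketched as follows; the reader numbers, data layout, and outlier window here are assumptions for illustration, and the actual sorting programs are in Appendix D.

    % Match tag reads at an upstream (42) and a downstream (43) AVI reader
    % and compute link travel times (synthetic reads; assumed reader numbers).
    readers = [42; 42; 43; 43];
    tags    = {'A123'; 'B456'; 'A123'; 'C789'};
    tsec    = [100; 130; 190; 240];               % read times (s past midnight)

    up = readers == 42;  down = readers == 43;
    [hit, loc] = ismember(tags(down), tags(up));  % match each downstream tag
    tDown = tsec(down);  tUp = tsec(up);
    tt = tDown(hit) - tUp(loc(hit));              % per-vehicle travel times (s)
    tt = tt(tt > 0 & tt < 600);                   % assumed reasonable-time window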
[Figure: travel time (seconds) plotted against time of day (hh:mm:ss)]
Fig. 3.14 Sample AVI travel time from I-35 site, on February 11, 2003
3.6 SIMULATED DATA USING CORSIM
Simulated data were generated using the simulation software CORSIM (CORridor SIMulation),
for testing the accuracy of the methods developed and techniques employed in the work.
CORSIM is one of the most widely used microscopic traffic simulation models in the United
States. CORSIM was developed by the Federal Highway Administration (FHWA) and includes
two separate simulation models, NETSIM (NETwork SIMulation) and FRESIM (FREeway
SIMulation). NETSIM is a traffic simulation model that describes in detail the operational
performance of vehicles traveling in an urban traffic network. FRESIM represents the
simulation of freeway traffic. The stochastic and dynamic nature of the model allows accurate
representation of actual traffic conditions (CORSIM User's Guide 2001). CORSIM simulates traffic using a car-following model. The basic idea of car-following models is that the
response of the following vehicle's driver depends on the movement of the vehicle
immediately preceding it (May 1990). Car-following models are composed of equations that
give the acceleration of the following vehicle with respect to the behavior of the lead vehicle.
Thus, CORSIM simulates vehicles by maintaining space headway between simulated vehicles.
CORSIM can be used to model an existing field network and collect flow, speed, occupancy, or travel time data similar to those collected from the field. These simulated data can be used for validating traffic models when there is a lack of field data.
CORSIM is designed primarily to represent the spatial interactions of drivers on a continuous,
rather than a discrete basis for analysis of freeway and arterial networks (Rilett et al. 2000).
CORSIM is a stochastic model, applying a time step simulation to describe traffic operations,
randomly assigning drivers and vehicles to the decision-making process. It applies time step
simulation, where one time step represents one second. Each vehicle is modeled as a distinct
object that is moved every second, while each variable control device in the network is also
updated every second for drivers to react. The input requirement includes network details and
traffic details. The network is made up of links and nodes, and the traffic demand is input as
volume in vehicles per hour. The output provides details such as travel time, delay, queues, and
environmental measures. Surveillance statistics like vehicle counts, percentage occupancy, and
average speed values can be obtained by choosing the detector option (CORSIM User's Guide
2001).
The traffic simulation for the present work used the FRESIM subcomponent. A traffic network
similar to the field test bed was created in CORSIM, and detectors were placed 0.5 miles apart.
Traffic volumes from the field were given as input to CORSIM at 30-minute intervals. A
corridor with seven links was generated for the present analysis. Detectors were placed in each
link to collect the flow, speed, and occupancy rate. The default parameters in CORSIM were
used since they gave acceptable results without much error, as shown below. Varying flow rates, based on field data, were input to the simulation so that the simulated flow variations were comparable to those in the field. A 15-minute initialization time was given for the system
to reach equilibrium. The inputs are given by modifying the corresponding record types. The
direct output from CORSIM does not contain travel time details. Hence, the binary time step (.tsd) file, which describes the state of each individual vehicle within the simulation model at each 1-second time step, is used for the estimation of travel time. These data are stored for each link and time step within the model and are specially designed to provide quick access to the data within each individual time step data (.tsd) file. A conversion program written in C++
was used to convert the binary time step data file to an ASCII file that could be utilized to
analyze the output results. The conversion program extracts vehicle-specific data at 1-second
time increments between specific nodes of the corridor, including node number, time step (in 1-
second increments), global vehicle identification number, vehicle fleet, vehicle type, vehicle
length, vehicle acceleration, and vehicle speed. Because the data included 1-second time
increments, the majority of vehicles on the link were included in multiple time steps as they
traversed the network. Hence, each vehicle's entry and exit times were determined, and the travel time was then calculated as the difference between the entry and exit times. Programs were
developed in C programming language to carry out these operations. The obtained travel time
value for each of the individual vehicles was then averaged for 2-minute intervals and was used
for the validation.
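The conversion and travel time programs were written in C++ and C; purely as an illustration, the equivalent entry/exit logic is sketched below in MATLAB with synthetic records (the column layout of the converted ASCII file is an assumption):

    % Per-vehicle travel time from 1-second time-step records (illustrative).
    vehID = [7; 7; 7; 9; 9];                   % global vehicle ID per record
    tstep = [100; 101; 102; 150; 151];         % simulation second of each record

    ids = unique(vehID);
    tt  = zeros(size(ids));
    for i = 1:numel(ids)
        onLink = tstep(vehID == ids(i));
        tt(i)  = max(onLink) - min(onLink);    % exit time minus entry time (s)
    end
    % The per-vehicle values are then averaged into 2-minute bins for validation.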
The detector output is given in the .OUT file. Data were extracted every 20 seconds to be comparable to the field scenario. Programs were developed in PERL to extract the speed, flow,
and occupancy values from the output file. The output obtained from CORSIM was used for
checking the validity of the optimization technique and travel time estimation procedure as
described in Chapters IV and V. The developed CORSIM network is given in Appendix C, and
the programs developed for extracting the data are given in Appendix D.
The simulated volumes were compared with the corresponding actual values to check how well the integrity of the original data was maintained. Simulated data and the corresponding field data for February 11, 2003, are shown in Figure 3.15 for illustration. The mean absolute percentage error (MAPE), as defined below, was calculated for each set of data to determine the deviation of the simulated data from the actual values.
$$\text{MAPE} = \frac{100}{\text{Number of observations}} \sum \frac{\left|\text{actual} - \text{estimated}\right|}{\text{actual}}. \qquad (3.7)$$
The MAPE value came to 14%, showing that the simulated data represent the actual data
reasonably well.
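Equation 3.7 amounts to a one-line computation, sketched here with synthetic values:

    % MAPE between actual and simulated 2-minute volumes (equation 3.7).
    actual    = [120; 95; 140; 80];            % synthetic actual counts
    simulated = [110; 100; 150; 70];           % synthetic simulated counts
    mape = 100 * mean(abs(actual - simulated) ./ actual);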
[Figure: actual and simulated volume (vehicles per 2 minutes) plotted against time of day (hh:mm:ss)]
Fig. 3.15 Simulated data and the corresponding field data for February 11, 2003, for location 1
3.7 CONCLUDING REMARKS
This chapter described the details of the study corridors used, the data used in the analysis, and
the preliminary data reduction. The data were collected from the archived collection of the
TransGuide system in San Antonio. The study sites were selected from the I-35 N freeway in
San Antonio, since it was equipped with both loop detectors and AVI. Details about the working
of AVI and loops were described briefly before presenting the details of the data used. The
details of the test bed and the collected data were given next. The preliminary data quality
checks and the corrections carried out were detailed subsequently. These data quality control
procedures were based on previous investigations. Data were also simulated using CORSIM
simulation software, and the details were given in the last section. A network similar to the field
network was generated in CORSIM, and the data were used for checking the validity of the models developed.
All of the preliminary data quality control techniques discussed in this chapter are useful to
correct data collected at a single location, and therefore cannot account for systematic problems
over a series of detectors. For an application such as travel time estimation, there is a need to
consider a series of detectors. In such cases when detectors are analyzed as a series, more
discrepancies are identified in the data, even after applying the screening methods at individual
locations. For instance, if the total number of vehicles counted by two consecutive detector
locations is observed over a period of time, the difference in the cumulative counts should not
exceed the number of vehicles that can be accommodated in that length of the road under the jam
density condition. However, this constraint will be violated if some of the detectors are under-
or overcounting vehicles. For many traffic applications such as incident detection, this might not
be an issue. However, for other applications that rely on accurate system counts, such as origin-
destination (OD) estimation and certain travel time estimation techniques, this can be a problem.
While most of the existing error detection and diagnostic tests do take into account possible
malfunctions of the loop detector by looking at the data at a specific point, the problems related
to balancing consecutive detector data for vehicles being under- or overcounted have not been well addressed. This lack of interest may be due to current applications being based on data generated at a particular station point rather than a series of station points at the same time. In other words, since these errors do not adversely affect the results of those applications, they are typically ignored. However, if the loop detector data are to be successfully used for new
applications, these issues of system data quality will need to be addressed.
In summary, most of the existing error detection and diagnostic tests account for possible malfunctions of the loop detector by looking at the data at a specific point, but the problem of balancing consecutive detector data for under- or overcounted vehicles has not been well addressed. This analysis of the detector data as a series, the problems related to balancing consecutive detector data, and the suggested correction methodology form the crux of the next chapter.
CHAPTER IV
OPTIMIZATION FOR DATA DIAGNOSTICS
4.1 INTRODUCTION
Chapter III discussed the preliminary data quality control carried out on detector data at
individual locations. It was concluded that while substantial failures in loop detector data are
easily identified using the current technologies, more subtle failures such as biases in volume
counts may go unidentified and, hence, there is a need to analyze the data at a system level. For
example, in an application such as estimating travel time between two detector stations, where
the data from the neighboring detectors need to be compared, there is a need to check the
conservation of vehicles principle. Conservation of flow is one of the basic traffic principles that
any volume data as a series must meet. In this dissertation the conservation of vehicles is
checked by comparing the cumulative flow curves from consecutive detector stations. As
discussed in Chapter II, very few studies have been reported that systematically analyzed a series
of detector locations over a long interval of time to check whether the collected data follow the
conservation of vehicles. Most of those studies, when faced with a violation of conservation of
vehicles, suggested applying simple adjustment factors to rectify the problem, rather than
applying any systematic methodology.
In this dissertation a correction procedure based on nonlinear optimization is used for identifying
and correcting the data when the conservation of vehicles principle is violated. The generalized
reduced gradient method is chosen, where the objective function and constraints are selected
such that the conservation of vehicles principle is followed with least change to the original data.
Figure 4.1 shows the general flow chart for the overall data reduction process. Note that the
proposed optimization technique can also be readily adapted for other applications. Two such
applications, namely, to impute missing data, and to locate the worst performing detector station
among a series of detectors are also illustrated in this chapter.
Part of this chapter is reprinted with permission from "Loop detector data diagnostics based on conservation of vehicle principle" by Vanajakshi, L., and Rilett, L. R., accepted for publication in Transp. Res. Rec., TRB, National Research Council, Washington, D.C.
Fig. 4.1 Algorithm for the overall proposed method
[Flowchart: START; collect the loop detector field data in 20- or 30-second intervals; check individual speed, volume, and occupancy values against the defined threshold values; check for unreasonable combinations of the flow, speed, and occupancy values; check for missing intervals; if the data follow conservation of vehicles, STOP; otherwise, optimize using GRG (the optimization process) and STOP.]
The following section details the conservation principle in vehicular traffic and the related
literature. The GRG method is detailed next, followed by the actual implementation of the
procedure for an example problem. Next, the validation of the optimization procedure is
illustrated using simulated data. The applicability of the method for different conditions and
other applications is detailed in the last section.
4.2 CONSERVATION OF VEHICLES
The concept of conservation of vehicles (Lighthill and Whitham 1955; Richards 1956) states that
the difference between the number of vehicles entering and leaving a link during a specific time
interval corresponds to the change in the number of vehicles traveling on the link. The simplest
and the most general way in which this can be stated is that vehicles cannot be created or lost
along the road (Daganzo 1997).
This concept is further explained using a one-lane road with two detectors located at each end, as shown in Figure 4.2. The number of vehicle arrivals and departures are measured continuously and aggregated regularly at the upstream location $x_1$ and the downstream location $x_2$, respectively.

[Figure: a road section of length $dx$ between detector stations $x_1$ and $x_2$, with flows $q(x_1, t)$ and $q(x_2, t)$ measured at the ends and $n(t)$, $k(t)$ on the section]

Fig. 4.2 Illustration of the conservation of vehicles
Referring to Figure 4.2, let $q(x_1, t)$ denote the flow measured at location $x_1$ at time $t$, and let $q(x_2, t)$ denote the flow measured at location $x_2$ at the same time $t$. Let $n(t)$ be the number of vehicles traveling over the link distance $dx$ between the detector stations $x_1$ and $x_2$ at time $t$, and let $k(t)$ be the corresponding density of vehicles.

Under the principle of conservation of vehicles, the change in the number of vehicles on the length of road $dx$
between the upstream location $x_1$ and the downstream location $x_2$ in an interval of time $dt$ must equal the difference between the number of vehicles entering the section at $x_1$ and the number of vehicles leaving the section at $x_2$, which is equal to $x_1 + dx$, in that time interval. If the number of vehicles on the length $dx$ at time $t$ is $k\,dx$ and the number of vehicles entering in time $dt$ at $x$ is expressed as $q\,dt$, then the conservation equation is as shown below (Drew 1968):
$$\left(k + \frac{\partial k}{\partial t}\,dt\right)dx - k\,dx = q\,dt - \left(q + \frac{\partial q}{\partial x}\,dx\right)dt, \qquad (4.1)$$

where,
$q$ = flow (vehicles per hour),
$k$ = density (vehicles per mile),
$dx$ = length of road (miles), and
$dt$ = time interval (hours).
Based on the fact that $q = ku$, where $u$ is the space mean speed, the following simplified form of the above equation may be derived (Drew 1968; Kuhne and Michalopoulos 1968):

$$\frac{\partial k}{\partial t} + \frac{\partial q}{\partial x} = 0. \qquad (4.2)$$
Let $Q(x_1, t_n)$ and $Q(x_2, t_n)$ be the cumulative numbers of vehicles entering and exiting the link, respectively, from time $t_1$ to $t_n$, which can be expressed as

$$Q(x_1, t_n) = \Delta t \sum_{i=1}^{n} q(x_1, t_i) \qquad \text{and} \qquad (4.3)$$

$$Q(x_2, t_n) = \Delta t \sum_{i=1}^{n} q(x_2, t_i), \qquad (4.4)$$

where $\Delta t$ is the flow measurement interval.
Under ideal conditions, the cumulative volume at an upstream location should be greater than or equal to the cumulative volume at the downstream location at any instant of time. Based on Figure 4.2 this can be expressed as:

$$Q(x_1, t_n) \ge Q(x_2, t_n). \qquad (4.5)$$

The equality condition in equation 4.5 holds for the case when all the vehicles that entered the section have exited by the end of the time interval.
Also, the maximum difference between the upstream and downstream cumulative flows cannot exceed the maximum number of vehicles that can be accommodated between these two locations at jam density, as expressed in equation 4.6:

$$Q(x_1, t_n) - Q(x_2, t_n) \le n_{jam}, \qquad (4.6)$$

where,
$n_{jam}$ = maximum number of vehicles that can be accommodated between locations $x_1$ and $x_2$ at jam density.
Thus, if there are no systematic errors present in the data, the difference in the total number of vehicles counted by two consecutive detectors should equal the number of vehicles between the two detector locations, as shown in equation 4.7:

$$Q(x_1, t_n) - Q(x_2, t_n) = n(t_n). \qquad (4.7)$$
Based on the cumulative flows recorded at $x_1$ and $x_2$, there are two scenarios in which the conservation of vehicles principle can be violated. In the first case, when $Q(x_2, t_n)$ becomes more than $Q(x_1, t_n)$, extra vehicles are said to be "created." In the second case, when $Q(x_1, t_n)$ is more than $Q(x_2, t_n)$ and the difference is larger than the maximum number of vehicles that can be accommodated in the road length under consideration, vehicles are said to be "lost." Both of these conditions violate the conservation of vehicles principle. These differences can be due to errors of the detectors at the upstream location, the downstream location, or both.
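These two tests follow directly from equations 4.5 and 4.6, as in the MATLAB sketch below (synthetic counts and an assumed value of $n_{jam}$):

    % Check conservation of vehicles between consecutive detectors (eqs 4.5-4.6).
    q1 = [10; 12; 9; 11];                % 20-s counts at the upstream detector
    q2 = [11; 12; 10; 10];               % 20-s counts at the downstream detector
    Q1 = cumsum(q1);  Q2 = cumsum(q2);   % cumulative flow curves
    njam = 315;                          % assumed storage at jam density (veh)

    created  = Q2 > Q1;                  % vehicles "created" (eq 4.5 violated)
    lost     = (Q1 - Q2) > njam;         % vehicles "lost" (eq 4.6 violated)
    violates = any(created | lost);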
Reported studies that checked the conservation of vehicles are limited and include Zuylen and
Brantson (1982), Petty (1995), Zhao et al. (1998), Cassidy (1998), Nam and Drew (1996, 1999),
Kikuchi et al. (1999, 2000), Kikuchi (2000), Windover and Cassidy (2001), and Wall and
Dailey (2003). While all of the above studies acknowledged the fact that conservation is
violated, few of them (Zuylen and Brantson 1982; Petty 1995; Nam and Drew 1996, 1999;
Kikuchi et al. 1999, 2000; Wall and Dailey 2003) discussed methods of correcting this problem.
Zuylen and Brantson (1982) developed a methodology that relied on an assumption about the
statistical distribution of the data to eliminate the discrepancy in the data. The algorithm is
developed assuming a Poisson distribution or a normal distribution. Petty (1995), in a report on
the development of a program for freeway service patrol, discussed how to correct the loop
detector count data, based on the conservation of vehicles principle. The correction procedure
suggested was to use compensation factors, which are computed as a fraction of the flow from
the detector under consideration to the neighboring main lane flow. Nam and Drew (1996, 1999)
and Wall and Dailey (2003) used a simple adjustment factor for correcting the discrepancy. Nam
and Drew calculated adjustment factors as the ratio of inflow to outflow for every 30-minute
and adjusted the flow at the downstream point accordingly to balance the flow. The investigation
by Wall and Dailey (2003) required one properly calibrated reference detector that can be
assumed to be correct in order to calculate the correction factor. Kikuchi et al. (1999) studied an
arterial signalized network and proposed methodologies to adjust the observed values using the
concept of fuzzy optimization. Kikuchi et al. (2000) reported six different methods that can be
used to adjust traffic volume data so that they will follow vehicle conservation and thus be useful
for the subsequent analysis steps. They concluded that there is no single unique method that can
be used under different situations. The data they used for analysis were from a small arterial
network, and a single hourly inflow and outflow at each signal was compared.
From the above discussion, it can be seen that no systematic studies have been reported for correcting continuous data collected from freeways when the data violate the conservation of
vehicles. In the case of freeway data, the suggested methods are limited to the use of simple
correction factors. This may work in situations where the analysis is for a small section of
roadway or for a short duration of time where the resulting discrepancy is small in magnitude.
For example, in the study reported by Nam and Drew (1999), the analysis was for a 4-hour
period and the magnitude of the error was 200 vehicles over the total period. However, for most
real-life traffic applications, the number of locations as well as the duration of study will be
large. Hence, the amount of discrepancy may become large as well. In such cases, a systematic
method is needed for diagnosing the data. The nature of this problem can be summarized as
follows:
Given: a set of vehicle volumes from consecutive locations.
Objective: adjust the volumes such that the values are consistent with respect to conservation of
vehicles.
Ideally, a method is needed that finds a consistent set of adjusted values for a given set of observed values while meeting the following requirements:
1. To ensure conservation of flow at any point at any time,
2. To handle situations in which some data are missing or questionable,
3. To preserve the integrity of the observed data as much as possible, and
4. To handle a large amount of data (for example, continuous 24-hour data per day) in a
systematic manner in a short computation time.
In this dissertation, an optimization approach that can meet the above requirements is selected to
balance the loop detector data. The details of the selected method and its implementation are
discussed in the following section. This dissertation represents the first application of this kind of
an optimization technique for quality control of ILD data collected from freeways.
4.3 GENERALIZED REDUCED GRADIENT OPTIMIZATION PROCEDURE
Of the different methods currently used in engineering optimization fields, the most popular are
the methods based on linearization of the problem because they are easy to solve (Gabriele and
Beltracchi 1987). Such methods include successive linear programming, methods of feasible
directions, and the generalized reduced gradient method (Gabriele and Beltracchi 1987). Each of
these methods is based on linearizing the objective function and constraints at some stage of
problem solving to determine a direction of search. This direction is then searched for the local
improvement in the objective function, while at the same time avoiding severe violation of the
constraints (Gabriele and Beltracchi 1987).
GRG is one of the most popular techniques among the above and has a reputation for its
robustness and efficiency (Venkataraman 2001; Eiselt et al. 1987). The GRG method is an
extension of the Wolfe reduced gradient method (Wolfe 1963, 1967), which solves problems
with linear constraints and a nonlinear objective function (Abadie and Carpenter 1969;
Himmelblau 1972). The GRG algorithm extends the Wolfe algorithm by taking nonlinear constraints into account as well. The general steps involved in a GRG optimization are as
follows (Abadie 1970; Gabriele and Ragsdell 1977):
1. Partition the variables into dependent and independent categories, based on the
number of equality constraints involved,
2. Compute the reduced gradient,
3. Determine the direction of progression of the independent variables and modify
them, and
4. Modify the dependent variables in order to verify the constraints.
The GRG algorithm solves the original problem by solving a sequence of reduced problems. The
reduced problems are solved by a gradient method (Lasdon et al. 1978). The general form of a
GRG problem will be as follows:
Minimize $F(X)$, $\qquad$ (4.8)

subject to

$g_j(X) \le 0$, which will be converted to $g_j(X) + X_{j+n} = 0$, $\qquad$ (4.9)

or $g_j(X) \ge 0$, which will be converted to $g_j(X) - X_{j+n} = 0$, $\qquad$ (4.10)

$h_k(X) = 0$, $\qquad$ (4.11)

where,
$X$ = column vector of design variables,
$F(X)$ = objective function,
$g$ = inequality constraints,
$h$ = equality constraints,
$X_{j+n}$ = slack/surplus variables,
$j$ = index over the inequality constraints ($1, \dots, m$),
$k$ = index over the equality constraints ($1, \dots, l$), and
$n$ = number of original variables.
To start with, the slack/surplus variables ($X_{j+n}$) are included in the original set of design variables, giving $n + m$ total variables. The $X$ vector now includes the original variables as well as the slack/surplus variables. The variables are then partitioned into $(n - l)$ independent/decision/basic variables ($Z$) and $(m + l)$ dependent/state/nonbasic variables ($Y$). Now, with these variables and with all equality constraints, the original optimization task can be stated as:

Minimize $F(X) = F(Z, Y)^T$, $\qquad$ (4.12)

subject to $h_j(X) = 0$, $j = 1, \dots, m+l$. $\qquad$ (4.13)
Now, differentiating the above objective and constraint functions yields

$$dF(X) = \nabla_Z F(X)^T dZ + \nabla_Y F(X)^T dY, \qquad (4.14)$$

$$dh_j(X) = \nabla_Z h_j(X)^T dZ + \nabla_Y h_j(X)^T dY, \quad j = 1, \dots, m+l, \qquad (4.15)$$

where the subscripts $Z$ and $Y$ denote gradients with respect to the independent and dependent variables, respectively.
Now, equation 4.15 can be rewritten as follows:

$$dh(X) = \begin{bmatrix} \nabla_Z h_1(X)^T \\ \nabla_Z h_2(X)^T \\ \vdots \\ \nabla_Z h_{m+l}(X)^T \end{bmatrix} dZ + \begin{bmatrix} \nabla_Y h_1(X)^T \\ \nabla_Y h_2(X)^T \\ \vdots \\ \nabla_Y h_{m+l}(X)^T \end{bmatrix} dY, \qquad (4.16)$$

or

$$dh(X) = A\,dZ + B\,dY, \qquad (4.17)$$
where $A$ is an $(m+l) \times (n-l)$ matrix and $B$ is an $(m+l) \times (m+l)$ matrix, since there are $(n-l)$ $Z$ variables and $(m+l)$ $Y$ variables. One restriction here is that the $B$ matrix should not be singular (i.e., its inverse should exist). If it becomes singular, the selection of the dependent and independent variables needs to be changed such that $B$ is not singular.
For any change in the decision variables, the equality constraints must remain satisfied for feasibility. It follows that $dh_j(X) = 0$, for $j = 1, \dots, m+l$, in equation 4.15 for any change $dZ$ in the independent variables. Since $dh(X) = 0$, equation 4.17 can be solved for the corresponding change $dY$ in the dependent variables in order to maintain feasibility:

$$dY = -B^{-1} A\, dZ. \qquad (4.18)$$
Substituting equation 4.18 into equation 4.14 and rearranging, one gets the following expression:

$$dF = \left\{ \nabla_Z F(X)^T - \nabla_Y F(X)^T B^{-1} A \right\} dZ. \qquad (4.19)$$
The generalized reduced gradient $G_R$ is defined by $\dfrac{dF(X)}{dZ}$ and can be represented as:

$$G_R = \frac{dF(X)}{dZ} = \nabla_Z F(X) - \left[ B^{-1} A \right]^T \nabla_Y F(X). \qquad (4.20)$$
The generalized reduced gradient can now be used to determine the search direction $S$ in the decision variables as:

$$S = -G_R. \qquad (4.21)$$
Then a one-dimensional search is performed with respect to the independent variables. For a selected step size in the search direction, the dependent vector is updated using Newton's method for solving simultaneous nonlinear equations for $dY$. Having found the minimum in the search direction, the process is repeated until convergence is achieved (Vanderplaats 1984). In this case the convergence criterion was that all the constraints reach a value of $1 \times 10^{-4}$. The search direction is found such that any active constraints remain precisely active for some small move in that direction. If a move results in an active constraint being violated, Newton's method is used to return to the constraint boundary. More details of the GRG method as well as the available software for this method can be found in Gabriele and Ragsdell (1980), Lasdon and Warren (1978), and Abadie (1978). Figure 4.3 shows the steps of the GRG algorithm discussed in this section for a one-dimensional search as a flowchart (Vanderplaats 1984).
For the present problem of adjusting the detector data for violation of the conservation of
vehicles principle, the optimization problem can be formulated for a series of I detectors in
sequence as:
Fig. 4.3 Algorithm for the GRG method
[Flowchart: Start; given the objective function, constraints, and design variables, specify the basic and nonbasic (dependent and independent) variables; calculate the gradients of the objective function and the constraints; calculate the reduced gradient; determine the search direction; perform the search with respect to the independent variables; update the dependent variables; if converged, exit; otherwise, repeat from the gradient calculation.]
$$\min \sum_{i=1}^{I-1} \left( Q_t^{(i)} - Q_t^{(i+1)} \right)^2, \qquad (4.22)$$

where,
$t$ = time,
$i$ = detector number,
$Q_t^{(i)}$ = cumulative number of vehicles at detector $i$ at time $t$, and
$I$ = total number of locations,

subject to the constraints

$$Q_t^{(i)} - Q_t^{(i+1)} \ge 0, \quad i = 1, \dots, I-1, \qquad (4.23)$$

$$Q_t^{(i)} - Q_t^{(i+1)} \le z, \quad i = 1, \dots, I-1, \text{ and} \qquad (4.24)$$

$$Q_t^{(i)} - Q_{t-1}^{(i)} \ge 0, \quad i = 1, \dots, I, \qquad (4.25)$$

where $z$ is the maximum number of vehicles that can be accommodated between two consecutive locations.
The constraints in this case are selected based on the restrictions discussed earlier. The first
constraint, shown in equation 4.23, is based on the condition that the cumulative flow at the first
detector location should be greater than or equal to the cumulative flow at location 2, which in
turn should be greater than or equal to the cumulative flow at location 3, at all times. The second
constraint, shown in equation 4.24, stipulates that the maximum difference cannot exceed the
maximum number of vehicles that can be accommodated in that road length at jam density
conditions. The constraint shown in equation 4.25 is that the value at a particular time step
cannot be less than the value for the previous time step, since the variables used are cumulative
values.
4.4 IMPLEMENTATION
To illustrate the procedure, a corridor consisting of three detectors is considered for analysis. The
detector locations on the San Antonio I-35 freeway were spaced approximately 0.5 miles apart,
making the corridor length approximately 1.5 miles. The three consecutive detector locations
selected were detector numbers 159.500, 159.998, and 160.504 as shown in Figure 3.10. The
analysis was carried out for a period of 24 hours for all 5 days under consideration.
As stated earlier, the objective was to adjust the observed volumes to meet the conservation of vehicles constraint. The objective function and associated constraints given in equations 4.22 through 4.25 are modified to suit the three-detector series as given below:
$$\min \left[ \left( Q_t^{(1)} - Q_t^{(2)} \right)^2 + \left( Q_t^{(2)} - Q_t^{(3)} \right)^2 \right], \qquad (4.26)$$

where,
$t$ = current time, and
$Q$ = cumulative number of vehicles at each detector,

subject to the constraints

$$Q_t^{(1)} - Q_t^{(2)} \ge 0 \quad \text{and} \quad Q_t^{(2)} - Q_t^{(3)} \ge 0, \qquad (4.27)$$

$$Q_t^{(1)} - Q_t^{(2)} \le z \quad \text{and} \quad Q_t^{(2)} - Q_t^{(3)} \le z; \quad z = 500, \text{ and} \qquad (4.28)$$

$$Q_t^{(1)} - Q_{t-1}^{(1)} \ge 0, \quad Q_t^{(2)} - Q_{t-1}^{(2)} \ge 0, \quad \text{and} \quad Q_t^{(3)} - Q_{t-1}^{(3)} \ge 0. \qquad (4.29)$$
The z value is calculated based on the known distance between two consecutive detectors and an assumed average vehicle length. Given that the length of road is 0.5 miles (805 m), the maximum number of vehicles at jam density, assuming 25 feet (7.6 m) as the average vehicle length, is 105 vehicles per lane between the two detector locations. Hence, the maximum difference in the cumulative volumes between each pair cannot theoretically exceed 315 for the three lanes combined. A z value of 500 was used in this dissertation as the maximum number of vehicles that can be accommodated in the study length.
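The GRG implementation itself is not reproduced here. Purely as an illustration of the formulation in equations 4.26 through 4.29, the sketch below solves a short synthetic instance with MATLAB's fmincon, which is an SQP/interior-point solver rather than GRG. A proximity term that keeps the adjusted counts close to the observations is added to reflect the "least change to the original data" requirement stated in Section 4.1; that term is an assumption, not part of equation 4.26.

    % Balance three cumulative count series (eqs 4.26-4.29); fmincon stands in
    % for the GRG solver, and the proximity term below is an added assumption.
    T  = 4;  z = 500;
    Q0 = [cumsum([10;12;9;11]); cumsum([11;12;10;10]); cumsum([9;11;9;10])];

    obj = @(x) sum((x(1:T) - x(T+1:2*T)).^2 + (x(T+1:2*T) - x(2*T+1:3*T)).^2) ...
             + sum((x - Q0).^2);               % proximity to the observed counts

    % Linear inequalities A*x <= b: ordering (eq 4.23), storage (eq 4.24), and
    % monotonicity of each cumulative series (eq 4.25).
    D = [eye(T), -eye(T), zeros(T); zeros(T), eye(T), -eye(T)];  % Q1-Q2, Q2-Q3
    M = diff(eye(T));                          % rows give Q(t+1) - Q(t)
    A = [-D; D; -blkdiag(M, M, M)];
    b = [zeros(2*T,1); z*ones(2*T,1); zeros(3*(T-1),1)];

    x = fmincon(obj, Q0, A, b);                % adjusted cumulative counts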
The cumulative volumes for the selected three consecutive detector locations for all 5 days were
studied first. One sample plot for a 24-hour period on February 11, 2003, is shown in Figure 4.4.
As discussed earlier, if conservation of vehicles is followed, the cumulative volume at location 1
should always be greater than or equal to that at location 2, which in turn should be greater than
or equal to location 3 values at all time intervals. Also, the maximum difference should be less
than the maximum number of vehicles that can be accommodated. Contrary to this, in the I-35
cumulative volume plot in Figure 4.4, it may be seen that the location 2 volume is consistently
lower than that of location 3. This is shown enlarged in Figure 4.5 for a 1-hour period from
8:00:00 to 9:00:00. Also, the cumulative flow at location 3 became larger than both locations 1
and 2 at certain points, as shown enlarged in Figure 4.6 from 18:00:00 to 19:00:00. These are
clearly violations of the conservation of vehicles principle and show the necessity of checking for systematic errors even after standard error checking has been carried out.
It is clear from these results that some or all of the detectors under consideration are malfunctioning. There are two ways of approaching this problem. In the first case, the specific malfunctioning detectors are determined by collecting corresponding ground truth data for each of the detectors involved; the exact detectors that are malfunctioning can then be identified by comparison with the ground truth data, and corrections can be applied to those detectors alone. In the second case, ground truth data may not be available, and the corrections have to be applied based on assumptions as to which detectors need to be corrected. For example, the error in the data from the three locations shown in Figure 4.4 can originate from any of the 11 detectors involved. The only way to pinpoint the malfunctioning detector(s) is to carry out a manual data collection at each of the 11 detector points and compare it with the corresponding detector data. However, most studies that use detector data do not collect ground truth data. One reason may be that manual data collection can be very expensive, especially if the analysis covers a long period of time over a long stretch of roadway. Moreover, most research studies using detector data rely on archived loop detector data for model development, calibration, and validation, and ground truth data are rarely available for archived data, as is the case in the present dissertation.
[Plot: cumulative volume (vehicles) versus time (hh:mm:ss) over 24 hours; series: Location 1, Location 2, Location 3.]
Fig. 4.4 Cumulative actual volumes for 24 hours at I-35 site on February 11, 2003
[Plot: cumulative volume (vehicles) versus time (hh:mm:ss) from 8:00:00 to 9:00:00; series: Location 1, Location 2, Location 3.]
Fig. 4.5 Enlarged cumulative volumes for 1 hour at I-35 site on February 11, 2003
[Plot: cumulative volume (vehicles) versus time (hh:mm:ss) from 18:00:00 to 19:00:00; series: Location 3, Location 1, Location 2.]
Fig. 4.6 Enlarged cumulative volumes for 1 hour at I-35 site on February 11, 2003
In this dissertation, an assumption was made that any of the detectors could be malfunctioning, and hence the optimization is applied equally to all the detectors under study. However, as discussed already, if data for cross-checking are available and the malfunctioning detector(s) are known exactly, the present method can still be applied by specifying only the particular location data to be optimized, instead of optimizing the data from all the locations.
To summarize, it was found that the field data, even after the preliminary data quality control, violated the conservation of vehicles principle. An assumption of equal error from all the detectors involved is made due to the lack of specific data on which detector(s) are malfunctioning. Hence, the algorithm developed for removing this type of discrepancy in the data set should (1) keep the cumulative flow at each successive detector point no greater than that at the previous point, and (2) keep the difference between the cumulative flows at successive points within the maximum number of vehicles that can be accommodated in that length of road. The GRG algorithm with the objective function and constraints given in equations 4.26–4.29 was implemented using MATLAB.
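As an illustration of this formulation, the following minimal Python sketch solves the adjustment for a three-detector series at a single time step. It uses SciPy's SLSQP solver as a stand-in for the GRG solver and the MATLAB implementation used in the dissertation, and the numerical values are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative observed cumulative volumes Q_t(1..3) and the optimized
# values from the previous time step Q_{t-1}(1..3)
q_obs = np.array([1250.0, 1238.0, 1260.0])
q_prev = np.array([1230.0, 1225.0, 1240.0])
z = 500.0  # storage capacity of each link at jam density (equation 4.28)

def objective(q):
    # Equation 4.26: squared differences between successive cumulative flows
    return (q[0] - q[1]) ** 2 + (q[1] - q[2]) ** 2

constraints = [
    # Equation 4.27: upstream cumulative flow >= downstream cumulative flow
    {"type": "ineq", "fun": lambda q: q[0] - q[1]},
    {"type": "ineq", "fun": lambda q: q[1] - q[2]},
    # Equation 4.28: successive differences bounded by z
    {"type": "ineq", "fun": lambda q: z - (q[0] - q[1])},
    {"type": "ineq", "fun": lambda q: z - (q[1] - q[2])},
    # Equation 4.29: cumulative counts are nondecreasing in time
    {"type": "ineq", "fun": lambda q: q - q_prev},
]

result = minimize(objective, q_obs, method="SLSQP", constraints=constraints)
print(result.x)  # adjusted cumulative volumes satisfying all constraints
```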
As discussed in the GRG theory, the first step in the implementation was to convert the inequality constraints to equality constraints. Thus, equations 4.27–4.29 need to be converted to equality constraints by adding suitable slack and surplus variables, as shown below:
\left(Q_t^{(1)} - Q_t^{(2)}\right) - v_1 = 0 \quad \text{and} \quad \left(Q_t^{(2)} - Q_t^{(3)}\right) - v_2 = 0,    (4.30)

\left(Q_t^{(1)} - Q_t^{(2)}\right) + v_3 = z \quad \text{and} \quad \left(Q_t^{(2)} - Q_t^{(3)}\right) + v_4 = z,    (4.31)

\left(Q_t^{(1)} - Q_{t-1}^{(1)}\right) - v_5 = 0, \quad \left(Q_t^{(2)} - Q_{t-1}^{(2)}\right) - v_6 = 0, \quad \text{and} \quad \left(Q_t^{(3)} - Q_{t-1}^{(3)}\right) - v_7 = 0.    (4.32)
As can be seen from equations 4.30–4.32, there are six original variables (n = 6), namely Q_t^{(1)}, Q_t^{(2)}, Q_t^{(3)}, Q_{t-1}^{(1)}, Q_{t-1}^{(2)}, and Q_{t-1}^{(3)}. For the five-detector series, the corresponding constraints on the cumulative counts at successive time steps become

Q_t^{(2)} - Q_{t-1}^{(2)} \ge 0, \quad Q_t^{(3)} - Q_{t-1}^{(3)} \ge 0, \quad Q_t^{(4)} - Q_{t-1}^{(4)} \ge 0, \quad \text{and} \quad Q_t^{(5)} - Q_{t-1}^{(5)} \ge 0.    (4.34–4.36)
Here, the number of original variables is 10 and the number of slack/surplus variables is 13, making a total of 23 variables. The sizes of the resulting A and B matrices will be (13×10) and (13×13), respectively.
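For the three-detector series, the coefficient matrices of the equality-constraint system above can be assembled as in the following Python sketch; the ordering of the variables and of the constraint rows is an assumption for illustration, since the text does not fix it.

```python
import numpy as np

# Original variables x = [Q_t(1), Q_t(2), Q_t(3), Q_{t-1}(1), Q_{t-1}(2), Q_{t-1}(3)]
A = np.array([
    [1, -1,  0,  0,  0,  0],   # equation 4.30: (Q1 - Q2) - v1 = 0
    [0,  1, -1,  0,  0,  0],   # equation 4.30: (Q2 - Q3) - v2 = 0
    [1, -1,  0,  0,  0,  0],   # equation 4.31: (Q1 - Q2) + v3 = z
    [0,  1, -1,  0,  0,  0],   # equation 4.31: (Q2 - Q3) + v4 = z
    [1,  0,  0, -1,  0,  0],   # equation 4.32: Q_t(1) - Q_{t-1}(1) - v5 = 0
    [0,  1,  0,  0, -1,  0],   # equation 4.32: Q_t(2) - Q_{t-1}(2) - v6 = 0
    [0,  0,  1,  0,  0, -1],   # equation 4.32: Q_t(3) - Q_{t-1}(3) - v7 = 0
])

# Coefficients of the slack/surplus variables v1..v7 (one per constraint row)
B = np.diag([-1.0, -1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

print(A.shape, B.shape)  # (7, 6) and (7, 7), as quoted for the 3-detector series
```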
It was found that the complexity of the problem, as well as the computational time, increases with the number of variables. This can be seen from the increase in the size of the A matrix from (7×6) to (13×10) and of the B matrix from (7×7) to (13×13) when the analysis was changed from the 3-detector series to the 5-detector series. This increase in the size of the matrices leads to more computational time for the matrix operations involved, such as matrix inversion and multiplication. For example, the optimization of one day of data (24 hours of data comprising 720 records) iterates the matrix manipulations approximately 1600 times for each record (i.e., a total of 720 × 1600 = 1,152,000 times). For the 3-detector data series the matrix sizes were (7×6) and (7×7), while for the five-detector series these matrix operations have to be performed on (13×10) and (13×13) matrices. This led to high computation times, and hence in this dissertation only one sample run was conducted for a five-detector series, just to illustrate the performance of the optimization method for longer sections.
Sample results from February 10, 2003, for a series of five detectors are shown in Figures 4.28 and 4.29. This included all five detector locations shown in Figure 3.10. Figure 4.28 shows the cumulative flows before optimization for a 1-hour period from 06:30:00 to 07:30:00. It can be seen that the conservation of vehicles is violated, with the location 4 cumulative volume greater than the cumulative values at locations 1, 2, and 3. Figure 4.29 shows the cumulative volumes after optimization for the same time period. Similar figures for another 1-hour period, from 08:00:00 to 09:00:00 on February 10, 2003, are shown in Figures 4.30 and 4.31. In Figure 4.30 it can be seen that the cumulative flow at location 2 is greater than that at location 1. Figure 4.31 shows the corresponding cumulative volumes after optimization. Thus, the optimization procedure proves to be useful for longer sections with more detectors.

Tables 4.1 and 4.2 show the minimum and maximum values in the original and optimized 24-hour data for all 5 days. From the results it can be seen that the optimization method was able to correct data under varying traffic flow conditions for longer sections as well.
[Plot: cumulative volume (vehicles) versus time (hh:mm:ss) from 6:30:00 to 7:30:00; series: Locations 1 through 5.]
Fig. 4.28 Cumulative actual volumes for 1 hour on February 10, 2003, for five consecutive detector stations from 159.500 to 161.405
[Plot: cumulative volume (vehicles) versus time (hh:mm:ss) from 6:30:00 to 7:30:00; series: Locations 1 through 5.]
Fig. 4.29 Cumulative optimized volumes for 1 hour on February 10, 2003, for five consecutive detector stations from 159.500 to 161.405
[Plot: cumulative volume (vehicles) versus time (hh:mm:ss) from 8:00:00 to 9:00:00; series: Locations 1 through 5.]
Fig. 4.30 Cumulative actual volumes for 1 hour on February 10, 2003, for five consecutive detector stations from 159.500 to 161.405
[Plot: cumulative volume (vehicles) versus time (hh:mm:ss) from 8:00:00 to 9:00:00; series: Locations 1 through 5.]
Fig. 4.31 Cumulative optimized volumes for 1 hour on February 10, 2003, for five consecutive detector stations from 159.500 to 161.405
Table 4.1 Data Details at the Study Sites Before Optimization
(Number of vehicles in each link before optimization: minimum / maximum)

Date               | Link 1        | Link 2       | Link 3       | Link 4
February 10, 2003  | -4922 / 195   | -136 / 1747  | -2934 / 40   | 7 / 11167
February 11, 2003  | -5090 / 126   | -128 / 2340  | -3232 / -3   | 2 / 12273
February 12, 2003  | -4387 / 9     | -53 / 1868   | -3038 / -1   | -4249 / 1421
February 13, 2003  | -5015 / 79    | -149 / 2001  | -3757 / -15  | 1 / 12833
February 14, 2003  | -3564 / 303   | -4 / 2201    | -3287 / 2    | 12 / 12744
Table 4.2 Data Details at the Study Sites After Optimization
(Number of vehicles in each link after optimization: minimum / maximum)

Date               | Link 1    | Link 2    | Link 3    | Link 4
February 10, 2003  | 0 / 216   | 0 / 110   | 0 / 89    | 0 / 54
February 11, 2003  | 0 / 109   | 0 / 69    | 0 / 105   | 0 / 59
February 12, 2003  | 0 / 183   | 0 / 136   | 0 / 70    | 0 / 66
February 13, 2003  | 0 / 186   | 0 / 124   | 0 / 86    | 0 / 49
February 14, 2003  | 0 / 498   | 0 / 220   | 0 / 71    | 0 / 39
4.6 VALIDATION
Validation of the optimization procedure can be carried out either using field ground truth data or using simulated data. However, as discussed earlier, the present study used archived data, and the corresponding ground truth flow data were not available. Hence, simulated data generated using the CORSIM simulation software were used for validation purposes. The use of simulated data also has the advantage that there is more control over the data, making it easier to carry out a sensitivity analysis for varying amounts of error.

A traffic network similar to the field test bed was created in CORSIM, and detectors were placed at 0.5-mile spacing as discussed in Chapter III. Data were generated for 5 hours from 06:00:00 to 10:00:00. Traffic volumes from the field were used as input to CORSIM at 30-minute intervals; ten different flow rates were input for the 5-hour study in order to have simulated flow variations comparable to the field data variations. Detectors were placed in each link to collect the flow, speed, and occupancy rate. The output data were extracted from the simulation at 1-minute intervals as detailed in Chapter III.
In the field, different types of malfunctions occur in loop detectors, most of which may be identified by analyzing the detector data at individual locations. However, analyzing the data at individual locations may not identify systematic errors, such as detectors continuously undercounting or overcounting vehicles. This kind of constant bias in the data is one of the main reasons for the violation of conservation of vehicles in the data, and these are the errors to be identified and corrected by the present optimization procedure. Hence, such errors were introduced into the simulation data and the performance of the optimization was studied. A sensitivity analysis was carried out to evaluate the performance of the optimization under varying types and magnitudes of errors. The purpose of this sensitivity analysis was to find the range over which the optimization procedure can be applied as an acceptable procedure for diagnosing the data.
Three consecutive detectors were selected from the simulation, and four hours of data were used for the sensitivity analysis. First, the effect of constant undercounting or overcounting by the detectors was studied. Detector data without any error, if given as input to the optimization, will not be changed, as they satisfy all of the minimum requirements, thus giving a MAPE of zero. The error was first introduced in the data as a constant 10% overcounting at detector location 2 by adding 10% of the actual data to the observations, as given below:
q_t^{new} = q_t^{old}\,(1 + \delta),    (4.37)

where,
q_t^{new} = data after introducing the error,
q_t^{old} = actual data, and
\delta = bias introduced.
The optimization procedure was carried out on the data with the introduced error. The simulated data, the data with errors, and the data after optimization were compared. Figure 4.32 shows the plot of the actual data, the data after introducing the error, and the data after optimization for the detector at which the error was introduced. It can be seen that the optimization was able to correct the error in the data in this case with minimal change to the original data. Figure 4.33 shows the effect of this optimization on the corresponding cumulative flow data of the same detector.

The optimized volumes are compared with the corresponding actual values obtained from the simulation to check whether the integrity of the original data is maintained after optimization. MAPE, as defined in Chapter III, is used as the performance measure. The MAPE values for the data with errors and for the optimized data were calculated with respect to the true simulated values. The MAPE reduced from 10% to 4.57% after optimization.
[Plot: volume (vehicles/2 minutes) versus time (hh:mm:ss) from 6:10:00 to 9:54:00; series: actual simulated data, data with error, data after optimization.]
Fig. 4.32 Validation of optimization performance using simulated data
[Plot: cumulative volume (vehicles) versus time (hh:mm:ss); series: data with error, actual, optimized.]
Fig. 4.33 Validation of optimization performance using simulated cumulative data
This analysis was repeated with varying error values. The error was varied from 1% to 150%, and the results are shown in Table 4.3 and plotted in Figure 4.34. It can be seen that as the error in the input data increases, the MAPE between the optimized values and the actual values also increases.
Table 4.3 MAPE with Varying Amount of Overcounting Error in the Input Data

Error (%)   MAPE (%)
0           0
1           1.563299
10          4.579592
20          7.947224
30          11.25403
40          14.61498
50          17.97029
70          24.67011
100         34.70231
150         50.94552
[Plot: MAPE versus error (%), 0 to 150; series: MAPE of data with introduced error, MAPE of optimized data.]
Fig. 4.34 Performance of the optimization with varying amounts of overcounting by the detector
In a similar way, the effect of undercounting on the optimization performance was studied. The results obtained are shown in Figure 4.35.
[Plot: MAPE versus error (%); series: MAPE of data with introduced error, MAPE of optimized data.]
Fig. 4.35 Performance of the optimization with varying amounts of undercounting by the detector
The effect of a random error at a detector location was also studied. A normal distribution was assumed for the error, with the following density function:
f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2},    (4.38)

where,
x = value for which the normal density is needed,
\mu = mean of the distribution, and
\sigma = standard deviation of the distribution.
The standard deviation was varied from 0 to 50 in this dissertation. The MAPE for each case was calculated and is plotted in Figure 4.36.
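The random-error case can be sketched the same way; the zero mean of the added noise and the synthetic series are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
q_true = rng.integers(20, 45, size=120).astype(float)  # illustrative 2-minute counts

# Additive error drawn from the normal distribution of equation 4.38,
# with the standard deviation swept over the range studied here
for sigma in (0, 10, 20, 30, 40, 50):
    q_err = q_true + rng.normal(loc=0.0, scale=sigma, size=q_true.shape)
    q_err = np.clip(q_err, 0.0, None)  # vehicle counts cannot be negative
    mape = 100.0 * np.mean(np.abs(q_err - q_true) / q_true)
    print(sigma, round(mape, 2))
```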
[Plot: MAPE versus standard deviation of the normal distribution, 0 to 50; series: MAPE of optimized data, MAPE of data with introduced error.]
Fig. 4.36 Performance of the optimization with varying amounts of random error
It can be seen that the optimization was able to give acceptable results, assuming a 40% MAPE as the maximum acceptable error, up to 100% overcounting or undercounting. In the case of random errors, the optimization gave acceptable results up to a standard deviation of 40. Thus, the optimization procedure was able to perform well with constant errors as well as random errors.
The performance of the optimization was also checked in situations where two out of the three detectors are malfunctioning. This was carried out under two different scenarios. The first scenario considered two detectors having a constant bias in the data, and the second scenario considered one detector having a random error and the other having a constant bias.

One sample run was carried out for each of the two scenarios. The results of the first scenario, with \delta_1 = -10% at location 1 and \delta_2 = 20% at location 2, are shown in Figures 4.37 and 4.38, respectively. Figure 4.37 shows the actual volume, the volume with the introduced error, and the volume after optimization for location 1; the corresponding figure for location 2 is shown in Figure 4.38. It can be seen that, even with two out of the three detectors having errors, the optimization was able to reduce the error in the data. The MAPE value reduced from 10% to 5% at location 1 and from 20% to 6% at location 2.
In the second scenario, the optimization was carried out with a constant error of 10% at the first location and a random error following a normal distribution with a standard deviation of 10 at the second location. The results obtained are given in Figures 4.39 and 4.40.
[Plot: volume (vehicles/2 minutes) versus time (hh:mm:ss); series: actual volume, volume with error, volume after optimization.]
Fig. 4.37 Comparison of the performance of optimization at Location 1
[Plot: volume (vehicles/2 minutes) versus time (hh:mm:ss); series: actual volume, volume with error, volume after optimization.]
Fig. 4.38 Comparison of the performance of optimization at Location 2
[Plot: volume (vehicles/2 minutes) versus time (hh:mm:ss); series: actual volume, volume with error, volume after optimization.]
Fig. 4.39 Comparison of the performance of optimization at Location 1
[Plot: volume (vehicles/2 minutes) versus time (hh:mm:ss); series: actual volume, volume with error, volume after optimization.]
Fig. 4.40 Comparison of the performance of optimization at Location 2
It can be seen that, with this combination of errors, the optimized values at location 1 show more variation than they did before the optimization. This is because the optimization also has to accommodate the wide variation at location 2, shown in Figure 4.40. The MAPE at the first location increased slightly from 10% to 12%, while the MAPE at the second location reduced from 29% to 12%.
For the optimization of the field data, a comparison was carried out between the optimized volumes and the corresponding field values. Even though it is known that the field data have discrepancies, this was done to check how well the integrity of the original data is maintained after the optimization.
The results are tabulated in Table 4.4 for all the selected locations and all the selected days for the whole 24-hour period. The small MAPE values of around 10% shown in Table 4.4 (a value of 40% or more is considered large in practical applications (ezforecaster 2003)) show that the GRG method is able to perform the optimization without compromising the original data's integrity. Thus, it can be observed that the optimization procedure meets all the requirements: the data follow conservation of vehicles at all points at all times, the method handles a large amount of data, and it preserves the integrity of the observed data as much as possible.
Table 4.4 Performance Measure at Each Site

Date               | Location 1 MAPE (%) | Location 2 MAPE (%) | Location 3 MAPE (%) | Location 4 MAPE (%) | Location 5 MAPE (%)
February 10, 2003  | 7.56 | 6.19 | 5.88 | 8.17 | 14.86
February 11, 2003  | 8.43 | 6.63 | 7.06 | 8.71 | 15.76
February 12, 2003  | 7.67 | 6.32 | 6.73 | 8.06 | 10.19
February 13, 2003  | 8.32 | 6.64 | 6.43 | 8.65 | 11.30
February 14, 2003  | 5.99 | 5.56 | 5.85 | 8.01 | 14.56
4.7 OTHER APPLICATIONS
In addition to removing discrepancies in the available data, the proposed optimization technique can also be used for imputing missing data when any of the detector locations under consideration fails to record data for some period of time. As discussed in Chapter II, missing data values (nonresponse) are a common occurrence with ITS data, and different imputation methods have been reported. The detectors in general report data at 20- or 30-second intervals. However, sometimes intervals get skipped and the data are reported at a larger interval, which can range from 1 minute to 10 minutes. Possible reasons include detector malfunctions, communication disruptions, or software failures.
In this dissertation, the efficacy of the proposed optimization approach for imputation is illustrated in the following manner. Separate sample data sets with missing values were generated for locations 1, 2, 3, 4, and 5 of I-35 for February 10, 2003, by replacing the data with zeros for an interval of 15 minutes. The optimization program was run on these data with missing values. Based on the objective function and the constraints specified for the optimization, the missing values are imputed depending on the values at the other locations for that time step, as well as on the optimized value for the same location in the previous time step. Table 4.5 shows a sample set of data to illustrate the imputation for a 12-minute interval at location 2 on February 10, 2003. The first column gives the time interval, followed by the field values obtained for each interval. The third column shows the optimized values corresponding to the field values. The fourth column shows the data set to zero to represent the missing values, and the fifth column gives the corresponding imputation results. The MAPE between the actual and the imputed values is calculated and shown in the last column. It can be seen that the optimization procedure was able to impute the missing data reasonably well.
Table 4.5 Imputation of Missing Data Using GRG

Time | Actual | Optimized | Missing | Imputed | MAPE (%)
0:30 | 29     | 29.93     | 0       | 20.26   | 30
0:32 | 28     | 29.18     | 0       | 19.98   | 28
0:34 | 23     | 23.93     | 0       | 16.19   | 29
0:36 | 21     | 22.74     | 0       | 15.59   | 25
0:38 | 26     | 24.99     | 0       | 16.46   | 36
0:40 | 21     | 26.93     | 0       | 20.09   | 4.4
0:42 | 22     | 23.73     | 0       | 16.33   | 25
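The per-interval MAPE values in the last column of Table 4.5 are straightforward percentage-error arithmetic, reproduced by the short Python check below using the table's own actual and imputed values.

```python
import numpy as np

actual  = np.array([29, 28, 23, 21, 26, 21, 22], dtype=float)
imputed = np.array([20.26, 19.98, 16.19, 15.59, 16.46, 20.09, 16.33])

# Absolute percentage error for each 2-minute interval of Table 4.5
ape = 100.0 * np.abs(actual - imputed) / actual
print(np.round(ape, 1))  # about [30.1 28.6 29.6 25.8 36.7 4.3 25.8]
```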
Figures 4.41 to 4.45 show the results obtained for missing data at locations 1, 2, 3, 4, and 5, respectively, after optimization for the five-detector series. For comparison, the optimization result for the corresponding data without any missing values is also plotted in each figure, along with the original field data and the data with the missing values. It can be seen that, with the missing values, the optimization retained the trend in the data, and the imputed values follow the original optimized results. The MAPE values were calculated by comparing the optimization results with missing values against the corresponding actual values for the missing period of 12 minutes with 8 observations. The MAPE values obtained are also shown with the corresponding figures.
[Plot: volume (vehicles/2 minutes) versus time (hh:mm:ss); series: imputed data, actual data, missing data, optimized data.]
Fig. 4.41 Imputation results with missing data in location 1 on February 10, 2003
[Plot: volume (vehicles/2 minutes) versus time (hh:mm:ss); series: imputed data, actual data, missing data, optimized data.]
Fig. 4.42 Imputation results with missing data in location 2 on February 10, 2003
The MAPE over the 12-minute missing period was 29.36% for location 1 (Figure 4.41) and 28.20% for location 2 (Figure 4.42).
[Plot: volume (vehicles/2 minutes) versus time (hh:mm:ss); series: actual data, optimized data, imputed data, missing data.]
Fig. 4.43 Imputation results with missing data in location 3 on February 10, 2003
[Plot: volume (vehicles/2 minutes) versus time (hh:mm:ss); series: actual data, optimized data, imputed data, missing data.]
Fig. 4.44 Imputation results with missing data in location 4 on February 10, 2003
The MAPE over the 12-minute missing period was 32.82% for location 3 (Figure 4.43) and 44% for location 4 (Figure 4.44).
[Plot: volume (vehicles/2 minutes) versus time (hh:mm:ss); series: actual data, optimized data.]
Fig. 4.45 Imputation results with missing data in location 5 on February 10, 2003
Also, this method can be used for finding the worst-performing detector stations based on the amount of error at each location. This information can be used for prioritizing the detectors for maintenance and can be obtained by comparing the MAPE values. For example, it can be seen in Table 4.4 that the MAPE for location 5 is higher than at all the other locations, which is an indication that the detectors at location 5 are performing worse than those at all the other locations. This is useful in deciding that the detectors at location 5 need priority in maintenance. However, with the present method of analysis, where all the detectors at a location are added together and treated as a single detector, it is not possible to identify which specific detector at that location is malfunctioning. To identify the specific malfunctioning detector within the identified location, the analysis should be carried out at a lane-by-lane level. The issues related to this kind of lane-by-lane analysis of the detectors are discussed in detail in Chapter III.
The MAPE over the 12-minute missing period at location 5 (Figure 4.45) was 47.7%.
4.8 ALTERNATIVE OBJECTIVE FUNCTIONS AND CONSTRAINTS
As discussed already, the constraints in the present optimization are selected based on physical restrictions, such as the requirement that the cumulative flow at each detector location be greater than or equal to the cumulative flow at the succeeding detector at all times. Another constraint is that the maximum difference between the cumulative flows should not exceed the maximum number of vehicles that can be accommodated in that road length at jam density conditions. These constraints are based on the worst and best scenarios. However, one could also choose different constraints, which would change the computation time and the accuracy of the results.
The objective function used in the present strategy can also be modified to make the optimization procedure incorporate more features of traffic flow. For example, the use of a weighted objective function is one possibility. One way to carry this out is to assign weights to the variables in the objective function based on the standard deviation or variance at each of the locations (Taylor et al. 1969). However, to take variance into account in the present study, one of the following assumptions must be made:
a) the variance is the same over a small interval of time, say 10 minutes (constant temporal flow), and/or
b) the variance is the same at consecutive locations (constant spatial flow).
The first assumption implies a constant variation in traffic flow over time, while the second assumes that the flow is spatially uniform. There is also a need to know the relationship between the variance in the data and the error due to malfunctioning of the ILD. However, no literature was found on the relationship between the variance of traffic flow and the accuracy of the data recorded by the detectors. Thus, another assumption must be made about how much of the variance is due to error and how much is due to natural variation in the traffic flow.
If the assumption is made that more variance at one location means more error at that point, the cumulative flow at each point can be weighted accordingly, based on the weight calculation given below (Miller and Miller 1993):

w_i = \frac{n\, s_i^{-2}}{\sum_i s_i^{-2}},    (4.39)

where,
s_i^2 = variance at location i, and
n = number of observations.
Then, this weight can be assigned to the variables in the original objective function of the optimization given in equation 4.26, based on the assumption made. For example, if the assumption is that the variance is the same over the interval of time under consideration, the objective function can be:

\min\left[\left(w^{(1)} Q_t^{(1)} - w^{(2)} Q_t^{(2)}\right)^2 + \left(w^{(2)} Q_t^{(2)} - w^{(3)} Q_t^{(3)}\right)^2\right],    (4.40)

where,
w^{(1)} = weight based on the variance at location 1, as given in equation 4.39.
If the assumption is that the variance is the same at consecutive locations, the objective function can be:

\min\left[\left(Q_t^{(1)} - Q_t^{(2)}\right)^2 w^{1-2} + \left(Q_t^{(2)} - Q_t^{(3)}\right)^2 w^{2-3}\right],    (4.41)

where,
w^{1-2} = weight based on the difference in variance between locations 1 and 2.
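As an illustration, the following Python sketch computes the weights of equation 4.39, assuming the inverse-variance form of the Miller and Miller weighting; the sample variances are made-up values.

```python
import numpy as np

# Illustrative flow variances s_i^2 at three consecutive detector locations
s2 = np.array([12.0, 25.0, 14.0])
n = len(s2)

# Equation 4.39: weights proportional to the inverse variance s_i^(-2),
# normalized so that the weights sum to n
w = n * (1.0 / s2) / np.sum(1.0 / s2)
print(np.round(w, 3))  # the noisiest location receives the smallest weight
```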
However, the accuracy of the above assumptions is not clear. As discussed already, there is a need to establish the relation between the error in the detector data and the variance of the data. This would give an idea of how much of the variance is due to error and how much is due to natural variation in the traffic flow. For example, Smith et al. (2003) argued that reducing the natural variance in traffic data is an undesirable approach. Thus, the only reasonable inference one can make based on the variance at consecutive locations is that a large change in variance at one location compared to the neighboring locations may indicate a malfunction of the detector at that location. In the present study, the variance in the data at the consecutive locations was compared. Figure 4.46 shows the plot of the variance in the data obtained from the consecutive locations. It can be seen that the variances did not differ much between the locations. Hence, minimization of variance is not included in the objective function or constraints in the present study. However, if it is known that the variation at a location is due to error in data collection, incorporating the variance in the objective function may lead to better results.
[Plot: variance (2-minute data grouped from 20-second data) versus time (hh:mm:ss) over 24 hours; series: Location 1, Location 2, Location 3.]
Fig. 4.46 Variance from three consecutive locations on February 11, 2003
4.9 CONCLUDING REMARKS
In this chapter, the loop detector data initially screened and corrected for common discrepancies were considered for further analysis. The data were analyzed as a series, rather than as individual locations, and it was found that the conservation of vehicles principle was violated in one of the two following ways: in the first case, a larger number of vehicles exited than entered the test section, while in the second, the cumulative volume entering became unreasonably higher than the cumulative volume exiting. The cumulative volume curves of the data after the usual error corrections clearly showed that this approach of observing the detectors as a series could identify discrepancies that go unidentified with the commonly adopted error-checking procedures at individual locations. An optimization algorithm was proposed to adjust the volume data so that they satisfy the conservation of vehicles. The objective of the method was to minimize the difference between the entry and exit observations using a GRG optimization. The data obtained after the optimization were consistent with the conservation of vehicles without violating any constraints. This method of correcting loop detector data is more useful and convenient than the application of volume adjustment factors when dealing with large amounts of data covering a long duration and having large discrepancies. The optimization technique also proved to be very useful for imputing missing data as well as for prioritizing the detector stations for maintenance. This dissertation represents the first application of this kind of optimization technique for quality control of ILD data. The optimized data will be used in the estimation of travel time, which will be discussed in the next chapter, along with the influence of this optimization on the final estimated travel time.
CHAPTER V
ESTIMATION OF TRAVEL TIME
5.1 INTRODUCTION
Travel time, or the time required to traverse a roadway between any two points of interest, is a fundamental measure in transportation. Engineers and planners have used travel time and delay studies since the 1920s to evaluate transportation facilities and plan improvements (Travel Time Data Collection Handbook 1998). In recent times, with the increasing interest in Advanced Traveler Information Systems (ATIS) and Advanced Traffic Management Systems (ATMS), providing travelers with accurate and timely travel time information has gained paramount importance. Travel time can be measured directly using probe vehicles/test vehicles, license plate matching, electronic distance-measuring instruments, Automatic Vehicle Identification (AVI), Automatic Vehicle Location (AVL), and video imaging, or it can be estimated from indirect sources like Inductance Loop Detectors (ILD), weigh-in-motion stations, or aerial video. While techniques like AVI and probe vehicles have less error, they are more expensive and often require new types of sensors as well as public participation; hence, they are not widely deployed in urban areas (Turner 1996). Other methods, such as the test vehicle method, are time consuming, labor intensive, and expensive for collecting large amounts of data. On the other hand, most metropolitan areas in North America have their freeway networks instrumented with ILDs, which makes them the best source of traffic data over a wide area for a long period of time. Hence, at present, ILDs are the most cost-effective and popular way of obtaining travel time information for ATIS applications.
As discussed in earlier chapters, ILDs can be either single-loop or dual-loop. The data supplied by single-loop detectors include volume and occupancy. An algorithm is then used for estimating the speed using inputs such as the effective loop length, average vehicle length, time over the detector, and the number of vehicles counted (Klein 2001). In the case of dual-loop detectors, the speed value is automatically calculated based on the known distance between the two loops and the time a vehicle takes to cross them. However, neither of these ILDs can collect travel time data directly, and so travel time has to be estimated from the available ILD data such as flow, speed, or occupancy. Also, the data obtained from the ILDs are not for individual vehicles but are aggregated values for all the vehicles traveling in the interval in which the data are reported. Thus, the travel time estimation should be based on the aggregate/average values reported by the ILDs for the small aggregation intervals, usually 20 or 30 seconds.
Accurate estimation of travel time from loop detector data is a difficult task, because the detector data are point measurements, whereas travel time is a dynamic parameter averaged over distance. Thus, travel time estimated from spot speeds tends to underestimate section travel times due to the failure to capture traffic congestion occurring between the detector stations. For example, the most popular method adopted in the field today for the estimation of travel time from ILD data is based on the extrapolation of point speed values. However, it is known that the accuracy of speed-based methods declines as the flow becomes larger, because such methods cannot take into account the variation in flow between the two measurement points. Oh et al. (2003) reported that the travel time estimated from single or dual-loop detector speed values would be correct only under the assumption that the traffic condition in the section is either homogeneous or a linear combination of the conditions at the two points. However, this assumption is not valid under congested traffic conditions, and so the estimated travel time tends to be biased under congestion. Other estimation methods include statistical and traffic-flow-theory-based models, the majority of which were developed for either the free-flow condition or the congested-flow condition (Nam and Drew 1996, 1998; Hoogendoorn 2000; Oh et al. 2003). Thus, most of these models do not take into account the varying traffic flow conditions during the transition from peak to off-peak or off-peak to peak conditions. Some attempts have been made in the past to estimate travel time using re-identification of vehicles at the second location (Coifman 1998; Coifman and Cassidy 2002; Sun et al. 1998, 1999). However, these methods require the use of sophisticated equipment and/or programs, which are not typically available to most traffic management centers.
The present study proposes a travel time estimation procedure using ILD data. The methodology is based on a theoretical model suggested by Nam and Drew (1999) for the estimation of travel time from ILD flow data. Several modifications to this theoretical model are proposed in this dissertation. The details of the Nam and Drew (1999) model are discussed in the next section, followed by the proposed changes to the model. In the results section, a comparison is carried out between the results obtained from the Nam and Drew model and the proposed model. The travel time estimated using the proposed method is also compared with the results from the extrapolation method as well as with direct travel time measured using AVI. In order to perform a more comprehensive analysis, the modifications are validated using simulated data from the CORSIM simulation software.
5.2 TRAFFIC DYNAMICS MODEL
The traffic dynamics model (called the N-D model henceforth) for estimating freeway travel time from ILD flow measurements, suggested by Nam and Drew (1995, 1996, 1998, 1999), is based on the characteristics of the stochastic vehicle counting process and the principle of conservation of vehicles. An inductive modeling approach was adopted in their study, along with geometric interpretations of cumulative arrival-departure diagrams. The link travel time was calculated as the area between the cumulative volume curves from loop detectors at either end of the link. Instead of the usual approach of generalizing point measurements over a link, this work showed a judicious application of traffic flow theory to yield better travel time estimates from point data. Exponential averaging was used to increase the stability of the time series estimation of travel time.

The method can be explained using a one-lane road with two detectors, one located at each end, as shown in Figure 5.1. The numbers of vehicle arrivals and departures are measured continuously at the upstream location x_1 and the downstream location x_2.
Fig. 5.1 Illustration of the conservation of vehicles
Referring to Figure 5.1, let q(x_1, t) denote the flow per unit time measured at location x_1 at time t, and let q(x_2, t) denote the flow measured at location x_2 at the same time t. The flows are regularly aggregated at \Delta t intervals for each detector. Thus, the total numbers of vehicles entering and exiting the link during \Delta t are, respectively,

q(x_1, t + \Delta t)\,\Delta t \quad \text{and} \quad q(x_2, t + \Delta t)\,\Delta t.    (5.1)

Under the principle of conservation of vehicles, the difference between these two quantities equals the change in the density, \Delta k(t), over the link distance \Delta x. The equation of conservation of vehicles then becomes

\left[q(x_1, t + \Delta t) - q(x_2, t + \Delta t)\right]\Delta t = \left[k(t + \Delta t) - k(t)\right]\Delta x.    (5.2)

Rearranging the terms in the above equation, the conservation equation was written by Nam and Drew in the following form:

\frac{q(x_1, t) - q(x_2, t)}{\Delta x} + \frac{q(x_1, t + \Delta t) - q(x_1, t)}{\Delta x} - \frac{q(x_2, t + \Delta t) - q(x_2, t)}{\Delta x} = \frac{k(t + \Delta t) - k(t)}{\Delta t}.    (5.3)
Let Q(x_1, t_n) and Q(x_2, t_n) be the cumulative numbers of vehicles entering and exiting the link, respectively, which can be expressed as

Q(x_1, t_n) = \Delta t \sum_{i=1}^{n} q(x_1, t_i), \quad \text{and}    (5.4)

Q(x_2, t_n) = \Delta t \sum_{i=1}^{n} q(x_2, t_i).    (5.5)

The initial conditions were

Q(x_1, t_0) = 0, \qquad Q(x_2, t_0) = -n(t_0) \le 0,    (5.6)

where,
n(t_0) = number of vehicles traveling on the link at time t_0.
The relationship between the link distance \Delta x and the data aggregation interval \Delta t is maintained as

\frac{\Delta x}{v_f} \le \Delta t < 5\ \text{min},    (5.7)

where v_f = the free-flow speed on the link.
According to the characteristics of the stochastic vehicle counting process, the variables Q(x_1, t_n) and Q(x_2, t_n) are nonnegative and nondecreasing, which leads to equations 5.8 and 5.9:

Q(x_1, t_n) - Q(x_1, t_{n-1}) = q(x_1, t_n)\,\Delta t \ge 0,    (5.8)

Q(x_2, t_n) - Q(x_2, t_{n-1}) = q(x_2, t_n)\,\Delta t \ge 0.    (5.9)
Also, the cumulative number of vehicles leaving downstream cannot exceed the number arriving upstream (based on the conservation of vehicles principle). Therefore,

Q(x_1, t_n) \ge Q(x_2, t_n).    (5.10)

The equality condition in equation 5.10 holds when there are no arrivals and the link is subsequently empty for the time interval \Delta t.
Let n(t) be the number of vehicles traveling over the link distance \Delta x between the detector stations x_1 and x_2 at time t_n, given as

n(t_n) = Q(x_1, t_n) - Q(x_2, t_n).    (5.11)
Then, the density at time t_n, k(t_n), is calculated as follows:

k(t_n) = \frac{n(t_n)}{\Delta x} = \frac{Q(x_1, t_n) - Q(x_2, t_n)}{\Delta x}.    (5.12)
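Equations 5.4, 5.5, 5.11, and 5.12 translate directly into code. The Python sketch below assumes an initially empty link (n(t_0) = 0) and uses made-up flow values.

```python
import numpy as np

dt = 120.0  # aggregation interval (s); 2-minute data
dx = 805.0  # link length (m); 0.5-mile detector spacing

# Illustrative per-interval flows (vehicles/s) at the two detectors
q1 = np.array([0.20, 0.25, 0.30, 0.28, 0.22])  # upstream, x1
q2 = np.array([0.18, 0.24, 0.27, 0.30, 0.24])  # downstream, x2

# Equations 5.4 and 5.5: cumulative entering and exiting counts
Q1 = dt * np.cumsum(q1)
Q2 = dt * np.cumsum(q2)

n = Q1 - Q2   # equation 5.11: vehicles currently on the link
k = n / dx    # equation 5.12: link density (vehicles/m)
print(n, k)
```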
The N-D study developed two separate models: one for normal-flow conditions and the other for congested-flow conditions. The distinction between normal and congested flow was made based on the number of vehicles entering and exiting during the specific time interval. The variable m(t_n) was defined as the number of vehicles that enter the link during the interval t_{n-1} to t_n and that exit the link during the same interval. Under the first-in first-out condition, m(t_n) is given as

m(t_n) = Q(x_2, t_n) - Q(x_1, t_{n-1}).    (5.13)

The variable m(t_n) was considered a dynamic link performance measure, and different equations for estimating the travel time were suggested depending on whether m(t_n) is positive (normal flow) or equal to or less than zero (congested flow).
5.2.1 Case 1. Normal-flow Condition
Nam and Drew assumed that the traffic characteristics of vehicles traveling under normal-flow conditions are represented by the vehicles that enter the link during the interval t_{n-1} to t_n and exit the link during the same interval. The total travel time of these vehicles is shown schematically in Figure 5.2 as the hatched area.
Fig. 5.2 Schematic representation of the total travel time during the interval (t_{n-1}, t_n) under normal flow (Source: Nam and Drew 1999)
Thus, analytically, the total travel time of all the vehicles that entered and exited the link in that time period is equal to the shaded area and can be calculated as

T(t_n) = \frac{1}{2}\left[(t_n - t') + (t'' - t_{n-1})\right] m(t_n),    (5.14)

where,
t' = time of entry into the link of the last vehicle that exits the link during the interval, and
t'' = time of departure from the link of the first vehicle that enters the link during the interval.
After interpolating the values of t' and t'' and substituting them into equation 5.14, the travel time T(t_n) for the vehicles that enter and exit during the same interval, m(t_n), is given in equation 5.15:

T(t_n) = \frac{\Delta x}{2}\left[\frac{q(x_i, t_n)\,k(t_n) + q(x_{i+1}, t_n)\,k(t_{n-1})}{q(x_i, t_n)\,q(x_{i+1}, t_n)}\right],    (5.15)

where,
\Delta x = distance between the detector locations (meters),
q(x_i, t_n) = flow at location i from t_{n-1} to t_n (vehicles per second), and
k(t_n) = density in the link between locations i and i+1 at time t_n (vehicles per meter).
5.2.2 Case 2. Congested-flow Condition
Fig. 5.3 Schematic representation of the total travel time during the interval (t_{n-1}, t_n) under congested flow (Source: Nam and Drew 1999)
Nam and Drew stipulated the traffic condition as congested when the value of the variable m(t_n) is either zero or negative. Under such conditions, none of the vehicles that enter the link during the interval t_{n-1} to t_n exit the link during the same interval. The travel time is then calculated based on all the vehicles that enter during the interval under consideration, and the value corresponding to m(t_n) for the congested condition is calculated as

m''(t_n) = Q(x_1, t_n) - Q(x_1, t_{n-1}).    (5.16)
Thus, under congested-flow conditions, the travel time is calculated as the shaded area in Figure 5.3, which is equal to

T(t_n) = \frac{1}{2}\left[(t'' - t_{n-1}) + (t' - t_n)\right] m''(t_n),    (5.17)

where,
t' = expected time of departure from the link of the last vehicle that enters the link during the interval (t_{n-1}, t_n), and
t'' = expected time of departure from the link of the first vehicle that enters the link during the same time interval.
After interpolating the values of t' and t'' and substituting them into equation 5.17, the travel time T(t_n) is calculated as shown in equation 5.18:

T(t_n) = \frac{\Delta x}{2}\left[\frac{k(t_{n-1}) + k(t_n)}{q(x_{i+1}, t_n)}\right].    (5.18)
After the calculation of travel time, exponential averaging was applied to smooth the dynamic travel time estimates. This numerical technique favors the most recent estimate by assigning weight factors. Thus, the smoothed travel time estimate for the time interval (t_{n-1}, t_n) was calculated as

T_f(t_n) = T(t_{n-1}) + \alpha\left[T(t_n) - T(t_{n-1})\right],    (5.19)

where,
\alpha = exponential weighting factor.
T(t_n) = m_p(t_n)\,\frac{\Delta x}{2}\left[\frac{k(t_{n-1})}{q(x_2, t_n)} + \frac{k(t_n)}{q(x_1, t_n)}\right] + \left[1 - m_p(t_n)\right]\frac{\Delta x}{2}\left[\frac{k(t_{n-1}) + k(t_n)}{q(x_2, t_n)}\right].    (5.21)
All the variables in equation 5.21 are the same as in equations 5.15 and 5.18. It can be seen that this modification helps to model the transition flow more accurately. For instance, when the flow condition is completely normal, the value of m_p(t_n) will be 1, and hence the second term in the above equation vanishes. In the transition stage, the second term takes into account the travel time of those vehicles in the congested condition that were ignored in the N-D model.

The freeway travel time function given in equation 5.21 has two independent measures, q(x_1, t_n) and q(x_2, t_n). The relationship between the travel time and the flow rates can be found by differentiating this function with respect to the two flow variables. Rewriting equation 5.21 using equation 5.3, the following equation is obtained:
T = \frac{m_p(t_n)}{2}\left[\frac{k(t_{n-1})\,\Delta x\,(q_1 + q_2) + q_2(q_1 - q_2)\,\Delta t}{q_1 q_2}\right] + \frac{1 - m_p(t_n)}{2}\left[\frac{2 k(t_{n-1})\,\Delta x + (q_1 - q_2)\,\Delta t}{q_2}\right].    (5.22)
The final differential with respect to q(x_1, t_n) is obtained as follows:

\frac{\partial T}{\partial q_1} = \frac{m_p(t_n)\left(q_2\,\Delta t - k(t_{n-1})\,\Delta x\right)}{2 q_1^2} + \frac{\left(1 - m_p(t_n)\right)\Delta t}{2 q_2},    (5.23)

where,
q_1 = q(x_1, t_n), and
q_2 = q(x_2, t_n).
Due to the precondition of normal flow (equation 5.13) that Q(x_2, t_n) > Q(x_1, t_{n-1}), the quantity q_2 \Delta t will always be greater than k(t_{n-1}) \Delta x. Also, both the numerator and the denominator of the second term in equation 5.23 are always positive, making equation 5.23 always positive. This means that as the traffic demand given by q(x_1, t_n) increases, the travel time also increases.
The final differential with respect to q(x_2, t_n) is calculated as shown in equation 5.24:

\frac{\partial T}{\partial q_2} = -\left[\frac{m_p(t_n)\left(q_2^2\,\Delta t + k(t_{n-1})\,\Delta x\,q_1\right)}{2 q_1 q_2^2} + \frac{\left(1 - m_p(t_n)\right)\left(2 k(t_{n-1})\,\Delta x + q_1\,\Delta t\right)}{2 q_2^2}\right].    (5.24)
It can be seen that both the numerator and the denominator of the first and second terms are positive, making equation 5.24 always negative. This means that as the outflow quantity q(x_2, t_n) increases, the travel time decreases. Thus, the new travel time function has a desirable relationship with both flow variables under normal and congested conditions, increasing with increasing inflow and decreasing with increasing outflow.

As in the original model, exponential averaging was applied to smooth the dynamic travel time estimates. This smoothing gives stable estimates over time. An \alpha value of 0.2 was adopted, thus smoothing the exponentially averaged estimates over the time interval 5\Delta t.
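The exponential averaging of equation 5.19 with α = 0.2 can be sketched as below; it is written in the standard recursive form, with the previous smoothed estimate carried forward, and the raw travel time series is a made-up example.

```python
import numpy as np

ALPHA = 0.2  # exponential weighting factor adopted in the study

def exponential_average(travel_times, alpha=ALPHA):
    # Equation 5.19 applied recursively:
    # T_f(t_n) = T_f(t_{n-1}) + alpha * (T(t_n) - T_f(t_{n-1}))
    smoothed = [float(travel_times[0])]
    for t in travel_times[1:]:
        smoothed.append(smoothed[-1] + alpha * (t - smoothed[-1]))
    return np.array(smoothed)

raw = np.array([60.0, 64.0, 90.0, 85.0, 70.0, 66.0])  # raw estimates (s)
print(exponential_average(raw))
```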
5.3.3 Modification III. Calculation of Density
The N-D model calculates the density from the cumulative flow values. Thus, the accuracy of the estimated travel time depends solely on the accuracy of the measured flow values. If the point detectors are working perfectly, this method is appropriate for calculating the true density in a section. However, in reality the detectors may not be working perfectly (Vanajakshi and Rilett 2004b; Turner et al. 2000; Chen and May 1987). Moreover, if there is a malfunction in the detectors, the flow data are affected more severely. This is because of the way in which the detectors collect traffic data: the flow data are reported as a cumulative count, whereas speed and occupancy are averaged over the accumulation time interval (every 20- to 30-second interval). Hence, the effect of a detector malfunction, such as missed vehicles, has less impact on speed and occupancy than on flow data. In such cases, the calculation of density from the flow values may not yield the best results.
Even though in the present study the flow data are corrected for discrepancies based on the conservation of vehicles constraint, there can still be unaccounted errors in the data. Hence, in the present study the use of occupancy values for the calculation of density is suggested instead of the flow values. The density was calculated from the ILD occupancy values using the following equation (May 1990):

k = \frac{52.8\, O}{(L_v + L_d)},    (5.25)

where,
k = density (vehicles per mile),
L_v = average vehicle length (feet),
L_d = detection zone length (feet), and
O = percent occupancy.
Even though this method has the disadvantage of requiring an estimate of the average vehicle length, it was found that, compared to the use of cumulative flow curves, it gave more reasonable results. This is illustrated in the subsequent results sections.
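Equation 5.25 is a one-line computation; the Python sketch below assumes a 25-foot average vehicle length and a 6-foot detection zone, consistent with the loop geometry described in Chapter III.

```python
def density_from_occupancy(occupancy_pct, avg_vehicle_len_ft=25.0,
                           detection_zone_len_ft=6.0):
    """Equation 5.25: density (vehicles per mile) from percent occupancy."""
    return 52.8 * occupancy_pct / (avg_vehicle_len_ft + detection_zone_len_ft)

# Example: 20 percent occupancy gives about 34 vehicles per mile
print(density_from_occupancy(20.0))
```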
5.3.4 Modification IV. Use of Extrapolation Method for Low Volume Conditions
Many previous studies have reported that speed, and in turn travel time, is not dependent on flow under low traffic flow conditions (Van Aerde and Yagar 1983; Persaud and Hurdle 1988; Sisiopiku et al. 1994a; Faouzi and Lesort 1995; HCM 2000; Bovy and Thijs 2000; Coifman 2001). The Highway Capacity Manual (2000) remarks on this issue as follows: "All recent studies indicate that speed on freeways is insensitive to flow in the low to moderate range," where the low to moderate range includes flows up to 1300 passenger cars per hour per lane (pcphpl) for a 70 mph freeway system. Sisiopiku et al. (1994a), in their study on the correlation between travel time and detector data, concluded that travel time is independent of both flow and occupancy under low traffic conditions.

Thus, the accuracy of the travel time estimated during low traffic conditions is questionable in the N-D model, because the estimated travel time in that model is a function of the measured flow. This issue was not addressed in the original work because the data analyzed in that study were restricted to the morning peak traffic flow. In this dissertation the analysis was carried out for a continuous 24 hours, which included very low traffic flow conditions.
On the other hand, as discussed earlier, methods based on speed values tend to have more bias in
the resulting travel time during congested periods due to the failure to capture the variations
occurring between the detector stations. However, under low traffic flow conditions they are
more suitable than the methods based on flow. Hence, in the present study, the use of the
extrapolation method is suggested for low flow conditions so that accuracy can be maintained
consistently under all varying flow conditions. A cut-off value of 50 vehicles per 2-minute
interval, summed over all three lanes, is set for these data based on the HCM
recommendation. Thus, when the flow is less than 50 vehicles per 2 minutes over the three lanes,
the method based on speed values is used; otherwise, the developed model based on flow values
is used. This can be algorithmically represented as follows:
if flow < 50 vehicles per 2 minutes over 3 lanes,
    then use extrapolation method (Equation 2.1, 2.2, or 2.3);
else
    use developed method (Equation 5.18 or 5.21).    (5.26)
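A minimal Python sketch of this switching rule follows. The extrapolation estimate assumes the average-speed form (Equation 2.2), and flow_based_estimate_sec is a hypothetical stand-in for the output of the developed model (Equations 5.18 and 5.21), which is not reproduced here.

```python
SECTION_LENGTH_MI = 0.5  # detector spacing on the test bed

def extrapolation_travel_time(speed_up_mph, speed_down_mph):
    """Average-speed extrapolation: section length over the mean speed."""
    avg_speed = (speed_up_mph + speed_down_mph) / 2.0
    return SECTION_LENGTH_MI / avg_speed * 3600.0  # seconds

def choose_travel_time(flow_veh_per_2min, speed_up_mph, speed_down_mph,
                       flow_based_estimate_sec):
    """Equation 5.26: pick the estimator by the 50 veh/2 min cutoff.

    flow_based_estimate_sec stands in for the developed model's output,
    which is not reproduced in this sketch.
    """
    if flow_veh_per_2min < 50:  # low flow: use the speed-based method
        return extrapolation_travel_time(speed_up_mph, speed_down_mph)
    return flow_based_estimate_sec

# Example: 40 vehicles in 2 minutes -> the extrapolation method is used
print(choose_travel_time(40, 60.0, 65.0, flow_based_estimate_sec=31.0))
```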
In summary, the major modifications to the N-D model for the application of travel time
estimation are as follows:
1. The original N-D model is based on the premise that the loop detector data follow the
conservation of vehicle principle at all times. However, in reality, the loop detector data
collected from the field show serious violation of this constraint. The N-D model was
illustrated using data for a short period of time (4 hours), and hence used adjustment
factors for correcting this discrepancy. In this dissertation, a more systematic method
based on nonlinear optimization by the GRG method is used to correct this
discrepancy.
2. The relation for travel time estimation during normal-flow conditions is modified such
that the travel time will be estimated based on all the vehicles entering in that time
period, instead of considering only those vehicles which enter and exit in the same
period, as in the N-D model.
3. The N-D model calculated density from the cumulative flow values. This is a good
method to calculate density only if the quality of the flow data is assured. In
cases where the ILD data have errors, the calculation of density from occupancy is a
better choice, and hence this method is adopted in this dissertation.
4. The use of an extrapolation method is suggested for very low traffic flow conditions so
that accuracy can be maintained under varying traffic flow conditions.
5.4 RESULTS AND DISCUSSION
The results are illustrated using the data collected from link 1 and link 2 of the I-35 test bed
shown in Figure 3.10. The ILD data from all 5 days, February 10 to February 14, 2003,
are used. The effects of each of the suggested modifications will be illustrated first using the ILD
data. Next, the validation of the modified model will be carried out using AVI data collected
from the same location as the ILD data. Validation will also be carried out using simulated data
generated using CORSIM. Finally, results obtained from a comparative study of the performance
of the proposed model with the extrapolation method using both field data and simulated data are
shown.
5.4.1 Influence of the Modifications on Travel Time Estimation
Figures 5.5 and 5.6 show sample plots of the travel time estimated for links 1 and 2 by the N-D
model using unmodified ILD data, before the optimization is carried out.
[Figure: plot of travel time (sec) vs. time of day (hh:mm:ss)]
Fig. 5.5 Travel time estimated by the N-D model using actual data for link 1 on
February 11, 2003, for 24 hours
[Figure: plot of travel time (sec) vs. time of day (hh:mm:ss)]
Fig. 5.6 Travel time estimated by the N-D model using actual data for link 2 on
February 11, 2003, for 24 hours
It can be seen that the estimated travel time for link 1 varies from 0 to 15,000 seconds
and that the travel time for link 2 came out to be negative for the whole 24 hours.
Figures 5.7 and 5.8 depict the same estimated travel times recalculated after the data were
optimized using the GRG technique described in Chapter IV. It can be seen that the range of
travel time has improved, even though the values are still unreasonably high. In the field data,
the speed variation was from 5 mph to 80 mph, and the corresponding travel time can only vary
from 22.5 seconds to 360 seconds for a 0.5-mile section. In Figures 5.7 and 5.8, it can be seen
that the travel time estimated varies from 0 to 600 seconds, showing the need for further
improvement.
[Figure: plot of travel time (sec) vs. time of day (hh:mm:ss)]
Fig. 5.7 Travel time estimated by the N-D model using optimized data for link 1 on
February 11, 2003, for 24 hours
[Figure: plot of travel time (sec) vs. time of day (hh:mm:ss)]
Fig. 5.8 Travel time estimated by the N-D model using optimized data for link 2 on
February 11, 2003, for 24 hours
Modifications II and III were carried out next by replacing the normal-flow model of Nam and
Drew by Equation 5.21 and by calculating the density from occupancy. The resulting graphs are
shown in Figures 5.9 and 5.10. It can be seen that the results have improved and the estimated
travel times are within reasonable limits of 22.5 and 360 seconds calculated earlier. However, it
can be seen that there is a large fluctuation in the estimated travel time under very low flow
conditions. The corresponding flow values from locations 1, 2 and 3 are shown in Figures 5.11,
5.12, and 5.13. It can be seen that the fluctuations in travel time happen when the volume is less
than 50 vehicles per 2-minute interval.
[Figure: plot of travel time (sec) vs. time of day (hh:mm:ss)]
Fig. 5.9 Estimated travel time on link 1 with density calculated from occupancy values on
February 11, 2003, for 24 hours
[Figure: plot of travel time (sec) vs. time of day (hh:mm:ss)]
Fig. 5.10 Estimated travel time on link 2 with density calculated from occupancy values on
February 11, 2003, for 24 hours
[Figure: plot of volume (vehicles/2 minutes) vs. time of day (hh:mm:ss)]
Fig. 5.11 Volume distribution on February 11, 2003, for 24 hours at location 1
[Figure: plot of volume (vehicles/2 minutes) vs. time of day (hh:mm:ss)]
Fig. 5.12 Volume distribution on February 11, 2003, for 24 hours at location 2
[Figure: plot of volume (vehicles/2 minutes) vs. time of day (hh:mm:ss)]
Fig. 5.13 Volume distribution on February 11, 2003, for 24 hours at location 3
To address this fluctuation in the estimated travel time, modification IV, which substitutes the
extrapolation method under low traffic flow conditions, was carried out. The resulting travel
time values are shown in Figures 5.14 and 5.15.
[Figure: plot of travel time (sec) vs. time of day (hh:mm:ss)]
Fig. 5.14 Effect of combining extrapolation method on estimated travel time for low flow
conditions on link 1 on February 11, 2003
[Figure: plot of travel time (sec) vs. time of day (hh:mm:ss)]
Fig. 5.15 Effect of combining extrapolation method on estimated travel time for low flow
conditions on link 2 on February 11, 2003
An overall comparison of the performance of the N-D model and the proposed model using field
data after optimization for a 24-hour period is shown in Figure 5.16 for February 11, 2003. It is
clear from the graph that the performance has improved by adopting the suggested
modifications. Corresponding AVI data are also plotted in Figure 5.16 to illustrate the
improvement in the quality of the estimated travel time. A comparison with the corresponding
AVI data shows that the travel time estimated by the proposed model captures similar trends
during the whole 24-hour period. MAPE was calculated for the estimated travel time using the
N-D model and the proposed model with respect to AVI data and it was found that the error
reduced from 98.82% to 3.91%.
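The MAPE values quoted here are assumed to follow the standard definition; a minimal sketch, assuming the estimated and AVI travel times have been matched to the same 2-minute intervals:

```python
def mape(estimated, observed):
    """Mean absolute percentage error between estimated travel times and
    observed (here, AVI) travel times for matched intervals."""
    errors = [abs(e - o) / o for e, o in zip(estimated, observed)]
    return 100.0 * sum(errors) / len(errors)

# Example with hypothetical 2-minute travel times in seconds
print(mape([30.0, 32.0, 60.0], [29.0, 33.0, 58.0]))  # about 3.3%
```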
[Figure: plot of travel time (seconds) vs. time of day (hh:mm:ss); series: N-D model, proposed model, AVI]
Fig. 5.16 Comparison of N-D model and proposed model using optimized field data for
February 11, 2003
A comparison of results was carried out using simulated data also. Some of the specific results
related to the effect of the selected modifications on the travel time estimation will be shown in
the following section using simulated data. As explained in modification II, the N-D model
ignores a portion of the vehicles from the estimation of travel time at the transition period from
off-peak to peak. This leads to more error in the estimated travel time at the transition period.
m(t_n), which is defined as the number of vehicles that enter and exit the link under
consideration in the same time period (Equation 5.13), is the measure used by Nam and Drew to
classify normal and congested flow. Thus, as the transition period starts, the value of m(t_n)
should start decreasing. Once the value of m(t_n) is less than zero, the flow is considered
congested. The transition period, hence, is considered normal flow in the N-D model,
and the travel time is calculated based on those vehicles that were able to enter and exit in the
same time period. Thus, the portion of vehicles that were not able to exit in the same period gets
ignored, leading to more error in the estimated travel time. This was taken into account in the
proposed model by using the modified Equation 5.21. The variation in the value of m(t_n) and the
corresponding error in the estimated travel time are plotted in Figure 5.17 for both the N-D
model and the developed model using simulated data from CORSIM. The error in estimated
travel time is calculated as the absolute difference between the estimated travel time and the
travel time calculated directly from simulation. The travel time was estimated using the N-D
model and the developed model, and the errors were calculated. It can be seen that the error
values increase with decreasing m(t_n) in the case of the N-D model, whereas the error of the
proposed model remains approximately constant over time.
[Figure: plot of absolute difference between estimated and actual travel time (secs) vs. time (hh:mm:ss); series: m(t_n), N-D method error, proposed method error]
Fig. 5.17 Variation in the performance of the N-D model and the developed model with varying
values of m(t_n) during transition from off-peak to peak condition
Similarly, the effect of modification I is tested using simulated data, and the results are shown in
Figure 5.18. This figure illustrates the effect of optimization on the accuracy of the estimated
travel time using simulated data. The travel time calculated before and after the optimization
along with the actual travel time obtained from simulation is shown. This illustration is for an
introduced error of 10% in the flow values. The optimization was carried out as detailed in
Chapter IV on the data with introduced error. The improvement in the performance and the
increase in the accuracy of the estimated travel time can be observed in the figure. The MAPE
value was found to decrease from 15.94% to 2.91% with the use of optimized data.
[Figure: plot of travel time (seconds) vs. time of day (hh:mm:ss); series: travel time after optimization, travel time from simulation, travel time before optimization]
Fig. 5.18 Effect of optimization on the estimated travel time using simulated data
An overall comparison of the performance of the N-D model and the proposed model using
simulated data is shown in Figure 5.19 and it illustrates the performance of the proposed model
under a transition period using simulated data. The analysis was carried out for a 2-hour period,
and the flow values were generated based on field values. The true travel time from the
simulation is plotted along with the values estimated by the models. Estimation was carried out
using the N-D model and the proposed model. It can be seen that the travel time estimated by the
proposed model is in close agreement with the simulation travel time, with an MAPE of 6.58%.
In the case of the N-D model, the MAPE was considerably higher at 48.97%.
[Figure: plot of travel time (sec) vs. time of day (hh:mm:ss); series: travel time from simulation, proposed model travel time (MAPE = 6.58%), N-D model travel time (MAPE = 48.97%)]
Fig. 5.19 Overall comparison of the proposed model with N-D model using simulated data
5.4.2 Validation of the Developed Model Using Field Data
The model results were validated using field data by comparing them with the corresponding
direct travel time obtained from AVI. The results obtained for selected dates are shown in
Figures 5.20 to 5.22. It may be seen that the travel time obtained from AVI and that calculated
using the developed model are in good agreement for all days. The MAPE values between the
estimated travel time and the AVI travel time were 1.54, 2.53, and 2.38% for February 10th,
13th, and 14th, respectively, as shown in Figures 5.20 to 5.22.
[Figure: plot of travel time (sec) vs. time of day (hh:mm:ss); series: Developed Model, AVI]
Fig. 5.20 Estimated travel time with AVI for 24 hours on
February 10, 2003 in link 1
[Figure: plot of travel time (sec) vs. time of day (hh:mm:ss); series: Developed Model, AVI]
Fig. 5.21 Estimated travel time with AVI for 24 hours on
February 13, 2003 in link 1
[Figure: plot of travel time (sec) vs. time of day (hh:mm:ss); series: Developed Model, AVI]
Fig. 5.22 Estimated travel time with AVI for 24 hours on
February 14, 2003 in link 1
The performance of the model during the peak and transition periods is shown enlarged in
Figure 5.23 for February 10, 2003. The MAPE was calculated and was found to be 3.87%. It
can be seen that the performance of the model is consistently satisfactory under peak and
transition periods.
[Figure: plot of travel time (sec) vs. time of day (hh:mm:ss); series: Model, AVI]
Fig. 5.23 Estimated travel time with AVI for peak and transition periods (February 10, 2003) in
link 1
The performance of the model during normal-flow conditions is shown in Figure 5.24. The
calculated MAPE was 0.75%, showing good performance of the model during the off-peak period.
[Figure: plot of travel time (sec) vs. time of day (hh:mm:ss); series: Developed Model, AVI]
Fig. 5.24 Estimated travel time with AVI for an off-peak period (February 10, 2003) in link 1
However, it should be noted that some assumptions are needed to compare the AVI travel time
with the travel time estimated from the loop detector data. First, the AVI system samples a
percentage of the vehicle population and gives the travel time of only these selected vehicles. In
the case of loop detectors, the data are collected from all the vehicles that cross them, and an
average travel time for the interval under consideration is calculated. For example, for the 5 days
of data analyzed in the present study, the ILD data available at a 2-minute interval amount to 720
observations per day, whereas the corresponding AVI observations varied from 100 to 200 per day.
Also, the time interval of the reported loop data and the time of the AVI data may not match
exactly. For example, in the February 10, 2003 data, the loop data are collected from midnight
12:00:00 at 2-minute intervals. The first AVI data reported on that day entered the link at
01:38:22 and exited at 01:42:52. The corresponding loop data available are from 01:38:00 to
01:42:00. Also, the detector location and the AVI location may not match exactly. For example,
the starting milepost of the ILD in the present study was 159.500, and the nearest AVI station
was at 158.989. Thus, the data need to be extrapolated to match with each other spatially and
temporally.
5.4.3 Validation of the Model Using Simulated Data
Due to the above-mentioned reasons, validation of the models was carried out using simulated
data also. A traffic network similar to the field test bed was created in CORSIM and ILDs
were placed every 0.5 miles, to be comparable to field conditions. The vehicles were also
generated based on the field values to mimic the field scenario. Traffic volumes from the field
were given as input to CORSIM at every 30-minute interval. Detectors were placed in each link
to collect the flow, speed, and occupancy rate. Data were generated for 2 hours, which included
both peak and off-peak flows. These data were used for checking the validity of the proposed
model. The detector output was reported in the OUT file of CORSIM and was used to get the
flow, occupancy, and speed values. Travel time was estimated based on these flow, occupancy
and speed values and was compared to the travel time given by the simulation. The binary .TSD
file from CORSIM, which contains the snapshot data at every time step, was used to calculate
the real travel time of individual vehicles from the simulation, as detailed in Chapter III.
Figure 5.25 illustrates the performance of the developed model during the off-peak period using
simulated data. The data were simulated for 4 hours during evening off-peak flow. It can be seen
that the travel time estimated by the proposed model follows the travel time calculated directly
from CORSIM. The MAPE was found to be 1.8% in this case.
[Figure: plot of travel time (secs) vs. time of day (hh:mm); series: travel time from simulation, travel time from proposed model]
Fig. 5.25 Validation of the travel time estimation model using simulation data for the off-peak
condition
Figure 5.26 shows a similar comparison where the data were simulated for 2 hours. The true
travel time from the simulation is plotted along with the estimated values. Again, the estimated
travel time by the developed model follows the trends in the actual data. The MAPE was
6.58% in this case.
[Figure: plot of travel time (sec) vs. time of day (hh:mm:ss); series: travel time from simulation, estimated travel time]
Fig. 5.26 Validation of the travel time estimation model using simulation data for peak condition
5.4.4 Comparison with Extrapolation Results
Though the extrapolation methods, which were discussed in Chapter II, have many drawbacks,
they are the most popular methods adopted in the field, and hence a comparison was carried out
with the results obtained from the extrapolation as well. The travel time estimated using the
proposed model is compared with the results from extrapolation methods. The three different
extrapolation methods as discussed in Chapter II (Equation 2.1, 2.2, or 2.3) were analyzed, and
the most suitable one was chosen for further comparison. The travel time estimated by the three
extrapolation methods is shown in Figures 5.27 and 5.28 for links 1 and 2. Method 1 (Equation
2.1) assumes the speed from each detector applies for half the distance, method 2 (Equation
2.2) considers the average speed of the two detectors, and method 3 (Equation 2.3) takes the
minimum of the two detector speeds, as explained in Chapter II.
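A sketch of the three estimates, consistent with the descriptions above (the exact forms are Equations 2.1 through 2.3 in Chapter II); the speeds and link length in the example are hypothetical:

```python
def extrapolation_methods(v_up_mph, v_down_mph, length_mi=0.5):
    """Sketch of the three extrapolation estimates described above.

    Method 1: each detector's speed is assumed to hold for half the link.
    Method 2: the average of the two detector speeds holds for the link.
    Method 3: the minimum of the two speeds holds for the link.
    Returns travel times in seconds.
    """
    half = length_mi / 2.0
    method1 = (half / v_up_mph + half / v_down_mph) * 3600.0
    method2 = length_mi / ((v_up_mph + v_down_mph) / 2.0) * 3600.0
    method3 = length_mi / min(v_up_mph, v_down_mph) * 3600.0
    return method1, method2, method3

# Example: upstream 60 mph, downstream 30 mph over a 0.5-mile link
print(extrapolation_methods(60.0, 30.0))  # (45.0, 40.0, 60.0) seconds;
# method 3 gives the largest value, consistent with its overestimation
```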
From the results obtained at different sites it was found that method 3 tended to overestimate the
travel time compared to methods 1 and 2. However, there was no significant difference between
the performance of methods 1 and 2, and either one can be used for further comparison
(Eisele 2001). In this dissertation, method 2, which considers the average speed of the two
detectors, is used.
A comparison of the travel time estimated by the proposed method and the extrapolation method
with the AVI travel time is shown in Figure 5.29 for February 13, 2003. As discussed previously,
it may be difficult to reach any solid conclusions by comparing the AVI travel time and the
travel time calculated from loop data. However, it can be used for checking whether the
estimated data follow the trend in the actual data. It can be seen that the travel time estimated by
the developed model is able to capture the variations in the travel time more efficiently than the
extrapolation methods. Also, it can be seen that at peak flow conditions, the extrapolation
method overestimated the travel time due to the failure to capture the change in speed within the
section.
[Figure: plot of travel time (secs) vs. time of day (hh:mm:ss); series: Method 1, Method 2, Method 3]
Fig. 5.27 Travel time estimated by different extrapolation methods for link 1 on February 11, 2003
[Figure: plot of travel time (secs) vs. time of day (hh:mm:ss); series: Method 1, Method 2, Method 3]
Fig. 5.28 Travel time estimated by different extrapolation methods for link 2 on February 11, 2003
[Figure: plot of travel time (sec) vs. time of day (hh:mm:ss); series: travel time from extrapolation, travel time from model, AVI travel time]
Fig. 5.29 Comparison of estimated travel time from extrapolation method, developed method, and AVI using field data on
February 13, 2003
Figures 5.30 to 5.33 show graphs comparing the estimated travel time by extrapolation method
and the developed method separately for the off-peak, peak, and transition periods on February
11, 2003 on link 2. The plots are continuous from 15:00:00 to 19:00:00, during which time the
flow varied from off-peak to peak and then to off-peak values. From the plots it can be seen that
the two values match under off-peak conditions. The mean absolute difference (MAD) between
the travel time estimated by the extrapolation method and the proposed model is calculated using
Equation 5.27:

MAD = Σ | tt_extrapolated − tt_model | / N,    (5.27)

where N is the number of intervals compared.
The MAD was found to be 2.71 during the off-peak hour shown in Figure 5.30. However,
during the transition period and peak flow conditions, the values differ considerably, with the
MAD going up to 14.29. This agrees with the findings from previous studies that the
extrapolation method fails to capture the changes in flow during congested conditions. AVI data
for the corresponding hours were scarce, so these data were not included in the plots.
Figure 5.30 displays the travel time estimated by the selected extrapolation method and the travel
time estimated by the developed method in the afternoon off-peak hours from 15:00:00 to
16:00:00 on February 11, 2003. It can be seen that the two travel times are close to each other,
with a mean absolute difference of 2.71 between the values.
[Figure: plot of travel time (secs) vs. time of day (hh:mm:ss); series: Extrapolation, Proposed]
Fig. 5.30 Comparison of extrapolation and developed model results during afternoon
off-peak hours
Figure 5.31 displays the travel time estimated by the selected extrapolation method and the travel
time estimated by the developed method in the afternoon transition period from off-peak to peak
on February 11, 2003. The MAD value was 7.01, and it can be seen that the two travel times are
close to each other until the flow increases.
[Figure: plot of travel time (secs) vs. time of day (hh:mm:ss); series: Extrapolation, Proposed]
Fig. 5.31 Comparison of extrapolation and developed model results during the start of evening
peak hours
Figure 5.32 displays the travel time estimated by the selected extrapolation method and the travel
time estimated by the developed method in the afternoon peak period from 16:00:00 to 17:00:00
on February 11, 2003. The MAD was 14.29, showing that the two travel times differ
considerably from each other.
[Figure: plot of travel time (seconds) vs. time of day (hh:mm:ss); series: Extrapolation, Proposed]
Fig. 5.32 Comparison of extrapolation and developed model results during evening peak hours
Figure 5.33 shows the travel time estimated by the extrapolation method and the travel time
estimated by the developed method in the transition period from peak to off-peak on February
11, 2003. The MAD in this case was 7.23, and it can be seen that both the travel times agree with
each other after the peak flow is over.
[Figure: plot of travel time (seconds) vs. time of day (hh:mm:ss); series: Extrapolation, Proposed]
Fig. 5.33 Comparison of extrapolation and developed model results during transition to evening
off peak hours
The data pertaining to February 10, 2003 were analyzed in a similar manner, with the AVI data
also included. Because the number of AVI observations is small compared to the ILD data, the
analysis was carried out for a longer duration, and the results are shown in Figures 5.34 and 5.35.
The results again confirm that the performance of extrapolation degrades at peak flow conditions,
whereas the developed model performs uniformly under varying traffic flow conditions. This can
be seen from the calculated MAPE values of 1.21 and 1.84 for the developed model and the
extrapolation method, respectively, under the off-peak condition, and the corresponding MAPE
values of 4.39 and 6.35 under the congested-flow condition.
[Figure: plot of travel time (seconds) vs. time of day (hh:mm:ss); series: Model, AVI, Extrapolation]
Fig. 5.34 Comparison of extrapolation and developed model results with AVI values during
off-peak hours on February 10, 2003
[Figure: plot of travel time (seconds) vs. time of day (hh:mm:ss); series: Model, AVI, Extrapolation]
Fig. 5.35 Comparison of extrapolation and developed model results with AVI values during peak
and transition periods on February 10, 2003
Similar comparisons were also carried out using simulated data. Figure 5.36 shows a comparison
of the extrapolation method and the developed method for the data simulated by CORSIM. The
MAPE values in this case were 6.5% and 48.97%, respectively, for the proposed method and
the extrapolation method. It can be seen that, as expected, the performance of the extrapolation
method degrades as the flow increases.
[Figure: plot of travel time (secs) vs. time of day (hh:mm:ss); series: Actual, Extrapolation, Developed Model]
Fig. 5.36 Comparison of the extrapolation method with the developed method using simulated
data
Finally, a comparison of the estimated travel time with the variables obtained from the field was
carried out to check the trends in the values. The estimated travel time from the developed model
is plotted along with the corresponding occupancy and speed values obtained from the field in
Figure 5.37. It can be seen that the developed model was able to estimate the travel time under
varying traffic flow conditions.
[Figure: plot of travel time (sec) vs. time of day (hh:mm:ss); series: Speed, Travel time, Occupancy]
Fig. 5.37 Relation between speed, occupancy, and travel time from February 10, 2003
5.5 CONCLUDING REMARKS
Travel time estimation from loop detector data has attracted increasing interest with the
development of ITS applications such as in-vehicle route guidance systems and advanced
traveler information systems. Accurate and timely information must be obtained to meet the
demands of these real-time applications. At present, travel time estimation is
carried out in the field based on extrapolation methods, assuming a constant speed for the
distance between the detector stations. Studies have shown that the accuracy of the extrapolation
method reduces as the flow increases. This is due to the inability of these methods to capture the
dynamics of traffic in congested conditions. Thus, there is a need for models that can take into
account varying traffic flow conditions.
This dissertation presented several modifications to an existing theoretical model for travel time
estimation on freeways, such that the model can estimate travel time for varying traffic flow
conditions directly from the loop detector data. The approach was designed for analyzing ILD
data for longer intervals of time and was robust enough to suspect or missing data. The system is
based on detector data obtained from the field and the travel time estimation is based on the
traffic flow theory. Simulated data using CORSIM simulation software was used for validating
the results. After the validation, the model was used to estimate travel time from field data. The
travel time estimated is compared with the AVI data collected from the field. The travel time
estimated was also compared to the results obtained from different available methods such as the
extrapolation method. The results indicate the developed model as a promising method to
estimate travel time from loop detector data under varying traffic flow conditions.
CHAPTER VI
SHORT-TERM TRAVEL TIME PREDICTION
6.1 INTRODUCTION
After the estimation of travel time from loop detector data was carried out, as explained in
Chapter V, the next and final stage in this dissertation was the prediction of travel time. Travel
time prediction refers to predicting the travel time before a vehicle traverses the link or route of
interest. The ability to predict travel time based on real-time data and historic data, collected by
various systems in transportation networks, is vital to many Intelligent Transportation Systems
(ITS) applications, such as Route Guidance Systems (RGS), Advanced Traveler Information
Systems (ATIS), and Advanced Traffic Management Systems (ATMS).
One function of these ITS applications is to provide real-time traffic information to traffic
management centers, from which information can be relayed back to travelers in real time. The
accuracy of this information is important since travelers make decisions, such as bypassing
congested segments of the network or changing departure times or destinations, based on the
information. The travel time information provided to travelers
through ATIS can be classified into three distinct groups: historic, real-time, and predictive.
Historic, as its name implies, is based on archived data, while real-time is based on the current
values obtained from the system. Predictive is the predicted future values calculated using the
real-time or historic information. For pretrip planning and en-route decisions, it is argued that
predicted information would be more useful than real-time or historic information. If the current
or historic traffic values are used, the performance of a given application will be constrained
because by the time the user makes the trip, the situation will have changed. Travel time
prediction becomes especially important in situations where traffic conditions are changing,
such as during transition periods. Then the travel time will be a function of 1) when the driver
arrives at the link in question and 2) how fast travel times are changing. Thus, the methodology
should anticipate the values in the next few minutes under dynamic traffic conditions and inform
travelers accordingly.
Previous traffic prediction efforts have used historic and real-time algorithms, time-series and
Kalman filtering models, and Artificial Neural Network (ANN) models. More details of these
methods and the literature related to the application of these methods on travel time prediction is
detailed in Chapter II. However, there is no consensus on the best method for travel time
forecasting because all the above methods have both advantages and disadvantages. Also, most
of the results reported are data specific and cannot be used for choosing one single method that
can be applied in all situations. Thus, based on the data characteristics and the specific
application requirements, different methods are adopted in different studies.
The objective of the study in this chapter is to investigate the potential of a recently developed
pattern classification and regression technique called Support Vector Machines (SVM) for the
short-term prediction of travel time. A multilayer perceptron ANN model as well as historic and
real-time methods are also developed for comparison purposes. The analysis considered
forecasts ranging from a few minutes ahead up to an hour into the future. Up to 4 days' data
were used for training the networks, and 1 day's data were reserved for cross-validation to
evaluate the prediction errors. The data used were the estimated travel times obtained from the models
described in the previous chapter.
In the following sections, a brief discussion of the historic, real-time, ANN, and SVM methods
will be given, followed by the implementation details for the travel time prediction application.
6.2 MODELS FOR TRAFFIC PREDICTION
6.2.1 Historic and Real-time Methods
The historic approach is based on the assumption that the historic profile can represent the traffic
characteristics for a given time of the day. Thus, a historical average value will be used for
predicting future values. This method can be valuable in the development of prediction models
since historic profiles explain a substantial amount of the variation in traffic over many days.
However, for the same reason, the reliability of the prediction is limited because of the implicit
assumption that the projection ratio remains constant (Hoffman and Janko 1990). Commuters, in general, have an
idea about the average traffic conditions and will be more interested in abnormal conditions.
That is, they are most interested in conditions when average values are not representative of the
current or future traffic conditions.
In the real-time approach, it is assumed that the travel time from the data available at the instant
when prediction is performed represents the future condition. This method can perform
reasonably well for the prediction into the immediate future under traffic flow conditions without
much variation (Thakuriah et al. 1992). More details of the historic and real-time methods and
the literature on their application to travel time prediction are given in Chapter II.
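As a hypothetical sketch of these two baseline predictors (the interval indexing scheme is illustrative only):

```python
def historic_prediction(history_by_interval, interval_index):
    """Historic method: the average of past days' travel times for the
    same time-of-day interval is used as the prediction."""
    past_values = history_by_interval[interval_index]
    return sum(past_values) / len(past_values)

def real_time_prediction(current_travel_time):
    """Real-time method: the latest measured value is assumed to hold
    over the prediction horizon."""
    return current_travel_time

# Example: predict one 2-minute interval from 4 days of history (seconds)
history = {240: [31.0, 29.5, 33.0, 30.5]}  # interval index -> past values
print(historic_prediction(history, 240))   # 31.0
print(real_time_prediction(32.4))          # 32.4
```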
6.2.2 ANN
ANN in the most general sense is an information processing structure whose design is motivated
by the design and functioning of human brains and components thereof. Thus, ANNs are
computing techniques, which can be trained to learn a complex relationship in a data set.
Basically, an ANN is a parallel computing system composed of interconnected simple processing
nodes, which are non-algorithmic, non-parametric, and intensely parallel (Kecman 2001; Haykin
1999).
Over the past several years, both in research and in practical applications, neural networks have
proven to be a very powerful method of mathematical modeling. In particular, neural networks
are well suited for pattern recognition and classification and to model nonlinear relationships
effectively. The use of neural networks has been proven successful in a number of applications
where the input-output mapping is highly non-linear and where the functional form of the
underlying distributions of the data is difficult to reach. ANNs are applied typically in areas such
as sensor processing, pattern recognition, and data analysis and control, which may require
information processing structures for which the algorithms or rules are not known.
One major application area of ANNs is forecasting. Several features of ANNs make them
valuable and attractive for a forecasting task. First, as opposed to the traditional model-based
methods, ANNs are data-driven, self-adaptive methods where very few a priori assumptions
about the models are needed. Also, because they learn from example data, they can capture
subtle functional relationships among the data even if the underlying relationship is unknown or
hard to describe. Thus ANNs are suited for problems whose solutions require knowledge that is
difficult to specify, but for which there are enough data available. This modeling approach with
the ability to learn from experience is very useful for many practical problems because it is often
easier to obtain data than to have a good theoretical understanding about the underlying laws
governing the system from which data are generated. These abilities of ANNs make them a good
tool for forecasting.
Neural networks have been widely used in transportation studies, and a review of these
applications can be found in Dougherty (1995), Faghri and Hua (1992), and Nakatsuji and
Shibuya (1998). The ANN model, with its learning capabilities, is suitable for solving complex
problems like prediction of traffic parameters. ANN models were chosen for traffic prediction
mainly because of their ability to take into account spatial and temporal information
simultaneously (Park and Rilett 1999). Some of the applications of ANN in the prediction of
speed, flow and occupancy in traffic forecasting can be found in Dougherty and Cobbett (1997),
Smith and Demetsky (1994), Park et al. (1998), Yun et al. (1998), Dia (2001), Mahalel and
Hakkert (1985), Mc Fadden et al. (2001), Nair et al. (2001), Xiao et al. (2003), Ishak et al.
(2003), Huang and Ran (2003), and Lee et al. (1998). The literature related to the use of ANN
for travel time prediction is reviewed in detail in Chapter II. A brief description of the ANN
technique and how it works is detailed below.
ANNs are the primary information processing structures of a technological discipline called
neuro-computing (Simon 1993). Neuro-computing is concerned with parallel, distributed,
adaptive information processing systems. The difference between neuro-computing and other
branches of computing is that in neuro-computing the algorithms are data driven. Rather than
the computer working through lists of instructions written by a programmer, it learns the
strengths of different relationships by being exposed to a set of examples of the behavior
concerned. By absorbing the pattern in the data, the network learns to generalize (Dougherty et
al. 1994).
There are two main groups of ANNs, namely continuous and discrete. As their names imply, the
former can take continuous-valued input and output, whereas the latter's input and output spaces
are discrete in nature. Different types of discrete/binary neural nets include the Hopfield net, the
Hamming net, the Carpenter/Grossberg classifier, etc. The networks that can take continuous input
include perceptrons, multi-layer perceptrons, Kohonen self-organizing maps, etc. (Lippman
1987). The details of most of these networks can be found in any of the standard textbooks on
ANN (Haykin 1994; Wasserman 1989; Dayhoff 1990; Beale and Jackson 1990).
Perceptrons are among the most widely used ANNs, and since a perceptron is used in this
dissertation, it is briefly discussed here. A simple perceptron consists of an input layer and an
output layer. Each neuron in the input layer is connected to each neuron in the output layer, and these
connections between the input and output layers are adjusted as the network is trained. The
multi-layer perceptron (MLP) is based on the original simple perceptron model but with
additional hidden layers of neurons between the input and output layers (Lippman 1987).
Figure 6.1 shows a schematic diagram of a single perceptron, and Figure 6.2 shows a multi-layer
perceptron.
Fig. 6.1 Schematic diagram of a perceptron
(Source: Dougherty 1995)
Fig. 6.2 Multi-layer perceptron
(Source: Dougherty 1995)
A neural network consists of the following elements (Dougherty et al., 1994):
Nodes: The basic building block of ANNs is the neuron, also known as a node or processing
element. A node takes in a set of inputs and computes an output according to a transfer function.
This is carried out by multiplying each input by a corresponding weight and then summing up all
these weighted inputs to determine the activation level of the neuron.
Connection weights: A neural network is composed of many nodes joined together by
connections, making the outputs of some nodes as the inputs to others. These connections are of
varying strength, and each connection has a weight associated with it.
Bias: The bias is a shifting function that is much like a weight, except that it has a constant input
of 1. The bias has the effect of lowering or increasing the net input of the activation function,
depending on whether it is negative or positive, respectively.
Transfer function/Activation function: Typically the output state of a single neuron can be
characterized as either on or off. A change from one state to the other is triggered when the
sum of the inputs (weighted by the strength of their respective connections) exceeds some
threshold. This threshold is usually represented by transfer functions such as sigmoid, logistic,
hyperbolic, linear, etc.
Layers: In theory, any topological arrangement of nodes and connections will be sufficient.
However, to make the visualization easier, it is usual to arrange the neurons in layers, with all
nodes in adjacent layers connected to each other. A neural network thus has an input layer, an
output layer, and one or more hidden layers.
Figure 6.3 below shows the model of a neuron with all the above elements. In the case of
perceptrons, an input vector p is transformed to an intermediate vector of hidden variables n
using an activation function f.
Fig. 6.3 Model of a neuron
(Source: MathWorks, Inc. 2003)
The output of the j-th node in a hidden layer can be mathematically represented as:
n_j = f( Σ_{i=1}^{N} w^1_{i,j} p_i + b^1_j ),    (6.1)

where,
b^1_j = bias of the j-th node in the hidden layer, and
w^1_{i,j} = weight of the connection between the j-th node in the hidden layer and the i-th input
node.
The superscript 1 denotes that the connections are between the input layer and the hidden layer.
The output vector a of the network is obtained from the vector of intermediate variables through
a similar transformation using the activation function as:
a_k = f( Σ_{l=1}^{M} w^2_{l,k} n_l + b^2_k ),    (6.2)
where, the superscript 2 denotes that the connections are between the hidden layer and the output
layer. The training of an MLP network involves finding values of the connection weights that
minimize the error function between the actual network output and the corresponding target
values in the training set.
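A minimal sketch of this forward pass, assuming a logistic sigmoid for the activation function f and arbitrary example weights:

```python
import math

def mlp_forward(p, w1, b1, w2, b2):
    """Forward pass of a one-hidden-layer perceptron (Eqs. 6.1 and 6.2).

    p  : list of inputs
    w1 : hidden-layer weights; w1[j][i] connects input i to hidden node j
    b1 : hidden-layer biases
    w2 : output-layer weights; w2[k][j] connects hidden j to output k
    b2 : output-layer biases
    A logistic sigmoid is assumed for the activation function f.
    """
    f = lambda x: 1.0 / (1.0 + math.exp(-x))
    hidden = [f(sum(w1[j][i] * p[i] for i in range(len(p))) + b1[j])
              for j in range(len(b1))]
    return [f(sum(w2[k][j] * hidden[j] for j in range(len(hidden))) + b2[k])
            for k in range(len(b2))]

# Example: 2 inputs, 2 hidden nodes, 1 output, with arbitrary weights
print(mlp_forward([0.5, -1.0],
                  [[0.2, -0.4], [0.7, 0.1]], [0.0, -0.5],
                  [[1.0, -1.0]], [0.1]))
```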
Thus, the performance of ANNs mainly depends on the training rules used. There are different
training rules available to train neural networks. These training rules specify an initial set of
weights (usually random in the range of [-0.5, 0.5]) and indicate how the weights should be
adapted during the training to improve the performance. In other words, the purpose of the
learning algorithm is to adjust the network so that the network produces the correct outputs for
the given set of examples. The learning methods are mainly categorized into supervised and
unsupervised. Supervised learning consists of training a network with a set of examples for
which the desired outputs are known. In each step, the calculated output is compared with the
desired output and a global error function is computed. The weights are then adjusted to reduce
the error, and this process occurs over and over as the weights are continually tweaked. The set
of data that enables the training is called the training set. During the training of a network,
the same set of data is processed many times as the connection weights are refined. In
unsupervised learning, the training of the network is entirely input data-driven and no target
results for the input data vectors are provided.
The learning algorithm used can be non-constructive or constructive in nature. Non-constructive
algorithms require the topology of the network to be fixed a priori, while constructive
algorithms automatically determine the topology of the network.
Most of the learning algorithms used in ANN are non-constructive and supervised. Some of the
more popular non-constructive supervised learning algorithms are the perceptron learning
algorithm (Rosenblatt 1962) and the back propagation algorithm (Rumelhart et al. 1986).
Back propagation is one of the earliest, most widely used, and most successful learning
algorithms. Since the present study uses this algorithm, it is described here in more detail.
Back propagation is a supervised learning algorithm that provides a method to adjust the weights
in a multilayer network of connected processing units. The back propagation algorithm is an
extension of the least mean square (LMS) algorithm, which will minimize the errors between the
actual and the desired output.
A gradient-based approach is used to minimize the error at the outputs in the back propagation
method. This is done by calculating the error function for each input pattern and then back
propagating the error from one layer to the previous one. The weights of a node are adjusted in
direct proportion to the error in the units to which it is connected. Any measure of error, such as
the mean square error, can be used for this purpose.
The steps involved in training a back propagation network are as follows:
1. Initialize weights;
2. Present input and desired output pair to the network;
3. Compute an output which emerges from the output layer (forward pass) using the
starting connection weights;
4. Compare this output with the value of output that was expected for this example by
computing an error function;
5. Update the connection weights by a small amount to displace the output towards the
desired output. This updating starts from the output layer and works backwards to adapt
the weights, which is achieved by back propagating the global error function (backward
pass). The weights are updated as given in Equation 6.3:
w_ij(t+1) = w_ij(t) + η δ_j x_j,    (6.3)

where,
w_ij(t) = weight of the connection between node i and node j at time t,
x_j = either the output from node j or the input to the network,
η = gain term (learning rate), and
δ_j = error term for node j.
If node j is an output node, the error term is calculated as:
δ_j = y_j (1 − y_j)(d_j − y_j),    (6.4)

where,
d_j = desired output of node j, and
y_j = actual output of node j.
If node j is an internal hidden node, then the error term is:
δ_j = x_j (1 − x_j) Σ_k δ_k w_jk,    (6.5)

where,
k = all nodes in the layers above node j.
6. Present the next input pattern;
7. Calculate total error by calculating the outputs for all training patterns; and
8. Adapt weights starting from the output layer.
If the training is successful, the squared difference reduces over time as the algorithm
continuously iterates through the example data. Convergence can be checked by monitoring
the root mean square (RMS) error values. The rate of convergence varies greatly, and there are
various methods to increase it, such as the use of a variable momentum term and learning rate. The
variable momentum helps update the weights during an iteration as a function of the previous
weight change. The learning rate determines the step size used for updating the weights.
Hence, the selection of these two should be carried out judiciously. A large momentum and a large
learning rate may lead to a local minimum rather than the global minimum.
If a momentum term is added, Equation 6.3 becomes:
w_ij(t+1) = w_ij(t) + η δ_j x_j + α (w_ij(t) − w_ij(t−1)),    (6.6)

where,
α = the momentum term.
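A sketch of the resulting weight update, combining Equations 6.3, 6.4, and 6.6; all numerical values in the example are arbitrary:

```python
def update_weight(w_now, w_prev, eta, delta_j, x_j, alpha=0.0):
    """Single weight update from Equations 6.3 and 6.6.

    w_now, w_prev : current and previous values of the weight w_ij
    eta           : learning rate (gain term)
    delta_j       : error term of node j (Eq. 6.4 or 6.5)
    x_j           : output of node j (or the network input)
    alpha         : momentum term; alpha = 0 recovers Equation 6.3
    """
    return w_now + eta * delta_j * x_j + alpha * (w_now - w_prev)

def output_delta(y_j, d_j):
    """Error term of an output node (Equation 6.4)."""
    return y_j * (1.0 - y_j) * (d_j - y_j)

# Example: one output node with actual output 0.6 and target 1.0
delta = output_delta(0.6, 1.0)  # 0.096
print(update_weight(0.3, 0.25, eta=0.1, delta_j=delta, x_j=0.8, alpha=0.9))
```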
The main disadvantage of the back propagation algorithm is the step-size problem in finding the
global minimum of the overall error function. If the step size is too small, a local minimum may
be reached. If the step size is too large, the network may oscillate around the global minimum
without reaching it. The algorithm also assumes that changes in one weight have no effect on
the error gradient of other weights, which may not be true.
Several variations of the back propagation algorithm were developed to address the
above-mentioned drawbacks. Examples include the quickprop algorithm, the bold driver
method, and the Levenberg-Marquardt (LM) algorithm. The LM algorithm appears to be the fastest
method for training moderate-sized feed forward neural networks (MathWorks, Inc. 2003). It
also has a very efficient MATLAB implementation because the solution of the matrix equation is
a built-in function, and hence its positive attributes become even more pronounced in a
MATLAB programming environment.
A major perceived disadvantage of ANN models is that, unlike other statistical models, they
provide no information about the relative importance of the various parameters (Dougherty et al.
1994). In ANNs, as the knowledge acquired during training is stored in an implicit manner, it is
very difficult to come up with a reasonable interpretation of the overall structure of the network.
This has led to the term black box, which many researchers use when referring to ANN
behavior (Speed and Spiegelman 1998; Kecman 2001).
6.2.3 SVM
At present ANN is one of the most popular methods in use for the prediction of traffic
parameters. However, there are numerous practical shortcomings associated with conventional
ANNs including the difficulty in selecting the optimum number of hidden layers and hidden
neurons. Another common concern about ANN is the difficulty in providing a reasonable
interpretation of the overall design of the ANN network, as discussed previously. In response, a
number of modifications have been proposed to alleviate these shortcomings and some of them
were applied to the problem of travel time prediction by Park and Rilett (1998), Park et
al. (1999), Rilett and Park (2001), and Kisgyorgy and Rilett (2002).
These shortcomings also led to the exploration of alternative techniques for the prediction of traffic
parameters. In this dissertation one such alternative technique, namely SVM, is explored for the
prediction of travel time. The performance of SVM for the prediction of traffic speed is also
explored in this dissertation to check whether the results are data specific. Several studies
compared the performance of ANN and SVM in other applications. Gunn (2003) reported that
the traditional neural network approaches have limitations on generalization, giving rise to
models that may over-fit the training data. This deficiency is due to the optimization algorithm
used in ANN for the selection of parameters and the statistical measure used for selecting the
model (Gunn 2003). Valyon and Horvath (2002) also discussed the issue of poor generalization
and over fitting of ANN when presented with noisy training data. Samanta (2004) and Jack and
Nandi (2002) compared the performance of SVM and ANN for the application of gear fault
detection. Samanta (2004) reported almost equal performance from both methods, with
slightly better performance from SVM. However, Jack and Nandi (2002) reported that the
generalization from ANN was better than SVM.
The main difference between SVM and ANN is in the principle of risk minimization (RM). In
the case of SVM, the structural risk minimization (SRM) principle is used, which minimizes an
upper bound on the expected risk, whereas in ANN, traditional empirical risk minimization
(ERM) is used which minimizes the error in the training data. Training in SVM involves the
optimization of a convex cost function without any local minima to complicate the learning
process (Campbell 2002). The comparison between ANN and SVM was addressed by Kecman
(2001) as follows: "NNs had a more heuristic origin. This does not mean that NNs are of lesser
value for not being developed from clear theoretical considerations. It just happens that their
progress followed an experimental path, with a theory being evolved in the course of time. SVMs
had a reverse development: from theory to implementation and experiments. It is interesting to
note that the very strong theoretical underpinnings of SVMs did not make them widely
appreciated at first."
SVM has been successfully applied to a number of applications ranging from particle
identification to database marketing (Campbell 2002). The approach is systematic and is
motivated by statistical learning theory. Support vector machines are constructed from a unique
learning algorithm that extracts training vectors that lie closest to the class boundary, and makes
use of them to construct a decision boundary that optimally separates the different classes of
data. These sets of training patterns, which carry all relevant information about the classification
problem, are called support vectors (Hearst 1998). Thus, the model constructed has an explicit
dependence on a subset of the data points (the support vectors). SVMs represent novel learning
techniques that have been introduced in the framework of structural risk minimization (SRM).
Support vector algorithms can be applied to complex problems, yet the method is simple enough
to be analyzed mathematically, because it can be shown to correspond to a linear method in a
high-dimensional feature space that is non-linearly related to the input space. However, it does
not involve any computations in the high-dimensional space; by the use of kernels, all the
necessary computations are performed directly in the input space.
In the case of a binary classification problem, SVM attempts to place a linear boundary between
two different classes, and orient it in such a way that the margin is maximized. In essence, the
learning problem is cast as a constrained nonlinear optimization problem. In the case of
classification of linearly separable data, the approach is to find among the hyperplanes the ones
that minimize the training error as shown in Figure 6.4. The SVM tries to orient the boundary
such that the distance between the boundary and the nearest data point in each class is maximal
as shown in Figure 6.5. The boundary is then placed in the middle of this margin between the
two points. The maximal margin improves the classification of new data (generalization). The
nearest data points are used to define the margins and are known as support vectors. Once the
support vectors are selected, the rest of the data can be discarded (Samanta 2004). Thus, SVM
uses the strategy of keeping the error fixed and minimizing the confidence interval.
Fig. 6. 4. Separating hyperplanes
Fig. 6. 5. Support vectors with maximum margin boundary
In the following section, a simple model of SVM for a classification problem of two separate
classes is illustrated. This model problem gives an overview of how SVM works. For more
detailed explanations, the reader is referred to the standard tutorials and textbooks (Vapnik 1998;
Burges 1998; Smola and Scholkopf 1998; Cristianini and Shawe-Taylor 2000; Kecman 2001).
Let the binary classification data points be

D = \{ (x_1, y_1), \ldots, (x_l, y_l) \}, \quad x \in \mathbb{R}^n , \; y \in \{-1, +1\} ,        (6.7)

where,
y = a binary value representing the two classes, and
x = the input vector.
As explained previously, there are a number of hyperplanes that can separate these two sets of
data, and the problem is to find the one with the largest margin. The SV classifiers are based on
the class of hyperplanes

(w \cdot x) + b = 0, \quad w \in \mathbb{R}^n , \; b \in \mathbb{R} ,        (6.8)

where,
w = the weight vector normal to the boundary,
x = the input vector, and
b = the scalar threshold.
To remove redundancy, the hyperplane is considered in canonical form defined by a unique pair
of values (w,b) at the margins satisfying the condition:
(w \cdot x) + b = +1 ,        (6.9)

(w \cdot x) + b = -1 .        (6.10)
The quantities w and b will be scaled for this to be true, and therefore the support vectors
correspond to the extremities of the data. Thus, the decision function that can be used to classify
the data is:
y = \mathrm{sign}\left( (w \cdot x) + b \right) .        (6.11)
Thus, a separating hyperplane in canonical form must satisfy the following constraints:

y_i \left( (w \cdot x_i) + b \right) \geq 1, \quad i = 1, \ldots, l ,        (6.12)

where,
l = the number of training samples.
There can be many possible hyperplanes that can separate the training data into the two classes.
However, the optimal separating hyperplane is the unique one that not only separates the data
without error but also maximizes the margin. This means that it should maximize the distance
between the closest vectors in both classes to the hyperplane. This margin is the sum of the
absolute distances between the hyperplane and the closest training data points in each class.
The distance d(w, b; x) of a point x from the hyperplane (w, b) is:

d(w, b; x_i) = \frac{ |(w \cdot x_i) + b| }{ \|w\| } .        (6.13)
Thus, the sum of the absolute distances between the hyperplane and the closest training data
points in each class, i and j, is calculated as given in Equation 6.14:

\min_{x_i} \frac{ |(w \cdot x_i) + b| }{ \|w\| } + \min_{x_j} \frac{ |(w \cdot x_j) + b| }{ \|w\| } = \frac{2}{ \|w\| } .        (6.14)
The optimal canonical hyperplane is the one that maximizes the above margin. Thus, the optimal
hyperplane, with the maximal margin of separation between the two classes, can be uniquely
constructed by solving a constrained quadratic optimization problem whose solution is in terms of a
subset of training patterns that lie on the margin. These training patterns, called support vectors,
carry all relevant information about the classification problem.
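As a concrete illustration of this maximum-margin construction, the following hedged sketch
(Python with scikit-learn, not the software used in this dissertation) fits a linear SVM to
separable toy data and inspects the support vectors and the resulting margin:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    # Two linearly separable point clouds (illustrative data only).
    X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
    y = np.array([-1] * 20 + [1] * 20)

    clf = SVC(kernel="linear", C=1e6)  # a large C approximates the hard margin
    clf.fit(X, y)
    print("support vectors:\n", clf.support_vectors_)
    print("margin width 2/||w||:", 2.0 / np.linalg.norm(clf.coef_))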
In cases where the given classes cannot be linearly separated in the original input space, the
SVM first non-linearly transforms the original input space into a higher dimensional feature
space as shown in Figure 6.6. This transformation is carried out by using various non-linear
mappings: polynomial, sigmoidal, radial basis, etc. After the non-linear transformation step,
SVM finds a linear optimal separating hyperplane in this feature space (Kecman 2001; Campbell
2002). Thus, a non-linear function is learned by a linear learning machine in a kernel induced
feature space.
Fig. 6. 6. The kernel method for classification
In Support Vector regression (SVR) the basic idea is to map the data into a high-dimensional
feature space F via a non-linear mapping and to do linear regression in this space.
f(x) = (w \cdot \Phi(x)) + b ,        (6.15)

with \Phi : \mathbb{R}^n \rightarrow F , \; w \in F ,        (6.16)

where,
b = the threshold, and
\Phi = the non-linear mapping into the feature space F.

Thus, linear regression in a high dimensional (feature) space corresponds to non-linear
regression in the low dimensional input space \mathbb{R}^n (Kecman 2001).
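A minimal sketch of such a kernel evaluation follows; the Gaussian radial basis form is one
common choice, and sigma here is an illustrative parameter. The point is that the feature-space
inner product (\Phi(x) \cdot \Phi(x')) is evaluated directly in the input space, so \Phi is never
formed explicitly.

    import numpy as np

    def rbf_kernel(x1, x2, sigma=1.0):
        # Gaussian radial basis kernel:
        # k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)).
        return np.exp(-np.sum((x1 - x2) ** 2) / (2.0 * sigma ** 2))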
Overall, the construction of an SVM incorporates the idea of structural risk minimization.
According to this principle, the generalization error rate is upper bounded by a formula. By
minimizing this formula, an SVM can assure a known upper limit of the generalization error.
The primary advantage of the SVM method is that it automatically calculates the optimal (with
respect to generalization error) network structure for a given problem. In practice it means that a
lot of questions that had to be answered during the design of a traditional NN (e.g., the number
of neurons, the length and structure of the learning cycle, etc.) are eliminated. However, some
other questions arise, namely the proper selection of the parameters used in SVM. Such
parameters are the loss function (ε), which determines the cost of deviation from the training
sample; the width (σ) of the Gaussian radial bases; and C, which is a trade-off between the
minimization of the training error and the number of training points falling outside the error
boundary.
The drawbacks of the SVM method are addressed in some of the previous literature (Talukder
and Casasent 2001), and they are discussed briefly here. SVMs for classification involve
designing classifiers based on only a few so-called support vectors that lie close to the decision
boundary between the two classes. Linear SVM calculates a linear basis function that maximizes
the minimum distance between the classes. It is a linear combination of the training vectors.
Thus, when the training data set is large (>5000), the problem cannot be solved on a PC or
equivalent computer without data and problem decomposition. Another reported drawback of
SVM is that when the data classes overlap, a user-defined cost parameter to measure the amount
of misclassification is needed.
SVMs have been successfully applied to a number of applications ranging from face
identification to time series prediction. Some of the recent applications for the pattern
recognition case are: handwritten digit recognition (Cortes and Vapnik 1995; Scholkopf et al.
1995, 1996; Burges and Scholkopf 1997), object recognition (Blanz et al. 1996), speaker
identification (Schmidt 1996), face detection (Osuna et al. 1997a) and text recognition (Joachims
1997). For the regression estimation case, SVMs have been applied to benchmark time series
prediction data sets (Muller et al. 1999; Mukherjee et al. 1997; Osuna et al. 1997b).
Reported applications of SVM in the field of transportation engineering are very few and are
discussed below. Yuan and Cheu (2003) used SVM for incident detection in an arterial network
(simulated) and a freeway network (actual). Two different non-linear kernels were trained and
tested. The method was compared to a multi-layer feed forward (MLF) ANN and a probabilistic
neural network. Based on their results they reported that SVM had a lower misclassification rate,
higher correct detection rate and slightly faster detection time than the multi-layer feed forward
neural network and probabilistic neural network models while using simulated data. While using
real data from the field, the detection performance was reported as equal to that of the MLF
network. Ding et al. (2002) proposed a traffic time series prediction method based on SVM theory.
Another reported application of SVM in the traffic engineering area is for vehicle detection (Sun
et al. 2002a, 2002b). Vanajakshi and Rilett (2004a) studied the application of SVR in traffic
speed prediction and compared the results with the performance of a multi-layer feed forward
neural network, and real-time and historic methods.
6.3 MODEL PARAMETERS
6.3.1 ANN
In this dissertation a multi-layer perceptron network with the back propagation algorithm is used
because of its excellent predictive capacity, as reported in previous studies for similar
applications (Smith and Demetsky 1994, 1997; Lee et al. 1998). In particular, multi-layer feed
forward neural networks that utilize a back propagation algorithm have been applied successfully
for forecasting traffic parameters (Mc Fadden et al. 2001; Huang and Ran 2003; Park and Rilett
1999).
In this dissertation, programs were developed in MATLAB for the neural network application.
For an application using ANN, first the network needs to be trained, where the weights and node
biases are calculated. For this the available data set is divided into a training set and testing set.
A training set is used to estimate the arc weights and node biases, and the testing data are used
for measuring the generalization ability of the network. The parameter selection was carried out
carefully to get the best results, the details of which are given below.
6.3.1.1 Number of Hidden Layers and Nodes
Because most theoretical works show that a single hidden layer is sufficient for ANNs to
approximate any complex non-linear function with any desired accuracy (Cybenko 1989; Hornik
et al. 1989), most forecasting applications use only one hidden layer. In this dissertation, too, a
single hidden layer was selected. The issue of determining the optimal number of hidden
nodes was a more complicated one. Networks with fewer hidden nodes are preferable as they
usually have better generalization ability and fewer overfitting problems. However, networks with too
few hidden nodes may not have enough power to model and learn the data. The most common
way of determining the optimal number of hidden nodes is by a sensitivity analysis. In this
dissertation, 10 neurons in the hidden layer were found to be the optimum.
6.3.1.2 Number of Input Nodes
The number of input nodes corresponds to the number of variables in the input vector used for
forecasting the future values. In the case of travel time prediction from loop detector data, either
the travel time can be estimated from the detector data variables, such as speed, flow, or occupancy,
and then can be predicted to future time steps, or the detector data can be first predicted to future
time steps and then the corresponding travel time can be calculated. The first method was
adopted in this dissertation, since it gave better results in previous studies compared to the
indirect method of predicting speed, flow or occupancy and then calculating the corresponding
travel time (Kisgyorgy and Rilett 2002). A fixed number of lagged observations of the travel
times from the same link was selected as the input, as in typical time series forecasting
problems. Travel time information from the previous five time periods was selected as input,
based on previous studies (Park and Rilett 1998, 1999). Data normalization was performed to
standardize the data and to avoid computational problems. If the data are not normalized, inputs
with higher values will drive the training process, masking the contribution of lower valued
inputs (Desa 2001).
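The normalization and its inverse can be sketched as follows; this is a generic min-max scaler
based on the range of the series, and the exact scaling used in this dissertation is assumed rather
than reproduced.

    import numpy as np

    def minmax_normalize(a, lo=None, hi=None):
        # Scale a series to [0, 1] based on its range so that large-valued
        # inputs do not dominate the training process.
        a = np.asarray(a, dtype=float)
        lo = a.min() if lo is None else lo
        hi = a.max() if hi is None else hi
        return (a - lo) / (hi - lo), lo, hi

    def denormalize(a, lo, hi):
        # Transform normalized network outputs back to actual values.
        return np.asarray(a) * (hi - lo) + lo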
6.3.1.3 The Number of Output Nodes
The number of output nodes is relatively easy to specify as it is directly related to the problem
under study. For a time series forecasting problem, the number of output nodes often
corresponds to the forecasting horizon. The forecasting can be one-step-ahead or multi-step-
ahead prediction. In this dissertation, multi-step-ahead forecasting was adopted, and predictions up
to 30 time steps ahead were attempted to determine for how many time steps ahead the prediction
performance remains better than that of the historic method. There are two ways of performing multi-step-
ahead forecasting. The first method is called the iterative forecasting method where the forecast
values are used as input for the next forecast. In this case, only one output node is necessary. The
second method, namely the direct method, is to let the neural network have several output nodes
to directly forecast each step into the future. Zhang et al. (1998) reported that the direct method
performed better than the iterative method, whereas Weigend et al. (1992) reported that the direct
method performed worse than the iterative method. An advantage of using the direct method is
that the neural network can be built directly to forecast multi-step-ahead values. In the iterative
method, a single function is used to predict one point at a time, and this function is then iterated
on its own outputs to predict points further into the future. As the forecast moves forward,
past observations are dropped; instead, forecasts are used to predict further future points.
Hence, it is typical that the longer the forecasting horizon, the less accurate the iterative method
is (Zhang et al. 1998). In this dissertation the direct method was chosen. The normalized output
values obtained from the ANN were transformed back to the actual values.
6.3.1.4 Interconnection of the Nodes
The network architecture is also characterized by the interconnection of the nodes in different
layers. For most forecasting applications the networks are fully connected to all the nodes in the
next higher layer and this approach was adopted in this dissertation.
6.3.1.5 Activation Function
Different activation functions such as sigmoid, logistic, hyperbolic, linear etc. have been used in
previous studies. In this dissertation, a logistic sigmoid activation function, which makes the
input and output spaces continuous, was used. Figure 6.7 shows the sigmoid activation function
with its mathematical form in Equation 6.17. This transfer function takes the input, which may
have any value between plus and minus infinity, and squashes the output into the range 0 to 1.
a = f(n) = \frac{1}{1 + e^{-n}} .        (6.17)
Fig. 6. 7. Log-sigmoid transfer function
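Written as code, the transfer function of Equation 6.17 is a one-liner; this sketch mirrors the
behavior of MATLAB's logsig transfer function, though the snippet itself is illustrative Python
rather than the dissertation's code.

    import numpy as np

    def logsig(n):
        # Eq. 6.17: squashes any real-valued input into the range (0, 1).
        return 1.0 / (1.0 + np.exp(-n))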
6.3.1.6 Training Algorithm
As discussed previously, the ANN training is an unconstrained non-linear minimization problem
where the weights are iteratively modified to minimize the overall error between desired output
and actual output. The most popular algorithm for this is the back propagation algorithm, which
requires the selection of a step size (learning rate). Small rates lead to a slow learning process
whereas large rates will lead to oscillations around a global minimum. To improve this, a
momentum parameter can be used, which selects the next weight change in more or less the
same direction as the previous one and hence reduces the oscillation effect of larger learning
rates. Standard back propagation with momentum is selected in most studies. The momentum
parameter and learning rate are usually selected through trial and error. However, there is no
consistent conclusion with regard to the best learning parameter combination.
Hence, higher-performance algorithms that can converge from 10 to 100 times faster than the
conventional back propagation methods were developed (MathWorks, Inc. 2003). These faster
algorithms fall into two main categories. The first category uses heuristic techniques, which were
developed from an analysis of the performance of the standard steepest descent algorithm. One
heuristic modification is the momentum technique. The second category of fast algorithms uses
standard numerical optimization techniques. Conjugate gradient, quasi-Newton, and Levenberg-
Marquardt (LM) are some of the examples that fall in this category. Their faster convergence,
robustness and ability to find good local minima make them attractive in ANN training. In this
dissertation the LM method was adopted. However, its use is restricted to small networks (less
than a few hundred weights) with a single output layer (Statsoft Pacific Pty Ltd. 2004).
6.3.1.7 Training and Testing Data
The training sample is used to train the network, and the testing data are used to evaluate the
forecasting ability of the model. The main point here is to have both the training and testing data representative of the
population data. Most researchers select them based on the rule of 90% vs. 10%, 80% vs. 20%,
70% vs. 30% etc. In this dissertation, 80% vs. 20% was used for training and testing.
6.3.1.8 Performance Measures
Commonly adopted measures for checking the accuracy of the predicted data are the mean
absolute error, sum of squared error, root mean squared error, mean absolute percentage error
(MAPE) etc. In this dissertation MAPE was used as given in Equation 3.7.
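Equation 3.7 is not repeated here, but the standard form of the MAPE computation assumed
throughout this chapter can be sketched as:

    import numpy as np

    def mape(actual, predicted):
        # Mean absolute percentage error, in percent; assumes the actual
        # values (travel times) are strictly positive.
        actual = np.asarray(actual, dtype=float)
        predicted = np.asarray(predicted, dtype=float)
        return 100.0 * np.mean(np.abs((actual - predicted) / actual))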
6.3.2 SVM
The SVM toolbox for Matlab developed by Steve Gunn (2003) was used in the present study.
The parameters to be chosen are the loss function (ε), the cost function C, and the kernel function
(Gunn 2003).
6.3.2.1 Loss Function ε
The choice of loss function determines the approximation error achieved, the training time, and
the complexity of the solution; the last two depend directly on the number of support vectors. A
given training example becomes a support vector only if the approximation error on that example
is larger than ε. Therefore, the number of support vectors is a decreasing function of ε. In
practice, one should ensure that the value of ε is sufficiently small so that the theoretical risk it
defines constitutes a reasonable measure of the approximation error. A robust compromise is to
choose ε such that the percentage of support vectors is equal to 50% (Mattera and Haykin 1999).
A larger value of ε can be utilized to reduce the training time and the network complexity. Thus,
the loss function determines the measure of accuracy of the result in the regression. Each choice
of loss function will result in a different overall strategy for performing regression. A loss
function that ignores errors that are within a certain distance ε of the true value is referred to as
the ε-insensitive loss function (Cristianini and Shawe-Taylor 2000). In this dissertation the ε-
insensitive loss function with an ε of 0.05 was selected.
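A minimal sketch of the ε-insensitive loss just described, with the tube half-width eps as a
parameter:

    import numpy as np

    def eps_insensitive_loss(residual, eps=0.05):
        # Errors within the eps tube cost nothing; beyond it the cost
        # grows linearly with the excess error.
        return np.maximum(0.0, np.abs(residual) - eps)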
6.3.2.2 Cost Function C
The choice of cost function involves a tradeoff between the minimization of the training error
and the number of training points falling outside the error boundary. C defines the range of the
values assumed by linear coefficients, and its choice affects the range of the possible output. If
the range of output is [0,B] and if C is very small compared to B, it would be impossible to
obtain a good approximation. A value of C that is very large compared to B will lead to
numerical instability. Therefore, a value of C that is approximately equal to B is suggested as a
robust choice (Mattera and Haykin 1999). Cost function C signifies the tolerance to
misclassification errors. If the value of C is high, the tolerance will be less (Talukder and
Casasent 2001). In this dissertation a C of 100 was selected by trial and error.
6.3.2.3 Kernel Function
The kernel function implicitly maps the input vector into the feature space and calculates their
inner product in the feature space. Any symmetric function such as linear spline, B-spline,
sigmoidal, polynomial, radial basis function, etc., can be used as a kernel function. In the present
study, the SVR model used a radial basis kernel function. The parameter σ determines the width
of the Gaussian radial bases; a σ value of 15 was selected based on a preliminary analysis.
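For illustration, the parameter choices above (ε = 0.05, C = 100, σ = 15) can be expressed in
scikit-learn's SVR, although the dissertation itself used Gunn's MATLAB toolbox; the mapping
gamma = 1/(2σ²) is an assumption about how the radial basis width is parameterized, and
X_train, y_train, X_test are hypothetical arrays.

    from sklearn.svm import SVR

    sigma = 15.0
    model = SVR(kernel="rbf", epsilon=0.05, C=100.0,
                gamma=1.0 / (2.0 * sigma ** 2))
    # model.fit(X_train, y_train)    # X_train, y_train: hypothetical data
    # y_hat = model.predict(X_test)  # X_test: hypothetical data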
6.4 RESULTS
The results are illustrated using the data collected from the I-35 test bed shown in Figures 3.10a
and 3.10b. The ILD data from all 5 days from February 10 to February 14, 2003, are used.
Travel time was predicted into future time steps using the historic method, real-time method,
ANN method, and SVM method, and the results are compared. The analysis considered
prediction times ranging from 2 minutes ahead up to 1 hour ahead. Up to 4 days' data were used
for training, and 1 day's data were left for cross validation and to evaluate the prediction errors.
First, the 2-minute aggregated data were normalized based on the range of the travel time values.
The input and output data were selected as the travel time for the five previous time step values
and the travel time for the next time step value, respectively. Thus, for a 3-day data for training
will have a training matrix of size 2155 5 and a testing matrix of size 2155 1. Because the
data was grouped in 2-minute intervals, five time steps correspond to a 10-minute interval. Thus,
the prediction was based on the previous 10-minute travel time values. The model then predicts
the next 2-minute travel time as shown in the following equation:
T(k + \Delta t) = f\left( T(k - 4\Delta t), T(k - 3\Delta t), T(k - 2\Delta t), T(k - \Delta t), T(k) \right) ,        (6.18)

where,
\Delta t = the time interval (here, 2 minutes),
T = the travel time, and
k = the current time interval.
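A hedged sketch of how the training pairs implied by Equation 6.18 can be assembled from a
2-minute travel-time series follows; the function and variable names are illustrative, not taken
from the dissertation's code.

    import numpy as np

    def direct_multistep_dataset(tt, n_lags=5, horizon=1):
        # Inputs T(k-4*dt), ..., T(k); target T(k + horizon*dt), per Eq. 6.18.
        # horizon is counted in 2-minute aggregation steps (horizon=1 is the
        # 2-minute-ahead case, horizon=30 the 1-hour-ahead case).
        tt = np.asarray(tt)
        ks = np.arange(n_lags - 1, len(tt) - horizon)
        X = np.array([tt[k - n_lags + 1:k + 1] for k in ks])
        y = tt[ks + horizon]
        return X, y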
The prediction was subsequently carried out to 4 minutes, 6 minutes, etc., up to 1 hour ahead.
The training data were varied from 1 day's data to 4 days' data, and testing was done for a
separate day.
6.4.1 All Day Data from Link 1
The travel time from link 1 on all five days was analyzed first. The training was carried out
based on data from February 10 to 13, 2003 (Monday to Thursday). The data from Friday,
February 14, were kept for validation. Figure 6.8 shows the travel time distribution on all 5 days
on link 1. It can be seen that the Tuesday, February 11, 2003, data show lower magnitudes
throughout the day compared to all the other days. Also, in the February 12, 2003, data the peak
in the travel time is small compared to the other days. All the other days show similar travel time
values. The MAD, as given in Equation 5.27, was calculated between each day's data and the
Friday data. The MAD came to be 3.85, 7.85, 4.87, and 3.99 for the Monday, Tuesday,
Wednesday, and Thursday data, respectively.
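Equation 5.27 is not repeated here, but a sketch of the mean absolute deviation (MAD)
computation assumed in this day-to-day comparison is:

    import numpy as np

    def mad(day_a, day_b):
        # Mean absolute deviation between two days' travel-time series,
        # used to gauge how similar the daily patterns are.
        return np.mean(np.abs(np.asarray(day_a) - np.asarray(day_b)))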
First, the historic method, which assumes that the historic average represents the future travel
time, was used. The results obtained using a single day's data (the Monday data alone) for
prediction are shown in Figure 6.9. In this case, since only the Monday data were used, the
historic value equals the Monday travel time. It can be seen that the travel time patterns from
these two days are very similar, with an MAD of 3.85, except for the magnitude during the peak
period. The MAPE between the predicted travel time and the actual travel time was calculated
for the 24-hour period and was 9.36%.
[Plot: estimated travel time (sec) versus time of day (hh:mm:ss), with one curve each for Monday, Tuesday, Wednesday, Thursday, and Friday.]
Fig. 6. 8. Travel time distribution on link 1 on all 5 days
[Plot: travel time (sec) versus time of day (hh:mm:ss); series: prediction by historic method, actual testing data.]
Fig. 6. 9. Travel time predicted by historic method for link 1 on February 10, 2003
Figure 6.10 shows the predicted travel time using the real-time method, which assumes that the
current travel time is going to continue to the future time step using a single days data for
prediction as detailed in 2.4.1. As expected, the predicted travel time leads the actual travel time
by the 2-minute prediction interval. The corresponding MAPE for the whole 24-hour period
came to be 9.66 %.
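For clarity, the two baseline predictors can be sketched as follows (illustrative Python, assuming
equal-length daily series aligned by time of day):

    import numpy as np

    def historic_prediction(history_days):
        # Historic method: the average of past days at each time-of-day slot
        # serves as the forecast, regardless of the prediction horizon.
        return np.mean(np.asarray(history_days), axis=0)

    def real_time_prediction(tt, steps_ahead=1):
        # Real-time method: the current travel time is carried forward, so the
        # forecast simply lags the observed series by the prediction interval.
        tt = np.asarray(tt)
        return np.concatenate([np.repeat(tt[0], steps_ahead), tt[:-steps_ahead]])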
[Plot: travel time (sec) versus time of day (hh:mm:ss); series: actual testing data, prediction by real-time method.]
Fig. 6. 10. Travel time predicted by real-time method for link 1 on February 10, 2003
Figures 6.11 and 6.12 show the predicted travel time using the ANN and SVM methods with a
single day's data for training. It can be seen that the travel times predicted by both SVM and
ANN were able to follow the trends in the actual data, with MAPE values of 8.64% and 7.38%,
respectively.
[Plot: travel time (sec) versus time of day (hh:mm:ss); series: prediction by ANN, actual testing data.]
Fig. 6. 11. Travel time predicted by ANN method for link 1 on February 10, 2003
[Plot: travel time (sec) versus time of day (hh:mm:ss); series: actual testing data, prediction by SVM.]
Fig. 6. 12. Travel time predicted by SVM method for link 1 on February 10, 2003
An enlarged view of the actual travel time values and the corresponding predicted values during a 2-
hour evening peak and off-peak period, for a 2-minute-ahead prediction using all four methods, is
shown in Figure 6.13. The training data used in this particular example are from the Monday data
alone. The figure clearly illustrates that the historic method, trained on the Monday data alone,
performs very poorly in predicting the peak period. Also, in the case of the real-time method, the
predicted travel time leads the actual travel time by the 2-minute prediction interval. The SVM and
ANN followed the trends in the actual travel time. The MAPE for this 2-hour prediction was
calculated separately and was 26.74%, 15.90%, 12.18%, and 11.35% for the historic method,
real-time method, ANN, and SVM, respectively.
To illustrate the technique further, the travel time prediction was extended to 4 minutes ahead, 6
minutes ahead, etc., up to 1 hour into the future for the Friday data. The prediction was carried
out for the full 24-hour data. The performance measure used was the mean absolute percentage
error (MAPE), calculated from the difference between the travel time predicted by each of the
methods and the actual travel time of Friday for the 24-hour period.
Figure 6.14 shows the error in prediction when 1 day's data (Monday) were used for training the
network and the Friday travel time was predicted. MAPE values are shown from 2 minutes ahead
up to 1 hour ahead. The MAPE for the historic, real-time, ANN, and SVM methods are shown in
this figure. It can be seen that the historic method outperformed the real-time method throughout
the prediction horizon. SVM performed better than the historic method only up to 6 minutes
ahead, and ANN performed better up to 10 minutes ahead. Thus, the historic method
outperformed the other methods beyond 10 minutes of prediction time ahead, which can be
explained based on Figure 6.8: both the training data (Monday) and testing data (Friday) had
similar patterns, with an MAD of 3.85. It can also be observed that ANN performed better than
SVM in this case.
[Plot: travel time (secs) versus time of day from 17:00:00 to 19:00:00; series: ANN, actual testing data, real-time method, SVM, historical method.]
Fig. 6. 13. Comparison of the predicted values with 1 day training data for link 1 on February 10, 2003
[Plot: MAPE versus prediction time ahead (hh:mm:ss), from 0:02:00 to 0:58:00; series: historic, real time, ANN, SVM.]
Fig. 6. 14. MAPE for prediction using 1 days data for training
Figure 6.15 shows the MAPE values when 2 days' data were used for training (Monday and
Tuesday) and the 24-hour Friday data were predicted.
[Plot: MAPE versus prediction time ahead (hh:mm:ss); series: historic, real time, ANN, SVM.]
Fig. 6. 15. MAPE for prediction using 2 days data for training
Comparing Figure 6.14 with Figure 6.15, it is seen that there is an increase in the prediction error
of the historic method from 9.3% to 14.8% when the training data were changed from the
Monday data alone to the Monday and Tuesday data together. This is because the Tuesday travel
time data differed in magnitude from the Monday and Friday data, as illustrated in Figure 6.16,
which shows the travel time values on Monday, Tuesday, and Friday for a 5-hour period from
11:00:00 to 16:00:00. It can be seen that the Monday and Friday data have very similar trends
throughout. The MAD of 3.85 between the Monday and Friday data, as opposed to the MAD of
7.84 between the Tuesday and Friday data for the 24-hour period, also illustrates this fact.
[Plot: travel time (sec) versus time of day from 11:00:00 to 16:00:00; series: Monday, Tuesday, Friday.]
Fig. 6. 16. Travel time pattern of Monday, Tuesday, and Friday
This difference in the Tuesday data makes the training data different from the testing data,
reducing the performance of the historic method. This reduced the performance of ANN as well.
The SVM method outperformed all the other methods in this case.
Figure 6.17 shows similar results when 3 days' data were used for training (Monday, Tuesday,
and Wednesday) and the Friday data were predicted. It can be seen that with more data being
added to the training set, the effect of the Tuesday data declines. As in the previous case, the
SVM performed better than all the other methods. Up to 30 minutes of prediction ahead, the
other methods performed better than the historic method.
[Plot: MAPE versus prediction time ahead (hh:mm:ss); series: historic, real time, ANN, SVM.]
Fig. 6. 17. MAPE for prediction using 3 days data for training
Figure 6.18 shows similar results when 4 days' data were used for training (Monday, Tuesday,
Wednesday, and Thursday) and the Friday data were predicted. Here also, for up to approximately
30 minutes of prediction time, the other methods performed better than the historic method.
[Plot: MAPE versus prediction time ahead (hh:mm:ss); series: historic, real time, ANN, SVM.]
Fig. 6. 18. MAPE for prediction using 4 days data for training
As more and more data are added to the training set, the influence of the Tuesday data declines,
and this is reflected in the reduction in error for the historic and ANN methods. Comparison of
the performance of ANN with SVM for prediction using 4 days' data for training shows a slight
advantage to SVM. This performance of SVM can be explained by the inherent nature of the
SVM training process. Once SVM chooses the data points that represent the input data (the
support vectors), its performance is more or less independent of the amount of training data.
Hence, as long as the support vectors selected from the training data are unaffected, SVM's
performance may not be affected by the amount of training data. In the case of ANN, however,
the network learns more about the data as the amount of training data increases, and this
improves the results up to a point.
Overall, the performances of SVM and ANN were comparable to each other. The historic
method is a better choice when the training data and the testing data have the same magnitude as
well as the same pattern. SVM becomes a better choice for the short-term prediction of travel
time if the training data have more variation than the testing data. Also, it was found that the
amount of training data used has a greater influence on the ANN method than on the SVM
method. To check the validity of these conclusions, travel time prediction was carried out for
link 2, where the travel time on all days had similar trends.
6.4.2 All Day Data from Link 2
Data from February 10 to February 14, 2003, were used for link 2 also. The travel time from all
five days for link 2 is shown in Figure 6.19, and it is seen that all days have similar trends in the
data except Wednesday, where the peak was relatively low. The MAD between the Friday data
and the Monday, Tuesday, Wednesday, and Thursday data was 4.2, 4.2, 5.7, and 4.2,
respectively.
[Plot: travel time (sec) versus time of day (hh:mm:ss); series: Monday through Friday.]
Fig. 6. 19. Travel time distribution for link 2 from February 10 to February 14, 2003
The prediction interval was varied in this case also, from 2 minutes ahead up to 1 hour ahead, and
the MAPE at each time step was calculated as in the case of link 1. Figure 6.20 shows the MAPE
values for the prediction of the 24-hour Friday data when only 1 day's data were used for training.
As expected, the error from the historic method is very small, and the performances of ANN and
SVM are similar, with a slight advantage to ANN.
[Plot: MAPE versus prediction time ahead (hh:mm:ss); series: historic, real time, ANN, SVM.]
Fig. 6. 20. MAPE for prediction using 1 days data for training
Figure 6.21 shows the MAPE values when 2 days' data were used for training. As can be seen
from Figure 6.19, the Tuesday data also represent the Friday (testing) data very well, with an MAD
of 4.2. Hence the prediction results remain the same as in Figure 6.20, with the historic method
performing better than the other methods beyond 10 minutes of prediction ahead. ANN again
outperforms SVM in this case.
[Plot: MAPE versus prediction time ahead (hh:mm:ss); series: historic, real time, ANN, SVM.]
Fig. 6. 21. MAPE for prediction using 2 days data for training
Figure 6.22 shows the MAPE values when 3 days' data were used for training. Here it can be
seen that SVM had a slight advantage over ANN, with MAPE values smaller than those of
ANN. This may be due to the small variation of the Wednesday data from the other days.
[Plot: MAPE versus prediction time ahead (hh:mm:ss); series: historic, real time, ANN, SVM.]
Fig. 6. 22. MAPE for prediction using 3 days data for training
Figure 6.23 shows the MAPE values when 4 days' data were used for training. As expected,
when the travel time data do not have much variation, the historic and real-time methods are able
to predict the future conditions well. The performances of ANN and SVM are comparable, with
ANN being slightly better in this case where the data did not have much variation.
[Plot: MAPE versus prediction time ahead (hh:mm:ss); series: historic, real time, ANN, SVM.]
Fig. 6. 23. MAPE for prediction using 4 days data for training
It should also be noted that both ANN and SVM performed better than the real-time and historic
methods when the testing data varied from the training data, as indicated by a larger MAD
between the training and testing data. From the results obtained for the two links described
above, one can see that SVM has better predictive capability when the training data have a lot of
variation. As discussed earlier, the accuracy of the SVM prediction does not depend on the
amount of data used once the support vectors are selected. Hence, in scenarios where the
training data have variations (as in the example of link 1) and the availability of data is limited,
SVM will be a better choice than ANN. On the other hand, in cases where large amounts of data
are available and the training and testing data have similar trends, ANN is a better predictive
algorithm. Links 3 and 4 had very similar trends in the travel time values for all 5 days, similar
to link 2, and hence the results obtained are not repeated here.
207
6.4.3 Speed Prediction
The conclusions drawn from the results of travel time prediction need to be checked to find out
whether they are data specific. To ensure that the above conclusions apply to other traffic
parameters as well, an investigation of speed prediction was also carried out. Field data from
detector number 159.998, as shown in Figure 3.10, were analyzed from August 4 to 8, 2003. The
speed distribution for the 5 days is shown in Figure 6.24.
From Figure 6.24 it can be seen that the speed data for all days except Monday have similar
trends. The Monday data do not show an evening peak but do show a morning peak. The MAD
was calculated between each of the days and the Friday data and was 3.5, 2.9, 3.3, and 2.4,
respectively, for the 24-hour period. This one week's data were used for predicting the speed on
Friday. The MAPE obtained when the Monday data alone are used for training the network is
plotted in Figure 6.25.
[Plot: speed (miles/hr) versus time of day (hh:mm:ss); series: Monday through Friday.]
Fig. 6. 24. Speed distribution at 159.998 for 1 week from August 4 to 8, 2003
[Plot: MAPE versus prediction time ahead; series: historic, real time, ANN, SVM.]
Fig. 6. 25. Performance comparison using 1 days data for training
It can be seen that SVM performs better than all the other methods in this case. As discussed
previously, the training data used here were the data with the maximum difference from the
testing data. Figure 6.26 shows the MAPE when 2 days' data were used for training.
[Plot: MAPE versus prediction time ahead (hh:mm:ss); series: historic, real time, ANN, SVM.]
Fig. 6. 26. Performance comparison with 2 days training data
It can be noted that the Tuesday data were in agreement with the Friday data, and hence the effect
of the variation declines. The results obtained with 3 days' data for training are shown in Figure
6.27.
[Plot: MAPE versus prediction time ahead (hh:mm:ss); series: historic, real time, ANN, SVM.]
Fig. 6. 27. Performance comparison with 3 days data for training
Figure 6.27 shows that ANN started performing better than all the other methods as the quality
and quantity of the training data increased. The errors of the historic method as well as the ANN
method decline as more data are included in the training set, which reduces the variation from
the testing data. The results obtained with 4 days' data for training are shown in Figure 6.28.
[Plot: MAPE versus prediction time ahead (hh:mm:ss); series: historic, real time, ANN, SVM.]
Fig. 6. 28. Performance comparison with 4 days data for training
The above figures (Figures 6.25 to 6.28) confirm the conclusions drawn from the travel time
prediction. The results confirmed that ANN and SVM are powerful tools, with performance
better than the real-time or historic methods under varying traffic conditions. Also, it was found
that SVM is a powerful tool for the prediction of traffic parameters, with performance
comparable to ANN in most situations. When the training data were non-representative of the
test data, SVM outperformed ANN, showing that it can be considered a viable alternative to
ANN in situations with a smaller quantity of quality training data.
6.5 CONCLUDING REMARKS
This chapter presented a comparison of the performance of two machine-learning techniques,
namely ANN and SVM, for the short-term prediction of travel time. The ANN model used was a
multi-layer feed forward neural network, and the SVM model used was support vector
regression with a radial basis kernel function. The analysis considered forecasts ranging from 2
minutes ahead up to 1 hour into the future. One day's data were left for cross-validation to
evaluate the prediction errors. The training data were varied from 1 day's data to 4 days' data.
The results were compared with those of the historic and real-time approaches.
The results of this comparison indicate that the explanatory power of SVR is comparable to that
of ANN. Also, SVR performed better than ANN when the training data had more variation. To
check whether the results were data specific, speed predictions were also carried out using the
field data. Based on the investigation conducted in this dissertation, it was found that SVR is a
viable alternative to ANN for short-term prediction, especially when the training data have
variations and the amount of training data is limited. The performance of ANN depends largely
on the amount of data available for training the network. Thus, in situations with limited data,
there is a need for an alternative method for prediction. Due to the characteristic nature of the
SVM method, the performance of SVM is almost independent of the amount of data available
once the support vectors are chosen. Hence, SVM can be used for the prediction of traffic
parameters when the amount of data available for training is limited. In cases where ample
training data are available, the performance of SVM is comparable to ANN, making it an
alternative option for prediction problems.
Comparison of the overall performance showed ANN and SVM outperforming the traditional
methods, namely the real-time and historic methods, especially under varying traffic flow
conditions. However, for long-range predictions, the use of historic data proved to be more
useful. The study also showed that current traffic conditions are good predictors when traffic
conditions do not vary much. The ANN and SVM methods performed well for some range into
the future. Also, both methods have good dynamic response and show better performance
compared to the traditional models. The training time of both SVM and ANN may not make
them attractive for online applications. However, both can be trained offline and then used for
online prediction. Once the networks are trained and the network parameters are stored offline,
the system can be used for online applications, where the travel time corresponding to the
incoming data needs to be predicted quickly.
As discussed earlier in this chapter, to the knowledge of this author there have been very few
studies that explored the use of SVM in transportation applications, and there have been none
that used SVM for the prediction of traffic variables. A lot more work is needed to exploit the
explanatory power of this tool to the fullest. Also, more work is needed to explore the effect of
each of the different SVM parameters, such as the kernel function and the cost function, on the
prediction performance.
As explained already, the present study used the SVM toolbox developed by Steve Gunn (2003)
for MATLAB. The running time required by this toolbox was relatively high, taking two hours
for training on 1 day's data and up to 4 days for training on a 4-day data set. The running time of
the corresponding ANN model was on the order of 5 to 10 minutes. This may be because the
SVM toolbox does not use the best available optimization technique. The performance of this
toolbox has not been standardized or optimized within MATLAB, and clearly optimization
techniques similar to those used by the ANN toolbox are needed to increase the computational
efficiency in the MATLAB environment. On the other hand, the ANN toolbox used in this
dissertation is developed and distributed as part of the MATLAB package, which is standardized
and optimized for fast performance. Being a newer technique, SVM is yet to be explored fully to
obtain the best performance in terms of training time. Since the aim of this work was to
investigate the potential of SVM for the prediction of travel time, these issues are clearly outside
the scope of this dissertation and hence were not considered.
CHAPTER VII
SUMMARY AND CONCLUSIONS
7.1 SUMMARY
The problem statement of this dissertation identified three main needs: 1) the need to perform
data quality control of loop detector data at system level by analyzing the detectors as a series; 2)
the need to estimate travel time from loop detector data under varying traffic flow conditions;
and 3) the need to predict travel time to future time steps in an accurate way. A summary of how
each of these problems is addressed in this dissertation, along with the conclusions reached and
recommendations for further research, is provided in the following subsections.
Overall, this dissertation developed a comprehensive automated technique, comprising
different techniques at each individual stage, to predict travel time from the ILD data collected
from the field. The first step in this multi-step analysis was to carry out quality control of the
ILD data. Since in this dissertation the detectors were analyzed as a series, in addition to the
usual tests for checking data discrepancies, quality control tests using constraints based on the
conservation of vehicles were also carried out. A non-linear constrained optimization technique
was adopted for correcting the discrepancy whenever there was a violation of the conservation
of vehicles. After correcting the discrepancies, the data were used for the estimation of travel
time. A methodology based on traffic flow theory was developed for the estimation of travel
time from the ILD data. Finally, the travel time was predicted to future time steps using two
techniques, support vector machines and artificial neural networks. Each of these steps is briefly
detailed below.
7.1.1 Data Reduction and Quality Control
Traditionally, gross errors in loop detector data are identified using threshold checks on the
speed, volume, or occupancy observations, either individually or in combination. All of these
tests analyze and correct data at individual locations and therefore cannot account for systematic
problems over a series of detectors. While substantial failures in loop detector data are easily
identified using these existing methodologies, some other failures, such as biases in volume
counts, may go unnoticed; these can be identified if the detectors are analyzed as a series. Also,
for an application like the estimation of travel time on a link, as in the present study, data from
consecutive ILDs need to be considered. In such cases, when the detectors are analyzed as a
series, it is necessary to check the accuracy of the data based on the conservation of vehicles, in
addition to the individual location checks, since this is a basic condition that the data as a series
must follow.
Even though violation of the conservation-of-vehicles principle is a common problem with
detector data, this requirement has received little attention. Common applications of ILD data,
such as incident detection, may not be affected by this type of error, which may be a reason for
ignoring these errors in earlier studies. However, if loop detector data are to be successfully
used for applications such as O-D estimation or travel time estimation, these issues
of system data quality need to be addressed. As discussed in the literature review in Chapter II,
very few studies have been reported which systematically analyzed a series of detector locations
for a long interval of time to check whether the collected data follow the conservation of
vehicles. Most of those studies, when faced with a violation of conservation of vehicles,
suggested applying adjustment factors to rectify it, rather than applying any systematic
methodology.
In this dissertation, the conservation of vehicles is checked by comparing the cumulative flow
curves from consecutive detector stations. One week's loop detector data, from February 10 to
14, 2003, from the I-35 freeway in San Antonio, was used. Systematic examination of the data
revealed that the conservation of vehicles principle was violated on many days. This may be due
to systematic errors such as some detectors under- or over-counting the vehicles.
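As a concrete illustration, the check reduces to comparing cumulative counts. The following minimal MATLAB sketch (variable names and count values are hypothetical, not taken from the dissertation code) flags a violation when the implied number of vehicles stored on a ramp-free link becomes negative:

    % Conservation-of-vehicles check from cumulative flow curves.
    % Assumes a link with no ramps that is initially empty, and that
    % qin and qout hold 20-second vehicle counts at the upstream and
    % downstream detectors of the link.
    qin  = [30 32 28 35 31 29];   % upstream counts (hypothetical)
    qout = [29 31 27 40 30 28];   % downstream counts (hypothetical)
    Qin  = cumsum(qin);           % cumulative arrival curve
    Qout = cumsum(qout);          % cumulative departure curve
    stored = Qin - Qout;          % vehicles implied to be on the link
    if any(stored < 0)            % more vehicles exited than entered
        disp('Conservation of vehicles violated');
    end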
This dissertation used a constrained non-linear optimization approach for systematically
identifying and correcting loop detector data obtained from the field in situations where the data
violated the conservation of vehicles principle. The generalized reduced gradient method was
adopted, with the objective function and constraints selected in such a way that the result
follows the conservation of vehicles principle with the least change to the original data. The
objective function was chosen to minimize the error from the violation of the conservation of
vehicles principle, and the constraints were selected to keep the difference between the entry
and exit observations within the allowable maximum. Simulated data from the CORSIM
simulation software were used for validating the methodology. This method of correcting
loop detector data is more useful and convenient than the application of volume adjustment
factors when dealing with large amounts of data over long durations with large discrepancies.
The optimization technique also proved to be very useful for imputing missing data as well as
for prioritizing the detector stations for maintenance, as illustrated in Chapter IV. This
dissertation represents the first application of this kind of optimization technique for quality
control of freeway ILD data.
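The full formulation is not reproduced in this summary, but one plausible sketch of the idea in MATLAB is shown below. It uses fmincon from the Optimization Toolbox, a general constrained nonlinear solver, in place of the GRG code used in the dissertation, together with a simple least-change objective; the count values are hypothetical:

    % Sketch: adjust counts as little as possible while forcing the
    % corrected data to satisfy conservation of vehicles. Decision
    % vector x stacks the corrected upstream and downstream counts.
    qin  = [30 32 28 35 31 29]';
    qout = [29 31 27 40 30 28]';
    n    = numel(qin);
    obj  = @(x) sum((x - [qin; qout]).^2);   % least change to the data
    % cumsum(x_out) <= cumsum(x_in) at every interval, written as a
    % linear inequality A*x <= b using lower-triangular summation.
    A    = [-tril(ones(n)), tril(ones(n))];
    b    = zeros(n, 1);
    lb   = zeros(2*n, 1);                    % counts stay non-negative
    x    = fmincon(obj, [qin; qout], A, b, [], [], lb, []);
    qinC = x(1:n);  qoutC = x(n+1:end);      % corrected count series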
7.1.2 Estimation of Travel Time
The ILD data corrected using the optimization procedure can be used as input for the next stage,
which is the estimation of travel time. Different methods are available for the estimation of
travel time from loop detector data, the most popular among them being the extrapolation of
point speed values. However, the accuracy of the speed-based methods declines as flow
increases. The other available methods are statistical and traffic flow theory based models, the
majority of which were developed for either the free-flow condition or the congested-flow
condition.
This dissertation presented several modifications to an existing traffic flow theory based model
for travel time estimation on freeways, such that the model can estimate travel time under varying
traffic flow conditions directly from the loop detector data. The approach was designed for
analyzing ILD data over long intervals of time under varying traffic flow conditions. The inputs
include speed, flow, and occupancy obtained from the field, and the travel time estimation is
based on the area between the cumulative flow curves at entry and exit. Simulated data from the
CORSIM simulation software were used for validating the results. After the validation, the
model was used for estimating travel time from field data. The estimated travel time was
compared with the AVI data collected from the field. The model results were also compared
with the results obtained from other available methods, such as the extrapolation method. The
results showed that the developed model performed better under varying traffic flow conditions.
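Under a first-in-first-out assumption, the area-based estimate reduces to a few lines. The sketch below (hypothetical counts; the dissertation's full model also uses speed and occupancy) computes the average travel time over an interval as the area between the two cumulative curves divided by the number of vehicles served:

    % Average link travel time from the area between cumulative curves.
    % Assumes an initially empty, first-in-first-out link and
    % 20-second counts (illustrative values only).
    dt    = 20;                            % seconds per observation
    Qin   = cumsum([30 32 28 35 31 29]);   % cumulative entries
    Qout  = cumsum([28 30 27 33 32 30]);   % cumulative exits
    area  = sum(Qin - Qout) * dt;          % vehicle-seconds spent on link
    nVeh  = Qout(end);                     % vehicles that left the link
    avgTT = area / nVeh;                   % average travel time (seconds)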
7.1.3 Prediction of Travel Time
Real-time information on current travel time can be useful to drivers in making their route
decisions if the traffic conditions are stable without much fluctuation. However, fluctuations in
traffic can result in a substantial difference between the current link travel time and the travel
time on the link when it is traversed a short time later. Hence, accurate predictions are more
beneficial than current travel time information.
The present work introduced the application of a recently developed pattern classification and
regression technique called support vector machines (SVM) for travel time prediction. An
artificial neural network (ANN) method was also developed in this dissertation for comparison.
The work also aimed to compare and contrast the performance of the SVM, ANN, historic, and
real-time methods. Up to four days' data were used for training the networks, and one day's data
was reserved for cross-validation. The data used were the estimated travel times obtained from
the model described in the previous section.
The ANN model used was a multi-layer feed-forward neural network, and the SVM model used
was support vector regression (SVR) with a radial basis kernel function. The analysis considered
forecasts ranging from 2 minutes ahead up to 1 hour into the future. The training data were
varied from one day's data to four days' data. The results were compared with the historic and
real-time approaches.
Results of this comparison indicated that the explanatory power of SVR is comparable to that of
ANN. Also, SVR performed better than ANN when the training data had more variation. To
check whether the results were data specific, speed predictions were also carried out using the
field data. Based on the investigation conducted in this dissertation, it was found that SVR is a
viable alternative to ANN for short-term prediction, especially when the training data are not a
good representative sample and when the amount of training data is small. In cases where
enough training data were available, the performance of SVM was comparable to that of ANN.
Overall, it was found that SVR is a good alternative for the prediction of traffic variables such
as travel time.
The study also showed that current traffic conditions are good predictors for short horizons,
while long-range predictions need the use of historical data. The ANN and SVM methods
performed well over a range of prediction horizons. Both of these methods also have good
dynamic response and show better performance than the traditional models. The training
requirements of SVM and ANN may make them unattractive for on-line applications; however,
both can be trained off-line and then used for on-line prediction. Once the networks are trained
and the network parameters are stored, the system can be used for on-line applications, where
the travel time corresponding to the incoming data needs to be predicted quickly.
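A minimal sketch of this off-line/on-line split, reusing the SVR model mdl from the earlier sketch (file and variable names are hypothetical):

    % Off-line: train once and store the fitted model to disk.
    save('ttModel.mat', 'mdl');
    % On-line: restore the stored model and predict from the latest
    % lagged observations as each new 20-second record arrives.
    s      = load('ttModel.mat');
    newObs = [61.2, 63.5, 62.8];          % latest three travel times (s)
    ttNext = predict(s.mdl, newObs);      % fast on-line prediction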
As discussed in Chapter VI, to the knowledge of this author there have been very few studies
that explored the use of SVM in transportation applications, and none that used SVM for the
prediction of traffic variables. Thus, this dissertation is the first attempt to use the SVM
technique for the prediction of vehicle travel time.
7.2 CONCLUSIONS
This dissertation resulted in a number of conclusions, which are listed as follows:
• There are unidentified discrepancies in the ILD data even after the usual error checking
algorithms are applied. The required level of data quality control depends on the particular
application for which the data are used; for an application such as travel time estimation, more
rigorous quality control is required than the usual error checking methods provide.
• The majority of the ILD data collected from the field violate the conservation of vehicles
principle when analyzed as a series over a long period. Thus, if the ILD data are used at a
system level, where the data from one detector are compared with those of its neighboring
detectors, checks should be conducted for conservation of vehicles.
• The non-linear optimization technique adopted, namely the generalized reduced gradient
method, was found to be a suitable technique for removing the discrepancies in the ILD
data when the conservation of vehicles principle is violated.
• Systematic correction of ILD data, such as with the optimization method, is more useful
and convenient than the application of volume adjustment factors when dealing with
large amounts of data over long durations with large discrepancies.
• The proposed generalized reduced gradient method also proved to be very useful for
imputing missing data as well as for prioritizing the detector stations for maintenance.
• The travel time estimation model proposed in this dissertation estimated the travel time
with considerable accuracy under varying traffic flow conditions. The model was first
validated using simulated data from CORSIM, and the estimated travel time was found
to be in good agreement with the actual travel time from simulation under both congested
and uncongested flow conditions. The travel time estimated from ILD data was compared
with AVI data, and the performance was found to be very satisfactory. Thus, the
developed theoretical model is a promising method for estimating travel time from loop
detector data under varying traffic flow conditions.
• A comparison of the developed travel time estimation model with the extrapolation
method, which is the current field method, showed that the relative accuracy of the
developed model increased with increasing flow. The biggest differences in performance
were observed during transition and congested conditions, which is not unexpected
because these conditions are more difficult to model. In contrast, both methods gave
similar results under uncongested conditions.
• Support vector regression is a promising tool for the short-term prediction of travel
time, with performance comparable to that of ANN when the traffic condition is stable.
SVR performed better than ANN when the training data had more variation and the
amount of training data was relatively small.
• Both the ANN and SVM methods have good dynamic response and showed better
performance than the traditional models.
7.3 FUTURE RESEARCH
• The optimization method for data quality control used in this dissertation analyzed up to
five detectors in series. Future studies can check its performance on longer sections with
more detectors in series. Also, the optimization was carried out with an objective function
and constraints that ensure the conservation of vehicles principle is not violated. It is
hypothesized that a more rigorous objective function incorporating additional constraints
may give better results and remove more of the discrepancies in the data. Also, in this
dissertation the optimization was validated using simulated data. Future work can be
performed along a section where ground truth flow data can be collected at the same
locations as the detectors. This would provide the added benefit of a direct comparison of
the performance of the optimization method using field data.
• Travel time estimation from loop detector data is an important component for the successful
use of ATIS, as discussed in this dissertation. A model based on traffic flow theory was used
to obtain the travel time from the field data corrected in the first step of the research. The
validation of the model was conducted mainly using simulated data from CORSIM.
Validation using field data made use of AVI data; however, the sample size of the AVI data
was very small. Similar future work should be performed along a section from which more
ground truth travel time data can be collected at the same locations as the detector points.
This would provide a direct comparison of the performance of the theoretical model used in
this dissertation. Also, the present study aggregated the data from all lanes of the road at a
detector location and treated them as a single lane. Future work is needed in which the
analysis is carried out at the lane-by-lane level; this requires the development of a model that
can also take lane-changing behavior into account.
• More research concerning the use of SVM for travel time prediction is necessary, especially
when one is interested in obtaining the best performance in terms of training time. For
example, the effect of each of the SVM parameters, such as the kernel function and the cost
function, on prediction performance should be explored. Also, a more computationally
efficient and standardized toolbox is needed for fast and optimum performance.
REFERENCES
Abadie, J. (1970), Application of the GRG algorithm to optimal control problems. Integer
and nonlinear programming, J. Abadie, ed., North-Holland Publishing Company, Amsterdam, The Netherlands, 191-211.
Abadie, J. (1978), The GRG method for nonlinear programming. in Design and
implementation of optimization software, H. J. Greenberg, ed., Sijthoff and Noordhoff,
Leyden, The Netherlands, 335-362.
Abadie, J., and Carpenter, J. (1969), Generalization of the Wolfe reduced gradient method to
the case of nonlinear constraints. in Optimization-symposium of the institute of
mathematics and its applications, R. Fletcher, ed., University of Keele, Academic Press, U.
K., 37-47.
Abdulhai, B., and Tabib, S. M. (2003), Spatio-temporal inductance pattern recognition for
vehicle re-identification. Transp. Res. C, 11, 223-239.
Al-Deek, H. M. (1998), Travel time prediction with non-linear time series. Proc. of the 5th Int. Conf. on Applications of Advanced Technologies in Transportation Engineering (AATT-5), ASCE, Newport Beach, California, 317-324.
Ametha, J. (2001), Development and implementation of algorithms used in ground and internet
traffic monitoring. Master's thesis, Department of Mechanical Engineering, Texas A&M
University, College Station, Texas.
Anderson, J. M., Bell, M. G. H., Sayers, T. M., Busch, F. M., and Heymann, G. (1994), The
short term prediction of link travel times in signal controlled road networks. Transp.
systems: Theory and Application of Advanced Technology, IFAC symposium, Tianjin,
PRC, 621-626.
Beale, R., and Jackson, T. (1990), Neural computing: An introduction. IOP Publishing Ltd.,
Bristol, U.K.
Bellamy, P. H. (1979), Undercounting of vehicles with single-loop-detector systems. TRRL
Supplementary Rep. 473, Transport and Road Research Laboratory, Berkshire, U. K.
Bellemans, T., Schutter, B. D., and Moor, B. D. (2000), On data acquisition, modeling, and
simulation of highway traffic. Proc. of the 9th IFAC Control in Transp. Systems, Vol. 1, E. Schnieder and U. Becker, eds., Braunschweig, Germany, 22-27.
Bender, J., and Nihan, L. (1988), Inductive loop detector failure identification: A state of the
art review. Final Interim Rep., Research Project GC8286, Task 24, Item no. 204 C,
Washington State Transp. Center, Washington.
Berka, S., and Lall, K. B. (1998), New perspectives for ATMS: advanced technologies in
traffic detection. J. of Transp. Engineering, ASCE, 1, 9-15.
Bikowitz, E. W., and Ross, S. P. (1985), Evaluation and improvement of inductive loop traffic
detectors. Transp. Res. Rec. 1010, Transportation Research Board, Washington, D. C., 76-
80.
Blanz, V., Scholkopf, B., Bulthoff, H., Burges, C., Vapnik, V., and Vetter, T. (1996),
Comparison of view-based object recognition algorithms using realistic 3D models. In
Artificial Neural Networks - ICANN'96, Berlin, Springer Lecture Notes in Computer
Science, 1112, 251-256.
Blue, V., List, G. F., and Embrechts, M. J. (1994), Neural net freeway travel time estimation.
in Intelligent Engineering Systems through Artificial Neural Networks, C.H. Dagli, B.R.
Fernandez, J. Ghosh, and R. T. S. Kumara, eds., Vol. 4, ASME Press, New York, 1135-1140.
Bovy, P. H. L., and Thijs, R. (2000), Estimators of travel time for road networks: New
developments, evaluation results, and applications. Delft University Press, The
Netherlands.
Boyce, D., Rouphail, N., and Kirson, A. (1993), Estimation and measurement of link travel
times in the ADVANCE project. in Proc. of the Vehicle Navigation and Information
Systems Conf., IEEE, New York, 62-66.
Brydia, R. E., Turner, S. M., Eisele, W. L., and Liu, J. C. (1998), Development of intelligent
transportation system data management. Transp. Res. Rec. 1625, Transportation Research
Board, Washington, D.C., 124-130.
Burges, C. J. C. (1998), A tutorial on support vector machines for pattern recognition. Kluwer
Academic Publishers, Boston.
Burges, C. J. C., and Scholkopf, B. (1997), Improving the accuracy and speed of support
vector learning machines. in Advances in Neural Information Processing Systems, Vol. 9,
MIT Press, Cambridge, Massachusetts, 375-381.
Campbell, C. (2002), Kernel methods: A survey of current techniques. Neurocomputing, 48,
63-84.
Cassidy, M. J. (1998), Bivariate relations in nearly stationary highway traffic. Transp.
Research B, 32, 49-59.
Chen, L., and May, A. D. (1987), Traffic detector errors and diagnostics. Transp. Res. Rec.
1132, Transportation Research Board, Washington, D.C., 82-93.
Chen, L., Kwon, J., Rice, J., Skabardonis, A., and Varaiya, P. (2003), Detecting errors and
imputing missing data for single-loop surveillance systems. Presented at the TRB 82nd Annual Meeting (CD-ROM), Transportation Research Board, Washington D.C.
Chen, M. and Chien, S. I. J. (2001), Dynamic freeway travel time prediction using probe
vehicle data: Link based vs. path based. Presented at the 80th Annual Meeting (CD-ROM), Transportation Research Board, Washington D.C.
Cherrett, T. J., Bell, H. A., and Mc Donald, M. (1996), The use of SCOOT type single-loop
detectors to measure speed, journey time and queue status on non-SCOOT controlled links. 8th Int. Conf. on Road Traffic Monitoring and Control, 23-25.
Chien, S., Liu, X., and Ozbay, K. (2003), Predicting travel times for the South Jersey real-time
motorist information system. Presented at the 82nd Annual Meeting (CD-ROM),
Transportation Research Board, Washington D.C.
Chien, S. I. J., and Kuchipudi, C. M. (2002), Dynamic travel time prediction with real-time and
historical data, Presented at the 81st TRB Annual Meeting (CD-ROM), Transportation
Research Board Washington D. C.
Cleghorn, D., Hall, F. L., and Garbuio, D. (1991), Improved data screening techniques for
freeway traffic management systems. Transp. Res. Rec. 1320, Transportation Research
Board, Washington, D.C., 17-23.
Coifman, B. (1998), Vehicle re-identification and travel time measurement in real-time on
freeways using existing loop detector infrastructure. Transp. Res. Rec. 1643,
Transportation Research Board, Washington, D.C., 181-191.
Coifman, B. (1999), Using dual-loop speed traps to identify detector errors. Transp. Res. Rec.
1683, Transportation Research Board, Washington, D.C., 47-58.
Coifman, B. (2001), Improved Velocity Estimation Using Single-loop Detectors. Transp.
Research A, 35(10), 863-880.
Coifman, B. (2002), Estimating travel times and vehicle trajectories on freeways using dual-
loop detectors. Transp. Research A, 36, 351-364.
Coifman, B., and Cassidy, M. (2002), Vehicle re-identification and travel time measurement
on congested freeways. Transp. Research A, 36, 899-917.
Coifman, B., and Dhoorjaty, S. (2002), Event data based traffic detector validation tests.
Presented at the TRB 81st Annual Meeting (CD-ROM), Transportation Research Board, Washington D.C.
CORSIM User's Guide (Software Help Menu) (2001), FHWA, U.S. Department of
Transportation, Washington, D.C.
Cortes, C., and Vapnik, V. (1995), Support vector networks. Machine Learning, 20, 273-297.
Cortes, C. E., Lavanya, R., Oh, J. S., and Jayakrishnan, R. (2002), A general purpose
methodology for link travel time estimation using multiple point detection of traffic.
Presented at the 81st Annual Meeting (CD-ROM), Transportation Research Board, Washington D.C.
Courage, K. G., Bauer, C. S., and Ross, D. W. (1976), Operating parameters for main line
sensors in freeway surveillance systems. Transp. Res. Rec. 601, Transportation Research
Board, Washington, D.C., 19-26.
Cristianini, N. and Shawe-Taylor, J. (2000), An introduction to support vector machines and
other kernel based learning methods, Cambridge University Press, Cambridge, New York.
Cybenko, G. (1989), Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2, 303-314.
D'Angelo, M. P., Al-Deek, H. M., and Wang, M. C. (1998), Travel time prediction for freeway
corridors. Transp. Res. Rec. 1676, Transportation Research Board, Washington, D.C.,
184-191.
Daganzo, C. (1997), Fundamentals of transportation and traffic operations. Pergamon-Elsevier,
Oxford, U.K.
Dia, H. (2001), An object oriented neural network approach to short term traffic forecasting.
European J. of Operational Res., 131, 253-261.
Dailey, D. J. (1993), Travel time estimation using cross-correlation techniques. Transp. Res.
B, 27(2), 97-107.
Dailey, D. J. (1997), Travel time estimates using a series of single-loop volume and occupancy
measurements. Presented at the 76th Annual Meeting (CD-ROM), Transportation
Research Board, Washington D.C.
Dayhoff, J. E. (1990), Neural network architectures: An introduction. Van Nostrand Reinhold,
New York.
Desa, J. P. M. (2001), Pattern recognition, concepts, methods and applications. Springer, New
York.
Dharia, A., and Adeli, H. (2003), Neural network model for rapid forecasting of freeway link
travel time. Engineering Applications of Artificial Intelligence, 16(7-8), 607-613.
Dhulipala, S. (2002), A system for travel time estimation on urban freeways. Master's thesis,
Department of Civil and Environmental Engineering, Virginia Polytechnic Institute and
State University, Blacksburg, Virginia.
Ding, A., Zhao, X., and Jiao, L. (2002), Traffic flow time series prediction based on statistics
learning theory. IEEE 5th Int. Conf. on Intelligent Transp. Systems, Singapore, 727-730.
Dougherty, M. (1995), A review of neural networks applied to transport. Transp. Res. C, 3(4),
247-260.
Dougherty, M. S., and Cobbett, M. R. (1997), Short term inter urban traffic forecasts using
neural networks. Int. J. of Forecasting, 13, 21-31.
Dougherty, M. S., Kirby, H. R., and Boyle, R. D. (1994), Using neural networks to recognize,
predict and model traffic. in Artificial intelligence applications to traffic engineering, M.
Bielli, G. Ambrosino and M. Boero, eds., Utrecht, The Netherlands, 233-250.
Drew, D. R. (1968), Traffic flow theory and control. McGraw Hill series in Transp., McGraw
Hill Book Company, New York.
Dudek, C. L., Messer, C. J., and Dutt, A. K. (1974), Study of detector reliability for a motorist
information system on the gulf freeway. Transp. Res. Rec. 495, Transportation Research
Board, Washington, D.C., 35-43.
Eisele, W. L. (2001), Estimating travel time mean and variance using intelligent transportation
systems data for real-time and off-line transportation applications. Doctoral dissertation,
Department of Civil Engineering, Texas A&M University, College Station, Texas.
Eiselt, H. A., Pederzoli, G., and Sandblom, C. L. (1987), Continuous optimization models.
Walter De Gruyter Inc., Berlin, Germany.
Ezforecaster. (2003). <http://www.ezforecaster.com/bestfit.htm> (Dec. 19, 2003).
Faghri, A., and Hua, J. (1992), Evaluation of artificial neural network applications in
transportation engineering. Transp. Res. Rec. 1358, Transportation Research Board,
Washington, D.C., 71-80.
Faouzi, N. E., and Lesort, J. B. (1995), Travel time estimation on urban networks from traffic
data and on-board trip characteristics. Proc. of the 2nd World Congress on Intelligent Transp. Systems, Yokohama, Japan.
Fenton, R. E. (1980), On future traffic control: advanced systems hardware. IEEE
Transactions on Vehicular Technology, VT-29, 200-207.
Ferrier, P. J. (1999), Comparison of vehicle travel times and measurement techniques along the
I-35 corridor in San Antonio, Texas. Master's thesis, Department of Civil Engineering,
Texas A&M University, College Station, Texas.
Gabriele, G. A., and Beltracchi, T. J. (1987), Resolving degeneracy in the generalized reduced
gradient method. J. of Mechanics, Transmissions and Automation in Design, 109(2), 263-
267.
Gabriele, G. A., and Ragsdell, K. M. (1977), The generalized reduced gradient method: A
reliable tool for optimal design. J. of Engineering for Industry: Transactions of the ASME,
99, 394-400.
Gabriele, G. A., and Ragsdell, K. M. (1980), Large-scale non-linear programming using the
generalized reduced gradient method. Transactions of the ASME, 102(3), 566-573.
Gibson, D., Mills, M. K., and Rekenthaler, D. (1998), Staying in the loop: The search for
improved reliability of traffic sensing systems through smart test instruments. Public
Roads, 62(2), <http://www.tfhrc.gov/pubrds/septoct98/loop.htm> (Feb. 12, 2004).
Gold, D. L., Turner, S. M., Gajewski, B. J., and Spiegelman, C. (2001), Imputing missing
values in ITS data archives for intervals under 5 minutes. Presented at the 80th Annual Meeting (CD-ROM), Transportation Research Board, Washington D.C.
Gunn, S. R. (2003), Support vector machines for classification and regression.
<http://www.ecs.soton.ac.uk/~srg/publications/pdf/SVM.pdf> (Nov. 23, 2003).
Gupta, S. (1999), A new algorithm for detecting erroneous loop detector data. Master's thesis, Department of Mechanical Engineering, Texas A&M University, College Station, Texas.
Hauslen, R. A. (1977), The promise of automatic vehicle identification. IEEE Transactions
on Vehicular Technology, VT-26, 30-38.
Haykin, S. S. (1994), Neural networks: A comprehensive foundation. Prentice Hall, N.J.
Hearst, M. A. (1998), Trends and controversies: Support vector machines, IEEE Intelligent
Systems, 13(4), 18-28.
Highway Capacity Manual, (2000), Transportation Research Board, Washington, D.C.
Himmelblau, D. M. (1972), Applied Nonlinear Programming. McGraw-Hill, New York.
Hoffman, C., and Janko, J. (1990), Travel time as a basis of the LISB guidance strategy, in
Proc. of IEEE Road Traffic Control Conf., IEEE, New York, 6-10.
Hoogendoorn, S. P. (2000), Model-based multiclass travel time estimation. Presented at the 79th Annual Meeting (CD-ROM), Transportation Research Board, Washington D.C.
Hornik, K., Stinchcombe, M., and White, H. (1989), Multilayer feed forward networks are
universal approximators. Neural Networks, 2, 359-366.
Huang, S. H., and Ran, B. (2003), An application of neural network on traffic speed prediction
under adverse weather condition. Presented at the 82nd TRB Annual Meeting (CD-ROM), Transportation Research Board, Washington D.C.
Huisken, G., and van Berkum, E. (2002), Short-term travel time prediction using data from
induction loops. 9th World Congress on Intelligent Transport Systems (CD-ROM),
Chicago, Illinois.
Innama, S. (2001), Short term prediction of highway travel time using MLP neural networks.
8th World Congress on Intelligent Transp. Systems, Sydney, Australia, 1-12.
Ishak, S., and Al-Deek, H. (2002), Performance evaluation of short term time series traffic
prediction model. J. of Transp. Engineering, ASCE, 128(6), 490-498.
Ishak, S., Kotha, P., and Alecsandru, C. (2003), Optimization of dynamic neural networks
performance for short term traffic prediction. Presented at the 82nd TRB Annual Meeting (CD-ROM), Transportation Research Board, Washington D.C.
Iwasaki, M., and Shirao, K. (1996), A short term prediction of traffic fluctuations using pseudo
traffic patterns. 3rd World Congress on Intelligent Transport Systems Conference (CD-ROM), Orlando, Florida.
Jack, L. B., and Nandi, A. K. (2002), Fault detection using support vector machines and
artificial neural networks augmented by genetic algorithms. Mechanical Systems and
Signal Processing, 16(2-3), 373-390.
Jacobson, L. N., Nihan, N. L., and Bender, J. D. (1990), Detecting erroneous loop detector data
in a freeway traffic management system. Transp. Res. Rec. 1287, Transportation Research
Board, Washington, D.C., 151-166.
Jeffrey, D. J., Russam, K., and Robertson, D. I. (1987), Electronic route guidance by
AUTOGUIDE: The research background. Traffic Engineering and Control, 28(10), 525-
529.
Joachims, T. (1997), Text categorization with support vector machines. Technical Rep., LS
VIII no. 23, University of Dortmund, Dortmund, Germany.
Kaysi, I., Ben-Akiva, M., and Koutsopoulos, H. (1993), An integrated approach to vehicle
routing and congestion prediction for real-time driver guidance. Transp. Res. Rec. 1408,
Transportation Research Board, Washington, D.C., 66-74.
Kecman, V. (2001), Learning and soft computing: Support vector machines, neural networks,
and fuzzy logic models, The MIT Press, Cambridge, Massachusetts.
Kikuchi, S. (2000), A method to defuzzify the fuzzy numbers: Transportation problem
application. Fuzzy Sets and Systems, 116, 3-9.
Kikuchi, S., and Miljkovic, D. (1999), A method to pre-process traffic data: Application of
fuzzy optimization concept. Presented at the 78th Annual Meeting (CD-ROM), Transportation Research Board, Washington, D.C.
Kikuchi, S., Miljkovic, D., and Van Zuylen, H. J. (2000), Examination of methods that adjust
observed traffic volumes on a network. Transp. Res. Rec. 1717, Transportation Research
Board, Washington, D. C., 109-119.
Kisgyorgy, L., and Rilett, L. R. (2002), Travel time prediction by advanced neural network.
Periodica Polytechnica Series in Civil Engineering, 46(1), 15-32.
Klein, L. A. (2001), Sensor technologies and data requirements for ITS. Artech House, Boston,
London.
Kreer, J. B. (1975), A comparison of predictor algorithms for computerized control. Traffic
Engineering, 45 (4), 51-56.
Kuchipudi, C. M., and Chien, S. I. J. (2003), Development of a hybrid model for dynamic
travel time prediction. Presented at the 82nd Annual Meeting (CD-ROM), Transportation
Research Board, Washington D.C.
Kuhne, R., and Michalopoulos, P. (2003), Continuum flow models, Traffic flow theory, in
The Revised Monograph on Traffic Flow Theory, N. Gartner, C. J. Messer, and A. K.
Rathi, eds. <http://www.tfhrc.gov/its/tft/tft.htm> (May 21, 2003).
Kuhne, R. D., Palen, J., Gardner, C., and Ritchie, S. (1997), Loop based travel time
measurement: Fast incident detection using traditional loops. Traffic Technology Int.
Annual, 157-161.
Kwon, J., Coifman, B., and Bickel, P. (2000), Day-to-day travel time trends and travel time
prediction from loop detector data. Transp. Res. Rec. 1717, Transportation Research
Board, Washington, D.C., 120-129.
Labell, L. N., Spencer, M., Skabardonis, A., and May, A. D. (1989), Detectors for freeway
surveillance and control. Working Paper UCB-ITS-WP-89-1, University of California,
Berkeley.
Lasdon, L. S., and Waren, A. D. (1978), Generalized reduced gradient software for linearly
and non linearly constrained problems. Design and implementation of optimization
software, H. J. Greenberg, ed., Sijthoff and Noordhoff, Alphen aan den Rijn, The
Netherlands, 363-396.
Lasdon, L. S., Waren, A. D., Jain, A., and Ratner, M. (1978), Design and testing of a
generalized reduced gradient code for nonlinear programming. ACM Transactions on
Mathematical Software, 4, 34-50.
Lee, S., Kim, D., Kim, J., and Cho, B. (1998), Comparison of models for predicting short-term
travel speeds. 5th World Congress on Intelligent Transp. Systems (CD-ROM), Seoul, Korea.
Lighthill, M. J., and Whitham, G. B. (1955), On kinematic waves: II. A theory of traffic flow
on long crowded roads. Proc. Royal Society A, 229(1178), 317-345.
Lindveld, C. D. R., Thijs, R., Bovy, P. H., and Van der Zijpp, N. J. (2000), Evaluation of
online travel time estimators and predictors. Transp. Res. Rec. 1719, Transportation
Research Board, Washington, D.C., 45-53.
Lindveld, C. D. R., and Thijs, R. (1999), On-line travel time estimation using inductive loop
data: The effect of instrumentation peculiarities. 6th Annual World Conf. on Intelligent Transp. Systems (CD-ROM), Toronto.
Lippman, R. P. (1987), An introduction to computing with neural nets. IEEE ASSP Magazine,
4-22.
Liu, T. K. (2000), Travel time data needs, applications, and data collection. <http://www.nmsu.edu/Research/traffic/public_html/NATDAC96/authors/liu.htm> (Aug. 25, 2000).
Mahalel, D., and Hakkert, A. S. (1995), Time series for vehicle speeds. Transp. Research B,
19(3), 217-225.
Manfredi, S., Salem, H. H., and Grol, H. J. M. (1998), Development and application of co-ordinated control of corridors. <ftp://ftp.cordis.lu/pub/telematics/docs/taptransport/daccord_d9.1.pdf> (Nov. 8, 2003).
The MathWorks, Inc. (2003), MATLAB documentation, Version 6.5.0.180913a, Release 13,
Natick, Massachusetts.
Matsui, H., and Fujita, M. (1998), Travel time prediction for freeway traffic information by
neural network driven fuzzy reasoning. in Neural networks in transp. applications, V.
Himanen, P. Nijkamp, A. Reggiani, and J. Raitio, eds., Ashgate Publishers, Burlington,
Vermont, 355-364.
Mattera, D., and Haykin, S. (1999), Support vector machines for dynamic reconstruction of a
chaotic system. Advances in kernel methods: Support vector learning, B. Scholkopf, C. J.
C. Burges, and A. J. Smola, eds., MIT Press, Cambridge, Massachusetts.
May, A. D. (1990), Traffic flow fundamentals, Prentice-Hall, Inc., Englewood Cliffs, New
Jersey.
May, A. D., Cayford, R., Coifman, B., and Merritt, G. (2003), Loop detector data collection
and travel time measurement in the Berkeley highway laboratory. California PATH
Research Rep. UCB-ITS-PRR-2003-17, Institute of Transp. Studies, Berkeley, California.
Mc Fadden, J., Yang, W. T., and Durrans, S. R. (2001), Application of artificial neural
networks to predict speeds on two-lane rural highways. Presented at the 80th TRB Annual Meeting (CD-ROM), Transportation Research Board, Washington, D.C.
Middleton, D., Jasek, D., and Parker, R. (1999), Evaluation of some existing technologies for
vehicle detection. Rep. No. FHWA/TX-00/1715-S, Texas Transportation Institute, College
Station, Texas.
Miller, J. C., and Miller, J. N. (1993), Statistics for analytical chemistry, Ellis Horwood PTR
Prentice Hall, Englewood Cliffs, New Jersey.
Mukherjee, S., Osuna, E., and Girosi, F. (1997), Nonlinear prediction of chaotic time series
using a support vector machine. Proc. of the IEEE Workshop on Neural Networks for
Signal Processing, Amelia Island, Florida, 511-519.
Muller, K. R., Smola, A., Ratsch, G., Scholkopf, B., Kohlmorgen, J., and Vapnik, V. (1999),
Using support vector machines for time series prediction. in Advances in kernel
methods: Support vector learning, B. Scholkopf, C. J. C. Burges, A. Smola, eds., MIT
Press, Cambridge, Massachusetts.
Nair, A. S., Liu, J. C., Rilett, L. R., and Gupta, S. (2001), Non linear analysis of traffic flow. 4th Int. IEEE Conf. on Intelligent Transp. Systems, Oakland, California, 681-685.
Nakatsuji, T., and Shibuya, S. (1998), Neural network models applied to traffic flow
problems. in Neural networks in transport applications, V. Himanen, P. Nijkamp, A.
Reggiani, and J. Raitio, eds., Ashgate Publishers, Burlington, Vermont, 249-262.
Nam, D. H. (1995), Methodologies for integrating traffic flow theory, ITS and evolving
surveillance technologies. Doctoral dissertation, Department of Civil Engineering,
Virginia Polytechnic Institute and State University, Blacksburg, Virginia.
Nam, D. H., and Drew, D. R. (1996), Traffic dynamics: Methods for estimating freeway travel
times in real-time from flow measurements. J. of Transp. Engineering, ASCE, 122(3),
185-191.
Nam, D. H., and Drew, D. R. (1998), Analyzing freeway traffic under congestion: Traffic
dynamics approach. J. of Transp. Engineering, ASCE, 124(3), 208-212.
Nam, D. H., and Drew, D. R. (1999), Automatic measurement of traffic variables for
intelligent transportation systems applications. Transp. Research B, 33, 437-457.
Nanthawichit, C., Nakatsuji, T., and Suzuki, H. (2003), Application of probe vehicle data for
real-time traffic state estimation and short term travel time prediction on a freeway.
Presented at the 82nd Annual Meeting (CD-ROM), Transportation Research Board, Washington D.C.
NEMA, Traffic control systems (1983), Standard Publication Number TS-1, National Electrical
Manufacturers Association, Washington, D.C.
Nihan, L., Jacobson, L. N., and Bender, J. D. (1990), Detector data validity. Final Rep., WA-RD 208.1, Washington State Transp. Center, Washington.
Nihan, N., and Wong, M. (1995), Improved error detection using prediction techniques and
video imaging, Final Technical Rep., Research Project T9233, Washington State Transp.
Center, Washington.
Oda, T. (1990), An algorithm for prediction of travel time using vehicle sensor data. IEEE 3rd Int. Conf. on Road Traffic Control, 40-44, <http://ieeexplore.ieee.org/servlet/opac?punumber=1222>.
Oh, J., Jayakrishnan, R., and Recker, W. (2003), Section travel time estimation from point
detection data. Center for Traffic Simulation Studies, Paper UCI-ITS-TS-WP-02-15, <http://repositories.cdlib.org/itsirvine/ctss/UCI-ITS-TS-WP-02-15> (July 4, 2003).
Ohba, Y., Koyama, T., and Shimada, S. (1997), Online learning type of traveling time
prediction model in expressway. IEEE Conf. on Intelligent Transp. Systems, Boston,
Massachusetts, 350-355.
Osuna, E., Freund, R., and Girosi, F. (1997a), Training support vector machines: An
application to face detection. IEEE Conf. on Computer Vision and Pattern Recognition,
Juan, Puerto Rico, 130-136.
Osuna, E., Freund, R., and Girosi, F. (1997b), Nonlinear prediction of chaotic time series using
support vector machines. Proc. of the IEEE Workshop on Neural Networks for Signal
Processing, Amelia Island, Florida, 276-285.
Palacharla, P. V., and Nelson, P. C. (1999), Application of fuzzy logic and neural networks for
dynamic travel time estimation. Int. Transactions in Operational Research, 6, 145-160.
Park, B., Messer, C. J., and Urbanik, T. II. (1998), Short term traffic volume forecasting using
radial basis function neural network. Transp. Res. Rec. 1651, Transportation Research
Board, Washington, D.C., 39-47.
Park, D., and Rilett, L. R. (1998), Forecasting multiple period freeway link travel times using
modular neural networks. Transp. Res. Rec. 1617, Transportation Research Board,
Washington, D.C., 163-170.
Park, D., and Rilett, L. R. (1999), Forecasting freeway link travel times with a multi-layer feed
forward neural network. Computer Aided Civil and Infrastructure Engineering, 14, 357-
367.
Park, D., Rilett, L. R., and Han, G. (1999), Spectral basis neural networks for real-time link
travel times forecasting. J. of Transp. Engineering, ASCE, 125(6), 515-523.
Park, E. S., Turner, S., and Spiegelman, C. H. (2003), Empirical approaches to outlier
detection in ITS data. Presented at the TRB 82nd Annual Meeting (CD-ROM), Transportation Research Board, Washington D.C.
Payne, H. J., Helfenbein, E. D., and Knobel, H. C. (1976), Development and testing of incident
detection algorithms, 2, Research methodology and detailed results, Rep. No. FHWA-RD-
76-20, McLean, Virginia.
Payne, H. J., and Thompson, S. (1997), Malfunction detection and data repair for induction
loop sensors using I-880 database. Transp. Res. Rec. 1570, Transportation Research
Board, Washington, D. C., 191-201.
Peeta, S., and Anastassopoulos, I. (2002), Automatic real-time detection and correction of
erroneous detector data using Fourier transforms for on-line traffic control architectures.
Presented at the TRB 81st Annual Meeting (CD-ROM), Transportation Research Board, Washington D.C.
Persaud, B. N., and Hurdle, V. F. (1988), Some new data that challenge some old ideas about
speed flow relationships. Transp. Res. Rec. 1194, Transportation Research Board,
Washington, D.C., 191-198.
Petty, K. (1995), Freeway service patrol 1.1: The analysis software for the FSP project.
California PATH Research Rep., UCB-ITS-PRR-95-20, Berkeley, California.
Petty, K. F., Bickel, P., Ostland, M., Rice, J., Schoenberg, F., Jiang, J., and Ritov, Y. (1998),
Accurate estimation of travel times from single-loop detectors. Transp. Res. - A, 32(1),
1-17.
Pfannerstill, E. (1989), Automatic monitoring of traffic conditions by re-identification of
vehicles. Institution of Electrical Engineers 2nd Int. Conf. on Road Traffic Monitoring, Publication No. 299, London, U.K.
Pinnell-Anderson-Wilshire and Associates, Inc. (1976), Inductive loop detectors: Theory and
practice, U.S. Department of Transportation, Federal Highway Administration,
Springfield, Washington, D.C.
Quiroga, C. (2000), Assessment of dynamic message travel time information accuracy. Proc.
of the North American Travel Monitoring Conf. and Exposition, Middleton, Wisconsin, 1-
13.
Raj, J., and Rathi, A. (1994), Inductive loop tester ILT II. Summary Report, FHWA-SA-94-
077, Washington D. C.
Rice, J., and van Zwet, E. (2002), A simple and effective method for predicting travel times on
freeways. IEEE Intelligent Transp. Systems Conf. Proc., Piscataway, New Jersey, 227-
232.
Richards, P. I. (1956), Shock waves on the highway. Operations Research, 4(1), 42-51.
Rilett, L. R., and Park, D. (2001), Direct forecasting of freeway corridor travel times using
spectral basis neural networks. Transp. Res. Rec. 1752, Transportation Research Board,
Washington, D.C., 140-147.
Rilett, L. R., Kim, K., and Raney, B. (2000), Comparison of low-fidelity TRANSIMS and
high-fidelity CORSIM highway simulation models with intelligent transportation system
data. Transp. Res. Rec. 1739, Transportation Research Board, Washington, D.C., 1-8.
Rosenblatt, F. (1962), Principles of neurodynamics. Spartan Books, New York.
Roth, S. H. (1977), History of automatic vehicle monitoring (AVM). IEEE Transactions on
Vehicular Technology VT-26, 2-6.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986), Learning representations by back
propagating errors. Nature, 323(9), 533-536.
Saito, M., and Watanabe, T. (1995), Prediction and dissemination system for travel time utilizing vehicle detectors. Proc. of the 2nd World Congress on Intelligent Transp. Systems, Yokohama, Japan.
Samanta, B. (2004), Gear fault detection using artificial neural networks and support vector
machines with genetic algorithms. Mechanical Systems and Signal Processing, 18, 625-
644.
Schmidt, M. (1996), Identifying speakers with support vector networks. Interface '96 Proc.,
Sydney, Australia.
Scholkopf, B., Burges, C., and Vapnik, V. (1995), Extracting support data for a given task.
Proc. of the First Int. Conf. on Knowledge Discovery and Data Mining, U. M. Fayyad and R. Uthurusamy, eds., AAAI Press, CA.
Scholkopf, B., Burges, C., and Vapnik, V. (1996), Incorporating invariances in support vector
learning machines. Artificial neural networks-ICANN 1996. 47-52.
Scott, B. M. (1992), Automatic vehicle identification: A test of theories of technology.
University of Wollongong, published in Science, Technology, & Human Values, 17(4), 485-505, <http://www.uow.edu.au/arts/sts/bmartin/pubs/92sthv.html> (Nov. 23, 2003).
Seki, S. (1995), Travel time measurement and provision system using AVI units. Proc. of the 2nd World Congress on Intelligent Transp. Systems, Yokohama, Japan.
Sen, A., Liu, N., Thakuriah, P., and Li, J. (1991), Short-term forecasting of link travel times: A
preliminary proposal. ADVANCE Working Paper Series, Number 7, Illinois, Chicago.
Sharma, S., Lingras, P., and Zhong, M. (2003), Effect of missing value imputations on traffic
parameters estimations from permanent traffic counts. Presented at the TRB 82nd Annual Meeting (CD-ROM), Transportation Research Board, Washington D.C.
Shbaklo, S., Bhat, C., Koppelman, F., Li, J., Thakuriah, P., Sen, A., and Rouphail, N. (1992),
Short-term travel time prediction. ADVANCE Project Rep., TRF-TT-01, University
Transp. Research Consortium, Illinois, Chicago.
Simon, N. (1993), Constructive supervised learning algorithms for artificial neural networks.
Master's thesis, Delft University, The Netherlands.
Singleton, M., and Ward, J. E. (1977), A comparative study of various types of vehicle
detectors. Rep. No. DOT-TSC-OST-77-9, Springfield, Virginia.
Sisiopiku, V. P., Rouphail, N. M., and Santiago, A. (1994a), Analysis of correlation between
arterial travel time and detector data from simulation and field studies. Transp. Res. Rec.
1457, Transportation Research Board, Washington, D.C., 166-173.
Sisiopiku, V. P., Rouphail, N. M., and Tarko, A. (1994b), Estimating travel times on freeway
segments. ADVANCE Working Paper Series Number 32, Urban Transp. Center,
University of Illinois at Chicago.
Smith, B. L., and Conklin, J. H. (2002), Use of local lane distribution patterns to estimate
missing data values from traffic monitoring systems. Transp. Res. Rec. 1811,
Transportation Research Board, Washington, D. C., 50 - 56.
Smith, B. L., and Demetsky, M. J. (1994), Short term traffic flow prediction: Neural network
approach. Transp. Res. Rec. 1453, Transportation Research Board, Washington, D. C. 98-
104.
Smith, B. L., and Demetsky, M. J. (1997), Traffic flow forecasting: Comparison of modeling approaches. J. of Transp. Engineering, ASCE, 123(4), 261-266.
Smith, B. L., Scherer, W. T., and Conklin, J. H. (2003), Exploring imputation techniques for
missing data in transportation management systems. Presented at the 82nd TRB Annual
Meeting (CD-ROM), Transportation Research Board, Washington D. C.
Smola, A. J., and Scholkopf, B. (1998), A tutorial on support vector regression. NeuroCOLT2 Technical Rep. Series, NC2-TR-1998-030, <http://www.kernel-machines.org/tutorial.html> (Feb. 12, 2004).
Son, B. (1996), Discussion on Traffic dynamics: Methods for estimating freeway travel times
in real-time from flow measurements, by Nam and Drew. J. of Transp. Engineering,
ASCE, 122(3), 519-520.
Speed, M., and Spiegelman, C. (1998), Evaluating black boxes: An ad-hoc method for
assessing nonparametric and non-linear curve fitting estimators. Communications in Statistics - Simulation and Computation, 27(3), 699-710.
Sreedevi, I., and Black, J. (2001), <http://www.path.berkeley.edu/~leap/TTM/Incident_Manage/Detection/loopdet.html#Houston> (May 15, 2003).
Statsoft Pacific Pty Ltd. (2004). <http://www.statsoftinc.com/textbook/glosl.html> (Jan. 31,
2004).
Stephanedes, Y. J., Michalopoulos, P. G., and Plum, R. A. (1981), Improved estimation of traffic flow for real-time control. Transp. Res. Rec. 795, Transportation Research Board, Washington, D.C., 28-39.
Sun, C., Arr, G., and Ramachandran, R. P. (2003), Vehicle re-identification as a method for
deriving travel time and travel time distributions. Transp. Res. Rec. 1826, Transportation
Research Board, Washington, D. C., 25-31.
Sun, Z., Bebis, G., and Miller, R. (2002a), Quantized wavelet features and support vector
machines for on-road vehicle detection. IEEE Int. Conf. on Control, Automation, Robotics and Vision, Singapore, <http://www.cs.unr.edu/~bebis/vehicleICARCV02.pdf> (Feb. 13, 2004).
Sun, Z., Bebis, G., and Miller, R. (2002b), On-road vehicle detection using gabor filters and
support vector machines. IEEE Int. Conf. on Digital Signal Processing, Santorini, Greece, <http://www.cs.unr.edu/~bebis/vehicleDSP02.pdf> (Feb. 13, 2004).
Sun, C., Ritchie, S. G., and Tsai, K. (1998), Algorithm development for derivation of section
related measures of traffic system performance using inductive loop detectors. Transp.
Res. Rec. 1643, Transportation Research Board, Washington, D.C., 171-180.
Sun, C., Ritchie, S. G., Tsai, K., and Jayakrishnan, R. (1999), Use of vehicle signature analysis
and lexicographic optimization for vehicle re-identification on freeways. Transp.
Research C, 7, 167-185.
Takahashi, K., Inoue, T., Yokota, T., Kobayashi, Y., and Yamane, K. (1995), Measuring travel
time using pattern matching technique. Proc. of the 2nd World Congress on Intelligent Transp. Systems (CD-ROM), Yokohama, Japan.
Talukder, A., and Casasent, D. (2001), A closed form neural network for discriminatory
feature extraction from high dimensional data. Neural Networks, 14, 1201-1218.
Tarko, A., and Rouphail, N. M. (1993), Travel time data fusion in ADVANCE. Proc. of the 3rd Int. Conf. on Applications of Advanced Technologies in Transp. Engineering, ASCE, New York, 36-42.
Taylor, B. N., Parker, W. H., and Langenberg, D. N. (1969), The fundamental constants and
quantum electrodynamics. in Reviews of modern physics monograph, Academic Press,
New York.
Texas Department of Transportation. (TxDOT). (2000).
<http://www.tongji.edu.cn/~yangdy/books/TransGuide.PDF> (Nov. 23, 2003).
Texas Department of Transportation. (TxDOT). (2003).
<http://www.transguide.dot.state.tx.us/docs/atms_info.html> (Nov. 23, 2003).
TexHwyMan. (2003). <http://home.att.net/~texhwyman/transgd.htm> (Nov. 23, 2003).
Thakuriah, P., Sen, A., Li, J., Liu, N., Koppelman, F. S., and Bhat, C. (1992), Data needs for
short term link travel time prediction. Advance Working Paper Series Number 19, Urban
Transp. Center, University of Illinois, Chicago.
Traffic Detector Handbook (1991), Second edition, Institute of Transportation Engineers,
McLean, Virginia.
Transguide Model Deployment Initiative Design Rep. and Transguide Technical Paper (2002),
Texas Department of Transportation,
<http://www.transguide.dot.state.tx.us/PublicInfo/papers.php>, (Nov. 1, 2003).
Travel Time Data Collection Handbook (1998), Rep. no. FHWA-PL-98-035, Texas
Transportation Institute, College Station, Texas.
Turner, S. M. (1996), Advanced techniques for travel time data collection. Transp. Res. Rec.
1551, Transportation Research Board, Washington, D.C., 51-58.
Turner, S. M., Albert, L., Gajewski, B., and Eisele, W. (2000), Archived ITS data quality:
Preliminary analysis of San Antonio Transguide data. Transp. Res. Rec. 1719,
Transportation Research Board, Washington, D. C., 77-84.
Turner, S. M., Eisele, W. L., Gajewsky, B. J., Albert, L. P., and Benz, R. J. (1999), ITS data
archiving: Case study analyses of San Antonio TransGuide data. Rep. No. FHWA A-PL-
99-024, Federal Highway Administration, Texas Transportation Institute, College Station,
Texas.
Turochy, R. E., and Smith, B. L. (2000), New procedure for data screening in traffic
management systems. Transp. Res. Rec. 1727, Transportation Research Board,
Washington, D. C., 127-131.
Valyon, J., and Horvath, G. (2002), A comparison of the SVM and LS-SVM regression from the viewpoint of parameter selection. IEEE Hungary Section Proc. Mini-Symposium, <http://www.mit.bme.hu/events/minisy2002/ValyonJozsef.pdf> (Feb. 2, 2004).
Van Aerde, M., and Yagar, S. (1983), Volume effects on speeds of 2-lane highways in
Ontario. Transp. Res. A, 17, 301-313.
Van Arem, B., Van der Vlist, M. J. M., Muste, M. R., and Smulders, S. A. (1997), Travel time estimation in the GERDIEN project. Int. J. of Forecasting, 13, 73-85.
Van Lint, J. W. C., Hoogendoorn, S. P., and van Zuylen, H. J. (2000), Robust and adaptive
travel time prediction with neural networks. TRAIL Research School, Delft,
<vkk042.citg.tudelft.nl/.../staff/lint/papers/Robust%20and%20adaptive%20Travel%20Tim
e%20prediction.pdf> (Dec. 3, 2003).
Van Lint, J. W. C., Hoogendoorn, S. P., and van Zuylen, H. J. (2002), Freeway travel time
prediction with state space neural networks. Presented at the 81st TRB Annual Meeting
(CD-ROM), Transportation Research Board, Washington D. C.
Van Lint, J. W. C., Hoogendoorn, S. P., and van Zuylen, H. J. (2003), Towards a robust
framework for freeway travel time prediction: Experiments with simple imputation and
state space neural networks. Presented at the TRB 82nd Annual Meeting (CD-ROM), Transportation Research Board, Washington D.C.
Van Lint, J. W. C., and van der Zijpp, N. J. (2003), An improved travel time estimation
algorithm using dual-loop detectors. Presented at the 82nd TRB Annual Meeting (CD-
ROM), Transportation Research Board, Washington D. C.
Vanajakshi, L., and Rilett, L. R. (2004a), A comparison of the performance of artificial neural
networks and support vector machines for the prediction of vehicle speed. Accepted for the IEEE Intelligent Vehicles Symposium, Parma, Italy.
Vanajakshi, L., and Rilett, L. R. (2004b), Loop detector data diagnostics based on conservation
of vehicle principle. Accepted for publication in Transp. Res. Rec., Transportation
Research Board, Washington, D.C.
Vanderplaats, G. N. (1984), Numerical optimization techniques for engineering design.
McGraw-Hill, Inc., New York.
Vapnik, V. N. (1998), Statistical learning theory. John Wiley and Sons, Inc., New York.
Venkataraman, P. (2001), Applied optimization with Matlab programming. John Wiley and
Sons, Inc., New York.
Wall, Z., and Dailey, D. J. (2003), An algorithm for the detection and correction of errors in
archived traffic data. Presented at the 82nd TRB Annual Meeting (CD-ROM),
Transportation Research Board, Washington D.C.
Wasserman, P. D. (1989), Neural computing: Theory and practice. Van Nostrand Reinhold,
New York.
Weigend, A. S., Huberman, B. A., and Rumelhart, D. E. (1992), Predicting sunspots and
exchange rates with connectionist networks. in Non Linear Modeling and Forecasting, M.
Casdagli and S. Eubank, eds. Addison Wesley, Menlo Park, California, 395-432.
Windover, J. R., and Cassidy, M. J. (2001), Some observed details of freeway traffic
evolution. Transp. Research A, 35, 881-894.
Wolfe, P. (1963), Methods of nonlinear programming. in Recent Advances in Mathematical
Programming, R. L. Graves and P. Wolfe, eds. Mc-Graw Hill, New York, 67-86.
Wolfe, P. (1967), Methods for linear constraints. in Nonlinear Programming, J. Abadie, ed.,
John Wiley & Sons, New York, 121-125.
Xiao, H., Sun, H., and Ran, B. (2003), The fuzzy-neural network traffic prediction framework
with wavelet decomposition. Presented at the 82nd TRB Annual Meeting (CD-ROM), Transportation Research Board, Washington D.C.
Yasui, K., Ikenoue, K., and Takeuchi, H. (1995), Use of AVI information linked up with
detector output in travel time prediction and O-D flow estimation. Proc. of the 2nd World Congress on Intelligent Transp. Systems (CD-ROM), Yokohama, Japan.
You, J., and Kim, T. J. (2000), Development of hybrid travel time forecasting model. Transp.
Research C, 8, 231-256.
Yuan, F., and Cheu, L. (2003), Incident detection using support vector machines. Transp.
Research C, 11, 309-328.
Yun, S. Y., Namkoong, S., Rho, J. H., Shin, S. W., and Choi, J. U. (1998), A performance
evaluation of neural network models in traffic volume forecasting. Mathematical and
Computer Modeling, 27(9-11), 293-310.
Zhang, G., Patuwo, E., and Hu, M. Y. (1998), Forecasting with artificial neural networks: The
state of the art. Int. J. of Forecasting, 14, 35-62.
Zhang, X., and Rice, J. (2003), Short term travel time prediction. Transp. Research C, 11,
187-210.
Zhang, X., Wang, Y., Nihan, N. L., and Hallenbeck, M. E. (2003), Development of a system to
collect loop detector event (individual vehicle) data. Proc. 80th TRB Annual Meeting (CD-ROM), Transportation Research Board, Washington D.C.
Zhao, M., Garrick, N. W., and Achenie, E. K. (1998), Data reconciliation based traffic count
analysis system. Transp. Res. Rec. 1625, Transportation Research Board, Washington, D.
C., 12-17.
Zhu, F. (2000), Locations of AVI system and travel time forecasting. Master's thesis,
Department of Civil Engineering, Virginia Polytechnic Institute and State University,
Blacksburg, Virginia.
Zuylen, H. J., and Branston, D. M. (1982), Consistent link flow estimation from counts.
Transp. Research B, 16, 473-476.
APPENDIX A
NOTATIONS
q - Flow in vehicles per unit time
D - Distance
T - Travel time
t_occ - Occupancy time of detectors
t_on - Instant of time the detector detects a vehicle
t_off - Instant of time the vehicle exits the detector
v - Vehicle speed
L_n - Vehicle length
L_d - Detection zone length
O - Percent occupancy time
t - Time period
k - Density in vehicles per unit distance
L_v - Average vehicle length
- Step size
Q - Cumulative flow in vehicles
- Bias
w - Weight
v_f - Free-flow speed
S^2 - Variance
APPENDIX B
GLOSSARY OF FREQUENTLY USED TERMS AND ACRONYMS
B.1 FREQUENTLY USED TERMS
Advanced Traffic Management System (ATMS): The location, usually centralized, where
intelligent transportation systems data are collected and the transportation system is monitored.
Advanced Traveler Information System (ATIS): The use of intelligent transportation systems
technologies and communication methods for providing information to motorists.
Artificial Neural Network (ANN): An information-processing structure whose design is
motivated by the design and functioning of human brains and components thereof.
Automatic Vehicle Identification (AVI): A system where probe vehicles equipped with
electronic toll tags communicate with roadside antennas to identify unique vehicles and collect
travel time data between the antenna locations.
Automatic Vehicle Location (AVL): A system that enables remote tracking of a vehicle's
location using equipment such as a mobile radio receiver, GPS receiver, GPS modem, and GPS
antenna.
Conservation of Vehicles Principle: The concept of conservation of vehicles states that the
difference between the number of vehicles entering and leaving a link during a specific time
interval corresponds to the change in the number of vehicles traveling on the link.
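As a minimal illustration (a sketch with hypothetical counts, mirroring the screening bounds
used in the optimization program of Appendix D), the principle can be checked from the
cumulative counts at the two ends of a link:
% Hypothetical cumulative counts at the upstream and downstream detectors of a link.
Q_up = [10 25 42 60]; % vehicles that have passed the upstream detector
Q_dn = [ 4 18 33 52]; % vehicles that have passed the downstream detector
n_on_link = Q_up - Q_dn; % vehicles stored on the link in each interval
max_storage = 500; % upper bound used in the Appendix D screening program
ok = all(n_on_link >= 0 & n_on_link < max_storage) % 1 if the counts are consistent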
CORSIM: CORridor SIMulation software package developed by the Federal Highway
Administration (FHWA).
Density: A measure of the concentration of vehicles, stated as the number per unit distance per
lane.
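For example, under steady conditions a flow of 1,800 vehicles per hour in a lane with a space
mean speed of 60 mph corresponds to a density of 1,800/60 = 30 vehicles per mile per lane.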
Detector Failures: The occurrence of detector malfunctions including nonoperation, chattering,
or other intermittently erroneous detections.
Detectors: A system for indicating the presence or passage of vehicles.
Deterministic Model: A mathematical model that enables one to compute precisely what will
happen to one variable if a specified value is chosen for another variable. This model has no
random variables, and all entity interactions are defined by exact relationships (mathematical,
statistical, or logical).
Distance Measuring Instrument (DMI): An electronic device connected to the transmission of
a vehicle that can be used to determine travel time along a corridor based on the speed and
distance information.
Estimation: Calculation of traffic state variables for the most recent period for which
measurements are available.
Extrapolation Method: A method for calculating travel time from detector data by dividing the
distance between the detectors by the speed obtained from the detectors.
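A minimal sketch of this calculation (hypothetical values, mirroring "method 2" of the
extrapolation program in Appendix D):
speed1 = 55; speed2 = 48; % hypothetical speeds at two adjacent detectors (mph)
delta_x = 0.5; % hypothetical spacing between the detectors (miles)
tt = delta_x/((speed1 + speed2)/2)*3600 % travel time in seconds (about 35 s here)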
Freeway Surveillance: Process or method of monitoring freeway traffic performance and
control system operation.
Generalized Reduced Gradient (GRG): A nonlinear optimization technique that can
accommodate a nonlinear objective function and nonlinear constraints.
Imputation: The process of estimating missing detector data using techniques such as
interpolation.
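A minimal sketch of one such technique, linear interpolation across a single missing value in a
hypothetical volume series:
vol = [12 14 NaN 15 13]; % hypothetical 20-second volumes; NaN marks the missing record
bad = isnan(vol); % locate the gap
vol(bad) = interp1(find(~bad), vol(~bad), find(bad)); % gap filled with 14.5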
Inductance: The property of an electric circuit whereby an electromotive force is generated by a
change of current.
Inductance Loop Detectors (ILD): A traffic monitoring technology in which wire loops buried
below the road surface detect vehicles crossing the loop through the resulting change in
inductance.
Intelligent Transportation Systems (ITS): Application of advanced technologies and
communication methods to the transportation sector to improve the efficiency or safety of a
surface transportation system.
Loop Detector Unit: An electronic device capable of energizing the sensor loops, monitoring
the sensor loops' inductance, and responding to a pre-determined decrease in inductance with an
output that indicates the passage or presence of vehicles in the zone of detection.
Machine Learning: Machine learning involves adaptive mechanisms that enable computers to
learn from experience, learn by example and learn by analogy.
Macroscopic Model: Macroscopic models describe the behavior of average vehicle-driver
units in the traffic stream, based on the aggregate behavior of drivers.
Mean Absolute Difference (MAD): A statistical measure used to determine the difference
between two sets of data.
Mean Absolute Percentage Error (MAPE): A statistical measure used to determine the error
in a set of data in comparison with a correct set of data.
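The prediction programs in Appendix D compute this measure as, for example (hypothetical
vectors):
y = [100 110 120]; % observed travel times
ye = [ 95 115 118]; % predicted travel times
N = length(y);
ers = sum(abs(y - ye)./y)*100/N % MAPE, about 3.74 percent here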
Microscopic Model: Microscopic flow models aim to describe the behavior of individual
vehicle-driver units with respect to other vehicles in the traffic stream.
Occupancy: The proportion of a time period during which a detector is occupied by vehicles
(i.e., vehicles are above the detector).
Prediction/Forecasting: Calculation of future traffic state variables.
Probe Vehicles: Vehicles used in travel time data collection techniques in which the vehicle
travels along the corridor for the exclusive purpose of data collection and records travel time
data between points of interest.
Route Guidance System (RGS): The use of intelligent transportation systems technologies and
communication methods for guiding the vehicles to select the optimum route.
Stochastic Model: A model that uses random processes governed by probability to represent
the system.
Support Vector Machine (SVM): A recently developed pattern classification and regression
technique based on statistical learning theory.
Travel Time: Time to traverse a route between any two points of interest.
Validation: The process to determine whether a model provides an accurate representation of
the real-world system under study. It involves comparing the model output to generated
analytical solutions or to collected field data.
B.2 ACRONYMS
TABLE B.1. List of Frequently Used Acronyms
Acronym Title
ANN Artificial Neural Network
ATIS Advanced Traveler Information System
ATMS Advanced Traffic Management System
AVI Automatic Vehicle Identification
AVL Automatic Vehicle Location
CORSIM CORridor SIMulation
DMI Distance Measuring Instrument
FHWA Federal Highway Administration
FRESIM FREeway SIMulation
GRG Generalized Reduced Gradient
HCM Highway Capacity Manual
ILD Inductance Loop Detector
ITS Intelligent Transportation System
MAD Mean Absolute Difference
MAPE Mean Absolute Percentage Error
NEMA National Electrical Manufacturers Association
NETSIM NETwork SIMulation
RGS Route Guidance System
SVM Support Vector Machine
SVR Support Vector Regression
TRB Transportation Research Board
TSIS Traffic Software Integrated System
TMC Traffic Management Center
TransGuide Transportation Guidance System
WIM Weigh In Motion
APPENDIX C
MICROSCOPIC TRAFFIC SIMULATION
CORSIM INPUT FILES
.TRF File
ITRAF 2.0 00
1
1 1 5 7981 21 80600 7781 7581 2
72007200 3
60 4
5
8001 1 2 0 1 1 19
1 2 3 25000 1 1 19
2 3 4 25000 1 1 19
3 48002 25000 1 1 19
8001 1 11070 20
1 2 11070 20
2 3 11070 20
3 4 11070 20
8001 1 2 100 25
1 2 3 100 25
2 3 4 100 25
3 48002 100 25
2 3 1 10 6 15 2 1 28
3 4 1 10 6 15 2 2 28
3 4 12490 6 15 2 3 28
8001 11088 100 50
8001 1 0 001358 101430 301650 601760 90 1 53
10 0 60 20 1 0 64
1 2 3 67
0 170
1 0 8000 195
2 2500 8000 195
3 5000 8000 195
4 7500 8000 195
0 8 210
8001 11848 1201918 1501637 1801390 210 1 53
0 170
1 210
ITRAF 2.0 00
1
1 1 10 7981 21 80600 7781 7581 2
1800180018001800 3
60 4
5
8001 1 2 0 1 1 19
1 2 3 25000 1 1 19
2 3 4 25000 1 1 19
8003 5 4 1 1 1 19
5 4 6 5091 1 9 19
3 4 6 10300 1 1 19
4 68002 25000 1 91 100 1 19
8001 1 11070 20
1 2 11070 20
2 3 11070 20
8003 5 11070 20
5 4 11070 20
3 4 11070 20
4 6 11070 20
8001 1 2 100 25
1 2 3 100 25
2 3 4 100 25
8003 5 4 100 25
5 4 6 100 25
3 4 1 10 6 15 2 1 28
3 4 11020 6 15 2 2 28
4 6 11000 6 15 2 3 28
4 6 12490 6 15 2 4 28
5 4 1 400 6 15 2 5 28
8001 11783 100 50
8003 5 50 100 50
8001 1 0 001783 041953 102198 161976 222085 28 1 53
8003 5 0 00 130 04 150 10 170 16 190 22 210 28 1 53
10 0 60 20 1 0 64
1 2 3 4 5 67
0 70
1 0 8000 95
2 2500 8000 95
3 5000 8000 95
4 6030 8000 95
5 5930 7500 95
6 8530 8000 95
0 8 10
8001 11909 342113 402199 461980 521811 58 1 53
8003 5 250 34 270 40 300 46 310 52 300 58 1 53
0 70
0 8 10
8001 11877 642165 701868 761932 821770 88 1 53
8003 5 380 64 300 70 290 76 285 82 220 88 1 53
0 70
0 8 10
8001 11876 961836 1021770 1081767 1141836 120 1 53
8003 5 200 96 180 102 160 108 150 114 140 120 1 53
0 70
1 10
APPENDIX D
PROGRAMS DEVELOPED
MATLAB FILES
Data Retrieval
% This program reads the data file for the selected day and makes a new file with the details of the
%detectors of interest alone. The input is the data file with the field data, and the output will be the details
%of the selected detectors specified in the deal command.
clear;
[date, time, det, speed, vol, occ] = textread('PollServerFLaneData1_jan2002.txt','%s %s %s %s %s %s');
str1 = strrep(speed,'Speed=',''); %remove the letters
str2 = strrep(vol,'Vol=','');
str3 = strrep(occ,'Occ=','');
[str{1:10}] = deal('L1-0035S-163.421', 'L2-0035S-163.421', 'L3-0035S-163.421', 'EX1-0035S-163.328', ...
'L1-0035S-162.899', 'L2-0035S-162.899', 'L3-0035S-162.899', 'L1-0035S-162.482', 'L2-0035S-162.482', 'L3-0035S-162.482');
for m=(1:10)
itn(m,1) = 0;
end
for m = 1:10
[str4] = ('jan2002');
tempstr1 = char(str(m));
tempstr2 = char(str4);
str5 = [tempstr1(end-2:end), '_', tempstr2,'_', tempstr1(1:2),'.txt'];
fid(m) = fopen(char(str5),'w');
for n=1:length(date)
if (strmatch(str(m), det(n),'exact') ) %checking for match
itn(m,1) = itn(m,1)+1
str4 = hour(time(n))*3600+minute(time(n))*60+second(time(n));
fprintf(fid(m),'%s\t %f\t %s\t %s\t %s\t %s\n', date{n}, str4, det{n},str1{n},str2{n},str3{n});
end
end
fclose(fid(m));
end
Data Averaging
% averages 20 sec actual volume for 2 minute intervals.
clear;
[str{1:15}] = deal( '500_feb11N_L1.txt','500_feb11N_L2.txt','500_feb11N_L3.txt','998_feb11N_L1.txt',...
'998_feb11N_L2.txt','998_feb11N_L3.txt','504_feb11N_L1.txt','504_feb11N_L2.txt','504_feb11N_L3.txt',...
'892_feb11N_L1.txt','892_feb11N_L2.txt','892_feb11N_L3.txt','405_feb11N_L1.txt','405_feb11N_L2.txt','405_feb11N_L3.txt');
for zz=1:15
[date, time, detector, speed1, vol1, occ1] = textread(char(str(zz)),'%s %s %s %s %s %s');
t = str2num(char(time));
speed =str2num(char(speed1));
length(speed)
vol = str2num(char(vol1));
occ = str2num(char(occ1));
if (vol(1) == 0)
vol(1) = 1;
end
if (occ(1) == 0)
occ(1) =1;
end
% check for unreasonable combinations and threshold values
for i=1:11
itn(i) = 0;
end
for n = 2:length(t)
if(speed(n) > 0 && speed(n) < 100 && vol(n) > 0 && vol(n) <= 17 && occ(n) >0 && occ(n) < 90)
itn(4) = itn(4) +1;
end
if(speed(n) == 0 && vol(n) == 0 && occ(n) == 0)
itn(5) = itn(5) +1;
end
if(vol(n) > 17)
itn(1) = itn(1) +1;
vol(n)= vol(n-1);
end
if(speed(n) > 100)
itn(2) = itn(2) +1;
speed(n) = (speed(n-1));
end
if(occ(n)>90)
itn(3) = itn(3) +1;
occ(n) = occ(n-1);
end
if(speed(n) == 0 && vol(n) ~= 0 && occ(n) ~= 0)
itn(6) = itn(6) +1;
speed(n) = speed(n-1);
end
if(speed(n) ~= 0&& vol(n) ==0 && occ(n) ~= 0)
itn(7) = itn(7) +1;
vol(n) = vol(n-1);
end
if(speed(n) ~= 0 && vol(n) ~= 0 && occ(n) ==0)
itn(8) = itn(8) +1;
occ(n) = occ(n-1);
end
if(speed(n) == 0 && vol(n) == 0 && occ(n) ~= 0)
itn(9) = itn(9) +1;
speed(n) = speed(n-1);
vol(n) = vol(n-1);
end
if(speed(n)~=-1)
if(speed(n) ~= 0 && vol(n) == 0 && occ(n) == 0)
itn(10) = itn(10) +1;
vol(n) = vol(n-1);
occ(n) = occ(n-1);
end
end
if(speed(n) == 0 && vol(n) ~= 0 && occ(n) ==0)
itn(11) = itn(11) +1;
occ(n) = occ(n-1);
speed(n) = speed(n-1);
end
check(n,:) = itn;
if(n>2)
if(check(n,:) == check(n-1,:))
fprintf('none of the above at %d\n',n);
end
end
end
itn
%cumulate to 2 mts
start_t(1) = 0; %data collection started at time 0
i=1;
j=1;
for n = 1:length(t)
end_t(i) = start_t(i) + 120;
if t(n) >= end_t(i) % first data point at or after the end of the 2-minute interval
if (t(n)-end_t(i) <20 ) %within one 20-second time step
time_2mt(i,1) = t(n);
vol_2mt(i,1)= mean(vol(j:n))*6;%vol is the sum for all 6 - 20 sec intervals in the 2mt interval
if(nnz(speed(j:n)) == 0)
speed_2mt(i,1) = 0;
else
speed_2mt(i,1) = sum(speed(j:n))/nnz(speed(j:n));%average of all non zero speeds
end
occ_2mt(i,1) = sum(occ(j:n))/6;%occupancy calculated for the 2 mt from the 20 sec
%occ is percentage value and hence each number to be
%multiplied by 20 and divide by 100 to get the actual time
%occupied. then sum it up and divide by 120 and make it
%percent. the whole calculation comes out as divide by 6.
start_t(i+1) = end_t(i);
i = i+1;
j = n;
continue
end
if(n~=1)
if (abs(end_t(i)-t(n-1))<20) % within one time step
time_2mt(i,1) = t(n-1);
vol_2mt(i,1)= mean(vol(j:n))*6;%vol is the sum for all 6 - 20 sec intervals in the 2mt interval
if(nnz(speed(j:n)) == 0)
speed_2mt(i,1) = 0;
else
speed_2mt(i,1) = sum(speed(j:n))/nnz(speed(j:n));%average of all non zero speeds
end
occ_2mt(i,1) = sum(occ(j:n))/6;
start_t(i+1) = end_t(i);
n=n-1;
i = i+1;
j = n;
continue
end
end
if t(n) >= end_t(i) + 120 %if the time is more than 4 mt interval
x = (t(n)-end_t(i))/120;
y = round(x);
if(vol(n-1,1)>0 & vol(n,1) > 2*vol(n-1,1))
for (z=i:i+y)
time_2mt(z,1) = end_t(z);
vol_2mt(z,1) = vol(n,1)/y;
speed_2mt(z,1) = (speed(n,1)+speed(n-1,1))/2;
occ_2mt(z,1) = (occ(n,1)+occ(n-1,1))/2;
start_t(z+1) = end_t(z);
end_t(z+1) = start_t(z+1) + 120;
end
end
for (z=i:(i+y))
time_2mt(z,1) = end_t(z);
vol_2mt(z,1) = ((vol(n-1,1)+vol(n,1))/2)*6;
speed_2mt(z,1) = (speed(n-1,1)+speed(n,1))/2;
occ_2mt(z,1) = (occ(n-1,1)+occ(n,1))/2;
start_t(z+1) = end_t(z);
end_t(z+1) = start_t(z+1) + 120;
end
end_t(z+1) = 0;
i=z+1;
j=n;
continue
end
time_2mt(i,1) = t(n); % otherwise
vol_2mt(i,1)= mean(vol(j:n))*6;%vol is the sum for all 6 - 20 sec intervals in the 2mt interval
if(nnz(speed(j:n)) == 0)
speed_2mt(i,1) = 0;
else
speed_2mt(i,1) = sum(speed(j:n))/nnz(speed(j:n));%average of all non zero speeds
end
occ_2mt(i,1) = sum(occ(j:n))/6;
start_t(i+1) = end_t(i);
i = i+1;
j = n;
end
end
tempstr = char(str{zz});
str1 = [tempstr(1:end-4),'_2mt.txt'];
fid = fopen(char(str1),'w');
for k=1:length(vol_2mt)
fprintf(fid,'%f\t %f\t %6.2f\t %12.8f\t %f\t %f\n',start_t(k), end_t(k), time_2mt(k), speed_2mt(k), vol_2mt(k), occ_2mt(k));
end
fclose(fid);
end %for zz loop
AVI Data
% File to get travel time from AVI data file between two selected points. Input is the tag data of the AVI
% stations only. In this example, the data for AVI stations 142 and 144, sorted first by vehicle ID and then
% by AVI station number, is given as input, and the travel time of vehicles is obtained as output.
clear;
[AVInum, vehid, time1, date] = textread(char('feb11_avi.txt'),'%s %s %s %s ', 'whitespace','\t');
AVI1 = (char(AVInum));
AVI = str2num(AVI1);
t = strrep(time1,'&',''); %remove the & from time
for n = 1:length(date)
n
time(n) = hour(t(n))*3600+minute(t(n))*60+second(t(n));
end
fid = fopen('avi_tt.txt','w');
for n = 2:length(date)
if ((AVI (n) == 144) && (AVI(n-1) == 142))% for loops 159-164
b(n)=1;
if (strmatch(vehid(n), vehid(n-1),'exact'))
a(n)=1;
if (a(n)==1 && b(n) == 1)
tt(n) = time(n)-time(n-1);
if (tt(n) >0 && tt(n) < 1800) %assuming a 10mph min speed
n
fprintf(fid,'%s\t %s\t %f\t %s\t %f\t %f\n', vehid{n}, t{n-1}, time(n-1), t{n}, time(n), tt(n)
);
end
end
end
end
end
fclose(fid);
Optimization
%The program for optimizing three detectors' data. Input is the cumulative flow at three consecutive
%detectors and the output is the corresponding optimized values.
clear
format compact
format short e
%****************************************************************
%* define analytical functions
%* remember to use vectors for g and h if more than one of them
%* and modify code
%**************************************************************
syms f g1 g2 g3 g4 g5 g6 g7 cl1i cl2i cl3i cl1j cl2j cl3j x1 x2 x3 x4 x5 x6 x7
syms gradcl1i gradcl2i gradcl3i gradcl1j gradcl2j gradcl3j
syms gradx1 gradx2 gradx3 gradx4 gradx5 gradx6 gradx7
syms h1 h1cl1i h1cl2i h1cl3i h1cl1j h1cl2j h1cl3j h1x1 h1x2 h1x3 h1x4 h1x5 h1x6 h1x7
syms h2 h2cl1i h2cl2i h2cl3i h2cl1j h2cl2j h2cl3j h2x1 h2x2 h2x3 h2x4 h2x5 h2x6 h2x7
syms h3 h3cl1i h3cl2i h3cl3i h3cl1j h3cl2j h3cl3j h3x1 h3x2 h3x3 h3x4 h3x5 h3x6 h3x7
syms h4 h4cl1i h4cl2i h4cl3i h4cl1j h4cl2j h4cl3j h4x1 h4x2 h4x3 h4x4 h4x5 h4x6 h4x7
syms h5 h5cl1i h5cl2i h5cl3i h5cl1j h5cl2j h5cl3j h5x1 h5x2 h5x3 h5x4 h5x5 h5x6 h5x7
syms h6 h6cl1i h6cl2i h6cl3i h6cl1j h6cl2j h6cl3j h6x1 h6x2 h6x3 h6x4 h6x5 h6x6 h6x7
syms h7 h7cl1i h7cl2i h7cl3i h7cl1j h7cl2j h8cl3j h7x1 h7x2 h7x3 h7x4 h7x5 h7x6 h7x7
% the functions
f = ((cl1j-cl2j)^2 + (cl2j-cl3j)^2);
g1 = cl1j-cl2j;
h1 = g1-x1;
g2 = cl1j-cl2j-500;
h2 = g2 + x2;
g3 = cl2j-cl3j;
h3 = g3-x3;
g4 = cl2j-cl3j-500;
h4 = g4 + x4;
g5 = cl1j - cl1i;
h5 = g5 - x5;
g6 = cl2j - cl2i;
h6 = g6 - x6;
g7 = cl3j - cl3i;
h7 = g7 - x7;
%*****************************************************************
% input the design vector
load 'data.txt'
data1 = data(:,1); %1st column is the L1 cum volume
data2 = data(:,2); % L2 cumu vol
data3 = data(:,3); %L3 cumu vol
count = 0;
count1 = 0;
for n = 1:length(data)
if ( ((data1(n,1)-data2(n,1))>0) && ((data1(n,1)-data2(n,1))<500) && ((data2(n,1)-data3(n,1))>0) && ...
((data2(n,1)-data3(n,1))<500))
status(n) = 1;
else
status(n) = 0;
end
end
check = min(status);
for n = 1:length(data)
if(check == 1)
fprintf('no need for optimization, the data is good:-)\n');
break
end
for(i=1:13)
xn(1,i) = -1;
end
fn(1) = 1;
flag = 1;
itn = 1;
diffF(1) = 1;
threshold = -1e-4;
threshold1 = 1e-4;
while (itn <40 & fn(itn) > threshold1 & min(xn(itn,:)) <threshold)
if (flag == 1 & n==1)
xs = [data1(n) data2(n) data3(n) 0 0 0];
elseif (flag == 1 & n>1)
xs = [data1(n) data2(n) data3(n) optmddata(n-1,1) optmddata(n-1,2) optmddata(n-1,3)];
elseif (flag~=1 & n>1)
xs = [xn(itn,1) xn(itn,2) xn(itn,3) optmddata(n-1,1) optmddata(n-1,2) optmddata(n-1,3)];
else
xs = [xn(itn,1) xn(itn,2) xn(itn,3) 0 0 0];
end
n
itn
if(itn > 1)
if (diffF(itn-1) < threshold1 & diffF(itn) < threshold1)
break
end
end
xs(7) = subs(g1,{cl1j,cl2j},{xs(1),xs(2)});
xs(8) = -subs(g2,{cl1j,cl2j},{xs(1),xs(2)});
xs(9) = subs(g3,{cl2j,cl3j},{xs(2),xs(3)});
xs(10) = -subs(g4,{cl2j,cl3j},{xs(2),xs(3)});
xs(11) = subs(g5,{cl1j,cl1i},{xs(1),xs(4)});
xs(12) = subs(g6,{cl2j,cl2i},{xs(2),xs(5)});
xs(13) = subs(g7,{cl3j,cl3i},{xs(3),xs(6)});
%fprintf('\nThe start design vector [%10.4f %10.4f %10.4f %10.4f %10.4f %10.4f %10.4f %10.4f %10.4f %10.4f %10.4f %10.4f %10.4f ]\n',xs);
% the gradients
gradcl1j = diff(f,cl1j);
gradcl2j = diff(f,cl2j);
gradcl3j = diff(f,cl3j);
h1cl1j = diff(h1,cl1j);
h1cl2j = diff(h1,cl2j);
h1x1 = diff(h1,x1);
h2cl1j = diff(h2,cl1j);
h2cl2j = diff(h2,cl2j);
h2x2 = diff(h2,x2);
h3cl2j = diff(h3,cl2j);
h3cl3j = diff(h3,cl3j);
h3x3 = diff(h3,x3);
h4cl2j = diff(h4,cl2j);
h4cl3j = diff(h4,cl3j);
h4x4 = diff(h4,x4);
h5cl1j = diff(h5,cl1j);
h5cl1i = diff(h5,cl1i);
h5x5 = diff(h5,x5);
h6cl2j = diff(h6,cl2j);
h6cl2i = diff(h6,cl2i);
h6x6 = diff(h6,x6);
h7cl3j = diff(h7,cl3j);
h7cl3i = diff(h7,cl3i);
h7x7 = diff(h7,x7);
% evaluate the function, gradients , and hessian at the current design
fn(1) = double(subs(f,{cl1j,cl2j,cl3j},{xs(1),xs(2),xs(3)}));
g1v = double(subs(g1,{cl1j,cl2j},{xs(1),xs(2)}));
h1v = double(subs(h1,{cl1j,cl2j,x1},{xs(1),xs(2),xs(7)}));
g2v = double(subs(g2,{cl1j,cl2j},{xs(1),xs(2)}));
h2v = double(subs(h2,{cl1j,cl2j,x2},{xs(1),xs(2),xs(8)}));
g3v = double(subs(g3,{cl2j,cl3j},{xs(2),xs(3)}));
h3v = double(subs(h3,{cl2j,cl3j,x3},{xs(2),xs(3),xs(9)}));
g4v = double(subs(g4,{cl2j,cl3j},{xs(2),xs(3)}));
h4v = double(subs(h4,{cl2j,cl3j,x4},{xs(2),xs(3),xs(10)}));
g5v = double(subs(g5,{cl1j,cl1i},{xs(1),xs(4)}));
h5v = double(subs(h5,{cl1j,cl1i,x5},{xs(1),xs(4),xs(11)}));
g6v = double(subs(g6,{cl2j,cl2i},{xs(2),xs(5)}));
h6v = double(subs(h6,{cl2j,cl2i,x6},{xs(2),xs(5),xs(12)}));
g7v = double(subs(g7,{cl3j,cl3i},{xs(3),xs(6)}));
h7v = double(subs(h7,{cl3j,cl3i,x7},{xs(3),xs(6),xs(13)}));
%fprintf('\n start function and constraints(f h1 h2 h3 h4 h5 h6 h7):\n '),disp([fn(1) h1v h2v h3v h4v h5v h6v h7v])
dfcl1j = double(subs(gradcl1j,{cl1j,cl2j,cl3j},{xs(1),xs(2),xs(3)}));
dfcl2j = double(subs(gradcl2j,{cl1j,cl2j,cl3j},{xs(1),xs(2),xs(3)}));
dfcl3j = double(subs(gradcl3j,{cl1j,cl2j,cl3j},{xs(1),xs(2),xs(3)}));
dfcl1i =0;
dfcl2i =0;
dfcl3i =0;
dfx1 = 0;
dfx2 = 0;
dfx3 = 0;
dfx4 = 0;
dfx5 = 0;
dfx6 = 0;
dfx7 = 0;
dh1cl1j = double(subs(h1cl1j,{cl1j,cl2j,cl3j,x1},{xs(1),xs(2),xs(3),xs(7)}));
dh1cl2j = double(subs(h1cl2j,{cl1j,cl2j,cl3j,x1},{xs(1),xs(2),xs(3),xs(7)}));
dh1x1 = double(subs(h1x1,{cl1j,cl2j,cl3j,x1},{xs(1),xs(2),xs(3),xs(7)}));
dh1cl3j =0;
dh1cl1i =0;
dh1cl2i =0;
dh1cl3i =0;
dh1x2 =0;
dh1x3 =0;
dh1x4 =0;
dh1x5 =0;
dh1x6 =0;
dh1x7 =0;
dh2cl1j = double(subs(h2cl1j,{cl1j,cl2j,cl3j,x2},{xs(1),xs(2),xs(3),xs(8)}));
dh2cl2j = double(subs(h2cl2j,{cl1j,cl2j,cl3j,x2},{xs(1),xs(2),xs(3),xs(8)}));
dh2x2 = double(subs(h2x2,{cl1j,cl2j,cl3j,x2},{xs(1),xs(2),xs(3),xs(8)}));
dh2cl3j =0;
dh2cl1i =0;
dh2cl2i =0;
dh2cl3i =0;
dh2x1 =0;
dh2x3 =0;
dh2x4 =0;
dh2x5 =0;
dh2x6 =0;
dh2x7 =0;
dh3cl2j = double(subs(h3cl2j,{cl1j,cl2j,cl3j,x3},{xs(1),xs(2),xs(3),xs(9)}));
dh3cl3j = double(subs(h3cl3j,{cl1j,cl2j,cl3j,x3},{xs(1),xs(2),xs(3),xs(9)}));
dh3x3 = double(subs(h3x3,{cl1j,cl2j,cl3j,x3},{xs(1),xs(2),xs(3),xs(9)}));
dh3cl1j =0;
dh3cl1i =0;
dh3cl2i =0;
dh3cl3i =0;
dh3x2 =0;
dh3x1 =0;
dh3x4 =0;
dh3x5 =0;
dh3x6 =0;
dh3x7 =0;
dh4cl2j = double(subs(h4cl2j,{cl1j,cl2j,cl3j,x4},{xs(1),xs(2),xs(3),xs(10)}));
dh4cl3j = double(subs(h4cl3j,{cl1j,cl2j,cl3j,x4},{xs(1),xs(2),xs(3),xs(10)}));
dh4x4 = double(subs(h4x4,{cl1j,cl2j,cl3j,x4},{xs(1),xs(2),xs(3),xs(10)}));
dh4cl1j =0;
dh4cl1i =0;
dh4cl2i =0;
dh4cl3i =0;
dh4x2 =0;
dh4x3 =0;
dh4x1 =0;
dh4x5 =0;
dh4x6 =0;
dh4x7 =0;
dh5cl1j = double(subs(h5cl1j,{cl1j,cl1i,x5},{xs(1),xs(4),xs(11)}));
dh5cl1i = double(subs(h5cl1i,{cl1j,cl1i,x5},{xs(1),xs(4),xs(11)}));
dh5x5 = double(subs(h5x5,{cl1j,cl1i,x5},{xs(1),xs(4),xs(11)}));
dh5cl2j =0;
dh5cl3j =0;
dh5cl2i =0;
dh5cl3i =0;
dh5x2 =0;
dh5x3 =0;
dh5x4 =0;
dh5x1 =0;
dh5x6 =0;
dh5x7 =0;
dh6cl2j = double(subs(h6cl2j,{cl2j,cl2i,x6},{xs(2),xs(5),xs(12)}));
dh6cl2i = double(subs(h6cl2i,{cl2j,cl2i,x6},{xs(2),xs(5),xs(12)}));
dh6x6 = double(subs(h6x6,{cl2j,cl2i,x6},{xs(2),xs(5),xs(12)}));
dh6cl1j =0;
dh6cl3j =0;
dh6cl1i =0;
dh6cl3i =0;
dh6x2 =0;
dh6x3 =0;
dh6x4 =0;
dh6x5 =0;
dh6x1 =0;
dh6x7 =0;
dh7cl3j = double(subs(h7cl3j,{cl3j,cl3i,x7},{xs(3),xs(6),xs(13)}));
dh7cl3i = double(subs(h7cl3i,{cl3j,cl3i,x7},{xs(3),xs(6),xs(13)}));
dh7x7 = double(subs(h7x7,{cl3j,cl3i,x7},{xs(3),xs(6),xs(13)}));
dh7cl1j =0;
dh7cl2j =0;
dh7cl1i =0;
dh7cl2i =0;
dh7x2 =0;
dh7x3 =0;
dh7x4 =0;
dh7x5 =0;
dh7x6 =0;
dh7x1 =0;
%matrix A and B
A = [dh1cl1j dh1cl2j dh1cl3j dh1cl1i dh1cl2i dh1cl3i;
dh2cl1j dh2cl2j dh2cl3j dh2cl1i dh2cl2i dh2cl3i;
dh3cl1j dh3cl2j dh3cl3j dh3cl1i dh3cl2i dh3cl3i;
dh4cl1j dh4cl2j dh4cl3j dh4cl1i dh4cl2i dh4cl3i;
dh5cl1j dh5cl2j dh5cl3j dh5cl1i dh5cl2i dh5cl3i;
dh6cl1j dh6cl2j dh6cl3j dh6cl1i dh6cl2i dh6cl3i;
dh7cl1j dh7cl2j dh7cl3j dh7cl1i dh7cl2i dh7cl3i];
B = [dh1x1 dh1x2 dh1x3 dh1x4 dh1x5 dh1x6 dh1x7;
dh2x1 dh2x2 dh2x3 dh2x4 dh2x5 dh2x6 dh2x7;
dh3x1 dh3x2 dh3x3 dh3x4 dh3x5 dh3x6 dh3x7;
dh4x1 dh4x2 dh4x3 dh4x4 dh4x5 dh4x6 dh4x7;
dh5x1 dh5x2 dh5x3 dh5x4 dh5x5 dh5x6 dh5x7;
dh6x1 dh6x2 dh6x3 dh6x4 dh6x5 dh6x6 dh6x7;
dh7x1 dh7x2 dh7x3 dh7x4 dh7x5 dh7x6 dh7x7];
C = inv(B)*A;
Gr1 = ([dfcl1j;dfcl2j;dfcl3j;dfcl1i; dfcl2i;dfcl3i] - C'*[dfx1; dfx2; dfx3; dfx4; dfx5; dfx6; dfx7]);
S1 = -Gr1;
alpha = 0;
for jj = 1:3
if (jj < 3)
%string1 = ['\nInput the stepsize for evaluation.\n'] ;
%alpha = input(string1)
alpha = alpha + 0.05;
aa(jj+1) = alpha;
end
%******************
% for a given stepsize - Y calculation
%***************
dz1 = S1*alpha;
xn1 = xs(1) + dz1(1);
xn2 = xs(2) + dz1(2);
xn3 = xs(3) + dz1(3);
xn4 = xs(4);%+ dz1(4);
xn5 = xs(5);%+ dz1(5);
xn6 = xs(6);%+ dz1(6);
dy1 = -C*dz1;
yn1 = xs(7);
yn2 = xs(8);
yn3 = xs(9);
yn4 = xs(10);
yn5 = xs(11);
yn6 = xs(12);
yn7 = xs(13);
for i = 1: 40
yn1 = yn1 + dy1(1);
yn2 = yn2 + dy1(2);
yn3 = yn3 + dy1(3);
yn4 = yn4 + dy1(4);
yn5 = yn5 + dy1(5);
yn6 = yn6 + dy1(6);
yn7 = yn7 + dy1(7);
xxn=[xn1 xn2 xn3 xn4 xn5 xn6 yn1 yn2 yn3 yn4 yn5 yn6 yn7];
h1n = double(subs(h1,{cl1j,cl2j,x1},{xxn(1),xxn(2),xxn(7)}));
h2n = double(subs(h2,{cl1j,cl2j,x2},{xxn(1),xxn(2),xxn(8)}));
h3n = double(subs(h3,{cl2j,cl3j,x3},{xxn(2),xxn(3),xxn(9)}));
h4n = double(subs(h4,{cl2j,cl3j,x4},{xxn(2),xxn(3),xxn(10)}));
h5n = double(subs(h5,{cl1j,cl1i,x5},{xxn(1),xxn(4),xxn(11)}));
h6n = double(subs(h6,{cl2j,cl2i,x6},{xxn(2),xxn(5),xxn(12)}));
h7n = double(subs(h7,{cl3j,cl3i,x7},{xxn(3),xxn(6),xxn(13)}));
hsq = h1n*h1n + h2n*h2n + h3n*h3n + h4n*h4n + h5n*h5n + h6n*h6n + h7n*h7n;
if hsq <= 1.0e-08
break
else
dy1 = inv(B)*[-h1n -h2n -h3n -h4n -h5n -h6n -h7n]';
end
end
%fprintf('\nNo. of iterations of dy for same dz, alpha and constraint error: '),disp(i),disp([alpha hsq]);
%fprintf('\n improved design vector: '),disp(xxn)
fn(itn+1) = double(subs(f,{cl1j,cl2j,cl3j},{xxn(1),xxn(2),xxn(3)}));
%fprintf('\n improved function and constraints (f h1 h2 h3 h4 h5 h6 h7)\n '),disp([fn(itn+1) h1n h2n h3n h4n h5n h6n h7n])
flag = 2;
ff(jj+1)=fn(itn+1);
if (jj == 2)
aa(1) = 0; rhs(1) = fn(1);
amat = [1 0 0; 1 aa(2) aa(2)^2; 1 aa(3) aa(3)^2];
rhs=[fn(1) ff(2) ff(3)]';
xval = inv(amat)*rhs;
alpha = -xval(2)/(2*xval(3));
%alpha = .05; %0.25
end
end % jj loop
%fprintf('\n improved design vector: '),disp(xxn)
%xxn(4) = xs(4);
%xxn(5) = xs(5);
%xxn(6) = xs(6);
xn(itn+1,:) = xxn;
F(1) = fn(1);
F(itn+1) = fn(itn+1);
diffF(itn+1) = abs(F(itn+1) - F(itn));
itn = itn +1;
end % while loop
if (min(xn(itn,:)) < 0)
count1 = count1 + 1;
fid1 = fopen('infeasible1.txt','a');
fprintf(fid1,'%d %d\n', count1,n);
fclose(fid1);
end
%fprintf('\n final design vector: '),disp(xn(itn,:))
%fprintf('\n final function and constraints (f h1 h2 h3 h4 h5 h6)\n '),disp([fn(itn) h1n h2n h3n h4n h5n h6n])
newx(n,:) = xn(itn,:);
for k = 1:3
optmddata(n,k) = newx(n,k);
end
save feb10th_345with50_optmd.txt optmddata -ascii
end % for loop for number of data
%optmddata
%plot(optmddata)
N-D Model
% Program to calculate travel time based on the Nam and Drew model. The input is the flow values at
% consecutive points and the output will be the travel time.
clear;
format compact
format short e
load 'voldataformodel.txt' % read data
vol1 = voldataformodel(:,1); % volume
vol2 = voldataformodel(:,2);
vol3 = voldataformodel(:,3);
delta_x1 = .522; % given that section is ~~ .5 miles
delta_x2 = .417;
delta_t =2; % choose based on delta_x/free_flow_speed relation
%cumu = 4; % 30 sec data to be cumulated to 2 mt data. hence 4 set has to be added
no_in_start1 =-2; % not known. trial and error and choose the best number
no_in_start2 = -4;
for i=1:length(voldataformodel)
%act_vol(i,:)=temp((i-1)*2+1:i*2)'; %given data
act_vol(i,1) = vol1(i);
act_vol(i,2) = vol2(i);
act_vol(i,3) = vol3(i);
end
figure
plot(act_vol) % to check any unreasonable data
title('actual volume')
for i=1:length(voldataformodel)
vol(i,:)= act_vol(i,:);
end
figure
plot(vol)
title('aggregated actual volume')
for i = 1:length(voldataformodel)
q(i,:) = vol(i,:)/delta_t;%number per unit time
end
figure
plot(q)
title('q')
cum_vol(1,1) = q(1,1);
cum_vol(1,2)=(no_in_start1/2) + q(1,2);%cumulated q
cum_vol(1,3) = (no_in_start2/2) + q(1,3);
for i=2:length(voldataformodel)
cum_vol(i,:)=q(i,:) + cum_vol(i-1,:);
end
figure
plot(cum_vol)
title('cumulated q')
for i=1:length(voldataformodel)
Q(i,:) = cum_vol(i,:)*delta_t;%cumulated volume(not per unit time)
end
figure
plot(Q)
title('Q')
for i= 1:length(voldataformodel)
no_in_link1(i,1) = Q(i,1)-Q(i,2);
no_in_link2(i,1) = Q(i,2)-Q(i,3);
density(i,1) = no_in_link1(i,1)/delta_x1;
density(i,2)= no_in_link2(i,1)/delta_x2;
end
figure
plot(density)
title('density')
m(1,1) = Q(1,2);
m(1,2) = Q(1,3);
m_percent(1,1) = m(1,1)/vol(1,1);
m_percent(1,2) = m(1,2)/vol(1,2);
for i=2:length(voldataformodel)
m(i,1)=Q(i,2)-Q(i-1,1);
m(i,2) = Q(i,3) - Q(i-1,2);
m_percent(i,1)= m(i,1)/vol(i,1);
m_percent(i,2)= m(i,2)/vol(i,2);
end
figure
plot(m(:,1))
hold on
plot(m(:,2),'-r')
title('m')
for i =2:length(voldataformodel)
if (m(i,1)>0)
%tt_mts(i,1) = (delta_x1/2)*((q(i,1)*density(i-1,1))+(q(i,2)*density(i,1)))/(q(i,1)*q(i,2)); %drew's original
tt_mts(i,1) = m_percent(i,1)*((delta_x1/2)*((q(i,1)*density(i-1,1))+(q(i,2)*density(i,1))))/(q(i,1)*q(i,2)) + ...
(1-m_percent(i,1))*((delta_x1/2)*((density(i-1,1)+density(i,1))/q(i,2)));
else
tt_mts(i,1) = ((delta_x1/2)*((density(i-1,1)+density(i,1))/q(i,2))); % same eqn can be written as the next line
%tt_mts(i,1) = ((2*(density(i-1,1)*delta_x))+((q(i,1)-q(i,2))*delta_t))/(2*q(i,2));
end
if(m(i,2) >0)
%tt_mts(i,2) = (delta_x2/2)*((q(i,2)*density(i-1,2))+(q(i,3)*density(i,2)))/(q(i,3)*q(i,2)); %drew's original
tt_mts(i,2) = m_percent(i,2)*((delta_x2/2)*((q(i,2)*density(i-1,2))+(q(i,3)*density(i,2))))/(q(i,2)*q(i,3)) + ...
(1-m_percent(i,2))*((delta_x2/2)*((density(i-1,2)+density(i,2))/q(i,3)));
else
tt_mts(i,2) = ((delta_x2/2)*((density(i-1,2)+density(i,2))/q(i,3))); % same eqn can be written as the next line
%tt_mts(i,2) = ((2*(density(i-1,2)*delta_x))+((q(i,2)-q(i,3))*delta_t))/(2*q(i,3));
end
end
% smoothing of the data
alpha1 = 0.3; % can be varied, do trial and error
alpha2=0.3;
tt_smoothed(2,1) = tt_mts(2,1);
tt_smoothed(2,2) = tt_mts(2,2);
for n=3:length(voldataformodel)
tt_smoothed(n,1) = alpha1*tt_mts(n,1)+(1-alpha1)*tt_smoothed(n-1,1);
tt_smoothed(n,2) = alpha2*tt_mts(n,2)+(1-alpha2)*tt_smoothed(n-1,2);
end
%for i =2:length(dataformodel)/cumu,
% tt_t(i,1) = ((delta_x/2)*((q(i,1)*density(i-1,1))+(q(i,2)*density(i,1))))/(q(i,1)*q(i,2));
%end
figure
plot(tt_mts(:,1)*60,'-g')
hold on
plot( tt_mts(:,2)*60,'-b')
title('travel time in seconds')
figure
plot(tt_smoothed(:,1)*60,'-r')
hold on
plot(tt_smoothed(:,2)*60, '-y')
title('smoothed travel time')
save tt_model.txt tt_mts -ascii
save tt_model_smoothed.txt tt_smoothed -ascii
Travel Time Estimation
% Program to calculate travel time based on the model proposed in this dissertation. Input includes the
%flow, speed and density at consecutive points and the travel time will be the output.
clear;
format compact
format short e
load 'voldataformodel.txt' % read data
load 'densitydataformodel.txt'
load 'speeddataformodel.txt' % read data
speed1 = speeddataformodel(:,1); % speed
speed2 = speeddataformodel(:,2);
speed3 = speeddataformodel(:,3);
speed4 = speeddataformodel(:,4);
speed5 = speeddataformodel(:,5);
vol1 = voldataformodel(:,1); % volume
vol2 = voldataformodel(:,2);
vol3 = voldataformodel(:,3);
vol4 = voldataformodel(:,4);
vol5 = voldataformodel(:,5);
density1 = densitydataformodel(:,1);%no_in_link1(i,1)/delta_x;
density2 = densitydataformodel(:,2);%no_in_link2(i,1)/delta_x;
density3 = densitydataformodel(:,3);%no_in_link1(i,1)/delta_x;
density4 = densitydataformodel(:,4);%no_in_link2(i,1)/delta_x;
delta_x1 = .498; % given that section is ~~ .5 miles
delta_x2 = .506; %.47;
delta_x3 = .388;
delta_x4 = .513;
delta_t =2; % choose based on delta_x/free_flow_speed relation
%cumu = 4; % 30 sec data to be cumulated to 2 mt data. hence 4 set has to be added
no_in_start1 =0; % not known. trial and error and choose the best number
no_in_start2 = 0;
no_in_start3 =0; % not known. trial and error and choose the best number
no_in_start4 = 0;
for i=1:length(voldataformodel)
%act_vol(i,:)=temp((i-1)*2+1:i*2)'; %given data
act_vol(i,1) = vol1(i);
act_vol(i,2) = vol2(i);
act_vol(i,3) = vol3(i);
act_vol(i,4) = vol4(i);
act_vol(i,5) = vol5(i);
end
figure
plot(act_vol) % to check any unreasonable data
title('actual volume')
for i=1:length(voldataformodel)
vol(i,:)= act_vol(i,:);
end
figure
plot(vol)
title('aggregated actual volume')
for i = 1:length(voldataformodel)
q(i,:) = vol(i,:)/delta_t;%number per unit time
end
figure
plot(q)
title('q')
cum_vol(1,1) = q(1,1);
cum_vol(1,2)=(no_in_start1/2) + q(1,2);%cumulated q
cum_vol(1,3) = (no_in_start2/2) + q(1,3);
cum_vol(1,4)=(no_in_start3/2) + q(1,4);%cumulated q
cum_vol(1,5) = (no_in_start4/2) + q(1,5);
for i=2:length(voldataformodel)
cum_vol(i,:)=q(i,:) + cum_vol(i-1,:);
end
figure
plot(cum_vol)
title('cumulated q')
for i=1:length(voldataformodel)
Q(i,:) = cum_vol(i,:)*delta_t;%cumulated volume(not per unit time)
end
figure
plot(Q)
title('Q')
for i= 1:length(voldataformodel)
no_in_link1(i,1) = Q(i,1)-Q(i,2);
no_in_link2(i,1) = Q(i,2)-Q(i,3);
no_in_link3(i,1) = Q(i,3)-Q(i,4);
no_in_link4(i,1) = Q(i,4)-Q(i,5);
end
figure
plot(density1, '-b')
hold on
plot(density2,'-r')
hold on
plot(density3, '-g')
hold on
plot(density4, '-y')
title('density')
m(1,1) = Q(1,2);
m(1,2) = Q(1,3);
m(1,3) = Q(1,4);
m(1,4) = Q(1,5);
m_percent(1,1) = m(1,1)/vol(1,1);
m_percent(1,2) = m(1,2)/vol(1,2);
m_percent(1,3) = m(1,3)/vol(1,3);
m_percent(1,4) = m(1,4)/vol(1,4);
for i=2:length(voldataformodel)
m(i,1)=Q(i,2)-Q(i-1,1);
m(i,2) = Q(i,3) - Q(i-1,2);
m(i,3)=Q(i,4)-Q(i-1,3);
m(i,4) = Q(i,5) - Q(i-1,4);
m_percent(i,1)= m(i,1)/vol(i,1);
m_percent(i,2)= m(i,2)/vol(i,2);
m_percent(i,3)= m(i,3)/vol(i,3);
m_percent(i,4)= m(i,4)/vol(i,4);
end
figure
plot(m(:,1))
hold on
plot(m(:,2),'-r')
hold on
plot(m(:,3),'-g')
hold on
plot(m(:,4),'-y')
title('m')
for i =1:length(speeddataformodel)
if (act_vol(i,:)<50)
%tt1_method1(i) = ((delta_x/(2*speed1(i)))+(delta_x/(2*speed2(i))))*3600;
tt_mts(i,1) = delta_x1/((speed1(i)+speed2(i))/2)*60;
%tt1_method3(i) = delta_x/(min(speed1(i),speed2(i)))*3600;
%tt2_method1(i) = ((delta_x/(2*speed2(i)))+(delta_x/(2*speed3(i))))*3600;
tt_mts(i,2) = delta_x2/((speed2(i)+speed3(i))/2)*60;
%tt2_method3(i) = delta_x/(min(speed2(i),speed3(i)))*3600;
tt_mts(i,3) = delta_x3/((speed3(i)+speed4(i))/2)*60;
tt_mts(i,4) = delta_x4/((speed4(i)+speed5(i))/2)*60;
else
if (m(i,1)>0)
tt_mts(i,1) = m_percent(i,1)*((delta_x1/2)*((q(i,1)*density1(i-1,1))+(q(i,2)*density1(i,1))))/(q(i,1)*q(i,2)) + ...
(1-m_percent(i,1))*((delta_x1/2)*((density1(i-1,1)+density1(i,1))/q(i,2)));
else
tt_mts(i,1) = ((delta_x1/2)*((density1(i-1,1)+density1(i,1))/q(i,2))); % same eqn can be written as the next line
%tt_mts(i,1) = ((2*(density(i-1,1)*delta_x))+((q(i,1)-q(i,2))*delta_t))/(2*q(i,2));
end
if(m(i,2) >0)
tt_mts(i,2) = m_percent(i,2)*((delta_x2/2)*((q(i,2)*density2(i-1,1))+(q(i,3)*density2(i,1))))/(q(i,2)*q(i,3)) + ...
(1-m_percent(i,2))*((delta_x2/2)*((density2(i-1,1)+density2(i,1))/q(i,3)));
else
tt_mts(i,2) = ((delta_x2/2)*((density2(i-1,1)+density2(i,1))/q(i,3))); % same eqn can be written as the next line
%tt_mts(i,2) = ((2*(density(i-1,2)*delta_x))+((q(i,2)-q(i,3))*delta_t))/(2*q(i,3));
end
if(m(i,3) >0)
tt_mts(i,3) = m_percent(i,3)*((delta_x3/2)*((q(i,3)*density3(i-1,1))+(q(i,4)*density3(i,1))))/(q(i,3)*q(i,4)) + ...
(1-m_percent(i,3))*((delta_x3/2)*((density3(i-1,1)+density3(i,1))/q(i,4)));
else
tt_mts(i,3) = ((delta_x3/2)*((density3(i-1,1)+density3(i,1))/q(i,4))); % same eqn can be written as the next line
end
if(m(i,4) >0)
tt_mts(i,4) = m_percent(i,4)*((delta_x4/2)*((q(i,4)*density4(i-1,1))+(q(i,5)*density4(i,1))))/(q(i,4)*q(i,5)) + ...
(1-m_percent(i,4))*((delta_x4/2)*((density4(i-1,1)+density4(i,1))/q(i,5)));
else
tt_mts(i,4) = ((delta_x4/2)*((density4(i-1,1)+density4(i,1))/q(i,5))); % same eqn can be written as the next line
end
end
end
% smoothing of the data
alpha1 = 0.3; % can be varied, do trial and error
alpha2 = 0.3;
alpha3 = 0.3; % can be varied, do trial and error
alpha4 = 0.3;
tt_smoothed(1,1) = tt_mts(1,1);
tt_smoothed(1,2) = tt_mts(1,2);
tt_smoothed(1,3) = tt_mts(1,3);
tt_smoothed(1,4) = tt_mts(1,4);
for n=2:length(tt_mts)
if(act_vol(n,:)<50)
tt_smoothed(n,1) = alpha1*tt_mts(n,1)+(1-alpha1)*tt_smoothed(n-1,1);
tt_smoothed(n,2) = alpha2*tt_mts(n,2)+(1-alpha2)*tt_smoothed(n-1,2);
tt_smoothed(n,3) = alpha3*tt_mts(n,3)+(1-alpha3)*tt_smoothed(n-1,3);
tt_smoothed(n,4) = alpha4*tt_mts(n,4)+(1-alpha4)*tt_smoothed(n-1,4);
else
tt_smoothed(n,1) = tt_mts(n,1);
tt_smoothed(n,2) = tt_mts(n,2);
tt_smoothed(n,3) = tt_mts(n,3);
tt_smoothed(n,4) = tt_mts(n,4);
end
end
figure
plot(tt_mts(:,1)*60,'-g')
hold on
plot( tt_mts(:,2)*60,'-b')
hold on
plot(tt_mts(:,3)*60,'-r')
hold on
plot( tt_mts(:,4)*60,'-y')
title('travel time in seconds')
figure
plot(tt_smoothed(:,1)*60,'-r')
hold on
plot(tt_smoothed(:,2)*60, '-y')
hold on
plot(tt_smoothed(:,3)*60,'-b')
hold on
plot(tt_smoothed(:,4)*60, '-g')
title('smoothed travel time')
%save tt_model.txt tt_mts -ascii
%save tt_model_smoothed.txt tt_smoothed -ascii
fid = fopen('tt_frommodel.txt','w');
for n=1:length(tt_mts)
fprintf(fid,'%f\t %f\t %f\t %f\n', tt_smoothed(n,1)*60, tt_smoothed(n,2)*60, tt_smoothed(n,3)*60,
tt_smoothed(n,4)*60 );
end
fclose(fid);
Extrapolation Method
%Different extrapolation methods to calculate travel time. Input the speed values and the travel time will
%be calculated.
clear;
format compact
format short e
load 'speeddataformodel.txt' % read data
speed1 = speeddataformodel(:,1); % speed
speed2 = speeddataformodel(:,2);
speed3 = speeddataformodel(:,3);
speed4 = speeddataformodel(:,4);
speed5 = speeddataformodel(:,5);
delta_x1 = .498; % given that section is ~~ .5 miles
delta_x2 = .506;
delta_x3 = .388; % given that section is ~~ .5 miles
delta_x4 = .513;
delta_t =2; % choose based on delta_x/free_flow_speed relation
%cumu = 4; % 30 sec data to be cumulated to 2 mt data. hence 4 set has to be added
no_in_start = 0; % not known. trial and error and choose the best number
% following are the two methods applied in the field
for(n=1:length(speed1))
tt1_method1(n) = ((delta_x1/(2*speed1(n)))+(delta_x1/(2*speed2(n))))*3600;
tt1_method2(n) = delta_x1/((speed1(n)+speed2(n))/2)*3600;
tt1_method3(n) = delta_x1/(min(speed1(n),speed2(n)))*3600;
tt2_method1(n) = ((delta_x2/(2*speed2(n)))+(delta_x2/(2*speed3(n))))*3600;
tt2_method2(n) = delta_x2/((speed2(n)+speed3(n))/2)*3600;
tt2_method3(n) = delta_x2/(min(speed2(n),speed3(n)))*3600;
tt3_method2(n) = delta_x3/((speed3(n)+speed4(n))/2)*3600;
tt4_method2(n) = delta_x4/((speed4(n)+speed5(n))/2)*3600;
end
figure
plot(tt1_method2, '-g')
title('travel time1 from method2 in seconds')
figure
plot(tt2_method2, '-r')
title('travel time2 from method2 in seconds')
figure
plot(tt3_method2, '-r')
title('travel time3 from method2 in seconds')
figure
plot(tt4_method2, '-b')
title('travel time4 from method2 in seconds')
%figure
%plot(tt2_method2, '-y')
%title('travel time2 from method2 in seconds')
%figure
%plot(tt2_method3, '-y')
%title('travel time2 from method3 in seconds')
fid = fopen('tt_fromspeed.txt','w');
for n=1:length(speed1)
fprintf(fid,'%f\t %f\t %f\t %f\n',tt1_method2(n), tt2_method2(n), tt3_method2(n), tt4_method2(n));
end
fclose(fid);
Travel Time Prediction
Real-time method
% To predict the travel time using the real-time method. Input is the travel time for the previous 5 time
%steps and the travel time up to the next 30 time steps will be calculated.
clear;
real_
load tst.mat
save tst1.mat x y mx mn
load real_res.mat
errl(1)=ers;
errl1(1) = ers1;
for i=2:30,
N=length(y);
y=y(2:N);
x=x(1:N-1,:);
save tst.mat x y mx mn
real_
load real_res.mat
errl(i)=ers;
errl1(i) = ers1;
end
clear x y;
load tst1.mat
save tst.mat x y mx mn
save realres.mat errl errl1
plot(errl)
figure
plot(errl1)
**********************************************************
function real_
load tst.mat
load norm.mat
N=length(y);
ye=x(:,5);
ers=sum(abs(ye-y)./y)*100/(N-1);
actual_ye=((ye-mm)*nx1)+ nx2;
actual_y = ((y-mm)*nx1)+ nx2;
ers1=sum(abs(actual_y-actual_ye)./actual_y)*100/N;
save real_res.mat actual_y actual_ye ers ers1
ANN
% Program to predict travel time using the ANN method. Input the previous 5 time steps' travel time values
%and get the travel time up to 30 time steps ahead.
clear;
nntr_
nntst_
load tst.mat
save tst1.mat x y mx mn
load nnres.mat
ernn(1)=ers;
ernn1(1) = ers1;
for i=2:30,
N=length(y)
x=x(1:N-1,:);
%x(2:end,4)=ye(1:end-1)';
y=y(2:N);
%x=x(2:end,:);
save tst.mat x y mx mn
nntst_
load nnres.mat
ernn(i)=ers;
ernn1(i) = ers1;
end
clear x y;
load tst1.mat
save tst.mat x y mx mn
% cd ..
save nnres.mat ernn ernn1
figure
plot(ernn)
figure
plot(ernn1)
********************************************************************
function nntr_
load tr.mat
size(x);
fcn_init='rands';
mi=round(min(x)*10)/10;
ma=round(max(x)*10)/10;
net = newff([mi' ma'],[10 1],{'logsig' 'purelin'});
net.initFcn='initlay';
net.layers{1}.initFcn='initwb';
net.layers{2}.initFcn='initwb';
for i=1:2,
net.inputWeights{i}.initFcn=fcn_init;
end
for i=1:2,
net.layerWeights{1,i}.initFcn=fcn_init;
end
net.layerWeights{2,1}.initFcn=fcn_init;
net.biases{1}.initFcn=fcn_init;
net.biases{2}.initFcn=fcn_init;
net=init(net);
net.trainFcn='trainlm';
net.trainParam.epochs = 1000;
net.trainParam.mu=1;
%disp(net.trainParam)
%pause
%net.lw{2,1}
net = train(net,x',y');
save nnwt.mat net
*****************************************************
function nntst_
load tst.mat
load norm.mat %data coming from SVMdata
load nnwt.mat
ye = sim(net,x');
aa=ye;
ye=aa';
N=length(y);
ers=sum(abs(y-ye)./y)*100/N
actual_y=((y-mm)*nx1)+ nx2; %mean(testx1)
actual_ye=((ye-mm)*nx1)+ nx2; %mean(testx1)
ers1=sum(abs(actual_y-actual_ye)./actual_y)*100/N
save nnres.mat actual_y actual_ye ers ers1
SVM
% Program to predict travel time using the SVM method. Input the previous time steps' values and get the
%future travel time.
clear;
svmdata
trainsvm
testsvm
load tst.mat
save tst1.mat x y mn mx
load svmres.mat
ersvm(1)=ers;
ersvm1(1)=ers1;
for i=2:30,
i
N=length(y);
x=x(1:N-1,:);
%x(2:end,4)=ye(1:end-1)';
y=y(2:N);
%x=x(2:end,:);
%size(y)
save tst.mat x y mn mx
testsvm
load svmres.mat
ersvm1(i)=ers1;
ersvm(i)=ers;
end
clear x y;
load tst1.mat x y mn mx
%load tst1.mat
save tst.mat x y mn mx
%cd ..
save svmres.mat ersvm ersvm1
figure
plot(ersvm)
figure
plot(ersvm1)
******************************************
function svmdata
load 'train_original.txt'
trainx = cat(1, train_original(:,1));%, train_original(:,2));%, train_original(:,3));
load 'test_original.txt'
testx1 = test_original;
St = 1;
en = length(trainx);
St1 = 1; %500
en1 = length(testx1); %600
nx=max(trainx(St:en))-min(trainx(St:en));
nx1=max(testx1(St1:en1))-min(testx1(St1:en1));
nx2 = mean(testx1(St1:en1));
mm=max(mean(trainx(St:en)),mean(testx1(St1:en1)));
x_norm=((trainx-mean(trainx))/nx)+mm;
x1_norm=((testx1-mean(testx1))/nx1)+mm;
x_final=x_norm(St:en); %normalised input data for training
x1_final=x1_norm(St1:en1); %normalised input data for testing
P=5; % take five numbers as input and the 6th number as the output
Ntr=719;%;%1438 %2157;
%x_final=svdatanorm(trainx,'rbf');
%x1_final=svdatanorm(testx1,'rbf');
%Ntst=length(x)-Ntr-P-1;
count=1;
for i=P+1:Ntr,
for j=1:P
X(count,j)=x_final(i-j);
end
Y(count,1)=x_final(i);
count=count+1;
end
x=X;
y=Y;
mn=min(x);
mx=max(x);
save tr.mat x y mn mx
save norm.mat mm nx1 nx2
% training data
count=1;
Ntst = 719;
for i=P+1:Ntst,
for j=1:P
Xtst(count,j)=x1_final(i-j);
end
Ytst(count,1)=x1_final(i);
count=count+1;
end
x=Xtst;
y=Ytst;
mn=min(x);
mx=max(x);
save tst.mat x y mn mx
********************************************
function testsvm
global C P p1 p2 sep beta nsv bias;
load tr.mat %variable name is x and y
X=x;
Y=y;
load tst.mat %variable name is x and y
load norm.mat %for denormalising the data
load svmresult.mat
C=Inf;%C=500;%
P=1;
e=0.05;%e=0.1;%
ker='erbf';%'rbf';
p1=15;
p2=0;
sep=1;
%save tsdata_svm.mat X Y C P e ker p1 p2 sep
%size(beta)
err=svrerror(X,x,y,ker,beta,bias,'eInsensitive',e)
out=svroutput(X,x,ker,beta,bias);
N=length(y);
ers=sum(abs(y-out)./y)*100/N
actual_out=((out-mm)*nx1)+ nx2;
actual_y = ((y-mm)*nx1)+ nx2;
ers1=sum(abs(actual_y-actual_out)./actual_y)*100/N
save svmres.mat actual_y actual_out ers ers1
****************************************************
function trainsvm
global C P p1 p2 sep nsv beta bias
load tr.mat
C=Inf;%C=500;%
P=1;
e=0.05;%e=0.1;%
ker='erbf';
p1=15;
p2=0;
sep=1;
%save tsdata_svm.mat X Y C P e ker p1 p2 sep
[nsv beta bias] = svr(x,y,ker,C,'eInsensitive',e);
save svmresult.mat nsv beta bias
C Programs for Extracting Simulation Data
/*-------program to get the entry exit details from tsd_text file----*/
#include <stdio.h>
#include <stdlib.h>
#define size 121856
/*................swapping function starts----------*/
void swap(int *x, int *y) /*function for swapping*/
{
int temp;
temp = *x;
*x=*y;
*y=temp;
}
/*----------main program starts-----------*/
int main()
{
FILE *inf = NULL;
FILE *outf = NULL;
FILE *inf1 = NULL;
int time[size],id[size],tempid[size]; /* reading data as 2 one dim arrays*/
int speed[size],tempspeed[size];
int tempcount[size],count[size];
int i,j,k,l,m,counter=0,cum_count=-1;
inf=fopen("datafile","r"); /* datafile name is "datafile" */
outf=fopen("outfile1","w"); /* want the output in the file "outfile"*/
/*fscanf(inf1,"%d",&size);*/ /*specify the size of file*/
for(i=0;i<size;i++)
{
fscanf(inf,"%5d %d %d",&time[i], &id[i], &speed[i]); /*read data*/
}
/*----------swapping and sorting----------*/
for(i=0;i<size;i++)
{
if (i+1 < size && time[i] == time[i+1]) /* guard the final element against reading past the array */
{
tempid[i] = id[i]; /* if time is same save id */
counter ++;
cum_count++;
tempspeed[i] = speed[i];
}
else
{
tempid[i] = id[i];
tempspeed[i]=speed[i];
cum_count++;
for (k=i-counter;k<=i-1;k++)
{
for(l=k+1;l<i+1;l++)
{
if(tempid[k] > tempid[l])
{
swap(&tempid[k],&tempid[l]); /*ascendingly order saved id's*/
swap(&tempspeed[k],&tempspeed[l]);
}
} tempcount[i]=cum_count;
} counter = 0;
}
}
/*--------getting points where the time changes.....*/
count[0]=0; /*count array start at zero*/
m=1;
for(i=0;i<size;i++)
{
if(tempcount[i]!=0) /*points where time changes*/
{
count[m]=tempcount[i]; /*time counts where time changes*/
m++;
}
}
/*--getting exit details ---if a vehicle id is missing in the second time group it exited in the previous time
step------*/
i = 0; /*first time group checked separately*/
for(j=count[i];j<=count[i+1];j++)
{
for(k=count[i+1]+1; k<=count[i+2]; k++)
{
if(tempid[j] > tempid[k]) continue; /*check the next id in the 2nd group*/
else
{
if(tempid[j] == tempid[k]) break; /*vehicle continue in next time*/
else
{
fprintf(outf,"%d\t %d\t %d\t %d\n", time[j],tempid[j],tempspeed[j],0); /*exit is 0*/
break;
}
}
}
}
for(i=1;i<m-2;i++) /*from time 1 to last but one*/
{
for(j=count[i]+1;j<=count[i+1];j++)
{
for(k=count[i+1]+1;k<=count[i+2];k++)
{
if(tempid[j]>tempid[k]) continue;
else
{
if(tempid[j] == tempid[k]) break;
else
{
fprintf(outf,"%d\t %d\t %d\t %d\n", time[j], tempid[j],tempspeed[j],0);
break;
}
}
}
}
}
/*--finding out entries-check with the previous time group and if a new number is there it is an
entry...represented as 1..*/
i=1; /*from group 1, group 1 done separately*/
for(j=count[i]+1; j<=count[i+1]; j++)
{
for(k=count[i-1]; k<count[i]; k++)
{
if(tempid[j] == tempid[k]) break;
else if(tempid[j] > tempid[k]) continue;
}
if(tempid[j]!=tempid[k])
fprintf(outf,"%d\t %d\t %d\t %d\n", time[j],tempid[j],tempspeed[j],1);
}
for(i=2;i<m-1;i++) /*from group 2 to last*/
{
for(j=count[i]+1; j<=count[i+1]; j++)
{
for(k=count[i-1]+1; k<count[i]; k++)
{
if(tempid[j] == tempid[k]) break;
else if (tempid[j] > tempid[k]) continue;
}
if(tempid[j]!=tempid[k])
fprintf(outf,"%d\t %d\t %d\t %d\n", time[j],tempid[j],tempspeed[j],1);
}
}
/* for(j=0;j<m;j++)
fprintf(outf,"count[%d]=%d\n",j,count[j]);
for(i=0;i<size;i++)
fprintf(outf,"%d\t %d\t %d\n",time[i],id[i],tempid[i]); */
fclose(inf1);
fclose(inf);
fclose(outf);
return 0;
}
*****************************************
/*-------program to calculate entry and exit volume and average entry and exit speeds in every one minute
interval obtained from the entryexit.c program---------*/
#include <stdio.h>
#include <stdlib.h>
#define size 8421
int main()
{
FILE *inf = NULL;
FILE *outf = NULL;
int time[size],id[size], speed[size], traveltime[size],status[size];
int entry[size],exit[size];
float entryspeed[size],exitspeed[size];
int i,j,k,l,m;
inf=fopen("outfile1","r"); /* datafile name is "datafile" */
outf=fopen("outfile3","w"); /* want the output in the file "outfile"*/
for(i=0;i<size;i++)
fscanf(inf,"%5d %d %d %d",&time[i], &id[i], &speed[i], &status[i] ); /*read data*/
for(k=0;k<size;k++) /* initialize the accumulators before counting */
{
entry[k]=0; exit[k]=0; entryspeed[k]=0; exitspeed[k]=0;
}
for (k =0 ; k <size ; k++)
{
for(j=0;j<size;j++)
{
if(time[j]>k*60 && time[j]<=(k+1)*60)
{
if(status[j] == 0) { exit[k]++; exitspeed[k]+=speed[j]; }
else {entry[k]++; entryspeed[k]+=speed[j];}
}
}
}
fprintf(outf,"entryvol\t exitvol\t avgentryspeed\t avgexitspeed\n");
for(k=0;k<size;k++)
{
if (exit[k]!=0 || entry[k]!=0)
fprintf(outf,"%d\t\t %d\t\t %f\t %f\n",entry[k]*60,exit[k]*60, (entryspeed[k]/entry[k])*.682,
(exitspeed[k]/exit[k])*.682);
}
fclose(inf);
fclose(outf);
return 0;
}
******************************************
/*---program to get exit time and travel time for each vehicle obtained from the entry exit.c program-----*/
#include <stdio.h>
#include <stdlib.h>
#define size 7493
void swap(int *x, int *y) /*function for swapping*/
{
int temp;
temp = *x;
*x=*y;
*y=temp;
}
int main()
{
FILE *inf = NULL;
FILE *outf = NULL;
int time[size],id[size], speed[size],traveltime[size],status[size];
int i,j,k,l,m;
inf=fopen("outfile1","r"); /* datafile name is "datafile" */
outf=fopen("outfile2","w"); /* want the output in the file "outfile"*/
for(i=0;i<size;i++)
fscanf(inf,"%5d %d %d %d",&time[i], &id[i], &speed[i], &status[i] ); /*read data*/
for (k =0 ; k <size-2 ; k++)
{
for (l = k + 1; l < size; l++)
{
if(id[k] > id[l])
{
swap(&id[k],&id[l]); /*sorting ascending order*/
swap(&time[k],&time[l]);
swap(&speed[k],&speed[l]);
swap(&status[k],&status[l]);
}
}
}
/*for(i=0;i<size;i++)
fprintf(outf,"%d\t %d\t %d\t %d\n",id[i],time[i],speed[i],status[i]);
*/
for(i=0;i<size;i++)
{
if (i+1 < size && id[i] == id[i+1])
{
traveltime[i] = time[i]-time[i+1]; /* if id is same find traveltime */
fprintf(outf,"%d\t %d\t\t %d\n",id[i], time[i],traveltime[i]);
i++;
}
}
fclose(inf);
fclose(outf);
return 0;
}
***********************************
/*-------program to get the density values from tsd_text file----*/
#include <stdio.h>
#include <stdlib.h>
#define size 146614
/*----------main program starts-----------*/
int main()
{
FILE *inf = NULL;
FILE *outf = NULL;
int time[size],id[size],tempid[size]; /* reading data as 2 one dim arrays*/
int speed[size],temptime[size];
int density[size],d,den1, den2;
int i,j,k,l,m,counter=0,cum_count=-1;
inf=fopen("datafile","r"); /* datafile name is "datafile" */
outf=fopen("outfile","w"); /* want the output in the file "outfile"*/
for(i=0;i<size;i++)
{
fscanf(inf,"%5d %d %d",&time[i], &id[i], &speed[i]); /*read data*/
}
d=0;
den1 = 1;
den2 =1;
for(i=0;i<size;i++)
{
if (i+1 < size && time[i] == time[i+1]) /* guard the final element against reading past the array */
{
den1++;
}
else
{
density[d]=den1;
temptime[d] = time[i];
d ++;
den1=1;
den2++;
}
}
for (i=0;i<(den2-1);i++)
{
fprintf(outf,"%d\t %d\n", temptime[i],density[i]);
}
fclose(inf);
fclose(outf);
return 0;
}
*******************************
/*-------program to calculate average density in every one minute interval obtained from the outfile---------
*/
#include <stdio.h>
#include <stdlib.h>
#define size 7033
int main()
{
FILE *inf = NULL;
FILE *outf = NULL;
int time[size], density[size];
float average[size];
int j,k,counter[size],id[size];
inf=fopen("outfile","r"); /* datafile name is "datafile" */
outf=fopen("outfile5","w"); /* want the output in the file "outfile"*/
for(k=0;k<size;k++)
fscanf(inf,"%d %d", &time[k], &density[k] ); /*read data*/
for(k=0;k<size;k++)
{
counter[k]=0;
average[k]=0;
}
for (k =0 ; k <size ; k++)
{
for(j=0;j<size;j++)
{
if(time[j]>k*60 && time[j]<=(k+1)*60)
{
counter[k]++;
average[k] += density[j];
}
}
}
fprintf(outf,"totalden\t count\t averageden\n");
for(k=0;k<size;k++)
{
if(counter[k] !=0)
fprintf(outf,"%f\t\t %d\t %f\n",average[k],counter[k],average[k]/counter[k]);
}
fclose(inf);
fclose(outf);
return 0;
}
*********************
/*-------program to calculate average travel time of the vehicles in every one minute interval obtained
from the traveltime.c program---------*/
#include <stdio.h>
#include <stdlib.h>
#define size 3742
int main()
{
FILE *inf = NULL;
FILE *outf = NULL;
int exittime[size], traveltime[size];
float average[size];
int j,k,counter[size],id[size];
inf=fopen("outfile2","r"); /* datafile name is "datafile" */
outf=fopen("outfile4","w"); /* want the output in the file "outfile"*/
for(k=0;k<size;k++)
fscanf(inf,"%d %d %d", &id,&exittime[k], &traveltime[k] ); /*read data*/
for(k=0;k<size;k++)
{
counter[k]=0;
average[k]=0;
}
for (k =0 ; k <size ; k++)
{
for(j=0;j<size;j++)
{
if(exittime[j]>k*60 && exittime[j]<=(k+1)*60)
{
counter[k]++;
average[k] += traveltime[j];
}
}
}
fprintf(outf,"totaltt\t\t count\t averagett\n");
for(k=0;k<size;k++)
{
if(counter[k] !=0)
fprintf(outf,"%f\t\t %d\t %f\n",average[k],counter[k],average[k]/counter[k]);
}
fclose(inf);
fclose(outf);
return 0;
}
VITA
Lelitha Devi Vanajakshi
Permanent Address
Prabhanilayam, Neerkunnam, Alleppey, Kerala, India 688 005, e-mail: lelitha@yahoo.com
Education
Ph.D., Civil Engineering, Texas A&M University, August 2004
M.Tech., Civil Engineering, Government College of Engg., Trivandrum, India, September 1995
B.Tech., Civil Engineering, Government College of Engg., Trivandrum, India, September 1993
Publications and Presentations
1. Vanajakshi, L. D., and Rilett, L. R. (2004), Loop detector data diagnostics based on
vehicle conservation principle. Accepted for publication in Transportation Research
Record, Transportation Research Board, Washington, D.C.
2. Vanajakshi, L. D. (2003), Loop detector data screening and diagnostics based on
conservation of vehicles. Proceedings of the IGERT Student Research Conference (CD-
ROM), Institute of Transportation Studies, University of California, Davis.
3. Vanajakshi, L. D., and Rilett, L. R. (2004), Some issues in using loop detector data for
ATIS applications. ITS Safety and Security Conference (CD-ROM), Miami, Florida.
4. Vanajakshi, L. D., and Rilett, L. R. (2004), Travel time estimation from loop detector
data. ITS Safety and Security Conference (CD-ROM), Miami, Florida.
5. Vanajakshi, L. D., and Rilett, L. R. (2003), Estimation and prediction of travel time
from loop detector data for intelligent transportation systems applications. Presented at
the TAMUS Pathways Students Research Symposium, Galveston, Texas.
6. Vanajakshi, L. D. (2004), Estimation and prediction of travel time from loop detector
data for intelligent transportation systems applications. Presented at the Ph.D.
Dissertation Seminar of the 83rd TRB Annual Meeting, Washington, D.C.
7. Vanajakshi, L. D., and Rilett, L. R. (2004), A comparison of the performance of
artificial neural networks and support vector machines for the prediction of
vehicle speed. Accepted for IEEE Intelligent Vehicles Symposium, Parma, Italy.