Research Article

Transportation Research Record
1–15
© National Academy of Sciences: Transportation Research Board 2021
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/03611981211027569
journals.sagepub.com/home/trr

Assessment of Crash Occurrence Using Historical Crash Data and a Random Effect Negative Binomial Model: A Case Study for a Rural State

Karla J Diaz-Corro1, Leyla Coronel Moreno1, Suman Mitra1, and Sarah Hernandez1

Abstract
This work identifies factors that influence crash occurrence within a traffic analysis zone (TAZ) by accounting for location-specific effects and serial correlation in longitudinal crash data. This is accomplished by applying a random effect negative binomial (RENB) model. Unlike commonly used count models such as Poisson and negative binomial (NB), RENB accounts for heterogeneity and serial correlation in crash occurrence. An RENB model was applied to 15 years of crash data for 1,817 TAZs in Arkansas. Four models were developed: one for total crashes and one for each severity level (property damage only (PDO), injury, and fatal). RENB-estimated impacts were measured using the incidence rate ratio (IRR). The significant causal factors found to increase observed crashes include: (i) average precipitation (a one-unit increase in average precipitation results in a 134% increase in total monthly crashes for a TAZ); (ii) average wind speed (16%); (iii) urban designation (7%); (iv) traffic volume (2%); and (v) total roadway mileage (1% for each functional class). Snow depth and days of sunshine were found to decrease the number of accidents by 15% and 2%, respectively. Employment and total population had no impact on crash occurrence. Goodness-of-fit comparisons show that RENB provides a better fit than the Poisson and NB formulations. All four model diagnostics confirm the presence of over-dispersion and serial correlation, indicating the necessity of RENB model estimation. The main contribution of this work is the identification of crash causal factors at the TAZ level for longitudinal data, which supports the data-driven performance measurement requirements of recent federal legislation.

Crash occurrence is an increasing public health concern, compromising the well-being of communities and resulting in serious social and economic losses. Globally, more than 1 million deaths and 20–50 million serious injuries are attributed to traffic crash incidents (1). In 2015, an analysis of road crashes per 100,000 population ranked Arkansas, a relatively rural state in the U.S., in the top five states with the highest fatality rate (17.8) (2). According to the Federal Highway Administration (FHWA), the rapidly growing population is increasing drivers on the road each year, which can result in more exposure to traffic incidents (3). In response, traffic safety strategies have been implemented in the U.S. (for example, Vision Zero, safety belt use regulations, and distance-based charges) (4, 5). Policy, operational, and infrastructure solutions for crash mitigation are based on identified risk factors including driver-related factors (i.e., impairment, fatigue, or distractions) and roadway infrastructure (6–8).

While most crash occurrences are in urban areas, the risk factors of crashes are higher in rural areas (9). Roadway design elements, narrow shoulders, and higher speed limits can make rural driving conditions more hazardous. Low population density and geographic isolation of rural areas limit the transferability of the identified crash risk factors from urban to rural areas. Therefore, there is a need for targeted studies of rural areas to identify crash-contributing factors (10).

With U.S. federal legislation requiring performance-based planning, it is increasingly important to be able to estimate safety performance measures, like crash occurrences, at the same spatial resolution used for mobility, accessibility, and other performance measures (11).

1 Department of Civil Engineering, University of Arkansas, Fayetteville, AR

Corresponding Author:
Karla J Diaz-Corro, kjdiazco@uark.edu

While it is more common to develop crash-prediction models for corridors or specific sites, by estimating or predicting crashes at the traffic analysis zone (TAZ) level, it is easier to tie safety performance measures into long-range transportation planning efforts (12–15). Naderan and Shahi included an aggregate macro-level crash prediction model to apply during trip generation in the four-step travel demand model to forecast the number of crashes in urban TAZs (16). The intention of the model is crash prediction and not to identify causal factors necessary for countermeasure selection. The rationale for identifying crash causal factors at the TAZ level is that it can be used for the development of long-range safety plans. This underscores the importance of a method that assesses crash occurrence using specialized approaches for forecasting crash frequencies. A handful of studies focused on identifying risk factors of crash occurrence using historical data at the TAZ level (17–20). For instance, Wang et al. showed that at the macro-level (for example, TAZ and census tract) traffic flow influenced crash occurrence, by using Bayesian models to account for the spatial dependency (20). Peera et al. studied the relationship among land-use characteristics and traffic accidents at the TAZ level using a generalized linear regression model (17). This paper extends this approach by using a model that better captures interdependencies in the data.

Without accurate estimates of crash-influencing factors, transportation planning agencies will be unable to make informed transportation policy decisions to enhance safety. Therefore, the main contribution of this study is to help fill a critical gap in the transportation safety toolkit concerning the study of historical crash data through modeling approaches suitable for identifying crash causal factors at a macro-level (i.e., TAZ level). This paper applies a count model that allows for temporal autocorrelation in historical TAZ-level crash data with over-dispersion. Specifically, a random effect negative binomial (RENB) model is considered. RENB models do not require assumptions of homogeneity, autocorrelation, or statistical dispersion of the data (e.g., mean and variance). Instead, RENB models treat data as a longitudinal panel to account for heterogeneity and serial correlation.

Background

Measures of Safety
Crash occurrence is the most consistently referenced measure of safety for roadway safety analysis and is interpreted as a frequency, for example, the number of crashes divided by a measure of exposure, such as traffic volume (specifically annual average daily traffic [AADT]) passing over a segment of roadway or through an intersection during a specified period. Crash rates are the simplest measure to assess the degree of safety of a roadway segment, intersection, or sites with similar characteristics and traffic volumes (21). Caution on the use of crash rates must be taken because of the non-linear relationship between crash rates and volumes. Thus, it is recommended to employ count-data models to identify crash risk factors (22–25).

Causal Factors Associated with Crash Occurrence
Causal factors of crash incidence are commonly associated with roadway network attributes (for example, lane width, median type, speed limit, and number of lanes), thus helping engineers and other traffic safety professionals design safer roads (12–14). Additionally, crash causal factors can be associated with drivers' sociodemographic characteristics (age, gender, household income levels) to determine risk factors and develop laws and policies (26, 27). Weather conditions (for example, rain, snowfall, and temperature) considered causal to a crash may aid in the specification of lighting, pavements, and other infrastructure or vehicle designs (28, 29). These elements are the most common causal factors identified in the literature, and they can be grouped into broader categories as spatial variables (e.g., land use, population, roadway characteristics) and temporal variables (e.g., weather parameters) (13, 30, 31).

Methods to Identify Causal Factors for Crash Occurrence
The most common methods for estimating crash-contributing factors include multiple linear regression (MLR), multinomial logistic regression, and Poisson regression (32–36). These models estimate coefficients (parameters) for causal factors to identify and compare significant factors. Models differ in their assumptions about the structure of crash data (for example, normal, random, discrete, and non-negative). Assumptions of observed heterogeneity and spatial correlation are often violated by real-world historical crash data (37–39). Li et al. developed a geographically weighted Poisson regression model and identified traffic patterns, road network attributes, and sociodemographic characteristics as causal factors (36). Of these, daily vehicle miles traveled (DVMT) had the strongest influence on crash occurrence, reflecting the general relationship between crashes and exposure. While this method captures the observed heterogeneity that exists in the relationship between crash occurrence and explanatory variables over the geographical extent of the study area, the calibrated model is not spatially transferable, since it does not account for spatial correlation in the dataset. Quddus et al. found that, in London, traffic flow and the resident population
aged 60 or over were serially correlated and thus violated the assumption of Poisson/negative binomial (NB) models for homogeneity over time (37).

NB count models are often used for crash estimation at the TAZ level for data with over-dispersion and autocorrelation (12, 40). Pulugurtha et al. developed a crash estimation model using TAZ-level data for a single period for Charlotte, North Carolina (13). They uniquely included land-use characteristics as a factor along with demographic and socio-economic characteristics like population, number of household units, employment, network elements, and traffic indicators (e.g., trip productions and attractions by zone). Results showed over-dispersion leading to rejection of the use of Poisson distribution models. Land-use characteristics such as mixed-use development, urban residential, single-family residential, multi-family residential, business, and office districts were strongly associated with an increase in estimated TAZ-level crashes. Their main limitation was that the study period consisted of one year, and they were unable to note the impact of changes to the surrounding environment over time. Conversely, in this paper, a longitudinal approach is employed to analyze changes in individual factor levels, to account for changes in TAZ characteristics over time and space.

By introducing model forms like spatial regression in place of Poisson or NB models, limiting assumptions of homogeneity can be addressed. Using spatial regression methods, Mitra used traffic volume (i.e., average daily traffic [ADT]) as exposure data to investigate the influence of traffic on the crash occurrence at signalized intersections (12). A major limitation of their approach was that only exposure was used to define the relationship between crash occurrence and location.

Considering crashes are random events influenced by many space-specific factors, it is critical to include multiple years of data (24, 41, 42). Mannering et al. concluded that estimation techniques providing insights into the effects of time-variant and -invariant elements are necessary in crash models (24). Therefore, this paper uses historical data (i.e., measurements of the same variables over time, or longitudinal data) to consider temporal changes. Inclusion of longitudinal data in crash estimation can strengthen, weaken, or hide the relationship between crash factors and their variation in crash frequency models like Poisson and NB, since longitudinal data is inherently serially correlated (43). The objective of this paper is to outline an approach for using longitudinal data in RENB models for crash estimation in a rural state. This approach allows for the identification of factors that influence crash occurrence while explicitly considering the observed heterogeneity of crash occurrence by location.

Methodology

Spatial Scope
TAZs are defined by geographical boundary and aggregated sociodemographic characteristics. TAZs are the smallest unit of geography used in travel demand forecasting models and are structured to have homogeneous travel and land-use characteristics within a zone (44).

Factor Identification
Pre-processing and suggested sources for each factor are explained in this section and summarized in Table 1.

Dependent Variable (Crash Data). Each crash record should include the geo-location of the accident (latitude-longitude), year, and month. Crashes were aggregated into TAZs using geographical information system (GIS) tools. For this study, all crash types (e.g., severity, pedestrian involved, single car, property damage only [PDO], heavy vehicle) were included. The dependent variable in this study, $y_{itm}$, was the total number of crashes in TAZ (i), in month (m) of year (t). A necessary data feature for this model is that the crash data is provided by exact location (for example, latitude-longitude) and not a general reference to only the road or link on which the crash occurred. In instances when a crash occurs along a roadway that borders two or more TAZ boundaries, the aggregate crash count for the border road was proportionally distributed to the TAZs comprising the border (Figure 1). This is a similar approach to how the ADT was distributed for bordering roads, that is, the ADT was proportionally distributed to all TAZs bordering the road. This approach follows what is carried out in prior research in which crashes are proportionally distributed to TAZs when crashes occur along a roadway that borders a TAZ, or if crash data is not reported to the exact location (19). Uniquely, the data used for the study did not contain any crashes located along roads that bordered TAZs. In many areas, the study region is rural, and roads are not used to define the TAZ boundaries.

Explanatory Variables. As noted in Table 1, weather data was time variant and aggregated monthly; roadway infrastructure was time invariant and aggregated for the 15-year period; roadway usage (ADT, VMT) was time variant and aggregated annually; built environment variables were time variant and aggregated annually.

Weather Data. Weather conditions such as rain, snow, temperature, and precipitation can influence crash occurrence (14, 45). Weather data was collected for TAZs for all 15 years in the study period from the National
Table 1. Model Variables and Summary Statistics

| Variables | Unit | Expected relationship | Source | Time-variant? | Mean | SD | Min. | Max. |
|---|---|---|---|---|---|---|---|---|
| Dependent variable | | | | | | | | |
| Number of crashes per month per TAZ | count | na | ASP | Yes (m) | 1.75 | 2.50 | 0.00 | 24.00 |
| Explanatory variables | | | | | | | | |
| Weather effects | | | | | | | | |
| Average precipitation | inches | + | NOAA | Yes (m) | 0.005 | 0.024 | 0.00 | 0.29 |
| Binary 1: if there is a day with snow depth > 1 in.; 0 otherwise | binary | + | NOAA | Yes (m) | 0.097 | 0.296 | 0.00 | 1.00 |
| Average wind speed | mps | + | NOAA | Yes (m) | 0.047 | 0.128 | 0.00 | 7.04 |
| Average percent of possible sunshine | % | - | NOAA | Yes (m) | 8.18 | 12.40 | 0.00 | 31.00 |
| Average temperature | Celsius | + | NOAA | Yes (m) | 60.84 | 15.01 | 24.90 | 89.60 |
| Minimum temperature | Celsius | + | NOAA | Yes (m) | 71.81 | 15.26 | 33.70 | 105.6 |
| Maximum temperature | Celsius | + | NOAA | Yes (m) | 49.86 | 14.87 | 14.40 | 77.20 |
| Roadway characteristics | | | | | | | | |
| Road length of interstate | mi | + | ARDOT | No | 1.49 | 4.378 | 0.00 | 41.40 |
| Road length of freeways and expressways | mi | + | ARDOT | No | 0.36 | 2.471 | 0.00 | 53.40 |
| Road length of primary arterial | mi | + | ARDOT | No | 2.28 | 5.383 | 0.00 | 51.20 |
| Road length of minor arterial | mi | + | ARDOT | No | 4.49 | 6.787 | 0.00 | 52.30 |
| Road length of major collector | mi | + | ARDOT | No | 10.72 | 19.99 | 0.00 | 173.5 |
| Road length of minor collector | mi | + | ARDOT | No | 4.65 | 11.32 | 0.00 | 93.10 |
| Road length of local road | mi | + | ARDOT | No | 46.96 | 99.69 | 0.00 | 1344 |
| Ln (ADT), in thousands | count | + | ARDOT | Yes (t) | 6.77 | 1.369 | 0.00 | 10.21 |
| Ln (VMT), in hundred thousand | count | + | ARDOT | Yes (t) | 10.04 | 1.746 | 3.61 | 15.03 |
| Built environment | | | | | | | | |
| Employment density, in hundreds | count/mi2 | + | ACS | Yes (t) | 0.591 | 0.699 | 0.00 | 2.75 |
| Total population, in thousands | count | + | ACS | Yes (t) | 1.503 | 3.640 | 0.001 | 48.51 |
| Binary 1: if TAZ is urban | binary | + | ARDOT | No | 0.561 | 0.496 | 0 | 1 |

Note: ACS = American Community Survey (from the U.S. Census); ARDOT = Arkansas Department of Transportation; ASP = Arkansas State Police; Ln (ADT) = natural log of ADT; Ln (VMT) = natural log of VMT; Max. = maximum; Min. = minimum; na = not applicable; NOAA = National Oceanic and Atmospheric Administration; SD = standard deviation; TAZ = traffic analysis zone.
(m) time variant by month.
(t) time variant by year.
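
As a rough illustration of why a plain Poisson model is a doubtful candidate for these counts, the summary statistics for the dependent variable in Table 1 can be turned into a variance-to-mean ratio. The sketch below uses only the reported mean (1.75) and SD (2.50). The function name is hypothetical, and this pooled ratio ignores covariates and the panel structure, so it is a screening check under those assumptions and not a substitute for formal model diagnostics.

```python
def variance_to_mean_ratio(mean: float, sd: float) -> float:
    """Return variance / mean; a Poisson-distributed count has a ratio of 1."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    return sd ** 2 / mean


# Monthly crashes per TAZ, as reported in Table 1 (mean and SD only).
ratio = variance_to_mean_ratio(mean=1.75, sd=2.50)
print(f"variance-to-mean ratio: {ratio:.2f}")
```

A ratio well above 1 suggests that the variance exceeds the mean in the pooled data, which is the pattern that NB-type models are designed to accommodate.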

Figure 1. Example of proportionally distributed traffic exposure diagram.


Note: ADT = average daily traffic.
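
The proportional distribution illustrated in Figure 1 can be sketched in a few lines. The example below is a minimal sketch, assuming that the weights are the lengths of a link inside each TAZ (the text describes a length-based split for links that overlap TAZ boundaries; the weighting used for roads that form a border is not spelled out, so other weights could be substituted). The function name, TAZ labels, and numbers are hypothetical.

```python
from typing import Dict


def split_proportionally(total: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Distribute `total` (e.g., ADT or a crash count) across TAZs by weight."""
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    return {taz: total * w / weight_sum for taz, w in weights.items()}


# Hypothetical link with an ADT of 1,000 vehicles: 3 mi lie in TAZ "A", 1 mi in TAZ "B".
shares = split_proportionally(1000.0, {"A": 3.0, "B": 1.0})
print(shares)
```

Because the shares are normalized by the sum of the weights, the allocated values always add back up to the original total.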

Oceanic and Atmospheric Administration (NOAA)'s National Center for Environmental Information (NCEI) (46). The weather data is a monthly average (aggregate) and does not reflect the weather at the exact time of the crash as would be reported in the crash reports. The goal of the model is to allow aggregate analysis of crash causal factors; thus, it was deemed most appropriate to aggregate weather conditions to the TAZ level, and not the individual crash level.

One of the needs of the proposed model is to have continuous data for all study periods. Since physical weather stations often do not collect data each day of the year because of maintenance periods, gaps in temporal coverage may be observed. In addition, not every TAZ contains a weather station. Thus, the following process is necessary to convert the "raw" weather data from NCEI to monthly averages for each TAZ. For the weather station assignment to TAZ, in this paper, it was assumed that if there is at least one weather station within a TAZ, then the parameters for that weather station can be assigned to be representative of that TAZ. If more than one weather station was contained within a TAZ, then the average weather parameters were calculated and assigned to that TAZ. If there were no weather stations found within a TAZ, then a closest-neighbor analysis was performed to determine which of the neighboring stations was the closest to the centroid of the TAZ, using as the maximum threshold the cut-off distance proposed in Akter et al. (47). Weather can be considered homogeneous within a specified "cut-off" distance of 10–16 mi around the weather station (48).

Roadway Data. Roadway characteristics aggregated to the TAZ include: (i) the total road length by roadway functional class (e.g., interstate, other freeways and expressways, other principal arterials, minor arterials, major arterials, major collector, minor collector, and local road); (ii) average vehicle-miles traveled (VMT); and (iii) ADT. The most current (2017) roadway network database provided by Arkansas Department of Transportation (ARDOT) was used. Historical data on roadway characteristics was not publicly available and, thus, the most recent network file was used. However, within a TAZ, it is assumed that the total mileage by functional class did not change considerably. As evidence, according to the Highway Performance Monitoring System (HPMS) for Arkansas between 2000 and 2014, the total highway mileage increased from 97,600 to 102,595 mi, across the entire state (approximately 5% increase over 53,000 square mile area over a 15-year span). All roadway characteristic data used for the construction of this model was obtained from ARDOT. The road links contained in a TAZ were collected using GIS software (e.g., "sum lines length") to calculate the total length of roadways by functional class within a TAZ in miles. Pulugurtha et al. assumed that road links which overlapped with TAZ boundaries should not be considered in their analysis because there was no approach to attribute the segment to a single TAZ (13). Similar overlapping issues were found in the analysis in this study. Unlike Pulugurtha et al., using GIS software, links overlapping a TAZ boundary were identified, and the total traffic volume was split among the TAZs spanned by the link in proportion to the length of the link within each TAZ (13). ADT and VMT were both collected from ARDOT for the 15-year study period. ADT and VMT were aggregated by year. ADT and VMT were not available at monthly aggregation as they come from annual traffic counts.

Built Environment Data. Built environment data including TAZ designation as urban or rural, population, and employment density can be obtained from the American

Figure 2. Example of disaggregation from census tract to traffic analysis zone (TAZ) level.
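
The disaggregation shown in Figure 2 can be sketched as an area-weighted sum. This is a minimal sketch under the stated assumption that socioeconomic traits are evenly distributed across a census tract. It further assumes that the coverage shares are expressed as the fraction of each tract's area that falls inside the TAZ and that the variable is a count such as population. The function name, tract labels, and values are hypothetical.

```python
from typing import Dict


def taz_count_from_tracts(
    tract_values: Dict[str, float],
    tract_share_in_taz: Dict[str, float],
) -> float:
    """Allocate tract-level counts to one TAZ using area coverage shares."""
    total = 0.0
    for tract, share in tract_share_in_taz.items():
        if not 0.0 <= share <= 1.0:
            raise ValueError(f"coverage share for {tract} must be in [0, 1]")
        total += tract_values[tract] * share
    return total


# Hypothetical TAZ covering 50% of tract "T1" (population 4,000)
# and 25% of tract "T2" (population 2,000).
population = taz_count_from_tracts(
    {"T1": 4000.0, "T2": 2000.0},
    {"T1": 0.5, "T2": 0.25},
)
print(population)
```

For density-type variables a weighted average would be the more natural aggregation; the count form is used here only to keep the example short.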

Community Survey 5-year estimate at the census tract level (50). The built environment (employment, population, and urbanicity) characteristics were included as planning variables. Data on employment and population were aggregated yearly. Data on the built environment was available for all 15 years of the study period.

For Arkansas, within the boundaries of a census tract there are approximately one to five TAZs, and the majority of TAZ boundaries overlap with census tracts. To (dis)aggregate from the census tract to the TAZ level, it was assumed that socioeconomic traits are evenly distributed across the entire census tract. When a census tract intersected with a TAZ, the percent coverage of each census tract within the TAZ was calculated (Figure 2). Then, the percentage coverage was used to associate census-derived variables to the TAZ. The resulting values were the weighted average of the built environment variables aggregated by year.

Built environment characteristics were limited to what could be garnered from public data sources that were consistent across the state. Because land-use data was not available for Arkansas, employment density and population were used as a proxy. Previous studies claim that land-use data can be captured by employment densities, which measure how efficiently land is being used per unit of employment in a geographical area (51).

Modeling Approach
This paper uses an RENB regression model to assess the relationship between the number of accidents within a TAZ and selected explanatory factors. A key contribution of this paper is the use of longitudinal data spanning 15 years, which combines cross-sectional and temporal characteristics at the TAZ level to assess realistic behavioral models that cannot be identified at the site-level. Traditional count-based models such as Poisson and NB, which assume that the number of crashes in a TAZ for period t is independent of other times, provide incorrect estimates in the presence of over-dispersion and serial correlation, respectively. Although the NB model accounts for over-dispersion conditions, it does not allow location-specific effects or serial correlation over time for TAZ-level crash counts. If over-dispersion and serial correlation exist in the data, RENB models are recommended, since group-specific effects (i.e., TAZ-specific effects) are believed to be randomly distributed across locations (52). Depending on how the effect deviates from the mean, and across time, serial correlation can be found to be positive or negative. Because of the heterogeneity of the TAZ group-location constraint, it can be assumed that the number of crashes is related to location-specific effects. Thus, in this study, the RENB model with n group locations (e.g., TAZs) and tm periods (e.g., t years with m months) appears appropriate for analyzing the crash frequencies with over-dispersion, while at the same time accounting for location-specific and temporal effects.

From the parent NB regression model, the RENB can be derived by introducing a random location-specific and time effect. The main benefit of using this approach is that the variance-to-mean ratio is not assumed equal or constant across the group locations since they are random. The relationship between the estimated number of crashes in a TAZ and the covariates of an observation unit i, in year t with month m, can be written as:

$$\hat{\lambda}_{itm} = \lambda_{itm}\,\delta_i \tag{1}$$

where
$\delta_i$ represents the random location-specific effect.
To ensure non-negativity, this can be rewritten as:

$$\hat{\lambda}_{itm} = \lambda_{itm}\,\delta_i = \exp(X_{itm}\beta + \eta_i) \tag{2}$$

where
$\beta$ is the coefficient vector to be estimated;
$\eta_i$ is the random effect across observations; and
$\exp(\eta_i)$ is gamma-distributed with a mean of 1 and variance $\alpha$, where $\alpha$ is the over-dispersion parameter in the NB regression model.

If $\alpha$ is not statistically different from zero, then the NB reduces to the Poisson distribution. On the other hand, if $\alpha$ is significantly different from zero, the variance is greater than the mean, which indicates that the data are over-dispersed.

The formulation of the RENB model for crash occurrence at the TAZ level is as follows: Let $y_{itm}$ be the number of crashes occurring in TAZ i for a given year t with month m. Begin with the model $y_{itm} \mid \gamma_{itm} \sim \text{Poisson}(\gamma_{itm})$, where $\gamma_{itm} \mid \delta_i \sim \text{gamma}(\lambda_{itm}, 1/\delta_i)$ with $\lambda_{itm} = \exp(X_{itm}\beta + \varepsilon_{itm})$ and $\delta_i$ is the dispersion parameter, and $\exp(\varepsilon_{itm})$ expresses the gamma-distributed error with mean 1 and variance $\alpha$. This yields the model:

$$\Pr(Y_{itm} = y_{itm} \mid x_{itm}, \delta_i) = \frac{\Gamma(\lambda_{itm} + y_{itm})}{\Gamma(\lambda_{itm})\,\Gamma(y_{itm} + 1)} \left(\frac{1}{1 + \delta_i}\right)^{\lambda_{itm}} \left(\frac{\delta_i}{1 + \delta_i}\right)^{y_{itm}} \tag{3}$$

Looking at within-TAZ effects only, this specification yields a negative binomial model for the ith TAZ with dispersion (variance divided by the mean) equal to $1 + \delta_i$, that is, constant dispersion within a TAZ.

In addition, to account for the variation of location over time, the term $1/(1 + \delta_i)$ is assumed to follow a beta distribution with distributional parameters (a, b). Using the results from the derivation of Hausman et al., the joint probability of the counts for the ith TAZ is as shown in Equation (4) (52).

$$
\begin{aligned}
\Pr(Y_{i1} = y_{i1}, \ldots, Y_{in_i} = y_{in_i} \mid X_i)
&= \int_0^{\infty} \prod_{k=1}^{n_i} \Pr(Y_{ik} = y_{ik} \mid x_{ik}, \delta_i)\, f(\delta_i)\, d\delta_i \\
&= \frac{\Gamma(a + b)\,\Gamma\!\left(a + \sum_{k=1}^{n_i} \lambda_{ik}\right)\Gamma\!\left(b + \sum_{k=1}^{n_i} y_{ik}\right)}{\Gamma(a)\,\Gamma(b)\,\Gamma\!\left(a + b + \sum_{k=1}^{n_i} \lambda_{ik} + \sum_{k=1}^{n_i} y_{ik}\right)} \prod_{k=1}^{n_i} \frac{\Gamma(\lambda_{ik} + y_{ik})}{\Gamma(\lambda_{ik})\,\Gamma(y_{ik} + 1)}
\end{aligned}
\tag{4}
$$

Here k = 1, ..., $n_i$ indexes the month-year observations (t, m) of TAZ i. For $X_i = (x_{i1}, \ldots, x_{in_i})$, and where f is the probability density function for $\delta_i$, the resulting log-likelihood (LL) is:

$$
\begin{aligned}
\ln L = \sum_{i=1}^{n} w_i \Bigg[ &\ln\Gamma(a + b) + \ln\Gamma\!\left(a + \sum_{k=1}^{n_i} \lambda_{ik}\right) + \ln\Gamma\!\left(b + \sum_{k=1}^{n_i} y_{ik}\right) - \ln\Gamma(a) - \ln\Gamma(b) \\
&- \ln\Gamma\!\left(a + b + \sum_{k=1}^{n_i} \lambda_{ik} + \sum_{k=1}^{n_i} y_{ik}\right) \\
&+ \sum_{k=1}^{n_i} \left\{ \ln\Gamma(\lambda_{ik} + y_{ik}) - \ln\Gamma(\lambda_{ik}) - \ln\Gamma(y_{ik} + 1) \right\} \Bigg]
\end{aligned}
\tag{5}
$$

where
$w_i$ is the weight for the ith TAZ.
The parameters a, b, and the coefficient vector $\beta$ can be estimated using any standard maximum likelihood (ML) algorithm (53).

Case Study: Factors Attributed to Crash Occurrence in Arkansas
The RENB model was applied to Arkansas to identify crash causal factors at the TAZ level. Arkansas is a relatively rural state with a population of 3,017,804 people, 16,481 mi of state-maintained roadway, and 102,594 mi of public roads (54, 55). Arkansas ranks 12th in the U.S. by highway mileage.

A total of 1,817 TAZs are included in the study, after removing 150 TAZs for which weather data was not available. Some of the physical weather stations in the state do not collect data each day of the year because of maintenance periods. Efforts were made to supplement missing weather data for a TAZ with weather data from adjacent TAZs when appropriate. A previous study suggested homogeneous weather distance thresholds for Arkansas of 10–16 mi (47). This means that data from a weather station in a TAZ within 10–16 mi of the TAZ with missing weather data could be substituted for the missing data. The 150 TAZs were removed because the weather stations assigned to these TAZs were missing continuous data, exceeded the 10–16 mi distance threshold, or both. The final dataset consists of 180 months, that is, 15 years, totaling 327,060 observations (Table 1). Crash data was obtained from Arkansas State Police (ASP). In total, 611,318 crashes were reported between January 2000 and December 2014. The dependent variable (i.e., number of crashes) ranged from zero to 883 crashes per TAZ for the entire study period (i.e., 15 years) (Figure 3).

Results and Discussion

Model Diagnostics and Fits
Models were estimated using Stata 16 (56). To ensure that there was no correlation among the independent variables, multicollinearity was checked before developing the model. The variance inflation factor (VIF) was used as a proxy to indicate whether the independent variables were correlated. Among all 13 variables introduced in the model, the largest VIF was 2.27. A VIF greater than 5 is generally interpreted as indicating high correlation (57). The Wald chi-squared for the full random-effect Poisson and RENB model, and the likelihood ratio (LR) chi-squared for the NB are significant, indicating that the overall models are significant in all three cases. The alpha ($\alpha$) parameter in the NB model and the LR test of the $\alpha$ parameter in the Poisson model were significant, which
indicates there was over-dispersion in the dependent variable and, therefore, the Poisson model is not appropriate (Table 2).

Figure 3. Spatial distribution of crashes by traffic analysis zone (TAZ) for 15 years (2000–2014) in Arkansas.

Comparison of the NB and RENB models shows that the RENB model outperformed the NB model in relation to the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and LL. The Breusch and Pagan Lagrangian multiplier test for random effects is significant, illustrating the RENB model's appropriateness for this data set (58). In addition, the beta-distribution parameters a and b were statistically significant in the RENB model, indicating the existence of autocorrelation between multiple observations within a TAZ, where a and b (reported as ln(a) and ln(b)) are the parameters of the Beta (a, b) distribution that the inverse of 1 plus the dispersion is assumed to follow (Table 2). Moreover, the F statistic of the Wooldridge test for serial correlation is significant, indicating the suitability of the RENB model over the standard NB model and fixed-effect model (59). Finally, the likelihood test that compares the panel estimator (RENB model) with the pooled estimator (NB model) was also significant, illustrating the necessity of using the RENB model. Since the RENB model outperforms the random-effects Poisson and NB model in every aspect, the following section discusses the RENB results exclusively.

Parameter Estimates
To facilitate interpretation, the coefficients of the RENB model have been transformed into incidence rate ratios (IRRs) (i.e., $e^{\beta}$ rather than $\beta$). IRR values can be used to interpret the variables. If the IRR of a given variable is less than 1.0, then an increase in value of the variable is associated with a significant improvement in safety, as
IRR is greater than 1.0, an increase in the value of the variable is associated with a significant decline in safety. Otherwise, the variable has no effect on safety (60). An alternative way to interpret the resulting beta coefficients is to examine their sign (positive or negative). For instance, assuming a one-unit increase in the independent variable, if the resulting beta coefficient shows a negative relationship with the dependent variable, then the crash risk decreases; whereas, if the coefficient is positive, then the crash risk is interpreted as increasing. Results showed that 11 variables have a significant positive impact, while two variables have a significant negative impact on the total number of crashes in a TAZ (Figure 4). All the variables in the final model are highly significant in the RENB model (p ≤ 0.01).

Weather Effects. Months with more precipitation and higher wind speed were associated with an increase in the number of crashes in a TAZ, consistent with the findings of previous studies (28). From the computed IRR value, a one-unit increase in the average precipitation and a one-unit increase in average wind speed were associated with increases of 134% (the IRR value for average precipitation is 2.34, i.e., [2.34 - 1.00] * 100 = 134%) and 16% in the total number of monthly crashes, while holding all other variables constant. This was expected, since large amounts of rainfall lead to hydroplaning, and strong winds limit the ability of drivers to control their vehicle.

The finding of an association between snow depth and crash occurrence was unexpected and counterintuitive. For instance, the number of days with snow accumulation greater than 1 in. showed a negative relationship with crash occurrence: the IRR value of 0.85 indicates that a month with a snow day of greater than 1 in. was associated with a 15% reduction in total accidents per month, all other variables being equal. This result may be specific to the weather patterns in Arkansas, where snow accumulation is very rare, and can be attributed to two possible conditions. First, this could be an issue with data quantity (not quality). As a rare event, snow accumulation may alter how drivers behave on the roads, thus influencing their crash likelihood. This counterintuitive result may be an indication of a latent variable capturing driving behavior. Specifically, in Arkansas, where the model framework was tested, there are few days a month (or year) with snow accumulations greater than 1 in. Thus, because of the low frequency of the event, the estimated coefficient may be misleading. Second, since snow conditions are rare in Arkansas, many drivers choose not to make a trip, thus reducing crash exposure. This helps to explain why fewer crashes
measured by the number of crashes. Conversely, if the occur with snow accumulation. Crash counts are not

Table 2. Model Results (Dependent Variable: Number of Crashes per Month per Traffic Analysis Zone [TAZ])

                                                          Coefficient
Variables                                                 RE Poisson    NB            RENB (IRR)

Weather effects
  Average precipitation                                   1.201***      1.979***      0.850 (2.34) ***
  Average wind speed                                      0.106**       2.988***      0.152 (1.16) ***
  Binary 1: if there is a day with snow depth > 1 in.     -0.163***     -0.183***     -0.166 (0.85) ***
  Average percent of possible sunshine                    -0.019***     -0.041***     -0.020 (0.98) ***
Roadway characteristics
  Road length of interstate                               0.003         0.001**       0.007 (1.01) ***
  Road length of freeways and expressways                 -0.011***     -0.013***     0.008 (1.01) ***
  Road length of primary arterial                         0.013**       0.011***      0.006 (1.01) ***
  Road length of minor arterial                           0.01          0.007***      0.007 (1.01) ***
  Road length of major collector                          0.005**       0.004***      0.002 (1.00) ***
  Ln (average daily traffic [ADT])                        0.022         -0.004***     0.021 (1.02) ***
Built environment
  Employment density                                      0.003         -0.064***     0.046 (1.00) ***
  Total population                                        0.001         0.006***      0.003 (1.00) ***
  Binary 1: if TAZ is urban                               0.024         0.060***      0.066 (1.07) ***
Constant                                                  0.479*        0.577***      -0.255***
Model diagnostic statistics
  Wald chi2 (DF1 = 13)                                    431,546***    22,768***     12,447***
  Ln (alpha)                                              na            0.353***      na
  LR test of alpha = 0 chi-squared (DF = 1)               1.1E05***     na            na
  LR test versus pooled (chi-squared)                     na            na            2.5E04***
  Ln a                                                    na            na            2.667***
  Ln b                                                    na            na            3.308***
  a                                                       na            na            14.39***
  b                                                       na            na            27.33***
  Akaike information criterion (AIC)                      1,311,614     1,140,686     1,121,188
  Bayesian information criterion (BIC)                    1,311,775     1,140,846     1,121,359
  Log-likelihood (LL)                                     -655,792      -570,328      -560,578
  Breusch and Pagan Lagrangian multiplier test for        na            na            1,373.11***
    random effects chi-square (01)
  Wooldridge test for serial autocorrelation F (1,1816)   na            na            8,252.17***
  N (number of observations)                              327,060
  i (number of TAZs)                                      1,817
  t (number of years)                                     15
  m (number of months)                                    180

Note: a = beta-distribution parameter a; b = beta-distribution parameter b; DF = degrees of freedom; IRR = incidence rate ratio; Ln a = inverse of 1 plus the beta-distribution parameter a; Ln b = inverse of 1 plus the beta-distribution parameter b; Ln (alpha) = natural log of the over-dispersion parameter; LR = likelihood ratio; na = not applicable; NB = negative binomial; RENB = random effect negative binomial; RE Poisson = random effect Poisson model.
* significance at 10%.
** significance at 5%.
*** significance at 1%.
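
The IRR values in the RENB column of Table 2 follow directly from the coefficients (IRR = e^β), and the percentage changes quoted in the text are (IRR - 1) * 100. A minimal sketch of that arithmetic for a subset of the Table 2 coefficients is given below; the dictionary keys and the helper name `irr_and_percent_change` are illustrative labels, not names used in the study.

```python
import math

# RENB coefficients copied from Table 2 (illustrative subset; keys are hypothetical labels).
renb_coefficients = {
    "average_precipitation": 0.850,
    "average_wind_speed": 0.152,
    "snow_depth_over_1in": -0.166,
    "percent_possible_sunshine": -0.020,
    "urban_taz": 0.066,
}


def irr_and_percent_change(beta):
    """Return (IRR, percent change in expected monthly crashes) for one coefficient."""
    irr = math.exp(beta)              # incidence rate ratio, e^beta
    return irr, (irr - 1.0) * 100.0   # e.g. IRR 2.34 corresponds to about +134%


for name, beta in renb_coefficients.items():
    irr, pct = irr_and_percent_change(beta)
    print(f"{name:28s} IRR = {irr:.2f}  change = {pct:+.0f}%")
```

Rounded to two decimals, these reproduce the IRRs reported for the five variables shown; the interpretation of a "one-unit" change still depends on the units of each variable, which the table does not list.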

necessarily higher in snowy weather when compared with dry weather as indicated in previous studies (60–63).

The IRR value for the variable associated with percentage of possible sunshine indicates that a 1% increase in the maximum amount of sunshine possible from sunrise to sunset with clear sky conditions was associated with a 2% reduction in total accidents per month, indicating that, under clear sky conditions, drivers have better visibility. Note that this variable does not indicate sun glare, which could lead to a different interpretation in relation to crash occurrence. Average temperature was not found to be significant in the model, consistent with the findings of previous studies (64).

Roadway Characteristics Effects. The statistically significant variables in the model for roadway characteristics were total length of interstate, freeways and expressways, primary arterial, minor arterial, major collector, and ADT. The results showed a positive relationship between the number of crashes and the total length of the roadway by functional class in a TAZ; however, the magnitude of this impact is very low, around 1%. The highest effect among roadway variables was found for ADT; for example, the IRR value indicates that a 1% increase in annual ADT was associated with a 2% increase in monthly accidents. The positive coefficient for all variables representing roadway characteristics was consistent with expectations
since traffic exposure (e.g., volume) is associated with higher crash occurrence (12). The model results also found a positive relationship between VMT and crash occurrence. Yet, VMT caused multicollinearity when included in the model alongside ADT.

Since the model aims to evaluate crash occurrence at the TAZ level, and not the roadway segment level, explanatory variables that represented aggregate exposure and roadway network characteristics within a TAZ were prioritized. Thus, the decision to use ADT over VMT as the traffic exposure variable allows for separate use of roadway length by functional class and ADT within the set of explanatory variables. Note that ADT data is more likely to be missing or not observed, so treating it separately from roadway length allows use of roadway length even when ADT is missing. Results showed that both ADT and roadway length, as represented in the model, influence crash occurrences (Table 2).

Built Environment Characteristics Effect. Total population and employment density were statistically significant and had positive signs. This indicates that TAZs with higher population and employment density were associated with a higher number of monthly crashes; however, the magnitude was low (IRR = 1.00 for both variables). Population and employment density were associated with a higher risk of pedestrian collisions (65).

The binary variable for the TAZ designation as urban was positive and statistically significant, indicating that urban TAZs were linked to a higher number of crashes. A TAZ with urban characteristics was associated with 7% more crashes than its rural counterparts. This can be attributed to urban area travel patterns. In urban areas, non-motorized modes (walking, biking) can be used to access activities and jobs, possibly increasing exposure of vulnerable populations. The diversified activities and services offered in urban areas may attract people from external/surrounding areas who may be unfamiliar with driving conditions and navigation for denser urban networks. Moreover, there tend to be more intersections and road segments with strong street compactness and mixed land use leading to more exposure to accidents (66).

Figure 4. Percentage change in number of monthly crashes (based on incidence rate ratio [IRR]).
Note: ADT = average daily traffic; TAZ = traffic analysis zone.

Models for Crash Severity Type

It has been suggested in previous literature that factors vary by crash severity when predicting crash occurrence (38, 67). This work extends the RENB for total crashes by crash severity to understand crash causal factors. Three models were developed: (i) fatal; (ii) injury only; and (iii) PDO. The same methodology used for the total crash model was applied. The Wald chi-squared statistics for all three RENB models are significant, indicating that the overall models are significant in all three cases (Table 3). As with the total crash frequency model, all other model diagnostics for the crash severity models supported the necessity of using the RENB model. Figure 5 shows the coefficients of the RENB models by crash severity type, transformed into IRRs.

The coefficient relationship (positive, negative) with the dependent variable for all weather effects is consistent across the models. This implies that weather effects have the same direction of relationship to the number of crashes regardless of severity. In relation to the magnitude of impact, average precipitation is associated with a larger increase in fatal crashes (190%) than in PDO (104%) or injury (62%) crashes, whereas average wind speed is associated with a larger increase in PDO crashes (19%) than in fatal (12%) or injury (5%) crashes. As in the total crash frequency model, a month with a snow day of greater than 1 in. was associated with a 26% reduction in fatal accidents per month, 16% in injury, and 15% in PDO crashes, all other variables being equal.

The coefficient relationship for roadway characteristics differs among the models. The most significant change is found for PDO crashes, where the roadway length by functional class is negatively related to the number of crashes of this severity type. This implies that exposure to roadways of this type in a TAZ might contribute to a small decrease in the number of PDO crashes, except if it is a collector road. This makes sense, because property damage is less likely to happen at higher functional classes (interstate, freeways, etc.) than at a more local level. For fatal crashes, the highest effect among roadway variables was found for interstates; for example, the IRR value indicates that a 1% increase in road length for this functional class was associated with a 9% increase in monthly accidents. The highest effect among roadway variables for injury crashes was found for ADT; for example, the IRR value indicates that a 1% increase in annual ADT was associated with a 4% increase in monthly injury accidents.

Table 3. Random Effect Negative Binomial (RENB) Model Results Comparison by Crash Severity Type

                                                          RENB model results (IRR)
Variables                                                 Injury crashes      PDO crashes         Fatal crashes

Weather effects
  Average precipitation                                   0.480 (1.62) ***    0.714 (2.04) ***    1.065 (2.90) **
  Average wind speed                                      0.050 (1.05) ***    0.172 (1.19) ***    0.111 (1.12) *
  Binary 1: if there is a day with snow depth > 1 in.     -0.181 (0.84) ***   -0.159 (0.85) ***   -0.30 (0.74) ***
  Average percent of possible sunshine                    -0.013 (0.99) ***   -0.020 (0.98) ***   -0.014 (0.99) ***
Roadway characteristics
  Road length of interstate                               0.007 (1.01) **     -0.024 (0.98) ***   0.088 (1.09) ***
  Road length of freeways and expressways                 0.001 (1.00)        -0.020 (0.98) ***   0.068 (1.07) ***
  Road length of primary arterial                         0.008 (1.01) ***    -0.012 (0.99) ***   0.075 (1.08) ***
  Road length of minor arterial                           0.012 (1.00) ***    -0.007 (0.99) ***   0.056 (1.06) ***
  Road length of major collector                          0.003 (1.02) ***    0.001 (1.00) **     0.015 (1.02) ***
  Ln (ADT)                                                -0.036 (1.04) ***   0.017 (1.02) ***    -0.003 (1.00)
Built environment
  Employment density                                      -0.014 (0.99)       0.077 (1.08) ***    -0.279 (0.76) ***
  Total population                                        0.003 (1.00)        0.000 (1.00)        0.003 (1.00)
  Binary 1: if TAZ is urban                               0.046 (1.04)        -0.080 (0.92) ***   -0.218 (0.80) ***
Constant                                                  0.339***            -0.327***           -0.254
Model diagnostic statistics
  Wald chi2 (DF1 = 13)                                    1,976***            10,782***           1,167***
  LR test versus pooled (chi-squared)                     5.9E04***           2.6E04***           1.04E3***
  Ln a                                                    1.217***            2.062***            5.972***
  Ln b                                                    -0.787***           2.784***            0.175**
  a                                                       3.38***             7.86***             392.13***
  b                                                       0.455***            16.18***            1.19**
  Akaike information criterion (AIC)                      354,218             1,006,690           34,052
  Bayesian information criterion (BIC)                    354,389             1,006,861           34,223
  Log-likelihood (LL)                                     -177,093            -503,329            -17,010
  Breusch and Pagan Lagrangian multiplier test for        1.1E+06***          9.2E+05***          1,373.11***
    random effects chi-square (01)
  Wooldridge test for serial autocorrelation F (1,1816)   407.971***          3,553.41***         3.354*
  N (number of observations)                              327,060             327,060             327,060
  i (number of TAZs)                                      1,817               1,817               1,817
  t (number of years)                                     15                  15                  15
  m (number of months)                                    180                 180                 180

Note: a = beta-distribution parameter a; b = beta-distribution parameter b; DF = degrees of freedom; IRR = incidence rate ratio; Ln a = inverse of 1 plus the beta-distribution parameter a; Ln b = inverse of 1 plus the beta-distribution parameter b; Ln (alpha) = natural log of the over-dispersion parameter alpha; LR = likelihood ratio; PDO = property damage only; TAZ = traffic analysis zone.
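
The AIC and BIC rows of Table 3 can be cross-checked against the reported log-likelihoods using the standard definitions AIC = 2k - 2LL and BIC = k ln(N) - 2LL. The sketch below is only an arithmetic check under stated assumptions: it takes k = 16 estimated parameters per model (13 covariates, a constant, and the two beta-distribution parameters), which is an inference from the table layout rather than a figure stated in the paper, and it enters the log-likelihoods as negative values. The function and variable names are illustrative.

```python
import math

N_OBS = 327_060   # observations per model, from Table 3
K_PARAMS = 16     # ASSUMPTION: 13 covariates + constant + beta parameters a and b

# (log-likelihood, reported AIC, reported BIC) for each severity model, from Table 3.
models = {
    "injury": (-177_093, 354_218, 354_389),
    "pdo": (-503_329, 1_006_690, 1_006_861),
    "fatal": (-17_010, 34_052, 34_223),
}


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2*LL."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion: k*ln(n) - 2*LL."""
    return k * math.log(n) - 2 * log_lik


for name, (ll, aic_reported, bic_reported) in models.items():
    print(name,
          "AIC computed/reported:", aic(ll, K_PARAMS), aic_reported,
          "BIC computed/reported:", round(bic(ll, K_PARAMS, N_OBS)), bic_reported)
```

Under these assumptions the computed values agree with the reported AIC and BIC to the nearest integer for all three severity models. Lower AIC and BIC indicate better fit only when models are compared on the same dependent variable, so these figures are not comparable across severity types.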

Lastly, the coefficients for the built environment show a differing effect by crash severity. For injury crashes, none of the variables for the built environment were found statistically significant. For fatal crashes, employment density was associated with a 24% reduction in the number of fatal accidents, whereas a TAZ with urban characteristics was associated with a 20% reduction compared with its rural counterparts. For PDO, employment density was associated with an 8% increase in the number of PDO accidents, whereas a TAZ with urban characteristics was associated with an 8% reduction compared with its rural counterparts.

Sensitivity Analysis

A key assumption in the model framework was the designation of weather conditions to a TAZ. The study used data from on-the-ground weather stations in Arkansas supplied by NOAA. Other forms of weather data are available. For example, gridded weather data is available from other sources (NASA’s Modern-Era Retrospective analysis for Research and Applications, Version 2 [MERRA-2]). Grid weather data could be an alternate source. A sensitivity analysis to determine the trade-off in accuracy of the station-based weather with increasing distance from the weather station was carried out. For Arkansas, the average distance between a weather station and a TAZ centroid was 22 mi. The MERRA-2 satellite weather data was available for grids of approximately 30 mi (50 km). This is a slightly lower resolution than the weather station data; thus, on-the-ground weather station data was used in this paper. To incorporate this data, the following pre-processing method was performed. When two or more weather
stations were in a TAZ, their parameters were averaged. When there were no weather stations in a TAZ, the closest within a specified threshold (65 mi for Arkansas) was used (47). For the TAZs in this study, the maximum distance from a weather station to a TAZ was 22 mi, thus the threshold was not invoked.

Figure 5. Percentage change in number of monthly crashes by severity type (based on incidence rate ratio [IRR]).

Conclusions

This study was uniquely able to identify causal factors that are associated with the number of crashes originating at the TAZ level in a relatively rural state. For crash analyses, count models like Poisson and NB models may be inappropriate for estimating causal factors as they do not account for dispersion or consider location-specific effects and serial correlation. To overcome these limiting assumptions, the RENB model was introduced and applied. The RENB model was applied to a dataset of 15 years of data, aggregated by month. Weather, roadway, and built-environment factors were gathered from NOAA, state databases, and the U.S. Census and merged.

The significant causal factors found to contribute to increases in observed crashes include, in order of IRR-estimated magnitude: (i) average precipitation (a one-unit increase in average precipitation results in a 134% increase in total monthly crashes for a TAZ); (ii) average wind speed (16%); (iii) urban designation (7%); (iv) traffic volume (2%); and (v) total roadway mileage (1% for each functional class). These findings are indicative of the need to implement real-time traffic management systems, such as changeable message signs broadcasting forecasted adverse weather conditions, to alert drivers and lessen the probability of crash occurrence. Snow depth and days of sunshine were found to decrease the number of accidents by 15% and 2%, respectively. Considering the model was applied to Arkansas with its unique weather patterns, applying the same approach to another region can produce different results. It is worthwhile to be able to compare the significant but counterintuitive results with other regions. Employment and total population had no practical impact on crash occurrence (IRR = 1.00). The direction of the effect (positive or negative) of these estimates was in line with prior studies, although the magnitudes differ, since the RENB accounts for inherent over-dispersion and serial correlation. Prior studies suggest that the most consistently significant and influential factors are precipitation, traffic exposure (e.g., ADT), roadway geometric characteristics, and safety culture characteristics. Goodness-of-fit comparisons referencing AIC, BIC, and LL showed that RENB provides the best fit among Poisson and NB formulations. Model diagnostics confirm the presence of over-dispersion and serial correlation in the monthly crash occurrence data, indicating the necessity of RENB model estimation.

Extensions of the RENB model developed for this paper include disaggregation of crash types (severity and type) and further temporal synchronization of weather, roadway, and built environment variables. Distinction of crashes by type would allow further identification of causal factors that may be crash-specific. While in this paper employment density was used as a proxy for land uses, as land-use data was not available, future work should consider the inclusion of land-use mix to capture built environment influences on crash occurrence and type. Another limitation is that the roadway network data was time invariant, although the most up-to-date highway network map was used. If roadway data was available for multiple time periods, it could be synchronously matched to weather data. In relation to data, several disaggregations could improve insights derived from the model. First, employment data could be broken into more categories. Different employment types might differently influence land use and thus accident occurrence. Second, accident severity and type could be disaggregated. With the increasing availability of detailed digital GIS land-use data, trip productions and attractions from forecasted travel demand models, such extensions may become more feasible for urban and metropolitan territories.

One concern about the findings of this model is that over the 15 years, several infrastructure and systemic changes could have taken place. Although the RENB model treats the data as a time series (e.g., monthly), variables related to roadway characteristics were time-invariant (did not change over time). Crash countermeasures and roadway infrastructure projects can theoretically affect crash occurrence. For this study, data on changes in infrastructure over time were not available. This is a potential limitation of the study, since changes in
roadway infrastructure (added lanes, expanded shoulders, etc.) likely occurred over time. It is recommended that future work incorporate infrastructure changes over time if data to do so are available. Likewise, ADT and VMT data were collected yearly and not monthly. Future studies can use monthly traffic counts from ITS systems or cellphone data, as these technologies are becoming increasingly common.

In the absence of continuous weather data, future studies should investigate temporal characteristics related to weather that may capture seasonal variation in crashes. For example, the percent of crashes in a month aggregated by day of the week (e.g., beginning of the week, weekday, end of the week, weekend), type of day (e.g., holiday, working day), or season (e.g., winter, spring, summer, fall) could be used as indicators of crash occurrence without the requirement of detailed weather data. Additionally, when physical weather station data are not available, it is suggested that other forms of weather data, such as NASA’s MERRA-2, are used as an alternative source. For Arkansas, this alternative source did not make a difference because the level of resolution was larger than the maximum distance of the on-the-ground weather data used in this paper.

Estimating crash causal factors at the TAZ level can inform several policy and planning decisions concerning safety performance. Considering that federal legislation in the U.S. requires performance-based planning, it is necessary to analyze safety at spatial levels of resolution that match those generated for mobility, accessibility, and other performance measures. The model framework presented in this paper identifies if there are weather, roadway, and built environment characteristics that can be associated with crash occurrences. The results from the model framework can be implemented by state transportation agencies to prioritize safety-related projects. For example, if crashes are frequently occurring on roadway types with extreme weather conditions (e.g., severe rainfall or precipitation producing hydroplaning), this combination of characteristics can be considered causal factors. Then, locations (e.g., TAZs) can be prioritized for implementing low-cost safety treatments such as signage, surface treatments, and drainage improvements. The importance of identifying the causal factors for transportation planning organizations is that there is a requirement for performance-based planning at the same spatial resolution that matches other performance metrics related to mobility (travel times), accessibility, and so forth. Thus, the results of this model can improve the ways in which state transportation agencies prioritize safety-related projects.

Acknowledgments

The authors thank Arkansas State Police (ASP) Highway Safety Office for providing the crash data used in this paper and Arkansas Department of Transportation (ARDOT) for providing the roadway characteristics data.

Author Contributions

The authors confirm contribution to the paper as follows: study conception and design: S. Mitra, K. Diaz-Corro; data collection: K. Diaz-Corro, L. Coronel; analysis and interpretation of results: K. Diaz-Corro, S. Mitra, S. Hernandez; draft manuscript preparation: K. Diaz-Corro, S. Hernandez, S. Mitra. All authors reviewed the results and approved the final version of the manuscript.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Karla J Diaz-Corro https://orcid.org/0000-0002-1936-4547
Leyla Coronel Moreno https://orcid.org/0000-0002-4299-7334
Suman Mitra https://orcid.org/0000-0002-7776-5779
Sarah Hernandez https://orcid.org/0000-0002-4243-1461

References

1. World Health Organization, Department of Violence & Injury Prevention & Disability. Global Status Report on Road Safety: Time for Action. World Health Organization, Geneva, 2009.
2. Sivak, M., and B. Schoettle. Mortality from Road Crashes in the Individual US States: A Comparison with Leading Causes of Death in 2015. The University of Michigan, Sustainable Worldwide Transportation, Ann Arbor, MI, 2018, pp. 1–36.
3. Federal Highway Administration (US). Highway Statistics 2004. Federal Highway Administration, Washington, DC, 2006.
4. Yang, D., E. Kastrouni, and L. Zhang. Equitable and Progressive Distance-Based User Charges Design and Evaluation of Income-Based Mileage Fees in Maryland. Transport Policy, Vol. 47, 2016, pp. 169–77.
5. Evenson, K. R., S. LaJeunesse, and S. Heiny. Awareness of Vision Zero Among United States’ Road Safety Professionals. Injury Epidemiology, Vol. 5, No. 1, 2018, pp. 1–6.
6. Dingus, T. A., F. Guo, S. Lee, J. F. Antin, M. Perez, M. Buchanan-King, and J. Hankey. Driver Crash Risk Factors and Prevalence Evaluation Using Naturalistic Driving Data. Proceedings of the National Academy of Sciences of the United States of America, Vol. 113, No. 10, 2016, pp. 2636–2641.
7. Papadimitriou, E., A. Filtness, A. Theofilatos, A. Ziakopoulos, C. Quigley, and G. Yannis. Review and Ranking of Crash Risk Factors Related to the Road Infrastructure. Accident Analysis & Prevention, Vol. 125, 2019, pp. 85–97.
8. Anastasopoulos, P. C., and F. L. Mannering. A Note on Modeling Vehicle Accident Frequencies With Random-Parameters Count Models. Accident Analysis & Prevention, Vol. 41, No. 1, 2009, pp. 153–159.
9. Rakauskas, M. E., N. J. Ward, and S. G. Gerberich. Identification of Differences Between Rural and Urban Safety Cultures. Accident Analysis & Prevention, Vol. 41, No. 5, 2009, pp. 931–937.
10. Ratcliffe, M., C. Burd, K. Holder, and A. Fields. Defining rural at the US Census Bureau. American Community Survey and Geography Brief. 2016.
11. Fixing America’s Surface Transportation Act. In 114th Congress of the United States of America, Vol. 6, January, 2015.
12. Mitra, S. Spatial Autocorrelation and Bayesian Spatial Statistical Method for Analyzing Intersections Prone to Injury Crashes. Transportation Research Record: Journal of the Transportation Research Board, 2009. 2136: 92–100.
13. Pulugurtha, S. S., V. R. Duddu, and Y. Kotagiri. Traffic Analysis Zone Level Crash Estimation Models Based on Land Use Characteristics. Accident Analysis & Prevention, Vol. 15, 2013, pp. 678–687.
14. Yu, R., Y. Xiong, and M. Abdel-Aty. A Correlated Random Parameter Approach to Investigate the Effects of Weather Conditions on Crash Risk for a Mountainous Freeway. Transportation Research Part C: Emerging Technologies, Vol. 50, 2015, pp. 68–77.
15. Washington, S., I. Van Schalkwyk, S. Mitra, M. Meyer, E. Dumbaugh and M. Zoll. Incorporating Safety into Long-Range Transportation Planning. NCHRP Report 546. Transportation Research Board of the National Academies, Washington, D.C., 2006.
16. Naderan, A., and J. Shahi. Crash Generation Models: Forecasting Crashes in Urban Areas. Transportation Research Record: Journal of the Transportation Research Board, 2010. 2148: 101–106.
17. Peera, K. M., R. S. Shekhawat, and C. S. Prasad. Traffic Analysis Zone Level Road Traffic Accident Prediction Models Based on Land Use Characteristics. International Journal for Traffic and Transport Engineering (Belgrade), Vol. 9, No. 4, 2019, pp. 376–386.
18. Mukoko, K. K., and S. S. Pulugurtha. Examining the Influence of Network, Land Use, and Demographic Characteristics to Estimate the Number of Bicycle-Vehicle Crashes on Urban Roads. IATSS Research, Vol. 44, No. 1, 2020, pp. 8–16.
19. Zhang, C., X. Yan, L. Ma, and M. An. Crash Prediction and Risk Evaluation Based on Traffic Analysis Zones. Mathematical Problems in Engineering, Vol. 2014, 2014, pp. 1–9.
20. Wang, C., L. Liu, and C. Xu. Developing a New Spatial Unit for Macroscopic Safety Evaluation Based on Traffic Density Homogeneity. Journal of Advanced Transportation, Vol. 2020, 2020, pp. 1–9.
21. Carter, D., D. Gelinne, B. Kirley, C. Sundstrom, R. Srinivasan, and J. Palcher-Silliman. Road Safety Fundamentals: Concepts, Strategies, and Practices that Reduce Fatalities and Injuries on the Road. Federal Highway Administration, United States, 2017.
22. Roshandeh, A. M., B. R. Agbelie, and Y. Lee. Statistical Modeling of Total Crash Frequency at Highway Intersections. Journal of Traffic and Transportation Engineering (English edition), Vol. 3, No. 2, 2016, pp. 166–171.
23. Coruh, E., A. Bilgic, and A. Tortum. Accident Analysis with Aggregated Data: The Random Parameters Negative Binomial Panel Count Data Model. Analytic Methods in Accident Research, Vol. 7, 2015, pp. 37–49.
24. Mannering, F. L., V. Shankar, and C. R. Bhat. Unobserved Heterogeneity and the Statistical Analysis of Highway Accident Data. Analytic Methods in Accident Research, Vol. 11, 2016, pp. 1–6.
25. Xu, P., H. Huang, N. Dong, and S. C. Wong. Revisiting Crash Spatial Heterogeneity: A Bayesian Spatially Varying Coefficients Approach. Accident Analysis & Prevention, Vol. 98, 2017, pp. 330–337.
26. Pirdavani, A., S. Daniels, K. Van Vlierden, K. Brijs, and B. Kochan. Socioeconomic and Sociodemographic Inequalities and their Association with Road Traffic Injuries. Journal of Transport & Health, Vol. 4, 2017, pp. 152–161.
27. Sagar, S., N. Stamatiadis, S. Wright, and A. Cambron. Identifying High-Risk Commercial Vehicle Drivers Using Sociodemographic Characteristics. Accident Analysis & Prevention, Vol. 143, 2020, p. 105582.
28. Tefft, B. C. Motor Vehicle Crashes, Injuries, and Deaths in Relation to Weather Conditions, United States, 2010–2014. AAA Foundation for Traffic Safety. Washington, DC, 2016.
29. Saha, S., P. Schramm, A. Nolan, and J. Hess. Adverse Weather Conditions and Fatal Motor Vehicle Crashes in the United States, 1994–2012. Environmental Health, Vol. 15, No. 1, 2016, pp. 1–9.
30. Wong, J. T., and Y. S. Chung. Comparison of Methodology Approach to Identify Causal Factors of Accident Severity. Transportation Research Record: Journal of the Transportation Research Board, 2008. 2083: 190–198.
31. Ahmed, M. M., M. Abdel-Aty, and R. Yu. Assessment of Interaction of Crash Occurrence, Mountainous Freeway Geometry, Real-Time Weather, and Traffic Data. Transportation Research Record: Journal of the Transportation Research Board, 2012. 2280: 51–59.
32. Jovanis, P. P., and H. L. Chang. Modeling the Relationship of Accidents to Miles Traveled. Transportation Research Record: Journal of the Transportation Research Board, 1986. 1068: 42–51.
33. Fitrianti, H., Y. P. Pasaribu, and P. Betaubun. Modeling Factor as the Cause of Traffic Accident Losses Using Multiple Linear Regression Approach and Generalized Linear Models. IOP Conference Series: Earth and Environmental Science, Vol. 235, No. 1, 2019, p. 012030.
34. Arbabzadeh, N., and M. Jafari. A Data-Driven Approach for Driving Safety Risk Prediction Using Driver Behavior and Roadway Information Data. IEEE Transactions on Intelligent Transportation Systems, Vol. 19, No. 2, 2017, pp. 446–460.
35. Ye, X., K. Wang, Y. Zou, and D. Lord. A Semi-Nonparametric Poisson Regression Model for Analyzing Motor Vehicle Crash Data. PLoS One, Vol. 13, No. 5, 2018, p. e0197338.
36. Li, Z., W. Wang, P. Liu, J. M. Bigham, and D. R. Ragland. Using Geographically Weighted Poisson Regression for County-Level Crash Modeling in California. Safety Science, Vol. 58, 2013, pp. 89–97.

37. Quddus, M. A. Modelling Area-Wide Count Outcomes with Spatial Correlation and Heterogeneity: An Analysis of London Crash Data. Accident Analysis & Prevention, Vol. 40, No. 4, 2008, pp. 1486–1497.
38. Siddiqui, C., M. Abdel-Aty, and K. Choi. Macroscopic Spatial Analysis of Pedestrian and Bicycle Crashes. Accident Analysis & Prevention, Vol. 45, 2012, pp. 382–391.
39. Zeng, Q., H. Wen, H. Huang, and M. Abdel-Aty. A Bayesian Spatial Random Parameters Tobit Model for Analyzing Crash Rates on Roadway Segments. Accident Analysis & Prevention, Vol. 100, 2017, pp. 37–43.
40. Yakovlev, P. A., and M. Inden. Mind the Weather: A Panel Data Analysis of Time-Invariant Factors and Traffic Fatalities. Economics Bulletin, Vol. 30, No. 4, 2010, pp. 2685–2696.
41. Venkataraman, N. S., G. F. Ulfarsson, V. Shankar, J. Oh, and M. Park. Model of Relationship Between Interstate Crash Occurrence and Geometrics: Exploratory Insights from Random Parameter Negative Binomial Approach. Transportation Research Record: Journal of the Transportation Research Board, 2011. 2236: 41–48.
42. Venkataraman, N., G. F. Ulfarsson, and V. N. Shankar. Random Parameter Models of Interstate Crash Frequencies by Severity, Number of Vehicles Involved, Collision and Location Type. Accident Analysis & Prevention, Vol. 59, 2013, pp. 309–318.
43. Mohammadi, M. A., V. A. Samaranayake, and G. H. Bham. Crash Frequency Modeling Using Negative Binomial Models: An Application of Generalized Estimating Equation to Longitudinal Data. Analytic Methods in Accident Research, Vol. 2, 2014, pp. 52–69.
44. U.S. Department of Transportation, Federal Highway Administration, Office of Planning, Environment, and Realty. Travel Model Improvement Program (TMIP). TMIP Email List Technical Synthesis Series 2007–2010. 2014.
45. Yu, R., M. Abdel-Aty, and M. Ahmed. Bayesian Random Effect Models Incorporating Real-Time Weather and Traffic Data to Investigate Mountainous Freeway Hazardous Factors. Accident Analysis & Prevention, Vol. 50, 2013, pp. 371–376.
46. National Oceanic and Atmospheric Administration and National Centers for Environmental Information. Historical Palmer Drought Indices, National Centers for Environmental Information, 2016.
47. Akter, T., S. K. Mitra, S. Hernandez, and K. Corro-Diaz. A Spatial Panel Regression Model to Measure the Effect of Weather Events on Freight Truck Traffic. Transportmetrica A: Transport Science, Vol. 16, No. 3, 2020, pp. 910–929.
48. Datla, S., and S. Sharma. Impact of Cold and Snow on Temporal and Spatial Variations of Highway Traffic Volumes. Journal of Transport Geography, Vol. 16, No. 5, 2008, pp. 358–372.
49. Federal Highway Administration (FHWA). Highway Statistics Series Publications, 2000–2016. State Motor-Vehicle Registrations, Washington, DC, 2017.
50. U.S. Census Bureau. American Community Survey (ACS), Five-Year Estimates, 2000–2014, Washington, DC, 2014.
51. Lu, J., and J. M. Guldmann. Employment Distribution and Land-Use Structure in the Metropolitan Area of Columbus, Ohio. Journal of Urban Planning and Development, Vol. 141, No. 4, 2015, p. 04014040.
52. Hausman, J. A., B. H. Hall, and Z. Griliches. Econometric Models for Count Data with an Application to the Patents R&D Relationship. National Bureau of Economic Research, 1984.
53. Cameron, A. C., and P. K. Trivedi. Regression Analysis of Count Data. Cambridge University Press, 2013.
54. U.S. Census Bureau. QuickFacts. www.census.gov/quickfacts/AR. 2020.
55. Arkansas Department of Transportation. 2014 Facts Sheets. https://www.arkansashighways.com/Trans_Plan_Policy/policy_legis/publications/fact_sheets/2014_fact_sheet.pdf
56. StataCorp. 2019. Stata Statistical Software: Release 16. College Station, TX: StataCorp LLC.
57. Stine, R. A. Graphical Interpretation of Variance Inflation Factors. The American Statistician, Vol. 49, No. 1, 1995, pp. 53–56.
58. Breusch, T. S., and A. R. Pagan. A Simple Test for Heteroscedasticity and Random Coefficient Variation. Econometrica: Journal of the Econometric Society, Vol. 47, No. 4, 1979, pp. 1287–1294.
59. Wooldridge, J. M. Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge, MA. 2010.
60. Eisenberg, D., and K. E. Warner. Effects of Snowfalls on Motor Vehicle Collisions, Injuries, and Fatalities. American Journal of Public Health, Vol. 95, No. 1, 2005, pp. 120–124.
61. Brown, B., and K. Baass. Seasonal Variation in Frequencies and Rates of Highway Accidents as Function of Severity. Transportation Research Record: Journal of the Transportation Research Board, 1997. 1581: 59–65.
62. Fridstrøm, L., J. Ifver, S. Ingebrigtsen, R. Kulmala, and L. K. Thomsen. Measuring the Contribution of Randomness, Exposure, Weather, and Daylight to the Variation in Road Accident Counts. Accident Analysis & Prevention, Vol. 27, No. 1, 1995, pp. 1–20.
63. Eisenberg, D. The Mixed Effects of Precipitation on Traffic Crashes. Accident Analysis & Prevention, Vol. 36, No. 4, 2004, pp. 637–647.
64. Usman, T., L. Fu, and L. F. Miranda-Moreno. Quantifying Safety Benefit of Winter Road Maintenance: Accident Frequency Modeling. Accident Analysis & Prevention, Vol. 42, No. 6, 2010, pp. 1878–1887.
65. Quistberg, D. A., E. J. Howard, B. E. Ebel, A. V. Moudon, B. E. Saelens, P. M. Hurvitz, J. E. Curtin, and F. P. Rivara. Multilevel Models for Evaluating the Risk of Pedestrian–Motor Vehicle Collisions at Intersections and Mid-Blocks. Accident Analysis & Prevention, Vol. 84, 2015, pp. 99–111.
66. Dai, D., E. Taquechel, J. Steward, and S. Strasser. The Impact of Built Environment on Pedestrian Crashes and the Identification of Crash Clusters on an Urban University Campus. Western Journal of Emergency Medicine, Vol. 11, No. 3, 2010, p. 294.
67. Hadayeghi, A., A. S. Shalaby, and B. N. Persaud. Safety Prediction Models: Proactive Tool for Safety Evaluation in Urban Transportation Planning Applications. Transportation Research Record: Journal of the Transportation Research Board, 2007. 2019: 225–236.
