2016-Wim DQM Brazil (Paper19 - Icwim8)
Abstract
Since the start of the present decade, the Brazilian federal government has expanded its use of
WIM systems for highway planning and operations, and over 350 new high-speed WIM sites
are to be implemented by the National Department of Transportation Infrastructure (DNIT)
by 2017. Motivated by this situation, DNIT and the Transportation and Logistics Laboratory
at the Federal University of Santa Catarina (LabTrans/UFSC) developed a prototype of an
automated data quality management system, capable of performing effective and
efficient quality management of the federal WIM network in Brazil. This paper describes the
development of quality criteria adapted to the Brazilian road traffic conditions, their
implementation in standard quality checks and the construction of a prototype computer tool
for automated data quality management.
Resumo
Com o início da presente década, o governo federal Brasileiro expandiu o uso de sistemas
WIM para planejamento e operações de rodovias, e mais de 350 novos sistemas WIM de alta
velocidade devem ser implementados pelo Departamento Nacional de Infraestrutura de
Transportes (DNIT) até 2017. Motivados por esta situação, o DNIT e o Laboratório de
Transportes e Logística da Universidade Federal de Santa Catarina (LabTrans/UFSC)
desenvolveram um protótipo de sistema automatizado de gestão de qualidade de dados, capaz
de realizar uma gerência efetiva e eficiente da qualidade da rede federal de sistemas WIM.
Este artigo descreve o desenvolvimento de critérios de qualidade adaptados às condições do
tráfego rodoviário no Brasil, a sua aplicação em verificações de qualidade e a construção de
uma ferramenta para gestão automatizada da qualidade dos dados.
In the present decade, motivated by a new wave of investments in road infrastructure, the
Brazilian Federal Department of Transportation Infrastructure (DNIT) focused its attention on
modernizing its methods for traffic data collection and truck weight enforcement with the
objective of providing more effective tools for roadway planning and operations. In this
context two national programs were launched:
PNCT - the National Plan on Traffic Count, with the objective of providing permanent
traffic data collection on the federal road network and thereby improving decision-making for
investments in transportation infrastructure in Brazil. The PNCT consists of 320 traffic
data collection sites equipped with WIM systems throughout the federal road network;
and
PIAF - the Integrated and Automated Enforcement Stations, which consist of fixed weigh
stations with mainline WIM for screening of heavy vehicles. In the first phase, 35 PIAFs
have been contracted for weight enforcement, and operations are expected to start in 2017.
Experience with the use of WIM systems shows that system performance generally decays
over time, and these changes in performance affect the accuracy and reliability of
the output of WIM systems. As WIM data will be used as input for different kinds of studies
and policy decision-making processes, the quality of these processes depends directly
on the quality of the WIM data collected. Consequently, effective
methods and tools for monitoring the quality of the WIM data collected are needed
to ensure the quality of the results and conclusions drawn from the application of this data.
DNIT endorsed a project in collaboration with LabTrans and the consultancy of Corner Stone
International for the development of a prototype automated data quality management system
capable of performing effective and efficient quality management of its WIM network. The
project was divided into three main parts:
Development of potential quality criteria based on international experience and validation
under Brazilian conditions.
Implementation of statistical control charts based on quality criteria.
Implementation of prototype software tool.
For the development of quality criteria suitable for the Brazilian federal WIM network, a
study was conducted on existing WIM Quality Checks developed in different countries.
Research and applications from South Africa (De Wet, 2012), (Slavik, De Wet, 2012), the
United States (Nichols, Bullock, 2004), (FHWA, 2010) and the European Union (Telman,
Hordijk, 2013), (Lees, Van Loo, 2015) were analyzed with the objective of identifying
potential standard checks that could be adapted for Brazilian conditions. As a result of this
process, the following criteria were selected:
Initial assessments of monthly sets of data from the PNCT have shown that some sites
presented ‘gaps’ in the data collection, indicating that the system had been offline for several
days. The initial monthly check on traffic count will result in a warning if more than 72 hours
are registered with no vehicle records in a given month. Later implementations may
incorporate further quality checks based on vehicle counts per hour and per day, which have
been used for similar purposes in different countries (Walker, Cebon, 2012).
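The traffic count check described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the sketch assumes that the 72 hours are counted as the total number of clock hours in the month without any record, not necessarily consecutive ones; the text does not state which interpretation the prototype uses.

```python
from datetime import datetime, timedelta

def empty_hours(timestamps, year, month):
    """Count the clock hours of the given month with no vehicle record."""
    start = datetime(year, month, 1)
    # First day of the following month (handles the December rollover).
    end = datetime(year + (month == 12), month % 12 + 1, 1)
    total_hours = int((end - start).total_seconds() // 3600)
    # Distinct hours (timestamps truncated to the hour) with at least one record.
    seen = {t.replace(minute=0, second=0, microsecond=0)
            for t in timestamps if start <= t < end}
    return total_hours - len(seen)

def traffic_count_warning(timestamps, year, month, max_empty_hours=72):
    """Return True if more than `max_empty_hours` hours have no records."""
    return empty_hours(timestamps, year, month) > max_empty_hours

# Illustrative data: one vehicle record per hour during September 2014.
hourly = [datetime(2014, 9, 1) + timedelta(hours=h) for h in range(720)]
```

Dropping the first 73 hours of `hourly` would trigger the warning, while dropping exactly 72 would not, since the rule requires more than 72 empty hours.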
Further evaluation of PNCT sites with more than 10% of heavy vehicles classified as
‘others’ has indicated issues in the measurement of axle distances, which
constitute one of the most important inputs for the PNCT class division. The investigation
showed that in all of these cases, at least 90% of the vehicles classified as ‘others’ had at least
one of their axle distances measured below 1.00 m, while Brazilian regulations establish a
minimum of 1.20 m for heavy vehicles (DNIT, 2012). Therefore, the initial checks
implemented in the system will generate a warning if over 10% of the detected vehicles are
classified as ‘others’ on two or more consecutive days of a given month.
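A minimal Python sketch of this classification rule follows. The function name and the input layout (one pair of counts per day, in calendar order) are assumptions made for illustration, as is the choice to let a day without traffic interrupt a run.

```python
def classification_warning(daily_counts, max_share=0.10, min_run=2):
    """Check the share of vehicles classified as 'others'.

    daily_counts: list of (total_vehicles, vehicles_classified_as_others),
    one entry per day of the month in calendar order.
    Returns True if the share exceeds `max_share` on at least `min_run`
    consecutive days.
    """
    run = 0
    for total, others in daily_counts:
        if total > 0 and others / total > max_share:
            run += 1
            if run >= min_run:
                return True
        else:
            # A compliant day (or a day without traffic) breaks the run.
            run = 0
    return False
```

For example, `classification_warning([(100, 5), (100, 12), (100, 11)])` flags the month, because the last two days both exceed 10%.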
To be effective, the axle distance check needs a statistically significant number of vehicles, and hence a
very common vehicle class. A first analysis of one month of data from a number of PNCT
sites confirmed that the six-axle tractor semi-trailer combination (Class E1 in the PNCT and
Class 3S3 in the PIAF class division) is suitable for this type of criterion, being one of the
most common vehicle classes at all sites. The analysis verified that most sites had an
average axle distance of between 125 and 126 cm with a standard deviation of less than 2%.
Figure 1 – First axle load distribution - Class E1/3S3– Weigh Station 1608
The average load of the first axle of the 1,282 vehicles of Class E1/3S3 that entered the
station was 5,725 kg with a standard deviation of 650 kg. The graph shows a downward shift in
the measurements at the same time as the enforcement officers reported operational issues
with the equipment.
Figure 2 – Daily averages of first axle loads - Class E1 - Site 40046 (PNCT) - Lane 1
The graph in Figure 2 shows the daily averages of first axle loads of six-axle
articulated heavy vehicles from August to September 2014. This site
started operations in May, and the reference line was drawn from the averages of the first
month of data collection. The chart presents a relatively steady trend around the
reference line until the end of August. At the end of August and especially throughout
September, the daily averages ran increasingly above the reference line, indicating a
possible deterioration in the performance of the WIM system and the potential of the quality
check as a tool for detecting shifts in WIM data under Brazilian conditions.
Besides the evaluations for the development of quality checks based on first axle loads,
similar evaluations were performed for checks on axle distance, classification and traffic
count processes. Hence, this stage of the analyses presented specific results that confirmed
initial assumptions regarding the suitability of certain quality checks, which supported the
execution of further analyses and developments.
The control charts for WIM data quality management were structured through the adaptation
of Statistical Process Control (SPC) techniques. In this context, Statistical Control Charts are
SPC tools that aim to detect abnormal variation due to circumstances that are not usual or
inherent in regular processes. In this methodology, the control charts take into consideration
the 1st Axle Load and Axle Distance WIM quality criteria, which are both applied to
six-axle tractor-semitrailer vehicle combinations (Class E1 or 3S3). These control charts are
based on the theory of Shewhart’s control charts for variables (Montgomery, 2004), and the
quality checks are applied in two distinct phases:
Phase I – Qualification of reference period.
Phase II – Evaluation of subsequent months of data collection.
3.1 Phase I
Phase I is performed for the first full month of data collection, immediately after the
calibration of the system. This period will be used as a reference for subsequent monthly
quality checks, so the consistency of the data is validated with the aid of charts based on daily
averages and standard deviations in combination with their respective lower and upper control
limits. Due to natural variation in local traffic conditions, the daily subgroups of data vary in
size, so the calculations of the central lines and the control limits take that into consideration.
The formulations for obtaining the upper and lower limits in Phase I are shown on Table 1:
$\bar{\bar{x}}$ and $\bar{s}$ refer to the mean of the subgroup averages and standard deviations,
respectively;
n refers to the number of measurements per day;
c4 is a control chart constant that depends on subgroup size:
$$c_4 = \frac{4(n-1)}{4n-3} \qquad (1)$$
k refers to the number of standard deviations to be drawn from the central line. Control
chart theory states that for an industrial context with limited common cause variation,
the default value for k is 3. For control charts outside the industrial context with a larger
expected variation, a higher value for k can be chosen in order to minimize false warnings.
Thus, the calculation of the control limits in this methodology considers k = 4 for
the Axle Distance criterion and k = 5 for the 1st Axle Load criterion.
The establishment of the k values was based on empirical analyses and validation performed
over samples of WIM data in the scope of the project. These values were effective in
providing limits for the detection of unusual data variation without excessive false warnings.
Different values, however, may be adopted as further operation and validation takes place.
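Since the limit formulas themselves are not reproduced in this text, the following Python sketch assumes the textbook Shewhart formulation for a chart of averages with variable subgroup size: a grand mean weighted by subgroup size, a pooled standard deviation, and limits at k estimated standard errors from the central line. Only the expression for c4 is taken from equation (1); the function names and the weighting scheme are assumptions and may differ from the formulation actually used.

```python
import math

def c4(n):
    """Approximate control chart constant from equation (1)."""
    return 4.0 * (n - 1) / (4.0 * n - 3.0)

def phase1_limits(daily_means, daily_sds, daily_ns, k):
    """Return (center, [(lcl, ucl), ...]) for the chart of daily averages.

    Assumes every day has at least two measurements (c4(1) would be zero).
    """
    total = sum(daily_ns)
    # Grand mean, weighted by the number of measurements per day (assumed).
    center = sum(n * m for n, m in zip(daily_ns, daily_means)) / total
    # Pooled standard deviation across days, with weights n_i - 1 (assumed).
    s_bar = math.sqrt(
        sum((n - 1) * s ** 2 for n, s in zip(daily_ns, daily_sds))
        / (total - len(daily_ns))
    )
    limits = []
    for n in daily_ns:
        # Fewer measurements on a day give wider limits for that day.
        half_width = k * s_bar / (c4(n) * math.sqrt(n))
        limits.append((center - half_width, center + half_width))
    return center, limits

# Illustrative values only: two days with 100 and 400 vehicles, k = 5.
center, limits = phase1_limits([5700.0, 5800.0], [600.0, 600.0], [100, 400], k=5)
```

In this sketch a day with fewer vehicles automatically receives wider limits, because the half-width scales with one over the square root of n.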
Figure 3 shows validation results of Phase I with the 1st axle load criterion and WIM data from the
Araranguá WIM test-site managed by LabTrans in cooperation with DNIT. In this case, the
WIM site consists of two lines of piezoquartz sensors:
Figure 3 – Phase I - Daily averages of 1st axle loads – E1/3S3 – Araranguá test site
In the chart, all points fall inside the established limits and the period is qualified as a
reference for the given criteria. The chart shows a widening of the control limits between
December 22nd and December 31st, which is due to the low volume of heavy vehicles during
the holiday season. The reliability of the checks becomes lower with a smaller amount of
measurements, so the method takes this into consideration and makes the control limits less
strict. Besides the quality check performed through control charts, the qualification of the
reference period in Phase I uses an extra set of fixed absolute limits in order to
verify the basic calibration of the system. A warning is generated if:
1st Axle Load criterion: x < 4000 kg or x > 7000 kg.
Axle Distance criterion: x < 120 cm or x > 140 cm.
According to the established methodology, quality checks on the reference month of data will
generate a quality warning if one point falls outside the control limits.
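The qualification of the reference month can be sketched in Python as below. The names are hypothetical, and the sketch assumes that the fixed absolute limits apply to the overall mean of the reference month and that the units are kilograms and centimetres, in line with the values reported earlier in the text.

```python
# Fixed absolute limits per criterion (units assumed: kg and cm).
FIXED_LIMITS = {
    "first_axle_load": (4000.0, 7000.0),
    "axle_distance": (120.0, 140.0),
}

def qualify_reference_month(criterion, center, daily_means, limits):
    """Return a list of warnings; an empty list means the month qualifies.

    center: overall mean of the reference month.
    limits: list of (lcl, ucl) pairs, one per day, matching daily_means.
    """
    warnings = []
    low, high = FIXED_LIMITS[criterion]
    if center < low or center > high:
        warnings.append("mean outside fixed absolute limits")
    for day, (mean, (lcl, ucl)) in enumerate(zip(daily_means, limits), start=1):
        # In Phase I a single point outside the control limits is enough.
        if mean < lcl or mean > ucl:
            warnings.append(f"day {day} outside control limits")
    return warnings
```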
3.2 Phase II
Phase II involves quality checks on data collected in the months after the reference period.
The formulations for obtaining the upper and lower limits are shown in Table 2, where $\bar{s}$
and $\bar{\bar{x}}$ are the values previously calculated for the reference period. The values of k
and the formulation for c4 remain the same as in Phase I.
The control charts for the subsequent months of WIM data collection will generate a quality
warning if a run of 3 consecutive points occurs outside of the control limits, which may
indicate a trend in the WIM data. Figure 4 shows the application of Phase II as part of the
validation tests of the methodology:
Figure 4 – Phase II - Daily averages of 1st axle loads – E1/3S3 – Araranguá test site
The figure shows a merge of control charts for the months of March and April, 2015. In the
given data collection process, the daily averages of the reference vehicle 1st axle loads
remained relatively stable until the end of March, but with a slight upward trend. In the first
three days of April a run of 3 consecutive points occurred outside the control limits,
generating a quality warning. An assessment of the performance of that WIM site with
hundreds of measurements from the low-speed enforcement site nearby confirmed an upward
shift in the system’s weight measurements. The mean difference in GVW measurements from
both systems went from 2.28% in March to 3.21% in April. These results indicate the
potential effectiveness of the quality check in pointing out trends in the WIM data.
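The Phase II run rule can be written as a short Python sketch. The function name is hypothetical, and the sketch assumes that points above and below the limits count towards the same run, which the text does not specify.

```python
def phase2_warning(daily_means, limits, run_length=3):
    """Return the 0-based index of the day completing the first run of
    `run_length` consecutive points outside the limits, or None."""
    run = 0
    for i, (mean, (lcl, ucl)) in enumerate(zip(daily_means, limits)):
        if mean < lcl or mean > ucl:
            run += 1
            if run == run_length:
                return i
        else:
            # A point inside the limits resets the run.
            run = 0
    return None
```

Isolated points outside the limits therefore do not raise a warning in Phase II; only a sustained excursion does, which matches the stated aim of detecting trends.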
A prototype computer tool was developed in order to automate the application of the
developed quality checks and facilitate the overall management of data quality. The large
volume of data received motivated a careful decision on the programming language and the
database tool to be used.
The chosen programming language was Python. For the database, in order to prevent
performance issues due to the large volume of data to be processed, a non-relational
database was adopted, since such databases are often faster than traditional relational
databases (Nayak et al., 2013).
New data is supplied to the software as XML files. In a first stage, the
software reads all records from the XML file and inserts them into the database under a
table named ‘Records’.
After processing the XML file, the inserted data is read again to perform a
grouping operation that joins all the readings in periods of fifteen minutes, one hour and
one day (only daily groupings are of interest to the quality assurance criteria). When
performing such groupings, a summary containing the number of vehicles crossing the site
and the mean value of all the weight and axle distance readings is calculated and stored in
the corresponding tables. The summaries are also separated by vehicle class.
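The daily grouping step can be illustrated with plain Python data structures. This is a sketch only: the record keys (`site`, `timestamp`, `vehicle_class`, `first_axle_load`, `axle_distance`) and the function name are hypothetical, and the prototype performs this operation against its non-relational database, not on in-memory lists.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def daily_summaries(records):
    """Group records by (site, day, vehicle class) and summarise each group."""
    groups = defaultdict(list)
    for r in records:
        key = (r["site"], r["timestamp"].date(), r["vehicle_class"])
        groups[key].append(r)
    return {
        key: {
            "count": len(rs),
            "mean_first_axle_load": mean(r["first_axle_load"] for r in rs),
            "mean_axle_distance": mean(r["axle_distance"] for r in rs),
        }
        for key, rs in groups.items()
    }

# Illustrative records only (values are not taken from the paper's data).
example = [
    {"site": 40046, "timestamp": datetime(2014, 9, 1, 8, 15),
     "vehicle_class": "E1", "first_axle_load": 5700.0, "axle_distance": 125.0},
    {"site": 40046, "timestamp": datetime(2014, 9, 1, 17, 40),
     "vehicle_class": "E1", "first_axle_load": 5800.0, "axle_distance": 126.0},
]
summary = daily_summaries(example)
```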
Data in the flow tables is accessed to generate indicators per day; these indicators are
calculated and represent parameters used as quality criteria for quality checks in each day.
Having the indicators ready, the system analyses each full month of records and calculates
the quality criteria, checking if any of the sites in any month has violated the calculated
limits. If there is a violation, the software creates a record in the alerts_per_month table,
which stores data quality warnings. A warning is defined as one record of quality criteria
violation. More than one alert can exist for the same site in the same month if it violated
more than one criterion.
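The creation of warning records can be sketched as follows. Only the table name alerts_per_month comes from the text; the field names and the function are hypothetical and serve to show that one record is produced per violated criterion.

```python
def monthly_alerts(site, month, check_results):
    """Build one alert record per violated criterion.

    check_results: dict mapping a criterion name to True if it was violated.
    The returned dicts stand in for rows of the alerts_per_month table.
    """
    return [
        {"site": site, "month": month, "criterion": name}
        for name, violated in check_results.items()
        if violated
    ]

# Illustrative outcome of the monthly checks for one site.
alerts = monthly_alerts(
    40046, "2014-09",
    {"traffic_count": False, "first_axle_load": True, "axle_distance": True},
)
```

Here the site receives two alerts for the same month, one for each violated criterion.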
As shown on the screenshot, the tool generates a list of quality warnings immediately after the
data is imported. These warnings may be directly accessed and visualized in the form of
control charts, as shown on the right side of the image. From the warning list the user may
also generate reports in editable format for comments and interaction with stakeholders.
5. Conclusions
A prototype of a Data Quality Management System has been developed by DNIT and
LabTrans for efficient quality control of measurement data from DNIT’s WIM network.
A first set of quality criteria was identified based on international research and adapted to
Brazilian road traffic conditions based on data from the first operational PNCT sites.
The selected quality criteria were converted to statistical control charts that were verified
using a few months of data from a number of PNCT sites and the Araranguá test-site.
The control charts were implemented in a prototype software tool for automated data
quality management and remote monitoring of all sites in DNIT’s WIM network.
The tool has shown potential for supporting the management of the WIM network, improving
the quality of the collected WIM data, and providing a guarantee for its applications.
The next stage of the project includes a large-scale test operation of the prototype computer
tool across all PNCT sites in order to fully assess the efficiency of the methodology and
the computer tool in managing the quality of a large WIM network.
In the test operation, further assessment and adjustments in the methodology for quality
checks will be made as well as an evaluation of the implementation for a full production
system.
6. References