Roadway Surface Profiling Using an On-board Data Logger

Master of Science in Data Sciences

Naufil Hassan

MSDS17009

Session: 2017 – 2019

DEPARTMENT OF COMPUTER SCIENCE

INFORMATION TECHNOLOGY UNIVERSITY
LAHORE, PAKISTAN
Roadway Surface Profiling Using an On-board Data Logger

A thesis submitted in partial fulfillment of the requirements for the

Degree of Master of Science in
Data Sciences

Naufil Hassan

Dr. Suleman Mazhar

Dr. Waqas Sultani

Ms. Amna Batool

Declaration

This thesis is a presentation of my original research work. Wherever contributions of

others are involved, every effort is made to indicate this clearly, with due reference to the
literature, and acknowledgement of collaborative research and discussions. I also declare
that this work is the result of my own investigations, except where identified by references
and free from plagiarism of the work of others.

Signature: ………….…….
Student Name
Date: ………………..…....

The undersigned hereby certify that they have read and recommend the thesis entitled
“Roadway Surface Profiling Using an On-board Data Logger” by Naufil Hassan for the
degree of Master of Science in Data Sciences.

_________________________________
Dr. Suleman Mazhar (ITU), Thesis Supervisor

_________________________________
Dr. Waqas Sultani (ITU), Co-supervisor and Committee Member

_________________________________
Ms. Amna Batool (ITU), Committee Member

_________________________________
Dr. Faisal Kamiran (ITU), Chairperson of the Department

Acknowledgments

I would like to acknowledge support and supervision of Dr. Suleman Mazhar, who
not only helped me do some good research but also enabled me to contribute to the
world. The two publications I have from this thesis are also a result of his consistent
encouragement. Dr. Waqas Sultani and Dr. Mohsin Ali have been very thoughtful
with their productive comments on improving the quality of this work. I would also
like to acknowledge the efforts put into developing the basic framework of this project
by Mr. Haseeb Tariq, who worked on hardware development and initial data collection.
Ms. Hadia Hameed and Ms. Ifrah Siddiqui have also been very helpful in my work
and have co-authored my publications.

Table of Contents

Acknowledgments
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Abstract
1 Introduction
1.1 Overview
1.2 Problem Statement
1.3 Motivation
1.4 Challenges
1.5 Research Contributions
1.6 Thesis Outline
2 Literature Review
3 Methodology
3.1 Hardware
3.2 Data Collection
3.3 Data Preprocessing
3.4 System Modelling
3.5 Effect of Car Dynamics
3.6 Data Visualization
3.7 Feature Selection
3.8 Machine Learning Algorithms
3.9 Convolutional Neural Networks
3.10 Auto Encoders
4 Experimental Results
4.1 Classification Results
4.2 Class Instances Visualization
4.3 Generated Maps
5 Conclusion and Recommendations
5.1 Conclusion
5.2 Recommendations
6 References
7 Appendices
List of Tables

1 Abbreviations used in the thesis report with their details
2.1 Brief literature review summarizing different papers in terms of accuracies reported, techniques used and hardware used
3.1 Machine Learning Algorithms’ parameters
3.2 Results for reduced-dimensional data by autoencoders
4.1 Results of Machine Learning Algorithms where Anomaly was treated as a positive class
4.2 Results of Machine Learning Algorithms for multi-classification
4.3 Results after balanced training with CNN
4.4 Results after imbalanced training with CNN
5.1 SVM Results on full dataset
5.2 SVM Results on 40% test-set
5.3 NB Results on full dataset
5.4 RF Results on full dataset
5.5 SLR Results on full dataset
5.6 Results on full dataset with 93x3 dimensional data
5.7 Results on 40% test-set with 93x3 dimensional data
5.8 Results on full dataset with 46x3 dimensional data
5.9 Results on 40% test-set with 46x3 dimensional data
5.10 Results on full dataset with 26x3 dimensional data
5.11 Results on 40% test-set with 26x3 dimensional data
List of Figures

3.1 High level circuit diagram identifying the components of the sensor, along with the actual picture of the sensor
3.2 Data Collection Setup showing two companions in the car where one person is responsible for annotating data on the go. Also, the pictures of real sensors being installed on the car
3.3 Sensors mounted on car, near the tyre and on the dashboard of the car. The sensor near the tyre is tied with the help of zips so that it does not fall off or give absurd readings
3.4 Alignment of the sensor’s axes with the car, as shown in the picture; the derived axes are calculated as per the given formulae
3.5 Damping graph of car suspension showing the result of critical damping on the car suspension system
3.6 Continuous drive signal as observed during a drive on the road. The anomalies are usually not this close together, but this diagram gives an idea of how it would look
3.7 Accelerometer patterns of different anomalies
3.8 Block Diagram illustrating the steps involved in computing MFCC
3.9 MFCC coefficients of Normal and Anomaly Class
3.10 Architecture of convolutional neural network
3.11 Architecture of Auto Encoder network. LD represents the dimension of latent vector
4.1 Cateye Signature
4.2 Manhole Signature
4.3 Pothole Signature
4.4 Speed Bump Signature
4.5 Generated map for drivers/passengers, showing the roads suitable to drive comfortably
4.6 Generated map for road repairing authorities to identify the spots that require immediate repairing
List of Abbreviations

Table 1: Abbreviations used in the thesis report with their details

Sr. No.   Abbreviation   Details
1         SVM            Support Vector Machine
2         NB             Naive Bayes
3         SLR            Simple Logistic Regression
4         CNN            Convolutional Neural Networks
5         AE             Auto Encoders
6         RSD            Roadway Surface Disruption
7         TPR            True Positive Rate
8         FPR            False Positive Rate
Abstract

Traditional road maintenance methods are costly, requiring expensive equipment and
manpower. Road quality categorization based on machine learning techniques, using
real-time opportunistic data gathered from inexpensive open-source inertial systems, is
a promising alternative. Existing open-source datasets for this problem are small and
less representative of the actual situation, where data is imbalanced and skewed towards
regular road surface instances. With the help of an inexpensive device and data
collection platform developed by our lab, I have collected a large, heterogeneous dataset
which is more realistically representative of the problem in real-world settings. Four
kinds of Roadway Surface Disruptions (RSDs) are considered in this work, namely
Cat eyes, Manholes, Potholes and Speed bumps. The feature set consists of
spectral and MFCC features, kurtosis, skewness, time-series peaks and zero crossings.
Support Vector Machine (SVM), Simple Logistic Regression (SLR), Random Forest
(RF), Naive Bayes (NB) and Convolutional Neural Networks (CNN) were used for
classification. Auto-Encoders (AE) have also been used in this work to establish how
accuracy deteriorates as the dimensionality of the data is reduced. The best results are
reported by SVM, with a True Positive Rate (TPR) of 95.2%. These anomaly
classification results can be used as a low-cost road maintenance solution by road repairing
authorities, and the road quality maps thus generated can provide passengers and
drivers with information on the most comfortable route for their journey. Hence, the
proposed unified classification framework provides a solution for both target
audiences by considering the relevant anomalies.
Chapter 1
Introduction

1.1 Overview

A large budget is spent on road infrastructure maintenance annually [1]. The
cost includes the money spent on municipality surveys for road quality assessment and repair
charges. According to the American Automobile Association, pothole damage cost drivers
in the U.S. alone around $15 billion over the past five years [2]. Keeping roadways
bump-free is a difficult task and requires proper information about road networks all over the
region. Unanticipated traffic loads, harsh weather conditions and usual deterioration all
contribute to the degradation of roadways over the short term.

A passenger or a driver wants a comfortable and smooth ride with the least wear
and tear of the vehicle, but unfortunately current navigation systems are not capable of
providing information about such a route. In addition, the constrained budget of road maintenance
authorities makes it difficult to maintain the quality of roads. This work addresses both
of these issues and presents a real-time system that can provide route quality maps to
commuters and information about worn-out roads to the related authorities.

1.2 Problem Statement

In this thesis I propose a framework for road-surface profiling using an on-board data logger.
Using this classification framework, I generate intelligent maps for road repairing authorities
and passengers/drivers.

1.3 Motivation

Maintaining good road infrastructure is a big problem for many countries because no
automated solution exists. Instead, special departments regularly survey different
areas. Eliminating this tedious process can save a huge amount of budget and
effort. This thesis aims to provide an efficient and low-cost solution to this problem, based
on ubiquitous data gathering at little to no cost.

Another major issue is that drivers and passengers spend a lot of money on repairing
their vehicles. A bumpy ride is also tiring and takes the joy out of a journey. If road
quality information can be integrated into navigation systems, it would be of
great help to the related audiences. This could also boost tourism, as travellers would be
able to choose a better route and enjoy their journey.

1.4 Challenges

Currently there are many smartphone-based solutions that claim to have solved the
classification problem; however, they address only a few events. The challenges
in this work are as follows:

• No benchmarking dataset publicly available

• Collection of heterogeneous dataset [3]

• Proposing a unified solution for all cars

• Feature engineering to remove car dynamics dependency

• Designing feature agnostic architectures for better results

1.5 Research Contributions

The main contributions of this work are:

• A heterogeneous dataset representing the real-world scenario is gathered and shared
publicly for benchmarking [3]

• Statistical features have been explored in this domain, and the use of MFCC features is
proposed with clear reasoning; this has helped in removing car dynamics
dependency from the data. All of these features have significantly improved the results.

• A CNN, which does not require handcrafted features, has been implemented, and Focal loss
is used to cater for the imbalanced dataset.

• System modelling is discussed and the major pointers that can be very
helpful in improving the results are identified.

1.6 Thesis Outline

The rest of the thesis is organized as follows:

• Chapter 2 presents work related to our research.

• Chapter 3 describes our method of data collection, the details of the data and the
machine learning techniques applied.

• Chapter 4 provides detailed results of our experiments.

• Chapter 5 presents the conclusion and future recommendations of this work.

• Chapter 6 contains all the references that have been utilized in this work.

• Finally, Chapter 7 contains the appendices.

Chapter 2
Literature Review

The classification of RSDs has gone through many stages, ranging from camera-based
techniques to static sensors. More recently, embedded inertial sensors have come into use, as
they are very cost effective and ensure constant road monitoring. Much of the work uses
a smartphone’s accelerometer data, to which signal processing techniques are then applied.
The literature shows two major paradigms for classification of road surface anomalies: (1)
thresholding-based heuristics and (2) machine learning based approaches requiring feature
engineering on preprocessed data. An overview of all these techniques can be found in [4].

Mohan et al. used thresholding-based techniques to classify RSDs [5]. They
identified that the major issue is the lack of a proper method for annotating
the dataset on which the classifiers are trained. The paper attempts
rich sensing, exploiting the accelerometer, microphone and GPS sensor of a mobile
phone to gather data that can then be localized with good precision. Eriksson et al.
also used smartphones to collect data and then annotated it manually [6]. In their work,
they tried to detect potholes, railway crossings, manholes and extended joints. De
Silva et al. [7] proposed a filter-based system, named BusNet, for pothole detection along
the paths traversed by public transport buses. Mednis et al. [8] reported
the best heuristics, with a True Positive Rate of 92%. The bump detection accuracy of Astarita et
al. [9] reached 90%; however, their false positive rate was about 35%. Onemaya et al. [10]
explored the relation between road roughness and acceleration patterns and found
that the major correlation between the two exists at low speeds (<20 km/h). Sinharay et al.
[11] explored the concern of using the smartphone’s battery efficiently when taking accelerometer
readings. They sampled the data at very low rates (4-6 Hz) and achieved 48% FPR with
a TPR of about 44%.

Hadia et al. [12] proposed a system that classifies roadway surface anomalies using different
machine learning algorithms. They reported their best results, a TPR of 95%, for a Honda
City, using FFT features and a Decision Tree classifier. Gonzalez et al. used a novel approach
of representing the acceleration patterns of RSDs by a Bag of Words [13]. They used
Artificial Neural Networks (ANN), Decision Trees (DT), Support Vector Machine (SVM),
Random Forest (RF), K-Nearest Neighbor (KNN), Naive Bayes (NB) and Kernel Ridge
(KR). ANN gave the best results for multi-classification of five distinct classes,
with a TPR of 93.8%. Perttunen et al. [14] explored Fast Fourier Transform (FFT)
and Mel Frequency Cepstral Coefficient (MFCC) based features along all three axes, i.e.
x, y and z. The paper also reported the dependency of these features on the vehicle’s speed and
attempted to remove that dependency. They reported an FPR and FNR of 3 percent
and 18 percent respectively. Gonzalez et al. [15], [16] modeled the problem of differentiating
between five distinct classes using two machine learning algorithms. They reported an
average accuracy of 83.73% over several data sets using an Artificial Neural Network; Logistic
Regression was also employed for comparison. Seraj et al. [17]
proposed audiovisual labelling for better management of data, and signal processing
techniques including wavelet decomposition to remove the car’s effect from the sensor data.
Other papers that have contributed to the detection of roadway surface anomalies
are [18], [19], [20], [21], [22], [23], [24], [25], [26], [27].

Active machine learning is now being used in online systems for quickly adapting the learning
models to a changed environment [28], i.e. removing the dependency on the vehicle. Transfer
learning techniques are also being investigated for labelling activity recognition data to reduce
effort and cost [29]. In the literature, the available datasets are somewhat synthetic because
they lack heterogeneity of classes. In the real-world scenario, anomalies occur intermittently
during a drive and much of the journey consists of normal events. Hence, the
data must be skewed towards normal events rather than having an equal proportion of all events.
There is a significant gap in the literature regarding an application-based generalized framework. The
system proposed in [12] was not generalized for all vehicles; rather, it had a separate feature
set and a separate classifier for each vehicle. The recent work in [13] has only classified the
data based on z-axis acceleration patterns. Some works have only focused on distinguishing
between abrupt instances and normal events, while others have subcategorized the anomalies
into severe and mild. The literature shows neither a feature-agnostic approach nor
a unified framework for all cars that caters for all kinds of driver behavior.
There is also no clear understanding of system dynamics, which could help in better feature
engineering. Table 2.1 shows a brief review of the literature.

Table 2.1: Brief literature review summarizing different papers in terms of accuracies
reported, techniques used and hardware used

Ref.                         No. of Vehicles   Results                               Technique                 Device
Hadia et al. [4], 2018       11                Anomalies (93% TPR)                   Different ML Algorithms   Sensor (93 Hz)
Gonzalez et al. [6], 2017    12                Bump and Pothole (93.8% TPR)          Bag of Words              Smartphone (50 Hz)
Perttunen et al. [8], 2011   1                 Anomalies (3% FPR, 18% FNR)           SVM (RBF kernel)          Smartphone (38 Hz)
Pothole Patrol [7], 2008     7                 Potholes (92.4% TPR)                  Threshold                 Sensor (380 Hz)
Nericell [6], 2008           1                 Bumps (8% FPR, 41% FNR)               Threshold                 Sensor (310 Hz)
Mednis et al. [12], 2011     1                 Potholes (92% TPR)                    Threshold                 Smartphone (52 Hz)
Astarita et al. [13], 2012   1                 Bumps (93% TPR, Potholes 35% FPR)     Threshold                 Smartphone (5 Hz)
Fazeen et al. [16], 2012     1                 Five Classes (85.6% accuracy)         Threshold                 Smartphone (25 Hz)
Sinharay et al. [15], 2013   1                 Bumps (85% TPR)                       Threshold                 Smartphone (4-6 Hz)
Gonzalez et al. [18], 2018   2                 Multiclass (85.6% accuracy)           ANN, Logistic Regression  Smartphone (50 Hz)
Seraj et al. [17], 2014      5                 Anomalies (88.78% accuracy)           SVM (RBF kernel)          Smartphone (47 and 93 Hz), Sensor (200 Hz)
Chapter 3
Methodology

This chapter covers every component of the proposed framework in detail. It is
organized as follows: Hardware, Data Collection, Data Preprocessing,
System Modelling, Data Visualization and Feature Extraction.

3.1 Hardware

A low-cost, dedicated hardware device is designed which logs acceleration patterns and
longitude/latitude coordinates using the ADXL362 and VK2828U7G5LF sensors respectively. All
this data is stored in on-board storage (an SD card). The sensors are controlled by a
PIC18F26K22 micro-controller. The data-logger is powered by a 3.1 Ah
Lithium-ion battery. Data is collected at a nominal 100 Hz, but due to the inaccuracy of the
internal clock there is a margin of error of 10%. An average sampling rate of 93 Hz is used to
maintain consistency of the data (Fig. 3.1).

3.2 Data Collection

Data collection involved 12 cars, each with a driver, a co-pilot and two data-loggers. The first
data-logger is mounted outside the car, on its shock absorber, as shown in Fig. 3.3. The second
data-logger is placed inside the car, on the dashboard. The data is annotated on-the-go
by the co-pilot, using a Graphical User Interface (GUI) that our lab developed in MATLAB. By
the end of the journey, the data-logger has generated a file that contains the vehicle’s speed,
GPS coordinates, time stamps and 3-axis acceleration readings for every second. The GUI
associates each anomaly with a number. For example, if the speed bump is assigned to the
button labeled “5”, the co-pilot presses button “5” as soon as the car hits a speed bump,
keeps it pressed until the car has covered the full event, and releases it when the car is back on
the normal road. In this way, the time between the press and the release of the button
is logged. Comparing the time stamps in the file generated by the GUI with those in the
data-logger file then lets me annotate the data. It was ensured that
each car gathers at least 50 instances of each RSD. For this purpose, the average journey
was 25-30 km long and lasted 45 minutes on average. If the co-pilot has wrongly
tagged an anomaly, he can press the “Mistake” button right after it, and I then correct
the label by manual inspection. Sensor readings for which no button was pressed were
labeled as “Normal”. The three accelerometer axes x, y and z are aligned with the
transversal, longitudinal and vertical axes of the car. Fig. 3.2 shows the experimental setup
for data collection.

Figure 3.1: High level circuit diagram identifying the components of the sensor, along
with the actual picture of the sensor
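The time-stamp comparison used for annotation can be sketched as follows. This is only a minimal illustration in Python, not the MATLAB GUI used in this work; the function name `label_windows`, the `(pressed, released, label)` tuple layout and the example time stamps are hypothetical.

```python
from typing import List, Tuple

def label_windows(window_starts: List[float],
                  presses: List[Tuple[float, float, str]],
                  window_len: float = 1.0) -> List[str]:
    """Assign a label to each fixed-length window of logger data.

    A window takes the label of the first button press whose
    [pressed, released] interval overlaps it; otherwise it is "Normal".
    """
    labels = []
    for start in window_starts:
        end = start + window_len
        label = "Normal"
        for pressed, released, name in presses:
            # Two intervals overlap when each one starts before the other ends.
            if pressed < end and released > start:
                label = name
                break
        labels.append(label)
    return labels

# Hypothetical drive: the speed-bump button was held from 2.3 s to 3.1 s.
windows = [0.0, 1.0, 2.0, 3.0, 4.0]
presses = [(2.3, 3.1, "Speed bump")]
labels = label_windows(windows, presses)
```

In this sketch an event that straddles a window boundary labels both windows; whether that matches the labelling rule actually used is an assumption.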

3.3 Data Preprocessing

The data-logger collects data at a nominal 100 Hz with an error of about 10%. So, in order to
maintain the consistency of the data, I have linearly interpolated every second of data
to 93 samples. The time series data is then divided into non-overlapping chunks of 1 second,
and these chunks are labelled by comparing the time stamps logged by the GUI.
Mistakes caused by human error were corrected by careful visual inspection.
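The resampling and chunking step can be illustrated with a short sketch. It assumes NumPy and time stamps in seconds; the function names `resample_second` and `chunk_signal` and the synthetic recording are hypothetical and are not the code used in this work.

```python
import numpy as np

TARGET_RATE = 93  # samples per second, as stated in the text

def resample_second(t, a, second_start):
    """Linearly interpolate one second of 3-axis data onto 93 evenly spaced points.

    t: increasing time stamps in seconds, shape (n,)
    a: acceleration samples, shape (n, 3)
    """
    grid = second_start + np.arange(TARGET_RATE) / TARGET_RATE
    return np.column_stack([np.interp(grid, t, a[:, k]) for k in range(a.shape[1])])

def chunk_signal(t, a):
    """Split a recording into non-overlapping 1-second windows of shape (93, 3)."""
    first = int(np.floor(t[0]))
    last = int(np.floor(t[-1]))  # only complete seconds are kept
    return np.stack([resample_second(t, a, s) for s in range(first, last)])

# Synthetic recording with a jittery sample period of roughly 10 ms.
rng = np.random.default_rng(0)
t = np.cumsum(rng.uniform(0.009, 0.011, 350))
a = np.column_stack([t, 2 * t, -t])
chunks = chunk_signal(t, a)  # one (93, 3) window per complete second
```

Each row of `chunks` can then be labelled by comparing its start time with the GUI log.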

This work also incorporates combinations of different axes, in the expectation that these
combinations hold useful information. The derived axes are computed as shown in Fig.
3.4.

Figure 3.2: Data Collection Setup showing two companions in the car where one person
is responsible for annotating data on the go. Also, the pictures of real
sensors being installed on the car

Figure 3.3: Sensors mounted on car, near the tyre and on the dashboard of the car.
The sensor near the tyre is tied with the help of zips so that it does not
fall off or give absurd readings.

Figure 3.4: Alignment of the sensor’s axes with the car, as shown in the picture; the
derived axes are calculated as per the given formulae

3.4 System Modelling

To obtain good results from a system, it is very important to understand its dynamics. The
signal I get from the accelerometer is actually a convolved signal with additive noise.
Consider $a(t)$, $a_{event}(t)$, $h_{car}(t)$ and $a^{*}(t)$ to be real-valued 3D vectors with x, y and z
components:

$$a(t) = [\,a_x(t)\;\; a_y(t)\;\; a_z(t)\,]$$ (Equation 3.1)

The noise can be modeled as Gaussian:

$$\eta(t) \sim \mathcal{N}(0, \sigma_n^2)$$ (Equation 3.2)

where $\eta(t) \in \mathbb{R}^3$.
So the signal I get from the sensor is:

$$a(t) = a^{*}(t) + \eta(t)$$ (Equation 3.3)

where $a^{*}(t)$ is the actual signal and can be represented as

$$a^{*}(t) = a_{event}(t) * h_{car}(t)$$ (Equation 3.4)

Here $*$ denotes convolution, $a_{event}(t)$ is the response of the road anomaly and $h_{car}(t)$ is the response of the car.
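To make the model concrete, the following sketch simulates one window of measured signal from Equations 3.1 to 3.4. The pulse used as the road event, the decaying oscillation used as the car response and the noise level are all assumed shapes chosen for illustration, not measured quantities.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 93                  # sampling rate in Hz (Section 3.1)
t = np.arange(fs) / fs   # one 1-second window

# Hypothetical road event: a short pulse on the z-axis only.
a_event = np.zeros((fs, 3))
a_event[10:15, 2] = 1.0

# Hypothetical car response: a decaying 5 Hz oscillation (assumed, not measured).
h_car = np.exp(-8.0 * t) * np.cos(2 * np.pi * 5.0 * t)

# Equation 3.4: convolve each axis of the event with the car response.
a_star = np.column_stack(
    [np.convolve(a_event[:, k], h_car)[:fs] for k in range(3)]
)

# Equations 3.2 and 3.3: add zero-mean Gaussian noise to get the sensor signal.
sigma_n = 0.05
eta = rng.normal(0.0, sigma_n, size=a_star.shape)
a_measured = a_star + eta
```

The same event convolved with a different `h_car` gives a different measured signature, which is the car dependency discussed in the next section.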

3.5 Effect of Car Dynamics

Every car has a different comfort level depending on its quality and model. One of the major
components contributing to ride quality is the suspension system of the car. A bumpy ride
makes the passengers tired and restless. This is because the human
body is more vulnerable to certain vibration frequencies than to others, which is the
reason for isolating the car cabin. The suspension system consists of a spring and a damper unit that
are responsible for minimizing shocks. Fig. 3.5 shows the damping effect of the suspension in
comparison with other scenarios.

Figure 3.5: Damping graph of car suspension showing the result of critical damping
on the car suspension system

As different suspension systems have different damping responses, there is a significant
difference in comfort levels even between different models of the same car. As a result, the
acceleration pattern recorded by the data logger mounted inside the car varies a
lot. For a given anomaly, there can be many signatures depending on
the car, driver, speed, etc. To overcome this problem, I placed another data logger
outside the car, below the shock absorber. I believe that the data recorded by the outside data logger
provides more appropriate acceleration patterns, largely independent of car dynamics.
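The effect of damping can be illustrated with the textbook mass-spring-damper model. The sketch below is not a model of any particular car: the damping ratios and the natural frequency are arbitrary assumptions, and `free_response` is a hypothetical helper that returns the displacement after release from unit displacement at rest.

```python
import numpy as np

def free_response(t, zeta, omega_n=2 * np.pi):
    """Displacement of a mass-spring-damper released from x(0) = 1 at rest."""
    t = np.asarray(t, dtype=float)
    if zeta < 1.0:
        # Under-damped: oscillates while decaying.
        wd = omega_n * np.sqrt(1.0 - zeta**2)
        decay = np.exp(-zeta * omega_n * t)
        return decay * (np.cos(wd * t) + zeta * omega_n / wd * np.sin(wd * t))
    if zeta == 1.0:
        # Critically damped: fastest return without oscillation.
        return np.exp(-omega_n * t) * (1.0 + omega_n * t)
    # Over-damped: slow return without oscillation.
    root = omega_n * np.sqrt(zeta**2 - 1.0)
    s1, s2 = -zeta * omega_n + root, -zeta * omega_n - root
    return (s2 * np.exp(s1 * t) - s1 * np.exp(s2 * t)) / (s2 - s1)

t = np.linspace(0.0, 3.0, 300)
under = free_response(t, 0.2)     # assumed soft damper
critical = free_response(t, 1.0)
over = free_response(t, 2.0)      # assumed stiff damper
```

Plotting the three curves against `t` gives a picture of the kind shown in Fig. 3.5: only the under-damped response crosses zero, and the critically damped response settles faster than the over-damped one.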

3.6 Data Visualization

From our sensor I get 3-axis accelerometer readings, which are then processed to
extract features. As the data I gather is a time series, as shown in Fig.
3.6, it gives a brief description of the signal’s shape. Fig. 3.7 shows visualizations of
the anomaly events considered in this work.

A continuous journey has a lot of Normal events and very few anomalies. A continuous
drive can be visualized as given in Fig. 3.6.

Figure 3.6: Continuous drive signal as observed during a drive on the road. The
anomalies are usually not this close together, but this diagram gives an idea
of how it would look

3.7 Feature Selection

Several features that have been used in the literature were extracted from the time series data.
Given the experimental setup, z-axis acceleration patterns are expected to help in
classifying the anomalies. However, well-established feature selection algorithms have been
used for ranking the potential features from all axes. The data has been dealt with in terms
of windows, and each window is considered an instance. Hence, different features have been
extracted from each instance along five dimensions, i.e. X, Y, Z, XY and XYZ.
The derived axes’ features provide richer
information, as they combine the patterns of multiple axes. The features are divided
into several groups:

• Group A: Fast Fourier Transform

• Group B: Time Series Peak

• Group C: Kurtosis

• Group D: Skewness

• Group E: MFCC

• Group F: Zero Crossing

FFT features help in classifying events based on the frequencies present in them. High
frequencies are usually present in anomalous events, but this can be spurious. The literature
proposes passing the signal through a high-pass filter and then choosing features from it.
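As a rough illustration of Group A, the sketch below computes FFT magnitudes for one axis of a 1-second window. Discarding the bins below a cutoff is only a crude stand-in for the high-pass filter mentioned above; the 2 Hz cutoff, the function name `fft_features` and the test signal are assumptions.

```python
import numpy as np

def fft_features(window, fs=93.0, cutoff_hz=2.0):
    """Return frequencies and magnitudes of one axis of a window above a cutoff.

    Dropping bins below `cutoff_hz` approximates a high-pass filter;
    the 2 Hz default is an arbitrary assumption.
    """
    window = np.asarray(window, dtype=float)
    spectrum = np.abs(np.fft.rfft(window - window.mean()))
    freqs = np.fft.rfftfreq(window.size, d=1.0 / fs)
    keep = freqs >= cutoff_hz
    return freqs[keep], spectrum[keep]

# Hypothetical window: a 10 Hz vibration riding on a constant gravity offset.
t = np.arange(93) / 93.0
z = 9.81 + 0.5 * np.sin(2 * np.pi * 10.0 * t)
freqs, mags = fft_features(z)
dominant = freqs[np.argmax(mags)]  # frequency of the strongest component
```

With 93 samples per second the frequency bins are 1 Hz apart, so the magnitudes form a fixed-length vector for every window.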

Figure 3.7: Accelerometer patterns of different anomalies.

Time series peaks play a crucial role in identifying sudden events. Most commonly, z-axis peaks are used as a feature. However, I also consider the peaks of the other axes. For example, when a driver tries to avoid an anomaly and takes a sharp turn, there is a significant change in the y-axis acceleration pattern. In this way, False Negative events can be separated out easily.

Zero crossings were computed to classify events that show very abrupt changes in the acceleration pattern, i.e. Potholes and Cateyes.

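$$\text{Kurtosis} = \frac{m_4}{(m_2)^2} = \frac{n \sum_{i=1}^{n} (X_i - X_{avg})^4}{\left(\sum_{i=1}^{n} (X_i - X_{avg})^2\right)^2}$$ (Equation 3.5)

$$\text{Skewness} = \frac{m_3}{(m_2)^{3/2}} = \frac{\sqrt{n} \sum_{i=1}^{n} (X_i - X_{avg})^3}{\left(\sum_{i=1}^{n} (X_i - X_{avg})^2\right)^{3/2}}$$ (Equation 3.6)

Here $m_k = \frac{1}{n} \sum_{i=1}^{n} (X_i - X_{avg})^k$ is the k-th central moment of the window.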

The statistical features (i.e. Kurtosis and Skewness) proved to be very helpful in classification. This can be understood from the signatures in the previous section, where the Speed bump pattern has a different tail from the other patterns and Manholes have a skewed pattern. These features are computed using Equation 3.5 and Equation 3.6.
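A minimal sketch of the moment-based definitions in Equation 3.5 and Equation 3.6 is given below, written in terms of central moments. The function names and the toy inputs are illustrative assumptions; the thesis itself computed these features in MATLAB.

```python
def central_moment(x, k):
    """k-th central moment: mean of (x_i - mean)^k."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** k for v in x) / len(x)

def kurtosis(x):
    """m4 / m2^2, as in Equation 3.5."""
    return central_moment(x, 4) / central_moment(x, 2) ** 2

def skewness(x):
    """m3 / m2^(3/2), as in Equation 3.6."""
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5

symmetric = [-1.0, 0.0, 1.0]        # symmetric data has zero skewness
right_tailed = [0.0, 0.0, 0.0, 4.0]  # a long right tail gives positive skewness
```

A symmetric window yields a skewness of zero, while a window with a one-sided excursion yields a clearly non-zero value, which is the kind of asymmetry the text attributes to Manholes.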

The cepstral features (i.e. MFCCs) also turned out to be helpful in classifying the RSDs. MFCCs are usually used as features in speech recognition, where they represent the vibrations of the vocal tract. In this work, MFCCs are used to obtain an accurate representation of the vibrations produced when the vehicle passes over a certain RSD. Different anomalies produce different types of vibrations, so a faithful representation of these vibrations can help in classifying them. The steps involved in computing MFCCs are shown in Fig. 3.8, and the computation is governed by Equation 3.7.

$$MFCC(t, k) = \sqrt{\frac{2}{N}} \sum_{i=1}^{N} \log[E_{mel}(t, i)] \cos\left[k (i - 0.5) \frac{\pi}{N}\right]$$ (Equation 3.7)

Here N is the number of filters, E_mel(t, i) is the energy of the i-th filter at time t, and k is the order of the cepstral coefficient, where k = 1, 2, 3, ..., p. I used MATLAB's MIR toolbox to compute the MFCC features, which gives a fixed-length vector of 13 coefficients for each instance. MFCCs also proved useful for resolving the convolution of the two signals, i.e. the response of the car and the response of the actual anomaly. Since MFCCs are computed from the logarithm of the spectrum, signals that are convolved in time become additive in this domain. The MFCC coefficients are the amplitudes of the Mel-frequency spectrum as calculated by the DCT, and these amplitudes are treated as features to be fed to the classifier. There is a significant difference between the MFCC coefficients of the signals representing the normal and anomaly classes, as shown in Fig. 3.9.

Figure 3.8: Block diagram illustrating the steps involved in computing MFCCs

Figure 3.9: MFCC coefficients of the Normal and Anomaly classes
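The final DCT step of Equation 3.7 can be sketched as below. This covers only the conversion from Mel filter-bank energies to cepstral coefficients; the windowing, FFT and Mel filter bank that precede it are omitted. The function name, the choice of 26 filters and the flat energy values are assumptions for illustration, and this is not the MIR toolbox implementation used in the thesis.

```python
import math

def mfcc_from_mel_energies(mel_energies, num_coeffs):
    """Apply Equation 3.7 to one frame of Mel filter-bank energies."""
    n = len(mel_energies)
    log_e = [math.log(e) for e in mel_energies]
    scale = math.sqrt(2.0 / n)
    coeffs = []
    for k in range(1, num_coeffs + 1):
        total = sum(log_e[i - 1] * math.cos(k * (i - 0.5) * math.pi / n)
                    for i in range(1, n + 1))
        coeffs.append(scale * total)
    return coeffs

# Hypothetical frame: 26 filters with identical energy (a flat spectrum).
flat_frame = [math.e] * 26
flat_coeffs = mfcc_from_mel_energies(flat_frame, num_coeffs=13)
```

For a perfectly flat spectrum every coefficient with k ≥ 1 is numerically zero, so non-zero coefficients reflect the shape of the spectrum rather than its overall level.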

The feature vector of one window contains 284 features, where the FFT-based features are extracted from combinations of the axes [12], i.e. FFT-XY and FFT-XYZ. The feature vector is then passed through two feature selection algorithms, namely Sequential Forward Selection (SFS) and the Relief algorithm. For the latter, I used the implementation of Relief-F from the MATLAB Statistics and Machine Learning Toolbox.
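To make the idea of Sequential Forward Selection concrete, the following sketch greedily adds the feature that most improves a scoring function. The `score` callable is a stand-in: in practice it would typically be a cross-validated classifier accuracy on the chosen subset, whereas the additive toy score and its weights here are purely hypothetical.

```python
def sequential_forward_selection(num_features, score, k):
    """Greedily pick k feature indices, adding the best-scoring one each round."""
    selected = []
    remaining = list(range(num_features))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda j: score(selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical per-feature usefulness; a real score would train and evaluate a classifier.
toy_weights = [0.1, 0.9, 0.5, 0.3]

def toy_score(subset):
    return sum(toy_weights[j] for j in subset)

chosen = sequential_forward_selection(len(toy_weights), toy_score, k=2)
```

Because the search is greedy, it evaluates far fewer subsets than an exhaustive search, at the cost of possibly missing feature combinations that are only useful together.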

3.8 Machine Learning Algorithms

Four different machine learning algorithms were trained on these features. Feature-dependent algorithms only perform well when they are given the right feature set. Hence, there is also a need to design a feature-agnostic classifier for this problem. Table 3.1 shows the hyper-parameters of the algorithms used in this thesis. The confusion matrices of all of them can be found in Appendix B and C; the evaluation uses 10-fold cross validation.
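The 10-fold cross validation mentioned above can be illustrated with a simple index splitter. This sketch assumes contiguous, unshuffled folds for clarity; in practice the instances would normally be shuffled (and possibly stratified by class) first. The function name is hypothetical.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k (train, test) pairs with near-equal test folds."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds

# 59916 is the total number of instances reported in Chapter 4.
folds = k_fold_indices(59916, 10)
test_sizes = [len(test) for _, test in folds]
```

Each instance appears in exactly one test fold, so every instance is used for testing once and for training nine times.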

Table 3.1: Machine Learning Algorithms' parameters

| Classifier | Parameter setting |
|---|---|
| Support Vector Machine | Kernel = RBF, Cost = 23.16, Gamma = 100.985 |
| Random Forest | Trees = 10, Split Criteria = Gini Index |
| Naïve Bayes | No priors stated |
| Simple Logistic Regression | Max boosting iterations = 500 |

Figure 3.10: Architecture of convolutional neural network

3.9 Convolutional Neural Networks

The CNN architecture I implemented is a 5-layer architecture consisting of 3 convolutional layers and 2 fully connected layers. The architecture is shown in Fig. 3.10. Equation 3.8, Equation 3.9 and Equation 3.10 represent the vectors of an instance along the three axes. Each vector has a length of 93 samples.

$$a_x(t) = [a_x(1), a_x(2), a_x(3), \ldots, a_x(93)]$$ (Equation 3.8)

$$a_y(t) = [a_y(1), a_y(2), a_y(3), \ldots, a_y(93)]$$ (Equation 3.9)

$$a_z(t) = [a_z(1), a_z(2), a_z(3), \ldots, a_z(93)]$$ (Equation 3.10)

$$\text{FocalLoss} = FL(p_t) = -(1 - p_t)^{\gamma} \log(p_t)$$ (Equation 3.11)

Using these vectors, I form a matrix of dimensions 93x3 and feed it to the architecture. Yi is the output of the neural network, which belongs to vector A. The dataset is imbalanced, and to tackle this I used Focal Loss [15], which handles imbalanced datasets well. In the focal loss equation, pt represents the probability that the network assigns to the class to be predicted. The γ (gamma) factor is a hyper-parameter that controls how strongly the penalty term (1 − pt) weights the loss of a misclassified example.
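The effect of the focal loss in Equation 3.11 can be seen in a small numeric sketch. The value γ = 2 used below is an illustrative assumption, since the thesis does not state the value it used, and the function name is hypothetical.

```python
import math

def focal_loss(p_t, gamma):
    """Focal loss for one example, where p_t is the probability of the true class."""
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

gamma = 2.0  # assumed value for illustration only

easy_example = focal_loss(0.9, gamma)   # confidently correct prediction
hard_example = focal_loss(0.1, gamma)   # badly misclassified example
cross_entropy = focal_loss(0.5, 0.0)    # gamma = 0 reduces to plain cross entropy
```

With γ > 0 the loss of well-classified examples is scaled down sharply, so the abundant, easy Normal instances contribute little and training is driven mostly by the hard examples.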

This is a naïve architecture, and hence it does not surpass the results produced by the support vector machine. However, this feature-agnostic approach can be used for feature extraction, by removing the last layer of the network and feeding the resulting features to our discriminative models.

For classifying data into multiple classes, I add four more neurons in the last layer and the
remaining setup remains the same.

Dropout layers were used to help the network generalize across different sets of instances, i.e. instances from different cars. ReLU activations helped by switching off the unimportant filter values. I also tried other activations such as sigmoid, leaky ReLU and tanh, but the best results were obtained with ReLU. The FC layers were also tried with different numbers of neurons and layers; however, the most accurate results at real-time speed were obtained with this architecture.

3.10 Auto Encoders

The encoder-decoder based architecture helps in identifying the most representative features of the data [30]. This architecture is also capable of removing noise from the data and learning the latent space between the two distributions. It has three major parts: the encoder network, the latent space and the decoder network.

Auto encoders can be designed with either convolutional layers or fully connected (dense) layers, depending on the nature of the problem. It is usual practice for the decoder network to mirror the encoder. For problems where the latent space is very small compared to the input space, skip connections are used from the encoder to the decoder to keep the encoding information intact. An example of such an encoder-decoder architecture is U-Net [30], where skip connections are used to retain the encoding information. Simple auto encoders learn a latent space after the encoding network and decode that latent space to match the input. A variant called the Variational Auto Encoder instead learns a distribution over the latent space and then decodes a sample from that distribution. Variational auto encoders can also be used to generate more data once they have learned the distribution of the data.

In this work, I used auto encoders as a tool for dimensionality reduction. The actual 1 sec window contains 93 acceleration samples from each axis, totaling 93x3 samples. With the help of auto encoders, this 93x3 dimensional space is reduced to 46x3 and 26x3 dimensions in the latent space. Reducing the data in a way that keeps the most representative parts and discards the others does not deteriorate the results by a large margin (see Table 3.2).

I finalized this design after a series of experiments with other architectures and variables. I started with a convolutional auto encoder that ingested the whole 93x3 dimensional data in a single pass and converted it to a latent vector. This experiment failed, probably because the data is not complex enough for such architectures. I then moved on to auto encoders based on dense layers and finally converged on the simplest design, shown in Fig. 3.11, that was able to serve the purpose.

For the activations, I used the sigmoid function, with mean squared error as the loss function. The learning rate was set to 0.05 and a batch size of 128 was used. For each axis of acceleration, there is a separate network that converts the input space to the latent space. See Appendix E for the confusion matrices.
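The per-axis dense auto encoder described above can be sketched as a single forward pass. This is only a shape-level illustration under several assumptions: one dense layer each for the encoder and decoder, random untrained weights, and an input already scaled to the range 0 to 1 so that a sigmoid output can reproduce it. The actual layer layout is the one in Fig. 3.11, and training (learning rate 0.05, batch size 128) is omitted here.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def dense(x, weights, biases):
    """Fully connected layer with sigmoid activation; weights holds one row per output unit."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

rng = random.Random(0)
INPUT_DIM, LATENT_DIM = 93, 46  # one axis of a 1-second window, reduced to 46 values

enc_w, enc_b = init_layer(INPUT_DIM, LATENT_DIM, rng)
dec_w, dec_b = init_layer(LATENT_DIM, INPUT_DIM, rng)

axis_window = [rng.random() for _ in range(INPUT_DIM)]  # stand-in for one normalized axis
latent = dense(axis_window, enc_w, enc_b)        # encoder: 93 -> 46
reconstruction = dense(latent, dec_w, dec_b)     # decoder: 46 -> 93
loss = mse(axis_window, reconstruction)          # quantity minimized during training
```

After training, only the encoder half would be needed at inference time: the 46-value latent vectors of the three axes replace the original 93x3 window as classifier input.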

Figure 3.11: Architecture of Auto Encoder network. LD represents the dimension of
latent vector

Table 3.2: Results for data reduced in dimension by auto encoders

| Data Dimension | FPR | FNR | TPR |
|---|---|---|---|
| 93x3 | 2.6 | 4.8 | 95.2 |
| 46x3 | 3.02 | 8.62 | 91.38 |
| 26x3 | 2.75 | 10.95 | 89.05 |

Chapter 4
Experimental Results

This chapter presents the results of multi-class and binary classification using different algorithms. It also covers a detailed analysis of the different classes, i.e. data visualization and feature distributions, and finally the maps generated by this solution.

4.1 Classification Results

I combined the data from all the different cars, giving a total of 59916 instances. Normal instances make up 93.2% of the data, and the remaining instances are anomalies. The results obtained from the different classifiers are tabulated in Table (II). The best results are reported by SVM, with a TPR of 95.2%, FPR of 2.551% and FNR of 4.797%. The confusion matrices of binary and multi-class classification are shown in Table 4.1 and Table 4.2 respectively.
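The reported rates follow directly from a confusion matrix in which Anomaly is the positive class. The sketch below uses the SVM counts on the 40% test set from Table 5.2 in Appendix B; the function name is an illustrative assumption.

```python
def rates(tp, fn, fp, tn):
    """Return (TPR, FNR, FPR) in percent, with Anomaly as the positive class."""
    tpr = 100.0 * tp / (tp + fn)
    fnr = 100.0 * fn / (tp + fn)
    fpr = 100.0 * fp / (fp + tn)
    return tpr, fnr, fpr

# Counts from Table 5.2: 1548 anomalies detected, 78 missed,
# 114 normal windows flagged as anomalies, 4354 normal windows correct.
tpr, fnr, fpr = rates(tp=1548, fn=78, fp=114, tn=4354)
```

These counts give a TPR of about 95.2%, an FNR of about 4.797% and an FPR of about 2.551%, matching the figures quoted for SVM.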

CNN training was tried with two approaches: one with a balanced training set and the other with the actual (imbalanced) training set. The main idea of training with a balanced set was to remove bias while training the network. Data augmentation was used to generate additional data for the Anomaly class. The method fits a Gaussian over each of the 93 sample positions of the Anomaly class: all the anomaly windows were stacked and the Gaussian characteristics (i.e. mean and standard deviation) were learned for each sample position. New windows were then generated by varying the weight on the standard deviation, as calculated by Equation 4.1. Table 4.3 shows the results after training with the balanced dataset.

$$\text{Generated\_Anomaly}(i) = \mu(i) + w \cdot \sigma(i)$$ (Equation 4.1)

Equation 4.1 shows the calculation of the i-th point of a generated anomaly sample. The weight w is changed randomly for each generated sample. The mean µ(i) is the mean of the i-th points of all the anomaly windows, and σ(i) is their standard deviation. Hence, to generate a full sample, Equation 4.1 is computed for all points and the results are combined into a single vector.
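The augmentation in Equation 4.1 can be sketched as follows. The sketch assumes one random weight w per generated window, drawn uniformly from [-1, 1]; the thesis does not state the range or distribution of w, so both are assumptions, as are the function names and the toy windows.

```python
import random
import statistics

def fit_position_gaussians(windows):
    """Per-position mean and standard deviation across stacked anomaly windows."""
    columns = list(zip(*windows))
    mu = [statistics.mean(col) for col in columns]
    sigma = [statistics.pstdev(col) for col in columns]
    return mu, sigma

def generate_anomaly(mu, sigma, rng):
    """One synthetic window: mu(i) + w * sigma(i) for every position i."""
    w = rng.uniform(-1.0, 1.0)  # assumed range; one weight per generated window
    return [m + w * s for m, s in zip(mu, sigma)]

# Toy stand-in for stacked anomaly windows (real windows have 93 points each).
anomaly_windows = [[0.0, 2.0, 1.0], [2.0, 4.0, 1.0]]
mu, sigma = fit_position_gaussians(anomaly_windows)
synthetic = generate_anomaly(mu, sigma, random.Random(42))
```

Every generated point stays within one standard deviation of the per-position mean under this choice of w, so the synthetic windows resemble an average anomaly more than any individual one.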

The results were reasonable for the Anomaly class; however, the Normal class was largely misclassified. This probably occurred because some anomalous events are very similar to normal events. As I removed the bias from the data, the classifier did not learn sufficiently representative features of the classes and hence could not separate them well.
To rectify this problem, the network was trained with the actual training set, and focal loss was used to counter the class imbalance. The results improved for the Normal class without much effect on the other class. Table 4.4 shows the results obtained with imbalanced/actual training of the CNN.

Table 4.1: Results of Machine Learning Algorithms where Anomaly was treated as a positive class

| Data Dimension | FPR | FNR | TPR |
|---|---|---|---|
| 93x3 | 2.6 | 4.8 | 95.2 |
| 46x3 | 3.02 | 8.62 | 91.38 |
| 26x3 | 2.75 | 10.95 | 89.05 |

Table 4.2: Results of Machine Learning Algorithms for multi-classification (TPR per class)

| Classifier | Cateye | Manhole | Normal | Pothole | S.Bump |
|---|---|---|---|---|---|
| SVM | 65.8 | 67.6 | 81.3 | 91.3 | 92.2 |
| NB | 39.2 | 23.29 | 88.2 | 24.6 | 78.1 |
| RF | 44.7 | 16.2 | 99.3 | 22.9 | 74.5 |
| SLR | 37.8 | 5.8 | 99.3 | 24.6 | 68.9 |

Table 4.3: Results after balanced training with CNN

| True label \ Predicted label | Normal | Anomaly |
|---|---|---|
| Normal | 14061 | 38539 |
| Anomaly | 119 | 758 |

Table 4.4: Results after imbalanced training with CNN

| True label \ Predicted label | Normal | Anomaly |
|---|---|---|
| Normal | 49653 | 2947 |
| Anomaly | 123 | 781 |

4.2 Class Instances Visualization

The different classes in the dataset have distinct signatures, and these signatures are intuitive to understand. For example, a speed bump is expected to produce a kind of low-frequency sine wave in the z-axis of the time-domain signal. Similarly, a cateye is expected to produce sudden sharp peaks.

However, there are many variations within the same class, due to the different shapes of road anomalies encountered throughout the journey. This subsection visualizes different instances of each class to give an idea of the spatial variance of an event within a window. More example instances can be found in Appendix A.

4.3 Generated Maps

The main focus of this work was to solve the problem for two target audiences: drivers/passengers and road repairing authorities. Fig. 4.5 shows the map generated by the system for passengers/drivers. The color coding in the map is based on the number of anomalies per kilometer. This information can be integrated into navigation systems, further improving the quality of information available to users.
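The anomalies-per-kilometer measure behind the color coding can be sketched from GPS points using the haversine distance. The color thresholds, the coordinates and the function names below are hypothetical; the thesis does not state the thresholds it used.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def anomalies_per_km(track, anomaly_count):
    """Anomaly density over a road segment given as a list of (lat, lon) points."""
    length = sum(haversine_km(*a, *b) for a, b in zip(track, track[1:]))
    return anomaly_count / length if length > 0 else 0.0

def color_for(density):
    """Map a density to a map color; thresholds are hypothetical."""
    if density < 1.0:
        return "green"
    if density < 3.0:
        return "yellow"
    return "red"

# Hypothetical segment of roughly 1.11 km along the equator with 3 detected anomalies.
segment = [(0.0, 0.0), (0.0, 0.005), (0.0, 0.01)]
density = anomalies_per_km(segment, anomaly_count=3)
segment_color = color_for(density)
```

Normalizing by distance keeps long and short road segments comparable, which a raw anomaly count would not.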

Similarly, road repairing authorities can plan their priorities better if they have information about the areas most in need of repair. Only Manholes and Potholes are plotted in these maps, because the other anomalies are part of road planning. Fig. 4.6 shows the map generated by the system for road repairing authorities. Anomalies are marked with blue dots on the map. These geo-tags may not be 100% accurate due to errors in GPS readings, but they give a good idea of the intensity of repair required in a certain zone/locality. More maps for different cars can be found in Appendix D.

Figure 4.1: Cateye Signature

Figure 4.2: Manhole Signature

Figure 4.3: Pothole Signature

Figure 4.4: Speed Bump Signature

Figure 4.5: Generated map for drivers/passengers, showing the roads suitable to drive comfortably

Figure 4.6: Generated map for road repairing authorities to identify the spots that require immediate repair
Chapter 5
Conclusion and Recommendations

5.1 Conclusion:

From the system modeling, it is clear that the data depends on the car dynamics. Hence, for better classification results, this dependency must be removed from the data if the dataset is to be enlarged by combining data from different cars. I tried to remove this dependency by using MFCC features. This approach of processing the signal in the frequency domain improved the TPR from 94.08% to 95.2%. However, this problem could also be addressed by using data from two sensors, i.e. one mounted on the dashboard and the other mounted near the tire. Similarly, visual inspection of the anomaly patterns suggests that the cat eye pattern resembles that of potholes and the speed bump pattern resembles that of manholes. This is also evident from the confusion matrices. Kurtosis and Skewness were computed for this purpose, i.e. to differentiate between these classes.

5.2 Recommendations:

The following are good future directions to make this work more robust and trustworthy.

• Using denoising algorithms or deep learning architectures to remove the car dependency from the data

• Considering non-fixed windows for anomalies, as the signature usually does not fit perfectly into a fixed-size window

• Utilizing anomaly classification algorithms designed specifically for the time series domain

REFERENCES

[1] DW.COM, “Trump reveals plan to repair America’s creaky infrastructure,” 2018.

[2] I. J. W. Magazine, “Study: Pothole damage costs U.S. drivers $3b a year,” 2016.

[3] “Road Anomaly Detection (RAD) Dataset,” 2018. [Online]. Available: https://bit.ly/2Fbfkaj

[4] H. B. Salau, A. J. Onumanyi, A. M. Aibinu, E. N. Onwuka, J. J. Dukiya, and H. Ohize, “A survey of accelerometer-based techniques for road anomalies detection and characterization,” International Journal of Engineering Science and Application, vol. 3, no. 1, pp. 8–20, 2019.

[5] P. Mohan, V. N. Padmanabhan, and R. Ramjee, “Nericell: rich monitoring of road and traffic conditions using mobile smartphones,” in Proceedings of the 6th ACM conference on Embedded network sensor systems, 2008, pp. 323–336.

[6] J. Eriksson, L. Girod, B. Hull, R. Newton, S. Madden, and H. Balakrishnan, “The pothole patrol: using a mobile sensor network for road surface monitoring,” in Proceedings of the 6th international conference on Mobile systems, applications, and services, 2008, pp. 29–39.

[7] G. D. De Silva, R. S. Perera, N. M. Laxman, K. M. Thilakarathna, C. I. Keppitiyagama, and K. De Zoysa, “Automated pothole detection system,” in Proc. Int. Conf. Adv. ICT Emerg. Regions, 2013.

[8] A. Mednis, G. Strazdins, R. Zviedris, G. Kanonirs, and L. Selavo, “Real time pothole detection using android smartphones with accelerometers,” in 2011 International conference on distributed computing in sensor systems and workshops (DCOSS). IEEE, 2011, pp. 1–6.

[9] V. Astarita, M. V. Caruso, G. Danieli, D. C. Festa, V. P. Giofrè, T. Iuele, and R. Vaiana, “A mobile application for road surface quality control: Uniqualroad,” Procedia-Social and Behavioral Sciences, vol. 54, pp. 1135–1144, 2012.

[10] V. Douangphachanh and H. Oneyama, “A study on the use of smartphones for road roughness condition estimation,” Journal of the Eastern Asia Society for Transportation Studies, vol. 10, pp. 1551–1564, 2013.

[11] A. Sinharay, S. Bilal, A. Pal, and A. Sinha, “Low computational approach for road condition monitoring using smartphones,” in Proceedings of the Computer Society of India (CSI) Annual Convention, Theme: Intelligent Infrastructure, Visakhapatnam, India, 2013, pp. 13–15.

[12] H. Hameed, S. Mazhar, and N. Hassan, “Real-time road anomaly detection, using an on-board data logger,” in 2018 IEEE 87th Vehicular Technology Conference (VTC Spring). IEEE, 2018, pp. 1–5.

[13] L. C. González, R. Moreno, H. J. Escalante, F. Martínez, and M. R. Carlos, “Learning roadway surface disruption patterns using the bag of words representation,” IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 11, pp. 2916–2928, 2017.

[14] M. Perttunen, O. Mazhelis, F. Cong, M. Kauppila, T. Leppänen, J. Kantola, J. Collin, S. Pirttikangas, J. Haverinen, T. Ristaniemi et al., “Distributed road surface condition monitoring using mobile phones,” in International conference on ubiquitous intelligence and computing. Springer, 2011, pp. 64–78.

[15] M. Fazeen, B. Gozick, R. Dantu, M. Bhukhiya, and M. C. González, “Safe driving using mobile phones,” IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 3, pp. 1462–1468, 2012.

[16] F. Martinez, L. C. Gonzalez, and M. R. Carlos, “Identifying roadway surface disruptions based on accelerometer patterns,” IEEE Latin America Transactions, vol. 12, no. 3, pp. 455–461, 2014.

[17] F. Seraj, B. J. van der Zwaag, A. Dilo, T. Luarasi, and P. Havinga, “Roads: A road pavement monitoring system for anomaly detection using smart phones,” in Big data analytics in the social and ubiquitous context. Springer, 2015, pp. 128–146.

[18] H. R. Eftekhari and M. Ghatee, “An inference engine for smartphones to preprocess data and detect stationary and transportation modes,” Transportation Research Part C: Emerging Technologies, vol. 69, pp. 313–327, 2016.

[19] L. C. González-Gurrola, F. Martínez-Reyes, and M. R. Carlos-Loya, “The citizen road watcher–identifying roadway surface disruptions based on accelerometer patterns,” in Ubiquitous computing and ambient intelligence. Context-awareness and context-driven interaction. Springer, 2013, pp. 374–377.

[20] H. Bello-Salau, A. Aibinu, A. Onumanyi, E. Onwuka, J. Dukiya, and H. Ohize, “New road anomaly detection and characterization algorithm for autonomous vehicles,” Applied Computing and Informatics, 2018.

[21] Y.-L. Jeng, S.-B. Huang, and C.-F. Lai, “Inspect road quality by using anomaly detection approach,” in 2018 International Conference on System Science and Engineering (ICSSE). IEEE, 2018, pp. 1–4.

[22] M. A. Walker, R. Catron, B. Vaughan, and H. J. Lu, “Systems and methods of determining road quality,” Aug. 13 2019, US Patent 10,378,160.

[23] C. Gorges, K. Öztürk, and R. Liebich, “Impact detection using a machine learning approach and experimental road roughness classification,” Mechanical Systems and Signal Processing, vol. 117, pp. 738–756, 2019.

[24] É. Renault, V. H. Ha et al., “Road anomaly detection using smartphone: A brief analysis,” in International Conference on Mobile, Secure, and Programmable Networking. Springer, 2018, pp. 86–97.

[25] Y. Chen, M. Zhou, Z. Zheng, and M. Huo, “Toward practical crowdsourcing-based road anomaly detection with scale-invariant feature,” IEEE Access, vol. 7, pp. 67666–67678, 2019.

[26] Z. Zheng, M. Zhou, Y. Chen, M. Huo, and D. Chen, “Enabling real-time road anomaly detection via mobile edge computing,” International Journal of Distributed Sensor Networks, vol. 15, no. 11, p. 1550147719891319, 2019.

[27] H. Bello-Salau, A. Aibinu, A. Onumanyi, E. Onwuka, J. Dukiya, and H. Ohize, “New road anomaly detection and characterization algorithm for autonomous vehicles,” Applied Computing and Informatics, 2020.

[28] T. Sztyler and H. Stuckenschmidt, “Online personalization of cross-subjects based activity recognition models on wearable devices,” in 2017 IEEE International Conference on Pervasive Computing and Communications (PerCom). IEEE, 2017, pp. 180–189.

[29] J. Wang, Y. Chen, L. Hu, X. Peng, and S. Y. Philip, “Stratified transfer learning for cross-domain activity recognition,” in 2018 IEEE International Conference on Pervasive Computing and Communications (PerCom). IEEE, 2018, pp. 1–10.

[30] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention. Springer, 2015, pp. 234–241.

Appendix A

Different Signature of Anomalies

1. Cateyes

2. Manholes

3. Potholes

4. Speed Bumps

Appendix B

Binary Classification Results

1. Support Vector Machine

Table 5.1: SVM Results on full dataset

| Actual \ Predicted | Anomaly | Normal | Total Actuals |
|---|---|---|---|
| Anomaly | 3986 (98.0) | 78 (2.00) | 4064 (100) |
| Normal | 19084 (34.1) | 36768 (65.9) | 55852 (100) |
| Total Predicted | 3176 | 32774 | 59916 |

Table 5.2: SVM Results on 40% test-set

| Actual \ Predicted | Anomaly | Normal | Total Actuals |
|---|---|---|---|
| Anomaly | 1548 (95.2) | 78 (4.8) | 1626 |
| Normal | 114 (2.6) | 4354 (97.4) | 4468 |
| Total Predicted | 1663 | 4432 | 6094 |

2. Naive Bayes

Table 5.3: NB Results on full dataset

| Actual \ Predicted | Anomaly | Normal | Total Actuals |
|---|---|---|---|
| Anomaly | 3284 (80.8%) | 780 (19.19%) | 4064 (100%) |
| Normal | 5157 (9.23%) | 50695 (90.76%) | 55852 (100%) |
| Total Predicted | 8361 | 51475 | 59916 |

3. Random Forest

Table 5.4: RF Results on full dataset

| Actual \ Predicted | Anomaly | Normal | Total Actuals |
|---|---|---|---|
| Anomaly | 2965 (72.95%) | 1099 (27.05%) | 4064 (100%) |
| Normal | 715 (1.28%) | 55137 (98.72%) | 55852 (100%) |
| Total Predicted | 3680 | 56236 | 59916 |

4. Simple Logistic Regression

Table 5.5: SLR Results on full dataset

| Actual \ Predicted | Anomaly | Normal | Total Actuals |
|---|---|---|---|
| Anomaly | 2512 (61.81%) | 1552 (38.19%) | 4064 (100%) |
| Normal | 504 (0.9%) | 55348 (99.1%) | 55852 (100%) |
| Total Predicted | 3016 | 56900 | 6095 |

Appendix C

Multiclass Classification Results

1. Support Vector Machine

| Actual \ Predicted | Cat eye | Manhole | Normal | Pothole | S.Bump | Total |
|---|---|---|---|---|---|---|
| Cat eye | 478 (65.84) | 9 (1.23) | 18 (2.47) | 217 (29.88) | 4 (0.55) | 726 (100) |
| Manhole | 6 (0.75) | 537 (67.63) | 52 (6.54) | 146 (18.38) | 53 (6.67) | 794 (100) |
| Normal | 127 (0.22) | 1417 (2.53) | 45405 (81.29) | 6432 (11.51) | 2471 (4.42) | 55852 (100) |
| Pothole | 4 (0.36) | 22 (1.99) | 42 (3.80) | 1008 (91.3) | 28 (2.53) | 1104 (100) |
| S.Bump | 3 (0.20) | 6 (0.41) | 33 (2.29) | 71 (4.93) | 1327 (92.15) | 1440 (100) |

2. Naive Bayes

| Actual \ Predicted | Cat eye | Manhole | Normal | Pothole | S.Bump | Total |
|---|---|---|---|---|---|---|
| Cat eye | 285 (39.25) | 136 (18.73) | 42 (5.78) | 160 (22.03) | 103 (14.18) | 726 (100) |
| Manhole | 56 (7.05) | 185 (23.29) | 153 (19.26) | 125 (15.74) | 275 (34.63) | 794 (100) |
| Normal | 258 (0.46) | 3527 (6.31) | 49264 (88.20) | 653 (1.16) | 2150 (3.84) | 55852 (100) |
| Pothole | 160 (14.49) | 260 (23.55) | 105 (9.51) | 272 (24.63) | 307 (27.80) | 1104 (100) |
| S.Bump | 5 (0.34) | 89 (6.18) | 153 (10.62) | 68 (4.72) | 1125 (78.12) | 1440 (100) |

3. Random Forest

| Actual \ Predicted | Cat eye | Manhole | Normal | Pothole | S.Bump | Total |
|---|---|---|---|---|---|---|
| Cat eye | 325 (44.76) | 3 (0.41) | 333 (45.86) | 49 (6.74) | 16 (2.20) | 726 (100) |
| Manhole | 34 (4.28) | 129 (16.24) | 514 (64.73) | 29 (3.65) | 88 (11.08) | 794 (100) |
| Normal | 85 (0.15) | 15 (0.02) | 55497 (99.36) | 85 (0.15) | 170 (0.3) | 55852 (100) |
| Pothole | 86 (7.78) | 1 (0.09) | 715 (64.76) | 253 (22.91) | 49 (4.43) | 1104 (100) |
| S.Bump | 8 (0.55) | 2 (0.14) | 349 (24.23) | 8 (0.55) | 1073 (74.51) | 1440 (100) |

4. Simple Logistic Regression

| Actual \ Predicted | Cat eye | Manhole | Normal | Pothole | S.Bump | Total |
|---|---|---|---|---|---|---|
| Cat eye | 275 (37.87) | 24 (3.30) | 304 (41.87) | 116 (15.97) | 7 (0.96) | 726 (100) |
| Manhole | 48 (6.04) | 46 (5.79) | 510 (64.23) | 109 (13.72) | 81 (10.20) | 794 (100) |
| Normal | 90 (0.16) | 25 (0.04) | 55485 (99.34) | 118 (0.21) | 134 (0.23) | 55852 (100) |
| Pothole | 75 (6.79) | 28 (2.53) | 663 (60.05) | 272 (24.63) | 66 (5.97) | 1104 (100) |
| S.Bump | 9 (0.62) | 14 (0.97) | 386 (26.80) | 38 (2.63) | 993 (68.95) | 1440 (100) |

Appendix D

Generated Maps

1. Corolla GLI

2. Hyundai Santro

Appendix E

Auto Encoder Results

All the results reported in this section were compiled using the SVM model, which performed best in our work. These results show that our solution can be much more memory-efficient with very minimal loss of accuracy.

1. Results with 93x3 dimensional data

Table 5.6: Results on full dataset with 93x3 dimensional data

| Actual \ Predicted | Anomaly | Normal | Total Actuals |
|---|---|---|---|
| Anomaly | 3986 (98.0) | 78 (2.00) | 4064 (100) |
| Normal | 19084 (34.1) | 36768 (65.9) | 55852 (100) |
| Total Predicted | 3176 | 32774 | 59916 |

Table 5.7: Results on 40% test-set with 93x3 dimensional data

| Actual \ Predicted | Anomaly | Normal | Total Actuals |
|---|---|---|---|
| Anomaly | 1548 (95.2) | 78 (4.8) | 1626 |
| Normal | 114 (2.6) | 4354 (97.4) | 4468 |
| Total Predicted | 1663 | 4432 | 6094 |

2. Results with 46x3 dimensional data

Table 5.8: Results on full dataset with 46x3 dimensional data

| Actual \ Predicted | Anomaly | Normal | Total Actuals |
|---|---|---|---|
| Anomaly | 3914 (96.3) | 150 (3.70) | 4064 (100) |
| Normal | 23094 (41.34) | 32758 (58.66) | 55852 (100) |
| Total Predicted | 3176 | 32774 | 59916 |

Table 5.9: Results on 40% test-set with 46x3 dimensional data

| Actual \ Predicted | Anomaly | Normal | Total Actuals |
|---|---|---|---|
| Anomaly | 1486 (91.38) | 140 (8.62) | 1626 |
| Normal | 135 (3.02) | 4333 (96.98) | 4468 |
| Total Predicted | 1663 | 4432 | 6094 |

3. Results with 26x3 dimensional data

Table 5.10: Results on full dataset with 26x3 dimensional data

| Actual \ Predicted | Anomaly | Normal | Total Actuals |
|---|---|---|---|
| Anomaly | 3829 (94.2) | 235 (5.8) | 4064 (100) |
| Normal | 21353 (38.23) | 34499 (61.77) | 55852 (100) |
| Total Predicted | 3176 | 32774 | 59916 |

Table 5.11: Results on 40% test-set with 26x3 dimensional data

| Actual \ Predicted | Anomaly | Normal | Total Actuals |
|---|---|---|---|
| Anomaly | 1448 (89.05) | 178 (10.95) | 1626 |
| Normal | 123 (2.75) | 4345 (97.25) | 4468 |
| Total Predicted | 1663 | 4432 | 6094 |

Appendix F

Publications

Publications from this work

1. Hassan, Naufil, Ifrah Siddiqui, Suleman Mazhar, and Hadia Hameed. "Road Anomaly
Classification for Low-Cost Road Maintenance and Route Quality Maps." in 2019 IEEE
International Conference on Pervasive Computing and Communications Workshops
(PerCom Workshops), IEEE, 2019, pp. 645-650.

2. Hameed, Hadia, Suleman Mazhar, and Naufil Hassan. "Real-Time Road Anomaly
Detection, Using an On-Board Data Logger." in 2018 IEEE 87th Vehicular Technology
Conference (VTC Spring), IEEE, 2018, pp. 1-5.


No ratings yet
UGC NET Library Science Question Paper PDF
16 pages
IIT Ropar CV Template 1
No ratings yet
IIT Ropar CV Template 1
1 page
Lab 3 - Linear Regression
No ratings yet
Lab 3 - Linear Regression
15 pages
NW Investigator
No ratings yet
NW Investigator
168 pages
Chapter 1
No ratings yet
Chapter 1
20 pages
Marran Computer Science 2017 PDF
No ratings yet
Marran Computer Science 2017 PDF
75 pages
Download full Intrusion Detection A Machine Learning Approach 3rd Edition Zhenwei Yu ebook all chapters
100% (4)
Download full Intrusion Detection A Machine Learning Approach 3rd Edition Zhenwei Yu ebook all chapters
60 pages
Black Book Final Year
No ratings yet
Black Book Final Year
99 pages
MiniTab Introduction
100% (1)
MiniTab Introduction
124 pages
(Time: Hours) (Marks: 75) Please Check Whether You Have Got The Right Question Paper
No ratings yet
(Time: Hours) (Marks: 75) Please Check Whether You Have Got The Right Question Paper
16 pages
Download Full Python Programming in Context, 3rd Edition (eBook PDF) PDF All Chapters
100% (2)
Download Full Python Programming in Context, 3rd Edition (eBook PDF) PDF All Chapters
41 pages
Remotely Sensed Data Characterization Classification and Accuracies
100% (1)
Remotely Sensed Data Characterization Classification and Accuracies
712 pages
Classification of Flower Species Final
No ratings yet
Classification of Flower Species Final
32 pages
Data Science and Big Data Analytics-1-82
No ratings yet
Data Science and Big Data Analytics-1-82
82 pages
FePEST User Guide PDF
No ratings yet
FePEST User Guide PDF
130 pages
STAT613
No ratings yet
STAT613
295 pages
COURSE PLAN COs IoT
No ratings yet
COURSE PLAN COs IoT
7 pages
QGIS 3.22 TrainingManual en
No ratings yet
QGIS 3.22 TrainingManual en
701 pages
Madhan-1
No ratings yet
Madhan-1
90 pages
Neutrosophic Operational Research, II
No ratings yet
Neutrosophic Operational Research, II
207 pages
(Ebook) A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe ISBN 9781118848456, 1118848454 - The ebook is available for instant download, read anywhere
100% (1)
(Ebook) A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe ISBN 9781118848456, 1118848454 - The ebook is available for instant download, read anywhere
52 pages
Object Detection
No ratings yet
Object Detection
7 pages
Visvesvaraya Technological University: Vemana Institute of Technology
No ratings yet
Visvesvaraya Technological University: Vemana Institute of Technology
17 pages
Thesis Machine Learning
No ratings yet
Thesis Machine Learning
28 pages
Machine Learning for Environmental Noise Classification in Smart Cities (Ali Othman Albaji) (Z-Library)
No ratings yet
Machine Learning for Environmental Noise Classification in Smart Cities (Ali Othman Albaji) (Z-Library)
179 pages
Tesis Velasquez Rendon David
No ratings yet
Tesis Velasquez Rendon David
223 pages