Disruptive Innovation For Auto Insurance Entrepreneurs: New Paradigm Using Telematics and Machine Learning
Abstract Currently, motor insurers play a passive role in identifying risk incidents
for their policy holders. Traditional insurance does not differentiate between safe
and unsafe drivers because insurers do not have the policy holders’ vehicle
telematics data. Many insurance corporations are planning to use telematics data to
build a predictive model of policy holder risk and claim probability. They can
reward safe drivers with low premiums and/or a no-claim bonus, while unsafe drivers
pay an extra risk premium. Drivers therefore have a stronger incentive to adopt
safer practices. This chapter describes a black-box auto insurance predictive model
that uses basic telemetry, such as GPS sensor data, for usage-based insurance. The
predictive model is developed using binary logistic regression, a machine learning
technique. The chapter is informative for entrepreneurs because it highlights the
business proposition from an insurer’s perspective for gaining competitiveness in a
highly commoditized insurance market.
1 Introduction
Existing insurance billing models are based on policy data and claims data. Policy
data include vehicle cost, depreciation, first-party or third-party claims, and tenure
of the policy. Claims data include the number of claims and the claim amount.
Currently, market segmentation is based on claims, which divides customers into
no-claim and claim customers. A no-claim bonus encourages customers to drive
cautiously. This descriptive model is general, and the premium cost is fixed since it
does not consider usage. As a result, customers who have low annual mileage and good
driving behavior subsidize bad drivers who have higher annual mileage and drive
recklessly. This is unfair to good drivers and does not penalize bad drivers. There is
therefore a need to rationalize the premium cost based on miles driven and driving
behavior, and to provide added value to the insured. It has been observed that there
is a growing competitiveness in
1.1 Scope
The logistic regression model (LRM for short) is used in an auto insurance risk
premium calculator to predict risk from features extracted from GPS data. In the risk
premium calculator, the LRM performs regression and binary classification of risk.
The accuracy of the LRM, defined as the ratio of correct predictions to the total
number of predictions, is computed using two features from the GPS data.
PAYD (pay-as-you-drive) insurance policies are offered by many insurance companies
across the world, which collect data by a variety of methods. The policies differ in
the level of privacy provided to the users.
WGV, a German insurance company, gathers vehicle speed and location information
and verifies whether the speed limit is adhered to. If the speed limit is exceeded on
a given route, the policy holder earns “negative” points that affect the risk
premium.
Progressive Casualty Insurance (US) and AVIVA (Canada) use proprietary devices
that connect to the OBDII (On-Board Diagnostics II) port of the vehicle. The device
collects trip start and end times, miles driven, trip duration, the number of sudden
starts and stops, and the time and date of each connection to or disconnection from
the OBDII port. The user can review this data on a computer and share it with the
insurer.
In Germany, the insurance companies Swiss Re and DVB Winterthur have a similar
device to exchange data with the insurer. Route information, user behavior and
kilometers travelled are inferred using GPS.
Hollard Insurance provides PAYD insurance based on GPS, recording all data related
to location and time and storing it on a server. The policy holder can access the
policy details over the internet.
Progressive Insurance Corp. (US) registered US Patent US5797134 to capture the
necessary data using GPS and transmit it over the GSM network. The data include the
safety equipment used (seat belts, turn signals, ...). They also include driving
behavior such as rate of acceleration, rate of braking, and observance of traffic
signals and speed.
Norwich Union (UK), owner of European patent (EP) number 0700009, and Uniqa Group
(Austria) follow the same architecture using GPS and GSM. However, the data are
limited to time of day, riskiness of the road and kilometers driven.
MAPFRE (Spain) uses an architecture based on GPS and GSM. The data include
percentage of night hours, average speed, time of day, type of road driven, average
length of trips and kilometers driven.
STOK (Netherlands) uses an architecture based on GPS, with either a passive or an
active way of transmitting data to the server: passively by USB, Bluetooth or
wirelessly, or actively over the GSM network. The statistics and trip logs are
accessible to insurance companies and the user (Troncoso et al. 2011). A summary of
all existing PAYD implementations is given in Table 1.
Annual risk can be calculated as the product of per-mile risk and annual mileage. It
was found that a relationship exists between a reduction in VMT (vehicle miles
travelled) and a reduction in risk. However, mileage is not the only important risk
factor.
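The product rule stated above can be illustrated with a minimal sketch. The function name `annual_risk` and the numeric figures are hypothetical and chosen only for illustration; they are not taken from the chapter's data.

```python
def annual_risk(per_mile_risk: float, annual_miles: float) -> float:
    """Annual risk as the product of per-mile risk and annual mileage."""
    if per_mile_risk < 0 or annual_miles < 0:
        raise ValueError("per-mile risk and mileage must be non-negative")
    return per_mile_risk * annual_miles


# Hypothetical figures: the same per-mile risk at two different mileages.
low_mileage = annual_risk(per_mile_risk=2e-6, annual_miles=5_000)
high_mileage = annual_risk(per_mile_risk=2e-6, annual_miles=15_000)
print(low_mileage, high_mileage)
```

With the per-mile risk held fixed, tripling the mileage triples the annual risk. Because mileage is not the only risk factor, the per-mile risk itself would in practice differ from driver to driver.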
558 N. Arun Kumar and S. Yellampalli
A GPS device with GSM (GPRS-enabled) and the required control board is fixed into
the vehicle, as shown in Fig. 1. The device is powered from the vehicle’s battery.
The GPS capabilities provided include speed, idle time, and latitude and longitude
of the vehicle. The device shall have its own battery as a failsafe in case the
vehicle battery is disconnected or drained. The device shall be able to store GPS
sentences when the GSM network is unavailable and push them to the server on a
First In First Out (FIFO) basis. The device shall provide OTA (Over the Air)
firmware upgrades. Device configuration settings such as the server address should
be configurable by SMS. The device shall be configured to send GPS sentences as web
requests. The HTTP-based web requests are received by a server with a public IP
address. The device talks directly to the web server, which aggregates the
telematics data and insurance data. The server hosts the machine learning algorithm
and generates the reports on predictive risk (Husnjaka et al. 2015).
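The store-and-forward requirement described above (buffer GPS sentences while the GSM network is unavailable, then push them first in, first out) can be sketched as follows. This is an illustrative sketch only, not the device firmware: the class `StoreAndForwardBuffer`, the `send` callback and the capacity limit are all assumptions introduced here.

```python
from collections import deque
from typing import Callable, Deque


class StoreAndForwardBuffer:
    """Holds GPS sentences while the network is down; flushes oldest first."""

    def __init__(self, capacity: int = 1000) -> None:
        # Assumption: when the buffer is full, the oldest sentence is dropped.
        self._queue: Deque[str] = deque(maxlen=capacity)

    def submit(self, sentence: str, network_up: bool,
               send: Callable[[str], bool]) -> None:
        """Queue a new sentence and flush the backlog if the network is up."""
        self._queue.append(sentence)
        if network_up:
            self.flush(send)

    def flush(self, send: Callable[[str], bool]) -> int:
        """Push queued sentences in FIFO order; stop at the first failure."""
        sent = 0
        while self._queue:
            if not send(self._queue[0]):  # keep the sentence if the push fails
                break
            self._queue.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._queue)


# Usage with a stand-in for the HTTP push to the server.
received = []

def fake_send(sentence: str) -> bool:
    received.append(sentence)
    return True

buffer = StoreAndForwardBuffer()
buffer.submit("fix-1", network_up=False, send=fake_send)
buffer.submit("fix-2", network_up=False, send=fake_send)
buffer.submit("fix-3", network_up=True, send=fake_send)
print(received)
```

A sentence is removed from the queue only after the push succeeds, so a failed request leaves the backlog intact for the next attempt.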
LRM is a special case of the generalized linear model and is similar to linear
regression (McCullagh and Nelder 1989). It is based on the machine learning
architecture shown in Fig. 2. It is used to compute the probability of a dichotomous
outcome, “Risk” or “No risk”, based on one or more independent variables, which are
called predictors or features. The features extracted from the GPS data are average
speed and average driving time.
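As a minimal sketch of how such a model turns the two features into a probability, the standard logistic function can be applied to a linear combination of the features. The coefficient values below are hypothetical placeholders, not the chapter's fitted coefficients, and the function names are introduced here for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def risk_probability(avg_speed: float, avg_driving_time: float,
                     b0: float, b1: float, b2: float) -> float:
    """P(Risk) for one policy holder under a two-feature logistic model."""
    return sigmoid(b0 + b1 * avg_speed + b2 * avg_driving_time)


# Hypothetical coefficients chosen only to make the arithmetic easy to follow.
p = risk_probability(avg_speed=60.0, avg_driving_time=5.0,
                     b0=-4.0, b1=0.05, b2=0.2)
print(p)  # linear term is -4 + 3 + 1 = 0, so the probability is 0.5
```

A probability at or above the classification cut-off would be labeled “Risk”, and one below it “No risk”.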
Machine learning architecture involves a model, input, output and classifier. Before
deployment, any model needs to be learnt using the offline system. In this case,
mapped GPS and insurance data are used as the training dataset. Typically, it
includes risk, average speed and average driving time for each insurance policy
holder. The classifier is specified during online learning of the model and is also
called the classification cut-off. The model is scalable: new features such as mobile
usage by the driver and road condition can be added. The model can be calibrated
using the classifier and updated using the online learning mechanism.
[Fig. 2: Machine learning architecture. Offline, the training sub-system learns the
model from training data $\{(\vec{x}^{(1)}, y^{(1)}), \ldots, (\vec{x}^{(N)}, y^{(N)})\}$,
where $\vec{x}^{(i)} = (x_0^{(i)}, \ldots, x_d^{(i)})$. Online, an input object encoded
with features $\vec{x} = (x_0, \ldots, x_d)$ passes through the model classifier to give
the final output prediction $y$ (response/dependent variable).]
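One possible way to assemble the mapped training dataset of risk, average speed and average driving time per policy holder is sketched below. The record layout, the policy identifiers and the rule "at least one claim means Risk" are assumptions made for this illustration; the chapter does not specify them.

```python
from statistics import mean

# Hypothetical per-trip telematics records:
# (policy_id, average speed of the trip, driving time of the trip in hours)
trips = [
    ("P1", 42.0, 1.5), ("P1", 58.0, 2.5),
    ("P2", 80.0, 6.0), ("P2", 90.0, 8.0),
]
# Hypothetical claim counts per policy from the insurance data.
claims = {"P1": 0, "P2": 2}


def build_training_rows(trips, claims):
    """Aggregate trips per policy and attach a binary risk label."""
    by_policy = {}
    for policy_id, speed, hours in trips:
        by_policy.setdefault(policy_id, []).append((speed, hours))

    rows = []
    for policy_id, records in sorted(by_policy.items()):
        avg_speed = mean(speed for speed, _ in records)
        avg_time = mean(hours for _, hours in records)
        # Assumed labeling rule: any claim on record marks the policy as Risk.
        risk = 1 if claims.get(policy_id, 0) > 0 else 0
        rows.append((policy_id, avg_speed, avg_time, risk))
    return rows


rows = build_training_rows(trips, claims)
for row in rows:
    print(row)
```

Each output row holds the policy identifier, the two aggregated features and the risk label, which is the shape of input the offline training step consumes.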
Risk depends on many factors, namely terrain, mobile usage while driving, vehicle
age, maintenance cycle, speed, road condition, travel time without a break and
driving behavior. “Risk” or “No Risk” is the dependent variable, and speed, road
condition, travel time without a break and driving behavior are independent
variables. The LRM predicts a discrete outcome without assuming a linear
relationship between the independent variables and the dependent variable.
2 Methodology
[Figure: device data, claims data and policy data are inputs to the model, which
outputs the risk-adjusted premium.]
2.1 Infrastructure
A risk-adjusted premium policy requires a GPS unit with GSM network connectivity to
be installed in the vehicle. Insurance companies need to host predictive analytics
software that receives the GPS data and processes it for business insights. The GPS
unit needs configuration and firmware upgrade capability, a battery failsafe
mechanism, built-in memory for storing historical data, and the intelligence to
check GSM availability and send the aggregated data. The device may be procured from
vendors, or insurance companies may develop their own product. In either case the
cost implication of the device needs to be considered. The equipment also needs to
be certified under the data transfer laws for telematics devices. Likewise,
insurance companies need to adhere to the legal requirement that the insured’s
telematics data is used for insurance purposes only. So there is an overhead cost
for compliance with data protection laws. The deployment and operational cost of the
policy and the ROI need to be studied further. This section provides an overview of
the business entities and operational challenges.
Insurance claim data is mapped to the telematics data of the fleet to create the
training dataset. Telematics data consists of aggregated GPS data, namely average
speed and average driving time. The training data has one categorical dependent
variable, called risk, and two continuous independent predictors, namely average
speed and average driving time, for each vehicle in the fleet. This training data is
used to learn the predictive model by computing the coefficients (Table 4).
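The chapter does not state how the coefficients are computed. One common approach, shown here as a sketch under that caveat, is batch gradient descent on the logistic log-loss with standardized features. The training rows are synthetic and hypothetical, and the helper names `fit_logistic` and `predict_proba` are introduced only for this example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit intercept and weights by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    # Standardize each feature so one learning rate suits both columns.
    stats = []
    for j in range(d):
        column = [row[j] for row in X]
        mu = sum(column) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in column) / n) or 1.0
        stats.append((mu, sd))
    Z = [[(row[j] - stats[j][0]) / stats[j][1] for j in range(d)] for row in X]

    w = [0.0] * (d + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for z, target in zip(Z, y):
            p = sigmoid(w[0] + sum(wj * zj for wj, zj in zip(w[1:], z)))
            err = p - target
            grad[0] += err
            for j in range(d):
                grad[j + 1] += err * z[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w, stats


def predict_proba(row, w, stats):
    """P(Risk) for one (average speed, average driving time) pair."""
    z = [(v - mu) / sd for v, (mu, sd) in zip(row, stats)]
    return sigmoid(w[0] + sum(wj * zj for wj, zj in zip(w[1:], z)))


# Synthetic, hypothetical training rows: (average speed, average driving time).
X = [(40, 1.0), (45, 2.0), (50, 1.5), (55, 3.0),
     (70, 5.0), (75, 6.0), (80, 4.5), (85, 7.0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w, stats = fit_logistic(X, y)
print(predict_proba((42.0, 1.2), w, stats), predict_proba((82.0, 6.5), w, stats))
```

The returned weights apply to the standardized features, so `predict_proba` reuses the stored means and standard deviations when scoring a new vehicle.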
3 Results
A bubble chart is plotted for a subset of vehicles, showing risk, which is linearly
proportional to average speed and average driving time. A bigger bubble indicates
risk and a smaller bubble indicates no risk (Fig. 4).
It is observed that the segmentation of a customer as “Risk” or “No risk” depends on
GPS device data, insurance claim and policy data, and model classifier data. GPS
data is the actual data for which the prediction is computed. Insurance data is used
for offline and online learning of the model. Model classifier data is the
classification cut-off threshold used in the calibration of the model. It depends on
other factors such as road condition, terrain and atmospheric condition, and is used
to fit the model.
The learned LRM was used to predict risk for a given GPS dataset covering one
month. The predictions were then reconciled with 2 years of insurance claim data to
compute the segmentation. The observed accuracy was 51%. Fig. 5 shows the predicted
outcomes p = 1 (Risk) and p = 0 (No Risk) against the features (Fig. 6).
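The reconciliation step amounts to comparing each predicted label with the label derived from claims and taking the ratio of correct predictions to the number of predictions. A minimal sketch follows; the label vectors are hypothetical and do not reproduce the chapter's 51% result.

```python
def accuracy(predicted, actual):
    """Ratio of correct predictions to the number of predictions made."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical labels: 1 = Risk, 0 = No Risk.
predicted = [1, 0, 1, 1, 0, 0, 1, 0]   # model output for the GPS dataset
actual = [1, 0, 0, 1, 1, 0, 0, 0]      # labels derived from claim data
print(accuracy(predicted, actual))     # 5 of 8 labels agree
```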
[Fig. 4: Bubble chart “Distribution of Risk”. x-axis: travel time without break;
y-axis: average speed.]
[Figure: GPS data, model classifier data and insurance data jointly determine risk.]
In this chapter, telematics and insurance data of 418 vehicles were analyzed. A
predictive model with 51% prediction accuracy was developed using the binary
logistic regression technique. The key advantage of this model is that it can be
calibrated on the fly according to business needs. Any degradation of the model can
be circumvented by updating the model with online learning and tuning the
classification cut-off. The cut-off value can be configured based on the age of the
vehicle, engine run hours, road condition and terrain, driving behavior and other
factors. The model also helps the researcher to study the influence of one
independent variable, such as driver behavior, on the outcome risk while keeping the
other predictors constant.
[Figure: logistic curve of Prob(y=1), ranging from 0.0 to 1.0, plotted against x
from 0 to 10.]
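In a logistic model, the effect of one predictor with the others held constant is usually read from its coefficient as an odds ratio: raising the predictor by delta multiplies the odds of “Risk” by exp(coefficient × delta). The sketch below checks this numerically. The coefficients are hypothetical and are not the chapter's fitted values.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def odds(p):
    """Odds corresponding to probability p."""
    return p / (1.0 - p)


# Hypothetical coefficients: intercept, average speed, average driving time.
B0, B_SPEED, B_TIME = -4.0, 0.05, 0.2


def risk_probability(avg_speed, avg_time):
    return sigmoid(B0 + B_SPEED * avg_speed + B_TIME * avg_time)


def odds_ratio(coefficient, delta):
    """Multiplicative change in odds when one predictor rises by delta."""
    return math.exp(coefficient * delta)


# Raise average speed from 60 to 70 at two fixed driving times.
for avg_time in (2.0, 6.0):
    observed = (odds(risk_probability(70.0, avg_time))
                / odds(risk_probability(60.0, avg_time)))
    print(avg_time, round(observed, 4), round(odds_ratio(B_SPEED, 10.0), 4))
```

The observed ratio matches exp(0.05 × 10) at both driving times, which is what makes a single coefficient interpretable independently of the values of the other predictors.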
However, the predictive model drifts as the dataset grows from moderate to large and
the number of features increases. Further work is needed to compare risk prediction
in usage-based auto insurance using SVM, decision trees and other models. Such a
benchmark would help insurance companies choose the appropriate tool.
An attempt has been made to fuse insurance claim data and telematics data to predict
risk. As of today, insurance companies do not have business insight into potential
insurance claims. This information is vital for the business planning of insurance
companies and for alerting the policy holder to imminent risk, so that the policy
holder can take the necessary safety precautions. The following observations were
made during execution of the model:
1. Comparing the accuracy of the learned model (70%) with that of the executed model
(51%) shows a drop of 19 percentage points, due to other factors such as road
condition, terrain, vehicle maintenance cycle and mobile usage during driving,
which were unaccounted for in the classification cut-off value of 0.5.
2. It is recommended to calibrate the model by performing online learning and
revisiting the classification cut-off based on the above-mentioned factors for the
next execution, in order to improve the accuracy.
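Revisiting the classification cut-off, as recommended in the second observation, can be sketched as a sweep over candidate thresholds that keeps the one with the highest accuracy. The probabilities, labels and candidate values below are hypothetical and serve only to show the mechanics.

```python
def classify(probabilities, cutoff):
    """Label a vehicle Risk (1) when its probability reaches the cut-off."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def accuracy(predicted, actual):
    """Ratio of correct predictions to the number of predictions made."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def best_cutoff(probabilities, actual, candidates):
    """Return (cutoff, accuracy) for the best candidate; ties keep the first."""
    best = None
    for cutoff in candidates:
        score = accuracy(classify(probabilities, cutoff), actual)
        if best is None or score > best[1]:
            best = (cutoff, score)
    return best


# Hypothetical model outputs and claim-derived labels.
probabilities = [0.15, 0.35, 0.45, 0.55, 0.62, 0.70, 0.85, 0.40]
actual = [0, 0, 0, 0, 1, 1, 1, 0]

default_score = accuracy(classify(probabilities, 0.5), actual)
best = best_cutoff(probabilities, actual, [0.3, 0.4, 0.5, 0.6, 0.7])
print(default_score, best)
```

On this toy data, moving the cut-off from 0.5 to 0.6 removes the single misclassification. A cut-off tuned and scored on the same data overstates real accuracy, so in practice it would be chosen on one set of vehicles and checked on another.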
3.2 Application
Due to privacy concerns, personal vehicle owners are reluctant to embrace UBI
(usage-based insurance). However, owners of commercial vehicles, distribution
networks and public transport networks are eager to leverage the benefits of UBI
(Introducing Pay How you Drive Insurance, 2016). The benefits include the following:
1. Insurers can offer lower premiums to non-risk drivers to grow profitable volume
(Lovick 2011)
2. Improved pricing accuracy based on risk profiles (Introducing Pay How you
Drive Insurance, 2016)
3. More efficient and effective claims processing by using telematics data as
evidence and automating the process (Lovick 2011)
4. Prevention of fraudulent claims and underwriting, and stronger customer
engagement (Introducing Pay How you Drive Insurance, 2016)
5. Stronger customer engagement and retention of profitable accounts (Digital
Insurance Telematics Solution, 2017)
6. Reduced claim costs
7. Brand differentiation and de-commoditization (Introducing Pay How you Drive
Insurance, 2016)
8. New revenue-generating personalized and customized value-added services such as
geo-fencing, tracking, automated maintenance, stolen vehicle recovery, route
planning and reduced fleet costs (Introducing Pay How you Drive Insurance, 2016)
9. New entrepreneurs serving customers through smartphones or online touch
points (Top 10 Trends in Insurance in 2016, 2016)
4 Conclusion
Acknowledgments The authors are thankful to VTU Extension Centre, UTL Technologies
Ltd, for providing the much-needed infrastructure to conduct this research. We are
also thankful to Dr. B.S. Nagabhushana, Professor, Department of Electronics and
Communication Engineering, B.M.S College of Engineering, Bengaluru, India, for his
guidance.
References
Baecke, P., & Bocca, L. (2017). The value of vehicle telematics data in insurance risk selection
processes. Decision Support Systems, 98, 69–79.
Boquete, L., Rodríguez-Ascariz, J. M., Barea, R., Cantos, J., Miguel-Jiménez, J. M., & Ortega,
S. (2010). Data acquisition, analysis and transmission platform for pay-as-you-drive system.
Sensors, 10(6), 5395–5408.
Digital Insurance Telematics Solution. (2017). Tata Consultancy Services Limited.
Ferreira, J., & Minikel, E. (2012). Measuring per mile risk for pay-as-you-drive automobile
insurance. Transportation Research Record: Journal of the Transportation Research Board,
2297, 97–103.
Husnjaka, S., Perakovića, D., Forenbachera, I., & Mumdzievb, M. (2015). Telematics system in
usage based motor insurance. Procedia Engineering, 100, 816–825.
Introducing Pay How you Drive Insurance. (2016). Ernst & Young Global Limited.
Kantor, S., & Stárek, T. (2014). Design of algorithms for payment telematics systems evaluating
driver’s driving style. Transactions on Transport Sciences, 7(1), 9–16.
Litman, T. (2006). Distance-based vehicle insurance as a TDM strategy. Victoria Transport
Policy Institute.
Lovick, T. (2011). Insurance telematics understanding risk with technology.
https://www.actuaries.org.uk/documents/b02-insurance-telematics-understanding-risk-technology
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (Vol. 37). Florida: CRC Press.
The Telematics Advantage: Growth, Retention and Transformational Improvement with Usage-Based
Insurance. (2012). Cognizant.
Top 10 Trends in Insurance in 2016. (2016). Capgemini.
Troncoso, C., Danezis, G., Kosta, E., & Preneel, B. (2011). PriPAYD: Privacy-friendly pay-as-you-
drive insurance. IEEE Transactions on Dependable and Secure Computing, 8(5), 742–755.
Tselentis, D. I., Yannis, G., & Vlahogianni, E. I. (2016). Innovative insurance schemes: Pay
as/how you drive. Transportation Research Procedia, 14, 362–371.