EAI Endorsed Transactions on Internet of Things
Research Article
Crime Prediction using Machine Learning
Sridharan S¹, Srish N², Vigneswaran S³ and Santhi P⁴
¹,²,³,⁴ Department of Computer Science and Engineering, Amrita Vishwa Vidyapeetham, Chennai, India
Abstract
Crime analysis is the process of researching crime patterns and trends in order to find underlying issues and potential solutions for crime prevention. It includes using statistical analysis, geographic mapping, and other approaches to help law enforcement authorities understand the type and scope of crime in their areas. Crime analysis can also entail the creation of predictive models that use previous data to anticipate future crime tendencies. By evaluating crime data and finding trends, law enforcement authorities can allocate resources more efficiently and target initiatives to reduce crime and increase public safety. For prediction, crime data was fed into algorithms such as Linear Regression and Random Forest. Using data from 2001 to 2016, crime-type projections are made for each state as well as for India as a whole. Simple visualisation charts are used to represent these predictions. One critical feature of these algorithms is identifying the trend-changing year in order to boost the accuracy of the predictions. The main aim is to predict crime cases from 2017 to 2020 using the dataset from 2001 to 2016.
Keywords: Crime prediction, Linear regression, Visualisation, Geographic mapping, Crime analysis, Random Forest Classifier,
Machine Learning
Received on 01 December 2023, accepted on 04 February 2024, published on 15 February 2024
Copyright © 2024 Sridharan S. et al., licensed to EAI. This is an open-access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.
doi: 10.4108/eetiot.5123
*Corresponding author. Email: ch.en.u4cys21080@ch.students.amrita.edu
1. Introduction

This paper focuses on the problem of predicting crime for a given location and year, based on the types of crime that occurred there in the past. Owing to the population explosion, factors such as lack of employment and drug abuse, among others, contribute to crime. Crimes are of two types: violent and passive. Violent crimes, such as murder, forcible rape, and robbery, lead to injury. The challenge with violent crime is that it cannot be predicted with great certainty, since it may be systematic or random [1]. According to the National Crime Research Center (NCRC), crimes such as burglary and arson are said to have declined, while others such as murder, sex abuse, and rape continue to be reported. Traditional methods of crime prediction are often based on analysing past incidents manually, so they may not adapt well to changing patterns and trends. Nevertheless, thanks to current technology and the growing use of artificial intelligence, machine learning, and quantitative statistics, we now have some excellent instruments at our disposal for researching and, eventually, containing crime [2]. Even though it can be challenging to predict who will commit a crime and who the victims will be, it is possible to foresee the location where a crime is most likely to happen within a certain time frame.

The urgent need to preserve data and investigate the numerous crimes that occur in modern cities drives the demand for data prediction, which entails data mining and manipulation. Extensive information and powerful modern procedures are needed to identify criminals who are well coordinated, well equipped to exploit contemporary technology for communication, and who pose significant hazards to the country's security. Classification, clustering, and regression techniques, as well as data mining technology, can be used to identify trends and conduct criminal investigations [3]. Data mining is an effective method for evaluating huge amounts of data, but it requires pre-processing steps to extract the needed data rapidly.

1.1. Problem Statement

The problem that crime analysts face is determining how to extrapolate previous criminal activity data into the likelihood of future occurrences at specific points in space and time. Descriptive analysis deals with identifying temporal and spatial relationships in crime data. Predictive
analytics techniques are typically used to foretell the type of crime that could happen at a given place and time. The analyst would want a visual map depicting the degree of probable criminal activity at each place within their jurisdiction. This is achieved by merging crime and population statistics and feeding them to machine learning algorithms. Prescriptive analysis offers process re-engineering measures that deploy police resources effectively, with the goal of reducing crime and its impact on the general public. Such points of view are clearly important in preventing crime, since they enable the police to direct resources to high-risk areas.

1.2. Objective

Using analytical and predictive data analytics methodologies, the objectives are to provide a platform for assessing crime data, to analyse the spatial and temporal (time of day, day of week, and season) connections in crime data using the suggested platform, and to analyse the relationship between crime data and census data.

The model examines crime patterns and identifies shifts in the general crime ratio depending on population or demographic ratios, which makes it easier to predict future crime instances. Additionally, it can forecast how many security measures will be required and how much criminal activity will need to be controlled to stop or lessen the occurrence of any crime, from low-level to serious.

1.3. Scope

The suggested system supports crime management, detection, and prevention. It will use time series, clustering, and data mining techniques to generate predictions of future crime rates. This will be accomplished by displaying crime patterns graphically and by using geographic heat maps to show data concentration and hotspots in real time.

1.4. Dataset

The dataset covers the years 2001 to 2016. The initial data set for crimes was collected through fieldwork-based primary data collection. It comprises more than 500 rows and more than ten columns of information. The primary elements are Name, Years, Months, Crime Types, Crime Areas, Victim Genders, Victim Ages, Victim Areas, and Year. Our dataset is divided into three categories: cases involving women, cases involving children, and Indian Penal Code (IPC) cases at the state level. We used this dataset to forecast crime at the state level.

2. Literature Survey

Mehmet Sait and Mustafa Gök proposed merging a list of suspected criminals with criminal data generated synthetically using the Gaussian Mixture Model. Tayebi et al. chose the clustering technique over any supervised technique, such as classification, because crimes vary considerably in nature and criminal databases are frequently overflowing with unresolved crimes. The model was then examined, built, and used to sample the data set and train the algorithm, and the K-Means clustering technique yielded a score of more than 75% [4]. Akash et al. applied the broken windows theory; their model was analysed, pre-processed, and then put into practice to train the algorithm and test the data set, and the K-Means clustering algorithm again returned over 75% [5].

The authors generated frequent item sets using Apriori techniques, which could also be applied to criminals, giving a crime prediction approach for finding the most likely perpetrators of a given crime. The authors compared the performance of the Naive Bayes classifier and Decision Tree offender prediction methods. Data sets are transformed into clusters using clustering algorithms, which are then investigated to identify crime-prone locations [6]. Rajesh Kanna et al. presented a CNN-based deep learning model based on long short-term memory for detecting crime.

A deep learning model based on MapReduce is also used to detect intrusions using spatiotemporal characteristics. To improve feature selection accuracy, the black widow optimised method is utilised [7]. Researchers have published a variety of data mining strategies that enable crime data analysis, crime forecasting, criminal identification, and location of crime hotspots [8]. A stacked sparse autoencoder network has been used to identify malicious modules for VLSI circuit reliability [9]. The Naive Bayes classifier required less time to execute and achieved a higher accuracy of 78.05%. The study investigated various offences committed by offenders and predicted the likelihood of each offence being committed by that offender again [10].

Sivaranjani et al. [13] present a crime study of six cities in Tamil Nadu, India, using the clustering methods K-means, DBSCAN, and agglomerative clustering to group similar patterns for crime detection, and conclude that one of these approaches is superior. Kansara, Chirag, E. [14] compared the performance of Random Forest, Naive Bayes, and linear regression in identifying factors influencing high crime rates, and concluded that Random Forest performs best, with 81.35% accuracy.

3. Proposed Methodology

The solution is provided as a statistical and machine learning model that employs classification, clustering, and regression algorithms: K-NN, Naive Bayes, and regression algorithms that can be used to describe the functional relationships among demographic, economic,
social, victim, and geographic variables through analysing patterns in criminal data sets. Time series techniques have proven beneficial in conjunction with the algorithms mentioned above, allowing the model to forecast criminal incidents with high accuracy according to temporal growth and changing features.

3.1. Algorithm

Clustering algorithms are included in the domain. The K-means partitioning method is widely employed and accepted. Instead of the K-means method, which leaves it to the user to decide the number of clusters on the basis of the values, linear regression is used here; Naive Bayes also gives a first-rate outcome, and both algorithms achieve a high accuracy rate. Linear and multi-linear regression show the relation between dependent variables (such as age and gender) and a collection of independent variables discovered at the site of the crime. This method calculates the age values for the victims based on the input criteria mentioned in the metadata column. Given the crime locations, linear regression is used in the crime prediction scenario to determine the age of the most probable offender. Analysing historical data shows that the ratio of female victims to male victims is steadily rising. This statistic depicts the victim rate for men and women. Due to the numerous crime data sets and the intricate connections between these different forms of data, criminology is a suitable subject for the application of data mining techniques.

The model was examined, prepared, and used to sample the data set and train the algorithm, with K-means clustering returning more than 75% [2]. Using the broken windows theory, random forest, and Naive Bayes, crime was reduced and the crime area was located. A data frame is created to train the model for image recognition, information preprocessing, and the identification of criminal hotspots. The tuned deep learning model provides the best accuracy, reported as 0.87%. Crime rates can be predicted using the classification and regression techniques of machine learning. Multi-linear regression is performed to establish a link between the dependent and independent variables. K-Nearest Neighbours is used for single-class and multi-class variable classification, in particular when the target variable must be divided into more than two groups. This dataset has three groups of people based on gender: men, women, and those whose gender is not known. Age can likewise be classified into three categories, including young and old. The K-Nearest Neighbours classifier is used to group or classify the target variables.
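As an illustration of the K-nearest neighbours classification described in this section, the following minimal Python sketch assigns a gender group by majority vote among the k closest training points. The feature layout (victim age, hour of day), the sample records, and the function name `knn_predict` are assumptions made for illustration; they are not taken from the paper's dataset or implementation.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train: list of (features, label) pairs; features are equal-length tuples.
    """
    # Sort training points by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of the k nearest neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical records: (victim age, hour of day) -> gender group.
records = [
    ((22, 21), "women"), ((25, 22), "women"), ((24, 20), "women"),
    ((41, 9), "men"), ((45, 10), "men"), ((39, 8), "men"),
    ((30, 15), "unknown"), ((31, 14), "unknown"), ((29, 16), "unknown"),
]

predicted = knn_predict(records, (23, 21), k=3)
print(predicted)
```

In practice the features would need scaling to comparable ranges before distances are computed, since an unscaled feature with a large range dominates the Euclidean distance.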
Figure 1 illustrates the proposed method. Data sets are transformed into clusters using clustering algorithms, which are then investigated to identify crime-prone locations. These clusters graphically depict a collection of crimes superimposed on a map of the police jurisdiction. The clustering combines the stored crime locations with details on the type and timing of the incidents. The members of these clusters are used to categorise them. Clusters with many members become crime hotspots, whereas clusters with fewer members are disregarded. Depending on the type of offence, preventive measures are implemented in areas where crime is a problem. The simplest and most popular clustering technique in research and commercial applications is K-means. Large data sets can be clustered using this method because of its lower processing complexity. Given that crimes vary greatly in character and that crime databases are frequently filled with unsolved crimes, we preferred the clustering technique over any supervised technique, such as classification. The model was then examined, built, and used to sample the data set and train the algorithm. Following the broken windows theory, the model was analysed, pre-processed, and put into practice to train the algorithm and test the data set, and the K-Means clustering algorithm returned over 75%.

Figure 1. System architecture

3.2. Outcome

The system, “Indian Crime Analysis”, is software that is currently available and has been designed specifically for criminal investigation, to perform tasks that no other method
can. Thus, although several answers to the issue have been put forward, the system is presented as a solution for every city, state, and nation, and for every kind of user. The system is precise: it presents the analysis in the form of animated visuals and predicts the crime ratio. If it is unable to provide accurate results, it reports the unavailability of data or the proximate cause.

This section gives an overview of forthcoming crime data sets and algorithmic crime bases. They assess the crime rate based on a variety of factors, including age, gender, location, and monthly ratios. A range of data sources and methods are used to make predictions, including literature reviews, surveys of common personal data, and statistical models that predict future crime trends. Because some minority groups were included in the classification of crimes, which caused data imbalance, the prediction model had a significant miss rate. To solve this problem, we employed random oversampling. Future crime trends are predicted by algorithms that extrapolate from a time series analysis of existing crime trends. The behaviour of previously recorded data can be used to forecast future patterns in crime.

Every predictive model aims to demonstrate the relationship between a certain predictor and a dependent variable. For these models to be more accurate, they must be able to recognise and foresee the variety of circumstances that may affect victimisation and crime in the future. Future crime rates are projected in this study in a much more thorough and precise manner.

4. Test Cases

The output of the proposed model is presented in Figures 2 to 5. Figure 2 illustrates the visualization of the data set from 2001 to 2015.

Case 1:
Prediction for Prohibition of Child Marriage cases in 2020 in Haryana, as shown in Figure 3.

Case 2:
Prediction for murder cases up to 2020 in Tamil Nadu, as shown in Figure 4.

Case 3:
Prediction for Arms Act cases up to 2020 in Kerala, as shown in Figure 5.
Figure 2. Visualization of the data set from 2001 to 2015
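The state-level test cases rely on fitting a trend to the yearly counts for 2001 to 2016 and extending it to 2017 through 2020. The sketch below shows this with ordinary least-squares linear regression in plain Python. The yearly counts are synthetic placeholder values rather than figures from the dataset, and `fit_line` and `forecast` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(years, counts, future_years):
    """Fit on the historical years and predict a count for each future year."""
    slope, intercept = fit_line(years, counts)
    return {year: slope * year + intercept for year in future_years}


# Synthetic placeholder series for one state and one crime type (not real data).
years = list(range(2001, 2017))
counts = [100 + 5 * (y - 2001) for y in years]  # perfectly linear toy trend

predictions = forecast(years, counts, range(2017, 2021))
for year, value in predictions.items():
    print(year, round(value))
```

The same two functions could be applied once per state and crime type, and once to the national totals, to produce the per-state and all-India projections.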
Figure 3. Predicted graph of Child Marriage in Haryana
Figure 4. Predicted graph of Murder case in Tamil Nadu
Figure 5. Predicted graph of ARMS act (2020) in Kerala
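The abstract notes that identifying the trend-changing year improves the predictions, but the paper does not spell out how that year is found. One possible approach, sketched here as an assumption rather than the authors' method, is to try each candidate year as a breakpoint, fit separate least-squares lines before and after it, and keep the year with the smallest total squared error. The series is synthetic and the names `sse_of_fit` and `find_trend_change_year` are hypothetical.

```python
def sse_of_fit(xs, ys):
    """Sum of squared residuals of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def find_trend_change_year(years, counts, min_points=3):
    """Return the first year of the later segment for the best two-line fit."""
    best_year, best_error = None, float("inf")
    # Each segment keeps at least min_points observations.
    for split in range(min_points, len(years) - min_points + 1):
        error = (sse_of_fit(years[:split], counts[:split])
                 + sse_of_fit(years[split:], counts[split:]))
        if error < best_error:
            best_year, best_error = years[split], error
    return best_year


years = list(range(2001, 2017))
# Synthetic series: rises by 10 per year through 2008, then follows a
# separate line that falls by 20 per year from 2009 onward.
counts = [100 + 10 * (y - 2001) if y <= 2008 else 160 - 20 * (y - 2008)
          for y in years]

change_year = find_trend_change_year(years, counts)
print(change_year)
```

Once such a year is found, the regression can be refitted on the data from that year onward so that the forecast follows the most recent trend.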
Figure 6. Crime prediction for the state with accuracy
Figure 7. Crime prediction for overall India and its accuracy
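Figures 6 and 7 report an accuracy alongside each prediction, although the metric is not defined in the text. One common choice for regression models, used here purely as an assumption, is the coefficient of determination R². The sketch computes it for hypothetical actual and predicted counts; `r_squared` is a hand-written helper so the example has no dependencies.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out counts and model predictions (not from the paper).
actual = [120, 130, 150, 160]
predicted = [118, 134, 147, 162]

score = r_squared(actual, predicted)
print(f"R^2 = {score:.3f}")
```

A score close to 1 means the predictions explain most of the variation in the held-out counts; holding out the last few years of the 2001 to 2016 series is one way to obtain such a test set.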
5. Conclusion

Area-specific modelling of the prediction rate is difficult, since crime is rare in many places. In this study, a machine learning model was constructed and tested that forecasts crime by age, sex, year, and month. Three distinct machine learning methods were employed in order to assess the efficacy of K-nearest neighbour, Naive Bayes, and linear regression in diverse contexts. Figures 6 and 7 show how accurate the crime predictions are. While certain linear systems work well and provide greater precision, the overall model uses K-nearest neighbour as its crime prediction approach, since it also provides the desired accuracy. These prediction techniques make it possible to determine which approach is stronger. The algorithm will also show superior accuracy in identifying and locating the places with the highest incidence of crime. Finally, it makes use of the CNN algorithm to analyse the photo data and the Google API to detect hotspot zones.

References

[1] Zakir Hussain, K., Durairaj, M., Farzana, G.R.J.: Criminal behaviour analysis by using data mining techniques, 30-31 March 2012, Nagapattinam, India, Proceedings of the International Conference on Advances in Engineering Science and Management, IEEE, 2012, pp. 1-8.
[2] Kavitha, M., Roobini, S.: Systematic View and Impact of Artificial Intelligence in Smart Healthcare Systems, Principles, Challenges and Applications, Machine Learning and Artificial Intelligence in Healthcare Systems, 2023, pp. 25-56.
[3] Sathya, R., Ananthi, S., Vaidehi, K.: A Hybrid Location-dependent Ultra Convolutional Neural Network-based Vehicle Number Plate Recognition Approach for Intelligent Transportation Systems, Concurrency and Computation: Practice and Experience, 2023, 35:1-25.
[4] Tayebi, M.A., Glässer, U., Brantingham, P.L.: Learning where to inspect: location learning for crime prediction, 27-29 May 2015, Baltimore, MD, USA, Proceedings of the IEEE International Conference on Intelligence and Security Informatics (ISI), IEEE, 2015, pp. 25-30.
[5] Akash, S., Prabaharan Poornachandran, Vijay Krishna Menon, Soman, K.P.: A Detailed Investigation and Analysis of Deep Learning Architectures and Visualization Techniques for Malware Family Identification, Cybersecurity and Secure Information Systems, Springer Cham, 2019, Chapter 12, pp. 24-46.
[6] Rajesh Kanna, P., Santhi, P.: Hybrid Intrusion Detection using Map Reduce based Black Widow Optimized Convolutional Long Short-Term Memory Neural Networks, Expert Systems with Applications, 2022, Vol. 194:(116545).
[7] Rajesh Kanna, P., Santhi, P.: Unified Deep Learning approach for Efficient Intrusion Detection System using Integrated Spatial–Temporal Features, Knowledge-Based Systems, 2021, Vol. 226:(107132).
[8] Sathyadevan, S., Gangadharan, S.: Crime analysis and prediction using data mining, 19-20 August 2014, Guntur, India, Proceedings of International Conference on Networks & Soft Computing, IEEE, 2016, pp. 406-412.
[9] Nath, S.V.: Crime pattern detection using data mining, 18-22 December 2006, Hong Kong, China, Proceedings of the International Conference on Web Intelligence and Intelligent Agent Technology, IEEE, 2007, pp. 41-44.
[10] Zhao, X., Tang, J.: Exploring Transfer Learning for Crime Prediction, 18-21 November 2017, New Orleans, LA, USA, Proceedings of the International Conference on Data Mining Workshops, IEEE, 2017, pp. 1158-1159.
[11] Priyatharishini, M., Nirmala Devi, M.: A deep learning based malicious module identification using stacked sparse autoencoder network for VLSI circuit reliability, Measurement, 2022, Vol. 194:(111055).
[12] Shamsuddin, N.H.M., Ali, N.A., Alwee, R.: An overview on crime prediction method, 23-24 May 2017, Johor, Malaysia, 2017 6th ICT International Student Project Conference, IEEE, 2017, pp. 1-5.
[13] Sivaranjani, S., Sivakumari, S., Aasha, M.: Crime prediction and forecasting in Tamil Nadu using clustering approaches, 21-22 October 2016, Kollam, India, Proceedings of the International Conference on Emerging Technological Trends, IEEE, 2016, pp. 1-6.
[14] Kansara, Chirag, E.: Crime mitigation at Twitter using Big Data analytics and risk modelling, 23-25 December 2016, Jaipur, India, Proceedings of the IEEE International Conference on Recent Advances and Innovations in Engineering, IEEE, 2017, pp. 1-8.
EAI Endorsed Transactions on
Internet of Things
7 | Volume 10 | 2024 |