Project Title: A Major Project Report Submitted in Partial Fulfillment of The Requirements For The Degree of


Project Title

A Major Project Report Submitted


in Partial Fulfillment of the Requirements
for the Degree of

Master of Computer Application

Submitted By

Boby Mahato
<08 >
Ritu Munda
<21>

Faculty of Computing and Information Technology


Usha Martin University, Ranchi
BONAFIDE CERTIFICATE

Certified that this project report titled CRIME RATE PREDICTION is the
bonafide work of MR. BOBY MAHATO and MS. RITU MUNDA, who carried out the
research under my supervision. Certified further that, to the best of my knowledge, the
work reported herein does not form part of any other project or dissertation on the
basis of which a degree or award was conferred on an earlier occasion on this or any
other candidate.

DECLARATION

We hereby declare that this report entitled “Crime Rate Prediction”, submitted by Boby Mahato (08) and Ritu
Munda (21) in fulfillment of the requirement for the degree of Master of Computer Application to Usha Martin
University, Ranchi, is our own work and has not been submitted to any other institute.

< Boby Mahato>


<08>
<Ritu Munda>
< 21>

CERTIFICATE

This is to certify that the report entitled “Crime Rate Prediction”, submitted by BOBY MAHATO (08) and
RITU MUNDA (21) in fulfillment of the requirement for the degree of Master of Computer Application to
Usha Martin University, Ranchi, is a bonafide work carried out under my/our supervision. The matter embodied
in this report is original and has not been submitted for the award of any other degree.

(Sucheta Panda)
Guide

(Sharmistha Roy)
HOD External Examiner

ACKNOWLEDGEMENTS

We express our deepest sense of gratitude to our guide, SUCHETA PANDA, Faculty of Computing and
Information Technology, Usha Martin University, Ranchi, for suggesting the subject of this work and for
constant supervision throughout. The co-operation and timely suggestions we received were an unparalleled
stimulus towards the completion of this project report. Indeed, our guide's continuous involvement helped us
complete this project work, which would otherwise have remained a distant dream.
We are also thankful to SHARMISTHA ROY, HOD, Faculty of Computing and Information
Technology, Usha Martin University, Ranchi, for giving us permission to carry out our project work. We would
like to express our gratitude to all teaching and non-teaching staff members of the Faculty of Computing and
Information Technology, Usha Martin University, Ranchi, for their co-operation in our work.

Boby Mahato
08
Ritu Munda
21

Abstract

Crime analysis and prediction is a systematic approach to identifying crime patterns. Such a system can predict
regions that have a high probability of crime occurrence and visualize crime-prone areas. Using data mining, we
can extract previously unknown, useful information from unstructured data, and new information is predicted
from existing datasets. Crime is a treacherous and common social problem faced worldwide. It affects the
quality of life, economic growth, and reputation of a nation. To secure society from crime, communities need
advanced systems and new approaches that improve crime analytics. We propose a system that can analyse,
detect, and predict the probability of various crimes in a given region. This report explains various types of
criminal analysis and crime prediction using several data mining techniques.

Table of Contents

Declaration ………………………………………………………………………………iii
Certificate ………………………………………………………………………………iv
Acknowledgement ………………………………………………………………………………v
Abstract ………………………………………………………………………………vi
Chapter Topic Name Page No.
1. Introduction

List of Tables
Sl. No. Table Name Page No.
Table 1. Sample Data ………………………………………………………….

List of Figures
Sl. No. Figure Name Page No.
Fig. 1 Software Life cycle model ……………………………………………

Chapter 1
Introduction
1.1 Brief overview of crime and its effect on a country
Crime, like many other words, is not always simple to define, as it may mean different things to different
people. According to Britannica, a crime is an act that is socially harmful or dangerous and that is usually
prohibited and punishable under criminal law. Crime is a prevalent social problem that affects the quality of
life and the economic growth of every country.
To understand the effects of crime on society, consider why it is a social problem. Firstly, crime creates chaos
in a city, which in turn disrupts the natural order of society. Because crime goes against social conventions, it
disrupts many everyday activities, from running a business to going shopping or even just walking outside.
Secondly, crime impedes collaboration and trust in a community. Higher crime rates erode trust in law
enforcement: when the law enforcement that is supposed to maintain the peace is seen to have failed, people
become less willing to collaborate, not only with law enforcement but also with others in their community.
Moving on to economic losses, consider our neighbouring country, Indonesia. Much like Malaysia, it has an
abundance of natural and human resources, which should have accelerated the pace of its economy, and yet it
was found that the number of crimes may have limited economic growth. The growth of Indonesia's economy is
usually attributed to the consumption of goods, which is directly influenced by household income. Foreign
investment also aids economic growth: it increases the production capacity of the country by reducing the basic
costs and variable costs of the industrial sector, which in turn increases the purchasing power of the people and
so raises consumption. If crime increases, however, investors form a poor view of the country and invest less.
Thus, Kusuma, Hariyani, and Wahyu found that an increase in the number of criminal acts reduces the
gross regional domestic product (GRDP) of the country.

Figure 1.1 Descriptive Data Analysis of Economic Growth Vs Crime (1980-2011)
Another example of crime affecting the economy is a study of Pakistan, another developing country.
Pakistan's economy, much like Indonesia's, could benefit from foreign investment, but high crime rates may
have deterred some investors. As Figure 1.1 shows, Ahmad, Ali, and Ahmad (2014) found that between 1980
and 2011 economic growth fluctuated from year to year, but the overall trend is that economic growth
decreases as crime increases.

Reasons to Use Machine Learning in Crime rate Prediction


Machine learning has gained a lot of traction in the past few years, with uses ranging from forecasting
business investments to medicine. Over the years, more and more papers have appeared on using machine
learning to predict crime rates. A paper that surveyed other work on crime rate prediction found a variety of
methods in use, including Support Vector Machines (SVM) for hotspot prediction, fuzzy theory to increase
prediction efficiency, and Artificial Neural Networks (ANN) to predict geo-temporal variation of disorder. This
matters to the police and to civilians because it could help reduce crime and increase safety in our country. For
instance, Pearsall found that in the United States there was an increase in random gunfire every New Year's
Eve. Using data gathered over the years, the police managed to anticipate the location, time, and nature of
future incidents. By placing officers at the anticipated locations, they reduced the cases of random gunfire by
47 percent and increased the number of weapons seized by 246 percent, while saving the police department
around 15,000 USD in personnel costs that day.

1.2 Problem Statement and motivation

Figure 1.2 Crime index in the Year 2020


Crime affects a large number of people in Malaysia every year. In 2019 the crime rate stood at an average of
256.6 per 100,000 people, and while this is a minor decrease compared to the year before, the rate is still
relatively high. Hence most Malaysians are afraid to be outside alone or to carry their valuables with them.
With this system, the police can pre-emptively patrol the highest-risk areas, reducing the crime rate and
catching more criminals. The second problem is the effectiveness of predictive policing: analysing all the data
manually takes a large amount of time and effort, which makes it extremely tedious for the police. With this
system, the police could reduce what would be days of work to minutes.

1.3: Project Scope and Project Objectives


Project Scope
The scope of this project is to develop a framework that aids the police in predictive policing by predicting the
location, category, and time of a future crime with a reasonable degree of accuracy. The proposed tool will
enable users to characterize and analyse crime data to find actionable patterns and future trends. It should also
be able to take in large amounts of data, reducing the time the police need to go through it.

Project objectives:
• To produce a system that is able to predict areas that will have higher crime rates.
• To explore and enhance classification algorithms to predict future crime category based on previous crime
trends.
• To create a web-based system that allows easy access to the application.

1.4 DOMAIN OVERVIEW


Machine learning is one of the most popular techniques for predicting the future or classifying information to
help people make decisions. Machine learning algorithms are trained on instances or examples, through which
they learn from past experience and analyse historical data. As an algorithm trains over the examples again and
again, it is able to identify patterns and use them to make predictions about the future. Data is the backbone of
machine learning algorithms. Trained models can even be used to create more data from historical data. For
example, Generative Adversarial Networks are an advanced machine learning concept that learns from
historical images in order to generate new images. The same idea is applied to speech and text synthesis.
Machine learning has therefore opened up vast potential for data science applications.

1.4.1 MACHINE LEARNING


Machine Learning combines computer science, mathematics, and statistics. Statistics is essential for drawing
inferences from the data. Mathematics is useful for developing machine learning models and finally, computer
science is used for implementing algorithms. However, simply building models is not enough. You must also
optimize and tune the model appropriately so that it provides you with accurate results. Optimization techniques
involve tuning the hyperparameters to reach an optimum result. The world today is evolving and so are the
needs and requirements of people. Furthermore, we are witnessing a fourth industrial revolution of data. In
order to derive meaningful insights from this data and learn from the way in which people and the
system interface with the data, we need computational algorithms that can churn the data and provide us with
results that would benefit us in various ways. Machine Learning has revolutionized industries like medicine,
healthcare, manufacturing, banking, and several other industries. Therefore, Machine Learning has become an
essential part of modern industry. Data is expanding exponentially, and in harnessing the power of this
data, aided by the massive increase in computation power, Machine Learning has added another dimension to
the way we perceive information. Machine Learning is being utilized everywhere. The electronic devices you
use, the applications that are part of your everyday life are powered by powerful machine learning algorithms.
With an exponential increase in data, there is a need for having a system that can handle this massive load of
data. Machine learning models such as deep learning allow large volumes of data to be handled while still
generating accurate predictions. Machine learning has revolutionized the way we perceive information and
the insights we can gain from it. These algorithms use the patterns contained in the training data to perform
classification and make predictions. Whenever a new input is introduced to the model, it applies its learned
patterns to the new data to make a prediction. Based on the final accuracy, one can optimize the model using
various standardized approaches. In this way, a machine learning model learns to adapt to new examples and
produce better results.

Types of Machine Learning
Machine learning algorithms can be classified into three types:
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning

Figure 1.3 Supervised learning architecture

SUPERVISED LEARNING
In the majority of supervised learning applications, the ultimate goal is to develop a finely tuned predictor
function h(x) (sometimes called the “hypothesis”). “Learning” consists of using sophisticated mathematical
algorithms to optimize this function so that, given input data x about a certain domain (say, square footage of a
house), it will accurately predict some interesting value h(x) (say, market price for said house).
A more complex predictor might look like this:
$$h(x_1, x_2, x_3, x_4) = \theta_0 + \theta_1 x_1 + \theta_2 x_3^2 + \theta_3 x_3 x_4 + \theta_4 x_1^3 x_2^2 + \theta_5 x_2 x_3^4 x_4^2$$
This function takes input in four dimensions and has a variety of polynomial terms. Deriving a
normal equation for this function is a significant challenge. Many modern machine learning problems take
thousands or even millions of dimensions of data to build predictions using hundreds of coefficients. Predicting
how an organism’s genome will be expressed, or what the climate will be like in fifty years, are examples of
such complex problems.
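
To make the polynomial predictor concrete, the following minimal Python sketch evaluates a six-term hypothesis of this form for one input point. The exponents follow our reading of the formula above, and the coefficient values are hypothetical, chosen only for illustration.

```python
def h(x1, x2, x3, x4, theta):
    """Evaluate the six-term polynomial predictor for one input point."""
    t0, t1, t2, t3, t4, t5 = theta
    return (t0
            + t1 * x1
            + t2 * x3 ** 2
            + t3 * x3 * x4
            + t4 * x1 ** 3 * x2 ** 2
            + t5 * x2 * x3 ** 4 * x4 ** 2)


# Hypothetical coefficients; a learning algorithm would estimate these from data.
theta = (1.0, 2.0, 0.5, -1.0, 0.1, 0.01)
prediction = h(1.0, 2.0, 3.0, 4.0, theta)
print(prediction)
```

Learning then amounts to choosing the coefficients so that h(x) is close to the known outputs on the training data.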

Under supervised ML, two major subcategories are:

• Regression machine learning systems: systems where the value being predicted falls somewhere on a
continuous spectrum.
• Classification machine learning systems: systems where we seek a yes-or-no prediction.

In practice, x almost always represents multiple data points. So, for example, a housing price predictor might
take not only square footage (x1) but also number of bedrooms (x2), number of bathrooms (x3), number of
floors (x4), year built (x5), zip code (x6), and so forth. Determining which inputs to use is an important part of
ML design. However, for the sake of explanation, it is easiest to assume a single input value is used.

Steps Involved in Supervised Learning:

• Determine the type of training dataset.
• Collect the labelled training data.
• Split the data into a training set, a validation set, and a test set.
• Determine the input features of the training dataset, which should carry enough information for the model
to predict the output accurately.
• Choose a suitable algorithm for the model, such as a support vector machine or a decision tree.
• Execute the algorithm on the training set. Sometimes a validation set, held out from the training data, is
needed to tune control parameters.
• Evaluate the accuracy of the model on the test set.
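
The splitting step above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's actual pipeline: the 70/15/15 proportions, the function name split_dataset, and the toy rows are all assumptions made for the example.

```python
import random


def split_dataset(rows, train=0.7, val=0.15, seed=0):
    """Shuffle the rows and cut them into training, validation, and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # fixed seed so the split is repeatable
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * val)
    return (rows[:n_train],
            rows[n_train:n_train + n_val],
            rows[n_train + n_val:])     # whatever remains becomes the test set


# Hypothetical labelled data: (feature, label) pairs.
data = [(i, i % 2) for i in range(100)]
train_set, val_set, test_set = split_dataset(data)
print(len(train_set), len(val_set), len(test_set))
```

The model is fitted on the training set, its control parameters are tuned on the validation set, and the test set is touched only once, for the final accuracy estimate.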

REGRESSION
Regression algorithms are used when there is a relationship between the input variable and the output
variable. They are used to predict continuous variables, as in weather forecasting and market trend
analysis. Common regression algorithms include:
• Linear Regression
• Regression Trees
• Non-Linear Regression
• Bayesian Linear Regression
• Polynomial Regression

CLASSIFICATION
Classification algorithms are used when the output variable is categorical, for example when
there are two classes such as Yes/No, Male/Female, or True/False. Spam filtering is a typical
application. Common classification algorithms include:
• Random Forest
• Decision Tree
• Logistic Regression
• Support Vector Machines
1.4.2 PROPOSED ALGORITHMS
Decision Tree Classification Algorithm
• A decision tree is a supervised learning technique that can be used for both classification and regression
problems, but it is mostly preferred for classification. It is a tree-structured classifier in which internal
nodes represent the features of a dataset, branches represent the decision rules, and each leaf node
represents an outcome.
• A decision tree has two kinds of node: decision nodes and leaf nodes. Decision nodes are used to make a
decision and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not
contain any further branches.
• The decisions or tests are performed on the basis of the features of the given dataset.
• It is a graphical representation for getting all the possible solutions to a problem or decision based on
given conditions.
• It is called a decision tree because, similar to a tree, it starts with the root node, which expands into
further branches and constructs a tree-like structure.
• To build a tree, we use the CART algorithm, which stands for Classification and Regression Tree
algorithm.
• A decision tree simply asks a question and, based on the answer (Yes/No), further splits into subtrees.
• The diagram below shows the general structure of a decision tree.

Figure-1.3 Structure of decision tree
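
As an illustration of how a CART-style tree chooses the question to ask at a node, the sketch below scores every candidate threshold on a single numeric feature by weighted Gini impurity and keeps the lowest. Gini impurity is the criterion commonly associated with CART for classification; the feature (hour of day), the labels, and the function names are hypothetical and are not taken from the project dataset.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' question."""
    best = (None, gini(labels))          # start from the unsplit node
    for t in sorted(set(values))[:-1]:   # the largest value cannot split anything
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: hour of day and an assumed risk label.
hours = [1, 2, 3, 10, 11, 12]
risk = ["low", "low", "low", "high", "high", "high"]
print(best_split(hours, risk))
```

A full tree is obtained by applying the same search recursively to the records on each side of the chosen threshold.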

There are various algorithms in machine learning, so choosing the best algorithm for the given dataset
and problem is the main point to remember while creating a machine learning model. Below are two
reasons for using a decision tree:
• Decision trees usually mimic human thinking while making a decision, so they are easy to
understand.
• The logic behind a decision tree can be easily understood because it has a tree-like structure.

Decision Tree Terminologies
• Root node: where the decision tree starts. It represents the entire dataset, which is further
divided into two or more homogeneous sets.
• Leaf node: a final output node; the tree cannot be split further after a leaf node.
• Splitting: the process of dividing a decision node or the root node into sub-nodes according to the
given conditions.
• Branch or sub-tree: a tree formed by splitting the tree.
• Pruning: the process of removing unwanted branches from the tree.
• Parent and child nodes: the root node of the tree is called the parent node, and the other nodes are
called child nodes.
In a decision tree, to predict the class of a given record, the algorithm starts from the root node of the
tree. It compares the value of the root attribute with the corresponding attribute of the record (real dataset)
and, based on the comparison, follows the branch and jumps to the next node. At the next node, the algorithm
again compares the attribute value with those of the sub-nodes and moves on. It continues this process until it
reaches a leaf node of the tree.
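
The prediction walk described above can be sketched with a small hand-built tree. The tree, its feature names (hour, district_density), its thresholds, and its labels are all hypothetical; in practice the tree would be learned from the training data.

```python
# Hypothetical hand-built tree: decision nodes hold a feature and a threshold,
# leaf nodes hold only a label.
tree = {
    "feature": "hour", "threshold": 18,
    "left": {"label": "low risk"},
    "right": {
        "feature": "district_density", "threshold": 0.5,
        "left": {"label": "medium risk"},
        "right": {"label": "high risk"},
    },
}


def predict(node, record):
    """Walk from the root to a leaf, comparing one attribute at each decision node."""
    while "label" not in node:
        go_left = record[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]


print(predict(tree, {"hour": 22, "district_density": 0.9}))
```

Each loop iteration corresponds to one comparison in the description: the record's attribute is tested against the node's threshold and the matching branch is followed until a leaf is reached.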

Python Implementation of Decision Tree
Now we will implement the Decision tree using Python. For this, we will use the dataset "user_data.csv," which
we have used in previous classification models. By using the same dataset, we can compare the Decision tree
classifier with other classification models such as KNN, SVM, Logistic Regression, etc.

The steps also remain the same:
• Data pre-processing
• Fitting a decision tree algorithm to the training set
• Predicting the test result
• Testing the accuracy of the result (creating the confusion matrix)
• Visualizing the test set result
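
The accuracy-testing step can be illustrated without any particular library. The sketch below builds a confusion matrix and an accuracy score from true and predicted labels; the label lists are made up for the example and do not come from "user_data.csv".

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical test-set labels and classifier predictions.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
acc = accuracy(y_true, y_pred)
print(cm, acc)
```

The diagonal of the matrix counts correct predictions and the off-diagonal cells count the two kinds of error, which accuracy alone does not distinguish.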

Advantages of the Decision Tree

• It is simple to understand, as it follows the same process that a human follows when making a
decision in real life.
• It can be very useful for solving decision-related problems.
• It helps in thinking through all the possible outcomes of a problem.
• It requires less data cleaning than other algorithms.

Disadvantages of the Decision Tree

• A decision tree can contain many layers, which makes it complex.
• It may overfit, an issue that can be mitigated by using the Random Forest algorithm.
• With more class labels, the computational complexity of the decision tree may increase.

RANDOM FOREST ALGORITHM


The random forest algorithm can be used for both classification and regression problems. This section
explains how it works for the classification task. Random Forest is a popular machine learning algorithm
that belongs to the supervised learning technique.

It is based on the concept of ensemble learning, which is the process of combining multiple classifiers to
solve a complex problem and improve the performance of the model. A random forest consists of many
decision trees. The ‘forest’ is trained through bagging, or bootstrap aggregating, an ensemble meta-algorithm
that improves the accuracy of machine learning algorithms.

As the name suggests, "Random Forest is a classifier that contains a number of decision trees on various subsets
of the given dataset and takes the average to improve the predictive accuracy of that dataset." Instead of relying
on one decision tree, the random forest takes the prediction from each tree and predicts the final output based
on the majority vote of those predictions.

The below diagram explains the working of the Random Forest algorithm

Fig-1.4 Structure of Random Forest


Below are some points that explain why we should use the Random Forest algorithm:
• It takes less training time than many other algorithms.
• It predicts output with high accuracy and runs efficiently even on large datasets.
• It can maintain accuracy even when a large proportion of data is missing.

Features of a Random Forest Algorithm

• It is generally more accurate than a single decision tree.
• It provides an effective way of handling missing data.
• It can produce a reasonable prediction without hyperparameter tuning.
• It reduces the overfitting seen in decision trees.
• In every tree of the forest, a subset of features is selected randomly at each node’s splitting
point.

Classification in random forests
Classification in random forests employs an ensemble methodology to attain the outcome. The
training data is fed to train various decision trees. This dataset consists of observations and features that will be
selected randomly during the splitting of nodes. A random forest system relies on various decision trees. Every
decision tree consists of decision nodes, leaf nodes, and a root node. The leaf node of each tree is the final
output produced by that specific decision tree. The selection of the final output follows the majority-voting
system: the output chosen by the majority of the decision trees becomes the final output of the random
forest system. The diagram below shows a simple random forest classifier.

Figure-1.5 Random Forest Classifier
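
The majority-voting rule can be written in a few lines. This is a minimal sketch: the crime categories are hypothetical, and ties are resolved in favour of the class that appears first among the votes, which is an implementation choice and not part of the method.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    # most_common(1) gives [(label, count)]; on a tie the first-seen label wins.
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical outputs of five trees for one record.
votes = ["theft", "assault", "theft", "theft", "burglary"]
print(majority_vote(votes))
```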

Random Forest Steps

1. Randomly select “k” features from the total “m” features, where k << m.
2. Among the “k” features, calculate the node “d” using the best split point.
3. Split the node into daughter nodes using the best split.
4. Repeat steps 1 to 3 until “l” number of nodes has been reached.
5. Build the forest by repeating steps 1 to 4 “n” times to create “n” trees.

The random forest algorithm begins by randomly selecting “k” features out of the total “m” features. In the
image, you can observe that we are randomly taking features and observations.
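
The steps above can be sketched in plain Python. This is an illustrative toy under several assumptions and not the implementation used in the project: splits are scored with Gini impurity, each tree is trained on a bootstrap sample of the rows, growth stops at a fixed depth rather than at a node count, and the data, the function names, and the parameter values are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Step 2: best (score, feature, threshold) among the sampled features."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X})[:-1]:
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, k, rng, depth=0, max_depth=3):
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or depth == max_depth:
        return majority                                   # leaf: a class label
    features = rng.sample(range(len(X[0])), k)            # step 1: k of m features
    split = best_split(X, y, features)
    if split is None:                                     # no usable threshold
        return majority
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]  # step 3: daughter nodes
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], k, rng, depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right], k, rng, depth + 1, max_depth))


def predict_tree(node, row):
    while isinstance(node, tuple):                        # decision node
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def build_forest(X, y, n_trees, k, seed=0):
    """Step 5: repeat tree building n times, each on a bootstrap sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]          # sample rows with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], k, rng))
    return forest


def predict_forest(forest, row):
    votes = [predict_tree(tree, row) for tree in forest]
    return Counter(votes).most_common(1)[0][0]            # majority vote


# Hypothetical toy data: three numeric features, two classes.
X = [[1, 1, 1], [2, 2, 2], [1, 2, 1], [2, 1, 2],
     [8, 9, 8], [9, 8, 9], [8, 8, 9], [9, 9, 8]]
y = ["low", "low", "low", "low", "high", "high", "high", "high"]
forest = build_forest(X, y, n_trees=15, k=2)
print(predict_forest(forest, [1, 1, 1]), predict_forest(forest, [9, 9, 9]))
```

Drawing a fresh feature subset at every node, rather than once per tree, is what distinguishes a random forest from plain bagging of decision trees.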

Applications of Random Forest


There are mainly four sectors where random forest is mostly used:
• Banking: The banking sector mostly uses this algorithm for identification of loan risk.
• Medicine: With the help of this algorithm, disease trends and risks of the disease can be identified.
• Land Use: We can identify areas of similar land use with this algorithm.
• Marketing: Marketing trends can be identified using this algorithm.

Advantages of Random Forest
• Random Forest is capable of performing both classification and regression tasks.
• It is capable of handling large datasets with high dimensionality.
• It enhances the accuracy of the model and reduces the overfitting issue.
Disadvantages of Random Forest
• Although random forest can be used for both classification and regression tasks, it is less suitable
for regression tasks.

LOGISTIC REGRESSION
• Logistic regression is one of the most popular machine learning algorithms and comes under the
supervised learning technique. It is used for predicting a categorical dependent variable from a
given set of independent variables.
• Because logistic regression predicts the output of a categorical dependent variable, the outcome
must be a categorical or discrete value. It can be Yes or No, 0 or 1, True or False, etc., but
instead of giving the exact value as 0 or 1, it gives probabilistic values which lie between 0 and 1.
• Logistic regression is very similar to linear regression except in how it is used. Linear
regression is used for solving regression problems, whereas logistic regression is used for solving
classification problems.
• In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic function
whose output is bounded by two limiting values (0 and 1).
• The curve from the logistic function indicates the likelihood of something, such as whether cells
are cancerous or not, or whether a mouse is obese or not based on its weight.
• Logistic regression is a significant machine learning algorithm because it can provide
probabilities and classify new data using continuous and discrete datasets.
• Logistic regression can be used to classify observations using different types of data and can
easily determine the most effective variables for the classification.

Fig-1.6 Logistic Regression graph
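
The "S"-shaped logistic function and the way it turns a probability into a class can be sketched as follows. The weight, bias, and 0.5 threshold are hypothetical values chosen for illustration; in a trained model the weight and bias would be estimated from the data.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)            # avoids overflow for large negative z
    return e / (1.0 + e)


def predict_proba(x, weight, bias):
    """Probability of class 1 for a single input feature x."""
    return sigmoid(weight * x + bias)


def classify(x, weight, bias, threshold=0.5):
    """Turn the probability into a 0/1 decision."""
    return 1 if predict_proba(x, weight, bias) >= threshold else 0


# Hypothetical parameters for illustration only.
print(predict_proba(3.0, weight=1.0, bias=-2.0), classify(3.0, weight=1.0, bias=-2.0))
```

The function returns exactly 0.5 when the linear term is zero, which is why 0.5 is the usual decision threshold.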

CHAPTER 2
LITERATURE SURVEY

2.1 ANALYSIS OF THE LITERATURE

A literature survey is the most important step in the software development process. Before developing the
tool, it is necessary to determine the time factor, the economy, and the company strength. Once these things
are satisfied, the next step is to determine which operating system and language can be used for developing
the tool. Once the programmers start building the tool, they need a lot of external support, which can be
obtained from senior programmers, from books, or from websites. These considerations are taken into account
before building the proposed system.
A major part of project development is to consider and fully survey all the needs of the project: before
developing the tools and the associated designs, it is necessary to determine and survey the time factor,
resource requirements, manpower, economy, and company strength. Once these are satisfied and fully
surveyed, the next step is to determine the software specifications of the system, such as what type of
operating system the project requires and what software is needed to proceed with the next steps, such as
developing the tools and the associated operations.
Here we have taken the general surveys of different authors and noted down the essential key points of their
work. In this project, the literature survey plays a dominant part in gathering resources from different areas
and all the related topics that are useful under this section. Its greatest benefit is that it brings things into order
and helps us to fit our work to the existing data.

2.2 LITERARY REVIEWS
“An Exploration of Crime Prediction Using Data Mining on Open Data”, Ginger Saltos and Mihaela
Cocea (2017)
The increase in crime data recording coupled with data analytics resulted in the growth of research
approaches aimed at extracting knowledge from crime records to better understand criminal behavior and
ultimately prevent future crimes. While many of these approaches make use of clustering and association rule
mining techniques, there are fewer approaches focusing on predictive models of crime. In this paper, we explore
models for predicting the frequency of several types of crimes by LSOA code (Lower Layer Super Output
Areas — an administrative system of areas used by the UK police) and the frequency of anti-social behavior
crimes. Three algorithms are used from different categories of approaches: instance-based learning, regression
and decision trees. The data are from the UK police and contain over 600,000 records before preprocessing.
The results, looking at predictive performance as well as processing time, indicate that decision trees (M5P
algorithm) can be used to reliably predict crime frequency in general as well as anti-social behavior frequency.
The experiments were conducted using the SCIAMA High Performance Computer Cluster at the University of
Portsmouth.

“Crime Analysis and Prediction Using Data Mining” Shiju Sathyadevan, Devan M.S, Surya
Gangadharan (IEEE-2014)
Crime analysis and prevention is a systematic approach for identifying and analyzing patterns and trends in
crime. Our system can predict regions which have high probability for crime occurrence and can visualize
crime prone areas. With the increasing advent of computerized systems, crime data analysts can help the Law
enforcement officers to speed up the process of solving crimes. Using the concept of data mining we can extract
previously unknown, useful information from unstructured data. Here we have an approach between computer
science and criminal justice to develop a data mining procedure that can help solve crimes faster. Instead of
focusing on causes of crime occurrence such as the criminal background of the offender or political enmity, we
are focusing mainly on the crime factors of each day. This paper has tested the accuracy of classification and
prediction
based on different test sets. Classification is done based on the Bayes theorem which showed more than 90%
accuracy.

“Crime Pattern Analysis, Visualization and Prediction Using Data Mining”
Rajkumar Sakkarai Soundarya Jagan. J Varnikasree P (2015)

Crime against women has become a problem for every nation around the globe, and many countries are
trying to curb it. Preventive measures are taken to reduce the increasing number of cases of crime against
women. A huge amount of data is generated every year from the reporting of crime. This data can
prove very useful in analyzing and predicting crime and can help us prevent crime to some extent. Crime
analysis is an area of vital importance in a police department. Study of crime data can help us analyze crime
patterns, inter-related clues, and important hidden relations between crimes. That is why data mining can be
a great aid in analyzing, visualizing, and predicting crime using a crime dataset. Classification and correlation
of data set
makes it easy to understand similarities & dissimilarities amongst the data objects. We group data objects using
clustering technique. Dataset is classified on the basis of some predefined condition. Here grouping is done
according to various types of crimes against women taking place in different states and cities of India. Crime
mapping will help the administration to plan strategies for prevention of crime, further using data mining
technique data can be predicted and visualized in various form in order to provide better understanding of crime
patterns.

“Survey paper on crime prediction using ensemble approach”
Ayisheshim Alma, Kalyani Kadam (2018)
Crime is a major problem and a top-priority concern for individuals, communities and governments. This paper
investigates a number of data mining algorithms and ensemble learning methods that have been applied to crime
data mining. It summarizes the methods and techniques that have been implemented in crime data analysis and
prediction. Crime forecasting attempts to anticipate, and thereby reduce, the crimes that will occur in the
future. Crime prediction uses historical data and, after examining that data, predicts upcoming crime with
respect to location, time, day, season and year. Because crime cases are currently increasing rapidly, it is a
challenging task to foresee upcoming crimes with good accuracy. Data mining methods are very important for
addressing the crime problem by uncovering hidden crime patterns. The objective of this study is therefore to
analyze and discuss the various methods that have been applied to crime prediction and analysis. The paper
delivers a reasonable investigation of data mining techniques and ensemble classification techniques for the
discovery and prediction of upcoming crime.

“Survey on crime analysis and prediction using data mining techniques”
Benjamin Fredrick David. H and Suruliand I (2017)
Data mining is the procedure of evaluating and examining large pre-existing databases in order to generate new
information that may be essential to the organization. The new information is extracted and predicted using the
existing datasets. Many approaches to analysis and prediction in data mining have been proposed. However, very
few efforts have been made in the criminology field, and fewer still have compared the information that these
approaches produce. Police stations and other similar criminal justice agencies hold many large databases of
information that can be used to predict or analyze criminal movements and involvement in criminal activity in
society. Criminals can also be predicted based on the crime data. The main aim of this work is to survey the
supervised and unsupervised learning techniques that have been applied to criminal identification. The paper
presents a survey of crime analysis and crime prediction using several data mining techniques. The quantitative
analysis produced results that show an increase in classification accuracy from using the Genetic Algorithm
(GA) to optimize the parameters.

“Systematic Literature Review of Crime Prediction and Data Mining”
Falade Adesola and Ambrose Azeta (2019)
Using crime datasets requires different strategies for the varying types of data that describe illicit activity.
Falade et al. (2019) provide a survey of crime prediction efforts wherein various machine learning methods
have been applied to multiple types of datasets: criminal records, social media, news, and police reports. The
authors note the different opportunities and challenges that each type of crime dataset presents, such as
social media posts being highly unstructured and First Information Reports (FIRs) being unstructured but
reliable. This paper explains the techniques used, the challenges addressed and the methodologies used in crime
data mining and analysis papers. The methodology is composed of three stages: the first stage involves the
research work related to crime data mining, the second stage is concerned with establishing a classification,
and the third stage involves the presentation of a summary of research in crime data mining and analysis and
the report of this survey.

“Crime Detection Techniques Using Data Mining and K-Means”
Khushabu A. Bokde, Tisksha P. Kakade, Dnyaneshwari S. Tumsare, Chetan G. Wadhai (2018)

Crimes that occur frequently in a society will in some way influence its organizations and institutions. It is
therefore necessary to study the reasons for, factors in and relations between the occurrence of different
crimes, and to find the most appropriate ways to control and avoid further crimes. The main objective of this
paper is to classify clustered crimes based on their occurrence frequency during different years. Data mining
is used extensively for the analysis, investigation and discovery of patterns in the occurrence of different
crimes. The authors applied a theoretical model based on data mining techniques such as clustering and
classification to a real crime dataset recorded by police in England and Wales between 1990 and 2011. Weights
were assigned to the features in order to improve the quality of the model and to remove low-value features.
The Genetic Algorithm (GA) is used to optimize the parameters of the Outlier Detection operator in the Rapid
Miner tool.
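
The clustering step mentioned above can be sketched with a small k-means routine. This is only an illustrative
simplification: it clusters one-dimensional yearly counts with a deterministic starting point, the crime names
and counts are hypothetical, the function name kmeans_1d is an assumption, and it does not reproduce the paper's
Rapid Miner workflow, feature weighting or GA-based parameter optimization.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalar values; returns (centroids, assignments)."""
    ordered = sorted(values)
    # Deterministic initialisation: spread the starting centroids across the sorted data.
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        if new_assignments == assignments:
            break  # converged: no value changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, assignments

# Hypothetical yearly occurrence counts per crime type.
yearly_counts = {
    "vehicle theft": 120, "burglary": 135, "fraud": 40,
    "arson": 35, "assault": 410, "robbery": 390,
}
centroids, assignments = kmeans_1d(list(yearly_counts.values()), k=3)
# Group crime types by the cluster they were assigned to.
clusters = {}
for crime, cluster_id in zip(yearly_counts, assignments):
    clusters.setdefault(cluster_id, []).append(crime)
```

With these toy counts the three clusters correspond to low-, medium- and high-frequency crime types, which is
the kind of frequency-based grouping that a classifier could then be trained on.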

“Empirical Analysis for Crime Prediction and Forecasting Using Machine Learning
and Deep Learning Techniques” Wajiha Safat, Sohail Asghar, Saira Andleeb Gillani (IEEE-2021)

Crime and violations are a threat to justice and need to be controlled. Accurate crime prediction and
forecasting of future trends can help to enhance metropolitan safety computationally. The limited ability of
humans to process complex information from big data hinders the early and accurate prediction and forecasting
of crime. The accurate estimation of the crime rate, crime types and hot spots from past patterns creates many
computational challenges and opportunities. Despite considerable research efforts, there is still a need for a
better predictive algorithm that directs police patrols toward criminal activities. Previous studies fall short
in the crime forecasting and prediction accuracy achieved by learning models. Therefore, this study applied
different machine learning algorithms, namely logistic regression, support vector machine (SVM), Naïve Bayes,
k-nearest neighbors (KNN), decision tree, multilayer perceptron (MLP), random forest and extreme gradient
boosting, together with time series analysis by long short-term memory (LSTM) and the autoregressive integrated
moving average (ARIMA) model, to better fit the crime data. The performance of LSTM for time series analysis
was reasonably adequate in terms of the order of magnitude of the root mean square error (RMSE) and the mean
absolute error (MAE) on both data sets. Exploratory data analysis covers more than 35 crime types and, overall,
these results provide early identification of crime hot spots with higher crime rates.
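
For reference, the two error measures named above can be computed as in the short Python sketch below. MAE is
the mean of the absolute differences between actual and predicted values, and RMSE is the square root of the
mean squared difference. The monthly counts are hypothetical and are not taken from the paper.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average size of the forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error: penalises large errors more heavily than MAE."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Hypothetical monthly crime counts and a model's forecasts for them.
actual_counts = [100, 120, 90, 110]
forecast_counts = [110, 115, 95, 100]

mae_value = mae(actual_counts, forecast_counts)    # 7.5
rmse_value = rmse(actual_counts, forecast_counts)  # sqrt(62.5), about 7.91
```

RMSE is never smaller than MAE on the same forecasts, so a large gap between the two indicates that a few
predictions are far off.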
