Fulldoc - Dsec Mca - Crime Prediction
ABSTRACT
An incredible increase in urban population over the past three decades has created a
desire for a safe, friendly, and sustainable society. The management of urbanization continues
to be a significant concern for administrative authorities due to the city's ever-expanding
population, which is devouring suburbs and rural areas. Cities are becoming overcrowded, forcing governments to launch smart city programs that would help improve infrastructure management and address the key issues of development, sustainability, and security. Despite
the enormous momentum that smart city efforts have gathered and the promises they make
about improving quality of life, they do have some difficult elements as well. Public safety is
one of the biggest obstacles to living in a smart city. To create a healthy society, crime rates
must be identified and reduced. Big Data approaches are used to gather and analyze data in order to identify the necessary characteristics and key traits that enable crime prediction. Traditional crime detection and machine learning-based algorithms frequently struggle to accurately forecast crime trends because they are unable to extract the important prime attributes from the crime dataset. In order to improve the accuracy of the underlying machine learning algorithm, this system aims to extract key features such as time zones, crime likelihood, and crime types. As an alternative to current modelling methodologies, we use the Support Vector Machine algorithm in this project to construct a framework for identifying and forecasting crime. SVM belongs to the new generation of machine learning methods used to determine the best class separation inside datasets. Using geographical datasets gathered from Kaggle data sources, experiments demonstrate that SVMs produce accurate results when used with a Python tool.
1. INTRODUCTION
In the real world, we are surrounded by humans who can learn everything from their
experiences with their learning capability, and we have computers or machines which work
on our instructions. Machine Learning is a subset of Artificial Intelligence that is mainly concerned with the development of algorithms which allow a computer to learn from data and past experiences on its own. The term machine learning was first introduced by Arthur Samuel in 1959. Machine Learning is a growing technology which gives computers the capability to learn without being explicitly programmed.
ML is one of the most exciting technologies that one would have ever come across. But can a machine also learn from experiences or past data the way a human does? This is where Machine Learning comes in. As the name suggests, it gives the computer the ability that makes it more similar to humans: the ability to learn. Machine learning is actively being
used today, perhaps in many more places than one would expect. Machine learning uses
various algorithms for building mathematical models and making predictions using historical
data or information. Currently, it is being used for various tasks such as image
recognition, speech recognition, email filtering, Facebook auto-tagging, recommender
system, and many more.
A Machine Learning system learns from historical data, builds the prediction
models, and whenever it receives new data, predicts the output for it. The accuracy of
predicted output depends upon the amount of data, as the huge amount of data helps to build
a better model which predicts the output more accurately. Suppose we have a complex
problem, where we need to perform some predictions, so instead of writing a code for it, we
just need to feed the data to generic algorithms, and with the help of these algorithms,
machine builds the logic as per the data and predicts the output. Machine learning has changed our way of thinking about such problems. The block diagram below explains the working of a Machine Learning algorithm:
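As a minimal sketch of this learn-from-data, then predict-on-new-data cycle, the following uses scikit-learn; the feature values and labels are made-up illustrative assumptions, not the project's crime dataset.

# Minimal sketch of the train-then-predict cycle described above.
# The numbers are illustrative only, not real crime records.
from sklearn.tree import DecisionTreeClassifier

# Historical data: each row is [hour_of_day, latitude, longitude]; labels are crime-type codes.
X_train = [[1, 49.28, -123.12], [14, 49.25, -123.10], [23, 49.27, -123.08]]
y_train = [0, 1, 0]

model = DecisionTreeClassifier()        # build the prediction model
model.fit(X_train, y_train)             # learn from historical data

# Whenever new data arrives, the trained model predicts an output for it.
print(model.predict([[2, 49.26, -123.11]]))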
NEED FOR MACHINE LEARNING
The need for machine learning is increasing day by day, and the reason behind this is that it is capable of doing tasks that are too complex for a person to implement directly. As humans, we have some limitations, since we cannot access and process huge amounts of data manually; for this we need computer systems, and here machine learning comes in to make
things easy for us. We can train machine learning algorithms by providing them the huge
amount of data and let them explore the data, construct the models, and predict the required
output automatically.
The performance of the machine learning algorithm depends on the amount of data,
and it can be determined by the cost function. With the help of machine learning, we can save
both time and money. The importance of machine learning can be easily understood by its use cases; currently, machine learning is used in self-driving cars, cyber fraud detection, face recognition, friend suggestions by Facebook, and so on. Various top companies such as Netflix and Amazon have built machine learning models that use a vast amount of data to analyze user interests and recommend products accordingly. Machine learning can be broadly classified into the following types:
Supervised Learning
Unsupervised Learning
Reinforcement Learning
SUPERVISED LEARNING
Supervised learning is a type of machine learning method in which we provide sample labelled data to the machine learning system in order to train it, and on that basis, it predicts the output. The system creates a model using labelled data to understand the datasets and learn about each of them; once the training and processing are done, we test the model by providing sample data to check whether it predicts the exact output or not. The goal of supervised learning is to map input data to output data. Supervised learning is based on supervision, and it is the same as when a student learns things under the supervision of a teacher. An example of supervised learning is spam filtering. Supervised learning can be grouped further into two categories of algorithms:
Classification
Regression
CLASSIFICATION
Classifier – It is an algorithm that is used to map the input data to a specific category.
Classification Model – The model draws a conclusion from the input data given for training; it predicts the class or category for new data.
Feature – A feature is an individual measurable property of the phenomenon being
observed.
Binary Classification – It is a type of classification with two possible outcomes, for example either true or false.
Multi-Class Classification – The classification with more than two classes; in multi-class classification each sample is assigned to one and only one label or target. A brief classification sketch is shown below.
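The following is a hedged illustration of binary versus multi-class classification, assuming scikit-learn and its bundled iris dataset rather than this project's crime data; the classifier choice and split ratio are assumptions for the example.

# Binary vs. multi-class classification on the iris dataset (illustrative only).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)                 # three classes -> multi-class classification
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000)           # the classifier maps features to a category
clf.fit(X_tr, y_tr)
print("multi-class accuracy:", clf.score(X_te, y_te))

# Binary classification: keep only two of the classes (a true/false style outcome).
mask = y < 2
clf_bin = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])
print("binary predictions:", clf_bin.predict(X[mask][:5]))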
REGRESSION
The main goal of regression is the construction of an efficient model to predict the dependent attribute from a set of attribute variables. A regression problem is one where the output variable is a real or continuous value, e.g. salary, weight, or area. We can also define regression as a statistical method used in applications like housing and investing; it is used to predict the relationship between a dependent variable and a set of independent variables. Let us take a look at various types of regression techniques.
REGRESSION TYPES
Simple Linear Regression - One of the most interesting and common regression techniques is simple linear regression. In this, we predict the outcome of a dependent variable based on the independent variables, and the relationship between the variables is assumed to be linear.
Polynomial Regression - In this regression technique, we transform the original features into
polynomial features of a given degree and then perform regression on it.
Support Vector Regression - For support vector machine regression, or SVR, we identify a hyperplane with maximum margin such that the maximum number of data points lie within that margin. It is quite similar to the support vector machine classification algorithm.
Decision Tree Regression - A decision tree can be used for both regression and classification. It uses algorithms such as ID3 (Iterative Dichotomiser 3) to identify the splitting node by reducing the standard deviation. Brief sketches of these regression techniques are shown below.
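The sketch below illustrates the four regression types listed above, assuming scikit-learn and a small synthetic dataset generated just for the example; the hyperparameter values are illustrative assumptions.

# Illustrative sketches of the regression types above on a synthetic dataset.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 10, 60)).reshape(-1, 1)
y = 2.0 * X.ravel() + np.sin(X.ravel()) + rng.normal(0, 0.3, 60)   # continuous target

models = {
    "simple linear": LinearRegression(),
    "polynomial (degree 3)": make_pipeline(PolynomialFeatures(degree=3), LinearRegression()),
    "support vector (SVR)": SVR(kernel="rbf", C=10.0),
    "decision tree": DecisionTreeRegressor(max_depth=4),
}
for name, model in models.items():
    model.fit(X, y)                                 # fit each regressor on the same data
    print(name, "R^2 =", round(model.score(X, y), 3))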
UNSUPERVISED LEARNING
Unsupervised learning is a learning method in which a machine learns without any supervision. The training data is neither labelled nor classified, and the algorithm acts on that data on its own, grouping it according to similarities, patterns, and differences. It can be further categorised into two types of algorithms:
Clustering
Association
CLUSTERING
Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to each other than to the data points in other groups. It is basically a collection of objects grouped on the basis of similarity and dissimilarity between them. Clustering is very important as it determines the intrinsic grouping among the unlabelled data present. There are no fixed criteria for good clustering; it depends on the user and the criteria they use to satisfy their needs.
CLUSTERING METHODS:
Density-Based Methods: These methods consider the clusters as the dense region
having some similarities and differences from the lower dense region of the space.
These methods have good accuracy and the ability to merge two clusters.
Hierarchical Based Methods: The clusters formed in this method form a tree-type structure based on the hierarchy. New clusters are formed using the previously formed ones. It is divided into two categories: agglomerative (bottom-up) and divisive (top-down).
Partitioning Methods: These methods partition the objects into k clusters, and each partition forms one cluster. This method is used to optimize an objective criterion similarity function, for example when distance is a major parameter.
Grid-based Methods: In this method, the data space is formulated into a finite number of cells that form a grid-like structure. All the clustering operations done on these grids are fast and independent of the number of data objects. A brief sketch of the main clustering methods follows below.
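The sketch below shows partitioning, density-based, and hierarchical clustering side by side, assuming scikit-learn and synthetic sample points; the parameter values (eps, number of clusters) are illustrative assumptions.

# Partitioning, density-based, and hierarchical clustering on synthetic points.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)   # partitioning method
dbscan = DBSCAN(eps=0.8, min_samples=5).fit(X)                     # density-based method
agglo = AgglomerativeClustering(n_clusters=3).fit(X)               # hierarchical (agglomerative)

print("k-means labels:", kmeans.labels_[:10])
print("DBSCAN labels:", dbscan.labels_[:10])
print("agglomerative labels:", agglo.labels_[:10])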
ASSOCIATION
Association rule learning is a type of unsupervised learning technique that checks for the dependency of one data item on another data item and maps them accordingly so that it can be more profitable. It tries to find interesting relations or associations among the variables of a dataset. It is based on different rules for discovering the interesting relations between variables in the database.
Association rule learning is one of the very important concepts of machine learning, and it is employed in market basket analysis, web usage mining, continuous production, etc. Market basket analysis is a technique used by various big retailers to discover the associations between items. We can understand it by taking the example of a supermarket: products that are frequently purchased together are placed together. For example, if a customer buys bread, he is also likely to buy butter, eggs, or milk, so these products are stored on the same shelf or mostly nearby.
It has various applications in machine learning and data mining. Below are some
popular applications of association rule learning:
Market Basket Analysis - It is one of the popular examples and applications of
association rule mining. This technique is commonly used by big retailers to
determine the association between items.
Medical Diagnosis - With the help of association rules, patients can be cured
easily, as it helps in identifying the probability of illness for a particular disease.
Protein Sequence - The association rules help in determining the synthesis of
artificial Proteins.
It is also used for catalog design, loss-leader analysis, and many other applications; a small market-basket sketch is shown below.
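The support/confidence idea behind association rules can be sketched in plain Python; the supermarket baskets and the 0.5 support threshold below are made-up illustrative values, not data from this project.

# Small, self-contained sketch of support and confidence for item-pair rules.
from itertools import combinations

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"butter", "milk"},
]
n = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / n

items = sorted(set().union(*transactions))
for a, b in combinations(items, 2):
    pair = {a, b}
    if support(pair) >= 0.5:                      # frequent-itemset threshold (assumed)
        conf = support(pair) / support({a})       # confidence of the rule a -> b
        print(f"{a} -> {b}: support={support(pair):.2f}, confidence={conf:.2f}")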
REINFORCEMENT LEARNING
Reinforcement learning is a feedback-based learning method in which a learning agent gets a reward for each right action and a penalty for each wrong action. The agent learns automatically from this feedback and improves its performance without any labelled data.
APPLICATIONS OF MACHINE LEARNING
Machine learning is a buzzword for today's technology, and it is growing very rapidly day by day. We are using machine learning in our daily life even without knowing it, for example in Google Maps, Google Assistant, Alexa, and so on. Some popular applications are described below.
EMAIL SPAM AND MALWARE FILTERING - Whenever we receive a new email, it is filtered automatically as important, normal, or spam with the help of machine learning classifiers. Some of the spam filters used for this purpose are:
Content Filter
Header filter
General blacklists filter
Rules-based filters
Permission filters
AUTOMATIC LANGUAGE TRANSLATION - Nowadays, if we visit a new place and are not aware of the language, it is not a problem at all, as machine learning helps us here as well by converting the text into languages we know. Google's GNMT (Google Neural Machine Translation) provides this feature; it is a neural machine translation model that translates text into our familiar language, and this is called automatic translation. The technology behind automatic translation is a sequence-to-sequence learning algorithm, which, combined with image recognition, translates text from one language to another.
ONLINE FRAUD DETECTION - Machine learning is making our online transactions safe and secure by detecting fraudulent transactions. Whenever we perform an online transaction, there are various ways that a fraudulent transaction can take place, such as fake accounts, fake IDs, and money being stolen in the middle of a transaction. To detect this, a feed-forward neural network helps us by checking whether a transaction is genuine or fraudulent. For each genuine transaction, the output is converted into some hash values, and these values become the input for the next round. For each genuine transaction there is a specific pattern, which changes for a fraudulent transaction; hence, the system detects it and makes our online transactions more secure.
TRAFFIC PREDICTION - If we want to visit a new place, we take the help of Google Maps, which shows us the correct path with the shortest route and predicts the traffic conditions. Everyone who is using Google Maps is helping this app to become better: it takes information from the user and sends it back to its database to improve the performance. It predicts traffic conditions such as whether traffic is clear, slow-moving, or heavily congested in two ways: from the real-time location of vehicles reported by the Google Maps app and road sensors, and from the average time taken on the same route on previous days at the same time.
2. LITERATURE SURVEY
This paper concluded by discussing the implications of these findings for research on technology and inequality in criminal justice. Whereas the current wave of critical scholarship on algorithmic bias often leans upon technologically deterministic narratives in order to make social justice claims, here we focus on the social and institutional contexts
within which such predictive systems are deployed and negotiated. In the process, we show
that these tools acquire political nuance and meaning through practice, which can lead to
unanticipated or undesirable outcomes: forms of workplace surveillance and the displacement
of discretion to less accountable places. We argue that this sheds new light on the
transformations of police and judicial discretion – with important consequences for social and
racial inequality – in the age of big data. Given the rationalizing impetus that guides the
adoption of algorithmic technologies in the criminal justice context, these profound changes
lead us to raise the question of the reception of predictive algorithms in the context of law
enforcement and criminal courts. Although there is strong theoretical work in surveillance
studies that focuses on the possibilities, good and bad, of new forms of algorithmic decision-
making, there is a dearth of empirical work on the social contexts of their reception in
policing and courts.
2.4 TITLE: Machine learning for risk assessment in gender-based crime.
AUTHOR: González-Prieto et al.
This paper proposes a hybrid model that combines statistical prediction methods with the ML method, permitting authorities to implement a smooth transition from the pre-existing model to the ML-based model. This hybrid nature enables a decision-making process that optimally balances the efficiency of the police system and the aggressiveness of the protection measures taken. Despite the apparently regular occurrence of crime, as was already recognized in the 19th century, it has defied the predictability provided by the scientific method in the natural sciences. Surprisingly, it is easier to accurately predict where a rocket will be after its launch on its way to a distant planet than to foresee the next victim of an offense. The unpredictable nature of crime raises the question of whether the classical scientific method can be a solving tool instead of only a descriptive framework. Given a large
amount of structured data about IPVAW cases, we will apply ML techniques to develop
novel models of risk assessment of recidivism of a victim, understood as the probability that a
female victim, who has been offended and has reported her case, is aggressed again. In our
case, the data will be provided by the Spanish VioGen system, a governmental program for
tracking and controlling gender violence, but the approach and applied methods are general
and can be straightforwardly translated to other data sources.
2.5 TITLE: Economic crime detection using support vector machine classification
3.1 EXISTING SYSTEM
The existing system for crime rate prediction using machine learning algorithms aims to analyze historical crime data and build predictive models that can forecast crime rates in specific regions. This system utilizes various machine learning algorithms and techniques to analyze the data and make accurate predictions. The process begins with data collection, where historical crime data from different sources, such as police records, crime databases, and public reports, is gathered. The existing system for crime rate prediction using machine learning algorithms has shown promising results in forecasting crime rates. By utilizing historical crime data and leveraging the power of machine learning, this system provides
valuable insights for law enforcement agencies and policymakers. The accuracy and
reliability of crime rate predictions heavily rely on the quality and representativeness of the
historical crime data. If the data used for training the models is biased or incomplete, it can
lead to biased predictions. Crime patterns can change over time due to various factors such as
socio-economic changes, policy interventions, or community dynamics. Machine learning
models trained on historical data may struggle to adapt to these changing patterns and may
not accurately predict future crime rates. Machine learning models trained on data from one region or period may not generalize well to other regions or future time periods. The models may fail to capture unique local factors or fail to adapt to changes in crime patterns over time.
3.1.1 DISADVANTAGES
3.2.1 ADVANTAGES
5.1 MODULES
Datasets Acquisition
Preprocessing
Features Extraction
Classification
DATASETS ACQUISITION
A data set (or dataset) is a collection of data. Most commonly a data set
corresponds to the contents of a single database table, or a single statistical data matrix,
where every column of the table represents a particular variable, and each row corresponds to
a given member of the data set in question. The data set lists values for each of the variables,
such as height and weight of an object, for each member of the data set. Each value is known
as a datum. The data set may comprise data for one or more members, corresponding to the
number of rows. The term data set may also be used more loosely, to refer to the data in a
collection of closely related tables, corresponding to a particular experiment or event. In this module, we can upload datasets containing year, month, day, hour, minute, latitude and longitude values.
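A minimal sketch of this acquisition step with pandas is shown below; the file name is a placeholder, and the column names follow the dataset preview shown in Appendix 2, so both are assumptions rather than values fixed by the source.

# Sketch of the dataset-acquisition step: loading an uploaded crime file with pandas.
import pandas as pd

df = pd.read_csv("crime_dataset.csv")   # hypothetical file exported from Kaggle

expected = ["TYPE", "YEAR", "MONTH", "DAY", "HOUR", "MINUTE", "X", "Y"]
print(df[[c for c in expected if c in df.columns]].head())   # preview the relevant columns
print("rows:", len(df))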
PREPROCESSING
In this module, the uploaded crime dataset is prepared for learning: rows with missing or non-finite values are removed, and categorical attributes such as the crime type are encoded as numeric labels, so that the data can be passed on to feature selection and classification.
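The following is a minimal preprocessing sketch that mirrors the cleaning and label-encoding performed in the appendix listing (dropna and a TYPE mapping); the helper name and the use of pandas.factorize are assumptions made for illustration.

# Minimal preprocessing sketch: drop incomplete rows and encode the crime type as a number.
# 'df' is assumed to be the DataFrame loaded in the dataset-acquisition module.
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    df = df.dropna()                                           # remove rows with missing values
    df = df[~df.isin([np.inf, -np.inf]).any(axis=1)]           # drop rows with non-finite values
    df = df.copy()
    if df["TYPE"].dtype == object:                             # encode labels only if still text
        df["TYPE"] = pd.factorize(df["TYPE"])[0]               # crime type -> integer code
    return df.astype(np.float64)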
FEATURES SELECTION
Feature selection refers to the process of reducing the inputs for processing and analysis, or of finding the most meaningful inputs. A related term, feature engineering (or feature extraction), refers to the process of extracting useful information or features from existing data. Filter feature selection methods apply a statistical measure to assign a score to each feature. The features are ranked by the score and either selected to be kept or removed from the dataset. The methods are often univariate and consider each feature independently, or with regard to the dependent variable. In this module, we select the most relevant features from the uploaded datasets and train on data labelled with multiple crime types such as murder, violence, abuse, vehicle theft, and so on.
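A hedged sketch of this filter-style feature selection using scikit-learn's SelectKBest follows; the f_classif scoring function and k=5 are illustrative assumptions, and df is the preprocessed frame from the previous module.

# Filter feature selection: score each feature against the crime type and keep the top k.
from sklearn.feature_selection import SelectKBest, f_classif

X = df.drop(columns="TYPE")            # candidate features: year, month, day, hour, minute, X, Y
y = df["TYPE"]                         # target: encoded crime type

selector = SelectKBest(score_func=f_classif, k=5)    # univariate statistical scoring
X_selected = selector.fit_transform(X, y)
print("kept features:", list(X.columns[selector.get_support()]))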
CLASSIFICATION
In this module we implement a classification algorithm to predict the crime types, using a machine learning algorithm such as the Support Vector Machine. A Support Vector Machine (SVM) is a supervised learning model that maps input feature vectors into a high-dimensional space and finds the hyperplane that separates the classes with the maximum margin; the training points closest to this hyperplane are called support vectors. With the help of kernel functions, an SVM can also distinguish data that are not linearly separable, which makes it well suited to finding the best class separation inside the crime dataset.
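A minimal sketch of this classification step with scikit-learn's SVC is given below. Note that the appendix listing actually trains an MLPClassifier; the SVM variant here only illustrates the description in this section, and the split ratio, scaling, and kernel settings are assumptions. X_selected and y come from the feature-selection sketch above.

# Train and evaluate an SVM classifier on the selected crime features.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report

X_train, X_test, y_train, y_test = train_test_split(
    X_selected, y, test_size=0.2, random_state=3)

scaler = StandardScaler().fit(X_train)              # SVMs are sensitive to feature scale
clf = SVC(kernel="rbf", C=1.0, gamma="scale")       # maximum-margin classifier with RBF kernel
clf.fit(scaler.transform(X_train), y_train)

y_pred = clf.predict(scaler.transform(X_test))
print(classification_report(y_test, y_pred))        # per-class precision, recall, F1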
6. SYSTEM DESIGN
A system architecture or systems architecture is the conceptual model that defines the
structure, behavior, and more views of a system. An architecture description is a formal
description and representation of a system, organized in a way that supports reasoning about
the structures and behaviors of the system. System architecture can comprise system
components, the externally visible properties of those components, the relationships (e.g. the
behavior) between them. It can provide a plan from which products can be procured, and
systems developed, that will work together to implement the overall system. There have been
efforts to formalize languages to describe system architecture; collectively these are called
architecture description languages (ADLs).
An allocated arrangement of physical elements which provides the design solution for
a consumer product or life-cycle process intended to satisfy the requirements of the
functional architecture and the requirements baseline.
Architecture comprises the most important, pervasive, top-level, strategic inventions,
decisions, and their associated rationales about the overall structure (i.e., essential
elements and their relationships) and associated characteristics and behavior.
If documented, it may include information such as a detailed inventory of current
hardware, software and networking capabilities; a description of long-range plans and
priorities for future purchases, and a plan for upgrading and/or replacing dated
equipment and software
The composite of the design architectures for products and their life-cycle processes
6.2 DATA FLOW DIAGRAM
A data flow diagram shows the way information flows through a process or system. It
includes data inputs and outputs, data stores, and the various sub processes the data moves
through. DFDs are built using standardized symbols and notation to describe various entities
and their relationships. Data flow diagrams visually represent systems and processes that
would be hard to describe in a chunk of text. You can use these diagrams to map out an
existing system and make it better or to plan out a new system for implementation.
Visualizing each element makes it easy to identify inefficiencies and produce the best
possible system.
LEVEL 0
The Level 0 DFD shows how the system is divided into 'sub-systems' (processes),
each of which deals with one or more of the data flows to or from an external agent, and
which together provide all of the functionality of the system as a whole. It also identifies
internal data stores that must be present in order for the system to do its job, and shows the
flow of data between the various parts of the system.
LEVEL-1
The next stage is to create the Level 1 Data Flow Diagram. This highlights the main functions carried out by the system. As a rule, we try to describe the system using between two and seven functions - two for a simple system and seven for a complicated system. This enables us to keep the model manageable on screen or paper.
LEVEL-2
A Data Flow Diagram (DFD) tracks processes and their data paths within the business
or system boundary under investigation. A DFD defines each domain boundary and
illustrates the logical movement and transformation of data within the defined boundary. The
diagram shows 'what' input data enters the domain, 'what' logical processes the domain
applies to that data, and 'what' output data leaves the domain. Essentially, a DFD is a tool for
process modeling and one of the oldest.
LEVEL-3
A data flow diagram (DFD) is a graphical representation of the flow of data through
an information system. A DFD shows the flow of data from data sources and data stores to
processes and from processes to data stores and data sinks. DFDs are used for modelling and
analyzing the flow of data in data processing systems, and are usually accompanied by a data
dictionary, an entity-relationship model, and a number of process descriptions.
6.3 UML DIAGRAMS
6.3.1 CLASS DIAGRAM
A class diagram is the main building block of object-oriented modelling. It is used for general conceptual modelling of the structure of the application and for detailed modelling, translating the models into programming code. Class diagrams can also be used for data modelling. The classes in a class diagram represent both the main elements and interactions in the application and the classes to be programmed.
System
Datasets Acquisition
Preprocessing
Features Selection
User
Rules construction
Classification
6.3.2 SEQUENCE DIAGRAM
A sequence diagram is an interaction diagram that shows how processes operate with one another and in what order. It is a construct of a Message Sequence Chart. A sequence diagram shows object interactions arranged in time sequence. Sequence diagrams are sometimes called event trace diagrams, event scenarios, or timing diagrams. A sequence diagram shows, as parallel vertical lines, the different processes that live simultaneously and, as horizontal arrows, the messages exchanged between them. The sequence diagram of this system has three objects; the connections between the objects are shown using stimulus and self-stimulus messages.
[Sequence diagram messages include Features selection(), MLP algorithm(), and Crime prediction().]
6.3.3 COLLABORATION DIAGRAM
Dataset upload
Preprocessing
Rules construction
Classification
7.1 FRONT END
Python's developers strive to avoid premature optimization, and reject patches to non-critical parts of CPython that would offer marginal increases in speed at the cost of clarity. When speed is important, a Python programmer can move time-critical functions to extension modules written in languages such as C, or use PyPy, a just-in-time compiler. Cython is also available, which translates a Python script into C and makes direct C-level API calls into the Python interpreter. An important goal of Python's developers is keeping it fun to use. This is reflected in the language's name, a tribute to the British comedy group Monty Python, and in occasionally playful approaches to tutorials and reference materials, such as examples that refer to spam and eggs (from a famous Monty Python sketch) instead of the standard foo and bar.
A common neologism in the Python community is pythonic, which can have a wide
range of meanings related to program style. To say that code is pythonic is to say that it uses
Python idioms well, that it is natural or shows fluency in the language, that it conforms with
Python's minimalist philosophy and emphasis on readability. In contrast, code that is difficult
to understand or reads like a rough transcription from another programming language is
called unpythonic. Users and admirers of Python, especially those considered knowledgeable
or experienced, are often referred to as Pythonists, Pythonistas, and Pythoneers. Python is an
interpreted, object-oriented, high-level programming language with dynamic semantics. Its
high-level built in data structures, combined with dynamic typing and dynamic binding, make
it very attractive for Rapid Application Development, as well as for use as a scripting or glue
language to connect existing components together. Python's simple, easy to learn syntax
emphasizes readability and therefore reduces the cost of program maintenance. Python
supports modules and packages, which encourages program modularity and code reuse. The
Python interpreter and the extensive standard library are available in source or binary form
without charge for all major platforms, and can be freely distributed. Often, programmers fall
in love with Python because of the increased productivity it provides. Since there is no
compilation step, the edit-test-debug cycle is incredibly fast. Debugging Python programs is
easy: a bug or bad input will never cause a segmentation fault. Instead, when the interpreter
discovers an error, it raises an exception. When the program doesn't catch the exception, the
interpreter prints a stack trace. A source level debugger allows inspection of local and global
variables, evaluation of arbitrary expressions, setting breakpoints, stepping through the code a
line at a time, and so on. The debugger is written in Python itself, testifying to Python's
introspective power. On the other hand, often the quickest way to debug a program is to add a
few print statements to the source: the fast edit-test-debug cycle makes this simple approach
very effective.
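A tiny illustration of the point above: bad input raises an exception that the program can catch and handle, rather than crashing the interpreter. The helper function is hypothetical and exists only for the example.

# Bad input raises an exception (with a stack trace if uncaught) instead of a crash.
def parse_hour(value):
    return int(value)

try:
    parse_hour("not-a-number")
except ValueError as exc:
    print("caught:", exc)        # the program keeps running after handling the error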
Python’s initial development was spearheaded by Guido van Rossum in the late
1980s. Today, it is developed by the Python Software Foundation. Because Python is a
multiparadigm language, Python programmers can accomplish their tasks using different
styles of programming: object oriented, imperative, functional or reflective. Python can be
used in Web development, numeric programming, game development, serial port access and
more.
There are two attributes that make development time in Python faster than in other
programming languages:
1. Python is an interpreted language, which precludes the need to compile code before
executing a program because Python does the compilation in the background. Because
Python is a high-level programming language, it abstracts many sophisticated details
from the programming code. Python focuses so much on this abstraction that its code
can be understood by most novice programmers.
2. Python code tends to be shorter than comparable code in other languages. Although Python offers fast development times, it lags slightly in terms of execution time. Compared to fully compiled languages like C and C++, Python programs execute more slowly. Of course,
with the processing speeds of computers these days, the speed differences are usually
only observed in benchmarking tests, not in real-world operations. In most cases,
Python is already included in Linux distributions and Mac OS X machines.
7.2 BACK END
MySQL is the world's most widely used open source relational database management system (RDBMS) as of 2008; it runs as a server providing multi-user access to a number of databases. The MySQL development project has made its source code available under the terms of the GNU General Public License, as well as under a variety of proprietary agreements. MySQL was owned and sponsored by a single for-profit firm, the Swedish company MySQL AB, which is now owned by Oracle Corporation.
MySQL is a popular choice of database for use in web applications, and is a central component of the widely used LAMP open source web application software stack (LAMP is an acronym for "Linux, Apache, MySQL, Perl/PHP/Python"). Free-software open source projects that require a full-featured database management system often use MySQL. For commercial use, several paid editions are available and offer additional functionality. Applications which use MySQL databases include TYPO3, Joomla, WordPress, phpBB, MyBB, Drupal and other software built on the LAMP software stack. MySQL is also used in many high-profile, large-scale World Wide Web products, including Wikipedia, Google (though not for searches), Facebook, Twitter, Flickr, Nokia.com, and YouTube.
Interfaces
MySQL is primarily an RDBMS and ships with no GUI tools to administer MySQL
databases or manage data contained within the databases. Users may use the included
command line tools, or use MySQL "front-ends", desktop software and web applications that
create and manage MySQL databases, build database structures, back up data, inspect status,
and work with data records. The official MySQL front-end tool, MySQL Workbench, is actively developed by Oracle and is freely available for use.
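Since the appendix listing imports mysql.connector, a minimal connection sketch is included below; the host, credentials, database, and table name are placeholders invented for illustration, not values taken from the source.

# Minimal sketch of connecting to the project's MySQL database with mysql.connector.
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="root", password="password", database="crimedb")
cur = conn.cursor()
cur.execute("SELECT * FROM weather_info")   # hypothetical table used by the /WeatherInfo route
rows = cur.fetchall()
cur.close()
conn.close()
print(len(rows), "rows fetched")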
8. SYSTEM TESTING
Testing is a set of activities that can be planned and conducted systematically. Testing begins at the module level and works towards the integration of the entire computer-based system. Nothing is complete without testing, as it is vital to the success of the system.
Testing Objectives:
There are several rules that can serve as testing objectives:
1. Tests for correctness
2. Tests for implementation efficiency
3. Tests for computational complexity
Tests used for implementation efficiency attempt to find ways to make a correct
program faster or use less storage. It is a code-refining process, which reexamines the
implementation phase of algorithm development. Tests for computational complexity amount
to an experimental analysis of the complexity of an algorithm or an experimental comparison
of two or more algorithms which solve the same problem. The data is entered in all forms separately, and whenever an error occurs, it is corrected immediately. A quality team deputed by the management verified all the necessary documents and tested the software while entering the data at all levels.
Unit Testing
The first test in the development process is the unit test. The source code is normally
divided into modules, which in turn are divided into smaller units called units. These units
have specific behavior. The test done on these units of code is called unit test. Unit test
depends upon the language on which the project is developed. Unit tests ensure that each
unique path of the project performs accurately to the documented specifications and contains
clearly defined inputs and expected results.
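A small unit-test sketch using Python's built-in unittest module is shown below; the helper function under test is hypothetical and only illustrates how one isolated unit of the system could be verified against its expected results.

# Unit-test sketch for a hypothetical helper that maps a class code to a crime-type label.
import unittest

LABELS = {0: "Murder", 1: "violence", 2: "ChildAbusing"}

def label_for(code):
    return LABELS.get(code, "Unknown")

class LabelForTest(unittest.TestCase):
    def test_known_code(self):
        self.assertEqual(label_for(0), "Murder")

    def test_unknown_code(self):
        self.assertEqual(label_for(99), "Unknown")

if __name__ == "__main__":
    unittest.main()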
Integration Testing
In integration testing modules are combined and tested as a group. Modules are
typically code modules, individual applications, source and destination applications on a
network, etc. Integration testing follows unit testing and precedes system testing.
Beta Testing
Beta testing is testing done after the product is code complete. Betas are often widely distributed, or even distributed to the public at large, in the hope that users will buy the final product when it is released.
System Testing
System testing is performed on the complete, integrated system to evaluate its compliance with the specified requirements; it verifies that all modules of the application work together correctly as a whole.
Validation Testing
The process of evaluating software during the development process or at the end of
the development process to determine whether it satisfies specified business requirements.
Validation testing ensures that the product actually meets the client's needs. It can also be defined as demonstrating that the product fulfils its intended use when deployed in an appropriate environment.
9.1 CONCLUSION
In this project, key features such as time zones, crime likelihood, and crime types are extracted from geographical crime datasets collected from Kaggle, and a Support Vector Machine based framework is used to identify and forecast crime. Experiments with the Python implementation show that the proposed approach produces accurate predictions and can support law enforcement agencies in planning for a safer society.
APPENDIX 1 SOURCE CODE
import datetime
import sys
import pickle
import mysql.connector
import numpy as np
from flask import Flask, render_template, request   # Flask imports were missing from the listing

app = Flask(__name__)
app.config['DEBUG'] = True   # value assumed; the original line had no assignment
app.config['SECRET_KEY'] = '7d441f27d441f27567d441f2b6176a'
@app.route("/")
def homepage():
returnrender_template('index.html')
@app.route("/AdminLogin")
defAdminLogin():
returnrender_template('AdminLogin.html')
@app.route("/NewQueryReg")
defNewQueryReg():
returnrender_template('NewQueryReg.html')
@app.route("/UploadDataset")
defUploadDataset():
returnrender_template('ViewExcel.html')
@app.route("/AdminHome")
defAdminHome():
returnrender_template('AdminHome.html')
@app.route("/WeatherInfo")
defWeatherInfo():
cursor = conn.cursor()
cur = conn.cursor()
data = cur.fetchall()
returnrender_template('WeatherInfo.html',data=data)
@app.route("/adminlogin", methods=['GET', 'POST'])   # route path assumed; decorator missing in the listing
def adminlogin():
    error = None
    if request.method == 'POST':
        # credential check against the database is omitted in the original listing
        return render_template('AdminHome.html')
    else:
        return render_template('index.html', error=error)
@app.route("/newquery", methods=['GET', 'POST'])   # route path assumed; decorator missing in the listing
def newquery():
    if request.method == 'POST':
        t1 = request.form['t1']
        t2 = request.form['t2']
        t3 = request.form['t3']
        t4 = request.form['t4']
        t5 = request.form['t5']
        t6 = request.form['t6']
        t7 = request.form['t7']
        # Load the trained model and build the feature vector from the form fields
        # (the model loading and the construction of 'data' were missing from the listing).
        filename2 = "Model/Crime-prediction-rfc-model.pkl"
        classifier2 = pickle.load(open(filename2, 'rb'))
        data = np.array([[t1, t2, t3, t4, t5, t6, t7]], dtype=float)
        my_prediction = classifier2.predict(data)
        print(my_prediction[0])
        # Map the numeric class back to a crime-type label (matches the TYPE mapping below).
        if my_prediction[0] == 0:
            Predict = 'Murder'
        elif my_prediction[0] == 1:
            Predict = 'violence'
        elif my_prediction[0] == 2:
            Predict = 'ChildAbusing'
        elif my_prediction[0] == 3:
            Predict = 'Offence Against a Person'
        elif my_prediction[0] == 4:
            Predict = 'Mischief'
        elif my_prediction[0] == 5:
            Predict = 'TheftVehicle'
        else:
            Predict = 'Accident'
        return render_template('NewQueryReg.html', Predict=Predict)
    return render_template('NewQueryReg.html')
@app.route("/uploadassign", methods=['GET', 'POST'])   # route path assumed; decorator missing in the listing
def uploadassign():
    if request.method == 'POST':
        file = request.files['fileupload']
        file_extension = file.filename.split('.')[1]
        print(file_extension)
        # file.save("static/upload/" + secure_filename(file.filename))
        import pandas as pd
        import matplotlib.pyplot as plt
        import seaborn as sns
        df = ''
        if file_extension == 'xlsx':
            df = pd.read_excel(file.read(), engine='openpyxl')
        elif file_extension == 'xls':
            df = pd.read_excel(file.read())
        elif file_extension == 'csv':
            df = pd.read_csv(file)
        # Plot the distribution of crime types and save it for the results page.
        sns.countplot(x=df['TYPE'], label="Count")
        plt.savefig('static/images/out.jpg')
        img = 'static/images/out.jpg'
        print(df)
        # Encode the crime-type labels as integer classes.
        df.TYPE = df.TYPE.map({'Murder': 0,
                               'violence': 1,
                               'ChildAbusing': 2,
                               'Offence Against a Person': 3,
                               'Mischief': 4,
                               'TheftVehicle': 5,
                               'Accident': 6
                               })
        def clean_dataset(df):
            df.dropna(inplace=True)
            # keep only rows whose values are all finite (this mask was missing from the listing)
            indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
            return df[indices_to_keep].astype(np.float64)

        df = clean_dataset(df)
        df_copy = df.copy(deep=True)
        # Model Building
        # df.drop(df.columns[np.isnan(df).any()], axis=1)
        X = df.drop(columns='TYPE')
        y = df['TYPE']
        # The train/test split and classifier imports were missing from the listing.
        from sklearn.model_selection import train_test_split
        from sklearn.neural_network import MLPClassifier
        from sklearn.metrics import classification_report
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=3)
        classifier = MLPClassifier(random_state=3)
        classifier.fit(X_train, y_train)
        y_pred = classifier.predict(X_test)
        print(classification_report(y_test, y_pred))
        # Persist the trained model so the prediction route can load it.
        filename = 'Model/Crime-prediction-rfc-model.pkl'
        pickle.dump(classifier, open(filename, 'wb'))
        df = df.head(300)
        return render_template('ViewExcel.html', data=df.to_html(), img=img)   # template and arguments assumed; the original listing ends here

if __name__ == '__main__':
    app.run(debug=True, use_reloader=False)   # use_reloader value assumed; the original line is truncated
APPENDIX 2 SCREENSHOTS
[Screenshot: preview of the uploaded crime dataset with columns TYPE, YEAR, MONTH, ..., MINUTE, X, Y]