Data Mining


Data Mining

Presented by:
Dr Shahbaz Khan
Assistant Professor
Institute of Business Management
DATA MINING

• Extraction of interesting
knowledge (rules, regularities,
patterns, constraints) from
data in large databases
• Extraction of interesting (non-
trivial, implicit, previously
unknown and potentially
useful) information or patterns
from data in large databases
DATA MINING

Data mining is an automatic or semi-automatic technical
process that analyses large amounts of scattered
information to make sense of it and turn it into
knowledge.

It looks for anomalies, patterns or correlations among
millions of records to predict results.
DATA MINING

Data mining has opened a world of possibilities for business.

This field of computational statistics compares millions of
isolated pieces of data and is used by companies to detect and
predict consumer behaviour.

Its objective is to generate new market opportunities.

Importance of Data Mining

In the past, we were only able to analyze what a
company’s customers or clients HAD DONE, but now,
with the help of Data Mining, we can predict what a
client WILL DO.

• Can make data analysis more accessible to end users
• Results can be easier to interpret than, e.g., regression models
• Strong focus on decisions and their implementation
Importance of Data Mining
• Information from data mining can help to:
• Increase return on investment (ROI)
• Improve CRM and market analysis
• Reduce marketing campaign costs
• Facilitate fraud detection and customer retention
• Predict future trends
• Understand customer purchase habits
• Support market basket analysis
Knowledge Discovery in Databases Process
Data Mining: A KDD Process
– Data mining: the core of the knowledge discovery process.
[Figure: KDD process flow. Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
Steps of a KDD Process
• Data cleaning: also known as data cleansing, it is a
phase in which noisy and irrelevant data are
removed from the collection.
• Data integration: at this stage, multiple data sources,
often heterogeneous, may be combined in a common
source.
• Data selection: at this step, the data relevant to the
analysis is decided on and retrieved from the data
collection.
• Data transformation: also known as data consolidation,
it is a phase in which the selected data is transformed
into forms appropriate for the mining procedure.
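The four preparation steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the records, the field names (customer, age, spend) and the helper functions are hypothetical, "cleaning" is reduced to dropping rows with missing values and exact duplicates, and "transformation" is reduced to min-max scaling of one field.

```python
# Hypothetical raw records from two sources; None marks a missing value.
store_a = [
    {"customer": "c1", "age": 34, "spend": 120.0},
    {"customer": "c2", "age": None, "spend": 80.0},
]
store_b = [
    {"customer": "c3", "age": 51, "spend": 200.0},
    {"customer": "c3", "age": 51, "spend": 200.0},  # exact duplicate row
]

def clean(rows):
    """Data cleaning: drop rows with missing values and exact duplicates."""
    seen, result = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if None in row.values() or key in seen:
            continue
        seen.add(key)
        result.append(row)
    return result

def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [row for source in sources for row in source]

def select(rows, fields):
    """Data selection: keep only the fields relevant to the analysis."""
    return [{f: row[f] for f in fields} for row in rows]

def transform(rows, field):
    """Data transformation: min-max scale one numeric field to [0, 1]."""
    values = [row[field] for row in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [{**row, field: (row[field] - lo) / span} for row in rows]

# Apply the steps in the order listed on the slide.
combined = integrate(clean(store_a), clean(store_b))
prepared = transform(select(combined, ["age", "spend"]), "spend")
print(prepared)
```

Real projects usually need far richer cleaning and transformation rules; the point here is only the order and purpose of the steps.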
Steps of a KDD Process
• Data mining: it is the crucial step in which clever
techniques are applied to extract potentially useful patterns.
• Pattern evaluation: in this step, strictly interesting
patterns representing knowledge are identified based
on given measures.
• Knowledge representation: the final phase, in
which the discovered knowledge is visually
represented to the user. This essential step uses
visualization techniques to help users understand and
interpret the data mining results.
Data Mining Technique
• The generative aspect of data mining consists
of building a model from data.
• Each data mining technique can perform one
or more of the following types of data
modelling:
• Association;
• Classification;
• Clustering;
• Forecasting;
• Regression.
Association
• Association aims to establish
relationships between items which exist
together in a given database.
• It is intended to identify strong rules
discovered in a database using different
measures.
• Algorithms:
• Apriori Algorithm
• Frequent-Pattern (FP) Tree
• FP-Growth Algorithm
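Two measures commonly used to judge association rules are support and confidence; the slide does not name its measures, so treating these two as the ones meant is an assumption. The sketch below computes them on a hypothetical set of shopping baskets and finds frequent itemsets with a level-wise search in the spirit of Apriori. It is a simplified illustration, not a tuned implementation.

```python
from itertools import combinations

# Hypothetical market baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def frequent_itemsets(transactions, min_support):
    """Level-wise search: grow itemsets one item at a time, Apriori style."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items
             if support([i], transactions) >= min_support]
    frequent, k = {}, 1
    while level:
        for s in level:
            frequent[s] = support(s, transactions)
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
        level = [c for c in candidates
                 if all(frozenset(sub) in frequent
                        for sub in combinations(c, k - 1))
                 and support(c, transactions) >= min_support]
    return frequent

found = frequent_itemsets(transactions, min_support=0.5)
for itemset, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
print("confidence(bread -> milk):", round(confidence({"bread"}, {"milk"}, transactions), 2))
```

With a minimum support of 0.5, every single item and every pair qualifies in this toy data, while the three-item basket does not.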
Classification
• Classification is one of the most common learning models
in data mining.
• It aims at building a model to predict future outcomes by
classifying database records into a number of predefined
classes based on certain criteria.
• It represents the largest part of the problems to which data
mining is applied today: creating models to predict class
membership.
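As an illustration of assigning records to predefined classes, here is a minimal k-nearest-neighbours classifier. The slides do not name a classification algorithm, so k-NN is our choice for the example, and the features (visits per month, items per visit) and class labels are hypothetical.

```python
import math
from collections import Counter

# Hypothetical labelled records: (visits per month, items per visit) -> class.
training = [
    ((1, 2), "occasional"),
    ((2, 1), "occasional"),
    ((2, 3), "occasional"),
    ((8, 9), "loyal"),
    ((9, 8), "loyal"),
    ((7, 9), "loyal"),
]

def classify(point, training, k=3):
    """Predict the class of point by majority vote among its k nearest records."""
    nearest = sorted(training, key=lambda ex: math.dist(ex[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(classify((8, 8), training))  # record close to the "loyal" group
print(classify((1, 1), training))  # record close to the "occasional" group
```

The classes are fixed in advance by the labelled training records, which is what distinguishes classification from clustering.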
Clustering
• Clustering is the task of grouping a set of objects in
such a way that objects in the same group (cluster) are
more similar to each other than to those in other clusters.
• Algorithms:
• K-Means
• Mini-Batch K-Means
• Mean Shift
• OPTICS
• Hierarchical clustering
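A minimal K-Means sketch shows the idea of grouping similar objects. The points are hypothetical, and the initial centroids are simply the first k points, a simplification made to keep the example deterministic; practical implementations use better initialisation.

```python
import math

def kmeans(points, k, iterations=100):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its cluster."""
    centroids = list(points[:k])  # simplistic init: first k points (assumption)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster
            else centroids[i]  # keep the old centroid if a cluster is empty
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments are stable: converged
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

Unlike classification, no labels are supplied; the groups emerge from the distances between the points.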
Forecasting
• Forecasting estimates the future value based on a record’s
patterns.
• It deals with continuously valued outcomes.
• It relates to modelling and the logical relationships of the
model at some time in the future.
• Algorithms:
• Regression
• Autoregressive (AR)
• Autoregressive Integrated Moving Average (ARIMA)
• Seasonal Autoregressive Integrated Moving Average (SARIMA)
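The simplest autoregressive model, AR(1), predicts each value from the previous one: y_t = c + phi * y_(t-1). The sketch below fits c and phi by ordinary least squares and forecasts a few steps ahead. The series is synthetic (generated with c = 2 and phi = 0.5), and the function names are hypothetical; ARIMA and SARIMA add differencing, moving-average and seasonal terms that are not shown here.

```python
def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_(t-1)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi

def forecast(series, steps, c, phi):
    """Roll the fitted model forward, feeding each forecast back in."""
    last, out = series[-1], []
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out

# Synthetic series following y_t = 2 + 0.5 * y_(t-1), starting at 10.
series = [10.0, 7.0, 5.5, 4.75, 4.375, 4.1875]
c, phi = fit_ar1(series)
predictions = forecast(series, steps=3, c=c, phi=phi)
print(round(c, 4), round(phi, 4))
print([round(p, 4) for p in predictions])
```

Because the outcome is a continuous value rather than a class label, the fit is a regression on the lagged series.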
Thank You!
