Data Mining
Presented by:
Dr Shahbaz Khan
Assistant Professor
Institute of Business Management
DATA MINING
• Extraction of interesting
knowledge (rules, regularities,
patterns, constraints) from
data in large databases
• Extraction of interesting (non-
trivial, implicit, previously
unknown and potentially
useful) information or patterns
from data in large databases
DATA MINING
[Figure: process diagram; labels include Data Cleaning, Data Integration, and Task-relevant Data]
Steps of a KDD Process
• Data cleaning: also known as data cleansing, it is a
phase in which noisy and irrelevant data are
removed from the collection.
• Data integration: at this stage, multiple data sources,
often heterogeneous, may be combined into a common
source.
• Data selection: at this step, the data relevant to the
analysis is decided on and retrieved from the data
collection.
• Data transformation: also known as data consolidation,
it is a phase in which the selected data is transformed
into forms appropriate for the mining procedure.
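The four preparation steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a prescribed pipeline: the two record sources (`crm_rows`, `sales_rows`), their field names, and the choice of min-max scaling are all hypothetical.

```python
# Hypothetical raw records from two heterogeneous sources.
crm_rows = [
    {"id": 1, "age": "34", "city": "A"},
    {"id": 2, "age": None, "city": "B"},   # missing value (noisy record)
    {"id": 3, "age": "51", "city": "A"},
]
sales_rows = [
    {"id": 1, "spend": 120.0},
    {"id": 3, "spend": 480.0},
    {"id": 2, "spend": 75.0},
]

def clean(rows):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def integrate(left, right, key):
    """Data integration: join two sources on a shared key."""
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

def select(rows, fields):
    """Data selection: keep only the fields relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in rows]

def transform(rows):
    """Data transformation: cast age to int, min-max scale spend to [0, 1]."""
    spends = [r["spend"] for r in rows]
    lo, hi = min(spends), max(spends)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [
        {"age": int(r["age"]), "spend_scaled": (r["spend"] - lo) / span}
        for r in rows
    ]

prepared = transform(
    select(integrate(clean(crm_rows), sales_rows, "id"), ["age", "spend"])
)
print(prepared)
```

Each function corresponds to one step, so the order of the calls mirrors the order of the bullets.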
Steps of a KDD Process
• Data mining: it is the crucial step in which clever
techniques are applied to extract potentially useful patterns.
• Pattern evaluation: in this step, the truly interesting
patterns representing knowledge are identified based
on given interestingness measures.
• Knowledge representation: the final phase, in
which the discovered knowledge is visually
presented to the user. This essential step uses
visualization techniques to help users understand and
interpret the data mining results.
Data Mining Techniques
• The generative aspect of data mining consists
of building a model from data.
• Each data mining technique can perform one
or more of the following types of data
modelling:
• Association
• Classification
• Clustering
• Forecasting
• Regression
Association
• Association aims to establish
relationships between items that occur
together in a given database.
• It is intended to identify strong rules
discovered in a database using different
measures, such as support and confidence.
• Algorithms:
• Apriori Algorithm
• Frequent-Pattern (FP) Tree
• FP-Growth Algorithm
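As a rough illustration of how strong rules are measured, the sketch below computes support and confidence and runs a simplified level-wise (Apriori-style) search for frequent itemsets. The basket data and the function names are hypothetical, and the search is a teaching sketch rather than an optimized Apriori or FP-Growth implementation.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def frequent_itemsets(transactions, min_support):
    """Level-wise search: size-k candidates are built only from frequent sets."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items
               if support([i], transactions) >= min_support]
    result = {}
    k = 1
    while current:
        for s in current:
            result[s] = support(s, transactions)
        k += 1
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must already be frequent.
        current = [
            c for c in candidates
            if all(frozenset(sub) in result for sub in combinations(c, k - 1))
            and support(c, transactions) >= min_support
        ]
    return result

frequent = frequent_itemsets(transactions, min_support=0.6)
rule_conf = confidence({"bread"}, {"milk"}, transactions)
```

With a minimum support of 0.6, every single item and every pair qualifies in this toy data, while the three-item set does not.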
Classification
• Classification is one of the most common learning models
in data mining.
• It aims to build a model that predicts future behavior by
classifying database records into a number of predefined
classes based on certain criteria.
• It represents the largest share of the problems to which data
mining is applied today: creating models to predict class
membership.
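To make "predicting class membership" concrete, here is a minimal k-nearest-neighbours classifier. The slides do not name a specific classification algorithm, so k-NN is an assumption chosen for brevity; the training points, the labels "low" and "high", and the function name `knn_predict` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical labelled records: (feature vector, predefined class).
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.5), "low"),
    ((6.0, 6.5), "high"),
    ((7.0, 6.0), "high"),
    ((6.5, 7.0), "high"),
]

def knn_predict(training, point, k=3):
    """Assign the majority class among the k training records closest to point."""
    nearest = sorted(training, key=lambda row: math.dist(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict(training, (1.2, 1.4)))
print(knn_predict(training, (6.4, 6.4)))
```

The model here is simply the stored training set; new records are classified by comparing them with records whose class is already known.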
Clustering
• Clustering is the task of grouping a set of objects in
such a way that objects in the same group (cluster) are
more similar to each other than to those in other clusters.
• Algorithms:
• K-Means
• Mini-Batch K-Means
• Mean Shift
• OPTICS
• Hierarchical clustering
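The first algorithm in the list, K-Means, can be sketched in a few lines. This is a simplified version under stated assumptions: the points and the starting centroids are hypothetical, the starting centroids are supplied by the caller rather than chosen at random, and the loop runs a fixed number of iterations instead of testing for convergence.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]  # keep the centroid of an empty cluster
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

Unlike classification, no labels are given; the groups emerge purely from the distances between objects.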
Forecasting
• Forecasting estimates future values based on the patterns
in a record's history.
• It deals with continuously valued outcomes.
• It models the logical relationships expected to hold
at some time in the future.
• Algorithms:
• Regression
• Autoregressive (AR)
• Autoregressive Integrated Moving Average (ARIMA)
• Seasonal Autoregressive Integrated Moving Average (SARIMA)
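As a small example of the autoregressive idea, the sketch below fits a first-order model x[t] = c + phi * x[t-1] by ordinary least squares and rolls it forward. The series is synthetic (generated with c = 2 and phi = 0.5), and the function names `fit_ar1` and `forecast` are hypothetical; ARIMA and SARIMA add differencing, moving-average, and seasonal terms that are not shown here.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    phi = cov / var
    c = mean_y - phi * mean_x
    return c, phi

def forecast(series, steps, c, phi):
    """Apply the fitted recurrence repeatedly, starting from the last observation."""
    last = series[-1]
    predictions = []
    for _ in range(steps):
        last = c + phi * last
        predictions.append(last)
    return predictions

# Synthetic series generated by x[t] = 2 + 0.5 * x[t-1], starting at 10.
series = [10.0, 7.0, 5.5, 4.75, 4.375]
c, phi = fit_ar1(series)
next_values = forecast(series, steps=2, c=c, phi=phi)
print(c, phi, next_values)
```

Because the output is a continuous value rather than a class label, the same least-squares machinery underlies the regression entry in the list above.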
Thank You!