Introduction To Data Mining For Business Analytics

This document provides an introduction to data mining in business analytics. It discusses key concepts like business intelligence, the data mining process, common data mining techniques, and how data mining informs business analytics. The six steps of the data mining process are outlined as business understanding, data understanding, data preparation, modeling, evaluation, and deployment. Classification, prediction, association rules, and predictive analytics are presented as core data mining techniques.

Uploaded by

Sherwin Lopez

MODULE 1

INTRODUCTION TO DATA MINING


IN BUSINESS ANALYTICS
At the end of the topic, the learner should be able to:
• Learn and understand the importance of data mining in
business analytics
• Learn the different terminologies in data mining for business
analytics
• Understand the reasons why there are so many different
methods in data mining
• Identify and understand the different steps in data mining
• Understand the modeling process in dealing with data mining
• Business Analytics is the practice and art of bringing
quantitative data to bear on decision-making. The term
means different things to different organizations.
• Business Analytics, or more generically, analytics, includes
a range of data analysis methods. Many powerful
applications involve little more than counting,
rule-checking, and basic arithmetic.
• Business Intelligence (BI) refers to data visualization and
reporting for understanding “what happened and what is
happening.”
• BI, which earlier consisted mainly of generating static
reports, has evolved into more user-friendly and effective
tools and practices, such as creating interactive
dashboards that allow the user not only to access real-
time data, but also to directly interact with it.
Beware the organizational setting where
analytics is a solution in search of a problem
• Business Understanding
• Data Understanding
• Data Preparation
• Modeling
• Evaluation
• Deployment
The first step to successful data mining is to understand the
overall objectives of the business, then be able to convert this
into a data mining problem and a plan. Without an
understanding of the ultimate goal of the business, you won’t
be able to design a good data mining algorithm.
After you know what the business is looking for, it’s time to
collect data. There are many complex ways that data can be
obtained from an organization, organized, stored, and
managed. Data understanding involves getting familiar with the
data, identifying any quality issues, gaining initial insights,
and spotting interesting subsets.
Data preparation involves getting the data production-ready.
This is usually the biggest part of data mining. It takes the
raw, machine-oriented data and converts it into a form that
people can understand and quantify.
In the modeling phase, mathematical models are used to
search for patterns in the data. There are usually several
techniques that can be used for the same set of data. There
is a lot of trial and error involved in modeling.
When the model is complete, it needs to be carefully
evaluated and the steps to make the model need to be
reviewed, to ensure it meets the business objectives. At the
end of this phase, a decision about the data mining results
will be made.
Deployment can be a simple or complex part of data mining,
depending on the output of the process. It can be as simple
as generating a report, or as complex as setting up a
repeatable data mining process that runs regularly.
How does data mining inform business
analytics?
• Classification. This data mining technique is more complex, using attributes of
data to sort records into discernible categories, helping you draw further
conclusions.
• Clustering. This technique is similar to classification in that it groups data
together based on their similarities. Cluster groups are less structured than
classification groups, making clustering a simpler option for data mining.
• Association Rules. Association in data mining is all about tracking patterns,
specifically patterns based on linked variables.
• Regression Analysis. Regression is used to plan and model, estimating the
likely value of a specific variable given the values of other variables.
• Anomaly/outlier detection. For many data mining cases, just seeing the
overarching pattern might not be all you need. You also need to be able to
identify and understand the outliers in your data.
• DataMelt. Performs mathematics, statistics, calculations, data analysis, and
visualization. Many scripting languages and Java packages are available in this
system.
• ELKI Data Mining Framework. Focuses on algorithms with a specific emphasis
on unsupervised cluster and outlier systems. ELKI is designed to be easy for
researchers, students, and business organizations to use.
• Orange Data Mining. Helps organizations do simple data analysis and use top
visualization and graphics. Heatmaps, hierarchical clustering, decision trees, and
more are used in this process.
• The R Project for Statistical Computing. Used in statistical modeling and
graphics and is utilized on many operating systems and programs.
• Rattle GUI. Presents statistical and visual summaries of data, helps prepare it to
be modeled, and utilizes supervised and unsupervised machine learning to
present the information.
Big Data is a relative term—data today are big by reference
to the past, and to the methods and devices available to deal
with them.
• Volume
• Velocity
• Variety
• Veracity
Data science is a mix of skills in the areas of statistics,
machine learning, math, programming, business, and IT.
Why are there so many different methods and
techniques of data mining for business
analytics?
Predictive analytics comprises the tasks of classification and
prediction, which are becoming key elements of a
“business intelligence” function in most large firms.
CORE IDEAS OF DATA MINING
• Classification
• Prediction
• Association Rules
• Predictive Analytics
• Data Reduction
• Data Exploration
• Data Visualization
A common task in data mining is to examine data where the
classification is unknown or will occur in the future, with the goal
of predicting what that classification is or will be. Similar data
where the classification is known are used to develop rules,
which are then applied to the data with the unknown
classification.
• The recipient of an offer can respond or not respond.
• An applicant for a loan can repay on time, repay late, or
declare bankruptcy.
• A credit card transaction can be normal or fraudulent.
• A packet of data traveling on a network can be benign or
threatening.
• A bus in a fleet can be available for service or unavailable.
• The victim of an illness can be recovered, still be ill, or be
deceased.
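The idea of learning from records whose class is known and applying that knowledge to records whose class is unknown can be sketched in a few lines. The example below uses a one-nearest-neighbor rule, which is only one of many possible classification techniques and is chosen here for brevity. The function name `classify_1nn` and the transaction records are hypothetical and purely illustrative.

```python
import math

def classify_1nn(labeled, new_record):
    """Return the class of the labeled record closest to new_record."""
    nearest = min(labeled, key=lambda rec: math.dist(rec[0], new_record))
    return nearest[1]

# Hypothetical transactions with a known class:
# (amount in dollars, hour of day) -> class
known = [
    ((12.0, 14), "normal"),
    ((15.5, 10), "normal"),
    ((980.0, 3), "fraudulent"),
    ((1200.0, 2), "fraudulent"),
]

# Classify two new transactions whose class is unknown.
large_night_purchase = classify_1nn(known, (1100.0, 4))
small_day_purchase = classify_1nn(known, (20.0, 12))
print(large_night_purchase, small_day_purchase)
```

In practice the variables would usually be put on a common scale first, so that a large-valued variable such as the amount does not dominate the distance.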
Prediction is similar to classification, except that the goal is
to predict the value of a continuous variable rather than a class.
(Sometimes in the data mining literature, the term estimation is
used to refer to the prediction of the value of a continuous
variable, and prediction may be used for both continuous and
categorical data.)
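As a minimal sketch of predicting a continuous value, the code below fits a straight line by ordinary least squares and uses it to predict a new value. Least squares is one common choice, not the only one. The function name `fit_line` and the advertising and sales figures are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend and resulting sales (same units).
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ad_spend, sales)
predicted_sales = intercept + slope * 5.0  # prediction for a new spend level
print(slope, intercept, predicted_sales)
```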
Association rules, or affinity analysis, find which items tend to occur together
in transaction data. The resulting rules can then be used in a variety of ways.
For example:
• Grocery stores can use such information after a customer’s purchases
have all been scanned to print discount coupons, where the items being
discounted are determined by mapping the customer’s purchases onto the
association rules.
• Online merchants such as Amazon.com and Netflix.com use these
methods as the heart of a “recommender” system that suggests new
purchases to customers.
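Two standard measures for judging a rule such as "customers who buy bread also buy milk" are support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both; the function names and the basket data are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, divided by
    support of the antecedent alone."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

Here bread and milk appear together in 2 of 4 baskets, and milk appears in 2 of the 3 baskets that contain bread.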
Classification, prediction, and to some extent, affinity analysis
constitute the analytical methods employed in predictive
analytics.
• Sensible data analysis often requires distillation of complex
data into simpler data. Rather than dealing with thousands of
product types, an analyst might wish to group them into a
smaller number of groups.
• This process of consolidating a large number of variables (or
cases) into a smaller set is termed data reduction.
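A small sketch of this kind of consolidation, assuming a hand-made lookup table that assigns each product type to a broader group. The mapping `PRODUCT_GROUP`, the function `reduce_by_group`, and the sales figures are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from detailed product types to broader groups.
PRODUCT_GROUP = {
    "espresso": "beverages",
    "latte": "beverages",
    "bagel": "bakery",
    "muffin": "bakery",
}

def reduce_by_group(sales_by_product, group_of):
    """Sum sales per group; unmapped products fall into 'other'."""
    totals = defaultdict(float)
    for product, amount in sales_by_product.items():
        totals[group_of.get(product, "other")] += amount
    return dict(totals)

sales_by_product = {
    "espresso": 120.0,
    "latte": 200.0,
    "bagel": 80.0,
    "muffin": 50.0,
    "gift card": 25.0,
}

reduced = reduce_by_group(sales_by_product, PRODUCT_GROUP)
print(reduced)
```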
A full understanding of the data may require a reduction in its scale or
dimension to allow us to see the forest without getting lost in the trees.
Similar variables (i.e., variables that supply similar information) might be
aggregated into a single variable incorporating all the similar variables.
Analogously, records might be aggregated into groups of similar records.

Example:
• An essential part of the job is to review and examine the data to see what
messages they hold, much as a detective might survey a crime scene.
Another technique for exploring data to see what information they hold is
through graphical analysis. This includes looking at each variable separately
as well as looking at relationships between variables.
For numerical variables, we use histograms and boxplots to learn about
the distribution of their values, to detect outliers (extreme observations), and
to find other information that is relevant to the analysis task.
Similarly, for categorical variables we use bar charts. We can also look at
scatterplots of pairs of numerical variables to learn about possible
relationships, the type of relationship, and again, to detect outliers.
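The numbers behind a boxplot can be computed directly. The sketch below finds the quartiles of a numerical variable and flags values that lie more than 1.5 interquartile ranges beyond them, which is a common convention for drawing boxplot whiskers rather than a rule stated in this module. The function name and the purchase amounts are hypothetical.

```python
import statistics

def boxplot_summary(values):
    """Quartiles plus values beyond 1.5 * IQR from the quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}

# Hypothetical purchase amounts with one extreme observation.
purchases = [20, 22, 23, 25, 26, 27, 30, 31, 250]

summary = boxplot_summary(purchases)
print(summary)
```

A flagged value is a candidate for a closer look, not automatically an error.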
SUPERVISED AND
UNSUPERVISED LEARNING
Supervised learning algorithms are those used in classification
and prediction, where the value of the outcome of interest
(e.g., purchase or no purchase) is known for the data used to
build the model.

Training data, Validation data, Test data
Training data are the data from which the classification or
prediction algorithm "learns," or is "trained," about the
relationship between predictor variables and the outcome
variable.
Once the algorithm has learned from the training data, it is
then applied to another sample of data (the validation data)
where the outcome is known, to see how well it does in
comparison to other models.
If many different models are being tried out, it is prudent to
save a third sample of known outcomes (the test data) to
use with the model finally selected to predict how well it will
do.
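A minimal sketch of splitting records into the three partitions by random shuffling. The 60/20/20 proportions, the fixed seed, and the function name `partition` are assumptions made for illustration; the module does not prescribe particular proportions.

```python
import random

def partition(records, train_frac=0.6, valid_frac=0.2, seed=1):
    """Shuffle the records and split them into training, validation,
    and test partitions. Whatever remains after the first two
    partitions becomes the test partition."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

# Hypothetical dataset of 100 record IDs.
train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))
```

Each record lands in exactly one partition, so the test data stay untouched until the final model has been selected.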

Simple Linear Regression Analysis


Unsupervised learning algorithms are those used where
there is no outcome variable to predict or classify. Hence,
there is no "learning" from cases where such an outcome
variable is known.

Association Rules
Dimension Reduction Methods
Clustering Techniques
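To illustrate learning without an outcome variable, the sketch below groups customers with a bare-bones k-means procedure, which is one of several clustering techniques. The initialization (the first k points), the fixed iteration count, and the customer data are simplifying assumptions, not recommendations.

```python
import math

def k_means(points, k, iterations=10):
    """Very small k-means: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    centers = list(points[:k])  # naive initialization: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster is empty
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, clusters

# Hypothetical customers: (visits per month, average spend).
customers = [(1, 10), (9, 95), (2, 12), (8, 100), (1, 14), (10, 90)]

centers, clusters = k_means(customers, k=2)
print(centers)
print(clusters)
```

No labels are supplied; the two groups (occasional low spenders and frequent high spenders) emerge from the similarities in the data alone.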
Some of the most serious errors in data analysis
result from a poor understanding of the problem
STEPS IN DATA MINING
• Develop an understanding of the purpose of the data mining project
• Obtain the dataset to be used in the analysis.
• Explore, clean, and preprocess the data.
• Reduce the data, if necessary, and (where supervised training is
involved) separate them into training, validation, and test datasets.
• Determine the data mining task (classification, prediction, clustering,
etc.).
• Choose the data mining techniques to be used (regression, neural
nets, hierarchical clustering, etc.).
1. Develop an understanding of the purpose of the data
mining project (if it is a one-shot effort to answer a
question or questions) or application (if it is an ongoing
procedure).
2. Obtain the dataset to be used in the analysis. This
often involves random sampling from a large database to
capture records to be used in analysis.
3. Explore, clean, and preprocess the data. This involves
verifying that the data are in reasonable condition.
• How should missing data be handled?
• Are the values in a reasonable range, given what you would expect for each variable?
• Are there obvious outliers?
4. Reduce the data, if necessary, and (where supervised
training is involved) separate them into training, validation,
and test datasets. This can involve operations such as
eliminating unneeded variables, transforming variables, and
creating new variables.
5. Determine the data mining task. This involves
translating the general question or problem of step 1
into a more specific statistical question.
6. Choose the data mining techniques to be used
(regression, neural nets, hierarchical clustering, etc.).
7. Use algorithms to perform the task. This is typically
an iterative process of trying multiple variants, and often
using multiple variants of the same algorithm.
8. Interpret the results of the algorithms. This involves
making a choice as to the best algorithm to deploy,
and where possible, testing the final choice on the test
data to get an idea as to how well it will perform.
9. Deploy the model. This involves integrating the model
into operational systems and running it on real records
to produce decisions or actions.
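The checks listed under step 3 (missing data, reasonable ranges, obvious outliers) can be sketched for a single variable as follows. Replacing missing values with the median is only one possible treatment, and the function names, the age data, and the 0 to 120 range are hypothetical.

```python
import statistics

def check_column(values, low, high):
    """Return the positions of missing values and of values
    outside the expected range [low, high]."""
    missing = [i for i, v in enumerate(values) if v is None]
    out_of_range = [
        i for i, v in enumerate(values)
        if v is not None and not (low <= v <= high)
    ]
    return missing, out_of_range

def impute_median(values):
    """Replace missing values with the median of the observed values."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]

# Hypothetical customer ages with one missing and one implausible value.
ages = [34, None, 29, 41, 250, 38]

missing, out_of_range = check_column(ages, low=0, high=120)
filled = impute_median(ages)
print(missing, out_of_range, filled)
```

Note that the implausible value also shifts the median used for imputation, so suspect values are usually examined before missing values are filled in.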
PRELIMINARY STEPS
• Organization of database
• Sampling from a database
• Oversampling rare events
• Preprocessing and cleaning the data
• Types of variables
• Handling categorical variables
• Variable selection
• Overfitting
• How many variables and how much data?
• Outliers
• Missing Values
• Normalizing the data
• Use and creation of partitions
• Training partition
• Validation partition
• Test partition
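Two of the preliminary steps above, normalizing the data and handling categorical variables, can be sketched briefly. The code assumes z-score normalization (subtract the mean, divide by the standard deviation) and simple dummy coding with one 0/1 variable per category; other choices exist. The function names and data are hypothetical.

```python
import statistics

def z_scores(values):
    """Rescale values to mean 0 and (sample) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def dummy_variables(values):
    """Turn one categorical variable into 0/1 variables, one per category."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]

# Hypothetical data: incomes (in thousands) and payment methods.
incomes = [10, 20, 30]
payments = ["cash", "card", "cash"]

normalized = z_scores(incomes)
dummies = dummy_variables(payments)
print(normalized)
print(dummies)
```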
• Shmueli, G., et al. Data Mining for Business Intelligence: Concepts,
Techniques, and Applications in Microsoft Office Excel with XLMiner, 2nd
ed. John Wiley & Sons, Inc.
• Bruce, P., et al. Data Mining for Business Analytics: Concepts, Techniques,
and Applications. John Wiley & Sons, Inc., 2020.