Data Mining
10.14.1 Purpose
Data mining is used to improve decision making by finding useful patterns and
insights from data.
10.14.2 Description
Data mining is an analytic process that examines large amounts of data from
different perspectives and summarizes the data in such a way that useful patterns
and relationships are discovered.
The results of data mining techniques are generally mathematical models or
equations that describe underlying patterns and relationships. These models can
be deployed for human decision making through visual dashboards and reports,
or for automated decision-making systems through business rules management
systems or in-database deployments.
Data mining is a general term that covers descriptive, diagnostic, and predictive
techniques:
• Descriptive techniques, such as clustering, make it easier to see the patterns
in a set of data, such as similarities between customers.
• Diagnostic techniques, such as decision trees or segmentation, can show why a
pattern exists, such as the characteristics of an organization's most
profitable customers.
• Predictive techniques, such as regression or neural networks, can show how
likely something is to be true in the future, such as the probability that a
particular claim is fraudulent.
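As an illustration of a descriptive technique, the sketch below groups customers with plain k-means clustering (Lloyd's algorithm). It is a minimal sketch under stated assumptions: the two features (annual spend and visit count), the sample customers, and the choice of the first k points as starting centroids are all hypothetical and not part of the original text.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int,
           iterations: int = 100) -> Tuple[List[Point], List[int]]:
    """Plain k-means; the first k points serve as initial centroids (an assumption)."""
    if k <= 0 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_labels == labels and new_centroids == centroids:
            break  # converged: nothing changed
        labels, centroids = new_labels, new_centroids
    return centroids, labels


# Hypothetical customers: (annual spend, store visits per year).
customers = [(200, 2), (220, 3), (210, 2), (1500, 20), (1600, 22), (1550, 21)]
centroids, labels = kmeans(customers, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The resulting labels only describe which customers are similar; explaining why a group exists is the job of diagnostic techniques. In practice, features are usually scaled first so that a large-valued feature such as spend does not dominate the distance.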
In all cases it is important to consider the goal of the data mining exercise and to
be prepared for considerable effort in securing the right type, volume, and quality
of data with which to work.
10.14.3 Elements
.1 Requirements Elicitation
The goal and scope of data mining are established either in terms of decision
requirements for an important identified business decision, or in terms of a
functional area where relevant data will be mined for domain-specific pattern
discovery. Knowing whether the effort follows this top-down (decision-driven) or
bottom-up (data-driven) strategy allows analysts to pick the correct set of data
mining techniques.
Formal decision modelling techniques (see Decision Modelling (p. 265)) are used
to define requirements for top-down data mining exercises. For bottom-up
pattern discovery exercises it is useful if the discovered insight can be placed on
existing decision models, allowing rapid use and deployment of the insight.
.3 Data Analysis
Once the data is available, it is analyzed. Analysts typically apply a wide variety
of statistical measures and use visualization tools to see how data values are
distributed, what data is missing, and how various calculated characteristics
behave. This step is often the longest and most complex in a data mining effort
and is increasingly the focus of automation. Much of the power of a data mining
effort typically comes from identifying useful characteristics in the data. For
instance, a characteristic might be the number of times a customer has visited a
store in the last 80 days. Determining that the count over the last 80 days is more
useful than the count over the last 70 or 90 days is key.
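The window-length example above can be sketched as follows: compute each customer's visit count over several candidate windows and compare how strongly each count relates to an outcome of interest. This is a minimal sketch under stated assumptions: the visit log format, the 0/1 outcome label (for example, whether a customer responded to an offer), and the use of Pearson correlation as the measure of "usefulness" are hypothetical stand-ins; real projects usually judge a characteristic by its effect on model performance on held-out data.

```python
from collections import Counter
from datetime import date, timedelta
from math import sqrt
from typing import Dict, Iterable, List, Sequence, Tuple


def visit_counts(visits: Iterable[Tuple[str, date]], as_of: date,
                 days: int) -> Counter:
    """Count each customer's visits in the `days` days ending on `as_of` (inclusive)."""
    start = as_of - timedelta(days=days - 1)
    return Counter(cid for cid, d in visits if start <= d <= as_of)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def compare_windows(visits: Iterable[Tuple[str, date]], outcomes: Dict[str, int],
                    as_of: date, windows: List[int]) -> Dict[int, float]:
    """Score each candidate window length by its correlation with the outcome."""
    visits = list(visits)
    customers = sorted(outcomes)
    ys = [outcomes[c] for c in customers]
    scores = {}
    for w in windows:
        counts = visit_counts(visits, as_of, w)
        xs = [counts[c] for c in customers]  # Counter yields 0 for absent customers
        scores[w] = pearson(xs, ys)
    return scores


# Hypothetical sample data.
log = [("a", date(2024, 6, 28)), ("a", date(2024, 5, 10)),
       ("b", date(2024, 4, 15)), ("c", date(2024, 6, 2))]
responded = {"a": 1, "b": 0, "c": 1}
print(compare_windows(log, responded, date(2024, 6, 30), [70, 80, 90]))
```

Because every candidate window is scored the same way, the comparison of 70, 80, and 90 days becomes a repeatable step, which is the kind of work that automation in this phase targets.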
.4 Modelling Techniques
.5 Deployment
Once a model has been built, it must be deployed to be useful. Data mining
models can be deployed in a variety of ways, either to support a human decision
maker or to support automated decision-making systems. For human users, data
mining results may be presented using visual metaphors or as simple data fields.
Many data mining techniques identify potential business rules that can be
deployed using a business rules management system. Such executable business
rules can be fitted into a decision model along with expert rules as necessary.
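The idea of fitting mined rules into a decision alongside expert rules can be sketched as an ordered rule list evaluated on a claim. This is a minimal, hypothetical sketch: the rule names, thresholds, field names, and first-match evaluation policy are illustrative assumptions, not the output of any particular data mining tool or business rules management system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Claim = Dict[str, float]


@dataclass(frozen=True)
class Rule:
    name: str
    source: str                        # "mined" (from data mining) or "expert"
    condition: Callable[[Claim], bool]
    outcome: str


# Hypothetical rules: thresholds stand in for what, e.g., a decision tree might yield.
RULES: List[Rule] = [
    Rule("provider-watchlist", "expert",
         lambda c: c["provider_flagged"] == 1, "refer"),
    Rule("early-large-claim", "mined",
         lambda c: c["amount"] > 5000 and c["policy_age_days"] < 60, "refer"),
    Rule("small-claim", "mined",
         lambda c: c["amount"] <= 500, "approve"),
]


def decide(claim: Claim, rules: List[Rule] = RULES,
           default: str = "manual-review") -> Tuple[str, str]:
    """First-match evaluation: return (outcome, rule name) of the first rule that fires."""
    for rule in rules:
        if rule.condition(claim):
            return rule.outcome, rule.name
    return default, "default"


print(decide({"provider_flagged": 0, "amount": 8000, "policy_age_days": 30}))
```

Keeping the rules as data, tagged with their source, makes it straightforward to review mined rules next to expert rules before they are deployed.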
Some data mining techniques—especially those described as predictive analytic
techniques—result in mathematical formulas. These can also be deployed as
executable business rules but can also be used to generate SQL or code for
deployment. An increasingly wide range of in-database deployment options allows
such models to be integrated into an organization's data infrastructure.
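To show how a predictive formula can be turned into SQL, the sketch below takes a logistic regression model (one common predictive technique, chosen here as an example) and both scores a record in Python and emits an equivalent SQL expression. The coefficients, column names, and table name are hypothetical; the generated SQL assumes a dialect with an `EXP` function that accepts `SELECT *, ...`, and it does not quote identifiers, so it is a sketch rather than production code.

```python
import math
from typing import Dict


def score(row: Dict[str, float], intercept: float, coefs: Dict[str, float]) -> float:
    """Logistic model: probability = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(w * row[name] for name, w in coefs.items())
    return 1.0 / (1.0 + math.exp(-z))


def to_sql(intercept: float, coefs: Dict[str, float], table: str,
           alias: str = "fraud_probability") -> str:
    """Emit the same formula as a SQL SELECT (assumes plain, safe column names)."""
    terms = " + ".join(f"({w!r} * {name})" for name, w in coefs.items())
    linear = f"{intercept!r} + {terms}" if terms else repr(intercept)
    return f"SELECT *, 1.0 / (1.0 + EXP(-({linear}))) AS {alias} FROM {table}"


# Hypothetical fitted model for claim fraud.
coefficients = {"amount_k": 0.4, "prior_claims": 0.9}
print(to_sql(-3.0, coefficients, "claims"))
print(score({"amount_k": 5, "prior_claims": 1}, -3.0, coefficients))
```

Generating both forms from one set of coefficients helps keep the in-database version consistent with the model that was validated during the mining effort.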
.1 Strengths
.2 Limitations