Lecture 6-Data Mining and Warehousing
Data mining functionalities specify the kinds of patterns to be found in data mining tasks. These tasks fall into two
categories: descriptive and predictive. Descriptive mining tasks characterize the general properties of the data in
the database, whereas predictive mining tasks perform inference on the current data in order to make predictions.
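The contrast between the two categories can be illustrated with a minimal sketch. The toy data, the function names `describe` and `fit_line`, and the choice of a least-squares line as the predictive model are all assumptions made for illustration; the lecture does not prescribe a particular method.

```python
from statistics import mean

# Hypothetical toy data: (advertising spend, units sold) for four months.
records = [(1.0, 12.0), (2.0, 14.0), (3.0, 16.0), (4.0, 18.0)]


def describe(values):
    """Descriptive task: summarize general properties of the data."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def fit_line(points):
    """Predictive task: infer a least-squares line y = a + b*x from current data."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Descriptive: what does the existing data look like?
summary = describe([y for _, y in records])

# Predictive: what would we expect for an unseen spend of 5.0?
intercept, slope = fit_line(records)
prediction = intercept + slope * 5.0
```

The descriptive step only reports on data already in hand, while the predictive step uses the same data to estimate a value that has not been observed.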
Performance Issues
The performance issues in data mining include efficiency, scalability, and parallelization of data mining algorithms.
Data Preprocessing
Today’s real-world databases are highly susceptible to noisy, missing, and inconsistent data because of their
typically huge size, often several gigabytes or more. Real-world data tends to be dirty, incomplete, and inconsistent.
Data preprocessing improves the quality of the data, and thereby the accuracy and efficiency of the subsequent data
mining process. It is an important step in the knowledge discovery process, since quality decisions must be based on
quality data. Detecting data anomalies, rectifying them early, and reducing the data to be analyzed can lead to huge
payoffs for decision making.
There are a number of data preprocessing techniques. They are:
– Data Cleaning
– Data Integration
– Data Transformation
– Data Reduction
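The four techniques listed above can be sketched as a small pipeline. This is only an illustrative sketch under stated assumptions: the records, the field names, and the specific choices (mean imputation for cleaning, a lookup join for integration, min-max normalization for transformation, per-group aggregation for reduction) are hypothetical examples of each technique, not the only way to carry it out.

```python
from statistics import mean

# Hypothetical records from two sources, keyed by customer id.
sales = [
    {"id": 1, "age": 25, "spend": 100.0},
    {"id": 2, "age": None, "spend": 300.0},  # missing value
    {"id": 3, "age": 35, "spend": 200.0},
]
regions = {1: "north", 2: "south", 3: "north"}


def clean(rows):
    """Data cleaning: fill missing ages with the mean of the known ages."""
    fill = mean(r["age"] for r in rows if r["age"] is not None)
    return [{**r, "age": fill if r["age"] is None else r["age"]} for r in rows]


def integrate(rows, lookup):
    """Data integration: attach the region from a second source by id."""
    return [{**r, "region": lookup[r["id"]]} for r in rows]


def transform(rows):
    """Data transformation: min-max normalize spend into [0, 1].

    Assumes at least two distinct spend values (otherwise hi == lo).
    """
    lo = min(r["spend"] for r in rows)
    hi = max(r["spend"] for r in rows)
    return [{**r, "spend": (r["spend"] - lo) / (hi - lo)} for r in rows]


def reduce_by_region(rows):
    """Data reduction: aggregate to one mean-spend value per region."""
    groups = {}
    for r in rows:
        groups.setdefault(r["region"], []).append(r["spend"])
    return {region: mean(values) for region, values in groups.items()}


prepared = transform(integrate(clean(sales), regions))
reduced = reduce_by_region(prepared)
```

Here `prepared` holds one cleaned, integrated, and normalized record per customer, and `reduced` holds a smaller summary with one value per region.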
Data warehouses and their architectures vary depending upon the specifics of an organization’s situation. Three
common data warehouse architectures which are discussed in this section are:
Data warehouse design can be broadly classified into two categories:
(1) logical design and (2) physical design.
Logical Design
The logical design is more conceptual and abstract than the physical design. In the logical design, the emphasis is
on the logical relationships among the objects. One technique that can be used to model an organization’s logical
information requirements is entity-relationship modeling. Entity-relationship modeling involves identifying the
things of importance (entities), the properties of these things (attributes), and how they are related to one
another (relationships).
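The three ingredients of entity-relationship modeling can be written down as plain data structures. The sketch below is a hypothetical illustration: the `Entity` and `Relationship` classes, the `Customer` and `Order` examples, and the `1:N` cardinality label are assumptions, not part of the lecture.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A thing of importance, with the attributes that describe it."""
    name: str
    attributes: list = field(default_factory=list)


@dataclass
class Relationship:
    """How two entities are related to one another."""
    name: str
    source: Entity
    target: Entity
    cardinality: str  # e.g. "1:N" for one-to-many


def describe(rel):
    """Render a relationship as a short human-readable sentence."""
    return f"{rel.source.name} {rel.name} {rel.target.name} ({rel.cardinality})"


# Hypothetical model: one customer places many orders.
customer = Entity("Customer", ["customer_id", "name"])
order = Entity("Order", ["order_id", "order_date"])
places = Relationship("places", customer, order, "1:N")

print(describe(places))
```

Nothing here says how the data is stored; that is deliberately left to the physical design.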