cst466.pptx


Data mining and warehousing

1. Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”

2. M Sudeep Elayidom, “Data Mining and Warehousing”

3. Dunham M H, “Data Mining: Introductory and Advanced Topics”


• Data warehouse: Differences between Operational Database
Systems and Data Warehouses, Multidimensional data model:
Warehouse schema, OLAP Operations, Data Warehouse
Architecture, Data Warehousing to Data Mining, Data Mining
Concepts and Applications, Knowledge Discovery in Databases
vs. Data Mining, Architecture of a typical data mining system,
Data Mining Functionalities, Data Mining Issues.
Definitions
• Data mining deals with the kind of patterns that can be
mined.
• Data mining (sometimes called data or knowledge discovery)
is the process of analyzing data from different perspectives
and summarizing it into useful information.
• Data mining is the process of extracting hidden patterns from
data.
• Data mining refers to extracting or mining knowledge from
large amounts of data.
Terms with a similar or slightly different meaning to data mining
• Knowledge mining from data
• Knowledge extraction
• Data/pattern analysis
• Data archaeology
• Data dredging
• Knowledge Discovery from Data (KDD)
Data, Information, and Knowledge

• Data are any facts, numbers, or text that can be processed by a computer.

• The patterns, associations, or relationships among all this data can provide information.

• Information can be converted into knowledge about historical patterns and future trends.
Data mining applications
• Classification
• Prediction
• Market basket analysis (association rule mining)
• Clustering
• Business intelligence
• Web mining
• Fraud detection
• Production control and science exploration.
Data Mining Stages
Steps of the Knowledge Discovery from Data (KDD) process
(the Cross-Industry Standard Process for Data Mining, CRISP-DM, is a separate six-phase model)
• Data cleaning (to remove noise and inconsistent data)

• Data integration (where multiple data sources may be combined)

• Data selection (where data relevant to the analysis task are retrieved from the database)

• Data transformation (where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations, for instance)

• Data mining (an essential process where intelligent methods are applied in order to extract data patterns)

• Pattern evaluation (to identify the truly interesting patterns representing knowledge based on some interestingness measures)

• Knowledge presentation (where visualization and knowledge representation techniques are used to present the mined knowledge to the user)
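
The stages listed above can be sketched as a chain of small functions. This is only an illustrative sketch: the two store data sets, the field names, and the 0.5 support threshold are assumptions made for the example, and counting co-occurring item pairs stands in for whatever mining method a real project would use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records from two sources (store A and store B).
store_a = [
    {"id": 1, "items": "bread, milk", "region": "north"},
    {"id": 2, "items": "bread, butter, milk", "region": "north"},
    {"id": 3, "items": None, "region": "north"},  # missing value
]
store_b = [
    {"id": 4, "items": "Milk, Bread", "region": "south"},
    {"id": 5, "items": "butter", "region": "south"},
    {"id": 2, "items": "bread, butter, milk", "region": "north"},  # duplicate id
]

def clean(records):
    """Data cleaning: drop records whose item field is missing or empty."""
    return [r for r in records if r["items"]]

def integrate(*sources):
    """Data integration: combine sources, keeping one record per id."""
    merged = {}
    for source in sources:
        for r in source:
            merged.setdefault(r["id"], r)
    return list(merged.values())

def select(records, region=None):
    """Data selection: keep only the records relevant to the task."""
    return [r for r in records if region is None or r["region"] == region]

def transform(records):
    """Data transformation: turn each record into a normalized item set."""
    return [
        frozenset(item.strip().lower() for item in r["items"].split(","))
        for r in records
    ]

def mine(transactions):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    return counts

def evaluate(pair_counts, n_transactions, min_support=0.5):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    supports = {p: c / n_transactions for p, c in pair_counts.items()}
    return {p: s for p, s in supports.items() if s >= min_support}

def present(patterns):
    """Knowledge presentation: format the patterns as readable lines."""
    return [f"{a} + {b}: support {s:.2f}" for (a, b), s in sorted(patterns.items())]

records = select(integrate(clean(store_a), clean(store_b)))
transactions = transform(records)
patterns = evaluate(mine(transactions), len(transactions))
report = present(patterns)
print("\n".join(report))
```

With this toy data, four transactions survive cleaning and integration, and only the bread and milk pair reaches the support threshold.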
Data Mining—On What Kind of Data?
• Relational Databases

• Data Warehouses

• Transactional Databases

• Advanced Data and Information Systems and Advanced Applications

• Spatial Databases and Spatiotemporal Databases

• Specific applications include temporal databases, sequence databases, time-series databases, and multimedia databases.
Data Mining: Confluence of Multiple Disciplines

[Figure: Data Mining at the center, surrounded by the contributing disciplines: Machine Learning, Pattern Recognition, Statistics, Applications, Visualization, Algorithms, and Database Technology]
Data mining models/Data Mining
Functionalities
Descriptive mining
• It determines what happened in the past by analyzing stored data.
• Descriptive analysis is used to mine data and provide the latest information
on past or recent events.
• It describes the characteristics of the data in a target data set.
• Examples: association analysis, cluster analysis, etc.
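
As one illustration of descriptive mining, the sketch below groups customers by monthly spend using a minimal one-dimensional k-means (Lloyd's algorithm). The spend values and the starting centroids are made up for the example, and k-means is just one of many clustering methods; the slides do not prescribe a particular one.

```python
def k_means_1d(values, centroids, max_iter=100):
    """Lloyd's algorithm on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical monthly spend per customer.
spend = [10, 12, 11, 50, 52, 49]
centroids, clusters = k_means_1d(spend, centroids=[10, 50])
print(centroids, clusters)
```

The result describes the existing data (a low-spend group and a high-spend group); it makes no statement about future customers.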

Predictive mining

• It determines what can happen in the future with the help of past data analysis.
• Predictive analysis answers queries about the future.
• It carries out induction over current and past data so that predictions can be made.
• Examples: classification, regression, etc.

Descriptive mining discovers interesting patterns or associations in the data, whereas predictive mining predicts and classifies behavior using a model built on current and past data.
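
A minimal sketch of predictive mining by regression: fit a straight line to past values with ordinary least squares and use it to forecast the next one. The monthly sales figures are invented for the example, and a straight-line model is an assumption that real data may not satisfy.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical past data: sales for months 1 to 4.
months = [1, 2, 3, 4]
sales = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_line(months, sales)
forecast = slope * 5 + intercept  # prediction for month 5
print(slope, intercept, forecast)
```

Here the model is induced from current and past data and then applied to a value that has not been observed yet, which is what separates it from the descriptive case.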
Data Warehouse
• Data warehousing provides architectures and tools for business executives
to systematically organize, understand, and use their data to make
strategic decisions.
A data warehouse is
• subject-oriented,
• integrated,
• time-variant, and
• non-volatile.
OLTP-OLAP Comparison
Basic analytical operations of OLAP

Four types of analytical operations in OLAP


• Roll-up
• Drill-down
• Slice and dice
• Pivot (rotate)
Online Analytical Processing
Roll-up
Roll-up, also known as "consolidation" or "aggregation", summarizes data by climbing up a concept hierarchy or by removing a dimension.
Drill-down
Drill-down is the reverse of roll-up: data is fragmented into smaller, more detailed parts.
Slice
A single value is selected on one dimension, and a new sub-cube is created.
Dice
Dice differs from slice in that selections are made on two or more dimensions to create the sub-cube.
Pivot
Pivot rotates the data axes to provide an alternative presentation of the data.
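
The OLAP operations above can be tried on a tiny in-memory cube. The sketch below is an assumption-laden illustration: the cube is a plain dictionary keyed by (city, quarter, product), the sales figures are invented, and the function names are chosen for the example rather than taken from any OLAP product.

```python
from collections import defaultdict

# Hypothetical sales cube: (city, quarter, product) -> units sold.
DIMS = ("city", "quarter", "product")
cube = {
    ("Kochi", "Q1", "phone"): 5,
    ("Kochi", "Q1", "laptop"): 2,
    ("Kochi", "Q2", "phone"): 7,
    ("Delhi", "Q1", "phone"): 4,
    ("Delhi", "Q2", "laptop"): 3,
}

def roll_up(cube, keep):
    """Aggregate away every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cube.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)

def slice_(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}

def dice(cube, conditions):
    """Dice: restrict two or more dimensions to sets of allowed values."""
    return {
        k: v
        for k, v in cube.items()
        if all(k[DIMS.index(d)] in allowed for d, allowed in conditions.items())
    }

def pivot(table):
    """Pivot: swap the two axes of a 2-D table keyed by (row, column)."""
    return {(c, r): v for (r, c), v in table.items()}

by_city = roll_up(cube, ["city"])                      # roll-up to city level
by_city_quarter = roll_up(cube, ["city", "quarter"])   # drill-down from by_city
q1_only = slice_(cube, "quarter", "Q1")
kochi_phones = dice(cube, {"city": {"Kochi"}, "product": {"phone"}})
rotated = pivot(by_city_quarter)
```

Drill-down has no function of its own here: moving from `by_city` to `by_city_quarter` is a drill-down because it re-aggregates the base cube at a finer level of detail.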
Major Issues in Data Mining
• Mining Methodology
– Mining various and new kinds of knowledge
– Mining knowledge in multi-dimensional space
– Handling noise, uncertainty, and incompleteness of data
– Pattern evaluation and pattern- or constraint-guided mining
• User Interaction
– Interactive mining
– Incorporation of background knowledge
– Presentation and visualization of data mining results
• Efficiency and Scalability
– Efficiency and scalability of data mining algorithms
– Parallel, distributed, stream, and incremental mining methods
• Diversity of data types
– Handling complex types of data
– Mining dynamic, networked, and global data repositories
• Data mining and society
– Social impacts of data mining
– Privacy-preserving data mining
Major Issues in Data Mining
• Human Interaction
• Overfitting
• Outliers
• Interpretation
• Visualization
• Large Datasets
• High Dimensionality
• Multimedia Data
• Missing Data
• Irrelevant Data
• Noisy Data
• Changing Data
• Integration
• Application
