Data Mining: Concepts and Techniques - Chapter 1
Acknowledgements
Where to Find the Set of Slides?
Motivation: “Necessity is the Mother of Invention”
Evolution of Database Technology
(See Fig. 1.1)
1960s:
Data collection, database creation, IMS and network DBMS
1970s:
Relational data model, relational DBMS implementation
1980s:
RDBMS, advanced data models (extended-relational, OO,
deductive, etc.) and application-oriented DBMS (spatial, scientific,
engineering, etc.)
1990s—2000s:
Data mining and data warehousing, multimedia databases, and
Web databases
What Is Data Mining?
Market Analysis and Management (1)
Market Analysis and Management (2)
Customer profiling
data mining can tell you what types of customers buy what
products (clustering or classification)
Identifying customer requirements
identifying the best products for different customers
use prediction to find what factors will attract new customers
Provides summary information
various multidimensional summary reports
statistical summary information (data central tendency and
variation)
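As a minimal sketch of the statistical summary information mentioned above, the following computes central tendency (mean, median) and variation (standard deviation) for a list of purchase amounts. The data and the function name `summarize` are hypothetical and chosen only for illustration.

```python
import statistics


def summarize(values):
    """Return basic central-tendency and variation measures for a numeric list."""
    return {
        "mean": statistics.mean(values),      # central tendency
        "median": statistics.median(values),  # central tendency, robust to extremes
        "stdev": statistics.pstdev(values),   # variation (population standard deviation)
    }


# Hypothetical purchase amounts, for illustration only
purchases = [10, 20, 30, 40, 50]
summary = summarize(purchases)
print(summary)
```

The population standard deviation is used here on the assumption that the list is the full data set; `statistics.stdev` would be the choice for a sample.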
Corporate Analysis and Risk Management
Fraud Detection and Management (1)
Applications
widely used in health care, retail, credit card services,
telecommunications (phone card fraud), etc.
Approach
use historical data to build models of fraudulent behavior and
use data mining to help identify similar instances
Examples
auto insurance: detect a group of people who stage accidents to
collect on insurance
money laundering: detect suspicious money transactions (US
Treasury's Financial Crimes Enforcement Network)
medical insurance: detect professional patients, rings of doctors,
and rings of references
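The approach described above (build a model from historical data, then flag similar new instances) can be illustrated with a very small sketch. It uses a nearest-centroid rule, which is a stand-in chosen for brevity and not a method named in the slides. The transaction fields, values, and the function `looks_fraudulent` are all hypothetical.

```python
import math

# Hypothetical historical transactions: (amount, calls_per_day, is_fraud)
history = [
    (20.0, 3, False), (25.0, 4, False), (22.0, 2, False),
    (900.0, 40, True), (950.0, 45, True),
]


def centroid(rows):
    """Average the two numeric features over a set of labeled rows."""
    n = len(rows)
    return tuple(sum(r[i] for r in rows) / n for i in range(2))


# "Model" of each behavior: the mean feature vector of its historical cases
fraud_center = centroid([r for r in history if r[2]])
normal_center = centroid([r for r in history if not r[2]])


def looks_fraudulent(tx):
    """Flag a transaction that lies closer to the fraud centroid than to the normal one."""
    return math.dist(tx, fraud_center) < math.dist(tx, normal_center)


print(looks_fraudulent((880.0, 38)))
print(looks_fraudulent((30.0, 5)))
```

A real system would scale the features and use far more data; the sketch only shows the shape of the idea.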
Fraud Detection and Management (2)
Other Applications
Sports
IBM Advanced Scout analyzed NBA game statistics (shots
blocked, assists, and fouls) to gain competitive advantage for
New York Knicks and Miami Heat
Astronomy
JPL and the Palomar Observatory discovered 22 quasars with
the help of data mining
Internet Web Surf-Aid
IBM Surf-Aid applies data mining algorithms to Web access
logs for market-related pages to discover customer preferences
and behavior, analyze the effectiveness of Web marketing,
improve Web site organization, etc.
Data Mining: A KDD Process
Data mining: the core of the knowledge discovery process.
[Figure: KDD process. Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Data Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation]
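The stages named in the KDD process (data cleaning, data integration, data selection, data mining, pattern evaluation) can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the purchase records, the function names, and the "pattern" (a simple item count with a minimum threshold) are all hypothetical.

```python
from collections import Counter

# Two hypothetical sources of (customer, item) purchase records
source_a = [("ann", "milk"), ("bob", "bread"), ("ann", None)]
source_b = [("cat", "milk"), ("bob", "milk"), ("ann", "bread")]


def clean(records):
    """Data cleaning: drop records with missing fields."""
    return [r for r in records if all(field is not None for field in r)]


def integrate(*sources):
    """Data integration: merge several sources into one collection."""
    merged = []
    for source in sources:
        merged.extend(source)
    return merged


def select(records, items):
    """Data selection: keep only the task-relevant records."""
    return [r for r in records if r[1] in items]


def mine(records):
    """Data mining: count how often each item is bought (a trivial pattern)."""
    return Counter(item for _, item in records)


def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns that meet a minimum count."""
    return {item: n for item, n in patterns.items() if n >= min_count}


warehouse = integrate(clean(source_a), clean(source_b))
relevant = select(warehouse, {"milk", "bread"})
patterns = evaluate(mine(relevant), min_count=3)
print(patterns)
```

Each function maps to one stage, so a stage can be swapped out (for example, a different mining step) without touching the others.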
Steps of a KDD Process
[Figure: layers shown include Data Exploration (Statistical Analysis, Querying and Reporting), Pattern Evaluation, Data Warehouse, and Databases]
Data Mining: On What Kind of Data?
Relational databases
Data warehouses
Transactional databases
Advanced DB and information repositories
Object-oriented and object-relational databases
Spatial databases
Time-series data and temporal data
Text databases and multimedia databases
Heterogeneous and legacy databases
WWW
Data Mining Functionalities (1)
Outlier analysis
Outlier: a data object that does not comply with the general behavior
of the data
It can be treated as noise or as an exception, but it is quite useful in
fraud detection and rare-event analysis
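One simple way to make "does not comply with the general behavior of the data" concrete is a z-score test: flag values that lie more than a chosen number of standard deviations from the mean. This is a sketch of one common technique, not necessarily the one the slides have in mind. The data, the threshold of 2.0, and the function name `find_outliers` are assumptions made for illustration.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical daily transaction amounts with one unusual value
amounts = [10, 12, 11, 13, 12, 11, 95]
outliers = find_outliers(amounts)
print(outliers)
```

Because a single extreme value also inflates the mean and standard deviation, z-scores can miss outliers in small samples; median-based measures are a common alternative.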
[Figure: Data Mining at the center, surrounded by related fields including Machine Learning, Visualization, Information Science, and Other Disciplines]
Data Mining: Classification Schemes
General functionality
Descriptive data mining
Predictive data mining
Different views, different classifications
Kinds of databases to be mined
Kinds of knowledge to be discovered
Kinds of techniques utilized
Kinds of applications adapted
A Multi-Dimensional View of Data Mining Classification
Databases to be mined
Relational, transactional, object-oriented, object-relational, active,
Techniques utilized
Database-oriented, data warehouse (OLAP), machine learning,
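[Figure: layered architecture. Layer 1 (Data Repository): Databases and Data Warehouse, with Data cleaning, Data integration, and Filtering. Layer 2 (MDDB): MDDB and Meta Data, connected to Layer 1 through Filtering & Integration and a Database API.]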
Major Issues in Data Mining (1)
A Brief History of Data Mining Society
1989 IJCAI Workshop on Knowledge Discovery in Databases
(Piatetsky-Shapiro)
Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991)
1991-1994 Workshops on Knowledge Discovery in Databases
Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-
Shapiro, P. Smyth, and R. Uthurusamy, 1996)
1995-1998 International Conferences on Knowledge Discovery in
Databases and Data Mining (KDD’95-98)
Journal of Data Mining and Knowledge Discovery (1997)
1998 ACM SIGKDD, SIGKDD’1999-2001 conferences, and SIGKDD
Explorations
More conferences on data mining
PAKDD, PKDD, SIAM-Data Mining, (IEEE) ICDM, etc.
Where to Find References?
http://www.cs.sfu.ca/~han