Introduction To Data Mining
Ian H. Witten, Eibe Frank and Mark A. Hall, Data Mining: Practical
Machine Learning Tools and Techniques, Morgan Kaufmann
Publishers, Elsevier, 3rd Edition, 2011.
IEEE Transactions
ACM Transactions
Information Systems
Database Systems
Internet Technology
Data Mining or
OBJECTIVES
Evolution of Sciences
Over the last 50 years, most disciplines have grown a third, computational
branch (e.g., empirical, theoretical, and computational ecology, physics, or
linguistics).
The Internet and the computing Grid make all these archives universally
accessible.
Evolution of Database Technology
1960s: IMS (hierarchical database system by IBM), data stored in electronic mode, network DBMS
1970s:
1980s: RDBMS, application-oriented DBMS
1990s: data mining, data warehousing, multimedia databases, Web databases
2000s: Web technology, XML, data integration, social networks
DM Evolution
Petabytes
Exabytes
Zettabytes
PLATO
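To make the scale of these units concrete, here is a minimal sketch assuming decimal (SI) prefixes, where each step up multiplies by 1000 (1 PB = 10^15 bytes). Binary units such as the pebibyte use powers of 1024 instead; the function names are illustrative only.

```python
# Decimal (SI) byte units: each step up the list is a factor of 1000.
UNITS = ["bytes", "kilobytes", "megabytes", "gigabytes", "terabytes",
         "petabytes", "exabytes", "zettabytes"]

def unit_size(name: str) -> int:
    """Number of bytes in one unit, using SI powers of 1000."""
    return 1000 ** UNITS.index(name)

def in_unit(num_bytes: int, name: str) -> float:
    """Express a byte count in the given unit."""
    return num_bytes / unit_size(name)

if __name__ == "__main__":
    for name in ["petabytes", "exabytes", "zettabytes"]:
        print(f"1 {name[:-1]} = 10^{3 * UNITS.index(name)} bytes")
    # One zettabyte expressed in petabytes
    print(in_unit(unit_size("zettabytes"), "petabytes"))  # 1000000.0
```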
Alternative names
knowledge extraction
data/pattern analysis
data archeology
data dredging
information harvesting
Data mining: the core of the knowledge discovery process
Process steps, from raw data to knowledge:
Databases
Data Cleaning (remove noise and inconsistent data)
Data Integration (combine multiple data sources)
Data Warehouse
Selection (retrieve relevant data)
Transformation (summary, aggregation, etc.)
Task-relevant Data
Data Mining (intelligent methods applied to extract patterns)
Patterns
Pattern Evaluation (identify the truly interesting patterns representing knowledge)
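The steps above can be sketched as a chain of small functions. This is a minimal illustration, not a definitive implementation: the two "stores", their records and the field names are hypothetical, the mining step is a simple count of item pairs, and "interesting" is assumed to mean support (fraction of transactions) at or above a chosen threshold.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw data from two sources; None and blank names model noise.
store_a = [
    {"tid": 1, "year": 2014, "items": ["Milk", "Bread"]},
    {"tid": 2, "year": 2014, "items": ["milk", "bread", "butter"]},
    {"tid": 3, "year": 2014, "items": None},  # noisy record
]
store_b = [
    {"tid": 1, "year": 2014, "items": ["bread", "butter"]},
    {"tid": 2, "year": 2013, "items": ["milk", "eggs"]},
    {"tid": 3, "year": 2014, "items": ["milk", " ", "bread"]},
]

def clean(records):
    """Data cleaning: drop records without items and normalise item names."""
    out = []
    for r in records:
        if not r["items"]:
            continue
        items = {i.strip().lower() for i in r["items"] if i and i.strip()}
        if items:
            out.append({**r, "items": items})
    return out

def integrate(**sources):
    """Data integration: merge sources, tagging each record with its origin."""
    return [{**r, "source": name} for name, recs in sources.items() for r in recs]

def select(records, year):
    """Selection: keep only task-relevant records (here, a single year)."""
    return [r for r in records if r["year"] == year]

def transform(records):
    """Transformation: reduce each record to its set of items."""
    return [frozenset(r["items"]) for r in records]

def mine_pairs(transactions):
    """Data mining: count how many transactions contain each item pair."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    return counts

def evaluate(counts, n, min_support):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

if __name__ == "__main__":
    data = integrate(a=clean(store_a), b=clean(store_b))
    relevant = transform(select(data, 2014))
    print(evaluate(mine_pairs(relevant), len(relevant), min_support=0.5))
    # {('bread', 'milk'): 0.75, ('bread', 'butter'): 0.5}
```

Each function maps to one box of the process, so a step can be swapped out (for example, a different interestingness measure in `evaluate`) without touching the others.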
Explore these tools:
R
Python
WEKA
SPSS
Orange
Clementine
Decision Making (End User)
Data Presentation: Visualization Techniques (Business Analyst)
Data Mining: Information Discovery (Data Analyst)
Data Exploration: Statistical Summary, Querying, and Reporting (Data Analyst)
Data Preprocessing/Integration, Data Warehouses (DBA)
Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems (DBA)
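The Data Exploration layer (statistical summary, querying, and reporting) can be illustrated with Python's standard `statistics` module. This is a sketch under assumptions: the regional sales figures and the helper names `summarize`, `query` and `report` are hypothetical.

```python
import statistics

# Hypothetical sales figures per region (illustrative values only).
sales = {"north": [120, 135, 150, 110], "south": [90, 95, 100, 85]}

def summarize(values):
    """Statistical summary of one numeric attribute."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

def query(data, predicate):
    """Querying: return the regions whose summary satisfies a condition."""
    return [region for region, vals in data.items() if predicate(summarize(vals))]

def report(data):
    """Reporting: print one formatted line per region."""
    for region, vals in sorted(data.items()):
        s = summarize(vals)
        print(f"{region:<6} n={s['count']} mean={s['mean']:.1f} "
              f"median={s['median']:.1f} range=[{s['min']}, {s['max']}]")

if __name__ == "__main__":
    report(sales)
    print(query(sales, lambda s: s["mean"] > 100))  # ['north']
```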
Data Mining at the intersection of Machine Learning, Pattern Recognition, Statistics, Algorithms, Visualization, and Other Disciplines
High-dimensionality of data
General functionality
Data to be mined: relational, data warehouse, transactional, stream, object-oriented/relational, active, spatial, time-series, text, multimedia, heterogeneous, legacy, WWW
Knowledge to be mined: classification,
Techniques utilized
Applications adapted
Data Warehousing
Roll-up
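Roll-up aggregates data by climbing a concept hierarchy, for example from city to country. The sketch below assumes a hypothetical fact table of (city, quarter, sales) rows and a hypothetical city-to-country mapping, and re-aggregates the sales measure with SUM at the coarser level.

```python
from collections import defaultdict

# Hypothetical fact table: (city, quarter, sales amount).
facts = [
    ("Delhi", "Q1", 400), ("Mumbai", "Q1", 300),
    ("Toronto", "Q1", 250), ("Vancouver", "Q1", 150),
    ("Delhi", "Q2", 500), ("Toronto", "Q2", 200),
]

# Hypothetical concept hierarchy for the location dimension: city -> country.
city_to_country = {"Delhi": "India", "Mumbai": "India",
                   "Toronto": "Canada", "Vancouver": "Canada"}

def roll_up(rows, mapping):
    """Climb the location hierarchy (city -> country) and SUM the sales
    measure for each (country, quarter) cell."""
    totals = defaultdict(int)
    for city, quarter, amount in rows:
        totals[(mapping[city], quarter)] += amount
    return dict(totals)

if __name__ == "__main__":
    for (country, quarter), total in sorted(roll_up(facts, city_to_country).items()):
        print(country, quarter, total)
```

SUM is used because sales is additive; a measure such as an average would need counts carried along to roll up correctly.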
Multi-Tiered Architecture
Data Sources: Operational DBs, other sources
Extract, Transform, Load, Refresh (Monitor & Integrator, Metadata)
Data Storage: Data Warehouse, Data Marts
OLAP Server
The OLAP Server serves: Analysis, Query, Reports, Data mining
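The extract, transform, load and refresh tier can be sketched with Python's built-in `sqlite3` module, using an in-memory database as a stand-in for the data warehouse and a view as a departmental data mart. Everything here is an assumption for illustration: the two sources, their schemas and the `sales` table are hypothetical, and the monitor, metadata and OLAP server tiers are not modelled.

```python
import sqlite3

# Hypothetical operational sources with different schemas.
orders_db = [{"order_id": 1, "cust": "C1", "amount_inr": 500, "dept": "toys"},
             {"order_id": 2, "cust": "C2", "amount_inr": 1200, "dept": "books"}]
web_orders = [("W1", "c3", "700", "books")]  # other source: strings, lowercase ids

def extract():
    """Extract: pull raw rows from each source, tagged with their origin."""
    for r in orders_db:
        yield "orders_db", (str(r["order_id"]), r["cust"], r["amount_inr"], r["dept"])
    for row in web_orders:
        yield "web", row

def transform(source, row):
    """Transform: map every source onto one schema (id, customer, amount, dept)."""
    oid, cust, amount, dept = row
    return (f"{source}:{oid}", cust.upper(), float(amount), dept)

def load(conn, rows):
    """Load/refresh: upsert rows so that re-running the job is idempotent."""
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id TEXT PRIMARY KEY, customer TEXT, "
             "amount REAL, dept TEXT)")
load(conn, [transform(s, r) for s, r in extract()])

# Refresh: a new web order arrives and the job runs again.
web_orders.append(("W2", "c1", "300", "toys"))
load(conn, [transform(s, r) for s, r in extract()])

# Data mart: a departmental subset of the warehouse, defined as a view.
conn.execute("CREATE VIEW books_mart AS SELECT * FROM sales WHERE dept = 'books'")
print(conn.execute("SELECT COUNT(*) FROM sales").fetchone())          # (4,)
print(conn.execute("SELECT SUM(amount) FROM books_mart").fetchone())  # (1900.0,)
```

Prefixing each id with its source keeps keys from different systems from colliding, and the upsert lets a refresh re-read whole sources without creating duplicates.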
Publications
Tayal, D. K., Jain, A., Arora, S., Agarwal, S., Gupta, T. and Tyagi, N., Crime
Detection and Criminal Identification in India Using Data Mining Techniques,
Artificial Intelligence & Society (AIS), Springer, vol. 30, no. 1, pp. 117-127,
Feb 2015. [Indexed: Scopus, Google Scholar, EBSCO, ACM Digital Library,
DBLP]
Jain, A., Yadav, D. and Tayal, D. K., NER for Hindi Language Using Association
Rules, International Conference on Data Mining and Intelligent Computing
(ICDMIC 2014), IGDTUW Delhi, India, IEEE, 5th-6th Sept 2014. [Indexed: Scopus]