Introduction To Data Mining: Dr. Hany Saleeb



Why Data Mining? Potential Applications


Direct Marketing
Identify which prospects should be included in a mailing list

Market segmentation
Identify common characteristics of customers who buy the same products

Market Basket Analysis


Identify what products are likely to be bought together

Insurance Claims Analysis


Discover patterns of fraudulent transactions
Compare current transactions against those patterns

What Is Data Mining?


A combination of AI and statistical analysis used to discover information that is hidden in the data:
Associations (e.g. linking the purchase of pizza with beer)
Sequences (e.g. tying events together: marriage and the purchase of furniture)
Classifications (e.g. recognizing patterns such as the attributes of employees who are most likely to quit)
Forecasting (e.g. predicting the buying habits of customers based on past patterns)
Expert systems or small ML/statistical programs

What can data mining do?


Classification
Classify credit applicants as low, medium, or high risk
Classify insurance claims as normal or suspicious

Estimation
Estimate the probability of a direct mailing response
Estimate the lifetime value of a customer

Prediction
Predict which customers will leave within six months
Predict the size of the balance that will be transferred by a credit card prospect
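
As a minimal sketch of estimation followed by classification, the example below estimates the probability of a direct mailing response per customer segment from past mailings, then maps each estimate to a low/medium/high band. The segment names, the sample history, the smoothing constant `alpha`, and the band thresholds are all illustrative assumptions, not values from the slides.

```python
from collections import defaultdict

def estimate_response_rates(history, alpha=1.0):
    """Estimate P(response | segment) from (segment, responded) pairs.

    Laplace smoothing (alpha, an assumed value) keeps small segments
    from being estimated as exactly 0 or 1.
    """
    sent = defaultdict(int)
    responded = defaultdict(int)
    for segment, did_respond in history:
        sent[segment] += 1
        responded[segment] += int(did_respond)
    return {s: (responded[s] + alpha) / (sent[s] + 2 * alpha) for s in sent}

def response_band(rate, low=0.2, high=0.5):
    """Classify an estimated rate into a band (thresholds are assumptions)."""
    if rate < low:
        return "low"
    if rate < high:
        return "medium"
    return "high"

# Hypothetical mailing history: (segment, did the prospect respond?)
history = [
    ("student", True), ("student", False), ("student", False),
    ("retired", True), ("retired", True), ("retired", True), ("retired", False),
]
rates = estimate_response_rates(history)
bands = {segment: response_band(rate) for segment, rate in rates.items()}
```

A real system would typically use many attributes per customer and a fitted model rather than per-segment frequencies; the sketch only shows how an estimate (a number) differs from a classification (a label).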

What can data mining do? (cont'd)


Association
Find out which items customers are likely to buy together
Find out which books to recommend to Amazon.com users

Clustering
Difference from classification: classes are unknown!
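
To illustrate grouping records when the classes are not known in advance, here is a small k-means sketch in plain Python. k-means is one common clustering technique chosen for illustration; the slides do not name a specific algorithm. The customer points (income, spending) and the choice of `k = 2` are hypothetical.

```python
import math

def kmeans(points, k, iterations=20):
    """Very small k-means: returns (centroids, assignment).

    Initial centroids are simply the first k points, which keeps the
    sketch deterministic; real implementations pick them more carefully.
    """
    centroids = [points[i] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignment

# Hypothetical customers: (income in thousands, monthly spending in hundreds)
customers = [(20, 5), (80, 40), (22, 6), (85, 42), (19, 4), (78, 38)]
centroids, assignment = kmeans(customers, k=2)
```

No labels are supplied anywhere: the two groups emerge only from distances between the points, which is the contrast with classification noted above.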

Market Analysis and Management


Where are the data sources for analysis?
Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies

Target marketing
Find clusters of model customers who share the same characteristics: interest, income level, spending habits, etc.

Determine customer purchasing patterns over time
Conversion of a single to a joint bank account: marriage, etc.

Cross-market analysis
Associations/correlations between product sales
Prediction based on the association information
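
Associations between product sales are commonly quantified with support (how often two items appear together) and confidence (how often the second item appears given the first). The sketch below computes both for item pairs; the baskets and the `min_support` threshold are hypothetical, and only pairs are considered, not larger item sets.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.5):
    """Return {(a, b): (support, confidence)} for rules a -> b.

    support    = fraction of baskets containing both a and b
    confidence = baskets containing both / baskets containing a
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))          # ignore duplicates in a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            rules[(a, b)] = (support, count / item_counts[a])
            rules[(b, a)] = (support, count / item_counts[b])
    return rules

# Hypothetical transactions
baskets = [
    ["pizza", "beer"],
    ["pizza", "beer", "chips"],
    ["pizza", "salad"],
    ["beer", "chips"],
]
rules = pair_rules(baskets)
```

In this toy data every basket with chips also contains beer, so the rule chips -> beer has confidence 1.0, while beer -> chips is weaker. Such a rule could then drive a prediction or a cross-selling offer.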

Data Mining: Confluence of Multiple Disciplines


[Figure: data mining at the center, drawing on database technology, statistics, machine learning, visualization, information science, and other disciplines]

Data Mining: On What Kind of Data?


Relational databases
Data warehouses
Transactional databases
Advanced DB and information repositories
Object-oriented and object-relational databases
Spatial databases
Time-series data and temporal data
Text databases and multimedia databases
Heterogeneous and legacy databases
WWW

Data Mining Process


[Figure: the data mining process, with the stages understanding of business, problem identification, collecting relevant data, model building, learning, business strategy and evaluation, and action]

Requirements/challenges in Data Mining


User interface
Mining methodology
Performance
Data source
Social and security

Requirements/challenges in Data Mining (2)


User interface
- Data Visualization
Understandability and interpretation of results
Information representation and rendering
Screen real estate

- Interactivity
Manipulation of mined knowledge
Focus and refine mining tasks
Focus and refine mining results

Requirements/challenges in Data Mining (3)


Mining Methodology
Mining different kinds of knowledge in databases
Interactive mining of knowledge at multiple levels of abstraction
Incorporation of background knowledge
Query languages
Expression and visualization of results
Handling noise and incomplete data
Pattern evaluation
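
One simple way to handle incomplete data is to fill missing values with a robust summary of the observed ones. The sketch below uses median imputation, which is only one of several possible strategies and is not prescribed by the slides; the `age` column and the sample rows are hypothetical.

```python
from statistics import median

def impute_median(rows, column):
    """Return new rows with missing values in `column` replaced by the median.

    A value counts as missing when the key is absent or set to None.
    The input rows are left unchanged.
    """
    observed = [r[column] for r in rows if r.get(column) is not None]
    fill = median(observed)
    return [
        {**r, column: fill if r.get(column) is None else r[column]}
        for r in rows
    ]

# Hypothetical customer records with one missing age
rows = [{"age": 30}, {"age": None}, {"age": 50}, {"age": 40}]
cleaned = impute_median(rows, "age")
```

The median is less sensitive to noisy outliers than the mean, which is why it is used here; whether imputation is appropriate at all depends on why the values are missing.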

Requirements/challenges in Data Mining (4)


Performance
Efficiency and scalability of data mining algorithms
Linear algorithms needed
Parallel and distributed methods
Incremental methods
Divide and conquer?
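
As an illustration of an incremental, linear-time method, the sketch below maintains a running mean and variance with Welford's online algorithm: each new value updates the summary in constant time and memory, so the data never has to be rescanned. The class name and the sample values are illustrative assumptions.

```python
class RunningStats:
    """Running mean and sample variance (Welford's online algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        """Fold one new observation into the summary in O(1)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; defined as 0.0 for fewer than two values."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

# Hypothetical stream of transaction amounts arriving one at a time
stats = RunningStats()
for amount in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(amount)
```

Because the summary is just three numbers, summaries computed on separate partitions can also be combined, which is one reason such methods fit parallel and distributed settings.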

Requirements/challenges in Data Mining (5)


Data Source
Diversity of data types
Handling complex types of data
Mining information from heterogeneous databases or information repositories
Can we expect a DM algorithm to do well on all types of data?

Data glut
Are we collecting the right data for the right answer?
Distinguish between important and unimportant data

Requirements/challenges in Data Mining (6)


Social and Security
- Social Impact
Private and sensitive data is gathered and mined without individuals' knowledge and/or consent
Appropriate use and distribution of discovered knowledge

- Regulations
Need for privacy and DM policies

Data Mining Tools

Summary
Knowing one's business is critical, and technologies are coming together to support data mining. Data mining is both the process and the result of knowledge production, knowledge discovery, and knowledge management.
