Lecture 6 Compress


What is Data Mining?

Data mining is a process used by organizations to extract specific
data from huge databases to solve business problems. It primarily turns
raw data into useful information.
Why Data Mining?

The explosive growth of data: from terabytes (1000^4 bytes) to yottabytes (1000^8 bytes)


◦ Data collection and data availability
◦ Automated data collection tools, database systems, web
◦ Major sources of abundant data
◦ Business: Web, e-commerce, transactions, stocks, …
◦ Science: bioinformatics, scientific process, medical research …
◦ Society and everyone: news, digital cameras, …

Data rich but information poor!


◦ What do those data mean?
◦ How to analyze data?

Data mining — Automated analysis of massive data sets


What Is Data Mining?
Data mining (knowledge discovery from data)
◦ Extraction of interesting patterns or knowledge from huge
amounts of data

Alternative names
◦ Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data dredging,
information harvesting, business intelligence, etc.

Potential Applications

Data analysis and decision support


◦ Market analysis and management
◦ Target marketing, customer relationship management (CRM),
market basket analysis, cross selling, market segmentation
◦ Risk analysis and management
◦ Forecasting, customer retention, quality control, competitive analysis
◦ Fraud detection and detection of unusual patterns (outliers)

Other Applications
◦ Text mining (newsgroups, email, documents) and Web mining
◦ Stream data mining
◦ Bioinformatics and bio-data analysis
Ex.: Market Analysis and Management

Where does the data come from? Credit card transactions, loyalty cards,
discount coupons, customer complaint calls, surveys …
Target marketing
◦ Find clusters of “model” customers who share the same characteristics: interests,
income level, spending habits, etc.
◦ E.g. Most customers with an income level of 60k – 80k and food expenses of $600 - $800 a month
live in the same area
◦ Determine customer purchasing patterns over time
◦ E.g. Customers who are between 20 and 29 years old, with an income of 20k – 29k, usually buy
this type of CD player
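
One way to find such clusters of similar customers is k-means clustering. The slides do not name a method, so the following is only a minimal sketch under that assumption. The customer tuples and the starting centroids are made up for illustration, and the two features are left unscaled, which a real analysis would normally correct.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of the cluster members; keep the old one if a cluster is empty
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers: (annual income in $k, monthly food expenses in $)
customers = [(62, 610), (75, 780), (68, 700), (24, 210), (28, 260), (21, 190)]

# Start from two hand-picked customers so the run is deterministic
centroids, clusters = kmeans(customers, centroids=[customers[0], customers[3]])

for centroid, members in zip(centroids, clusters):
    print(f"centroid {centroid}: {members}")
```

Each resulting cluster can then be described by its centroid, e.g. an average income and food budget for that group of "model" customers.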

Cross-market analysis: find associations/correlations between product sales,
and predict based on such associations
◦ E.g. Customers who buy computer A usually buy software B
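
A rule such as "customers who buy computer A usually buy software B" is commonly quantified with support and confidence. The sketch below computes both for a handful of invented transactions. The function names and the data are assumptions made for illustration, not part of the lecture.

```python
def count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t)

def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    return count(transactions, items) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / count(transactions, antecedent)

# Hypothetical shopping baskets
transactions = [
    {"computer A", "software B"},
    {"computer A", "software B", "printer"},
    {"computer A"},
    {"printer"},
    {"computer A", "software B"},
]

rule_support = support(transactions, {"computer A", "software B"})           # 3 of 5 baskets
rule_confidence = confidence(transactions, {"computer A"}, {"software B"})   # 3 of 4 baskets
print(rule_support, rule_confidence)
```

A high confidence suggests the association is worth acting on (e.g. cross selling), while the support indicates how often the rule applies at all.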

Ex.: Market Analysis and Management (2)

Customer requirement analysis


◦ Identify the best products for different customers
◦ Predict what factors will attract new customers

Provision of summary information


◦ Multidimensional summary reports
◦ E.g. Summarize all transactions of the first quarter from three different branches
◦ E.g. Summarize all transactions of last year from a particular branch
◦ E.g. Summarize all transactions of a particular product
◦ Statistical summary information
◦ E.g. What is the average age for customers who buy product A?
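
The summary reports above amount to grouping transactions by one or more dimensions and aggregating. A minimal sketch, assuming a made-up record layout (`branch`, `quarter`, `product`, `amount`, `customer_age` are hypothetical field names):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical transaction records
transactions = [
    {"branch": "North", "quarter": 1, "product": "A", "amount": 120.0, "customer_age": 34},
    {"branch": "South", "quarter": 1, "product": "B", "amount": 80.0, "customer_age": 51},
    {"branch": "North", "quarter": 2, "product": "A", "amount": 200.0, "customer_age": 28},
    {"branch": "East", "quarter": 1, "product": "A", "amount": 50.0, "customer_age": 43},
]

def total_by(records, *dimensions):
    """Sum `amount` for each combination of the given dimensions (a simple roll-up)."""
    totals = defaultdict(float)
    for r in records:
        totals[tuple(r[d] for d in dimensions)] += r["amount"]
    return dict(totals)

# Multidimensional summaries: first-quarter totals per branch, and totals per product
q1_by_branch = total_by([t for t in transactions if t["quarter"] == 1], "branch")
by_product = total_by(transactions, "product")

# Statistical summary: average age of customers who buy product A
avg_age_product_a = mean(t["customer_age"] for t in transactions if t["product"] == "A")

print(q1_by_branch, by_product, avg_age_product_a)
```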

Fraud detection
◦ Find outliers (unusual transactions)

Financial planning
◦ Summarize and compare resources and spending
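
For the fraud detection item, one simple way to flag unusual transactions is a robust outlier test. The slides do not specify a technique; the sketch below assumes the modified z-score based on the median absolute deviation (MAD), with the commonly used cutoff of 3.5. The amounts are invented.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return values whose modified z-score (based on the MAD) exceeds `threshold`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against, so nothing can be flagged
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical transaction amounts for one card
amounts = [42.0, 38.5, 51.0, 47.2, 40.0, 44.9, 980.0, 39.9]
suspicious = mad_outliers(amounts)
print(suspicious)
```

The median-based score is used here instead of the mean and standard deviation because a single extreme value inflates both of those, which can hide the very outlier being sought in a small sample.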

KDD Process: Several Key Steps

Learning the application domain


◦ Relevant prior knowledge and goals of the application

Identifying a target data set: data selection


Data processing
◦ Data cleaning (remove noise and inconsistent data)
◦ Data integration (multiple data sources may be combined)
◦ Data selection (data relevant to the analysis task are retrieved from the database)
◦ Data transformation (data are transformed or consolidated into forms appropriate for mining)
(The four steps above make up data preprocessing)
◦ Data mining (an essential process where intelligent methods are applied to extract
data patterns)
◦ Pattern evaluation (identify the truly interesting patterns)
◦ Knowledge presentation (mined knowledge is presented to the user with
visualization or representation techniques)

Use of discovered knowledge
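
The steps above can be sketched as a chain of small functions, one per step. This is only an illustration under stated assumptions: the two data sources, the field names, and the choice of simple frequency counting as the "mining" method are all hypothetical.

```python
from collections import Counter

# Two hypothetical sources of purchase records; None marks a missing value
store_sales = [{"customer": "c1", "item": "CD player", "age": 24},
               {"customer": "c2", "item": "CD player", "age": None},
               {"customer": "c3", "item": "laptop", "age": 41}]
web_sales = [{"customer": "c4", "item": "CD player", "age": 27},
             {"customer": "c5", "item": "laptop", "age": 38},
             {"customer": "c6", "item": "CD player", "age": 22}]

def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records, fields):
    """Data selection: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Data transformation: replace the exact age with a decade bucket such as '20-29'."""
    out = []
    for r in records:
        low = r["age"] // 10 * 10
        out.append({"item": r["item"], "age_group": f"{low}-{low + 9}"})
    return out

def mine(records):
    """Data mining: count how often each (age_group, item) pair occurs."""
    return Counter((r["age_group"], r["item"]) for r in records)

def evaluate(patterns, min_count=2):
    """Pattern evaluation: keep only patterns seen at least `min_count` times."""
    return {p: n for p, n in patterns.items() if n >= min_count}

combined = integrate(clean(store_sales), clean(web_sales))
prepared = transform(select(combined, ["item", "age"]))
interesting = evaluate(mine(prepared))

# Knowledge presentation: report the surviving patterns to the user
for (age_group, item), n in interesting.items():
    print(f"{n} customers aged {age_group} bought a {item}")
```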

A typical DM System Architecture

Database, data warehouse, WWW or other information
repository (store data)
Database or data warehouse server (fetch and
combine data)
Knowledge base (turn data into meaningful groups
according to domain knowledge)
Data mining engine (perform mining tasks)
Pattern evaluation module (find interesting patterns)
User interface (interact with the user)
Confluence of Multiple Disciplines

[Figure: data mining shown at the confluence of database technology, statistics, machine learning, information science, visualization, and other disciplines]
• Not every “data mining system” performs true data mining
◦ Machine learning systems, statistical analysis (small amounts of data)
◦ Database systems (information retrieval, deductive querying, …)
