CSEE8
A Paper Presentation on
PRESENTED BY:
T. LAKSHMI KUMARI, G.M.R.I.T., RAJAM. E-MAIL: luck.lakshmi@gmail.com
M. MADHAVI, G.M.R.I.T., RAJAM. E-MAIL: madhavi_nvss@yahoo.co.in
Abstract:
Organisations today suffer from a malaise of data overflow. Developments in transaction
processing technology have given rise to a situation where the amount and rate of data capture
are very high, but the processing of this data into information that can be used for decision
making is not developing at the same pace. Data warehousing and data mining (of both data and
text) provide a technology that enables decision-makers in the corporate sector and government
to process this huge amount of data in a reasonable amount of time and to extract intelligence
and knowledge in near real time.
The data warehouse stores data in a format that facilitates access, but if the tools for
deriving information and knowledge and presenting them in a form useful for decision making are
not provided, the whole rationale for the warehouse disappears. Various technologies for
extracting new insight from the data warehouse have emerged, which we loosely classify as
"Data Mining Techniques".
Our paper focuses on the need for information repositories and knowledge discovery, and then
gives an overview of the much-hyped technologies of Data Warehousing and Data Mining.
Content Overview
Introduction
What is Data-Warehousing?
Warehousing Functions
Compendium
Bibliography
Introduction
Knowledge (no longer mere information) is not only power, but also a significant
competitive advantage.
Organizations have lately realized that just processing transactions and information faster
and more efficiently no longer provides them with a competitive advantage vis-à-vis their
competitors for achieving business excellence. Information technology (IT) tools that are
oriented towards knowledge processing can provide the edge that organizations need to survive
and thrive in the current era of fierce competition. Increasing competitive pressures and the
desire to leverage information technology have led many organizations to explore the benefits
of the emerging technologies of "Data Warehousing and Data Mining". What is needed today is not
just the latest information, updated to the nanosecond, but cross-functional information that
can support decision making as an "on-line" process.
[Figure: Transactions Processing, Information Processing, Management Information, Knowledge Processing]
What is Data-Warehousing?
A data warehouse attempts to figure out "what we need" before we know we need it.
What is it actually?
* Data is taken from various, perhaps incompatible, sources and stored in a uniform format.
* Several tools transform this data into meaningful business information for the purposes of
comparison, trend analysis and forecasting.
* Data in a warehouse is not updated or changed in any way; it is only loaded and accessed
later on.
In general a database is not a data warehouse unless it has the following two features:
It collects information from a number of different disparate sources and is the place
where this disparity is reconciled, and
Information Sources always include the core operational systems which form the
backbone of day-to-day activities. It is these systems which have traditionally provided
management information to support decision making.
Decision Support Tools are used to analyze the information stored in the warehouse,
typically to identify trends and new business opportunities.
The Data Warehouse itself is the bridge between the operational systems and the
decision support tools. It holds a copy of much of the operational system data in a logical
structure that is more conducive to analysis. The Data Warehouse, which is refreshed in
scheduled bursts from the operational systems and from relevant external data sources,
provides a single, consistent view of corporate data while leaving the operational systems
unaffected.
A front end for Decision Support System (DSS) for reporting and for structured and
unstructured analysis.
[Figure: sources (Legacy Database, Operational Database, External Data Source) -> Extract, Transform, Maintain -> Data Warehouse with Metadata -> Query and reporting tools, Multidimensional analysis tools, Other OLAP tools, Data mining tools]
Schematic view of the Data Warehouse Architecture.
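The extract, transform and load path in the architecture above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three source layouts, the `sales_fact` table and every field name are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source records: each source uses its own layout and formats.
LEGACY_ROWS = [("C001", "12/03/2002", "1,250.00")]            # dd/mm/yyyy, amount as text
OPERATIONAL_ROWS = [{"cust": "C002", "date": "2002-03-14", "amt": 310.5}]
EXTERNAL_ROWS = ["C003;20020315;99.9"]                        # semicolon-delimited text


def from_legacy(row):
    """Transform a legacy tuple into the uniform (customer, date, amount, source) format."""
    cust, date, amount = row
    day, month, year = date.split("/")
    return (cust, f"{year}-{month}-{day}", float(amount.replace(",", "")), "legacy")


def from_operational(row):
    """Transform an operational-system dict into the uniform format."""
    return (row["cust"], row["date"], float(row["amt"]), "operational")


def from_external(line):
    """Transform an external delimited line into the uniform format."""
    cust, date, amount = line.split(";")
    return (cust, f"{date[:4]}-{date[4:6]}-{date[6:]}", float(amount), "external")


def load_warehouse(conn):
    """Extract from all sources, transform, and append to the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(customer_id TEXT, sale_date TEXT, amount REAL, source TEXT)"
    )
    rows = (
        [from_legacy(r) for r in LEGACY_ROWS]
        + [from_operational(r) for r in OPERATIONAL_ROWS]
        + [from_external(r) for r in EXTERNAL_ROWS]
    )
    # Rows are only inserted, never updated: the warehouse is load-and-read.
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_warehouse(conn)

# A simple reporting query over the consolidated data.
summary = conn.execute(
    "SELECT source, SUM(amount) FROM sales_fact GROUP BY source ORDER BY source"
).fetchall()
dates = [r[0] for r in conn.execute("SELECT sale_date FROM sales_fact ORDER BY sale_date")]
```

Once all three sources share one date format and one numeric type, a single query can report across them, which is the "single, consistent view" the warehouse is meant to provide.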
Data Mining
Database mining, or data mining (DM), formally termed Knowledge Discovery in Databases (KDD),
is a process that aims to use existing data to discover new facts and to uncover relationships
previously unknown even to experts thoroughly familiar with the data. It is like extracting
precious metals (gold, say) or gems, hence the term "mining": it is based on the filtration and
assaying of a mountain of data "ore" in order to get nuggets of knowledge. The data mining
process is shown diagrammatically in the figure below.
[Figure: the data mining process: Data Sources -> Data Warehouse -> (select) Selected Data -> (transform) Transformed Data -> (mine) Extracted Information -> (assimilate) Assimilated Information]
Data selection: Data about specific items or categories of items, or from stores in a specific
region or area of the country, may be selected.
Data cleansing: The cleansing process may then correct invalid zip codes or eliminate records
with incorrect phone prefixes.
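The selection and cleansing steps described above might look roughly like the following Python sketch. The records, the field names, the five-digit zip rule and the list of valid phone prefixes are all assumptions made for illustration. Since the right correction for a bad zip code cannot be known here, the sketch marks it as missing instead of guessing a value.

```python
import re

# Hypothetical store records; field names and values are illustrative only.
RECORDS = [
    {"item": "tea", "region": "south", "zip": "53212", "phone": "040-555010"},
    {"item": "tea", "region": "south", "zip": "5321", "phone": "040-555011"},    # zip too short
    {"item": "tea", "region": "south", "zip": "53214", "phone": "999-555012"},   # unknown prefix
    {"item": "rice", "region": "north", "zip": "11001", "phone": "011-555013"},
]
VALID_PHONE_PREFIXES = {"040", "011"}   # assumed list of valid prefixes


def select(records, item=None, region=None):
    """Data selection: keep records for a given item and/or region."""
    return [
        r for r in records
        if (item is None or r["item"] == item)
        and (region is None or r["region"] == region)
    ]


def cleanse(records):
    """Data cleansing: drop bad phone prefixes, flag malformed zip codes."""
    cleaned = []
    for rec in records:
        prefix = rec["phone"].split("-")[0]
        if prefix not in VALID_PHONE_PREFIXES:
            continue                      # eliminate the record
        zip_code = rec["zip"] if re.fullmatch(r"\d{5}", rec["zip"]) else None
        cleaned.append({**rec, "zip": zip_code})
    return cleaned


selected = select(RECORDS, item="tea", region="south")
cleaned = cleanse(selected)
```

Here `selected` holds the three tea records from the south region, and `cleaned` keeps two of them, one with its zip code flagged as missing.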
Prediction: Data mining can show how certain attributes within the data will behave in the
future.
Identification: Data patterns can be used to identify the existence of an item, an event, or an
activity.
Classification: Data mining can partition the data so that different classes or categories can
be identified based on combinations of parameters.
Optimization: One eventual goal of data mining may be to optimize the use of limited
resources such as time, space, money, or materials and to maximize output variables such as
sales or profits under a given set of constraints.
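Two of the goals listed above, prediction and classification, can be illustrated with very small Python sketches. These are not the methods of any particular data mining tool: prediction is shown as a least-squares linear trend and classification as a nearest-centroid rule, both chosen here only for brevity, and the sales figures and customer features are invented.

```python
def fit_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_next(values):
    """Prediction: extrapolate the fitted trend one period ahead."""
    slope, intercept = fit_trend(values)
    return slope * len(values) + intercept


def centroids(labelled):
    """Average the feature vectors of each class; labelled is [(features, label), ...]."""
    sums, counts = {}, {}
    for feats, label in labelled:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, f in enumerate(feats):
            acc[i] += f
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(s / counts[label] for s in acc) for label, acc in sums.items()}


def classify(feats, cents):
    """Classification: assign the class whose centroid is nearest (squared distance)."""
    return min(
        cents,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(feats, cents[label])),
    )


# Hypothetical monthly sales and customers described by (visits per month, average spend).
MONTHLY_SALES = [100.0, 110.0, 120.0, 130.0]
CUSTOMERS = [
    ((1, 20), "occasional"),
    ((2, 30), "occasional"),
    ((8, 200), "frequent"),
    ((10, 220), "frequent"),
]

forecast = predict_next(MONTHLY_SALES)
cents = centroids(CUSTOMERS)
new_customer_class = classify((9, 180), cents)
```

With these figures the trend rises by 10 units per month, so the next-month forecast is 140, and a customer with 9 visits and an average spend of 180 falls in the "frequent" class.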
Compendium
A data warehouse takes the organisation's operational data, historical data and external data,
consolidates it into a separately designed database (which can be either relational or
multidimensional in nature), and manages it in a format that is optimised for end users to
access and analyse.
When a data warehouse has been constructed, it provides a complete picture of the
enterprise. It provides an unparalleled opportunity to the management to learn about their
customers.
The data warehouse technology, together with online transaction processing and data
mining, allows the management to provide better customer service, create greater customer
loyalty and activity, focus customer acquisition and retention on the most profitable
customers, increase revenue and reduce operating cost. It provides tools that facilitate
sounder decision making, improves worker and management knowledge and productivity, spares the
operational database from ad-hoc queries and the resulting performance degradation, and clears
the legacy database system while moving the corporate system architecture forward.
With the incorporation of new data delivery and presentation techniques, such as Hypertext
Markup Language (HTML) and Open Database Connectivity (ODBC), database mining (of data and
text) has gained widespread recognition as a viable tool for business intelligence gathering.
Advances in document mining technology (database mining of free-form text and data, in
contrast to the classical approach of mining fixed-length records) are making data mining
technology more powerful.
Last but not least, the Internet has emerged as the largest data warehouse of unstructured
and free-form data. The new technologies are geared towards mining this great data
warehouse.
Bibliography
Developer IQ, India's First Software Magazine, May 2002, Vol. 2, No. 6
Computers Today, "Smart Facts about Data Warehousing" by Atanu Roy
Using Information Technology by Williams, Sawyer and Hutchinson
Database System Concepts by Silberschatz, Korth and Sudarshan
http://www.google.com/