Data Mining and Data Warehousing
A Paper Presentation
AUTHORS
Abstract
Organisations today are suffering from a malaise of data overflow. Developments in transaction processing technology have given rise to a situation where the amount and rate of data capture are very high, but the processing of this data into information that can be utilised for decision making is not developing at the same pace. Data warehousing and data mining (of both data and text) provide a technology that enables decision-makers in the corporate sector or government to process this huge amount of data in a reasonable amount of time and to extract intelligence and knowledge in near real time.
The data warehouse allows data to be stored in a format that facilitates its access, but if tools for deriving information and/or knowledge, and for presenting them in a format useful for decision making, are not provided, the whole rationale for the existence of the warehouse disappears. Various technologies for extracting new insight from the data warehouse have emerged, which we loosely classify as "Data Mining Techniques".
“Our paper focuses on the need for information repositories and for the discovery of knowledge, and hence gives an overview of the much-hyped fields of Data Warehousing and Data Mining.”
Introduction:
Organizations have lately realized that just processing transactions and/or information faster and more efficiently no longer gives them a competitive advantage vis-à-vis their competitors in achieving business excellence. Information technology (IT) tools that are oriented towards knowledge processing can provide the edge that organizations need to survive and thrive in the current era of fierce competition. Increasing competitive pressures and the desire to leverage information technology have led many organizations to explore the benefits of a new, emerging technology, viz. "Data Warehousing and Data Mining". What is needed today is not just the latest information, updated to the nanosecond, but cross-functional information that can support decision making as an "on-line" process.
Figure: The Transformation of Data into Knowledge and associated tools. Data → Information (processing by transaction management and information processing systems) → Knowledge (processing by data mining tools and on-line analytical processing tools).
One thing that remains constant, especially in the corporate world, is “change”. And, these days, infrastructure that allows your company to rapidly respond to change. One solution to this detail data that supports the decision-making process and provides businesses with the ability to access and analyze data to increase an organization's competitive advantage. Data warehousing is a process, not an off-the-shelf solution you buy: it is hardware, databases and tools integrated into an evolving information infrastructure that changes with the dynamics of the business.
What is Data Warehousing?
• The data warehouse makes an attempt to figure out "what we need" before we know we need it.
• A data warehouse stores current and historical data.
• This data is taken from various, perhaps incompatible, sources and stored in a uniform format.
• Several tools transform this data into meaningful business information for comparisons, trend analysis and forecasting.
• Data in a warehouse is not updated or changed in any way; it is only loaded and later accessed.
Data warehouse unless it has the following two features: it collects information from a number of disparate sources and is the place where this disparity is reconciled, and it allows
Information Sources always include the core operational systems which form the
backbone of day-to-day activities. It is these systems which have traditionally provided
management information to support decision making.
Decision Support Tools are used to analyze the information stored in the warehouse, typically to identify trends and new business opportunities.
The Data Warehouse itself is the bridge between the operational systems and the
decision support tools. It holds a copy of much of the operational system data in a logical
structure which is more conducive to analysis. The Data Warehouse, which will be
refreshed in scheduled bursts from operational systems and from relevant external data
sources, provides a single, consistent view of corporate data, leaving operational systems
unaffected.
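To make the bridge role concrete, here is a minimal sketch of one scheduled load, assuming two hypothetical operational sources that record the same sales facts in incompatible formats: one uses ISO dates and decimal-string amounts, the other is a legacy feed with day/month/year dates, upper-case region codes and amounts in cents. The names `SalesFact`, `extract_operational`, `extract_legacy`, `transform` and `Warehouse` are illustrative only. The transform step reconciles both sources into a single uniform row format, and the warehouse only appends and reads rows, mirroring the "loaded and accessed, never updated" behaviour described above.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class SalesFact:
    """Uniform warehouse row (hypothetical schema)."""
    sale_date: date
    region: str
    product: str
    amount: float  # whole currency units


def extract_operational():
    # Hypothetical operational system: ISO dates, amounts as decimal strings.
    return [
        {"date": "2024-01-15", "region": "north", "product": "P1", "amount": "120.50"},
        {"date": "2024-01-16", "region": "south", "product": "P2", "amount": "80.00"},
    ]


def extract_legacy():
    # Hypothetical legacy system: DD/MM/YYYY dates, upper-case codes, amounts in cents.
    return [("17/01/2024", "NORTH", "P2", 4550)]


def transform(op_rows, legacy_rows):
    """Reconcile both sources into the single uniform format."""
    facts = []
    for r in op_rows:
        facts.append(SalesFact(
            sale_date=date.fromisoformat(r["date"]),
            region=r["region"].lower(),
            product=r["product"],
            amount=float(r["amount"]),
        ))
    for d, region, product, cents in legacy_rows:
        facts.append(SalesFact(
            sale_date=datetime.strptime(d, "%d/%m/%Y").date(),
            region=region.lower(),
            product=product,
            amount=cents / 100,
        ))
    return facts


class Warehouse:
    """Append-only store: rows are loaded and read, never updated in place."""

    def __init__(self):
        self._rows = []

    def load(self, facts):
        self._rows.extend(facts)

    def rows(self):
        # A tuple copy, so callers cannot alter the stored history.
        return tuple(self._rows)


warehouse = Warehouse()
warehouse.load(transform(extract_operational(), extract_legacy()))
for fact in warehouse.rows():
    print(fact)
```

Because the operational sources are only read during extraction, they remain unaffected by the load, and later refreshes simply call `load` again with the next batch.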
Data Warehouse Functions:
The main function of a data warehouse is to get enterprise-wide data into the format that is most useful to end users, regardless of their location. Data warehousing is used for:
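• Query and reporting
• Multi-dimensional analysis tools
• Other OLAP tools
• Data mining tools
Figure: Legacy databases, operational databases and external data sources are extracted, transformed and maintained in the data warehouse (together with its metadata), which serves the query and reporting, multi-dimensional analysis, other OLAP and data mining tools.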
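Multi-dimensional (OLAP-style) analysis can be pictured as summarising a measure along whichever dimensions the user picks. The plain-Python sketch below is an illustration, not the behaviour of any particular OLAP tool: it assumes a hypothetical fact table of (region, product, year, sales) rows, rolls it up by any subset of dimensions, and shows a "slice" by filtering on one dimension first. Grouping by fewer dimensions gives a coarser, higher-level view.

```python
from collections import defaultdict

# Hypothetical fact rows: dimensions (region, product, year) and one measure (sales).
FACTS = [
    {"region": "north", "product": "P1", "year": 2023, "sales": 100.0},
    {"region": "north", "product": "P2", "year": 2023, "sales": 40.0},
    {"region": "south", "product": "P1", "year": 2023, "sales": 70.0},
    {"region": "north", "product": "P1", "year": 2024, "sales": 130.0},
    {"region": "south", "product": "P2", "year": 2024, "sales": 60.0},
]


def roll_up(facts, dims, measure="sales"):
    """Sum `measure` for every combination of values of the chosen dimensions."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(totals)


# Views at different levels of detail.
by_region_year = roll_up(FACTS, ["region", "year"])
by_year = roll_up(FACTS, ["year"])
grand_total = roll_up(FACTS, [])

# Slice: restrict to one region, then roll up by product.
north_by_product = roll_up([r for r in FACTS if r["region"] == "north"], ["product"])

print(by_region_year)
print(by_year)
print(grand_total)
print(north_by_product)
```

Real OLAP tools typically precompute such aggregates, but the idea of choosing dimensions and summing a measure is the same.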
Data Mining
Database mining, or data mining (DM), formally termed Knowledge Discovery in Databases (KDD), is a process that aims to use existing data to discover new facts and to uncover relationships previously unknown even to experts thoroughly familiar with the data. It is like extracting precious metals (such as gold) or gems, hence the term “mining”: it is based on filtering and assaying a mountain of data “ore” in order to obtain “nuggets” of knowledge. The data mining process is illustrated in the figure below.
Figure: The data mining process. Data sources 1 to N feed the data warehouse. Select produces selected data, Transform produces transformed data, Mine produces extracted information, and Assimilate produces assimilated information.
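The four steps of the process can be sketched as a small pipeline. This is a minimal illustration under stated assumptions: the warehouse rows are hypothetical customer baskets, selection keeps a single region, transformation normalises each basket into a set of items, mining counts item pairs that occur together in at least a chosen fraction of baskets (a simple support-based association search, which is only one of many possible mining techniques), and assimilation turns the surviving pairs into statements a decision-maker can read. The function names are illustrative.

```python
from collections import Counter
from itertools import combinations

# Hypothetical warehouse extract: (region, items bought in one transaction).
WAREHOUSE = [
    ("north", ["bread", "milk", "butter"]),
    ("north", ["bread", "milk"]),
    ("north", ["milk", "tea"]),
    ("north", ["bread", "butter", "milk"]),
    ("south", ["rice", "dal"]),
]


def select(rows, region):
    """Select: keep only the records relevant to the question."""
    return [items for r, items in rows if r == region]


def transform(baskets):
    """Transform: normalise each basket into a set of distinct, lower-case items."""
    return [frozenset(item.lower() for item in items) for items in baskets]


def mine(baskets, min_support):
    """Mine: find item pairs that co-occur in at least `min_support` of baskets."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def assimilate(patterns):
    """Assimilate: turn raw patterns into readable statements, strongest first."""
    return [
        f"{a} and {b} are bought together in {support:.0%} of baskets"
        for (a, b), support in sorted(patterns.items(), key=lambda kv: -kv[1])
    ]


baskets = transform(select(WAREHOUSE, "north"))
patterns = mine(baskets, min_support=0.5)
for line in assimilate(patterns):
    print(line)
```

With these sample baskets the pipeline reports bread and milk together in 75% of northern baskets, and bread with butter and butter with milk in 50% each; the pair milk and tea falls below the threshold and is filtered out.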
Prediction: Data mining can show how certain attributes within the data will behave in
the future.
Identification: Data patterns can be used to identify the existence of an item, an event, or
an activity.
Classification: Data mining can partition the data so that different classes or categories
can be identified based on combinations of parameters.
Optimization: One eventual goal of data mining may be to optimize the use of limited
resources such as time, space, money, or materials and to maximize output variables such
as sales or profits under a given set of constraints.
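Two of these goals can be illustrated with deliberately simple techniques, chosen here for brevity and not prescribed by the text: prediction as a least-squares straight-line trend fitted to past values and extended one period ahead, and classification as a nearest-centroid rule that assigns a new record to the class whose average attribute values it lies closest to. The quarterly sales figures, the customer attributes and the class labels are all hypothetical.

```python
from math import dist
from statistics import mean


def linear_trend_forecast(values, steps_ahead=1):
    """Prediction: fit y = a + b*t by least squares and extrapolate."""
    ts = range(len(values))
    t_bar, y_bar = mean(ts), mean(values)
    b = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, values)) / sum(
        (t - t_bar) ** 2 for t in ts
    )
    a = y_bar - b * t_bar
    return a + b * (len(values) - 1 + steps_ahead)


def nearest_centroid(training, point):
    """Classification: assign `point` to the class with the closest mean vector."""
    centroids = {
        label: tuple(mean(coords) for coords in zip(*rows))
        for label, rows in training.items()
    }
    return min(centroids, key=lambda label: dist(centroids[label], point))


# Hypothetical quarterly sales rising by 10 each quarter.
sales = [100, 110, 120, 130]
print(linear_trend_forecast(sales))  # 140.0

# Hypothetical customers described by (annual purchases, years as customer).
customers = {
    "high value": [(50, 6), (60, 8), (55, 7)],
    "low value": [(5, 1), (8, 2), (6, 1)],
}
print(nearest_centroid(customers, (45, 5)))  # high value
```

Practical systems use far richer models, but both functions show the basic shape of the task: learn from stored history, then apply what was learned to new or future records.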
CONCLUSION
A data warehouse takes the organisation's operational data, historical data and external data, consolidates it into a separately designed database (which can be either relational or multi-dimensional in nature) and manages it in a format that is optimised for end users to access and analyse. Once a data warehouse has been constructed, it provides a complete picture of the enterprise and gives management an unparalleled opportunity to learn about its customers. Data warehouse technology, together with online transaction processing and data mining, allows management to provide better customer service, create greater customer loyalty and activity, focus customer acquisition and retention on the most profitable customers, increase revenue and reduce operating costs. It also provides tools that facilitate sounder decision making, improves worker and management knowledge and productivity, spares the operational databases from ad-hoc queries and the performance degradation they cause, and offloads the legacy database systems, while moving the corporate system architecture forward.