01 - What Is A Data Warehouse

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 16

So, what is a Data Warehouse?

Definition 1

A data warehouse is a type of information system developed to provide an organization with business intelligence to support management decision making.

The Evolution of Data Warehousing

Since 1970s, organizations gained competitive advantage through systems that automate business processes to offer more efficient and cost-effective services to the customer. This resulted in accumulation of growing amounts of data in operational databases.

The Evolution of Data Warehousing

Organizations now focus on ways to use operational data to support decisionmaking, as a means of gaining competitive advantage. However, operational systems were never designed to support such business activities. Businesses typically have numerous operational systems with overlapping and sometimes contradictory definitions.
5

The Evolution of Data Warehousing

Organizations need to turn their archives of data into a source of knowledge, so that a single integrated / consolidated view of the organizations data is presented to the user.

A data warehouse was deemed the solution to meet the requirements of a system capable of supporting decision-making, receiving data from multiple operational data sources.
6

Scenario

A marketing analyst at a large bank wants to execute a marketing campaign aimed at increasing the loyalty of the banks best customers (those with portfolios more than $500,000). The analysts wants to run queries to do the following.

Identify those customer that have more than $500,000 in loans, deposits and investments Send a letter to each customer, offering a new investment product at a preferred rate Determine who responded to the marketing offer by purchasing the investment product during the banks 3rd fiscal quarter

Scenario - Systems
Banking Systems (mix of Mainframe and Unix)
Banking Customer Master

Marketing Systems (Unix)


Marketing System Customers Prospects Offers

Loan System

Investment System

Deposit System

Questions:
1.
2. 3. 4. 5. 6.

How would the marketer construct the query to understand who has a portfolio more than 500,000 in loans, deposits and investments? How long do you think it would take someone to construct the queries? What skills/knowledge do you think would be required to execute the queries? How quickly do you think the queries would execute? What do you think the impact on the banking systems would be? What customer data should the marketer use for addressing letters to best customers How would the marketer determine who responded to their offer? How would marketer know if response was in the 3rd fiscal quarter?
8

The Answer

In reality, this scenario could not be carried out without some form of data warehouse because:

Queries would take days or weeks to create and perfect. The technical skills and knowledge required to construct the queries would require a team, not just one individual. Queries performance would be excruciatingly slow (days?) Performance of operational systems would be adversely impacted There would be numerous data integrity issues to overcome.
9

Definition 2
A Data Warehouse is a

Subject Oriented Integrated Time-variant Non-volatile

Subject Oriented

Integrated

Time-variant

Non-volatile

collection of data... in support of management decisions


W. H. Inmon, Building the Data Warehouse

10

Subject Oriented

Operational systems are functionally/process oriented:

Loans, deposits, investments

Data warehouses are organized around data subjects:

Customers, products, activities

Data warehouses only retain data for each subject that is useful for analysis purposes

11

Integrated

Data in the Data Warehouse is always integrated integration is one of the key services provided by a DW Integration means:

Consistent Consistent Consistent Consistent

definitions naming conventions measurements (inches yards centimeters) encoding (male/female)

The process of building the data warehouse corrects the inconsistencies found in operational systems often this is the biggest challenge in developing the data warehouse (customer!!!)
12

Time-Variant

The data in the data warehouse consists of a series of periodic snapshots. It is accurate as of a moment in time (the time the data was extracted from the operational environment). Detailed data in the warehouse has an element of time in the key structure Time horizon in the warehouse is years (vs. weeks or months in an operational system)

13

Non-Volatile

Operational data is very volatile (CRUD) Once data is loaded into the warehouse, it does not change (unless loaded in error). Significant implications for design (simplifies transaction management)

No danger of update anomalies, etc. Operations on the data warehouse consist of bulk load and read

14

Warehouses are different (from OLTP)


Operational System Data Warehouse

Current time value, atomic, limited Historical, summarized retention Application specific Complex; structured for update High transaction volume, small answer sets Updated constantly Serves operational personnel Cross-application; subject oriented Simple; structured for understandability Low volume, large answer sets Refreshed periodically Serves management and analysts

Scenario Warehouse Model

Customer Activity
(e.g., purchase loan)

Time

Products

Location

16

You might also like