01 - What Is A Data Warehouse
01 - What Is A Data Warehouse
01 - What Is A Data Warehouse
Definition 1
A data warehouse is a type of information system developed to provide an organization with business intelligence to support management decision making.
Since 1970s, organizations gained competitive advantage through systems that automate business processes to offer more efficient and cost-effective services to the customer. This resulted in accumulation of growing amounts of data in operational databases.
Organizations now focus on ways to use operational data to support decisionmaking, as a means of gaining competitive advantage. However, operational systems were never designed to support such business activities. Businesses typically have numerous operational systems with overlapping and sometimes contradictory definitions.
5
Organizations need to turn their archives of data into a source of knowledge, so that a single integrated / consolidated view of the organizations data is presented to the user.
A data warehouse was deemed the solution to meet the requirements of a system capable of supporting decision-making, receiving data from multiple operational data sources.
6
Scenario
A marketing analyst at a large bank wants to execute a marketing campaign aimed at increasing the loyalty of the banks best customers (those with portfolios more than $500,000). The analysts wants to run queries to do the following.
Identify those customer that have more than $500,000 in loans, deposits and investments Send a letter to each customer, offering a new investment product at a preferred rate Determine who responded to the marketing offer by purchasing the investment product during the banks 3rd fiscal quarter
Scenario - Systems
Banking Systems (mix of Mainframe and Unix)
Banking Customer Master
Loan System
Investment System
Deposit System
Questions:
1.
2. 3. 4. 5. 6.
How would the marketer construct the query to understand who has a portfolio more than 500,000 in loans, deposits and investments? How long do you think it would take someone to construct the queries? What skills/knowledge do you think would be required to execute the queries? How quickly do you think the queries would execute? What do you think the impact on the banking systems would be? What customer data should the marketer use for addressing letters to best customers How would the marketer determine who responded to their offer? How would marketer know if response was in the 3rd fiscal quarter?
8
The Answer
In reality, this scenario could not be carried out without some form of data warehouse because:
Queries would take days or weeks to create and perfect. The technical skills and knowledge required to construct the queries would require a team, not just one individual. Queries performance would be excruciatingly slow (days?) Performance of operational systems would be adversely impacted There would be numerous data integrity issues to overcome.
9
Definition 2
A Data Warehouse is a
Subject Oriented
Integrated
Time-variant
Non-volatile
10
Subject Oriented
Data warehouses only retain data for each subject that is useful for analysis purposes
11
Integrated
Data in the Data Warehouse is always integrated integration is one of the key services provided by a DW Integration means:
The process of building the data warehouse corrects the inconsistencies found in operational systems often this is the biggest challenge in developing the data warehouse (customer!!!)
12
Time-Variant
The data in the data warehouse consists of a series of periodic snapshots. It is accurate as of a moment in time (the time the data was extracted from the operational environment). Detailed data in the warehouse has an element of time in the key structure Time horizon in the warehouse is years (vs. weeks or months in an operational system)
13
Non-Volatile
Operational data is very volatile (CRUD) Once data is loaded into the warehouse, it does not change (unless loaded in error). Significant implications for design (simplifies transaction management)
No danger of update anomalies, etc. Operations on the data warehouse consist of bulk load and read
14
Current time value, atomic, limited Historical, summarized retention Application specific Complex; structured for update High transaction volume, small answer sets Updated constantly Serves operational personnel Cross-application; subject oriented Simple; structured for understandability Low volume, large answer sets Refreshed periodically Serves management and analysts
Customer Activity
(e.g., purchase loan)
Time
Products
Location
16