Data Warehousing: Data Models and OLAP Operations
Data Warehousing: Data Models and OLAP Operations
Data Warehousing: Data Models and OLAP Operations
5. ROLAP
6. MOLAP
7. HOLAP
9. Conclusion
Understanding the term Data
Warehousing
• Data Warehouse:
The term Data Warehouse was coined by Bill Inmon in 1990, which he
defined in the following way: "A warehouse is a subject-oriented,
integrated, time-variant and non-volatile collection of data in support of
management's decision making process". He defined the terms in the
sentence as follows:
• Subject Oriented:
Data that gives information about a particular subject instead of about a
company's ongoing operations.
• Integrated:
Data that is gathered into the data warehouse from a variety of sources
and merged into a coherent whole.
• Time-variant:
All data in the data warehouse is identified with a particular time period.
• Non-volatile
Data is stable in a data warehouse. More data is added but data is never
removed. This enables management to gain a consistent picture of the
business.
Data Warehouse
Architecture
Other important
terminology
• Enterprise Data warehouse
collects all information about subjects (customers,products,sales,assets,
personnel) that span the entire organization
• Data Mart
Departmental subsets that focus on selected subjects
extract Query/Reporting
transform
load serve
refresh
etc. e.g., ROLAP
Operational
DB’s Data Mining
serve
Data Marts
Approaches to OLAP
Servers
Three possibilities for OLAP servers
(1) Relational OLAP (ROLAP)
– Relational and specialized relational DBMS to store
and manage warehouse data
– OLAP middleware to support missing pieces
(2) Multidimensional OLAP (MOLAP)
– Array-based storage structures
– Direct access to array data structures
(3) Hybrid OLAP (HOLAP)
– Storing detailed data in RDBMS
– Storing aggregated data in MDBMS
– User access via MOLAP tools
The Multi-Dimensional Data
Model
“Sales by product line over the past six months”
“Sales by store between 1990 and 1995”
...
ROLAP: Dimensional Modeling Using
Relational DBMS
• Special schema design: star, snowflake
• Products
– IBM DB2, Oracle, Sybase IQ, RedBrick,
Informix
Star Schema (in RDBMS)
Star Schema Example
The “Classic” Star Schema
Store Dimension Fact Table Time Dimension
STORE KEY STORE KEY
PERIOD KEY
PRODUCTKEY
Store Description
City PERIOD KEY Period Desc A single fact table, with detail
State
Dollars
Year
Quarter
and summary data
District ID
Units
District Desc.
Price
Month Fact table primary key has
Region_ID Day
Region Desc.
Product Dimension Current Flag only one key column per
Regional Mgr.
Level PRODUCTKEY
Resolution
Sequence
dimension
Product Desc.
Brand
Each key is generated
Color
Size Each dimension is a single
Manufacturer
Level
table, highly de-normalized
Benefits: Easy to understand, easy to define hierarchies, reduces # of physical joins, low
maintenance, very simple metadata
Star Schema
with Sample
Data
The “Snowflake” Schema
Store
Dimension
STORE KEY District_ID Region_ID
Store Description District Desc. Region Desc.
City Region_ID Regional Mgr.
State
District ID
Region_ID
Regional Mgr.
Store Fact Table
STORE KEY
PRODUCT KEY
PERIOD KEY
Dollars
Units
Price
Aggregation in a Single Fact Table
Store Dimension Fact Table Time Dimension
STORE KEY STORE KEY
PERIOD KEY
Store Description PRODUCTKEY
City PERIOD KEY Period Desc
State Year
Dollars Quarter
District ID
Units
District Desc. Month
Price
Region_ID Day
Region Desc. Current Flag
Regional Mgr.
Product Dimension
Resolution
Level PRODUCTKEY Sequence
Product Desc.
Brand
Color
Size
Manufacturer
Level
Drawbacks: Summary data in the fact table yields poorer performance for
summary levels, huge dimension tables a problem
The “Fact Constellation”
Schema
Store Dimension Fact Table Time Dimension
STORE KEY STORE KEY
PERIOD KEY
Store Description PRODUCTKEY
City PERIOD KEY Period Desc
State Year
Dollars Quarter
District ID
Units
District Desc. Month
Price
Region_ID Day
Region Desc. Current Flag
Regional Mgr.
Product Dimension
Sequence
PRODUCTKEY
Product Desc.
Brand District Fact Table
Color Region Fact Table
Size District_ID
Manufacturer PRODUCT_KE Region_ID
PRODUCT_KEY
Y
PERIOD_KEY
PERIOD_KEY
Dollars
Dollars
Units Units
Price Price
“Snowflake” Schema and
Multiple Fact Tables
Store Dimension
• No LEVEL in dimension tables
STORE KEY District_ ID Region_ ID
Store Description District Desc. Region Desc.
• Dimension tables are normalized by
City Region_ ID Regional Mgr.
decomposing at the attribute level
State
District ID • Each dimension table has one key for each
District Desc. level of the dimensionís hierarchy
Region_ ID
Region Desc.
Store Fact Table District Fact Table RegionFact Table
• The lowest level key joins the dimension table
Regional Mgr.
District_ID Region_ID to both the fact table and the lower level
STORE KEY
PRODUCT_KEY
PRODUCT_KEY
PERIOD_KEY attribute table
PRODUCT KEY PERIOD_KEY Dollars
PERIOD KEY Dollars Units
Units Price
Dollars Price
Units
Price
rollup
drill-down
Points to be noticed about ROLAP
• Defines complex, multi-dimensional data with
simple model
• Reduces the number of joins a query has to
process
• Allows the data warehouse to evolve with rel.
low maintenance
• Can contain both detailed and summarized data.
• ROLAP is based on familiar, proven, and already
selected technologies.
BUT!!!
• SQL for multi-dimensional manipulation of
calculations.
MOLAP: Dimensional Modeling
Using the Multi Dimensional Model
dimensions = 2
3-D Cube
Fact table view: Multi-dimensional cube:
dimensions = 3
Example
roll-up to region
Dimensions:
NY
ore SF Time, Product, Store
St roll-up to brand
Attributes:
LA
Product (upc, price, …)
Juice 10
Store …
Product
Milk 34
56
…
Coke
32
Hierarchies:
Cream
12 Product → Brand → …
Soap
Bread 56 roll-up to week Day → Week → Quarter
M T W Th F S S Store → Region → Country
Time
56 units of bread sold in LA on M
Cube Aggregation: Roll-up
Example: computing sums
s1 s2 s3
day 2 ...
p1 44 4
p2 s1 s2 s3
day 1
p1 12 50
p2 11 8
s1 s2 s3
sum 67 12 50
s1 s2 s3
p1 56 4 50
p2 11 8
129
rollup sum
p1 110
drill-down p2 19
Cube Operators for Roll-up
s1 s2 s3
day 2 ...
p1 44 4
p2 s1 s2 s3
day 1
p1 12 50
p2 11 8 sale(s1,*,*)
s1 s2 s3
sum 67 12 50
s1 s2 s3
p1 56 4 50
p2 11 8
129
sum
sale(s2,p2,*) p1 110 sale(*,*,*)
p2 19
Extended Cube
* s1 s2 s3 *
p1 56 4 50 110
p2 11 8 19
day 2 *
s1 67
s2 12
s3 *50 129
p1 44 4 48
p2
s1 s2 s3 *
day 1
p1
*
12
44 4
50 62
48 sale(*,p2,*)
p2 11 8 19
* 23 8 50 81
Aggregation Using
Hierarchies
s1 s2 s3
day 2
p1 44 4 store
p2 s1 s2 s3
day 1
p1 12 50
p2 11 8
region
country
region A region B
p1 56 54
p2 11 8
(store s1 in Region A;
stores s2, s3 in Region B)
Points to be noticed about MOLAP
• Pre-calculating or pre-consolidating transactional data improves
speed.
BUT
Fully pre-consolidating incoming data, MDDs require an
enormous amount of overhead both in processing time and in
storage. An input file of 200MB can easily expand to 5GB
• While MDD servers can handle up to 50GB of storage, RDBMS servers can handle
hundreds of gigabytes and terabytes.
An experiment with Relational and the
Multidimensional models on a data set
The analysis of the author’s example illustrates the following differences between the best
Relational alternative and the Multidimensional approach.
relationalMulti- Improvement
dimensiona
l
Disk space requirement 17 10 1.7
(Gigabytes)
Retrieve the corporate 240 1 240
measures
Actual Vs Budget, by month
(I/O’s)
Calculation of Variance 237 2* 110*
Budget/Actual for the whole
*database
This may include
(I/O the calculation
time of many other derived data without any additional I/O.
in hours)
Reference: http://dimlab.usc.edu/csci599/Fall2002/paper/I2_P064.pdf
What-if analysis
IF
A. You require write access
B. Your data is under 50 GB
C. Your timetable to implement is 60-90 days
D. Lowest level already aggregated
E. Data access on aggregated level
F. You’re developing a general-purpose application for inventory movement or assets
management
THEN
Consider an MDD /MOLAP solution for your data mart
IF
A. Your data is over 100 GB
B. You have a "read-only" requirement
C. Historical data at the lowest level of granularity
D. Detailed access, long-running queries
E. Data assigned to lowest level elements
THEN
Consider an RDBMS/ROLAP solution for your data mart.
IF
A. OLAP on aggregated and detailed data
B. Different user groups
C. Ease of use and detailed data
THEN
Consider an HOLAP for your data mart
Examples
• ROLAP
– Telecommunication startup: call data records (CDRs)
– ECommerce Site
– Credit Card Company
• MOLAP
– Analysis and budgeting in a financial department
– Sales analysis
• HOLAP
– Sales department of a multi-national company
– Banks and Financial Service Providers
Tools available
• ROLAP:
– ORACLE 8i
– ORACLE Reports; ORACLE Discoverer
– ORACLE Warehouse Builder
– Arbors Software’s Essbase
• MOLAP:
– ORACLE Express Server
– ORACLE Express Clients (C/S and Web)
– MicroStrategy’s DSS server
– Platinum Technologies’ Plantinum InfoBeacon
• HOLAP:
– ORACLE 8i
– ORACLE Express Serve
– ORACLE Relational Access Manager
– ORACLE Express Clients (C/S and Web)
Conclusion
• ROLAP: RDBMS -> star/snowflake schema
• http://www.donmeyer.com/art3.html
– Data warehousing Services, Data Mining & Analysis, LLC
• http://www.cs.man.ac.uk/~franconi/teaching/2001/CS636/CS636-olap.ppt
– Data Warehouse Models and OLAP Operations, by Enrico Franconi
• http://www.promatis.com/mediacenter/papers
- ROLAP, MOLAP, HOLAP: How to determine which to technology is
appropriate, by Holger Frietch, PROMATIS Corporation