Data Warehousing Schemas


From Tables and Spreadsheets to Data Cubes

• A data warehouse is based on a multidimensional data model, which views
data in the form of a data cube
• A data cube, such as sales, allows data to be modeled and viewed in
multiple dimensions
– Dimension tables, such as item (item_name, brand, type), or time(day,
week, month, quarter, year)
– Fact table contains measures (such as dollars_sold) and keys to each of
the related dimension tables
• In data warehousing literature, an n-D base cube is called a base cuboid.
The topmost 0-D cuboid, which holds the highest level of summarization, is
called the apex cuboid. The lattice of cuboids forms a data cube.
Cube: A Lattice of Cuboids
• 0-D (apex) cuboid: all
• 1-D cuboids: time; item; location; supplier
• 2-D cuboids: (time, item); (time, location); (time, supplier);
(item, location); (item, supplier); (location, supplier)
• 3-D cuboids: (time, item, location); (time, item, supplier);
(time, location, supplier); (item, location, supplier)
• 4-D (base) cuboid: (time, item, location, supplier)
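The lattice above can be enumerated mechanically: each cuboid corresponds to one subset of the dimensions, so four dimensions (ignoring concept hierarchies within a dimension) give 2^4 = 16 cuboids. The following is a minimal sketch of that enumeration; the function name `cuboid_lattice` is a hypothetical helper chosen for illustration, not part of any warehouse product.

```python
from itertools import combinations

def cuboid_lattice(dimensions):
    """Group every subset of the dimensions by its size (the cuboid's dimensionality)."""
    return {
        k: list(combinations(dimensions, k))
        for k in range(len(dimensions) + 1)
    }

# Dimensions taken from the lattice slide.
lattice = cuboid_lattice(["time", "item", "location", "supplier"])

for k, cuboids in lattice.items():
    print(f"{k}-D cuboids:", cuboids)

# Total number of cuboids across all levels of the lattice.
total = sum(len(cuboids) for cuboids in lattice.values())
print("total cuboids:", total)
```

The 0-D entry is the empty tuple, which plays the role of the apex cuboid "all"; the single 4-D entry is the base cuboid.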
Conceptual Modeling of Data Warehouses

• Modeling data warehouses: dimensions & measures


– Star schema: A fact table in the middle connected to a set of
dimension tables
– Snowflake schema: A refinement of star schema where some
dimensional hierarchy is normalized into a set of smaller
dimension tables, forming a shape similar to a snowflake
– Fact constellations: Multiple fact tables share dimension tables,
viewed as a collection of stars, therefore called galaxy schema or
fact constellation
Star Schema
• 4 Components
– Facts
– Dimensions
– Attributes
– Attribute Hierarchies
Facts
• Numeric measurements (values) that represent
a specific business aspect or activity
• Stored in a fact table at the center of the star
schema
• The fact table contains facts that are linked through their
dimensions
• Can be computed or derived at run time
• Updated periodically with data from operational
databases
Dimensions
• Qualifying characteristics that provide
additional perspectives to a given fact
– DSS data is almost always viewed in relation
to other data
• Dimensions are normally stored in
dimension tables
Attributes
• Dimension Tables contain Attributes
• Attributes are used to search, filter, or classify
facts
• Dimensions provide descriptive characteristics
about the facts through their attributes
• Must define common business attributes that will
be used to narrow a search, group information,
or describe dimensions. (ex.: Time / Location /
Product)
• No mathematical limit to the number of
dimensions (3-D makes it easy to model)
Attribute Hierarchies
• Provide a top-down data organization
– Aggregation
– Drill-down / Roll-Up data analysis
• Attributes from different dimensions can
be grouped to form a hierarchy
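As an illustration of roll-up along an attribute hierarchy, the sketch below aggregates the same facts at the month level and then at the quarter level of a day → month → quarter hierarchy. The rows and the helper name `aggregate` are made up for the example.

```python
from collections import defaultdict

# Hypothetical fact rows: (day, month, quarter, units_sold).
facts = [
    ("2024-01-15", "2024-01", "Q1", 10),
    ("2024-01-20", "2024-01", "Q1", 5),
    ("2024-02-03", "2024-02", "Q1", 7),
    ("2024-04-11", "2024-04", "Q2", 4),
]

# Position of each hierarchy level inside a fact row.
LEVELS = {"day": 0, "month": 1, "quarter": 2}

def aggregate(rows, level):
    """Sum units_sold for each distinct value at the given hierarchy level."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[LEVELS[level]]] += row[3]
    return dict(totals)

by_month = aggregate(facts, "month")      # roll-up from day to month
by_quarter = aggregate(facts, "quarter")  # roll-up from month to quarter
print(by_month)
print(by_quarter)
```

Drill-down is the reverse direction: moving from `by_quarter` back to `by_month`, or to the individual days.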
Example of Star Schema
• Sales Fact Table
– Keys: time_key, item_key, branch_key, location_key
– Measures: units_sold, dollars_sold, avg_sales
• Dimension tables
– time (time_key, day, day_of_the_week, month, quarter, year)
– item (item_key, item_name, brand, type, supplier_type)
– branch (branch_key, branch_name, branch_type)
– location (location_key, street, city, state_or_province, country)
Star in DMQL
The star schema is defined in DMQL as follows.
define cube sales_star [time, item, branch, location]:
dollars_sold = sum(sales_in_dollars), units_sold = count(*)
define dimension time as (time_key, day, day_of_week, month,
quarter, year)
define dimension item as (item_key, item_name, brand, type,
supplier_type)
define dimension branch as (branch_key, branch_name,
branch_type)
define dimension location as (location_key, street, city,
province_or_state, country)
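To make the star layout concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is an assumption-laden illustration, not a translation of the DMQL statements: only the time and item dimensions are included, the table names `time_dim`, `item_dim`, and `sales_fact` are invented, the rows are made-up sample data, and `units_sold` is stored per fact row and summed rather than derived with `count(*)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (
    time_key INTEGER PRIMARY KEY, day INTEGER, month INTEGER,
    quarter TEXT, year INTEGER);
CREATE TABLE item_dim (
    item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
    type TEXT, supplier_type TEXT);
-- Fact table: one foreign key per dimension plus the measures.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    units_sold INTEGER,
    dollars_sold REAL);

INSERT INTO time_dim VALUES (1, 15, 1, 'Q1', 2024), (2, 11, 4, 'Q2', 2024);
INSERT INTO item_dim VALUES
    (1, 'laptop', 'Acme', 'computer', 'wholesale'),
    (2, 'phone', 'Acme', 'mobile', 'retail');
INSERT INTO sales_fact VALUES
    (1, 1, 2, 2000.0), (1, 2, 3, 1500.0), (2, 1, 1, 1000.0);
""")

# Typical star query: join the fact table to its dimensions,
# then group by dimension attributes and aggregate the measures.
rows = conn.execute("""
SELECT t.quarter, i.type, SUM(f.dollars_sold), SUM(f.units_sold)
FROM sales_fact AS f
JOIN time_dim AS t ON f.time_key = t.time_key
JOIN item_dim AS i ON f.item_key = i.item_key
GROUP BY t.quarter, i.type
ORDER BY t.quarter, i.type
""").fetchall()

for row in rows:
    print(row)
```

Every dimension is exactly one join away from the fact table, which is what gives the schema its star shape.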
Example of Snowflake Schema
• Sales Fact Table
– Keys: time_key, item_key, branch_key, location_key
– Measures: units_sold, dollars_sold, avg_sales
• Dimension tables
– time (time_key, day, day_of_the_week, month, quarter, year)
– item (item_key, item_name, brand, type, supplier_key)
– supplier (supplier_key, supplier_type)
– branch (branch_key, branch_name, branch_type)
– location (location_key, street, city_key)
– city (city_key, city, state_or_province, country)
Snowflake explained
The single dimension table for item in the star schema is
normalized in the snowflake schema, resulting in new item and
supplier tables. For example, the item dimension table now
contains the attributes item_key, item_name, brand, type,
and supplier_key, the last of which is linked to the supplier
dimension table, containing supplier_key and supplier_type
information. Similarly, the single dimension table for location in
the star schema can be normalized into two tables: new
location and city. The city_key of the new location table
now links to the city dimension. Notice that further
normalization can be performed on province_or_state and
country in the snowflake schema.
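The normalization step described above can be sketched in a few lines. The example splits a star-style location dimension into a location table and a city table linked by city_key. The sample rows and the helper names `snowflake` and `country_of` are hypothetical, and surrogate city keys are assigned in order of first appearance purely for illustration.

```python
# Star-style location dimension: city attributes are repeated on every row.
star_location = {
    1: {"street": "1 Main St", "city": "Vancouver",
        "province_or_state": "BC", "country": "Canada"},
    2: {"street": "9 Oak Ave", "city": "Vancouver",
        "province_or_state": "BC", "country": "Canada"},
    3: {"street": "5 Lake Rd", "city": "Chicago",
        "province_or_state": "IL", "country": "USA"},
}

CITY_ATTRS = ("city", "province_or_state", "country")

def snowflake(star_rows):
    """Split the star location table into (location_dim, city_dim)."""
    city_keys = {}     # city attribute tuple -> surrogate city_key
    city_dim = {}      # city_key -> city attributes
    location_dim = {}  # location_key -> street plus city_key
    for location_key, row in star_rows.items():
        attrs = tuple(row[name] for name in CITY_ATTRS)
        if attrs not in city_keys:
            city_keys[attrs] = len(city_keys) + 1
            city_dim[city_keys[attrs]] = dict(zip(CITY_ATTRS, attrs))
        location_dim[location_key] = {
            "street": row["street"],
            "city_key": city_keys[attrs],
        }
    return location_dim, city_dim

location_dim, city_dim = snowflake(star_location)

def country_of(location_key):
    """Follow location -> city, the extra join a snowflake schema requires."""
    return city_dim[location_dim[location_key]["city_key"]]["country"]

print(location_dim)
print(city_dim)
```

The trade-off is visible here: the city attributes are stored once per city instead of once per location, but answering a question about country now takes one more lookup.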
Snowflake in DMQL
• define cube sales_snowflake [time, item, branch, location]:
dollars_sold = sum(sales_in_dollars), units_sold = count(*)
• define dimension time as (time_key, day, day_of_week,
month, quarter, year)
• define dimension item as (item_key, item_name, brand,
type, supplier (supplier_key, supplier_type))
• define dimension branch as (branch_key, branch_name,
branch_type)
• define dimension location as (location_key, street, city
(city_key, city, province_or_state, country))
Example of Fact Constellation
• Sales Fact Table
– Keys: time_key, item_key, branch_key, location_key
– Measures: units_sold, dollars_sold, avg_sales
• Shipping Fact Table
– Keys: time_key, item_key, shipper_key, from_location, to_location
– Measures: dollars_cost, units_shipped
• Dimension tables
– time (time_key, day, day_of_the_week, month, quarter, year)
– item (item_key, item_name, brand, type, supplier_type)
– branch (branch_key, branch_name, branch_type)
– location (location_key, street, city, province_or_state, country)
– shipper (shipper_key, shipper_name, location_key, shipper_type)
Fact constellation explained
This schema specifies two fact tables, sales and
shipping. The sales table definition is identical to that
of the star schema. The shipping table has five
dimensions, or keys: time_key, item_key, shipper_key,
from_location, and to_location, and two
measures: dollars_cost and units_shipped. A fact
constellation schema allows dimension tables to be
shared between fact tables. For example, the
dimension tables for time, item, and location are
shared between both the sales and shipping fact
tables.
Fact Constellation in DMQL
• define cube sales [time, item, branch, location]:
dollars_sold = sum(sales_in_dollars), units_sold = count(*)
• define dimension time as (time_key, day, day_of_week, month, quarter, year)
• define dimension item as (item_key, item_name, brand, type)
• define dimension branch as (branch_key, branch_name, branch_type)
• define dimension location as (location_key, street, city, province_or_state,
country)
• define cube shipping [time, item, shipper, from_location, to_location]:
dollars_cost = sum(cost_in_dollars), units_shipped = count(*)
• define dimension time as time in cube sales
• define dimension item as item in cube sales
• define dimension shipper as (shipper_key, shipper_name, location as
location in cube sales, shipper_type)
• define dimension from_location as location in cube sales
• define dimension to_location as location in cube sales
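Because both fact tables reference the same item dimension, their measures can be lined up by item key. The sketch below shows one way to do that. The sample rows and the helper `total_by_item` are made up for illustration, and each fact table is reduced to a single key and a single measure.

```python
from collections import defaultdict

# Shared dimension: item_key -> item_name.
item_dim = {1: "laptop", 2: "phone"}

# Two fact tables that both reference item_dim.
sales_fact = [(1, 2000.0), (2, 1500.0), (1, 1000.0)]  # (item_key, dollars_sold)
shipping_fact = [(1, 50.0), (2, 20.0), (1, 25.0)]     # (item_key, dollars_cost)

def total_by_item(fact_rows):
    """Sum the measure of a fact table for each item_key."""
    totals = defaultdict(float)
    for item_key, amount in fact_rows:
        totals[item_key] += amount
    return totals

sold = total_by_item(sales_fact)
cost = total_by_item(shipping_fact)

# Combine both stars through the shared dimension.
report = {name: (sold[key], cost[key]) for key, name in item_dim.items()}
print(report)
```

Sharing the dimension guarantees that "laptop" means the same thing in both stars, which is what makes the combined report meaningful.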
