Slowly Changing Dimensions

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 26

Slowly Changing Dimensions

Presenter:
Miky Schreiber

http://www.miky-schreiber.com
Topics
What we’ll see: We won’t see:
 SCD in DWH  SCD in OLAP
 Why?
 Who? When?
 How?

http://www.miky-schreiber.com 2 Slowly Changing Dimensions


?Why
Let’s take Sales fact table for example
Every day more and more sales take place, hence:
More and more rows are added to the fact table

Very rarely are the rows in the fact table updated with changes

Also Consider...
How will we adjust the fact table when changes are made?

http://www.miky-schreiber.com 3 Slowly Changing Dimensions


…’Why? cont
Consider the dimension tables
Comapred to the fact tables, they are more stable and less volatile
However, unlike fact tables, a dimension table does not change just
through the increase of number of rows, but also through
changes to the attributes themselves

http://www.miky-schreiber.com 4 Slowly Changing Dimensions


?Who? When
Who?
Fact tables and Dimension tables
We will focus on (Slowly Changing) Dimensions

When?
Good question:
 Inside the ETL process
 After the ETL process, as a stored procedure
 Never (wait, you’ll see…)

http://www.miky-schreiber.com 5 Slowly Changing Dimensions


?How
This is the big question.
From what we discussed for now, we can derive these principles:

 Most dimensions are generally constant over time


 Many dimensions, through not constant over time, change slowly
 The product (business) key of the source record does not change
 The description and other attributes change slowly over time
 In the source OLTP system, the new values overwrite the old ones
 Overwriting of dimension table attributes is not always the
appropriate option in a data warehouse
 The ways changes are made to the dimension tables depend on the
types of changes and what information must be preserved in the
DWH

http://www.miky-schreiber.com 6 Slowly Changing Dimensions


:How? 3 Answers
The usual changes to dimension tables are classified into three types
 Type 1
 Type 2
 Type 3

We will consider the points discussed earlier when deciding which


type to use

Before going on, we must talk about one more thing.

Also Consider…
Do we have to use the same type for the entire DWH?
For the same dimension?

http://www.miky-schreiber.com 7 Slowly Changing Dimensions


Surrogate Key
• A surrogate key is a unique identifier for the entity in the modeled
world
• It is not derived from application data
• It’s not meant to be shown outside the DWH

• It’s only significance is to act as the primary key


• Frequently it’s a sequential number (Sequence in Oracle or
Identity in SQL Server)

http://www.miky-schreiber.com 8 Slowly Changing Dimensions


Surrogate Key, cont.
• Having the key independent of all other columns insulates the
database relationships from changes in the data values or
database design (making the database more agile) and guarentees
uniqueness
• For example: An employee ID is chosen as the neutral (business)
key of an employee DWH. Because of a merger with another
company, new employees from the merged company must be
inserted. There is one employee who works in both companies…

• If the key is a compound key, joining is more expensive because


there are multiple columns to compare. Surrogate keys are always
contained in a single column

http://www.miky-schreiber.com 9 Slowly Changing Dimensions


Our example
Customer
For the demonstration, we’ll use this star schema:
Customer Key
Customer Name
Product
Customer Code
Product Key Order fact Martial Status
Product Name Product Key Address
Product Code Time Key State
Customer Key Zip
Product Line
Salesperson Key
Brand Order Dollars
Time Salesperson
Cost Dollars
Time Key Margin Dollars Salesperson Key
Date Sale Units Salesperson Name
Month Territory Name
Quarter Region Name
Year 10
http://www.miky-schreiber.com Slowly Changing Dimensions
Type 1 Changes
Usually relate to corrections of errors in the source system
For example, the customer dimension: Mickey Schreiber -> Miky Schreiber

Also Consider…
What will happen when number of children is changed?

http://www.miky-schreiber.com 11 Slowly Changing Dimensions


.Type 1 Changes, cont
General Principles for Type 1 changes:
 Usually, the changes relate to correction of errors in the source system
 Sometimes the change in the source system has no significance
 The old value in the source system needs to be discarded
 The change in the source system need not be preserved in the DWH

Also Consider…
What will happen when only the last value before the change is needed?

http://www.miky-schreiber.com 12 Slowly Changing Dimensions


Applying Type 1 changes Change Box
Customer Code:
Key inRestructuring K12356
 Overwrite the attribute value the dimension table row with the new value
 The old value of the attributeK12356 ->
is not preserved Customer Name:
 No other changes are made in 33154112
the dimension table row
Miky Schreiber
 The key of this dimension table or any other key values are not affected
 Easiest to implement Before After
Customer Key: 33154112 33154112
Customer Name: Mickey Schreiber Miky Schreiber
Customer Code: K12356 K12356
Martial Status: Married Married
Address: Negba 11 ST Negba 11 ST

Also Consider…
Which indexes will help here?
How the “change box” will appear in the real world?

http://www.miky-schreiber.com 13 Slowly Changing Dimensions


Type 2 Changes
Let’s look at the martial status of Miky Schreiber
One the DWH’s requirements is to track orders by martial status (in addition to other
attributes)
All changes before 11/10/2004 will be under Martial Status = Single, and all changes after
that date will be under Martial Status = Married
We need to aggregate the orders before and after the marriage separately

Let’s make life harder:


Miky is living in Negba st., but on 30/8/2009 he moves to Avivim st.

http://www.miky-schreiber.com 14 Slowly Changing Dimensions


.Type 2 Changes, cont
General Principles for Type 2 changes:
 They usually relate to true changes in source systems
 There is a need to preserve history in the DWH
 This type of change partitions the history in the DWH
 Every change for the same attributes must be preserved

Also Consider…
 Must we track changes for all the attributes?
 For which attributes will we track changes? What are the considerations?

http://www.miky-schreiber.com 15 Slowly Changing Dimensions


Change Box
Customer Code:
Applying Type 2 changes K12356
Martial Status
Key Restructuring (11/10/2004):
K12356 -> 33154112 Married
51141234 Address (30/8/2009):
52789342 Avivim st.
Before After 11/10/2004 After 30/8/2009
Customer Key: 33154112 51141234 52789342
Customer Name: Miky Schreiber Miky Schreiber Miky Schreiber
Customer Code: K12356 K12356 K12356
Martial Status: Single Married Married
Address: Negba 11 ST Negba 11 ST Avivim st.
Also Consider…
 What will happen if in addition to Address we also have State, zip code?
 What will happen if the customer code will change?

http://www.miky-schreiber.com 16 Slowly Changing Dimensions


Type 2 concluded
The steps:
 Add a new dimension table row with the new value of the changed attribute
 An effective date will be included in the dimension table
 There are no changes to the original row in the dimension table
 The key of the original row is not affected
 The new row is inserted with a new surrogate key

Also Consider…
 What is the data type of the effective date column? Must it contain both date and time?
 How will the surrogate key be built?
 Advantages? Disadvantages?

http://www.miky-schreiber.com 17 Slowly Changing Dimensions


Type 3 Changes
Not common at all
Complex queries on type 2 changes may be
 Hard to implement
 Time-consuming
 Hard to maintain

We want to track history without lifting heavy burden


There are many soft changes and we don’t care for the “far” history

http://www.miky-schreiber.com 18 Slowly Changing Dimensions


Type 3 Changes
General Principles:
 They usually relate to “soft” or tentative changes in the source systems
 There is a need to keep track of history with old and new values of the changes
attribute
 They are used to compare performances across the transition
 They provide the ability to track forward and backward

http://www.miky-schreiber.com 19 Slowly Changing Dimensions


Salesperson ID:
Applying Type 3 changes RS199701
Territory Name:
Key Restructuring
Netanya
RS199701 -> 12345
(12/1/2000)

Before After
Salesperson Key: 12345 12345
Salesperson Name: Boris Kavkaz Boris Kavkaz
Old Territory Name: (null) Ra’anana
Current Territory Name: Ra’anana Netanya
Effective Date: 1/1/1998 12/1/2000
Also Consider…
 What is the effective date before the change?
 Can the old terriroty column contain null? What about the current territory?

http://www.miky-schreiber.com 20 Slowly Changing Dimensions


Type 3 concluded
 No new dimension row is needed
 The existing queries will seamlessly switch to the current value
 Any queries that need to use the old value must be revised accordingly
 The technique works best for one soft change at a time
 If there is a succession of changes, more sophisticated techniques must be advised

http://www.miky-schreiber.com 21 Slowly Changing Dimensions


There’s even more
 Type 0 changes
 Type 4 – using history tables
 Type 6 – Hybrid (what about 5?)
 Type 6 – Alternative implementation
 SCD in OLAP

http://www.miky-schreiber.com 22 Slowly Changing Dimensions


Conclusions
 3 Main ways of history tracking
 Choose the way you’d like for every dimension table
 You may combine the types
 It all depends on the system’s requirements

http://www.miky-schreiber.com 23 Slowly Changing Dimensions


Bibliography
 Data Warehousing Fundamentals, Paulraj Ponniah, John
Wiley Publication
 Wikipedia (Slowly Changing Dimension)

http://www.miky-schreiber.com 24 Slowly Changing Dimensions


Questions?

http://www.miky-schreiber.com 25 Slowly Changing Dimensions


Thank you

http://www.miky-schreiber.com 26 Slowly Changing Dimensions

You might also like