0% found this document useful (0 votes)
34 views3 pages

DataWarehousing - Testing Made Easy

The document discusses data warehousing concepts including dimensional modeling, fact and dimension tables, and schemas like star, snowflake, and hybrid. It defines dimensional modeling as subject-oriented, integrated, time-variant, and non-volatile. Fact tables contain numeric facts and dimension tables contain descriptive attributes used to analyze facts. Dimension tables are typically de-normalized while fact tables are highly normalized.

Uploaded by

RajeshCuddapah
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
34 views3 pages

DataWarehousing - Testing Made Easy

The document discusses data warehousing concepts including dimensional modeling, fact and dimension tables, and schemas like star, snowflake, and hybrid. It defines dimensional modeling as subject-oriented, integrated, time-variant, and non-volatile. Fact tables contain numeric facts and dimension tables contain descriptive attributes used to analyze facts. Dimension tables are typically de-normalized while fact tables are highly normalized.

Uploaded by

RajeshCuddapah
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 3

According to D.W.

Inmon :
DWH

Subject Oriented
Integrated
Time Varient
Non Volatile

D.W Implementation
Approach

Top Down Approach : D.W.Inmon


Bottom up Approach : Ralph Kimball

SQL Set Operators

Desgined Subject oriented to design


Analysis
Business info collected from various
sources
Allows to anlysis the data with time Eg:
Month, YOY
Once the data entered in DW cannot
change
First Develop EDW then Devlop
Datamarts
First Develop Datamarts then Develp
EDW

Syntax

UNION[ALL]
,MINUS,INTERSECT
String Functions
CONCAT

CONCAT(String1,String2 [..])

RTRIM

RTRIM('String')

LTRIM

LTRIN('String')

TRIM

TRIM('String')
SUBSTRING('String', Start integer, Length of
String)

SUBSTRING
Analytic function
COALESCE Similar to Case
IsNULL (allows only two
arguments)

Returns the first non-null expression in


the list
Replaces NULL with the specified
replacement value.

ROW_NUMBER

COALESCE ( expression [ ,...n ] )


ISNULL ( check_expression,
replacement_value )
ROW_NUMBER() OVER(ORDER BY Column
nam)

RANK

RANK() OVER (ORDER BY Col )

DENSE_RANK
ROW_NUMBER with
PARTITION

DENSE_RANK() OVER (ORDER BY Col )


ROW_NUMBER() OVER(PARTITION BY Col
ORDER BY Col ASC)

Ranking within your ordered partition


No ranks are skipped if there are ranks
with
multiple items
This is kinda like using a Row_number
with Group by

eg:CONVERT(VARCHAR(19),GETDATE())

General function that


converts an expression of one data
type to another

CONVERT()

Returns the sequential number

Null Functions
ISNULL(), NVL(), IFNULL() and
COALESCE()
Joins
Inner Join

Returns all rows when there is at least


one match in BOTH tables

Dimension Table

Return all rows from the left table, and


the matched rows from the right table
Return all rows from the right table,
and the matched rows from the left
table
Return all rows when there is a match
in ONE of the tables
If a table contains primary keys and it
gives the detailed info about business
then such a table
- Entry Points to the fact tables
- Typically in De-Normalized form
- Generally Static and descriptive fields
- Typically used by Group by in SQL
- Typically either Primarkey and
Dimensional attribu

Fact Table

A fact table which contains foreign keys


to dimension tables and numeric facts
- The term FACT represents a single
business measure. E.g. Sales, Qty Sold
- Facts can be detailed level facts or
summarized facts
- Typically the MOST NORMALIZED
TABLE in a dimensional model
- contain HUGE DATA VOLUMES
running into millions of rows

LEFT JOIN

RIGHT JOIN
FULL JOIN

Types of Dimension tables


Degnerated Dimension

Junk Dimeonsions

Slowley Changing Dimension


Conformed Dimension
Fast Chaning Dimensions

data that is dimensional in nature but


stored in a fact table
contain miscellaneous data like flags,
gender, text values etc which is not
useful for reporting
If the data values are changed slowly in
a column or in a row over the period of
time then that dimension table
If Dimension table shared with multiple
fact tables
Changes very fast Eg: Acc Bal,Income
etc

Types of Facts
Additive
Semi Additive
Non Additive
Transformations

Measures that can be added across all


dimensions
Measures that can be added across few
dimensions and not with others
Measures that cannot be added across
all dimensions
It is the process of transforming the
data into a required business format

Data Aggregation

process of integrating the data from


multiple input
Process of removing unwanted/error
out/inaccurate data
multiple detailed values are
summarized into a single unit

Data Purging

Earsing of the data completely

Data Profiling

examining data available from an


existing information source,collecting
stastics and summerise the data with
Aggrate functions (Min,MAX,AVG)

Data Merging
Data Cleansing/Scrubbing

Schemas

Star Schema

Snowflake Schema
Hybrid Schema/Galaxy
Schema/
Fact constellation

It has single fact table connected to


dimension tables like a star.
- The star schema is highly
denormalized
- Simple structure -> easy to
understand schema
- Relatively long time of loading data
into dimension tables (reduendent of
data)
- Performance less compare to Snow
flake
It is an extension of the star schema.In
snowflake schema, very large
dimension tables are normalized into
multiple tables. It is used when a
dimensional table becomes very big
- Highly normalized
- Complex compare to star schema
- Less time to load (due to normalized
data)
- Very good in performance
Combination of Star and Snowflake
schema

You might also like