Define Data Warehouse. Differentiate Between OLTP and OLAP Databases

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 6

1. Define Data Warehouse.

Differentiate between OLTP and OLAP databases

Datawarehouse provides architectures and tools for businesses executives or managers to


systematically organize, understand and use their data to make strategic decisions.
Many industries spend a lot of amount in building DWH.

DWH is large marketing weapon which can be used to retain users. It is subject oriented,
integrated, time variant and non-volatile.

OLTP vs OLAP databases:

 OLTP are used for online transactional and query processing focus on customers with
current data using ER database design
 OLAP is used for analysis purposes for knowledge workers focus on market data
which historical and categorical data using start+subject database design

S.No. Feature OLTP OLAP


1 Definition Online Transactional Online analytical
Processing Processing
2 Characteristic Operational processing Informational
processing
3 Orientation Transaction Analysis
4 users DBA, DB professional Knowledge workers
(Manager)
5 Function Day to day operation Historical (or) Decision
Support
6 DB Design ER based Design & Star/Snowflake
schemas &
Application oriented
Subject-oriented
7 Data Dynamic, current, Static, historical,
relational, detailed, summarized,
up to date consolidated,
multidimensional
8 View Detailed view Summarized view
9 Unit of work Short & simple Complex queries
transaction required
10 Access Read / Write Mostly read
11 No. Records Tens/Hundreds Millions
12 Users Thousands Hundreds

13 Db size 100MB to GB 100 GB to TB


14 Priority High performance High flexibility
15 Metrics Transaction throughput Query throughput
16 Focus data in Information out
17 Operations Indexing, hashing Lots of scans for
analysis

2. Compare star and snow flake schema representation of multi-dimensional model.

Star schema is the type of multidimensional model which is used for data warehouse. In
star schema, The fact tables and the dimension tables are contained. In this schema
fewer foreign-key join is used. This schema forms a star with fact table and dimension
tables.

Snowflake Schema is also the type of multidimensional model which is used for data
warehouse. In snowflake schema, The fact tables, dimension tables as well as sub
dimension tables are contained. This schema forms a snowflake with fact tables,
dimension tables as well as sub-dimension tables.

The differences are as follows:

S.NO STAR SCHEMA SNOWFLAKE SCHEMA

While in snowflake schema, The


In star schema, The fact fact tables, dimension tables as
tables and the dimension well as sub dimension tables
1. tables are contained. are contained.

Star schema is a top-down


2. model. While it is a bottom-up model.

Star schema uses more


3. space. While it uses less space.

While it takes more time than star


It takes less time for the schema for the execution of
4. execution of queries. queries.

In star schema,
Normalization is not While in this, Both normalization
5. used. and denormalization are used.

6. It’s design is very simple. While it’s design is complex.

7. The query complexity of While the query complexity of


snowflake schema is higher
star schema is low. than star schema.

It’s understanding is very While it’s understanding is


8. simple. difficult.

It has less number of foreign While it has more number of


9. keys. foreign keys.

10. It has high data redundancy. While it has low data redundancy.

3. Explain about OLAP operations with suitable examples.

OLAP operations allow user-friendly environment for interactive data analysis of Data
Cube. A number of OLAP operations existed to provide flexibility to view the data in
different perspective by allowing interactive querying & analysis of data.

 Ex:- Consider All Electronics Sales. The data cube contains item, time, and location
dimensions. Time is aggregated to Quarters and location is aggregated to city values.
The measure used is dollars-sold. There are 5 popular OLAP operations performed on
data cube.
 Slice:- Performs a selection on one dimension of given data cube resulting sub-cube.
Ex:- slice for time = “Q1”
 Dice:- Performs a selection on two/more dimensions of given data cube, resulting
sub-cube. Ex:- dice for (location = “Toronto” or “Vancouver”) and (time = “Q1” or
“Q2”)and (item=“Home Entertainment” or “computer”)
 Roll up (drill-up): Performs aggregation on data cube either by climbing up a concept
hierarchy for a dimension or dimension reduction. Ex:- roll-up on location (from
cities to countries)
 Drill down (roll down): reverse of roll-up, either by stepping down a concept
hierarchy for a dimension or introducing additional dimensions. Ex:- drill-down on
time (from quarters to months)
 Pivot:- is a visualization operation that rotates the data axes in view in order to
provide an alternative presentation of data. Ex:- pivot on item and location ( these
axes re rotated)

Other operations on data cube:

 Drill across: executes queries involving (across) more than one fact table
 Drill through: through the bottom level of the cube to its back-end relational tables
(using SQL).
 Some OLAP operations also used for ranking items in the list, currency conversions,
growth rates etc.

4. Draw a neat sketch of 3-tier Data Warehouse architecture and explain all the
components in it.

Data warehouses often adopt a three-tier architecture:

1. The bottom tier is a warehouse database server that is almost always a relational
database system. Back-end tools and utilities are used to feed data into the bottom tier
from operational databases or other external sources (such as customer profile
information provided by external consultants). These tools and utilities perform data
extraction, cleaning, and transformation (e.g., to merge similar data from different
sources into a unified format), as well as load and refresh functions to update the data
warehouse. The data are extracted using application program interfaces known as
gateways. A gateway is supported by the underlying DBMS and allows client
programs to generate SQL code to be executed at a server. Examples of gateways
include ODBC and OLEDB.
2. Middle Tier :- It is an OLAP server presents multidimensional data from DWH/Data
Marts. It includes ROLAP/MOLAP/HOLAP servers:

i) ROLAP: - Use relational DBMS to store and manage warehouse data and include
optimization of DBMS backend, implementation of aggregation navigation logic, and
additional tools and services. Greater Scalability. Ex:- Informix, Informatica
ii) MOLAP: - It is a special purpose server that directly implements multidimensional
model and operations with multidimensional storage engine. It allows fast indexing to
precomputed, summarized data. It uses sparse matrix compression techniques to store
data Ex: - Essbase of Arbor
iii) HOLAP: - The hybrid OLAP servers combines both ROLAP & MOLAP. It uses the
greater scalability of ROLAP and fast computation of MOLAP. Ex: - MS-SQL server
7.0

3. The top tier is a front-end client layer, which contains query and reporting tools,
analysis tools, and/or data mining tools (e.g., trend analysis, prediction, and so on).
4. Data extraction get data from multiple, heterogeneous, and external sources
5. Data cleaning detect errors in the data and rectify them when possible
6. Data transformation convert data from legacy or host format to warehouse format 
Load  sort, summarize, consolidate, compute views, check integrity, and build
indices and partitions  Refresh  propagate the updates from the data sources to the
warehouse
7. Meta data is the data defining warehouse objects. It stores: Description of the
structure of the data warehouse schema, view, dimensions, hierarchies, derived data
definition, data mart locations and contents Operational meta-data data lineage
(history of migrated data and transformation path), currency of data (active, archived,
or purged), monitoring information (warehouse usage statistics, error reports, audit
trails) The algorithms used for summarization The mapping from operational
environment to the data warehouse Data related to system performance warehouse
schema, view and derived data definitions Business data business terms and
definitions, ownership of data, charging policies
5. What is materialization? What are different types of materialization?

Typically, data flows from one or more online transaction processing (OLTP) database
into a data warehouse on a monthly, weekly, or daily basis. The data is normally
processed in a staging file before being added to the data warehouse. Data
warehouses commonly range in size from tens of gigabytes to a few terabytes.
Usually, the vast majority of the data is stored in a few very large fact tables.

One technique employed in data warehouses to improve performance is the creation of


summaries. Summaries are special types of aggregate views that improve query
execution times by pre calculating expensive joins and aggregation operations prior to
execution and storing the results in a table in the database. For example, you can
create a table to contain the sums of sales by region and by product.

The summaries or aggregates that are referred to in this book and in literature on data
warehousing are created in Oracle Database using a schema object called
a materialized view. Materialized views can perform a number of roles, such as
improving query performance or providing replicated data. The process of setting up
a materialized view is sometimes called materialization.

There are three choices for data cube materialization given a base cuboid:

1. No materialization: Do not precompute any of the “non-base” cuboids. This leads to


computing expensive multidimensional aggregates on the fly, which can be extremely
slow.
2. Full materialization: Precompute all of the cuboids. The resulting lattice of computed
cuboids is referred to as the full cube. This choice typically requires huge amounts of
memory space in order to store all of the precomputed cuboids.
3. Partial materialization: Selectively compute a proper subset of the whole set of
possible cuboids. Alternatively, we may compute a subset of the cube, which contains
only those cells that satisfy some user-specified criterion, such as where the tuple
count of each cell is above some threshold. We will use the term sub-cube to refer to
the latter case, where only some of the cells may be precomputed for various cuboids.
Partial materialization represents an interesting trade-off between storage space and
response time. The partial materialization of cuboids or sub-cubes should consider
three factors:
1. Identify the subset of cuboids or sub-cubes to materialize
2. Exploit the materialized cuboids or sub-cubes during query processing
3. Efficiently update the materialized cuboids or sub-cubes during load and
refresh.

6. Discuss about OLAM architecture and its benefits?

OLAM server performs Analytical Mining like how OLAP performs Analytical
Processing. An integrated OLAM & OLAP Architecture is shown below.

 Both OLAM and OLAP servers accept user queries via GUI API and work with data
cube via cube API.
 A meta data directory used to guide the access of data cube.
 The data cube constructed by accessing/ integrating multiple data bases via MDDB
API which support OLEDB (or) ODBC connections.
 OLAM server may perform multiple data mining tasks. This consisting of integrated
DM modules and these are more sophisticated than OLAP server.

You might also like