Define Data Warehouse. Differentiate Between OLTP and OLAP Databases
Define Data Warehouse. Differentiate Between OLTP and OLAP Databases
Define Data Warehouse. Differentiate Between OLTP and OLAP Databases
DWH is large marketing weapon which can be used to retain users. It is subject oriented,
integrated, time variant and non-volatile.
OLTP are used for online transactional and query processing focus on customers with
current data using ER database design
OLAP is used for analysis purposes for knowledge workers focus on market data
which historical and categorical data using start+subject database design
Star schema is the type of multidimensional model which is used for data warehouse. In
star schema, The fact tables and the dimension tables are contained. In this schema
fewer foreign-key join is used. This schema forms a star with fact table and dimension
tables.
Snowflake Schema is also the type of multidimensional model which is used for data
warehouse. In snowflake schema, The fact tables, dimension tables as well as sub
dimension tables are contained. This schema forms a snowflake with fact tables,
dimension tables as well as sub-dimension tables.
In star schema,
Normalization is not While in this, Both normalization
5. used. and denormalization are used.
10. It has high data redundancy. While it has low data redundancy.
OLAP operations allow user-friendly environment for interactive data analysis of Data
Cube. A number of OLAP operations existed to provide flexibility to view the data in
different perspective by allowing interactive querying & analysis of data.
Ex:- Consider All Electronics Sales. The data cube contains item, time, and location
dimensions. Time is aggregated to Quarters and location is aggregated to city values.
The measure used is dollars-sold. There are 5 popular OLAP operations performed on
data cube.
Slice:- Performs a selection on one dimension of given data cube resulting sub-cube.
Ex:- slice for time = “Q1”
Dice:- Performs a selection on two/more dimensions of given data cube, resulting
sub-cube. Ex:- dice for (location = “Toronto” or “Vancouver”) and (time = “Q1” or
“Q2”)and (item=“Home Entertainment” or “computer”)
Roll up (drill-up): Performs aggregation on data cube either by climbing up a concept
hierarchy for a dimension or dimension reduction. Ex:- roll-up on location (from
cities to countries)
Drill down (roll down): reverse of roll-up, either by stepping down a concept
hierarchy for a dimension or introducing additional dimensions. Ex:- drill-down on
time (from quarters to months)
Pivot:- is a visualization operation that rotates the data axes in view in order to
provide an alternative presentation of data. Ex:- pivot on item and location ( these
axes re rotated)
Drill across: executes queries involving (across) more than one fact table
Drill through: through the bottom level of the cube to its back-end relational tables
(using SQL).
Some OLAP operations also used for ranking items in the list, currency conversions,
growth rates etc.
4. Draw a neat sketch of 3-tier Data Warehouse architecture and explain all the
components in it.
1. The bottom tier is a warehouse database server that is almost always a relational
database system. Back-end tools and utilities are used to feed data into the bottom tier
from operational databases or other external sources (such as customer profile
information provided by external consultants). These tools and utilities perform data
extraction, cleaning, and transformation (e.g., to merge similar data from different
sources into a unified format), as well as load and refresh functions to update the data
warehouse. The data are extracted using application program interfaces known as
gateways. A gateway is supported by the underlying DBMS and allows client
programs to generate SQL code to be executed at a server. Examples of gateways
include ODBC and OLEDB.
2. Middle Tier :- It is an OLAP server presents multidimensional data from DWH/Data
Marts. It includes ROLAP/MOLAP/HOLAP servers:
i) ROLAP: - Use relational DBMS to store and manage warehouse data and include
optimization of DBMS backend, implementation of aggregation navigation logic, and
additional tools and services. Greater Scalability. Ex:- Informix, Informatica
ii) MOLAP: - It is a special purpose server that directly implements multidimensional
model and operations with multidimensional storage engine. It allows fast indexing to
precomputed, summarized data. It uses sparse matrix compression techniques to store
data Ex: - Essbase of Arbor
iii) HOLAP: - The hybrid OLAP servers combines both ROLAP & MOLAP. It uses the
greater scalability of ROLAP and fast computation of MOLAP. Ex: - MS-SQL server
7.0
3. The top tier is a front-end client layer, which contains query and reporting tools,
analysis tools, and/or data mining tools (e.g., trend analysis, prediction, and so on).
4. Data extraction get data from multiple, heterogeneous, and external sources
5. Data cleaning detect errors in the data and rectify them when possible
6. Data transformation convert data from legacy or host format to warehouse format
Load sort, summarize, consolidate, compute views, check integrity, and build
indices and partitions Refresh propagate the updates from the data sources to the
warehouse
7. Meta data is the data defining warehouse objects. It stores: Description of the
structure of the data warehouse schema, view, dimensions, hierarchies, derived data
definition, data mart locations and contents Operational meta-data data lineage
(history of migrated data and transformation path), currency of data (active, archived,
or purged), monitoring information (warehouse usage statistics, error reports, audit
trails) The algorithms used for summarization The mapping from operational
environment to the data warehouse Data related to system performance warehouse
schema, view and derived data definitions Business data business terms and
definitions, ownership of data, charging policies
5. What is materialization? What are different types of materialization?
Typically, data flows from one or more online transaction processing (OLTP) database
into a data warehouse on a monthly, weekly, or daily basis. The data is normally
processed in a staging file before being added to the data warehouse. Data
warehouses commonly range in size from tens of gigabytes to a few terabytes.
Usually, the vast majority of the data is stored in a few very large fact tables.
The summaries or aggregates that are referred to in this book and in literature on data
warehousing are created in Oracle Database using a schema object called
a materialized view. Materialized views can perform a number of roles, such as
improving query performance or providing replicated data. The process of setting up
a materialized view is sometimes called materialization.
There are three choices for data cube materialization given a base cuboid:
OLAM server performs Analytical Mining like how OLAP performs Analytical
Processing. An integrated OLAM & OLAP Architecture is shown below.
Both OLAM and OLAP servers accept user queries via GUI API and work with data
cube via cube API.
A meta data directory used to guide the access of data cube.
The data cube constructed by accessing/ integrating multiple data bases via MDDB
API which support OLEDB (or) ODBC connections.
OLAM server may perform multiple data mining tasks. This consisting of integrated
DM modules and these are more sophisticated than OLAP server.