Ba Unit 2
Ba Unit 2
A data warehouse is a centralized repository for storing and managing large amounts of data
from various sources for analysis and reporting. It is optimized for fast querying and analysis,
enabling organizations to make informed decisions by providing a single source of truth for
data. Data warehousing typically involves transforming and integrating data from multiple
sources into a unified, organized, and consistent format.
The main job of a data warehouse is to synthesize the high amount of data produced by all of
a business’s systems into one accessible location. In a data warehouse, information flows in
continuously while analysts review it. This makes it possible for businesses to create reports
and dashboards that continuously monitor and improve business functions.
• Independent data marts act as a standalone system that doesn't rely on a data
warehouse. Analysts can extract data on a particular subject or business process 2from
internal or external data sources, process it, and then store it in a data mart repository
until the team needs it.
• Hybrid data marts combine data from existing data warehouses and other
operational sources. This unified approach leverages the speed and user-friendly
interface of a top-down approach and also offers the enterprise-level integration of the
independent method.
Snowflake
A snowflake schema is a logical extension of a star schema, building out the blueprint with
additional dimension tables. The dimension tables are normalized to protect data integrity
and minimize data redundancy.
While this method requires less space to store dimension tables, it is a complex structure
that can be difficult to maintain. The main benefit of using snowflake schema is the low
demand for disk space, but the caveat is a negative impact on performance due to the
additional tables.
Vault
Data vault is a modern database modeling technique that enables IT professionals to design
agile enterprise data warehouses. This approach enforces a layered structure and has been
developed specifically to combat issues with agility, flexibility, and scalability that arise
when using the other schema models.
Data vault eliminates star schema's need for cleansing and streamlines the addition of new
data sources without any disruption to existing schema.
Types of knowledge
Tacit knowledge: This type of knowledge is typically acquired through experience, and it
is intuitively understood. As a result, it is challenging to articulate and codify, making it
difficult to transfer this information to other individuals. Examples of tacit knowledge can
include language, facial recognition, or leadership skills.
Explicit knowledge: Explicit knowledge is captured within various document types such as
manuals, reports, and guides, allowing organizations to easily share knowledge across
teams. This type of knowledge is perhaps the most well-known and examples of it include
knowledge assets such as databases, white papers, and case studies. This form of knowledge
is important to retain intellectual capital within an organization as well as facilitate
successful knowledge transfer to new employees.
Implicit knowledge is the application of learned (explicit) knowledge. An example of
implicit knowledge includes a webinar about video conferencing software and then trying it
out with clients.
Declarative knowledge is factual information or static principles. An example of
declarative knowledge includes the date your company was founded.
Procedural knowledge is information describing how to do something. An example of
procedural knowledge includes how-to articles about setting up your email on a new device.
2. Marginal Analysis
A popular technique is known as marginal analysis. It involves weighing the costs and
benefits of each option to choose the one that will create the greatest value.
3. Pareto Analysis
Pareto analysis is a decision-making technique that can be used to identify the most
important factors in a given situation. Named after Italian economist Vilfredo Pareto, the
technique is based on the principle that 20% of the causes will produce 80% of the results.
4. Decision Matrix
Finally, the decision matrix is a tool that can be used to compare different options side -by-
side. By using these techniques, individuals can be sure that they are making sound decisions
that will lead to positive outcomes.
Classification of DSS
• Text Oriented DSS − It contains textually represented information that could have a
bearing on decision. It allows documents to be electronically created, revised and
viewed as needed.
• Database Oriented DSS − Database plays a major role here; it contains organized and
highly structured data.
• Spreadsheet Oriented DSS − It contains information in spread sheets that allows
create, view, modify procedural knowledge and also instructs the system to execute
self-contained instructions. The most popular tool is Excel and Lotus 1-2-3.
• Solver Oriented DSS − It is based on a solver, which is an algorithm or procedure
written for performing certain calculations and particular program type.
• Rules Oriented DSS − Procedures are adopted in rules oriented DSS. Export system
is the example.
OLAP
Online analytical processing (OLAP) is software technology you can use to analyze business
data from different points of view. Organizations collect and store data from multiple data
sources, such as websites, applications, smart meters, and internal systems. OLAP combines and
groups this data into categories to provide actionable insights for strategic planning. For
example, a retailer stores data about all the products it sells, such as color, size, cost, and location.
The retailer also collects customer purchase data, such as the name of the items ordered and total
sales value, in a different system. OLAP combines the datasets to answer questions such as which
color products are more popular or how product placement impacts sales.
Types of OLAP
MOLAP
Multidimensional online analytical processing (MOLAP) involves creating a data cube that
represents multidimensional data from a data warehouse. The MOLAP system stores
precalculated data in the hypercube. Data engineers use MOLAP because this type of OLAP
technology provides fast analysis.
ROLAP
Instead of using a data cube, relational online analytical processing (ROLAP) allows data
engineers to perform multidimensional data analysis on a relational database. ROLAP depends
on the data size and it takes higher processing time. ROLAP is suitable for analyzing extensive
and detailed data. However, ROLAP has slow query performance compared to MOLAP.
HOLAP
Hybrid online analytical processing (HOLAP) combines MOLAP and ROLAP to provide the
best of both architectures. HOLAP allows data engineers to quickly retrieve analytical results
from a data cube and extract detailed information from relational databases.
OLAP Architecture
Online analytical processing (OLAP) systems store multidimensional data by representing
information in more than two dimensions, or categories. Two-dimensional data involves
columns and rows, but multidimensional data has multiple characteristics.
Data engineers build a multidimensional OLAP system that consists of the following elements.
Data warehouse
A data warehouse collects information from different sources, including applications, files, and
databases. It processes the information using various tools so that the data is ready for analytical
purposes. For example, the data warehouse might collect information from a relational database
that stores data in tables of rows and columns.
ETL tools
Extract, transform, and load (ETL) tools are database processes that automatically retrieve,
change, and prepare the data to a format fit for analytical purposes. Data warehouses use ETL
to convert and standardize information from various sources before making it available to
OLAP tools.
OLAP server
An OLAP server is the underlying machine that powers the OLAP system. It uses ETL tools
to transform information in the relational databases and prepare them for OLAP operations.
OLAP database
An OLAP database is a separate database that connects to the data warehouse. Data engineers
sometimes use an OLAP database to prevent the data warehouse from being burdened by
OLAP analysis. They also use an OLAP database to make it easier to create OLAP data models.
OLAP cubes
A data cube is a model representing a multidimensional array of information. While it’s easier
to visualize it as a three-dimensional data model, most data cubes have more than three
dimensions. An OLAP cube, or hypercube, is the term for data cubes in an OLAP system.
OLAP cubes are rigid because you can't change the dimensions and underlying data once you
model it. For example, if you add the warehouse dimension to a cube with product, location,
and time dimensions, you have to remodel the entire cube.
OLAP analytic tools
Business analysts use OLAP tools to interact with the OLAP cube. They perform operations
such as slicing, dicing, and pivoting to gain deeper insights into specific information within the
OLAP cube.
OLAP operations
Business analysts perform several basic analytical operations with a multidimensional online analytical
processing (MOLAP) cube.
Roll up
In roll up, the online analytical processing (OLAP) system summarizes the data for specific attributes. In
other words, it shows less-detailed data. For example, you might view product sales according to New York,
California, London, and Tokyo. A roll-up operation would provide a view of the sales data based on
countries, such as the US, the UK, and Japan.
Drill down
Drill down is the opposite of the roll-up operation. Business analysts move downward in the concept
hierarchy and extract the details they require. For example, they can move from viewing sales data by years
to visualizing it by months.
Slice
Data engineers use the slice operation to create a two-dimensional view from the OLAP cube. For example,
a MOLAP cube sorts data according to products, cities, and months. By slicing the cube, data engineers can
create a spreadsheet-like table consisting of products and cities for a specific month.
Dice
Data engineers use the dice operation to create a smaller subcube from an OLAP cube. They determine the
required dimensions and build a smaller cube from the original hypercube.
Pivot
The pivot operation involves rotating the OLAP cube along one of its dimensions to get a different
perspective on the multidimensional data model. For example, a three-dimensional OLAP cube has the
following dimensions on the respective axes:
• X-axis—product
• Y-axis—location
• Z-axis—time
Upon a pivot, the OLAP cube has the following configuration:
• X-axis—location
• Y-axis—time
• Z-axis—product
Benefits of OLAP
Online analytical processing (OLAP) helps organizations process and benefit from a growing amount of
digital information. Some benefits of OLAP include the following.
Faster decision making
Businesses use OLAP to make quick and accurate decisions to remain competitive in a fast-paced economy.
Performing analytical queries on multiple relational databases is time consuming because the computer
system searches through multiple data tables. On the other hand, OLAP systems precalculate and integrate
data so business analysts can generate reports faster when needed.
Non-technical user support
OLAP systems make complex data analysis easier for non-technical business users. Business users can create
complex analytical calculations and generate reports instead of learning how to operate databases.
Integrated data view
OLAP provides a unified platform for marketing, finance, production, and other business units. Managers
and decision makers can see the bigger picture and effectively solve problems. They can perform what-if
analysis, which shows the impact of decisions taken by one department on other areas of the business.