
Basics of Data Analytics

Parwaaz Solutions

Contents
1. Data Analytics Basics
   What Is Data Analysis?
   Overview of data analysis
   Roles in data
   Data Analytics Techniques
   Exploratory Data Analysis (EDA)
   What is Data Mining?
   What is Data Profiling?
2. Data Integration
   What Is Data Integration?
   ETL Vs ELT
   Database
   Object Storage
   Data Lake
   OLTP Vs OLAP
3. Data Modelling and Data Preparation
   Data Warehouse
   Business Intelligence
   KPI
   Master Data Vs Transactional Data
   Star Schema Vs Snowflake Schema
   Normalization vs. denormalization
   Elements of Data Model
   What Is Data Blending?
   Data Blending Vs Data Joining
   Data Model Vs Data Sets
   Data Preparation
4. Reporting
   Types of Reports
   Reporting Elements
   Type of Visualisation
   Best Practices for Visualisation
   Colour Standards


1. Data Analytics Basics


What Is Data Analysis?

Data analysts are essential in helping organizations gain valuable insights from the expanse of data that
they have, and they work closely with others in the organization to help reveal valuable information.

There are five key areas that you'll engage in during the data analysis process: prepare, model, visualize, analyze, and manage.

Prepare

Before a report can be created, data must be prepared. Data preparation is the process of profiling,
cleaning, and transforming your data to get it ready to model and visualize.

Data preparation is the process of taking raw data and turning it into information that is trusted and
understandable. It involves, among other things, ensuring the integrity of the data, correcting wrong
or inaccurate data, identifying missing data, converting data from one structure to another or from one
type to another, or even a task as simple as making data more readable.

Data preparation also involves understanding how you're going to get and connect to the data and the
performance implications of the decisions. When connecting to data, you need to make decisions to
ensure that models and reports meet, and perform to, acknowledged requirements and expectations.

Model

When the data is in a proper state, it's ready to be modelled. Data modeling is the process of
determining how your tables are related to each other. This process is done by defining and creating
relationships between the tables. From that point, you can enhance the model by defining metrics and
adding custom calculations to enrich your data.

An effective data model makes reports more accurate, allows the data to be explored faster and more
efficiently, decreases the time needed to write reports, and simplifies future report maintenance.

The model is another critical component that has a direct effect on the performance of your report and
overall data analysis. A poorly designed model can have a drastically negative impact on the general
accuracy and performance of your report. Conversely, a well-designed model with well-prepared data


will ensure a properly efficient and trusted report. This notion is more prevalent when you are working
with data at scale.

From a Power BI perspective, if your report is performing slowly, or your refreshes are taking a long
time, you will likely need to revisit the data preparation and modeling tasks to optimize your report.

The process of preparing data and modeling data is an iterative process. Data preparation is the first
task in data analysis. Understanding and preparing your data before you model it will make the
modeling step much easier.

Visualize

The visualization task is where you get to bring your data to life. A well-designed report should tell a
compelling story about that data, which will enable business decision makers to quickly gain needed
insights. By using appropriate visualizations and interactions, you can provide an effective report that
guides the reader through the content quickly and efficiently, therefore allowing the reader to follow
a narrative into the data.

With the built-in AI capabilities in Power BI, data analysts can build powerful reports, without writing
any code, that enable users to get insights and answers and find actionable objectives. The AI
capabilities in Power BI, such as the built-in AI visuals, enable the discovering of data by asking
questions, using the Quick Insights feature, or creating machine learning models directly within Power
BI.

An important aspect of visualizing data is designing and creating reports for accessibility. As you build
reports, it is important to think about people who will be accessing and reading the reports. Reports
should be designed with accessibility in mind from the outset so that no special modifications are
needed in the future.

Many components of your report will help with storytelling. From a color scheme that is
complementary and accessible, to fonts and sizing, to picking the right visuals for what is being
displayed, they all come together to tell that story.

Analyze

The analyze task is the important step of understanding and interpreting the information that is
displayed on the report. In your role as a data analyst, you should understand the analytical capabilities
of Power BI and use those capabilities to find insights, identify patterns and trends, predict outcomes,
and then communicate those insights in a way that everyone can understand.

Advanced analytics enables businesses and organizations to ultimately drive better decisions
throughout the business and create actionable insights and meaningful results. With advanced
analytics, organizations can drill into the data to predict future patterns and trends, identify activities
and behaviours, and enable businesses to ask the appropriate questions about their data.


This feature is another area where AI integrations within Power BI can take your analysis to the next
level. Integrations with Azure Machine Learning, cognitive services, and built-in AI visuals will help to
enrich your data and analysis.

Manage

Power BI consists of many components, including reports, dashboards, workspaces, datasets, and
more. As a data analyst, you are responsible for the management of these Power BI assets, overseeing
the sharing and distribution of items, such as reports and dashboards, and ensuring the security of
Power BI assets.

The management of your content helps to foster collaboration between teams and individuals. Sharing
and discovery of your content is important for the right people to get the answers that they need. It is
also important to help ensure that items are secure. You want to make sure that the right people have
access and that data is not exposed beyond the intended stakeholders.

Proper management can also help reduce data silos within your organization. Data duplication can make data
difficult to manage and can introduce data latency when resources are overused. Power BI helps
reduce data silos with the use of shared datasets, and it allows you to reuse data that you have
prepared and modelled. For key business data, endorsing a dataset as certified can help to ensure trust
in that data.

Overview of data analysis


Before data can be used to tell a story, it must be run through a process that makes it usable in the story.

Data analysis is the process of identifying, cleaning, transforming, and modelling data to discover meaningful
and useful information. The data is then crafted into a story through reports for analysis to support the critical
decision-making process.

Data-driven businesses make decisions based on the story that their data tells, and in today's data-driven world,
data is not being used to its full potential, a challenge that most businesses face. Data analysis is, and should be,
a critical aspect of all organizations to help determine the impact to their business, including evaluating customer
sentiment, performing market and product research, and identifying trends or other data insights.

While the process of data analysis focuses on the tasks of cleaning, modeling, and visualizing data, the concept
of data analysis and its importance to business should not be understated. To analyze data, core components of
analytics are divided into the following categories:

Descriptive

Diagnostic

Predictive

Prescriptive

Cognitive


Descriptive analytics

Descriptive analytics help answer questions about what has happened based on historical data. Descriptive
analytics techniques summarize large datasets to describe outcomes to stakeholders.

By developing key performance indicators (KPIs), these strategies can help track the success or failure of key
objectives. Metrics such as return on investment (ROI) are used in many industries, and specialized metrics are
developed to track performance in specific industries.

An example of descriptive analytics is generating reports to provide a view of an organization's sales and financial
data.

Diagnostic analytics

Diagnostic analytics help answer questions about why events happened. Diagnostic analytics techniques
supplement basic descriptive analytics, and they use the findings from descriptive analytics to discover the cause
of these events. Then, performance indicators are further investigated to discover why these events improved
or became worse. Generally, this process occurs in three steps:

Identify anomalies in the data. These anomalies might be unexpected changes in a metric or a particular market.

Collect data that's related to these anomalies.

Use statistical techniques to discover relationships and trends that explain these anomalies.

Predictive analytics

Predictive analytics help answer questions about what will happen in the future. Predictive analytics techniques
use historical data to identify trends and determine if they're likely to recur. Predictive analytical tools provide
valuable insight into what might happen in the future. Techniques include a variety of statistical and machine
learning techniques such as neural networks, decision trees, and regression.

Prescriptive analytics

Prescriptive analytics help answer questions about which actions should be taken to achieve a goal or target. By
using insights from prescriptive analytics, organizations can make data-driven decisions. This technique allows
businesses to make informed decisions in the face of uncertainty. Prescriptive analytics techniques rely on
machine learning as one of the strategies to find patterns in large datasets. By analysing past decisions and
events, organizations can estimate the likelihood of different outcomes.

Cognitive analytics

Cognitive analytics attempt to draw inferences from existing data and patterns, derive conclusions based on
existing knowledge bases, and then add these findings back into the knowledge base for future inferences, a self-
learning feedback loop. Cognitive analytics help you learn what might happen if circumstances change and
determine how you might handle these situations.

Example

By enabling reporting and data visualizations, a retail business uses descriptive analytics to look at patterns of
purchases from previous years to determine what products might be popular next year. The company might also
look at supporting data to understand why a particular product was popular and if that trend is continuing, which
will help them determine whether to continue stocking that product.


A business might determine that a certain product was popular over a specific timeframe. Then, they can use
this analysis to determine whether certain marketing efforts or online social activities contributed to the sales
increase.

An underlying facet of data analysis is that a business needs to trust its data. As a practice, the data analysis
process will capture data from trusted sources and shape it into something that is consumable, meaningful, and
easily understood to help with the decision-making process. Data analysis enables businesses to fully understand
their data through data-driven processes and decisions, allowing them to be confident in their decisions.

As the amount of data grows, so does the need for data analysts. A data analyst knows how to organize
information and distill it into something relevant and comprehensible. A data analyst knows how to gather the
right data and what to do with it, in other words, making sense of the data in your data overload.

Roles in data

Telling a story with the data is a journey that usually doesn't start with you. The data must come from
somewhere. Getting that data into a place that is usable by you takes effort that is likely out of your scope,
especially in consideration of the enterprise.

In the recent past, roles such as business analysts and business intelligence developers were the standard for
data processing and understanding. However, excessive expansion of the size and different types of data has
caused these roles to evolve into more specialized sets of skills that modernize and streamline the processes of
data engineering and analysis.

The following sections highlight these different roles in data and the specific responsibility in the overall
spectrum of data discovery and understanding:

Business analyst

While some similarities exist between a data analyst and business analyst, the key differentiator between the
two roles is what they do with data. A business analyst is closer to the business and is a specialist in interpreting
the data that comes from the visualization. Often, the roles of data analyst and business analyst could be the
responsibility of a single person.

Data analyst

A data analyst enables businesses to maximize the value of their data assets through visualization and reporting
tools such as Microsoft Power BI. Data analysts are responsible for profiling, cleaning, and transforming data.
Their responsibilities also include designing and building scalable and effective data models, and enabling and
implementing the advanced analytics capabilities into reports for analysis. A data analyst works with the
pertinent stakeholders to identify appropriate and necessary data and reporting requirements, and then they
are tasked with turning raw data into relevant and meaningful insights.


A data analyst is also responsible for the management of Power BI assets, including reports, dashboards,
workspaces, and the underlying datasets that are used in the reports. They are tasked with implementing and
configuring proper security procedures, in conjunction with stakeholder requirements, to ensure the
safekeeping of all Power BI assets and their data.

Data engineer

Data engineers provision and set up data platform technologies that are on-premises and in the cloud. They
manage and secure the flow of structured and unstructured data from multiple sources. The data platforms that
they use can include relational databases, nonrelational databases, data streams, and file stores. Data engineers
also ensure that data services securely and seamlessly integrate across data platforms.

Primary responsibilities of data engineers include the use of on-premises and cloud data services and tools to
ingest, egress, and transform data from multiple sources. Data engineers collaborate with business stakeholders
to identify and meet data requirements. They design and implement solutions.

While some alignment might exist in the tasks and responsibilities of a data engineer and a database
administrator, a data engineer's scope of work goes well beyond looking after a database and the server where
it's hosted and likely doesn't include the overall operational data management.

As a data analyst, you would work closely with a data engineer in making sure that you can access the variety of
structured and unstructured data sources because they will support you in optimizing data models, which are
typically served from a modern data warehouse or data lake.

Data scientist

Data scientists perform advanced analytics to extract value from data. Their work can vary from descriptive
analytics to predictive analytics. Descriptive analytics evaluate data through a process known as exploratory data
analysis (EDA). Predictive analytics are used in machine learning to apply modeling techniques that can detect
anomalies or patterns. These analytics are important parts of forecast models.

Descriptive and predictive analytics are only partial aspects of data scientists' work. Some data scientists might
work in the realm of deep learning, performing iterative experiments to solve a complex data problem by using
customized algorithms.

Anecdotal evidence suggests that most of the work in a data science project is spent on data wrangling and
feature engineering. Data scientists can speed up the experimentation process when data engineers use their
skills to successfully wrangle data.

On the surface, it might seem that a data scientist and data analyst are far apart in the work that they do, but
this conjecture is untrue. A data scientist looks at data to determine the questions that need answers and will
often devise a hypothesis or an experiment and then turn to the data analyst to assist with the data visualization
and reporting.

Database administrator

A database administrator is responsible for the overall availability and consistent performance and optimizations
of the database solutions. They work with stakeholders to identify and implement the policies, tools, and
processes for data backup and recovery plans.


The role of a database administrator is different from the role of a data engineer. A database administrator
monitors and manages the overall health of a database and the hardware that it resides on, whereas a data
engineer is involved in the process of data wrangling, in other words, ingesting, transforming, validating, and
cleaning data to meet business needs and requirements.

Data Analytics Techniques


There are several different analytical methods and techniques data analysts can use to process data and extract
information. Some of the most popular methods are listed below.
Regression analysis entails analysing the relationship between dependent and independent variables to determine how a change
in one may affect a change in another.
Factor analysis entails taking a large data set and shrinking it to a smaller data set. The goal of this manoeuvre
is to attempt to discover hidden trends that would otherwise have been more difficult to see.
Cohort analysis is the process of breaking a data set into groups of similar data, often broken into a customer
demographic. This allows data analysts and other users of data analytics to further dive into the numbers relating
to a specific subset of data.
Monte Carlo simulations model the probability of different outcomes happening. Often used for risk mitigation
and loss prevention, these simulations incorporate multiple values and variables and often have greater
forecasting capabilities than other data analytics approaches.
Time series analysis tracks data over time and solidifies the relationship between the value of a data point and
the occurrence of the data point. This data analysis technique is usually used to spot cyclical trends or to project
financial forecasts.
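To make the Monte Carlo technique concrete, here is a minimal sketch in Python with NumPy that estimates the probability of a monthly loss; the demand and cost distributions and every parameter value are invented for illustration, not taken from any real forecast.

```python
# A minimal Monte Carlo sketch: estimate the probability that monthly
# profit is negative, given assumed (hypothetical) distributions for
# demand and unit cost.
import numpy as np

rng = np.random.default_rng(42)
n_trials = 100_000

units_sold = rng.normal(loc=1000, scale=200, size=n_trials)  # demand uncertainty
unit_price = 25.0
unit_cost = rng.normal(loc=18, scale=3, size=n_trials)       # cost uncertainty
fixed_costs = 5000.0

profit = units_sold * (unit_price - unit_cost) - fixed_costs
print("Estimated probability of a loss:", (profit < 0).mean())
```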

Exploratory Data Analysis (EDA)


Let’s suppose we want to build a data science project on the employee churn rate of a company. Before we
build a model on this data, we have to analyze all the information present in the dataset, such as the salary
distribution of employees, the bonuses they receive, their starting times, and their assigned teams. All these
steps of analyzing and modifying the data come under EDA.
Exploratory Data Analysis (EDA) is an approach that is used to analyze the data and discover trends, patterns, or
check assumptions in data with the help of statistical summaries and graphical representations.
Types of EDA
Depending on the number of columns we are analysing, we can divide EDA into the following types.
Univariate Analysis – In univariate analysis, we analyse or deal with only one variable at a time. The analysis of
univariate data is thus the simplest form of analysis since the information deals with only one quantity that
changes. It does not deal with causes or relationships and the main purpose of the analysis is to describe the
data and find patterns that exist within it.
Bi-Variate analysis – This type of data involves two different variables. The analysis of this type of data deals
with causes and relationships and the analysis is done to find out the relationship between the two variables.
Multivariate Analysis – When the data involves three or more variables, it is categorized under multivariate.
Depending on the type of analysis, we can also subcategorize EDA into two parts.
Non-graphical Analysis – In non-graphical analysis, we analyse data using statistical measures such as the mean,
median, mode, or skewness.
Graphical Analysis – In graphical analysis, we use charts and visualizations to identify trends and patterns in the data.
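As an illustration of these EDA types, the following minimal sketch uses pandas on a small made-up employee table; the column names and values are hypothetical, and the plotting calls assume matplotlib is installed (pandas uses it for charts).

```python
# A minimal EDA sketch with pandas on a hypothetical employee dataset.
import pandas as pd

df = pd.DataFrame({
    "salary": [52000, 61000, 58000, 75000, 49000, 66000],
    "bonus":  [3000, 5000, 4000, 9000, 2500, 6000],
    "team":   ["Sales", "IT", "Sales", "IT", "HR", "Sales"],
})

# Univariate analysis: one variable at a time.
print(df["salary"].describe())          # mean, median (50%), spread
print(df["salary"].skew())              # skewness of the distribution
print(df["team"].value_counts())        # frequency of each category

# Bivariate analysis: the relationship between two variables.
print(df[["salary", "bonus"]].corr())   # correlation matrix

# Graphical analysis: quick visual checks.
df["salary"].hist()                     # distribution of salaries
df.plot.scatter(x="salary", y="bonus")  # salary vs. bonus
```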


What is Data Mining?


Data Mining is a process of identifying patterns and correlations within large datasets to derive more useful bits
of knowledge. These meaningful bits of knowledge can then be fed into the more general areas of Business
Intelligence. The need to understand the large, complex datasets is common to virtually all fields of business,
science and engineering. The whole process of applying computer-based methodologies, including new
technologies, to extract useful information hidden in the data is called data mining. It evaluates a large
collection of raw data and turns it into information. Data mining is the search for new, valuable, and non-trivial
knowledge in large datasets, and the use of that information to uncover relationships and hidden patterns in
those datasets. Simply put, data mining is knowledge mining from data.

What is Data Profiling?


Data Profiling is a process of analysing raw data from existing datasets for the purpose of collecting statistics or
informative summaries about the data. It refers to a set of activities designed to determine the metadata of a
given dataset when it is not available and to validate metadata when it is available within a dataset. These
metadata, such as statistics about the data or dependencies among columns, can help understand and manage
new datasets. Some data profiling techniques can be applied to any data type, while others are type-specific. This is
different from data analysis, which is used to derive business information from data. Data profiling is used
to derive information about the data itself and to assess the quality of the data in order to discover anomalies in
the dataset. Moreover, it helps understand and prepare data for subsequent cleansing, integration, and analysis.
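A minimal profiling sketch with pandas follows; the input file name customers.csv is hypothetical. Note that it collects metadata and statistics about the columns themselves rather than business insights from the data.

```python
# A minimal data-profiling sketch: summarize dtypes, completeness,
# and distinctness per column of a (hypothetical) dataset.
import pandas as pd

df = pd.read_csv("customers.csv")  # hypothetical input file

profile = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "non_null": df.notna().sum(),
    "null_pct": (df.isna().mean() * 100).round(2),
    "distinct": df.nunique(),
})
print(profile)
print(df.describe(include="all"))  # summary statistics per column
```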

2. Data Integration
What Is Data Integration?
Data integration combines various types and formats of data from any source across an organization into a data
lake or data warehouse to provide a unified fact base for analytics. Working from this one data set allows
businesses to make better decisions, aligns departments to work better together, and drives better customer
experience.

What is meant by data integration?

Data integration means consolidating data from multiple sources into a single dataset to be used for consistent
business intelligence or analytics.

This is a very simple explanation of a complex topic that has evolved over its 30-year history. Understanding
how data integration has transitioned from a back-end, retrospective process into core real-time infrastructure
starts with an examination of how data integration works.

How Does Data Integration Work?


To move data from one system to another requires a data pipeline that understands the structure and meaning
of the data as well as defines the path it will take through the technical systems. A relatively simple and common
type of data integration is data ingestion, where data from one system is integrated on a regular basis into
another system. Data integration may also include cleansing, sorting, enrichment and other processes to make
the data ready for use at its final destination.

ETL Vs ELT
Sometimes this happens before the data is stored, and the process is called ETL (extract, transform, load). Other
times it makes more sense to store the data first and then prepare it for use, a process known as ELT (extract, load,
transform). And in yet other cases, data is transformed and conformed where it is stored, without moving it per se.
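The sketch below illustrates a tiny ETL pipeline in Python with pandas and SQLite; the file, column, and table names are hypothetical. In an ELT variant, the raw rows would be loaded into the target first and transformed there, typically with SQL.

```python
# A minimal ETL sketch: extract from a CSV, transform in memory,
# then load into SQLite.
import sqlite3
import pandas as pd

# Extract
raw = pd.read_csv("orders_raw.csv")  # hypothetical source file

# Transform: cleanse and conform before loading (the "T" before the "L").
raw["order_date"] = pd.to_datetime(raw["order_date"], errors="coerce")
clean = raw.dropna(subset=["order_id", "order_date"]).drop_duplicates("order_id")

# Load
with sqlite3.connect("warehouse.db") as conn:
    clean.to_sql("orders", conn, if_exists="replace", index=False)
```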

Common ETL tools include Informatica, Talend, Microsoft Azure Data Factory, IBM InfoSphere DataStage, and Oracle Data Integrator.

The steps, in part, depend on how the data will be stored. The most common types of data storage are:

Database
The simplest and most familiar way to store data includes both relational databases and NoSQL data stores and
may not require data transformation at all.

Object Storage
Stores large amounts of unstructured data, such as sensor data, audio and video files, photos, etc., in their native
format in simple, self-contained repositories that include the data, metadata, and a unique ID number. The
metadata and ID number allow applications to locate and access the data.

Data Lake
Collects raw and unstructured data in a single storage system, often object storage, to be transformed and used
later. Data lakes hold vast amounts of a wide variety of data types and make processing big data and applying
machine learning and AI possible.

OLTP Vs OLAP

Category    | OLAP (Online Analytical Processing)                          | OLTP (Online Transaction Processing)
Definition  | Well known as an online database query management system.    | Well known as an online database modifying system.
Data source | Consists of historical data from various databases.          | Consists of only operational current data.
Method used | Makes use of a data warehouse.                               | Makes use of a standard database management system (DBMS).
Application | Subject-oriented; used for data mining, analytics, decision making, etc. | Application-oriented; used for business tasks.

3. Data Modelling and Data Preparation


Data Warehouse
A data warehouse, or enterprise data warehouse (EDW), is a system that aggregates data from different sources
into a single, central, consistent data store to support data analysis, data mining, artificial intelligence (AI), and
machine learning. A data warehouse system enables an organization to run powerful analytics on huge volumes
of historical data (petabytes) in ways that a standard database cannot.


Business Intelligence
Business intelligence combines business analytics, data mining, data visualization, data tools and infrastructure,
and best practices to help organizations make more data-driven decisions.

The overall objective of business intelligence is to allow a business to make informed decisions.

KPI
Key performance indicators (KPIs) refer to a set of quantifiable measurements used to gauge a company’s overall
long-term performance. KPIs specifically help determine a company’s strategic, financial, and operational
achievements, especially compared to those of other businesses within the same sector.

Example: “targeted new customers per month”.

Master Data Vs Transactional Data

Master data describes the core business entities that transactions refer to, such as customers, products, suppliers, and locations; it is relatively stable over time. Transactional data records the business events themselves, such as orders, invoices, and payments, and it grows continuously as the business operates.

Star Schema Vs Snowflake Schema


A star schema is a type of multidimensional model used for data warehouses. It contains fact tables and
dimension tables and uses fewer foreign-key joins. The schema forms a star, with a fact table at the centre
surrounded by dimension tables.


Snowflake Schema: A snowflake schema is also a type of multidimensional model used for data warehouses. It
contains fact tables, dimension tables, and sub-dimension tables; the dimension tables are further normalized
into sub-dimensions, so the schema branches out like a snowflake.

Normalization vs. denormalization

To understand some star schema concepts described in this section, it's important to know two terms:
normalization and denormalization.

Normalization is the term used to describe data that's stored in a way that reduces repetitious data. Consider a
table of products that has a unique key value column, like the product key, and additional columns describing
product characteristics, including product name, category, color, and size. A sales table is considered
normalized when it stores only keys, like the product key. In the following image, notice that only
the ProductKey column records the product.

If, however, the sales table stores product details beyond the key, it's considered denormalized. In the
following image, notice that the ProductKey and other product-related columns record the product.


When you source data from an export file or data extract, it's likely that it represents a denormalized set of
data. In this case, use Power Query to transform and shape the source data into multiple normalized tables.

As described in this section, you should strive to develop optimized Power BI data models with tables that
represent normalized fact and dimension data. However, there's one exception where a snowflake
dimension should be denormalized to produce a single model table.
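To make the normalization example concrete, this minimal pandas sketch splits a denormalized sales extract into a product dimension table and a normalized fact table, then joins them back on the key when a denormalized view is needed; all names and values are hypothetical.

```python
# A minimal normalization sketch: split a flat sales extract into a
# dimension table and a keys-plus-measures fact table.
import pandas as pd

denormalized = pd.DataFrame({
    "SaleID": [1, 2, 3],
    "ProductKey": [10, 10, 20],
    "ProductName": ["Bike", "Bike", "Helmet"],
    "Category": ["Bikes", "Bikes", "Accessories"],
    "Amount": [250.0, 250.0, 40.0],
})

# Dimension table: one row per product, all descriptive columns.
dim_product = (
    denormalized[["ProductKey", "ProductName", "Category"]]
    .drop_duplicates()
)

# Fact table: keys and measures only (normalized).
fact_sales = denormalized[["SaleID", "ProductKey", "Amount"]]

# Denormalize again when needed by joining back on the key.
rejoined = fact_sales.merge(dim_product, on="ProductKey")
print(rejoined)
```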

Elements of Data Model:


Fact (Measures/Key Figures)
Facts are the measurements or metrics from your business process. For a Sales business process, a
measurement would be the quarterly sales number.
Dimension
A dimension provides the context surrounding a business process event. In simple terms, dimensions give the who, what,
and where of a fact. In the Sales business process, for the fact quarterly sales number, the dimensions would be:
Who – Customer Names
Where – Location
What – Product Name
In other words, a dimension is a window to view information in the facts.
Attributes
Attributes are the various characteristics of a dimension in dimensional data modelling.
In the Location dimension, the attributes can be:
State
Country
Zipcode, etc.
Attributes are used to search, filter, or classify facts. Dimension tables contain attributes.
Fact Table
A fact table is the primary table in dimensional modelling.
A fact table contains:


• Measurements/facts
• Foreign keys to dimension tables

What Is Data Blending?


Data blending is the process of combining data from multiple sources to create an actionable analytic dataset
for business decision-making or for driving a specific business process. This process allows organizations to
obtain value from a variety of sources and create deeper analyses.

Data blending differs from data integration and data warehousing in that its primary use is not to create a single
version of the truth that’s stored in data warehouses or other systems of record within an organization. Instead,
this process is conducted by a business or data analyst with the goal of building an analytic dataset to help
answer specific business questions.

Merge

Data merging is the process of combining two or more similar records into a single one. Merging is done to add
variables to a dataset, append or add cases or observations to a dataset, or remove duplicates and other
incorrect information.

Merging data into a single point is necessary in certain situations, especially when an organization needs to add
new cases, variables, or data based on lookup values.

Append

A data append is a process that involves adding new data elements to an existing database. An example of a
common data append would be the enhancement of a company’s customer files. Companies often collect basic
information on their clients, such as phone numbers, emails, or addresses. A data append takes the information
they have and matches it against a larger database of business data, allowing the desired missing data fields to
be added.
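The following minimal pandas sketch shows both operations; the columns and values are hypothetical. The merge adds variables through a lookup key, while the append adds new cases (rows).

```python
# A minimal sketch of merge vs. append with pandas.
import pandas as pd

customers = pd.DataFrame({"cust_id": [1, 2], "name": ["Ana", "Ben"]})
emails = pd.DataFrame({"cust_id": [1, 2], "email": ["a@x.com", "b@x.com"]})

# Merge: add variables based on lookup values (the customer key).
merged = customers.merge(emails, on="cust_id", how="left")

# Append: add new cases or observations to the dataset.
new_customers = pd.DataFrame({"cust_id": [3], "name": ["Cho"]})
appended = pd.concat([customers, new_customers], ignore_index=True)

print(merged)
print(appended.drop_duplicates())  # remove duplicates after appending
```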

Data Blending Vs Data Joining


Data Blending: Combining data from two or more different sources is data blending. We can combine data
from two or more varieties of sources, such as Oracle, Excel, SQL Server, and others.

Example: Combining the Oracle Table with SQL Server Table (OR) Combining Two Tables from Oracle Database
(OR) Combining Two Sheets from Excel (OR) Combining Excel Sheet and Oracle Table and so on.

Data Joining: Combining the data between two or more tables or sheets within the same Data Source is Data
Joining.

Example: Combining Two Tables from the same Oracle Database (or) DB2 (or) Sybase (or) SQL Server and others.
Combining two or more worksheets from the same Excel file.

In data joining, all the combined tables or sheets contain a common set of dimensions and measures.
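As a rough illustration (with hypothetical file, database, table, and column names), blending reads from two different source types before combining them, while joining combines tables inside one and the same source:

```python
# A minimal sketch of blending vs. joining.
import sqlite3
import pandas as pd

# Blending: a file source combined with a database source.
targets = pd.read_csv("targets.csv")                  # file source
with sqlite3.connect("sales.db") as conn:
    actuals = pd.read_sql("SELECT region, sales FROM actuals", conn)
blended = actuals.merge(targets, on="region")

# Joining: two tables from the same database.
with sqlite3.connect("sales.db") as conn:
    joined = pd.read_sql(
        "SELECT o.order_id, c.name "
        "FROM orders o JOIN customers c ON o.cust_id = c.cust_id",
        conn,
    )
```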

Data Model Vs Data Sets


Data Models


A data model is a representation of the business data of an organization or business segment. It sits on top of
the physical tables. You can use a model as the basis for your story.

The following types of data models exist:

Acquired: Data is imported (copied) and stored in the cloud. Changes made to the data in the source system don’t
affect the imported data.

Live or Federated: Data is stored in the source system. It isn’t copied to the cloud, so any changes in the source data
are available immediately, provided no structural changes are made to the table or SQL view.

Datasets

A dataset is a simple collection of data, usually presented in a table. You can use a dataset as the basis for your
story, and as a data source.

Data Models complement datasets or vice versa. Datasets are more suitable for ad-hoc analysis, while data
models are more suitable for governed-data use cases.

Data Preparation:
Data preparation, also known as data preprocessing, is a crucial step in the data analysis and machine learning
pipeline. It involves cleaning, transforming, and organizing raw data into a format that is suitable for analysis or
machine learning model training. Data preparation is essential because real-world data is often messy,
incomplete, and unstructured, and it may contain errors, outliers, or irrelevant information.
Data Cleansing
Data cleaning is the process of removing data that does not belong in your dataset. Data transformation is the
process of converting data from one format or structure into another. Transformation processes can also be
referred to as data wrangling or data munging: transforming and mapping data from one "raw" form into
another format for warehousing and analysis. This section focuses on the cleaning process; a brief sketch follows the checklist below.
• Remove unnecessary values.
• Remove duplicate data.
• Avoid typos.
• Convert data types.
• Search for missing values.
• Use a clear format.
• Translate language.
• Remove unwanted outliers.
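A minimal pandas sketch covering several items from the checklist above; the file name and the amount, city, and internal_note columns are hypothetical.

```python
# A minimal data-cleansing sketch with pandas.
import pandas as pd

df = pd.read_csv("raw_sales.csv")  # hypothetical input file

df = df.drop_duplicates()                                  # remove duplicate data
df = df.drop(columns=["internal_note"], errors="ignore")   # remove unnecessary values
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")  # convert data types
df["city"] = df["city"].str.strip().str.title()            # use a clear format
print(df.isna().sum())                                     # search for missing values

# Remove unwanted outliers, e.g. anything beyond 3 standard deviations.
mean, std = df["amount"].mean(), df["amount"].std()
df = df[(df["amount"] - mean).abs() <= 3 * std]
```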
Data Wrangling: (Concatenate, split, extract, replace, change)

Data wrangling is the process of converting raw data into a usable form. It may also be called data munging or
data remediation.

You'll typically go through the data wrangling process prior to conducting any data analysis in order to ensure
your data is reliable and complete. This way, you can be confident that the insights you draw are accurate and
valuable.

Data wrangling steps

There are four broad steps in the munging process:

• Discovery
• Transformation
• Validation
• Publishing

Data Validation

Data validation is the practice of checking the integrity, accuracy and structure of data before it is used for a
business operation. Data validation operation results can provide data used for data analytics, business
intelligence or training a machine learning model. It can also be used to ensure the integrity of data for financial
accounting or regulatory compliance.

Data can be examined as part of a validation process in a variety of ways, including data type, constraint,
structure, consistency, and code validation. Each type of data validation is designed to make sure the data meets
the requirements to be useful.
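A minimal sketch of such checks in Python with pandas; the file, columns, and rules are hypothetical and would be replaced by whatever your business operation actually requires.

```python
# A minimal data-validation sketch: type, constraint, and consistency
# checks applied before the data is used.
import pandas as pd

df = pd.read_csv("payments.csv")  # hypothetical input file

checks = {
    "amount is numeric": pd.api.types.is_numeric_dtype(df["amount"]),
    "amount is non-negative": (df["amount"] >= 0).all(),
    "currency code is valid": df["currency"].isin(["USD", "EUR", "INR"]).all(),
    "no missing payment ids": df["payment_id"].notna().all(),
    "payment ids are unique": df["payment_id"].is_unique,
}

for rule, passed in checks.items():
    print(f"{rule}: {'PASS' if passed else 'FAIL'}")
```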

4. Reporting
Types of Reports
Dashboard:

A dashboard is a visual representation of data that provides a consolidated view of key information and metrics.
It is typically presented in the form of charts, graphs, tables, and other visual elements to help users quickly
understand and analyse complex data sets.

Story:

Story is a presentation-style report that uses charts, visualizations, text, images, and pictograms to describe data.

Application:

Reports:

A report is a document that presents information in an organized format for a specific audience and purpose.
Although summaries of reports may be delivered orally, complete reports are almost always in the form of
written documents

Charts:

A chart is a visual representation of data or information that helps in presenting it in a clear, organized, and easily
understandable manner. It is a graphical representation that allows for quick comprehension of complex data
sets, patterns, trends, and relationships.


Charts are commonly used in various fields, including business, finance, economics, science, and statistics. They
serve the purpose of conveying information visually, making it easier for people to grasp the data and draw
insights from it.

There are several types of charts, each suitable for representing different types of data and serving different
purposes. Some commonly used charts include:

Reporting Elements:
Restricted Measures:

As the name implies, restricted measures work by restricting measure values to only show certain data.

Example: Total sales for Mumbai City

In this example, the sales value is restricted to Mumbai City only. In this case, one measure value is restricted by
one or more dimensions.

Calculated Measures:

A calculated measure is based on other measures. For example, one measure might be defined as a second
measure divided by a third measure.

For example, suppose member A refers to 150 records in the fact table, and member B refers to 300 records in
the fact table. Suppose that you create a member C that aggregates A and B together. Then member C refers to
the relevant 450 records in the fact table.
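The following minimal pandas sketch imitates both measure types; in Power BI itself these would usually be written as DAX measures, and all the values here are made up.

```python
# A minimal sketch of a restricted measure and a calculated measure.
import pandas as pd

sales = pd.DataFrame({
    "city": ["Mumbai", "Delhi", "Mumbai"],
    "sales": [100.0, 80.0, 120.0],
    "profit": [20.0, 12.0, 30.0],
})

# Restricted measure: total sales restricted to one dimension value.
mumbai_sales = sales.loc[sales["city"] == "Mumbai", "sales"].sum()

# Calculated measure: one measure defined in terms of other measures.
profit_margin = sales["profit"].sum() / sales["sales"].sum()

print(mumbai_sales, round(profit_margin, 3))
```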

Conditional Formatting: (Exception/Alerts)

Conditional formatting can help make patterns and trends in your data more apparent. To use it, you create rules
that determine the format of cells based on their values, for example monthly temperature data with cell
colours tied to cell values.

Aggregation:

In the context of data, aggregation refers to the process of combining and summarizing individual data points
into a single representation or summary. It involves gathering data from multiple sources or multiple instances
and applying a specific operation or function to derive meaningful insights or metrics.

Aggregation is commonly used in data analysis and reporting to condense large volumes of data into more
manageable and understandable forms. It helps to identify patterns, trends, and overall characteristics of the
data. By aggregating data, analysts can extract higher-level information that provides a broader perspective and
facilitates decision-making.

The choice of aggregation method depends on the nature of the data and the desired outcome. Some common
aggregation operations include:

Summation: Adding together numeric values, such as calculating the total sales for a given period.


Counting: Determining the number of occurrences or items in a dataset, like counting the number of customers
or transactions.

Averaging: Calculating the mean or average value of a set of data points, useful for finding the average score or
rating.

Minimum and maximum: Finding the smallest or largest value in a dataset, which can be useful for identifying
outliers or extremes.

Grouping: Organizing data into categories or groups based on certain criteria, such as grouping sales by product
category or customer segment.

Aggregation can be performed at various levels of granularity, such as at the individual level, by time intervals
(e.g., hourly, daily, monthly), or by specific dimensions or attributes (e.g., by region, by product). The choice of
aggregation level depends on the analysis objectives and the level of detail required.
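A minimal pandas sketch of the operations listed above, aggregated at the level of a single dimension; the data is made up.

```python
# A minimal aggregation sketch: sum, count, mean, min, and max,
# grouped by a category dimension.
import pandas as pd

df = pd.DataFrame({
    "category": ["A", "A", "B", "B", "B"],
    "sales": [10.0, 15.0, 7.0, 9.0, 30.0],
})

summary = df.groupby("category")["sales"].agg(
    total="sum", count="count", average="mean", smallest="min", largest="max"
)
print(summary)
```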

Overall, aggregation plays a crucial role in summarizing and extracting meaningful insights from large datasets,
enabling efficient analysis and decision-making processes.

Link Analysis:

Link analysis in data refers to the process of examining and analysing relationships or connections between
various entities or elements within a dataset. It focuses on understanding the links, associations, or
dependencies among different data points to uncover patterns, structures, and insights that may not be apparent
from the individual data elements alone.

Link analysis is commonly used in fields such as network analysis, social network analysis, fraud detection,
cybersecurity, and recommendation systems. It helps to reveal the underlying relationships and interactions
between entities, allowing for a deeper understanding of complex systems and phenomena.

Ranking:

Ranking in data analysis refers to the process of ordering or sorting data elements based on a specific criterion
or measure of importance. It assigns a numerical or ordinal position to each data point, indicating its relative
standing or value within a dataset.

Ranking is commonly used to identify the top or bottom performers, prioritize items, or understand the relative
importance or significance of data elements. It can be applied to various types of data, including numeric values,
categorical variables, or qualitative attributes.

Here are a few key points about ranking in data analysis:

Criteria for Ranking: Ranking is based on a specific criterion or metric that defines the basis for ordering the data
elements. The choice of criterion depends on the analysis objectives and the nature of the data. For example, in
sales data, ranking may be based on revenue generated, while in a survey dataset, it could be based on customer
satisfaction scores.

Ascending or Descending Order: Rankings can be assigned in ascending or descending order. Ascending order
ranks data elements from the smallest or least significant to the largest or most significant, while descending
order ranks them in the opposite direction.

Ties or Rank Equivalents: Ties occur when multiple data elements have the same value for the ranking criterion.
In such cases, various methods can be used to handle ties, including assigning the same rank to tied elements or
using tie-breaking rules to differentiate between them.


Rank Numbering: Each data element is assigned a unique rank number based on its position in the ranking order.
For example, the top-ranked element may have a rank of 1, the second-ranked element may have a rank of 2,
and so on.

Rank-Based Analysis: Once the data elements are ranked, further analysis can be performed based on their ranks.
This may include identifying top performers, outliers, trends across ranks, or segmenting the data based on rank
ranges.

Ranking can be applied in various scenarios, such as ranking products based on sales performance, ranking
employees based on performance evaluations, ranking search results based on relevance, or ranking stocks
based on market capitalization. It provides a structured and intuitive way to compare and understand the relative
importance or performance of different data elements within a dataset.
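A minimal pandas sketch of descending ranks with two of the tie-handling methods described above; the values are made up.

```python
# A minimal ranking sketch: descending rank with two tie strategies.
import pandas as pd

revenue = pd.Series([500, 300, 500, 150], index=["P1", "P2", "P3", "P4"])

# Descending rank: the largest value gets rank 1.
print(revenue.rank(ascending=False, method="min"))    # ties share the same rank
print(revenue.rank(ascending=False, method="first"))  # ties broken by order of appearance
```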

Sorting:

Sorting in reporting refers to the process of arranging data in a specific order to enhance its readability and
facilitate analysis. It involves reordering data based on one or more criteria to present information in a structured
and meaningful manner.

When generating reports, sorting is often applied to tables, lists, or charts to present data in a logical sequence.
It helps users quickly identify trends, patterns, or outliers and make data-driven decisions. The sorting process
can be performed in ascending or descending order, depending on the desired presentation or analysis
requirements.

Here are a few key points about sorting in reporting:

Sorting Criteria: The choice of sorting criteria depends on the nature of the data and the objectives of the report.
Common criteria include alphabetical order for text data, numerical order for numeric values, chronological
order for dates or timestamps, or custom criteria based on specific attributes or metrics.

Single or Multiple Columns: Sorting can be applied to a single column or multiple columns in a table or list. Single-
column sorting rearranges the data based on the values in a single column, while multiple-column sorting
considers multiple criteria to establish the order.

Ascending and Descending Order: Ascending order sorts data in the lowest-to-highest or A-to-Z sequence, while
descending order sorts data in the highest-to-lowest or Z-to-A sequence. The choice of order depends on the
report's requirements and the interpretation of the data.

Interactive Sorting: In some reporting tools or software, users may have the option to interactively sort data on-
demand. This allows users to dynamically change the sorting criteria or order while exploring the report, enabling
them to focus on specific aspects of the data.

Hierarchical Sorting: In reports with hierarchical data structures, such as organizational charts or product
categories, sorting can be applied to establish the hierarchical order. This ensures that data elements are
displayed in a logical and hierarchical manner, facilitating understanding and navigation.

Sorting is a fundamental operation in reporting that helps to organize and present data in a coherent and
meaningful way. It allows users to quickly identify trends, outliers, or patterns, making it easier to derive insights
and make informed decisions based on the reported data.
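A minimal pandas sketch of multiple-column sorting with mixed ascending and descending order; the data is made up.

```python
# A minimal sorting sketch: region ascending, then sales descending
# within each region.
import pandas as pd

df = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "sales": [200, 340, 510, 120],
})

print(df.sort_values(by=["region", "sales"], ascending=[True, False]))
```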

Filters in Report:

Filters help users focus on specific data subsets and gain insights from their data.


Data Selection: Filters enable users to select a subset of data based on specific criteria. For example, in a sales
report, you might apply filters to display data only for a particular time period, a specific region, or a particular
product category.

Criteria and Parameters: Users can define filter criteria or parameters according to their reporting needs. These
criteria can include date ranges, numerical thresholds, categories, or any other relevant data attributes.

Dynamic Filtering: Dynamic filters allow users to interactively adjust the filter criteria to see real-time updates in
the report. This is particularly useful for exploring data and conducting ad-hoc analysis.

Default Filters: Reports often come with default filters that are applied when the report is first opened. These
defaults can be set to display the most relevant or commonly used data.
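A minimal pandas sketch applying date-range, region, and category filters as boolean criteria; the file and column names are hypothetical.

```python
# A minimal filtering sketch: select a data subset by several criteria.
import pandas as pd

df = pd.read_csv("sales.csv", parse_dates=["order_date"])  # hypothetical file

mask = (
    df["order_date"].between("2024-01-01", "2024-03-31")   # time period
    & (df["region"] == "West")                             # specific region
    & (df["category"].isin(["Bikes", "Accessories"]))      # product category
)
print(df[mask])
```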

Type of Visualisation:
Data visualization (or ‘data viz’) is one of the most important aspects of data analytics. Mapping raw data using
graphical elements is great for aiding pattern-spotting and it’s useful for sharing findings in an easily digestible,
eye-catching way. And while the priority should always be the integrity of your data, if done well, data
visualization can also be a lot of fun.

"visuals" or "visualizations" refer to graphical representations of data that are designed to make complex
information more understandable and accessible. Visuals are used to present data in a way that allows viewers
to quickly grasp patterns, trends, and insights that might be less apparent when examining raw data.

Master the art of data viz and you’ll soon be spotting trends and correlations, all while flexing your creative
muscle. But before we can unlock all these benefits, we first need to understand the basics, including the
different types of data visualization and how they’re used.

Bar Chart/Column Chart: A chart that uses rectangular bars or columns of varying heights or lengths to represent
data values. It is often used to compare and display categorical data.

Line Chart: A chart that uses a series of data points connected by lines to show trends and changes over time.
It is particularly useful for illustrating the progression or fluctuation of quantitative data.


Pie Chart: A circular chart divided into sectors, each representing a proportion of the whole. It is commonly
used to represent percentages or proportions and is useful for showing the composition of a whole.

Scatter Plot: A chart that uses Cartesian coordinates to display the relationship between two variables. It is
effective in showing correlations or patterns between variables.

Area Chart: Similar to a line chart, but with the area between the lines and the axis filled with color. It is useful
for comparing the cumulative values of multiple data series over time.

Histogram: A chart that displays the distribution of a continuous data set by dividing it into intervals or bins and
representing the frequency or count of data points within each interval.
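As a quick illustration, the following minimal matplotlib sketch draws three of the chart types above (bar, line, and histogram) from made-up values.

```python
# A minimal charting sketch: bar chart, line chart, and histogram.
import matplotlib.pyplot as plt

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 3))

ax1.bar(["A", "B", "C"], [5, 9, 3])            # bar chart: compare categories
ax2.plot([2021, 2022, 2023], [10, 14, 13])     # line chart: trend over time
ax3.hist([1, 2, 2, 3, 3, 3, 4, 4, 5], bins=5)  # histogram: distribution
plt.show()
```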


Gantt Chart: A chart used in project management to illustrate the schedule of tasks or activities, their durations,
and their dependencies. It helps in visualizing the timeline and progress of a project.

Funnel Chart:

A funnel chart is a type of data visualization used to represent a sequential process or a series of stages where
data is progressively filtered or narrowed down. It is called a "funnel" chart because its shape resembles a funnel,
with the widest part at the top and narrowing down towards the bottom. Funnel charts are typically used in
business and marketing to visualize the stages of a sales process, customer conversion journey, or any other
sequential process with decreasing data points at each stage.

Treemap:

A treemap is a visual method for displaying hierarchical data that uses nested rectangles to represent the
branches of a tree diagram.

Heat Map:

A heatmap (or heat map) is a graphical representation of data where values are depicted by colour. In web analytics, for example, heatmaps help detect what does or doesn't work on a page and which parts and elements of a page users engage with.
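
To make the chart types above concrete, here is a small matplotlib sketch (one possible library choice) that draws four of them side by side on synthetic data:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # synthetic data only
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# Bar chart: compare values across categories
axes[0, 0].bar(["A", "B", "C"], [5, 7, 3])
axes[0, 0].set_title("Bar Chart")

# Line chart: trend over an ordered axis such as time
axes[0, 1].plot(range(12), rng.integers(10, 40, 12))
axes[0, 1].set_title("Line Chart")

# Scatter plot: relationship between two variables
x = rng.normal(size=50)
axes[1, 0].scatter(x, 2 * x + rng.normal(scale=0.5, size=50))
axes[1, 0].set_title("Scatter Plot")

# Histogram: distribution of a continuous variable in bins
axes[1, 1].hist(rng.normal(size=500), bins=20)
axes[1, 1].set_title("Histogram")

fig.tight_layout()
plt.show()
```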

Data Calculations:
Restricted Measures:

As the name implies, restricted measures work by restricting measure values to only show certain data.

Example: Total sales for Mumbai City

In this example, the sales value is restricted to Mumbai City only. In this case, one measure value is restricted by one or more dimensions.
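
Expressed outside of any specific BI tool, a restricted measure might look like this pandas sketch, where the city and sales columns and their values are assumptions for illustration:

```python
import pandas as pd

# Illustrative sales table; cities and figures are made up
df = pd.DataFrame({
    "city": ["Mumbai", "Delhi", "Mumbai", "Pune"],
    "sales": [500, 300, 700, 200],
})

# Restricted measure: the sales total only sees rows for one dimension value
mumbai_total_sales = df.loc[df["city"] == "Mumbai", "sales"].sum()
print(mumbai_total_sales)  # 1200
```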

Calculated Measures:

A calculated measure is based on other measures. For example, one measure might be defined as a second
measure divided by a third measure.

For example, suppose member A refers to 150 records in the fact table and member B refers to 300 records. If you create a member C that aggregates A and B together, member C refers to the combined 450 records in the fact table.
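
A hedged pandas sketch of a calculated measure, using assumed profit and sales columns to define a third measure as their ratio:

```python
import pandas as pd

# Two base measures; the column names and values are assumptions
df = pd.DataFrame({"profit": [50, 80, 30], "sales": [500, 400, 300]})

# Calculated measure: one measure defined as a second divided by a third
df["profit_margin"] = df["profit"] / df["sales"]
print(df)
```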

Conditional Formatting: (Exception/Alerts)

Conditional formatting can help make patterns and trends in your data more apparent. To use it, you create rules that determine the format of cells based on their values, for example colouring monthly temperature data so that each cell's colour is tied to its value.
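
As one way to reproduce this in code, the pandas Styler sketch below ties cell colours to cell values for a small, made-up temperature table; the data and the output file name are assumptions:

```python
import pandas as pd

# Made-up monthly temperatures; values and labels are assumptions
temps = pd.DataFrame(
    {"Jan": [4, 6, 2], "Apr": [14, 16, 12], "Jul": [28, 31, 26]},
    index=["City A", "City B", "City C"],
)

# Rule: map each cell's value onto a cool-to-warm colour gradient
styled = temps.style.background_gradient(cmap="coolwarm", axis=None)
styled.to_html("temperatures.html")  # write the formatted table for viewing
```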

Aggregation:

In the context of data, aggregation refers to the process of combining and summarizing individual data points
into a single representation or summary. It involves gathering data from multiple sources or multiple instances
and applying a specific operation or function to derive meaningful insights or metrics.

Aggregation is commonly used in data analysis and reporting to condense large volumes of data into more
manageable and understandable forms. It helps to identify patterns, trends, and overall characteristics of the
data. By aggregating data, analysts can extract higher-level information that provides a broader perspective and
facilitates decision-making.

The choice of aggregation method depends on the nature of the data and the desired outcome. Some common
aggregation operations include:

Summation: Adding together numeric values, such as calculating the total sales for a given period.

Counting: Determining the number of occurrences or items in a dataset, like counting the number of customers
or transactions.

Averaging: Calculating the mean or average value of a set of data points, useful for finding the average score or
rating.

Minimum and maximum: Finding the smallest or largest value in a dataset, which can be useful for identifying
outliers or extremes.

Grouping: Organizing data into categories or groups based on certain criteria, such as grouping sales by product
category or customer segment.

Aggregation can be performed at various levels of granularity, such as at the individual level, by time intervals
(e.g., hourly, daily, monthly), or by specific dimensions or attributes (e.g., by region, by product). The choice of
aggregation level depends on the analysis objectives and the level of detail required.

Overall, aggregation plays a crucial role in summarizing and extracting meaningful insights from large datasets,
enabling efficient analysis and decision-making processes.
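
As a minimal sketch, the pandas snippet below combines grouping with several of the aggregation operations listed above; the category and sales columns are illustrative assumptions:

```python
import pandas as pd

# Illustrative transactions; column names are assumptions
df = pd.DataFrame({
    "category": ["Food", "Food", "Tech", "Tech", "Tech"],
    "sales": [10, 20, 100, 150, 50],
})

# Grouping plus several aggregation operations in a single pass
summary = df.groupby("category")["sales"].agg(["sum", "count", "mean", "min", "max"])
print(summary)
```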

Link Analysis:

Link analysis in data refers to the process of examining and analysing relationships or connections between
various entities or elements within a dataset. It focuses on understanding the links, associations, or
dependencies among different data points to uncover patterns, structures, and insights that may not be apparent
from the individual data elements alone.

Link analysis is commonly used in fields such as network analysis, social network analysis, fraud detection,
cybersecurity, and recommendation systems. It helps to reveal the underlying relationships and interactions
between entities, allowing for a deeper understanding of complex systems and phenomena.
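
A small sketch using networkx (one common Python choice for this kind of analysis) shows how link-based metrics such as degree centrality and PageRank can be computed on a toy relationship graph; the entities and edges are made up:

```python
import networkx as nx

# Toy relationship graph; the entities and links are made up
G = nx.Graph()
G.add_edges_from([
    ("Alice", "Bob"), ("Alice", "Carol"),
    ("Bob", "Carol"), ("Carol", "Dave"),
])

# Degree centrality: how directly connected each entity is
print(nx.degree_centrality(G))

# PageRank: an entity's importance given the links pointing at it
print(nx.pagerank(G))
```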

Ranking:

Ranking in data analysis refers to the process of ordering or sorting data elements based on a specific criterion
or measure of importance. It assigns a numerical or ordinal position to each data point, indicating its relative
standing or value within a dataset.

Ranking is commonly used to identify the top or bottom performers, prioritize items, or understand the relative
importance or significance of data elements. It can be applied to various types of data, including numeric values,
categorical variables, or qualitative attributes.

Here are a few key points about ranking in data analysis:

Criteria for Ranking: Ranking is based on a specific criterion or metric that defines the basis for ordering the data
elements. The choice of criterion depends on the analysis objectives and the nature of the data. For example, in
sales data, ranking may be based on revenue generated, while in a survey dataset, it could be based on customer
satisfaction scores.

Ascending or Descending Order: Rankings can be assigned in ascending or descending order. Ascending order
ranks data elements from the smallest or least significant to the largest or most significant, while descending
order ranks them in the opposite direction.

Ties or Rank Equivalents: Ties occur when multiple data elements have the same value for the ranking criterion.
In such cases, various methods can be used to handle ties, including assigning the same rank to tied elements or
using tie-breaking rules to differentiate between them.

Rank Numbering: Each data element is assigned a unique rank number based on its position in the ranking order.
For example, the top-ranked element may have a rank of 1, the second-ranked element may have a rank of 2,
and so on.

Rank-Based Analysis: Once the data elements are ranked, further analysis can be performed based on their ranks.
This may include identifying top performers, outliers, trends across ranks, or segmenting the data based on rank
ranges.

Ranking can be applied in various scenarios, such as ranking products based on sales performance, ranking
employees based on performance evaluations, ranking search results based on relevance, or ranking stocks
based on market capitalization. It provides a structured and intuitive way to compare and understand the relative
importance or performance of different data elements within a dataset.
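
The pandas sketch below illustrates several of these points at once: a descending rank on an assumed revenue column, with method="min" as one way of handling a tie:

```python
import pandas as pd

# Illustrative revenue figures, deliberately including a tie
df = pd.DataFrame({
    "product": ["A", "B", "C", "D"],
    "revenue": [900, 1200, 900, 400],
})

# Descending rank: highest revenue gets rank 1; method="min" gives
# tied elements the same (lowest available) rank, so A and C share rank 2
df["rank"] = df["revenue"].rank(ascending=False, method="min").astype(int)
print(df.sort_values("rank"))
```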

Sorting:

Sorting in reporting refers to the process of arranging data in a specific order to enhance its readability and
facilitate analysis. It involves reordering data based on one or more criteria to present information in a structured
and meaningful manner.

When generating reports, sorting is often applied to tables, lists, or charts to present data in a logical sequence.
It helps users quickly identify trends, patterns, or outliers and make data-driven decisions. The sorting process
can be performed in ascending or descending order, depending on the desired presentation or analysis
requirements.

Here are a few key points about sorting in reporting:

Sorting Criteria: The choice of sorting criteria depends on the nature of the data and the objectives of the report.
Common criteria include alphabetical order for text data, numerical order for numeric values, chronological
order for dates or timestamps, or custom criteria based on specific attributes or metrics.

Single or Multiple Columns: Sorting can be applied to a single column or multiple columns in a table or list. Single-
column sorting rearranges the data based on the values in a single column, while multiple-column sorting
considers multiple criteria to establish the order.

Ascending and Descending Order: Ascending order sorts data in the lowest-to-highest or A-to-Z sequence, while
descending order sorts data in the highest-to-lowest or Z-to-A sequence. The choice of order depends on the
report's requirements and the interpretation of the data.

Interactive Sorting: In some reporting tools or software, users may have the option to interactively sort data on-
demand. This allows users to dynamically change the sorting criteria or order while exploring the report, enabling
them to focus on specific aspects of the data.

Hierarchical Sorting: In reports with hierarchical data structures, such as organizational charts or product
categories, sorting can be applied to establish the hierarchical order. This ensures that data elements are
displayed in a logical and hierarchical manner, facilitating understanding and navigation.

Sorting is a fundamental operation in reporting that helps to organize and present data in a coherent and
meaningful way. It allows users to quickly identify trends, outliers, or patterns, making it easier to derive insights
and make informed decisions based on the reported data.
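
As a brief sketch, multiple-column sorting in pandas might look like this, with assumed region and sales columns sorted in mixed ascending and descending order:

```python
import pandas as pd

# Illustrative table; column names are assumptions
df = pd.DataFrame({
    "region": ["South", "North", "North", "South"],
    "sales": [250, 400, 150, 300],
})

# Multiple-column sort: region A-to-Z first, then sales highest-to-lowest
sorted_df = df.sort_values(by=["region", "sales"], ascending=[True, False])
print(sorted_df)
```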

Best Practices for Visualisation:


When creating data visualizations, several rules of thumb or best practices can help ensure effective and impactful representations of data. Here are some key guidelines to consider:

Understand Your Audience: Start by understanding who will be consuming your visualization and their level of
familiarity with the data. Tailor your visualization to their needs, knowledge, and preferences to make it more
relevant and accessible.

Choose the Right Chart Type: Select a chart type that best represents the relationships and patterns in your data.
Consider factors such as the data's nature (e.g., categorical, numerical, time-series), the variables you want to
compare, and the story you want to convey. Common chart types include bar charts, line charts, pie charts,
scatter plots, and maps.

Simplify and Declutter: Keep your visualization clean and uncluttered by removing unnecessary elements. Avoid
excessive decorations, gridlines, or non-essential labels. Use whitespace effectively to enhance clarity and focus
attention on the important aspects of the data.

Highlight Key Insights: Identify the key insights or messages you want to convey and emphasize them in your
visualization. Use visual cues such as colour, size, or annotations to draw attention to critical data points or
trends.

Ensure Data Accuracy: Ensure that your data is accurate, reliable, and properly represented in the visualization.
Validate your data sources and calculations to maintain credibility and prevent misinterpretation.

Use Appropriate Colours: Choose colours thoughtfully to enhance readability and interpretation. Use a limited
colour palette that is visually appealing and provides clear contrasts. Ensure color choices are accessible for color-
blind viewers.

Provide Context and Labels: Provide sufficient context to help viewers understand the data. Include clear and
descriptive titles, axis labels, legends, and units of measurement. Provide contextual information or explanations
as necessary.

Use Intuitive Design: Design your visualization with a user-centred approach. Arrange data elements logically and intuitively, considering reading patterns (e.g., left-to-right, top-to-bottom). Align elements consistently to create a harmonious and organized display.

Use Interactivity Wisely: If your visualization is interactive, ensure that the interactivity adds value and doesn't
overwhelm or distract the viewer. Provide intuitive and meaningful interactions that allow users to explore the
data and gain deeper insights.

Test and Iterate: Test your visualization with representative users to gather feedback and identify areas for
improvement. Iterate and refine your design based on user feedback and data storytelling objectives.

Remember, these are general guidelines, and the specific requirements of your data and audience may warrant
deviations from these rules. The key is to understand the principles behind effective data visualization and adapt
them to your specific context to create impactful and meaningful representations of your data.

Colour Standards:
Color plays a crucial role in data visualization, and following best practices for color selection can enhance the
clarity, accessibility, and effectiveness of your visualizations. Here are some color standard best practices to
consider:

Use a Limited Color Palette: Limit the number of colors used in your visualization to maintain simplicity and avoid
visual clutter. A small, carefully chosen color palette improves visual coherence and reduces confusion. Using too
many colors can make it difficult to distinguish between data categories or values.

Choose High Contrast Colors: Select colors that provide sufficient contrast to ensure readability and accessibility.
High contrast between data elements and background or between different data categories improves visibility
for all users, including those with color vision deficiencies. Test your color choices in grayscale or with color blind
simulations to ensure they remain distinguishable.

Consider Color Meaning and Associations: Colors can carry cultural, psychological, or contextual meanings. Be
mindful of the associations or connotations that certain colors may have in your target audience or specific
context. For example, using red to represent positive values might be counterintuitive in many cultural contexts.
Use color in a way that aligns with the intended message or narrative of your visualization.

Account for Color Vision Deficiencies: Around 8% of men and 0.5% of women have some form of color vision
deficiency. Consider designing your visualizations to accommodate these users. Avoid relying solely on color to
convey information, and use additional visual cues, such as patterns, labels, or shapes, to differentiate data
elements.

Leverage Color Scales: When visualizing continuous data or gradients, use color scales that represent a clear
progression from low to high values. Gradual changes in hue, saturation, or brightness can effectively convey
magnitude or intensity. Consider using color scales that are perceptually uniform, such as those based on the
CIELAB or HCL color models, to ensure that equal steps in the color scale correspond to perceptually equal
changes.
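
A minimal matplotlib sketch of this idea, using the perceptually uniform viridis colormap on made-up continuous data:

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up continuous data rendered with a perceptually uniform scale
data = np.random.default_rng(0).random((10, 10))
plt.imshow(data, cmap="viridis")  # viridis is designed to be perceptually uniform
plt.colorbar(label="value")
plt.show()
```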

Be Mindful of Color Combinations: Ensure that your chosen colours work well together and do not cause visual discomfort or confusion. Avoid combinations that create strong visual vibration, such as saturated red and blue, and combinations that make it difficult to differentiate data elements. Tools such as colour harmony wheels or colour palette generators can assist in finding pleasing and harmonious colour combinations.

Test for Accessibility: Test your visualizations for accessibility by checking color contrast ratios. Ensure that the
color contrast between text and background meets accessibility standards, such as those outlined in the Web
Content Accessibility Guidelines (WCAG). Various online tools and color contrast checkers are available to assist
in verifying the accessibility of your color choices.
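
For readers who want to check contrast programmatically, the sketch below implements the WCAG 2.x relative-luminance and contrast-ratio formulas, the same arithmetic those online checkers perform (WCAG AA expects at least 4.5:1 for normal body text):

```python
def relative_luminance(rgb):
    """WCAG 2.x relative luminance of an sRGB colour given as 0-255 ints."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(colour_a, colour_b):
    """Contrast ratio between two colours; WCAG AA asks >= 4.5:1 for body text."""
    lighter, darker = sorted(
        (relative_luminance(colour_a), relative_luminance(colour_b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)

print(contrast_ratio((0, 0, 0), (255, 255, 255)))  # 21.0: black on white
```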

Document and Share Color Usage: Maintain a clear and consistent documentation of the colours used in your
visualizations. This helps ensure continuity across different charts or reports and enables others to reproduce
your visualizations accurately.

Remember, the best color choices depend on the specific context, audience, and objectives of your visualization.
Applying these color standard best practices can help create visually appealing, accessible, and effective data
visualizations.
