Section 2: Basics of Data Analytics
Parwaaz Solutions
Contents

1. Data Analytics Basics
   What Is Data Analysis?
   Overview of data analysis
   Roles in data
   Data Analytics Techniques
   Exploratory Data Analysis (EDA)
   What is Data Mining?
   What is Data Profiling?
2. Data Integration
   What Is Data Integration?
   ETL Vs ELT
   Database
   Object Storage
   Data Lake
   OLTP Vs OLAP
3. Data Modelling and Data Preparation
   Data Warehouse
   Business Intelligence
   KPI
   Master Data Vs Transactional Data
   Star Schema Vs Snowflake Schema
   Normalization vs. denormalization
   Elements of Data Model
   What Is Data Blending?
   Data Blending Vs Data Joining
   Data Model Vs Data Sets
   Data Preparation
4. Reporting
   Types of Reports
   Reporting Elements
   Type of Visualisation
   Data Calculations
   Best Practices for Visualisation
   Colour Standards
1. Data Analytics Basics

What Is Data Analysis?
Data analysts are essential in helping organizations gain valuable insights from the expanse of data that
they have, and they work closely with others in the organization to help reveal that information.
You'll engage in five key areas during the data analysis process: prepare, model, visualize, analyze, and manage.
Prepare
Before a report can be created, data must be prepared. Data preparation is the process of profiling,
cleaning, and transforming your data to get it ready to model and visualize.
Data preparation is the process of taking raw data and turning it into information that is trusted and
understandable. It involves, among other things, ensuring the integrity of the data, correcting wrong
or inaccurate data, identifying missing data, converting data from one structure to another or from one
type to another, or even a task as simple as making data more readable.
Data preparation also involves understanding how you're going to get and connect to the data and the
performance implications of the decisions. When connecting to data, you need to make decisions to
ensure that models and reports meet, and perform to, acknowledged requirements and expectations.
Model
When the data is in a proper state, it's ready to be modelled. Data modeling is the process of
determining how your tables are related to each other. This process is done by defining and creating
relationships between the tables. From that point, you can enhance the model by defining metrics and
adding custom calculations to enrich your data.
An effective data model makes reports more accurate, allows the data to be explored faster and more
efficiently, decreases time for the report-writing process, and simplifies future report maintenance.
The model is another critical component that has a direct effect on the performance of your report and
overall data analysis. A poorly designed model can have a drastically negative impact on the general
accuracy and performance of your report. Conversely, a well-designed model with well-prepared data
will ensure a properly efficient and trusted report. This notion is more prevalent when you are working
with data at scale.
From a Power BI perspective, if your report is performing slowly, or your refreshes are taking a long
time, you will likely need to revisit the data preparation and modeling tasks to optimize your report.
The process of preparing data and modeling data is an iterative process. Data preparation is the first
task in data analysis. Understanding and preparing your data before you model it will make the
modeling step much easier.
Visualize
The visualization task is where you get to bring your data to life. A well-designed report should tell a
compelling story about that data, which will enable business decision makers to quickly gain needed
insights. By using appropriate visualizations and interactions, you can provide an effective report that
guides the reader through the content quickly and efficiently, therefore allowing the reader to follow
a narrative into the data.
With the built-in AI capabilities in Power BI, data analysts can build powerful reports, without writing
any code, that enable users to get insights and answers and find actionable objectives. The AI
capabilities in Power BI, such as the built-in AI visuals, enable users to discover insights in their data by asking
questions, using the Quick Insights feature, or creating machine learning models directly within Power BI.
An important aspect of visualizing data is designing and creating reports for accessibility. As you build
reports, it is important to think about people who will be accessing and reading the reports. Reports
should be designed with accessibility in mind from the outset so that no special modifications are
needed in the future.
Many components of your report will help with storytelling. From a color scheme that is
complementary and accessible, to fonts and sizing, to picking the right visuals for what is being
displayed, they all come together to tell that story.
Analyze
The analyze task is the important step of understanding and interpreting the information that is
displayed on the report. In your role as a data analyst, you should understand the analytical capabilities
of Power BI and use those capabilities to find insights, identify patterns and trends, predict outcomes,
and then communicate those insights in a way that everyone can understand.
Advanced analytics enables businesses and organizations to ultimately drive better decisions
throughout the business and create actionable insights and meaningful results. With advanced
analytics, organizations can drill into the data to predict future patterns and trends, identify activities
and behaviours, and enable businesses to ask the appropriate questions about their data.
This feature is another area where AI integrations within Power BI can take your analysis to the next
level. Integrations with Azure Machine Learning, cognitive services, and built-in AI visuals will help to
enrich your data and analysis.
Manage
Power BI consists of many components, including reports, dashboards, workspaces, datasets, and
more. As a data analyst, you are responsible for the management of these Power BI assets, overseeing
the sharing and distribution of items, such as reports and dashboards, and ensuring the security of
Power BI assets.
The management of your content helps to foster collaboration between teams and individuals. Sharing
and discovery of your content is important for the right people to get the answers that they need. It is
also important to help ensure that items are secure. You want to make sure that the right people have
access and that data is not exposed beyond the intended stakeholders.
Proper management can also help reduce data silos within your organization. Data duplication can
make data difficult to manage and can introduce latency when resources are overused. Power BI helps
reduce data silos with the use of shared datasets, and it allows you to reuse data that you have
prepared and modelled. For key business data, endorsing a dataset as certified can help to ensure trust
in that data.
Overview of data analysis

Data analysis is the process of identifying, cleaning, transforming, and modelling data to discover meaningful
and useful information. The data is then crafted into a story through reports for analysis to support the critical
decision-making process.
Data-driven businesses make decisions based on the story that their data tells, yet most businesses face the
challenge of not using their data to its full potential. Data analysis is, and should be,
a critical aspect of all organizations to help determine the impact to their business, including evaluating customer
sentiment, performing market and product research, and identifying trends or other data insights.
While the process of data analysis focuses on the tasks of cleaning, modeling, and visualizing data, the concept
of data analysis and its importance to business should not be underestimated. To analyze data, the core
components of analytics are divided into the following categories:
Descriptive
Diagnostic
Predictive
Prescriptive
Cognitive
Descriptive analytics
Descriptive analytics help answer questions about what has happened based on historical data. Descriptive
analytics techniques summarize large datasets to describe outcomes to stakeholders.
By developing key performance indicators (KPIs), these strategies can help track the success or failure of key
objectives. Metrics such as return on investment (ROI) are used in many industries, and specialized metrics are
developed to track performance in specific industries.
An example of descriptive analytics is generating reports to provide a view of an organization's sales and financial
data.
Diagnostic analytics
Diagnostic analytics help answer questions about why events happened. Diagnostic analytics techniques
supplement basic descriptive analytics, and they use the findings from descriptive analytics to discover the cause
of these events. Then, performance indicators are further investigated to discover why these events improved
or became worse. Generally, this process occurs in three steps:
• Identify anomalies in the data. These anomalies might be unexpected changes in a metric or a particular market.
• Collect data that's related to these anomalies.
• Use statistical techniques to discover relationships and trends that explain these anomalies.
Predictive analytics
Predictive analytics help answer questions about what will happen in the future. Predictive analytics techniques
use historical data to identify trends and determine if they're likely to recur. Predictive analytical tools provide
valuable insight into what might happen in the future. Techniques include a variety of statistical and machine
learning techniques such as neural networks, decision trees, and regression.
Prescriptive analytics
Prescriptive analytics help answer questions about which actions should be taken to achieve a goal or target. By
using insights from prescriptive analytics, organizations can make data-driven decisions. This technique allows
businesses to make informed decisions in the face of uncertainty. Prescriptive analytics techniques rely on
machine learning as one of the strategies to find patterns in large datasets. By analysing past decisions and
events, organizations can estimate the likelihood of different outcomes.
Cognitive analytics
Cognitive analytics attempt to draw inferences from existing data and patterns, derive conclusions based on
existing knowledge bases, and then add these findings back into the knowledge base for future inferences, a self-
learning feedback loop. Cognitive analytics help you learn what might happen if circumstances change and
determine how you might handle these situations.
Example
By enabling reporting and data visualizations, a retail business uses descriptive analytics to look at patterns of
purchases from previous years to determine what products might be popular next year. The company might also
look at supporting data to understand why a particular product was popular and if that trend is continuing, which
will help them determine whether to continue stocking that product.
A business might determine that a certain product was popular over a specific timeframe. Then, they can use
this analysis to determine whether certain marketing efforts or online social activities contributed to the sales
increase.
An underlying facet of data analysis is that a business needs to trust its data. As a practice, the data analysis
process will capture data from trusted sources and shape it into something that is consumable, meaningful, and
easily understood to help with the decision-making process. Data analysis enables businesses to fully understand
their data through data-driven processes and decisions, allowing them to be confident in their decisions.
As the amount of data grows, so does the need for data analysts. A data analyst knows how to organize
information and distill it into something relevant and comprehensible. A data analyst knows how to gather the
right data and what to do with it; in other words, they make sense of the data amid the data overload.
Roles in data
Telling a story with the data is a journey that usually doesn't start with you. The data must come from
somewhere. Getting that data into a place that is usable by you takes effort that is likely out of your scope,
especially in consideration of the enterprise.
In the recent past, roles such as business analysts and business intelligence developers were the standard for
data processing and understanding. However, the rapid expansion in the volume and variety of data has
caused these roles to evolve into more specialized sets of skills that modernize and streamline the processes of
data engineering and analysis.
The following sections highlight these different roles in data and their specific responsibilities in the overall
spectrum of data discovery and understanding:
Business analyst
While some similarities exist between a data analyst and business analyst, the key differentiator between the
two roles is what they do with data. A business analyst is closer to the business and is a specialist in interpreting
the data that comes from the visualization. Often, the roles of data analyst and business analyst could be the
responsibility of a single person.
Data analyst
A data analyst enables businesses to maximize the value of their data assets through visualization and reporting
tools such as Microsoft Power BI. Data analysts are responsible for profiling, cleaning, and transforming data.
Their responsibilities also include designing and building scalable and effective data models, and enabling and
implementing the advanced analytics capabilities into reports for analysis. A data analyst works with the
pertinent stakeholders to identify appropriate and necessary data and reporting requirements, and then they
are tasked with turning raw data into relevant and meaningful insights.
A data analyst is also responsible for the management of Power BI assets, including reports, dashboards,
workspaces, and the underlying datasets that are used in the reports. They are tasked with implementing and
configuring proper security procedures, in conjunction with stakeholder requirements, to ensure the
safekeeping of all Power BI assets and their data.
Data engineer
Data engineers provision and set up data platform technologies that are on-premises and in the cloud. They
manage and secure the flow of structured and unstructured data from multiple sources. The data platforms that
they use can include relational databases, nonrelational databases, data streams, and file stores. Data engineers
also ensure that data services securely and seamlessly integrate across data platforms.
Primary responsibilities of data engineers include the use of on-premises and cloud data services and tools to
ingest, egress, and transform data from multiple sources. Data engineers collaborate with business stakeholders
to identify and meet data requirements. They design and implement solutions.
While some alignment might exist in the tasks and responsibilities of a data engineer and a database
administrator, a data engineer's scope of work goes well beyond looking after a database and the server where
it's hosted and likely doesn't include the overall operational data management.
As a data analyst, you would work closely with a data engineer in making sure that you can access the variety of
structured and unstructured data sources because they will support you in optimizing data models, which are
typically served from a modern data warehouse or data lake.
Data scientist
Data scientists perform advanced analytics to extract value from data. Their work can vary from descriptive
analytics to predictive analytics. Descriptive analytics evaluate data through a process known as exploratory data
analysis (EDA). Predictive analytics are used in machine learning to apply modeling techniques that can detect
anomalies or patterns. These analytics are important parts of forecast models.
Descriptive and predictive analytics are only partial aspects of data scientists' work. Some data scientists might
work in the realm of deep learning, performing iterative experiments to solve a complex data problem by using
customized algorithms.
Anecdotal evidence suggests that most of the work in a data science project is spent on data wrangling and
feature engineering. Data scientists can speed up the experimentation process when data engineers use their
skills to successfully wrangle data.
On the surface, it might seem that a data scientist and data analyst are far apart in the work that they do, but
this conjecture is untrue. A data scientist looks at data to determine the questions that need answers and will
often devise a hypothesis or an experiment and then turn to the data analyst to assist with the data visualization
and reporting.
Database administrator
A database administrator is responsible for the overall availability, consistent performance, and optimization
of database solutions. They work with stakeholders to identify and implement the policies, tools, and
processes for data backup and recovery plans.
The role of a database administrator is different from the role of a data engineer. A database administrator
monitors and manages the overall health of a database and the hardware that it resides on, whereas a data
engineer is involved in the process of data wrangling, in other words, ingesting, transforming, validating, and
cleaning data to meet business needs and requirements.
2. Data Integration
What Is Data Integration?
Data integration combines various types and formats of data from any source across an organization into a data
lake or data warehouse to provide a unified fact base for analytics. Working from this one dataset allows
businesses to make better decisions, helps departments work better together, and drives a better customer
experience.
Data integration means consolidating data from multiple sources into a single dataset to be used for consistent
business intelligence or analytics.
This is a very simple explanation for a complex topic that has evolved over its 30-year history. Understanding
how data integration has transitioned from a back-end, retrospective process into core real-time infrastructure
starts with an examination of how data integration works.
To move data from one system to another requires a data pipeline that understands the structure and meaning
of the data as well as defines the path it will take through the technical systems. A relatively simple and common
type of data integration is data ingestion, where data from one system is integrated on a regular basis into
another system. Data integration may also include cleansing, sorting, enrichment and other processes to make
the data ready for use at its final destination.
ETL Vs ELT
Sometimes this happens before the data is stored, and the process is called ETL (extract, transform, load). Other
times it makes more sense to store the data first and then prepare it for use, a pattern known as ELT (extract,
load, transform). And in yet other cases, data is transformed and conformed where it is stored, without moving it per se.
ETL Tools: Informatica, Talend, Microsoft Azure services such as Azure Data Factory, IBM InfoSphere DataStage, and Oracle Data Integrator.
The steps, in part, depend on how the data will be stored. The most common types of data storage are:
Database
The simplest and most familiar way to store data. This category includes both relational databases and NoSQL
data stores, and it may not require data transformation at all.
Object Storage
Stores large amounts of unstructured data, such as sensor data, audio and video files, photos, etc., in their native
format in simple, self-contained repositories that include the data, metadata, and a unique ID number. The
metadata and ID number allow applications to locate and access the data.
Data Lake
Collects raw and unstructured data in a single storage system, often object storage, to be transformed and used
later. Data lakes hold vast amounts of a wide variety of data types and make processing big data and applying
machine learning and AI possible.
OLTP Vs OLAP

OLTP (online transaction processing) systems are optimized for recording large numbers of small, concurrent
transactions, such as orders and payments, while OLAP (online analytical processing) systems are optimized for
complex analytical queries over large volumes of historical data, such as the aggregations behind reports and
dashboards.

3. Data Modelling and Data Preparation
Business Intelligence
Business intelligence combines business analytics, data mining, data visualization, data tools and infrastructure,
and best practices to help organizations make more data-driven decisions.
The overall objective of business intelligence is to allow a business to make informed decisions.
KPI
Key performance indicators (KPIs) refer to a set of quantifiable measurements used to gauge a company’s overall
long-term performance. KPIs specifically help determine a company’s strategic, financial, and operational
achievements, especially compared to those of other businesses within the same sector.
Normalization vs. denormalization
To understand some star schema concepts described in this section, it's important to know two terms:
normalization and denormalization.
Normalization is the term used to describe data that's stored in a way that reduces repetitious data. Consider a
table of products that has a unique key value column, like the product key, and additional columns describing
product characteristics, including product name, category, color, and size. A sales table is considered
normalized when it stores only keys, like the product key; in that design, only the ProductKey column records
the product.
If, however, the sales table stores product details beyond the key, it's considered denormalized; in that design,
the ProductKey and other product-related columns record the product.
When you source data from an export file or data extract, it's likely that it represents a denormalized set of
data. In this case, use Power Query to transform and shape the source data into multiple normalized tables.
As a rule, you should strive to develop optimized Power BI data models with tables that represent normalized
fact and dimension data. However, there's one exception, where a snowflake dimension should be
denormalized to produce a single model table.
Elements of Data Model:

In a dimensional model, a fact table typically contains:
• Measurements/facts
• Foreign keys to the dimension tables
Dimension tables, in turn, hold the descriptive attributes, such as product, customer, or date, that give those
measurements context.

What Is Data Blending?
Data blending differs from data integration and data warehousing in that its primary use is not to create a single
version of the truth that’s stored in data warehouses or other systems of record within an organization. Instead,
this process is conducted by a business or data analyst with the goal of building an analytic dataset to help
answer specific business questions.
Merge

A merge combines two tables side by side by matching rows on one or more shared key columns, much like a
database join.

Append
A data append is a process that involves adding new data elements to an existing database. An example of a
common data append would be the enhancement of a company’s customer files. Companies often collect basic
information on their clients such as phone numbers, emails, or addresses. A data append takes the information
they have, matches it against a larger database of business data, allowing the desired missing data fields to be
added.
Data Blending Vs Data Joining

Data Blending: Combining the data between two or more tables or sheets from different data sources is data
blending.
Example: Combining an Oracle table with a SQL Server table (OR) combining two tables from different Oracle
databases (OR) combining two sheets from Excel (OR) combining an Excel sheet and an Oracle table, and so on.
Data Joining: Combining the data between two or more tables or sheets within the same Data Source is Data
Joining.
Example: Combining Two Tables from the same Oracle Database (or) DB2 (or) Sybase (or) SQL Server and others.
Combining two or more worksheets from the same Excel file.
In data joining, all the combined tables or sheets contain a common set of dimensions and measures.
Data Model Vs Data Sets
A data model is a representation of the business data of an organization or business segment. It sits on top of
the physical tables, and you can use a model as the basis for your story.
Acquired: Data is imported (copied) and stored in the cloud. Changes made to the data in the source system
don’t affect the imported data.
Live or Federated: Data remains in the source system. It isn’t copied to the cloud, so any changes in the source
data are available immediately, provided no structural changes are made to the table or SQL view.
Datasets
A dataset is a simple collection of data, usually presented in a table. You can use a dataset as the basis for your
story, and as a data source.
Data models and datasets complement each other. Datasets are more suitable for ad-hoc analysis, while data
models are more suitable for governed-data use cases.
Data Preparation:
Data preparation, also known as data preprocessing, is a crucial step in the data analysis and machine learning
pipeline. It involves cleaning, transforming, and organizing raw data into a format that is suitable for analysis or
machine learning model training. Data preparation is essential because real-world data is often messy,
incomplete, and unstructured, and it may contain errors, outliers, or irrelevant information.
Data Cleansing
Data cleaning is the process of removing data that does not belong in your dataset. Data transformation is the
process of converting data from one format or structure into another; transformation processes are also
referred to as data wrangling or data munging, transforming and mapping data from one "raw" form into
another format for warehousing and analysis. This section focuses on cleaning, which typically involves the
following tasks (a short example follows the checklist):
• Remove unnecessary values.
• Remove duplicate data.
• Fix typos.
• Convert data types.
• Search for missing values.
• Use a clear format.
• Translate language.
• Remove unwanted outliers.
Data Wrangling: (Concatenate, split, extract, replace, change)
Data wrangling is the process of converting raw data into a usable form. It may also be called data munging or
data remediation.
You'll typically go through the data wrangling process prior to conducting any data analysis in order to ensure
your data is reliable and complete. This way, you can be confident that the insights you draw are accurate and
valuable. Wrangling generally involves four steps, with a short example of common operations after the list:
• Discovery
• Transformation
• Validation
• Publishing
Data Validation
Data validation is the practice of checking the integrity, accuracy and structure of data before it is used for a
business operation. Data validation operation results can provide data used for data analytics, business
intelligence or training a machine learning model. It can also be used to ensure the integrity of data for financial
accounting or regulatory compliance.
Data can be examined as part of a validation process in a variety of ways, including data type, constraint,
structure, consistency, and code validation. Each type of data validation is designed to make sure the data meets
the requirements to be useful.
4. Reporting
Types of Reports
Dashboard:
A dashboard is a visual representation of data that provides a consolidated view of key information and metrics.
It is typically presented in the form of charts, graphs, tables, and other visual elements to help users quickly
understand and analyse complex data sets.
Story:
Story is a presentation-style report that uses charts, visualizations, text, images, and pictograms to describe data.
Application:
In tools such as SAP Analytics Cloud, an analytic application is a custom, interactive report experience that
combines visualizations with application logic, giving developers finer control over layout and behaviour than a
standard story or dashboard.
Reports:
A report is a document that presents information in an organized format for a specific audience and purpose.
Although summaries of reports may be delivered orally, complete reports are almost always in the form of
written documents.
Charts:
A chart is a visual representation of data or information that helps in presenting it in a clear, organized, and easily
understandable manner. It is a graphical representation that allows for quick comprehension of complex data
sets, patterns, trends, and relationships.
Charts are commonly used in various fields, including business, finance, economics, science, and statistics. They
serve the purpose of conveying information visually, making it easier for people to grasp the data and draw
insights from it.
There are several types of charts, each suitable for representing different types of data and serving different
purposes. Some commonly used charts include:
Reporting Elements:
Restricted Measures:
As the name implies, restricted measures work by restricting measure values to only show certain data.
For example, a sales measure might be restricted to show values for Mumbai only. In this case, one measure
value is restricted by one or more dimensions.
Calculated Measures:
A calculated measure is based on other measures. For example, one measure might be defined as a second
measure divided by a third measure.
For example, suppose member A refers to 150 records in the fact table, and member B refers to 300 records in
the fact table. Suppose that you create a member C that aggregates A and B together. Then member C refers to
the relevant 450 records in the fact table.
Aggregation:
In the context of data, aggregation refers to the process of combining and summarizing individual data points
into a single representation or summary. It involves gathering data from multiple sources or multiple instances
and applying a specific operation or function to derive meaningful insights or metrics.
Aggregation is commonly used in data analysis and reporting to condense large volumes of data into more
manageable and understandable forms. It helps to identify patterns, trends, and overall characteristics of the
data. By aggregating data, analysts can extract higher-level information that provides a broader perspective and
facilitates decision-making.
The choice of aggregation method depends on the nature of the data and the desired outcome. Some common
aggregation operations include:
Summation: Adding together numeric values, such as calculating the total sales for a given period.
Counting: Determining the number of occurrences or items in a dataset, like counting the number of customers
or transactions.
Averaging: Calculating the mean or average value of a set of data points, useful for finding the average score or
rating.
Minimum and maximum: Finding the smallest or largest value in a dataset, which can be useful for identifying
outliers or extremes.
Grouping: Organizing data into categories or groups based on certain criteria, such as grouping sales by product
category or customer segment.
Aggregation can be performed at various levels of granularity, such as at the individual level, by time intervals
(e.g., hourly, daily, monthly), or by specific dimensions or attributes (e.g., by region, by product). The choice of
aggregation level depends on the analysis objectives and the level of detail required.
Overall, aggregation plays a crucial role in summarizing and extracting meaningful insights from large datasets,
enabling efficient analysis and decision-making processes.
Link Analysis:
Link analysis in data refers to the process of examining and analysing relationships or connections between
various entities or elements within a dataset. It focuses on understanding the links, associations, or
dependencies among different data points to uncover patterns, structures, and insights that may not be apparent
from the individual data elements alone.
Link analysis is commonly used in fields such as network analysis, social network analysis, fraud detection,
cybersecurity, and recommendation systems. It helps to reveal the underlying relationships and interactions
between entities, allowing for a deeper understanding of complex systems and phenomena.
Ranking:
Ranking in data analysis refers to the process of ordering or sorting data elements based on a specific criterion
or measure of importance. It assigns a numerical or ordinal position to each data point, indicating its relative
standing or value within a dataset.
Ranking is commonly used to identify the top or bottom performers, prioritize items, or understand the relative
importance or significance of data elements. It can be applied to various types of data, including numeric values,
categorical variables, or qualitative attributes.
Criteria for Ranking: Ranking is based on a specific criterion or metric that defines the basis for ordering the data
elements. The choice of criterion depends on the analysis objectives and the nature of the data. For example, in
sales data, ranking may be based on revenue generated, while in a survey dataset, it could be based on customer
satisfaction scores.
Ascending or Descending Order: Rankings can be assigned in ascending or descending order. Ascending order
ranks data elements from the smallest or least significant to the largest or most significant, while descending
order ranks them in the opposite direction.
Ties or Rank Equivalents: Ties occur when multiple data elements have the same value for the ranking criterion.
In such cases, various methods can be used to handle ties, including assigning the same rank to tied elements or
using tie-breaking rules to differentiate between them.
Rank Numbering: Each data element is assigned a unique rank number based on its position in the ranking order.
For example, the top-ranked element may have a rank of 1, the second-ranked element may have a rank of 2,
and so on.
Rank-Based Analysis: Once the data elements are ranked, further analysis can be performed based on their ranks.
This may include identifying top performers, outliers, trends across ranks, or segmenting the data based on rank
ranges.
Ranking can be applied in various scenarios, such as ranking products based on sales performance, ranking
employees based on performance evaluations, ranking search results based on relevance, or ranking stocks
based on market capitalization. It provides a structured and intuitive way to compare and understand the relative
importance or performance of different data elements within a dataset.
Sorting:
Sorting in reporting refers to the process of arranging data in a specific order to enhance its readability and
facilitate analysis. It involves reordering data based on one or more criteria to present information in a structured
and meaningful manner.
When generating reports, sorting is often applied to tables, lists, or charts to present data in a logical sequence.
It helps users quickly identify trends, patterns, or outliers and make data-driven decisions. The sorting process
can be performed in ascending or descending order, depending on the desired presentation or analysis
requirements.
Sorting Criteria: The choice of sorting criteria depends on the nature of the data and the objectives of the report.
Common criteria include alphabetical order for text data, numerical order for numeric values, chronological
order for dates or timestamps, or custom criteria based on specific attributes or metrics.
Single or Multiple Columns: Sorting can be applied to a single column or multiple columns in a table or list. Single-
column sorting rearranges the data based on the values in a single column, while multiple-column sorting
considers multiple criteria to establish the order.
Ascending and Descending Order: Ascending order sorts data in the lowest-to-highest or A-to-Z sequence, while
descending order sorts data in the highest-to-lowest or Z-to-A sequence. The choice of order depends on the
report's requirements and the interpretation of the data.
Interactive Sorting: In some reporting tools or software, users may have the option to interactively sort data on-
demand. This allows users to dynamically change the sorting criteria or order while exploring the report, enabling
them to focus on specific aspects of the data.
Hierarchical Sorting: In reports with hierarchical data structures, such as organizational charts or product
categories, sorting can be applied to establish the hierarchical order. This ensures that data elements are
displayed in a logical and hierarchical manner, facilitating understanding and navigation.
Sorting is a fundamental operation in reporting that helps to organize and present data in a coherent and
meaningful way. It allows users to quickly identify trends, outliers, or patterns, making it easier to derive insights
and make informed decisions based on the reported data.
Filters in Report:
Filters help users focus on specific data subsets and gain insights from their data.
Data Selection: Filters enable users to select a subset of data based on specific criteria. For example, in a sales
report, you might apply filters to display data only for a particular time period, a specific region, or a particular
product category.
Criteria and Parameters: Users can define filter criteria or parameters according to their reporting needs. These
criteria can include date ranges, numerical thresholds, categories, or any other relevant data attributes.
Dynamic Filtering: Dynamic filters allow users to interactively adjust the filter criteria to see real-time updates in
the report. This is particularly useful for exploring data and conducting ad-hoc analysis.
Default Filters: Reports often come with default filters that are applied when the report is first opened. These
defaults can be set to display the most relevant or commonly used data.
Type of Visualisation:
Data visualization (or ‘data viz’) is one of the most important aspects of data analytics. Mapping raw data using
graphical elements is great for aiding pattern-spotting and it’s useful for sharing findings in an easily digestible,
eye-catching way. And while the priority should always be the integrity of your data, if done well, data
visualization can also be a lot of fun.
"visuals" or "visualizations" refer to graphical representations of data that are designed to make complex
information more understandable and accessible. Visuals are used to present data in a way that allows viewers
to quickly grasp patterns, trends, and insights that might be less apparent when examining raw data.
Master the art of data viz and you’ll soon be spotting trends and correlations, all while flexing your creative
muscle. But before we can unlock all these benefits, we first need to understand the basics, including the
different types of data visualization and how they’re used.
Bar Chart/Column Chart: A chart that uses rectangular bars or columns of varying heights or lengths to represent
data values. It is often used to compare and display categorical data.
Area Chart: Similar to a line chart, but with the area between the lines and the axis filled with color. It is useful
for comparing the cumulative values of multiple data series over time.
Histogram: A chart that displays the distribution of a continuous data set by dividing it into intervals or bins and
representing the frequency or count of data points within each interval.
Gantt Chart: A chart used in project management to illustrate the schedule of tasks or activities, their durations,
and their dependencies. It helps in visualizing the timeline and progress of a project.
Funnel Chart:
A funnel chart is a type of data visualization used to represent a sequential process or a series of stages where
data is progressively filtered or narrowed down. It is called a "funnel" chart because its shape resembles a funnel,
with the widest part at the top and narrowing down towards the bottom. Funnel charts are typically used in
business and marketing to visualize the stages of a sales process, customer conversion journey, or any other
sequential process with decreasing data points at each stage.
Treemap: A treemap is a visual method for displaying hierarchical data that uses nested rectangles to represent
the branches of a tree diagram.
Heat Map:
A heatmap (or heat map) is a graphical representation of data where values are depicted by color. They are
essential in detecting what does or doesn't work on a website or page, and which parts and elements of a page
users engage with.
Data Calculations:
The calculation concepts most often used in reports (restricted measures, calculated measures, aggregation,
link analysis, ranking, and sorting) are described under Reporting Elements above.

Best Practices for Visualisation:
Understand Your Audience: Start by understanding who will be consuming your visualization and their level of
familiarity with the data. Tailor your visualization to their needs, knowledge, and preferences to make it more
relevant and accessible.
Choose the Right Chart Type: Select a chart type that best represents the relationships and patterns in your data.
Consider factors such as the data's nature (e.g., categorical, numerical, time-series), the variables you want to
compare, and the story you want to convey. Common chart types include bar charts, line charts, pie charts,
scatter plots, and maps.
Simplify and Declutter: Keep your visualization clean and uncluttered by removing unnecessary elements. Avoid
excessive decorations, gridlines, or non-essential labels. Use whitespace effectively to enhance clarity and focus
attention on the important aspects of the data.
Highlight Key Insights: Identify the key insights or messages you want to convey and emphasize them in your
visualization. Use visual cues such as colour, size, or annotations to draw attention to critical data points or
trends.
Ensure Data Accuracy: Ensure that your data is accurate, reliable, and properly represented in the visualization.
Validate your data sources and calculations to maintain credibility and prevent misinterpretation.
Use Appropriate Colours: Choose colours thoughtfully to enhance readability and interpretation. Use a limited
colour palette that is visually appealing and provides clear contrasts. Ensure color choices are accessible for color-
blind viewers.
Provide Context and Labels: Provide sufficient context to help viewers understand the data. Include clear and
descriptive titles, axis labels, legends, and units of measurement. Provide contextual information or explanations
as necessary.
Use Intuitive Design: Design your visualization with a user-centred approach. Arrange data elements logically
and intuitively, considering reading patterns (e.g., left-to-right, top-to-bottom). Align elements consistently to
create a harmonious and organized display.
Use Interactivity Wisely: If your visualization is interactive, ensure that the interactivity adds value and doesn't
overwhelm or distract the viewer. Provide intuitive and meaningful interactions that allow users to explore the
data and gain deeper insights.
Test and Iterate: Test your visualization with representative users to gather feedback and identify areas for
improvement. Iterate and refine your design based on user feedback and data storytelling objectives.
Remember, these are general guidelines, and the specific requirements of your data and audience may warrant
deviations from these rules. The key is to understand the principles behind effective data visualization and adapt
them to your specific context to create impactful and meaningful representations of your data.
Colour Standards:
Color plays a crucial role in data visualization, and following best practices for color selection can enhance the
clarity, accessibility, and effectiveness of your visualizations. Here are some color standard best practices to
consider:
Use a Limited Color Palette: Limit the number of colors used in your visualization to maintain simplicity and avoid
visual clutter. A small, carefully chosen color palette improves visual coherence and reduces confusion. Using too
many colors can make it difficult to distinguish between data categories or values.
Choose High Contrast Colors: Select colors that provide sufficient contrast to ensure readability and accessibility.
High contrast between data elements and background or between different data categories improves visibility
for all users, including those with color vision deficiencies. Test your color choices in grayscale or with color blind
simulations to ensure they remain distinguishable.
Consider Color Meaning and Associations: Colors can carry cultural, psychological, or contextual meanings. Be
mindful of the associations or connotations that certain colors may have in your target audience or specific
context. For example, using red to represent positive values might be counterintuitive in many cultural contexts.
Use color in a way that aligns with the intended message or narrative of your visualization.
Account for Color Vision Deficiencies: Around 8% of men and 0.5% of women have some form of color vision
deficiency. Consider designing your visualizations to accommodate these users. Avoid relying solely on color to
convey information, and use additional visual cues, such as patterns, labels, or shapes, to differentiate data
elements.
Leverage Color Scales: When visualizing continuous data or gradients, use color scales that represent a clear
progression from low to high values. Gradual changes in hue, saturation, or brightness can effectively convey
magnitude or intensity. Consider using color scales that are perceptually uniform, such as those based on the
CIELAB or HCL color models, to ensure that equal steps in the color scale correspond to perceptually equal
changes.
Be Mindful of Color Combination: Ensure that your chosen colours work well together and do not cause visual
discomfort or confusion. Avoid combinations that create strong vibrations, such as red and blue, or combinations
that make it difficult to differentiate data elements. Colour harmony tools and colour palette generators can
assist in finding pleasing and harmonious color combinations.
Test for Accessibility: Test your visualizations for accessibility by checking color contrast ratios. Ensure that the
color contrast between text and background meets accessibility standards, such as those outlined in the Web
Content Accessibility Guidelines (WCAG). Various online tools and color contrast checkers are available to assist
in verifying the accessibility of your color choices.
Document and Share Color Usage: Maintain a clear and consistent documentation of the colours used in your
visualizations. This helps ensure continuity across different charts or reports and enables others to reproduce
your visualizations accurately.
Remember, the best color choices depend on the specific context, audience, and objectives of your visualization.
Applying these color standard best practices can help create visually appealing, accessible, and effective data
visualizations.