Data Analytics Using Spreadsheets I-Reference Notes
Data Analytics Using Spreadsheets I-Reference Notes
1|Page
Data Analytics using Spreadsheets I Theory
It cannot be used to find unknown relations. With the use of this, one might discover new
relationships.
Data analysis needs to be defined in the Data analytics consists of multiple stages
beginning as it involves cleaning and including collecting data and evaluating
transforming the raw data. business data.
Example: Example:
A new trader in stock market, researching As a result of the stock traders newfound
share-market and trend records to get a understanding of the stock pattern, he can
sense of what’s going on in the market. This now estimate the stock’s future market
technique includes data analysis. price and purchase some shares. This serves
as an example of a data analytics process.
• Descriptive analytics is a simple, surface-level type of analysis that looks at what has
happened in the past. The two main techniques used in descriptive analytics are data
aggregation and data mining—so, the data analyst first gathers the data and presents
it in a summarized format (that’s the aggregation part) and then “mines” the data to
discover patterns. The data is then presented in a way that can be easily understood
by a wide audience (not just data experts). It’s important to note that descriptive
analytics doesn’t try to explain the historical data or establish cause-and-effect
relationships; at this stage, it’s simply a case of determining and describing the “what”.
Descriptive analytics draws on the concept of descriptive statistics.
• Diagnostic analytics: While descriptive analytics looks at the “what”, diagnostic
analytics explores the “why”. When running diagnostic analytics, data analysts will
first seek to identify anomalies within the data—that is, anything that cannot be
explained by the data in front of them. For example: If the data shows that there was
a sudden drop in sales for the month of March, the data analyst will need to
investigate the cause. To do this, they’ll embark on what’s known as the discovery
phase, identifying any additional data sources that might tell them more about why
such anomalies arose. Finally, the data analyst will try to uncover causal
relationships—for example, looking at any events that may correlate or correspond
with the decrease in sales. At this stage, data analysts may use probability theory,
regression analysis, filtering, and time-series data analytics.
• Predictive analytics: Just as the name suggests, predictive analytics tries to predict
what is likely to happen in the future. This is where data analysts start to come up with
actionable, data-driven insights that the company can use to inform their next steps.
Predictive analytics estimates the likelihood of a future outcome based on historical
data and probability theory, and while it can never be completely accurate, it does
eliminate much of the guesswork from key business decisions. Predictive analytics can
be used to forecast all sorts of outcomes—from what products will be most popular
at a certain time, to how much the company revenue is likely to increase or decrease
2|Page
Data Analytics using Spreadsheets I Theory
3|Page
Data Analytics using Spreadsheets I Theory
on how to move forwards. This is also a good time to highlight any limitations to your data
analysis and to consider what further analysis might be conducted.
• Pivot tables and pivot charts: Pivot tables provide a simple approach to reformatting
columns and rows, transforming them into groupings, statistics, or summaries. Pivot
charts visualize the data expressed in a pivot table, giving us insight at a glance.
• Conditional formatting: Conditional formatting allows you to highlight or hide cells
based on a rule you specify. It is useful for highlighting outliers, duplicates, or patterns
in data.
• Charts: Charts allow to illustrate workbook data graphically, which makes it easy to
visualize comparisons and trends.
• Remove duplicates: Data is often messy, so it is important that you know how to
remove duplicates. Using conditional formatting rules, you can highlight the duplicate
data to review it before deleting it. Spreadsheets have a remove duplicate feature.
• LOOKUP is used to lookup a value from a range. It essentially allows you to use a
selected range as a lookup table and return a “looked up” result to a cell. This is
essential to combine data from different data sources.
• IFERROR: The IFERROR function is used to create a custom error message when a
formula results in an error. This is useful in cleaning data.
• COUNTBLANK: The COUNTBLANK function is an important function for data cleaning
in analytics because many machine learning algorithms are sensitive to blank values.
By knowing how many values are blank, you have a better understanding of how to
approach them. For example, if a lot of values are blank you should drop the column.
If few values are blank you should assign a value to fill the blank. COUNTBLANK counts
the number of empty cells in a range.
4|Page
Data Analytics using Spreadsheets I Theory
Data Element: A data element is simply the recorded observation of a specific property
possessed by a member of a particular group of individuals or objects. For Example: Male is a
data element which corresponds to the gender of the second person Rahul Singh; 19 is a data
element which corresponds to the Age of the third person Sarita Fernandes.
Record: The data corresponding to a row or line is called a record. For Example: (2, Rahul
Singh, 2303035, 18, Male) is one record.
Field variable: The header of a column is referred to as a field or attribute. For Example: Name
is a field, Roll No is a field, Age is a field, Gender is a field.
Data Set: The data elements are consistently gathered together to obtain a data set. Here the
data elements are corresponding to a certain object. For example, details corresponding to a
student can be one data set. Details corresponding to car details could be another data set.
One should not combine details corresponding to a person and car within a data set as that
may not make sense. The elements of a data set are related to each other in some way such
as they may be attributes (such as name, roll no, age, gender) corresponding to a person.
5|Page
Data Analytics using Spreadsheets I Theory
6|Page
Data Analytics using Spreadsheets I Theory
• Categorical (also called nominal) data is used to label or categorize certain variables
without giving them any type of quantitative value. For example, if you were collecting
data about your target audience, you might want to know where they live. Are they
based in the UK, the USA, Asia, or Australia? Each of these geographical classifications
count as categorical data. Another simple example could be the use of labels like
“blue,” “brown,” and “green” to describe eye color.
• Ordinal data is when the categories used to classify your qualitative data fall into a
natural order or hierarchy. For example, if you wanted to explore customer
satisfaction, you might ask each customer to select whether their experience with
your product was “poor,” “satisfactory,” “good,” or “outstanding.” It’s clear that
“outstanding” is better than “poor,” but there’s no way of measuring or quantifying
the “distance” between the two categories.
Categorical and ordinal data tends to come up within the context of conducting
questionnaires and surveys. However, qualitative data is not just limited to labels and
categories; it also includes unstructured data such as what people say in an interview, what
they write in a product review, or what they post on social media.
7|Page
Data Analytics using Spreadsheets I Theory
8|Page
Data Analytics using Spreadsheets I Theory
9|Page
Data Analytics using Spreadsheets I Theory
• Logical functions: These are used to compared data in different cells. Depending on
the logical functions used, spreadsheets populates the cell with the logical formula as
TRUE or FALSE depending on the calculation of the formula.
• Text functions: These are powerful components of spreadsheets that convert
numbers into letters and can also remove or copy letters or numbers from other cells
into the current cell.
• Date & Time functions: These format numbers into dates. There are many options
available to return the date as desired from a variety of data sets. Date functions work
with dates and times. Each function performs a simple operation and by combining
several functions within one formula you can solve more complex and challenging
tasks.
• Lookup and Reference formulas: These allow us to work with large sets of data, and
especially useful when you need to reference between multiple data sets. They can
provide information about a range of data, find the location of a given address or
value, or look up certain values in a large set of data.
• Statistical Functions: These are responsible for statistical analysis calculating items like
mean, median, mode, etc.
10 | P a g e
Data Analytics using Spreadsheets I Theory
11 | P a g e
Data Analytics using Spreadsheets I Theory
For Example: If Reema is an art teacher, who uses her Google Drive to organize letters, lesson
plans, and more. Reema has many files. She decides if she wants to share a file or keep it
private without sharing. Some examples of files and how she controls the access are as
follows:
• Her spreadsheet with classroom expenses she decides to keep private and not share.
• Lesson-planning documents she creates, she shares with her co-teacher and lets her
edit.
• Newsletters and announcements she shares with her students and their parents
publicly but doesn't let others edit.
Others also share files with Reema. These include ones she can edit, like her co-teacher's
supply inventory; and ones she can't, like a schedule sent to her by the principal
As you can tell, no single sharing setting would be right for all of Reema's files. The settings
we choose for each of our shared files will probably depend on why we are sharing it in the
first place. When we share a file with a limited group of people, our collaborators must sign
in with a Google account to view or edit the file. However, when we share with a larger group
or make the file public, our collaborators will not need a Google account to access the file.
We can easily share a file with a larger group of people by providing a link to any file in our
Google Drive. A link is basically a URL or web address for any file we want to share. This can
be especially helpful for files that would be too large to send as an email attachment, like
music or video files. We can also share a file by posting the link to a public webpage. Anyone
who clicks the link will be redirected to the file.
12 | P a g e
Data Analytics using Spreadsheets I Theory
Spreadsheets like Excel provides some techniques and functions to clean the data. The most
widely used techniques and functions are:
• Removing Duplicates: Duplicate data refers to two or more entries that share the same
values in key fields. Identifying and handling duplicate data is essential for maintaining
data quality and ensuring accurate analyses.
• TRIM Function: The TRIM function is used to remove extra spaces from a text string,
leaving only a single space between words and no leading or trailing spaces.
• Convert Numbers Stored as Text into Numbers: It refers to the process of changing
numerical data that is stored as text in a digital format into actual numeric values.
Sometimes the numeric data is stored as text due to formatting issues or data
import/export processes. This can lead to issues when performing calculations or
analyses that require numeric data. The VALUE function is used to convert numbers
stored as text into numbers.
• Highlight Errors: In spreadsheets, you can easily highlight errors in your spreadsheet
to quickly identify and correct them. Errors can include things like #DIV/0!, #VALUE!,
#REF!, #NAME?, #NUM!, #N/A, or #NULL!. These errors can cause issues when
performing calculations or analyses that require numeric data. It is better to deal with
these errors before proceeding with further analysis. The errors can be highlighted
using conditional formatting and choosing “Highlight Cells With” and “Errors” under
Rule Type.
• Change Text to Lower/Upper/Proper Case: We can easily change the case (lowercase,
uppercase, or proper case) of text using built-in functions or formulas. This improves
the readability of the data. Use the UPPER function to convert text to uppercase. Use
the LOWER function to convert it to lowercase. To convert text to proper case
(capitalizing the first letter of each word), use the PROPER function.
13 | P a g e
Data Analytics using Spreadsheets I Theory
Unsorted Data
14 | P a g e
Data Analytics using Spreadsheets I Theory
15 | P a g e
Data Analytics using Spreadsheets I Theory
16 | P a g e
Data Analytics using Spreadsheets I Theory
In addition, we could also filter the data based on colour or icons to narrow down and see
only the good performers or bad performers as shown below:
17 | P a g e
Data Analytics using Spreadsheets I Theory
case, you might use contrasting colors that are also associated with the two different
platforms — light blue and pink-purple.
• Use color to make important information stand out: When you’re trying to highlight
something important, such as data relevant to a particular county or zip code, a bright
or saturated color can help it stand out. For example, you may choose to use gray for
less-important variables and a deep red or orange for the most important variable.
You could also use muted colors for the less-important ones and a bright color for the
most important one.
• Don’t pick colors that aren’t easily distinguishable: If you can’t distinguish between
colours easily then the data can be confusing to understand.
• Don’t Use Too Many Colors: Because the brain struggles to process many different
things at once, using a limited color set in your visualizations will improve speed to
insight.
18 | P a g e
Data Analytics using Spreadsheets I Theory
Principles of charting
Charts is a type of data visualization which takes a bunch of numbers and information and
turns it into pictures or any kind of charts that are easier to understand. It takes a big pile of
information and sorts it into pictures (like bar charts, line graphs, or pie charts) that make it
easier to understand or see patterns and trends. Data can be a jumble of numbers and facts.
Charts and graphs turn that jumble into pictures that make sense. However, if the charts are
poorly prepared then the charts can be confusing and counter-productive.
Types of basic charts
Some types of basic charts are as follows:
• Column Charts: Column charts use vertical bars to represent data. They can work with
many different types of data. They are mostly used for comparing information.
• Line Charts: Line graphs are used to display data over time or continuous intervals.
Line charts are ideal for showing trends. The data points are connected with lines,
making it easy to see whether values are increasing or decreasing over time.
19 | P a g e
Data Analytics using Spreadsheets I Theory
• Pie Charts: Pie charts are circular graphs divided into sectors, where each sector
represents a proportion of the whole. The size of each sector corresponds to the
percentage or proportion of the total data it represents. Pie charts make it easy to
compare proportions. Each value is shown as a slice of the pie, so it is easy to see
which values make up the percentage of a whole.
• Bar Charts: Bar charts work just like column charts, but they use horizontal rather
than vertical bars.
• Area Charts: Area charts are similar to line graphs but with the area below the line
filled in with colour. They are used to represent cumulative totals or stacked data over
time. Area charts are effective for showing changes in composition over time and
comparing the contributions of different categories to the total.
20 | P a g e
Data Analytics using Spreadsheets I Theory
Sep, ₹ May, ₹
21,672.00, 12% 21,497.00, 12%
Aug, ₹ Jun, ₹
51,878.00, 29% 58,050.00, 33%
Jul, ₹ 23,737.00,
14%
21 | P a g e
Data Analytics using Spreadsheets I Theory
If we want to compare the different types of sales we could use a column chart.
Sales Comparison
₹ 40,000.00
₹ 35,000.00
₹ 30,000.00
₹ 25,000.00 In Store Sales
₹ 20,000.00
Mail Order Sales
₹ 15,000.00
Web Site Sales
₹ 10,000.00
₹ 5,000.00
₹ 0.00
May Jun Jul Aug Sep
Or
Sales Comparison
₹ 40,000.00
May
₹ 30,000.00
Jun
₹ 20,000.00
Jul
₹ 10,000.00 Aug
₹ 0.00 Sep
In Store Sales Mail Order Sales Web Site Sales
Based on how we want to present the data we have to decide what we want to show on the
horizontal axis as shown above. In the first chart we can compare the different types of sales
within a month as horizontal axis has the month. However, in the second chart we can
compare sales performance for different months within a given type of sales. We could
choose a stacked column chart if we want compare both proportions across months in a
type of sale as well as performance across different types of sales as shown below.
22 | P a g e
Data Analytics using Spreadsheets I Theory
• Colour Scales change the colour of each cell based on its value. Each colour scale uses
a two or three colour gradient. For example, in the Green-Yellow-Red colour scale, the
highest values are green, the average values are yellow, and the lowest values are red.
• Icon Sets add a specific icon to each cell based on its value.
23 | P a g e
Data Analytics using Spreadsheets I Theory
• SUMIFS: The SUMIFS function is a premade function in Excel, which calculates the sum
of a range based on one or more conditions.
The syntax of the function is
=SUMIFS(sum_range, criteria_range1, criteria1, [criteria_range2, criteria2] ...)
The conditions are referred to as criteria1, criteria2, and so on, which can check things
like:
o If a number is greater than another number >
o If a number is smaller than another number <
o If a number or text is equal to something =
The criteria_range1, criteria_range2, and so on, are the ranges where the function
check for the conditions. The [sum_range] is the range where the function calculates
the sum.
Example of SUMIFS in Pictures:
24 | P a g e
Data Analytics using Spreadsheets I Theory
• COUNTIFS: The COUNTIFS function is a function in Excel, which counts cells in a range
based on one or more conditions.
The syntax of the function is
=COUNTIFS(criteria_range1, criteria1, [criteria_range2, criteria2], ...)
The conditions are referred to as criteria1, criteria2, and so on, which can check things
like:
o If a number is greater than another number >
o If a number is smaller than another number <
o If a number or text is equal to something =
The criteria_range1, criteria_range2, and so on, are the ranges where the function
check for the conditions.
Example of COUNTIFS in Pictures:
25 | P a g e
Data Analytics using Spreadsheets I Theory
26 | P a g e
Data Analytics using Spreadsheets I Theory
Pivot tables’ usefulness is easy to understand: If we have a large chunk of data, Pivot Tables
help us to turn the data set into useful reports and summaries. Visualizations are available,
too in form of Pivot Charts. Some use cases for pivot tables are as follows:
• Run automatic calculations on summed or counted values: Pivot tables are efficient
at performing calculations on large data sets. By summarizing and organizing data, we
can effortlessly compute sums, counts, averages, and more. This feature is perfect if
our work requires us to deal with sensitive financial data. For instance, we can benefit
from pivot tables if we are a financial analyst tallying expenses or a sales manager
assessing our business’ revenue. Pivot tables are also great if we need to generate
quick insights but don’t have the time to do manual calculations.
• Create percentages of totals: With pivot tables, you have a straightforward way of
generating percentages of totals. This lets us grasp the proportional contribution of
each data category in the data set. This is perfect if we want to get a holistic view of
the data. Marketers, for example, find pivot tables useful when evaluating their
campaign’s performance. They’re also ideal for project managers assessing resource
allocation. Using pivot tables, we can transform absolute values into insightful
percentages. This improves our data interpretation and decision-making quality.
• Segment data by date, the user, or other variables and calculate totals: Pivot tables
allow us to segment data effortlessly. This enables in-depth analysis based on specific
criteria such as date, user name, or other customizable variable. With this feature, we
can generate time-sensitive reports invaluable for assessing trends, identifying
patterns, and making informed decisions. For example, if you’re a sales manager, you
can analyze revenue by quarter. If you’re an HR professional, you can analyze
employee performance by department. Project managers can track progress across
different phases.
27 | P a g e