Graphical Analysis of Data Using Microsoft Excel (2016 Version)
Graphical Analysis of Data Using Microsoft Excel (2016 Version)
Graphical Analysis of Data Using Microsoft Excel (2016 Version)
In several upcoming labs, a primary goal will be to determine the mathematical relationship
between two variable physical parameters. Graphs are useful tools that can elucidate such
relationships. First, plotting a graph provides a visual image of data and any trends therein.
Second, via appropriate analysis, they provide us with the ability to predict the results of any
changes to the system.
best-fit line
Best-fit line Equation:
y = mx + b
y data
b = y-intercept
y
m = slope
= y/x = y2-y1/x2-x1
b x
x data
Computer spreadsheets are powerful tools for manipulating and graphing quantitative data. In
this exercise, the spreadsheet program Microsoft Excel will be used for this purpose. In
particular, students will learn to use Excel in order to explore a number of linear graphical
relationships. Please note that although Excel can fit curves to nonlinear data sets, this form of
analysis is usually not as accurate as linear regression.
Scenario – A certain experiment is designed to measure the volume of 1 mole of helium gas at
a variety of different temperatures, while keeping the gas pressure constant at 758 torr:
a) Launch the program Microsoft Excel (2016 version, found on all computers in all the
computer centers on campus). Go to the Start button (at the bottom left on the screen), then
click Programs, followed by Microsoft Excel.
b) Enter the above data into the first two columns in the spreadsheet.
Reserve the first row for column labels.
The x values must be entered to the left of the y values in the spreadsheet. Remember
that the independent variable (the one that you, as the experimenter, have control of)
goes on the x-axis while the dependent variable (the measured data) goes on the y-axis.
c) Highlight the set of data (not the column labels) that you wish to plot (Figure 1).
Figure 1
Figure 2
Choose the scatter graph that shows data points only, with no connecting lines – the option
labeled Scatter with Only Markers (Figure 3).
Figure 3
You should now see a scatter plot on your Excel screen, which provides a preview of your
graph (Figure 4).
Figure 4
If all looks well, it is time to add titles and label the axes of your graph (Figure 5).
First, click inside the chart.
Switch to the Design tab, and click Add Chart Element > Chart Title > Above Chart
The graph should be given a meaningful, explanatory title that starts out “Y versus X”
followed by a description of your system.
Click on Axis Titles (select Primary Horizontal Axis Title and Primary Vertical Axis Title)
to add labels to the x- and y-axes. Note that it is important to label axes with both the
measurement and the units used.
Figure 5
To change the titles, click the text box for each title, highlight the text and type in your new
title (Figure 6).
Figure 6
d) Your next step is to add a trendline to the plotted data points. A trendline represents the
best possible linear fit to your data. To do this you first need to "activate" the graph. Do this
by clicking on any one of the data points. When you do this, all the data points will appear
highlighted.
g) By graphing the five measured values, a relationship is established between gas volume
and temperature. The graph contains a visual representation of the relationship (the plot)
and a mathematical expression of the relationship (the equation). It can now be used to
make certain predictions.
For example, suppose the 1 mole sample of helium gas is cooled until its volume is
measured to be 10.5 L. You are asked to determine the gas temperature. Note that the
value 10.5 L falls outside the range of the plotted data. How can you find the temperature if
it doesn't fall between the known points? There are two ways to do this.
Method (1): Extrapolate the trendline and estimate where the point on the line is.
Click on the Layout tab along the top menu, then Trendline > More Trendline Options.
In the section labeled Forecast enter a number in the box labeled Backward, since we
want to extend the trendline the backward x direction. To decide what number to enter,
look at your graph to see how far back along the x-axis you need to go in order to cover
the area where volume = 10.5 L. After entering a number, click Close, and the line on
your graph should now be extended in the backward direction.
Now use your graph to estimate the x value by envisioning a straight line down from y =
10.5 L to the x-axis. Record this value on your report.
Method (2): Plug this value for volume into the equation of the trendline and solve for the
unknown temperature. Do this and record your answer on your report. Note that this
method is generally more precise than extrapolating and "eyeballing" from the graph.
Data A Data B
Amount of Dye (mol) Absorbance Amount of Dye (mol) Absorbance
(unitless) (unitless)
0.100 0.049 0.800 0.620
0.200 0.168 0.850 0.440
0.300 0.261 0.900 0.285
0.400 0.360 0.950 0.125
0.500 0.470
0.600 0.590
0.700 0.700
0.750 0.750
You would like to see how these two sets of data relate to each other. To do this you will have
to place both sets of data, as independent relationships, on the same graph. Note that this
process only works when you have the same axis values and magnitudes.
a) Enter this new data on a fresh page (Sheet 2) in Excel. Be sure to label your data columns
A and B. Again, remember to enter the x values to the left of the y values.
b) First, plot Data A only as an XY Scatter plot (the same way you did with the data in Part 1).
Fit a trendline to this data using linear regression, and obtain the equation of this line.
Figure 8
Linear
(Series1)
Figure 9
d) Print out a full-sized copy of your prepared graph and attach it to your report. Then record
the following information on your report:
the equation of the best-fit trendline for Data A,
the equation of the best-fit trendline for Data B,
If these trendlines were extrapolated, they would intersect. Determine the values of x
and y for the point of intersection using simultaneous equations.
When many independent measurements are made for one variable, there is inevitably some
scatter (noise) in the data. This is usually the result of random errors over which the
experimenter has little control.
Scenario – Ten different students at two different colleges each measure the sulfate ion
concentration in a sample of tap water:
College
35.9 43.2 33.5 35.1 32.8 37.6 31.9 36.6 35.0 32.0
#1
ppm ppm ppm ppm ppm ppm ppm ppm ppm ppm
dataset
College
45.1 34.2 36.8 31.0 40.7 29.6 35.4 32.5 43.5 38.8
#2
ppm ppm ppm ppm ppm ppm ppm ppm ppm ppm
dataset
Simple statistical analyses of these datasets might include calculations of the mean and median
concentration, and the standard deviation. The mean ( x ) is simply the average value, defined
as the sum () of each of the measurements (xi) in a data set divided by the number of
measurements (N):
x
x i
The median (M) is the midpoint value of a numerically ordered dataset, where half of the
measurements are above the median and half are below. The median location of N
measurements can be found using:
M (N 1) 2
When N is an odd number, the formula yields a integer that represents the value corresponding
to the median location in an ordered distribution of measurements. For example, in the set of
numbers (3 1 5 4 9 9 8) the median location is (7 + 1) / 2, or the 4th value. When applied to the
numerically ordered set (1 3 4 5 8 9 9), the number 5 is the 4th value and is thus the median –
three scores are above 5 and three are below 5. Note that if there were only 6 numbers in the
set (1 3 4 5 8 9), the median location is (6 + 1) / 2, or the 3.5th value. In this case the median is
half-way between the 3rd and 4th values in the ordered distribution, or 4.5.
Standard deviation (s) is a measure of the variation in a dataset, and is defined as the square
root of the sum of squares divided by the number of measurements minus one:
(x i x )2
s
N 1
So to find s, subtract each measurement from the mean, square that result, add it to the results
of each other difference squared, divide that sum by the number of measurements minus one,
then take the square root of this result. The larger this value is, the greater the variation in the
data, and the lower the precision in the measurements.
While the mean, median and standard deviation can be calculated by hand, it is often more
convenient to use a calculator or computer to determine these values. Microsoft Excel is
particularly well suited for such statistical analyses, especially on large datasets.
a) Enter the data acquired by the students from College #1 (only) into a single column of cells
on a fresh page (Sheet 4) in Excel. Then in any empty cell (usually one close to the data
cells), instruct the program to perform the required functions on the data. To compute the
mean or average of the data entered in cells a1 through a10, for example, you must:
click the mouse in an empty cell
type "=average(a1:a10)"
and press return
To obtain the median you would instead type “=median(a1:a10)”. To obtain the standard
deviation you would instead type "=stdev(a1:a10)".
Rejecting Outliers
Do all the measurements in the College #1 data set look equally good to you, or are there any
values that do not seem to fit with the others? If so, are you allowed to reject these
measurements?
Outliers are data points which lie far outside the range defined by the rest of the measurements
and may skew your results to a great extent. If you determine that an outlier resulted from an
obvious experimental error (e.g., you incorrectly read an instrument or prepared a solution), you
may reject the point without hesitation. If, however, none of these errors is evident, you must
use caution in making your decision to keep or reject a point. One rough criterion for rejecting a
data point is if it lies beyond two standard deviations from the mean or average.
c) Using the above criteria, determine if there are any outliers in the College #1 dataset.
Record these outlier measurements (if any) on your report.
Then, excluding the outliers, re-calculate the mean, median and standard deviation of
this data set (use Excel).
Rejecting data points cannot be done just because you want your results to look better. If
you choose to reject an outlier for any reason, you must always include documentation in
your lab report which clearly states:
that you did reject a point
which point you rejected
why you rejected it
Unlike the linear plots created so far, a scatter plot simply shows the variation in measurements
of a single variable in a given dataset, i.e., it supplies a visual representation of the “noise” in the
data. The data is plotted in a column, and there is no x-y dependence here (Figure 10). Note
that datasets with a greater degree of scatter will have a higher standard deviation and consist
of less precise measurements than datasets with a small degree of scatter.
Measurements
Experimental
Figure 10
To obtain such a plot using Excel, all the x values for each dataset must be identical. Thus, let
the College #1 data be assigned x = 1, and let x = 2 for all the College #2 data:
d) Enter the data as shown above into the first four columns of your spreadsheet.
Plot the College #1 dataset as an XY Scatter Plot.
Now add the College #2 dataset to this graph applying the same steps you used to
create your earlier graph in the section “Two Data Sets with Overlay” (Part 2).
Add appropriate axis labels and a title. You may also want to adjust the x-axis and y-
axis scales to improve the final look of your graph.
e) Print out a full-sized copy of your prepared graph and attach it to your report. Then record
the following information on your report:
Which dataset (College #1 or College #2) show the least scatter? The greater standard
deviation? The more precise measurements?