‘THE UNIVERSITY
OF QUEENSLAND.
Excel
Data Analysis
Course objectiv
+ Import data
‘+ Use statistical functions in Excel
+ Create histograms
‘Gain insights from your data
Student Training and Support
Phone: (07) 33484312
mal askus@brary. uqedu.ay
Web: ntpsuiweb lirary.uq edu.aufibrary-services/raining!
Staff Training (Bookings)
Phone (07) 3385 2566
Email stadev@uq.edu.au
Web np: ug edu auistattdevelopment
‘taf may contact the wane wth enqunes and feedback related o raring cortent Pease covtact Stat Development fr booking
lenges or your local LT. support for genera tectrical enque.
Reproduced or adapted from exginal content provided under Creative Commons lense by
oxo -va utray, THE UNIVERSITY
Staff and Student, Traiing (OF QUEENSLAND
Table of Contents
Importing External Data.
Exercise 1. Importing Exteral Data
Exercise 2. Importing data from a file.
Descriptive Statistics.
Exercise 3. Using Deserpve States...
‘Statistical Functions,
Exercise 4. Using basic statistical functions in Excel
Using Variance and Standard Deviation in Exc...
Exercise 5. Variance and Standard deviation
Histograms and Frequency oso.
Exercise 6. Creating histograms.
‘Correlation and Linear Regression.
Exercise 7. Calculate Corlation Co-eficient
Exercise 8. Create Chart and Linear Regression.
Exercise 9. Forecasting
THOS nnn 2
Exercise 10. Significance test...
ANOVA: Analysis of Variance i
Exercise 11. ANOVA: Analysis of Variance..
Rank and Percenties. ss
Exercise 12. Obtaining your Rank.
Exercise document:
Go to https://web library.ug.edu,aulibrary-services/training/training-resources and click on Data
Analysis (ZIP,40.9 KB) to download. Save these files on your H/ drive or to your local machine or
a USB drive.
‘Statistical Function definitions can be found at:
‘tps:/support office. com/en-us/article/statistical-functions-reference-624dac86-375-4436-bc25-
‘7646597 19114
20119 Microsoft Excel: Data Analysis‘THE UNIVERSITY
(OF QUEENSLAND.
VO Library
Staff and Student LT. Training
Importing External Data
Data located in compatible extemal files can be imported into excel without the need to retype all
the information again. Depending on the format of the data you Would lke to impor, diferent
methods can be used, including opening and saving in Excel, linking to data, importing data and
copying and pasting data into excel
Exercise 1. Importing External Data
‘Open the spreadsheet Data Analysis_Exercises.xlsx (which can be found under the Excel
section on the Library Training Resources page. The External Data Link sheet is selected.
Importing Data from websites
Data from websites and other sources can be imported into Excel if itis in an appropriate format.
1. Copy the URL of the web page with the data you
want to import.
‘¢.9. World University Rankings on Wikipedia (which can be found 4 QS World University Rening —
{in col 1 ofthe External Data Link shoot) : =
len wikipedia orpwiOS. World Uni
Note: For this exercise ignore From Web in the Get
External Data group. It will bring in the entire web
page and not just a selected table
2. Navigate to the Data tab
3. Click on New Query (in the Get & Transform
group)
4. From the drop down menu, select From Other
‘Sources > From Web
“This opens the dalogue box for you to enter the URL of the web
‘page with the data you want to Import
5. Paste the URL in the From Web dialogue box
land click OK
‘The Navigator Pane will open wth alist of data that can be
Imported into excel
6. Select the required data set (QS World University
Rankings - Top 50) on the left pane of the
Navigator to preview it
'NB: You can use the edit button to lean the data before
Importing
7. Select QS World University Rankings ~ Top 50
8. Click on Load
3ott9 Microsoft Excel: Data Analysisva Litany ‘THE UNIVERSITY
‘Staff and Student LT. Training OF QUEENSLAND,
A connection wil be created tothe data on the website. This will
‘ensure that refreshing your excel le wil update the data to the =
latest version. Excel wil en open @ new worksheot wih the
Imported data,
Refresh Linked Data
9. Glick on any cell within the data table SG Som
10. Click on the Data tab TWromTatie REE ropes A
11. Select Refresh All emimaeel' Lae
NB: Retoah al wit eres al connectons inthe workbook you <= AR
wanttoreesh data on a single sheet ck Reesh, 1B brah
NB: You may get a Mcrosof Excel Security Notice about
connectons to external data sources. You can safely cick OK here
‘ut soe the section on Considerations when importing data Into,
Excel below fr further information
Considerations when importing data into Excel
Malware / Macros — Unfortunately there are ways to hide malware inside Excel files. This is
usually done via "macros" which are litle programs that are typically created to do complex or
repetitive tasks. Because hackers have exploited these tools, Microsoft has disabled macros by
default in Excel. In fact, when you open an Excel file from an untrusted source, you will get a
‘security warning like this one._If you are working on data from an unknown or untrusted source,
use caution before “Enabling Editing”
‘Some hackers have even learned to use social engineering techniques to try and trick users into
turing macros back on. For example there may be an image in the fle that appears blurred with a
note that itis for security reasons. The goal is to get you to enable macros eo that you can ‘see’
the Image when, in reality, enabling the macro allows the virus to run. Of course if you have good
anti-virus / anti-malware programs installed, they will go a long way towards mitigating that threat.
References within a file or sheet to external data
‘You can refer to the contents of cells in another Excel workbook by creating an external reference.
‘An external reference (also called a link) isa reference to a cell or range on a worksheet in another
Excel workbook, or a reference to a defined name in another workbook. If your data is coming
from a source beyond your immediate control, you may find that these links’ are broken. If you
don't have access to the workbooks/worksheets where the underlying data lives, you won't be able
{0 use it via the link in the spreadsheet you are currently working on.
40119 Microsoh Excel: Data Analysis‘THE UNIVERSITY
(OF QUEENSLAND,
Exercise 2. Importing data from a file
Open exercise files and enable content
1. Open the exercise fle Data
Analysis_Exercises.xlsx and select the
Importing Data & Histograms
worksheet "= *
2. Click on the Enable Content button on
the Security Warning (if necessary)
va Library
Staff and Student LT. Training
3. Ifyou get a Security Warning dialog box.
Click on Yes he a
Cleentpemrentencteenien (SS (ER
Note: In Office 365 (Windows version) Microsoft removed the Text Import Wizard as an option
When using steps below. They force you to use the Power Query window which does not have
the “Treat consecutive delimiters as one” option. You can get around this by opening the text file
directly in Excel which will launch the wizard below
Import data from text fi
4. Click the Data tab
5. Click From Text (in the Get External Data
group)
6. Locate data_analysis.txt
7. Glick on Import (in Mac ~ Get Data)
By Crekon Dstenned open poten womer eee
9. Click Next carmen
Testimpon Ward Step 2013
10. Tick the following options:
aes even sce
Catmter
Space 0
‘Treat consecutive delimiters as one
11. Click Next
Sotto Microsoft Excel: Data AnalysisvoUutray ‘THE UNIveRsiTY
Staff and Student. Training (OF QUEENSLAND
Testinpon Wend Sep 39
12. Ensure General option is selected Th scaeen ets you eet each cohamn and set te Data Forma
13. Click Finish a
a
Orme “cena eres nan
Otte [ou
14, Assign data to $AS1 in existing worksheet
15. Click OK
‘Sle now yeu ante went cata in your wontbece
te
retabie Report
3 Petron
Bh Conn create connedion
are ge you mantt put tne cata?
© besting washes
Otten wonsneet
aca ve stato te Ost paced
fees) a]
Microsoft Excel: Date Analysis‘THE UNIVERSITY
(OF QUEENSLAND,
va Library
Staff and Student LT. Training
Descriptive Statistics
Descriptive statistics is the discipline of quantitatively (expressed as numbers) describing the main
features of a collection of data. Excel's Analysis Toolpak add-in offers a variety of features to
undertake statistical computations and graphing. Descriptive Statistics is included to provide
statistical averages (mean, mode, median), standard error, standard deviation, sample variance,
kurtosis and confidence levels of sample data.
Exercise 3.
‘Mac users may need to add the Analysis Tool Pack
Dest Statistics
Data Tab - Far right hand side ~ click Analysis
Tools button
Click next to Analysis ToolPak
Choose OK
1. Click Data Analysis (at the far right of ribbon)
on the Data tab
=
2. Click Descriptive Statistics Elon ji
3. Click OK —_—————
Tot19 Microsoft Excl: Data AnalysisUO Library
Staff and Student LT. Training
‘THE UNIVERSITY
OF QUEENSLAND
4, Highlight cells $A$1:$D$201 for Input Range | |
Tasaee a =|
5, Select Grouped by columns ere some =
6. Click Labels in first row box Gece Om
7. Click Output Range ——
8 Highlight cell $G$1 for Output Range cn |
9. Select Summary statistics
Charts group > Recommended
Charts
3. Select Scatter
Add the regression line
1. Click Add Chart Element button —
Trendline— Linear Trendline
2. The Trendiine will appear on the chart
3. Right click the Trendline
4. Choose Format Trendline
5. Within Trendline Options.
6. Select Checkbox to "Display Equation on
Chart”
Select Checkbox to “Display R-squared
value on chart”
towards the top right ofthe chart. If the formulas are
‘obscured by the Trendiine, you can move them by
‘selecting the text box with the formulas and then drag it
to where you want,
14ot19
Format Trendline
Trendline Options
oO ll
4 Trendine Options |
Z Display Equation on chart
J Display B-squared value on chart
Microsoft Excel: Date Analysis‘THE UNIVERSITY
(OF QUEENSLAND,
va Library
Staff and Student LT. Training
‘To Find Regression Summary
1. Click on Data Analysis on Data tab (far
fight on ribbon)
2. Select Regression
3. Click on OK
Input Y range, Select C4:614
Input X range, Select B4:B14
‘Output Range, Select A22
Click on OK
Noose
‘Note: You willbe presentad with Summary Output which
Includes regression analysis,
Dette: —]
Interpreting results: A demonstrated strong positive correlatio
Equation (Y=mx+c) Y = 308.63x + 4018.1 Matches the coefficients in regression summary
Intercept indicates the predicted cost of tution in the Year 2000. This is the line of best fit value not the
‘actual value(the line of best fit value for ¥if X=
X Variable indicates the average increase in $ in tuition fees year to year approximately $308.63,
Forecasting
Forecasting is estimating the likelihood of an event taking place in the future, based on available
data. Statistical forecasting concentrates on using the pas fo predict the future by identifying
trends, patterns and business drives within the data to develop a forecast.
Exercise 9. Forecasting
Use worksheet “Correlation & Linear Regression’
In Excel the FORECAST function takes raw trendline data, an input (independent variable)
and retums the dependent variable
1. Click in $¢$20
2. Click the Insert Function button ——
3. Select Forecast from the list of functions
(search for Forecast in the search box if you
cannot see it)
150t19 Microsoft Excl: Data Analysis:‘THE UNIVERSITY
(OF QUEENSLAND.
UO Litrary
Staff and Student LT. Training
4. X, select B20
Known_y’s, select C4:C14 (the range name
Tuition_Fees will appear)
6. Known_y’s, select B4:814 (the range name
Year will appear)
7. Note how the indicated answer matches the Intercept
value ofthe regression analysis
Click OK
In cell B20 type 20 to forecast the cost of
tuition fees in year 20,
T Tests
TTests are performed when you have two sets of measurements or results from given populations
and you would like to compare them to see if they are significantly different.
For example you may have two lists of measurements from the same set of people. The first set of
measurements may have been taken in the morning and the second set in the afternoon. This type
of TTest is known as a related TTest or a paired Test because you have tested the same population
twice.
Alternatively if you had two sets of measurements taken from two sets of people with one set being
in the morning and the other in the afternoon you would have an unpaired or independent TTest.
This is because you have tested two different populations.
If you are sure about the direction of differences, for example that the morning measurements a
faster than the afternoon then you perform a one tail t test.
If you are unsure about the difference between the values perform a two tail t test.
A results called "statistically significant i the result of the t test comes in at below .05. This is often
referred to as the P Value.
Exercise 11 Significance
wT eT 3
Reaction Ties
Moming. Afternoon
cr)
On the T-Test spreadsheet are two series of 7 wn
‘measurements. ton 97
10001006
‘These measurements are paired as they are from ad
the same population but taken at different times, a
16ot19 Microsoft Excel: Data AnaysisUO Library
‘Staff and Student LT. Training
‘THE UNIVERSITY
OF QUEENSLAND
1. Select cell B12
Using the Insert Function button search for and
locate the T.Test function.
‘Note: The TTest function i tll avaiable for compatibility
purposes with Excel 2007 and below.
In the T.Test Function Arguments dialog box Array!
and Array2 are the cell ranges containing the two
columns of measurements.
In this case B3:810 and C3:C10
Tails can be either a 1 or a2
Use 1 if you are sure about the direction of the
differences.
Use 2 if you are unsure about the direction of the
differences.
‘Type can either bea 1, 20r3
Use 1 if your data is from a paired population.
Use 2 if your data is from an unpaired population
with an equal variance.
Use 3 if your data is from an unpaired population
with an unequal variance.
t7ott9
Microsoft Excel: Date Analysis‘THE UNIVERSITY
(OF QUEENSLAND,
UO Litany
Staff and Student LT. Training
ANOVA: Analysis of Variance
In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups
are all equal. The ANOVA test isthe initial step in identifying factors that are influencing a given data
‘set. Anova should be performed on 3 or more groups of data.
Exercise 1 ANOVA. of Variance
Use worksheet “ANOVA - Rank & percentile”
To conduct the one-way ANOVA
2. Click on Data Analysis on the Data Tab (far
right on ribbon)
Select Anova: Single Factor
4. Click OK.
5. Select the input range (A1:C13)
(automatically absolute references)
6. Click “Labels in first row” option asises13
7. Select Output Range (A16) ® cokes
8. Click OK. © Rows
© Quoutrange: sie
Note: Descriptive statistics and ANOVA summary table are displayed on screen
Interpreting results: in the summary section we can see the mean exam results for each class, But are
these differences statistically significant?
‘There are two types of hypotheses, Null (negative) or Aiterative (positive). It Is best practice to use null
hhypotheses so no personal opinions creep in to the testing statement.
‘A null hypothesis is a default position and can never be proven. Statistically results can only reject or fall 10
‘ject the null hypotheses.
‘Null hypotneses are always phrased as a negative statement e.g. There is no real diference between the
effectiveness of lectures, online delivery and video delivery.
‘The test result shows F =0.93 With a critical P-value of 4, the critical F = 3.288, Therefore, since the F
Statistic is smaller than the critical value, we failto reject the null hypothesis. Remember from before the
value is statistically significant if itis below .05. This value of 4 shows there is some connection in the data
though. So, we fllto reject that there is no difference between the effectiveness of lectures, online delivery
‘and video delivery. These values may be explained by the small sample size. A larger sample of data may
{give more statisticaly significant results. Apparenty, the differences we saw in his sample were simply due
to random sampling error.
1Bot 19 Microsoht Excel: Data Analysis‘THE UNIVERSITY
(OF QUEENSLAND,
UO Libary
‘Staff and Student LT. Training
Rank and Percentiles
Percentile rank means the percentage of scores that fall “at or below” a certain number. Percentiles
are most often used for determining the relative standing of an individual in a population or the rank
Position of the individual. Percentiles measure position from the bottom,
Exercise 12.
Use worksheet “ANOVA - Rank & percentile”
1. Click Data Analysis on the Data Tab
(far right on ribbon)
2. Click Rank and Percentile
3. Click OK
‘Complete dialog box:
4. Highlight cells SAS1:SC$13 for Input
Range
'NB: In this instance, do not merely cick on cokurn A
header as the program wil process every row in the
spreadsheet
In Grouped By, select Columns
Click Labels in first row
Select Output Range as $MS1 Sissel
Click OK muerte
comuteotrs
© avout:
New eae
Rene
Interpreting results:
Point - The location of the value within the original ist. This can be used to quickly sort the output table into
the same order of the original list.
‘Original - This Is the column containing the original values. This column has the same column name as the
Original ist since we used labels in the fst row.
Rank - This isthe rank of the corresponding number in the list.
Percent - This Is the numbers percentage rank within the ist. This percentage indicates the proportion of
thelist which are below this given number.
: hitps:/web library.uq.edu.aurfles/142294/20210831_Excel_Data_Analysis pdf
190t19 Microsoft Excel: Data Anaiysis