Data Analysis Techniques
of Suffolk
Assignment Brief
Course/Programme: BABS Foundation
Level: 3
Student ID:
Task 2
Line Chart:
Here is the Line Chart showing the hours of sleep for each day:
This line chart shows how the hours of sleep changed over the 10 days. Sleep fluctuates from day
to day, between a low of 6.5 hours (day 3) and a high of 8.5 hours (day 5), rather than moving
steadily in one direction.
Bar Chart:
Here is the Bar Chart showing the hours of sleep for each day:
This bar chart allows a direct comparison of the hours of sleep on each day. Day 5 had the most
sleep (8.5 hours) and day 3 had the least (6.5 hours).
Task 3
I. Mean
To calculate the mean, we need to add up all the hours of sleep and divide by the total number of
days.
Steps:
1. Add up all the hours of sleep: 7.5 + 8.0 + 6.5 + 7.0 + 8.5 + 7.2 + 6.8 + 7.8 + 8.2 + 7.1 =
74.6
2. Divide the sum by the total number of days: 74.6 ÷ 10 = 7.46
Final Value: The mean hours of sleep per day is 7.46.
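The two steps above can be checked with a short script. This is a minimal sketch in Python; the variable names (hours, total, mean_sleep) are illustrative and are not part of the assignment brief.

```python
# Hours of sleep for days 1 to 10, as listed in Step 1.
hours = [7.5, 8.0, 6.5, 7.0, 8.5, 7.2, 6.8, 7.8, 8.2, 7.1]

total = sum(hours)               # Step 1: add up all the hours of sleep
mean_sleep = total / len(hours)  # Step 2: divide by the number of days

print(f"Sum = {total:.1f}, Mean = {mean_sleep:.2f}")  # Sum = 74.6, Mean = 7.46
```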
II. Median
To calculate the median, we need to arrange the data in order and find the middle value.
Steps:
1. Arrange the data in order: 6.5, 6.8, 7.0, 7.1, 7.2, 7.5, 7.8, 8.0, 8.2, 8.5
2. Since there are 10 days (an even number), the median is the average of the two middle
values: (7.2 + 7.5) ÷ 2 = 7.35
Final Value: The median hours of sleep per day is 7.35.
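The same procedure can be sketched in Python: sort the data, then average the two middle values when the count is even. The names ordered, middle and median_sleep are illustrative only.

```python
hours = [7.5, 8.0, 6.5, 7.0, 8.5, 7.2, 6.8, 7.8, 8.2, 7.1]

ordered = sorted(hours)  # Step 1: arrange the data in order
n = len(ordered)
middle = n // 2

if n % 2 == 0:
    # Even count: average the two middle values (5th and 6th here).
    median_sleep = (ordered[middle - 1] + ordered[middle]) / 2
else:
    # Odd count: the single middle value is the median.
    median_sleep = ordered[middle]

print(f"Median = {median_sleep:.2f}")
```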
III. Mode
The mode is the value that appears most frequently in the data.
Steps:
1. Examine the data to find the value that appears most frequently: No value appears more
than once, so there is no mode.
Final Value: There is no mode in the data.
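One way to confirm that no value repeats is to count how often each value occurs. The sketch below uses Counter from Python's standard library and treats a data set in which every value occurs once as having no mode, which is the convention used above; the names counts, highest and modes are illustrative.

```python
from collections import Counter

hours = [7.5, 8.0, 6.5, 7.0, 8.5, 7.2, 6.8, 7.8, 8.2, 7.1]

counts = Counter(hours)         # how many times each value appears
highest = max(counts.values())  # the largest frequency in the data

# Only report a mode if at least one value appears more than once.
modes = [value for value, count in counts.items() if count == highest] if highest > 1 else []

print("No mode" if not modes else f"Mode(s): {modes}")
```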
IV. Range
The range is the difference between the largest and smallest values.
Steps:
1. Find the largest and smallest values: Largest: 8.5, Smallest: 6.5
2. Calculate the range: 8.5 - 6.5 = 2.0
Final Value: The range of hours of sleep per day is 2.0.
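As a quick check, the range can be computed directly from the largest and smallest values. This is a small Python sketch with illustrative variable names.

```python
hours = [7.5, 8.0, 6.5, 7.0, 8.5, 7.2, 6.8, 7.8, 8.2, 7.1]

largest = max(hours)              # Step 1: largest value
smallest = min(hours)             # Step 1: smallest value
sleep_range = largest - smallest  # Step 2: difference between them

print(f"Range = {largest} - {smallest} = {sleep_range}")  # Range = 8.5 - 6.5 = 2.0
```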
V. Standard Deviation
To calculate the standard deviation, we need to follow these steps:
Step 1: Calculate the mean
We've already done this: Mean = 7.46
Step 2: Calculate the deviations from the mean
Deviations from the mean = (Hours of sleep - Mean)
Day   Hours of Sleep   Deviation from Mean
1     7.5              7.5 - 7.46 = 0.04
2     8.0              8.0 - 7.46 = 0.54
3     6.5              6.5 - 7.46 = -0.96
4     7.0              7.0 - 7.46 = -0.46
5     8.5              8.5 - 7.46 = 1.04
6     7.2              7.2 - 7.46 = -0.26
7     6.8              6.8 - 7.46 = -0.66
8     7.8              7.8 - 7.46 = 0.34
9     8.2              8.2 - 7.46 = 0.74
10    7.1              7.1 - 7.46 = -0.36
Step 3: Calculate the squared deviations
Squared deviation = (Deviation from Mean)²
Day   Squared Deviation
1     (0.04)² = 0.0016
2     (0.54)² = 0.2916
3     (-0.96)² = 0.9216
4     (-0.46)² = 0.2116
5     (1.04)² = 1.0816
6     (-0.26)² = 0.0676
7     (-0.66)² = 0.4356
8     (0.34)² = 0.1156
9     (0.74)² = 0.5476
10    (-0.36)² = 0.1296
Step 4: Calculate the variance
Variance = (Sum of squared deviations) / (n - 1) where n is the number of data points (10 in this
case)
Sum of squared deviations = 0.0016 + 0.2916 + 0.9216 + 0.2116 + 1.0816 + 0.0676 + 0.4356 +
0.1156 + 0.5476 + 0.1296 = 3.8040
Variance = 3.8040 / (10 - 1) = 3.8040 / 9 ≈ 0.4227
Step 5: Calculate the standard deviation
Standard Deviation = √Variance = √0.4227 ≈ 0.650
Final Value: The standard deviation of hours of sleep per day is approximately 0.650.
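The five steps can be reproduced in a few lines of Python, which is a useful guard against slips when adding up ten squared deviations by hand. This is a sketch with illustrative variable names; it uses the sample formula with n - 1 in the denominator, as in Step 4.

```python
import math

hours = [7.5, 8.0, 6.5, 7.0, 8.5, 7.2, 6.8, 7.8, 8.2, 7.1]
n = len(hours)

mean_sleep = sum(hours) / n                             # Step 1: mean
deviations = [h - mean_sleep for h in hours]            # Step 2: deviations from the mean
squared_deviations = [d ** 2 for d in deviations]       # Step 3: squared deviations
sum_sq = sum(squared_deviations)
variance = sum_sq / (n - 1)                             # Step 4: sample variance
std_dev = math.sqrt(variance)                           # Step 5: standard deviation

print(f"Sum of squared deviations = {sum_sq:.4f}")
print(f"Variance = {variance:.4f}")
print(f"Standard deviation = {std_dev:.3f}")
```

The standard library function statistics.stdev(hours) applies the same n - 1 formula and can be used as an independent cross-check.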
Task 4
I. Calculate the m value
To calculate the m value, we need to find the slope of the linear regression line. We can use the
following formula:
m = (N Σ(xy) − Σx Σy) / (N Σ(x²) − (Σx)²)
where N is the number of data points, x is the day number, y is the hours of sleep, and Σ denotes
the sum.
Steps:
1. Calculate Σx, Σy, Σxy, and Σx²:
Day   x    y     xy     x²
1     1    7.5   7.5    1
2     2    8.0   16.0   4
3     3    6.5   19.5   9
4     4    7.0   28.0   16
5     5    8.5   42.5   25
6     6    7.2   43.2   36
7     7    6.8   47.6   49
8     8    7.8   62.4   64
9     9    8.2   73.8   81
10    10   7.1   71.0   100
Σx = 1 + 2 + ... + 10 = 55
Σy = 7.5 + 8.0 + ... + 7.1 = 74.6
Σxy = 7.5 + 16.0 + ... + 71.0 = 411.5
Σx² = 1 + 4 + ... + 100 = 385
2. Plug the values into the formula:
m = (10 * 411.5 - 55 * 74.6) / (10 * 385 - 55²)
= (4115 - 4103) / (3850 - 3025)
= 12 / 825
≈ 0.0145
So, the calculated m value is approximately 0.0145.
Discussion:
The m value represents the slope of the linear regression line. In this case, the slope is
positive but very close to zero: on average, sleep increases by only about 0.0145 hours (less
than one minute) per day. A slope this small suggests that there is no meaningful upward or
downward trend in hours of sleep over the 10 days.
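Because the slope depends on four separate sums, it is worth checking the arithmetic with a short script. The sketch below applies the formula for m term by term in Python; the variable names (days, sum_xy and so on) are illustrative.

```python
days = list(range(1, 11))  # x: day numbers 1 to 10
hours = [7.5, 8.0, 6.5, 7.0, 8.5, 7.2, 6.8, 7.8, 8.2, 7.1]  # y: hours of sleep

n = len(days)
sum_x = sum(days)
sum_y = sum(hours)
sum_xy = sum(x * y for x, y in zip(days, hours))
sum_x2 = sum(x ** 2 for x in days)

# m = (N Σ(xy) − Σx Σy) / (N Σ(x²) − (Σx)²)
numerator = n * sum_xy - sum_x * sum_y
denominator = n * sum_x2 - sum_x ** 2
m = numerator / denominator

print(f"Σx = {sum_x}, Σy = {sum_y:.1f}, Σxy = {sum_xy:.1f}, Σx² = {sum_x2}")
print(f"m = {numerator:.1f} / {denominator} = {m:.4f}")
```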
II. Calculate the c-value
To calculate the c value, we can use the following formula:
c = (Σy - m * Σx) / N
where N is the number of data points, x is the day number, y is the hours of sleep, m is the slope,
and Σ denotes the sum.
Steps:
1. Plug the values into the formula:
Using the unrounded slope m = 12 / 825 ≈ 0.014545 (from Σxy = 411.5):
c = (74.6 - 0.014545 * 55) / 10
≈ (74.6 - 0.8) / 10
= 73.8 / 10
= 7.38
So, the calculated c-value is approximately 7.38.
Discussion:
The c value represents the y-intercept of the linear regression line. In this case, the
y-intercept is positive, indicating that when the day number is 0, the expected hours of sleep is
approximately 7.38 hours. This value can be interpreted as the baseline hours of sleep, and
because the slope is close to zero it lies close to the mean of 7.46 hours.
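The intercept can be checked in Python in two equivalent ways: with the formula c = (Σy - m * Σx) / N used above, and with the algebraically identical form c = ȳ - m * x̄ (mean of y minus slope times mean of x). This is a sketch with illustrative names; the slope is recomputed from the raw data so that rounding in m does not carry through.

```python
days = list(range(1, 11))
hours = [7.5, 8.0, 6.5, 7.0, 8.5, 7.2, 6.8, 7.8, 8.2, 7.1]

n = len(days)
sum_x, sum_y = sum(days), sum(hours)
sum_xy = sum(x * y for x, y in zip(days, hours))
sum_x2 = sum(x ** 2 for x in days)

# Slope from the least-squares formula, kept unrounded.
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

c = (sum_y - m * sum_x) / n                # formula used in the text
c_check = (sum_y / n) - m * (sum_x / n)    # equivalent: mean(y) - m * mean(x)

print(f"c = {c:.2f} (cross-check: {c_check:.2f})")
```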
III. Forecasting
We can use the linear regression equation y = mx + c to forecast the number of hours of sleep for
day 11 and day 15.
Day 11:
With m = 12 / 825 ≈ 0.014545 and c = 7.38:
y = 0.014545 * 11 + 7.38
≈ 0.160 + 7.38
= 7.54
Day 15:
y = 0.014545 * 15 + 7.38
≈ 0.218 + 7.38
= 7.598
So, the forecasted hours of sleep for day 11 is approximately 7.54 hours, and for day 15 is
approximately 7.60 hours. Both forecasts stay within the observed range of 6.5 to 8.5 hours.
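The forecast step can be sketched as a small Python helper that fits the line once and then evaluates y = mx + c for any day. The function names fit_line and forecast are hypothetical and chosen only for this illustration. Note that forecasting beyond day 10 assumes the same linear pattern continues, which the data cannot confirm.

```python
def fit_line(xs, ys):
    """Return (m, c) for the least-squares line y = m*x + c."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x ** 2 for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    c = (sum_y - m * sum_x) / n
    return m, c


def forecast(day, m, c):
    """Evaluate the regression line at the given day number."""
    return m * day + c


days = list(range(1, 11))
hours = [7.5, 8.0, 6.5, 7.0, 8.5, 7.2, 6.8, 7.8, 8.2, 7.1]

m, c = fit_line(days, hours)
day_11 = forecast(11, m, c)
day_15 = forecast(15, m, c)

print(f"Day 11: {day_11:.2f} hours")
print(f"Day 15: {day_15:.2f} hours")
```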
Reference
Zou, K.H., Tuncali, K. and Silverman, S.G., 2003. Correlation and simple linear
regression. Radiology, 227(3), pp.617-628.