
IY461- Quantitative Methods and Information Systems for Business

Statistical Report

Arda Gozacan

P277557

INTRODUCTION

A statistical report supplies detailed performance data about an action such as archiving, extracting, or deleting data. The information in statistical reports can be used to identify the methods that are most effective at improving performance: one could, for instance, modify the access mechanism for a database table, increase the number of keys, or build an index for the key field (Nimon, 2011). The results of statistical analysis offer managers valuable information about the organization's performance in a variety of domains. This data can be presented in the form of averages, percentages, ratios, and correlations, among other things. Managers who understand statistics can describe an issue, recognize and evaluate alternative courses of action, estimate error, monitor processes, and take appropriate corrective action to attain the best possible results (Carlson & Wu, 2012).

Such information is much easier for managers to understand when it is presented in the form of charts, graphs, tables, and the like; this format also makes it possible to compare current performance against previous periods and against benchmarks. A statistical analysis of a representative sample of customers can provide a reasonably accurate overview of the market that is both quicker and less expensive to obtain than a census of every customer with whom a company might ever interact. Statistics also lend support to claims (Li et al., 2020): leaders who must persuade followers to move in a particular direction, or to take a risk, on the basis of unsupported assumptions put themselves in a precarious position. Finally, statistics can shed light on relationships. An in-depth analysis of the data can reveal hidden connections between two variables, such as variations in income and special sales offers, or unhappy customers and the products they buy (Carlson & Wu, 2012).

This report consists of three sections. The first section covers quantitative analysis of a data set for the independent variable X, and the second section is dedicated to the dependent variable Y. The third section covers the quantitative calculations for both variables, to determine whether there is a correlation between them, and presents a linear regression analysis of the relationship between X and Y.

SECTION 1

Data Set for Independent Variable X

67.1 80.0 22.3 33.7 31.6 39.5 44.3 29.0 97.2 8.7

A-

Minimum Value

The minimum value of a dataset is the data value that is lower than or equal to all of the other data values in the dataset. It can be located by rearranging the values in ascending order: the first number in the sorted dataset is the minimum value (Groebner et al., 2013). In regard to this data set:

Reordered data set

8.7, 22.3, 29.0, 31.6, 33.7, 39.5, 44.3, 67.1, 80.0, 97.2

Thus, the minimum value is 8.7

Maximum Value
The maximum value is the data value that is greater than or equal to all of the other values in the dataset. It can be identified by sorting the values from lowest to highest: the maximum value appears at the end of the ordered dataset (Groebner et al., 2013). In regard to this data set:

Reordered data set

8.7, 22.3, 29.0, 31.6, 33.7, 39.5, 44.3, 67.1, 80.0, 97.2

Thus, the maximum value is 97.2

Range

A dataset's range is determined by subtracting the lowest value from the greatest value (Von

Hippel, 2005). Therefore the range of this dataset can be calculated as follows:

Reordered data set

8.7, 22.3, 29.0, 31.6, 33.7, 39.5, 44.3, 67.1, 80.0, 97.2

97.2 – 8.7 = 88.5

Range is 88.5
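As an illustration, the minimum, maximum, and range of this dataset can be reproduced with a short Python sketch (the snippet follows the sort-then-inspect method described above):

```python
# Illustrative sketch: minimum, maximum, and range of the X dataset.
x = [67.1, 80.0, 22.3, 33.7, 31.6, 39.5, 44.3, 29.0, 97.2, 8.7]

x_sorted = sorted(x)                    # ascending order
minimum = x_sorted[0]                   # first value after sorting
maximum = x_sorted[-1]                  # last value after sorting
value_range = round(maximum - minimum, 1)

print(minimum, maximum, value_range)    # 8.7 97.2 88.5
```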

Q1

Quartiles divide a dataset, arranged in ascending order, into four equal parts; the three quartile points Q1, Q2, and Q3 mark the boundaries between them and underlie the interquartile range (IQR), a measure of variability (Von Hippel, 2005).

Q1 is the value in the "middle" position of the first half of the ascending dataset. Accordingly, the Q1 for this dataset is found as follows:

Reordered data set

8.7, 22.3, 29.0, 31.6, 33.7, 39.5, 44.3, 67.1, 80.0, 97.2

First half of the data set (8.7, 22.3, 29.0, 31.6, 33.7)

The middle value of the first half of the data set is: 29.0

Therefore Q1 is: 29.0

Median

The median is the number in the middle of an ordered set of data. There are two ways to calculate it, depending on how many values the dataset contains. After the values have been arranged in ascending order, the number of values in the dataset is counted. If the total number of values is odd, the median is the single middle value. If the count is even, the median is the average of the two middle values: find the value in position n/2 and average it with the value in the next higher position (Groebner et al., 2013).

Reordered data set

8.7, 22.3, 29.0, 31.6, 33.7, 39.5, 44.3, 67.1, 80.0, 97.2

Total number of values in this dataset is: 10

To determine the median:

10/2 = 5

The fifth number in this dataset is 33.7. Because the count is even, the median is the average of the fifth and sixth values (33.7 and 39.5):

33.7 + 39.5 = 73.2

Median is 73.2 / 2 = 36.6
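The even-count median rule described above can be sketched in Python as follows:

```python
# Illustrative sketch: median of the sorted X dataset (even count).
x_sorted = [8.7, 22.3, 29.0, 31.6, 33.7, 39.5, 44.3, 67.1, 80.0, 97.2]

n = len(x_sorted)                        # 10 values, an even count
# Even count: the median is the average of the two middle values.
median = (x_sorted[n // 2 - 1] + x_sorted[n // 2]) / 2

print(round(median, 1))                  # 36.6
```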

Q3

As noted above, the quartile points Q1, Q2, and Q3 divide the ascending dataset into four equal parts (Von Hippel, 2005). Q3 is the value in the "middle" position of the second half of the ascending dataset. Accordingly, the Q3 for this dataset is found as follows:

Reordered data set

8.7, 22.3, 29.0, 31.6, 33.7, 39.5, 44.3, 67.1, 80.0, 97.2

Second half of the data set (39.5, 44.3, 67.1, 80.0, 97.2)

The middle value of the second half of the data set is: 67.1

Therefore Q3 is: 67.1

Interquartile Range

The interquartile range (IQR) is a measure of variability that describes the spread of the middle half of the data (Von Hippel, 2005). It is obtained by subtracting Q1 from Q3.

Q1 is 29.0

Q3 is 67.1

Interquartile range is 67.1 – 29.0 = 38.1
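Different software packages use slightly different quartile conventions; the sketch below follows the median-of-halves rule used in this report (take the middle value of each half of the sorted data):

```python
# Illustrative sketch: Q1, Q3, and IQR by the median-of-halves rule.
x_sorted = [8.7, 22.3, 29.0, 31.6, 33.7, 39.5, 44.3, 67.1, 80.0, 97.2]

lower_half = x_sorted[:5]     # 8.7 ... 33.7
upper_half = x_sorted[5:]     # 39.5 ... 97.2
q1 = lower_half[2]            # middle of five values -> 29.0
q3 = upper_half[2]            # middle of five values -> 67.1
iqr = round(q3 - q1, 1)

print(q1, q3, iqr)            # 29.0 67.1 38.1
```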

Mode

The mode is the value that occurs most frequently in a dataset, so finding it only requires identifying the most frequent value (Groebner et al., 2013). This dataset contains no repeated numbers, however, so it has no mode.

Mean

A dataset's mean is calculated by summing all of the values in the dataset and dividing the

total by the number of values (Groebner et al., 2013).

Reordered data set

8.7, 22.3, 29.0, 31.6, 33.7, 39.5, 44.3, 67.1, 80.0, 97.2

Total of values = 453.4

The mean of the dataset is 453.4 / 10 = 45.34

Standard Deviation

The standard deviation measures how spread out the values of a dataset are around the mean. It can be calculated with the following steps:

Step 1: Find the mean.

Step 2: For each data point, find the square of its distance to the mean.

Step 3: Sum the values from Step 2.

Step 4: Divide by the number of data points.

Step 5: Take the square root.

Standard Deviation, σ: 26.1994

Count, N: 10

Sum, Σx: 453.4

Mean, μ: 45.34

Variance, σ²: 686.4064

Steps:

σ² = Σ(xᵢ - μ)² / N

   = ((8.7 - 45.34)² + ... + (97.2 - 45.34)²) / 10

   = 6864.064 / 10

   = 686.4064

σ = √686.4064 = 26.1994
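Steps 1-5 above (divide by N, i.e. the population formula) can be sketched in Python as follows:

```python
import math

# Illustrative sketch: mean, population variance, and standard deviation
# following Steps 1-5 above (divide by N, the population formula).
x = [8.7, 22.3, 29.0, 31.6, 33.7, 39.5, 44.3, 67.1, 80.0, 97.2]

n = len(x)
mean = sum(x) / n                                # 45.34
variance = sum((v - mean) ** 2 for v in x) / n   # 686.4064
sigma = math.sqrt(variance)                      # ~26.1994

print(round(mean, 2), round(variance, 4), round(sigma, 4))
```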

B-

The first step in constructing a Grouped Frequency Distribution Table is to calculate the key parameters that shape its design. For this dataset they are:

Distribution

Mean: 45.34
Median: 36.6
Standard Deviation (s): 27.61655
Skewness: 0.79963
Kurtosis: -0.1702
Lowest Score: 8.7
Highest Score: 97.2
Distribution Range: 88.5
Total Number of Scores: 10
Number of Distinct Scores: 10
Lowest Class Value: 0
Highest Class Value: 103.9
Number of Classes: 4
Class Range: 26
Based on these values the frequency distribution table for four class intervals can be found as

follows;

Frequency Distribution Table

Class        Count   Percentage
0 - 25.9       2        20
26 - 51.9      5        50
52 - 77.9      1        10
78 - 103.9     2        20
Total         10       100
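The class counts and percentages in the table can be reproduced with a short Python sketch (four classes of width 26 starting at 0, as used above):

```python
# Illustrative sketch: grouped frequency counts for the four class intervals.
x = [8.7, 22.3, 29.0, 31.6, 33.7, 39.5, 44.3, 67.1, 80.0, 97.2]

bins = [(0, 26), (26, 52), (52, 78), (78, 104)]   # 0-25.9, 26-51.9, 52-77.9, 78-103.9
counts = [sum(1 for v in x if lo <= v < hi) for lo, hi in bins]
percentages = [100 * c / len(x) for c in counts]

print(counts)        # [2, 5, 1, 2]
print(percentages)   # [20.0, 50.0, 10.0, 20.0]
```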

Frequency table:

Element   Frequency   Cumulative    Relative     Cumulative relative
                      frequency     frequency    frequency
8.7           1           1            0.1             0.1
22.3          1           2            0.1             0.2
29.0          1           3            0.1             0.3
31.6          1           4            0.1             0.4
33.7          1           5            0.1             0.5
39.5          1           6            0.1             0.6
44.3          1           7            0.1             0.7
67.1          1           8            0.1             0.8
80.0          1           9            0.1             0.9
97.2          1          10            0.1             1.0

Z-scores: {-1.3985, -0.8794, -0.6237, -0.5244, -0.4443, -0.2229, -0.0397, 0.8306, 1.3229, 1.9794}

Count of items: 10
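The z-scores listed above standardize each value with the population mean and standard deviation; a Python sketch:

```python
import math

# Illustrative sketch: z-scores using the population standard deviation,
# matching the values listed above.
x = [8.7, 22.3, 29.0, 31.6, 33.7, 39.5, 44.3, 67.1, 80.0, 97.2]

mu = sum(x) / len(x)
sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
z = [round((v - mu) / sigma, 4) for v in x]

print(z)   # first value -1.3985, last value 1.9794
```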
C-

[Figure: Pie chart of the percentage share of each class interval (0 - 25.9, 26 - 51.9, 52 - 77.9, 78 - 103.9) and bar chart of the frequency distribution counts and percentages for the same intervals.]

D-

As can be seen from the pie chart, the frequency rate for the class interval 0 - 25.9 was 20 percent. The class interval 26 - 51.9 has the greatest rate of all at 50 percent, the interval 52 - 77.9 has the lowest at 10 percent, and the interval 78 - 103.9 accounts for the remaining 20 percent. Looking at the bar chart that organizes the data into class intervals, the total frequency distribution table percentage for the interval 26 - 51.9 is 50 percent, while the frequency distribution table count for the same interval is 5. The intervals 0 - 25.9 and 78 - 103.9 have the same percentage, 20 percent, and the same count, 2. In conclusion, given that its count in the frequency distribution table is 1, the percentage for the interval 52 - 77.9 is 10 percent.

SECTION 2

Data Set for Dependent Variable Y

2.5 4.8 5.2 3.2 5.5 3.7 6.6 4.6 5.0 2.4

A and B

Minimum Value

A dataset is said to have a minimum value if it contains a data value that is either lower than

or equal to all of the other data values in the dataset. It is possible to locate it by rearranging

the values so that they are presented in ascending order, given that the first number in this

sorted dataset represents the highest value (Groebner et al., 2013). In regard to this data set:

Reordered data set

2.4 2.5 3.2 3.7 4.6 4.8 5.0 5.2 5.5 6.6

Thus, the minimum value is 2.4

Maximum Value

The maximum value is the data value that is greater than or equal to all of the other values in the dataset. It can be identified by sorting the values from lowest to highest: the maximum value appears at the end of the ordered dataset (Groebner et al., 2013). In regard to this data set:

Reordered data set

2.4 2.5 3.2 3.7 4.6 4.8 5.0 5.2 5.5 6.6

Thus, the maximum value is 6.6

Range

A dataset's range is determined by subtracting the lowest value from the greatest value (Von

Hippel, 2005). As a result, the dataset's range is as follows:

Reordered data set

2.4 2.5 3.2 3.7 4.6 4.8 5.0 5.2 5.5 6.6

6.6 – 2.4 = 4.2

Range is 4.2

Q1

Quartiles divide a dataset, arranged in ascending order, into four equal parts; the three quartile points Q1, Q2, and Q3 mark the boundaries between them and underlie the interquartile range (IQR), a measure of variability (Von Hippel, 2005).

In ascending order, Q1 is the "middle" value in the first half of the dataset. As a result, the Q1

for this dataset can be identified as:

Reordered data set

2.4 2.5 3.2 3.7 4.6 4.8 5.0 5.2 5.5 6.6

First half of the data set (2.4 2.5 3.2 3.7 4.6)

The middle value of the first half of the data set is: 3.2

Therefore Q1 is: 3.2


Median

The median is the number in the middle of an ordered set of data. There are two ways to calculate it, depending on how many values the dataset contains. After the values have been arranged in ascending order, the number of values in the dataset is counted. If the total number of values is odd, the median is the single middle value. If the count is even, the median is the average of the two middle values: find the value in position n/2 and average it with the value in the next higher position (Groebner et al., 2013).

Reordered data set

2.4 2.5 3.2 3.7 4.6 4.8 5.0 5.2 5.5 6.6

Total number of values in this dataset is: 10

To determine median;

10/2 = 5

The fifth number in this dataset is: 4.6

Because the dataset contains an even number of values, the median is the average of 4.6 and 4.8 (the next higher value). Thus,

4.6 + 4.8 = 9.4

Median is 9.4 / 2 = 4.7


Q3

As noted above, the quartile points Q1, Q2, and Q3 divide the ascending dataset into four equal parts (Von Hippel, 2005).

In ascending order, Q3 is the "middle" value in the second half of the dataset. As a result, the

Q3 in this dataset can be identified as:

Reordered data set

2.4 2.5 3.2 3.7 4.6 4.8 5.0 5.2 5.5 6.6

Second half of the data set (4.8 5.0 5.2 5.5 6.6)

The middle value of the second half of the data set is: 5.2

Therefore Q3 is: 5.2

Interquartile Range

The interquartile range (IQR) is a measure of variability that describes the spread of the middle half of the data (Von Hippel, 2005). It is obtained by subtracting Q1 from Q3.

Q1 is 3.2

Q3 is 5.2

Interquartile range is 5.2 – 3.2 = 2.0


Mode

The mode is the value that occurs most frequently in a dataset, so finding it only requires identifying the most frequent value (Groebner et al., 2013). This dataset contains no repeated numbers, however, so it has no mode.

Mean

A dataset's mean is calculated by summing all of the values in the dataset and dividing the

total by the number of values (Groebner et al., 2013).

Reordered data set

2.4, 2.5, 3.2, 3.7, 4.6, 4.8, 5.0, 5.2, 5.5, 6.6

Total of values = 43.5

The mean of the dataset is 43.5 / 10 = 4.35

Standard Deviation

The standard deviation measures how spread out the values of a dataset are around the mean (Von Hippel, 2005). Following the same steps as in Section 1:

Standard Deviation, σ: 1.2948

Count, N: 10

Sum, Σx: 43.5

Mean, μ: 4.35

Variance, σ²: 1.6765

Steps:

σ² = Σ(xᵢ - μ)² / N

   = ((2.4 - 4.35)² + ... + (6.6 - 4.35)²) / 10

   = 16.765 / 10

   = 1.6765

σ = √1.6765 = 1.2948

So the standard deviation is 1.2948

SECTION 3

Correlation and simple linear regression analysis can both be used to determine whether two numerical variables are closely related along a linear axis. Put another way, a correlation analysis provides insight into the nature, magnitude, and direction of the linear relationship between the two variables (Kasuya, 2019). Simple linear regression analysis, on the other hand, estimates the parameters of the linear equation that relates the variables and can be used to predict the values of one variable from the values of the other.

A-

Correlation

The Pearson correlation coefficient, usually denoted r, can take any value between -1 and 1, inclusive. Once calculated, the resulting number reveals whether the datasets are related to one another. The degree of correlation is indicated by the distance between r and zero (Kasuya, 2019): the further the computed value of r is from zero, the stronger the linear relationship, and the relationship weakens as r approaches zero. The sign of r indicates the direction of the relationship between the variables. If r is positive, then as one variable increases the other also increases; if r is negative, then as one variable increases the other decreases. When r equals 1 or -1, one variable can be explained exactly by a linear function of the other (O'Brien and Sharkey Scott, 2012). The rule that may be applied to interpret the correlation is as follows:

Guidelines for interpreting the correlation coefficient r:

0.7 < |r| ≤ 1     strong correlation

0.4 < |r| < 0.7   moderate correlation

0.2 < |r| < 0.4   weak correlation

0 ≤ |r| < 0.2     no correlation

For the data sets used in this report, the correlation can be computed as follows (Kasuya, 2019):


X: X values
Y: Y values
Mx: mean of the X values
My: mean of the Y values
X - Mx and Y - My: deviation scores
(X - Mx)² and (Y - My)²: squared deviations
(X - Mx)(Y - My): products of deviation scores

x̄ = (8.7 + 22.3 + ··· + 80 + 97.2) / 10 = 45.34

ȳ = (2.4 + 2.5 + ··· + 5.5 + 6.6) / 10 = 4.35

Σ(x - x̄)² = (8.7 - 45.34)² + ··· + (97.2 - 45.34)² = 6864.064

Σ(y - ȳ)² = (2.4 - 4.35)² + ··· + (6.6 - 4.35)² = 16.765

Σ(x - x̄)(y - ȳ) = (8.7 - 45.34)(2.4 - 4.35) + ··· + (97.2 - 45.34)(6.6 - 4.35) = 310.62

Sxy = Σ(x - x̄)(y - ȳ) / (n - 1) = 310.62 / (10 - 1) = 34.5133

r = Σ(xᵢ - x̄)(yᵢ - ȳ) / √(Σ(xᵢ - x̄)² Σ(yᵢ - ȳ)²) = 310.62 / √(6864.064 × 16.765) = 0.9157
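The same result can be reproduced with a Python sketch. Note that the worked computation in this report pairs the X and Y values after sorting each dataset in ascending order, so the sketch follows that pairing:

```python
import math

# Illustrative sketch of the Pearson correlation computation, using the
# sorted pairing of the two datasets as in the worked example above.
x = [8.7, 22.3, 29.0, 31.6, 33.7, 39.5, 44.3, 67.1, 80.0, 97.2]
y = [2.4, 2.5, 3.2, 3.7, 4.6, 4.8, 5.0, 5.2, 5.5, 6.6]

mx, my = sum(x) / len(x), sum(y) / len(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # ~310.62
sxx = sum((a - mx) ** 2 for a in x)                    # ~6864.064
syy = sum((b - my) ** 2 for b in y)                    # ~16.765
r = sxy / math.sqrt(sxx * syy)

print(round(r, 4))   # 0.9157
```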

The conclusion that can be drawn from the computations above is that r = 0.9157. This figure indicates a strong positive correlation, which implies that high values of variable X correspond to high values of variable Y (and vice versa).

B-

Linear Regression
Calculation Summary

Ŷ = b₀ + b₁X

b₁ = SPxy / SSx = Σ(xᵢ - x̄)(yᵢ - ȳ) / Σ(xᵢ - x̄)² = 310.62 / 6864.064 = 0.04525

b₀ = ȳ - b₁x̄ = 4.35 - 0.04525 × 45.34 = 2.2982

(x̄ = 45.34, ȳ = 4.35)

R² = SSregression / SStotal = Σ(ŷᵢ - ȳ)² / Σ(yᵢ - ȳ)² = 14.0565 / 16.765 = 0.8384

The variance of the residuals is:

S²res = Σ(yᵢ - ŷᵢ)² / (n - 2)

Residual outliers

Sres = √MSE = √0.3386 = 0.5819

The average of the residuals is always zero. The outlier thresholds are ±k × Sres:

Thresholds = ±3 × 0.5819 = ±1.7456
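The slope, intercept, R², and residual standard deviation above can be sketched in Python, again using the sorted pairing from the worked correlation example:

```python
import math

# Illustrative sketch of the least-squares slope, intercept, R², and
# residual standard deviation, using the sorted pairing as above.
x = [8.7, 22.3, 29.0, 31.6, 33.7, 39.5, 44.3, 67.1, 80.0, 97.2]
y = [2.4, 2.5, 3.2, 3.7, 4.6, 4.8, 5.0, 5.2, 5.5, 6.6]

mx, my = sum(x) / len(x), sum(y) / len(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

b1 = sxy / sxx                      # slope, ~0.04525
b0 = my - b1 * mx                   # intercept, ~2.2982
y_hat = [b0 + b1 * a for a in x]    # fitted values
r2 = sum((yh - my) ** 2 for yh in y_hat) / syy        # ~0.8384
s_res = math.sqrt(sum((b - yh) ** 2
                      for b, yh in zip(y, y_hat)) / (len(x) - 2))

print(round(b1, 5), round(b0, 4), round(r2, 4), round(s_res, 4))
```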

1. Y and X relationship

R-square (R²) equals 0.8384, which means that 83.8% of the variability of Y is explained by X. The correlation (R) equals 0.9157, which means that there is a very strong direct relationship between X and Y.

2. Goodness of fit

Overall regression: right-tailed, F(1,8) = 41.5184, p-value = 0.0001997. Since the p-value < α (0.05), we reject H0: the linear regression model, Y = b₀ + b₁X + ε, provides a better fit than the intercept-only model, Y = b₀ + ε.

The slope (b₁): two-tailed, T(8) = 6.4435, p-value = 0.0001997. With one predictor this is the same as the p-value for the overall model.

The Y-intercept (b₀): two-tailed, T(8) = 6.2492, p-value = 0.0002459. Hence b₀ is significantly different from zero.

3. Residual normality

The linear regression model assumes normally distributed residual errors. The Shapiro-Wilk p-value equals 0.1736, so the residuals can be assumed to be normally distributed.

C-

The scatterplot of the given dataset is shown in the following figure.

[Figure: Scatterplot of Y against X]
D–

According to the Pearson correlation coefficient of 0.9157, it can be concluded that the independent variable X and the dependent variable Y have a strong positive correlation: high scores on X correspond to high scores on Y, and vice versa. The linear regression analysis likewise indicates a positive relationship between X and Y that is close to perfect. The scatterplot lends further credence to the correlation result, revealing a positive association between the two variables. Because the relationship between X and Y is positive and strong, as the value of X grows, the value of Y rises as well.

CONCLUSION

In conclusion, business statistics gives managers the ability to analyse past performance, predict future business conditions, and lead companies effectively. The application of statistics allows markets to be characterized, advertising to be designed, prices to be set, and changes in consumer demand to be managed. Those who use statistical research in business should have a solid understanding of how statistics are computed, particularly the meanings of the mean, median, and mode for a given data set: the mean is the average of a set of numbers, the median is the number that falls in the middle of the group, and the mode is the number that appears most frequently. The most effective managers are aware that each of these measures contributes to a more complete picture of the state of the company's health.

The statistical analysis carried out in this report shows that the independent variable X and the dependent variable Y have a strong positive correlation: high scores on variable X correspond to high scores on variable Y. Because the association between the two variables is positive and strong, it can be deduced that as the value of X goes up, the value of Y goes up as well.


REFERENCES

1- Carlson, K.D. and Wu, J., 2012. The illusion of statistical control: Control variable practice in management research. Organizational Research Methods, 15(3), pp.413-435.

2- Crawford, S.L., 2006. Correlation and regression. Circulation, 114(19), pp.2083-2088.

3- Groebner, D.F., Shannon, P.W., Fry, P.C. and Smith, K.D., 2013. Business Statistics. Pearson Education UK.

4- Kasuya, E., 2019. On the use of r and r squared in correlation and regression (Vol. 34, No. 1, pp.235-236). Hoboken, USA: John Wiley & Sons.

5- Li, X., Li, K., Ding, Y., Shi, Y., Guo, W., Wei, D., Wang, L. and Zeng, Y., 2020. Design and research of statistical analysis system based on business decision field. J. Softw., 15(6), pp.172-180.

6- Nimon, K., 2011. Improving the quality of quantitative research reports: A call for action. Human Resource Development Quarterly, 22(4), pp.387-394.

7- O'Brien, D. and Sharkey Scott, P., 2012. Correlation and regression.

8- Von Hippel, P.T., 2005. Mean, median, and skew: Correcting a textbook rule. Journal of Statistics Education, 13(2).
