
Group Mid-Term Exam: Ministry of Education and Training, National Economics University

This document contains details of a mid-term exam for a statistics class at the National Economics University in Hanoi, Vietnam. It includes a frequency table analyzing family income, as well as student responses evaluating the frequency table, a histogram, and bar chart of the data. It determines that descriptive measures of central tendency are most appropriate for the ordinal income variable. The students discuss advantages and disadvantages of using the median, mean, and mode to record income data.

Uploaded by Thảo Phương

Ministry of Education and Training

National Economics University


====000====

GROUP MID-TERM EXAM


SUBJECT: STATISTICS

Students:
Lê Thu Trà - 11196114
Phạm Lâm Anh - 11190467
Nguyễn Minh Thư - 11194964
Dương Minh Thu - 11194937
Nguyễn Phương Thảo - 11194796
Lê Phạm Quỳnh Trang - 11195267
Class: Advanced Finance 61B

Ha Noi - 02/2021
Question 1:
1. Make a frequency table for the variable. Does the frequency table make sense?
Does it make sense to make a histogram of the variable? A bar chart?
The variable we chose is “income”, which records the total family income in the
year before the survey.

FREQUENCY TABLE

Statistics
TOTAL FAMILY INCOME FOR LAST YEAR
N   Valid     1354
    Missing     65

TOTAL FAMILY INCOME FOR LAST YEAR
                             Frequency   Percent   Valid Percent   Cumulative Percent
Valid    UNDER $1 000               17       1.2             1.3                  1.3
         $1 000 TO 2 999            17       1.2             1.3                  2.5
         $3 000 TO 3 999             9        .6              .7                  3.2
         $4 000 TO 4 999             7        .5              .5                  3.7
         $5 000 TO 5 999            13        .9             1.0                  4.7
         $6 000 TO 6 999            19       1.3             1.4                  6.1
         $7 000 TO 7 999            17       1.2             1.3                  7.3
         $8 000 TO 9 999            40       2.8             3.0                 10.3
         $10000 TO 12499            58       4.1             4.3                 14.5
         $12500 TO 14999            56       3.9             4.1                 18.7
         $15000 TO 17499            50       3.5             3.7                 22.4
         $17500 TO 19999            54       3.8             4.0                 26.4
         $20000 TO 22499            42       3.0             3.1                 29.5
         $22500 TO 24999            59       4.2             4.4                 33.8
         $25000 TO 29999            79       5.6             5.8                 39.7
         $30000 TO 34999            86       6.1             6.4                 46.0
         $35000 TO 39999            82       5.8             6.1                 52.1
         $40000 TO 49999           119       8.4             8.8                 60.9
         $50000 TO 59999           108       7.6             8.0                 68.8
         $60000 TO 74999           111       7.8             8.2                 77.0
         $75000 TO $89999           66       4.7             4.9                 81.9
         $90000 - $109999           45       3.2             3.3                 85.2
         $110000 OR OVER            76       5.4             5.6                 90.8
         REFUSED                   124       8.7             9.2                100.0
         Total                    1354      95.4           100.0
Missing  DK                         49       3.5
         NA                         16       1.1
         Total                      65       4.6
Total                             1419     100.0

=> As we all know, a frequency distribution records the number of times each
value occurs and is presented in the form of a table. However, as can be seen from
the frequency table, it is hard to draw a conclusion or identify a trend because
there are too many categories (24 income classes, plus the missing-value codes).
Therefore, in our opinion, this frequency table does not make sense, as we believe
readers would struggle to interpret it fully.
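As a rough sketch of how the Percent, Valid Percent and Cumulative Percent columns relate to the frequencies, the Python function below rebuilds them from the counts in the table. The names `frequency_table`, `LABELS` and `FREQS` are our own illustration, not an SPSS feature; following the SPSS output, REFUSED is treated as a valid category here.

```python
from itertools import accumulate

# Income classes and frequencies copied from the SPSS table above.
# Following the SPSS output, REFUSED is counted among the valid cases.
LABELS = [
    "UNDER $1 000", "$1 000 TO 2 999", "$3 000 TO 3 999", "$4 000 TO 4 999",
    "$5 000 TO 5 999", "$6 000 TO 6 999", "$7 000 TO 7 999", "$8 000 TO 9 999",
    "$10000 TO 12499", "$12500 TO 14999", "$15000 TO 17499", "$17500 TO 19999",
    "$20000 TO 22499", "$22500 TO 24999", "$25000 TO 29999", "$30000 TO 34999",
    "$35000 TO 39999", "$40000 TO 49999", "$50000 TO 59999", "$60000 TO 74999",
    "$75000 TO $89999", "$90000 - $109999", "$110000 OR OVER", "REFUSED",
]
FREQS = [17, 17, 9, 7, 13, 19, 17, 40, 58, 56, 50, 54,
         42, 59, 79, 86, 82, 119, 108, 111, 66, 45, 76, 124]
N_TOTAL = 1419  # all cases, including the 65 coded as missing (DK, NA)


def frequency_table(labels, freqs, n_total):
    """Return (label, frequency, percent, valid %, cumulative %) rows."""
    n_valid = sum(freqs)
    rows = []
    for label, f, cum in zip(labels, freqs, accumulate(freqs)):
        rows.append((
            label,
            f,
            round(100 * f / n_total, 1),    # Percent: share of all cases
            round(100 * f / n_valid, 1),    # Valid Percent: share of valid cases
            round(100 * cum / n_valid, 1),  # Cumulative Percent of valid cases
        ))
    return rows


for row in frequency_table(LABELS, FREQS, N_TOTAL):
    print("{:<18} {:>4} {:>5} {:>5} {:>6}".format(*row))
```

The point of the sketch is that Percent divides by all 1419 cases, while Valid Percent and Cumulative Percent divide by the 1354 valid cases only.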

HISTOGRAM

=> A histogram is the most commonly used graph for showing a frequency
distribution. The X-axis of a histogram is divided into intervals (bins) covering the
range of values the measurements fall into, whereas the Y-axis shows how many
values fall within each interval. However, when looking at the histogram, our group
realized that the bins on the X-axis are hard to interpret and do not match the
income categories of the dataset. Moreover, the column for missing answers is not
clearly distinguished, since it lies right next to the non-missing ones, which may
mislead readers. Therefore, the histogram does not make sense.

BAR CHART

=> A bar chart is used to show how data points are distributed across categories or
to compare values between categories. From a bar chart, we can see which groups
are the largest or most common, and how the other groups compare with them.
Even though there are many categories on the X-axis of the bar chart, we can still
see the overall trend and compare the categories in order to draw a conclusion.
Besides, the missing-values column (labelled “REFUSED”) is shown separately,
which helps readers understand the graph. Hence, the bar chart does make sense in
this case.
2. What is the scale of measurement for the variable?
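The scale of measurement for the variable “income” is ordinal, because the incomes
have been grouped into categories that have a natural order (from UNDER $1 000
up to $110000 OR OVER), but the numbers assigned to the categories are only
labels for their rank: they are not measured dollar amounts, and the class widths
are unequal.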



(SPSS Variable View: the row highlighted in yellow is the variable “income”.)
3. What descriptive statistics are appropriate for describing this variable and why?
Does it make sense to compute a mean?
- There are 4 types of descriptive statistics:

- Measures of Frequency
  + Count, Percent, Frequency
  + Show how often something occurs
  + Use these when you want to show how often a response is given
- Measures of Central Tendency
  + Mean, Median, and Mode
  + Locate the distribution by various points
  + Use these when you want to show an average or the most commonly indicated
    response
- Measures of Dispersion or Variation
  + Range, Variance, Standard Deviation
  + Identify the spread of scores by stating intervals
  + Range = highest point minus lowest point
  + Variance or Standard Deviation = how far the observed scores deviate from the
    mean
  + Use these when you want to show how "spread out" the data are. It is helpful to
    know when your data are so spread out that it affects the mean
- Measures of Position
  + Percentile Ranks, Quartile Ranks
  + Describe how scores fall in relation to one another; rely on standardized scores
  + Use these when you need to compare scores to a normalized score (e.g., a
    national norm)

=> As we can see from the overview above, measures of central tendency are the
most appropriate for describing the data and closest to our purpose, which is to find
the most commonly indicated response.
Statistics
TOTAL FAMILY INCOME FOR LAST YEAR
N   Valid                     1354
    Missing                     65
Mean                         16.13
Median                       17.00
Mode                            24
Skewness                     -.608
Std. Error of Skewness        .066

As can be observed from the histogram drawn in part 1.1 and from the fact that the
coefficient of skewness is smaller than 0 (negative), the distribution appears to be
NEGATIVELY SKEWED. Moreover, the frequency table shows that this is an
open-ended distribution, which means that one or more of the classes (or bins) is
open-ended (here, “$110000 OR OVER”). In addition, because the variable is
ordinal, the numbers SPSS uses for the categories only rank them, so their average
has no dollar meaning.
=> Hence, it does NOT make sense to compute a Mean.
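To see what the Mean, Median and Mode in the output above refer to, the sketch below recomputes them from the frequency table. It assumes that SPSS codes the income classes 1 to 24 in table order, with 24 = REFUSED; this coding is our assumption, although it reproduces the reported Mean (16.13), Median (17) and Mode (24). Under that assumption the Mode is the REFUSED category and the Median code 17 is the class $35000 TO 39999.

```python
import statistics

# Frequencies of TOTAL FAMILY INCOME FOR LAST YEAR in table order.
# ASSUMPTION: the categories are coded 1..24 in this order (24 = REFUSED).
FREQS = [17, 17, 9, 7, 13, 19, 17, 40, 58, 56, 50, 54,
         42, 59, 79, 86, 82, 119, 108, 111, 66, 45, 76, 124]


def expand(freqs):
    """Turn a list of frequencies into one category code per respondent."""
    return [code for code, f in enumerate(freqs, start=1) for _ in range(f)]


codes = expand(FREQS)
print(round(statistics.mean(codes), 2))  # 16.13
print(statistics.median(codes))          # 17.0 -> $35000 TO 39999
print(statistics.mode(codes))            # 24   -> REFUSED

# The same summaries with the REFUSED answers (code 24) left out.
answered = expand(FREQS[:-1])
print(statistics.median(answered))       # 16.0 -> $30000 TO 34999
print(statistics.mode(answered))         # 18   -> $40000 TO 49999
```

Because the codes only rank the classes, 16.13 is not a dollar amount, which is consistent with the conclusion that a Mean is not meaningful for this variable.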



4. Discuss the advantages and disadvantages of recording income in this manner.
Describe other ways of recording income and the problem associated with each of
them.
- Advantages and disadvantages of recording the variable using the “Median”:
  + Advantages:
    • Easy to understand and calculate
    • Not affected by outlying values => can be used when the mean would be
      misleading
  + Disadvantages:
    • Based on the value of only one (middle) observation => fails to reflect the
      whole data set
    • Not easy to use in further statistical analysis
- There are 2 other ways to describe the variable, which are the “Mean” and the
  “Mode”:

ADVANTAGES & DISADVANTAGES

Mean
  + Advantages:
    • Easy to understand and calculate
    • The values of all items are included => representative of the whole data set
  + Disadvantages:
    • Sensitive to outliers

Mode
  + Advantages:
    • Easy to understand and calculate
    • Not affected by extreme values
    • Can be computed from an open-ended frequency table
  + Disadvantages:
    • Not defined when there are no repeated values in a data set
    • Not based on all values
    • Unstable when the data consist of a small number of values
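The outlier argument in the comparison above can be illustrated with a few made-up numbers. The incomes below are hypothetical, chosen only for illustration; the sketch uses Python's standard `statistics` module.

```python
import statistics

# Hypothetical annual family incomes in dollars (illustration only).
incomes = [18_000, 22_000, 25_000, 25_000, 31_000, 40_000]
with_outlier = incomes + [900_000]  # add one very high income

for data in (incomes, with_outlier):
    print(round(statistics.mean(data)),  # mean: pulled up by the outlier
          statistics.median(data),       # median: stays at 25000
          statistics.mode(data))         # mode: unchanged
# 26833 25000.0 25000
# 151571 25000 25000

# With no repeated values, every value is equally "most common",
# so the mode does not single out one value.
print(statistics.multimode([18_000, 22_000, 31_000]))  # [18000, 22000, 31000]
```

One extreme income multiplies the mean by more than five while the median and mode do not move, which is the sense in which the mean is "sensitive to outliers".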



Question 2: In the gss.sav file, the variable tvhours tells you how many hours
per day GSS respondents say they watch TV.
(a) Make a frequency table of the hours of television watched. Do any of the values
strike you as strange? Explain.

Statistics
Hours per day watching TV
N   Valid      906
    Missing    513

Hours per day watching TV
                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid    0              54       3.8             6.0                  6.0
         1             189      13.3            20.9                 26.8
         2             238      16.8            26.3                 53.1
         3             159      11.2            17.5                 70.6
         4             115       8.1            12.7                 83.3
         5              54       3.8             6.0                 89.3
         6              30       2.1             3.3                 92.6
         7              10        .7             1.1                 93.7
         8              22       1.6             2.4                 96.1
         10             13        .9             1.4                 97.6
         11              3        .2              .3                 97.9
         12             13        .9             1.4                 99.3
         14              2        .1              .2                 99.6
         15              2        .1              .2                 99.8
         20              1        .1              .1                 99.9
         24              1        .1              .1                100.0
         Total         906      63.8           100.0
Missing  NAP           486      34.2
         NA             27       1.9
         Total         513      36.2
Total                 1419     100.0

As can be seen from the frequency table, the value that strikes us as strange is 12,
i.e. watching television 12 hours per day. The number of people who watch TV
from 0 to 10 hours per day is relatively high, and then the figures drop dramatically
from 11 hours onward. However, among the values from 11 to 24, only 12 has an
unexpectedly high frequency (13 respondents, compared with 3 for 11 hours and at
most 2 for any higher value), which strikes us as extraordinary.
(b) Based on the frequency table, answer the following questions: Of the people
who answered the question, what percentage don’t watch any television? What
percentage watch two hours or less? Five hours or more? Of the people who watch
TV, what percentage watch one hour? What percentage watch four hours or less?
Based on the frequency table:
- Of the people who answered the question (906 respondents):
  + 6.0% don’t watch any television.
  + 53.1% watch TV for two hours or less.
  + 16.7% watch TV for five hours or more (= 100% − 83.3%, where 83.3% is the
    cumulative valid percent of those watching TV from 0 to 4 hours).
- Of the people who watch TV (i.e. excluding the 54 respondents who answered 0):
  906 − 54 = 852 people watch TV.
  + 189 / 852 ≈ 22.2% watch TV for one hour.
  + 701 / 852 ≈ 82.3% watch TV for four hours or less, where
    189 + 238 + 159 + 115 = 701 is the number of people who watch TV from 1 to 4
    hours per day.
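The percentages in (b) can be double-checked with a short Python sketch that works directly from the frequencies in the tvhours table; the dictionary `TV` and the helpers `pct` and `count` are our own names, not part of SPSS.

```python
# Valid answers to "Hours per day watching TV": {hours: frequency}.
TV = {0: 54, 1: 189, 2: 238, 3: 159, 4: 115, 5: 54, 6: 30, 7: 10, 8: 22,
      10: 13, 11: 3, 12: 13, 14: 2, 15: 2, 20: 1, 24: 1}


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


def count(condition):
    """Number of respondents whose hours satisfy `condition`."""
    return sum(f for hours, f in TV.items() if condition(hours))


answered = sum(TV.values())   # 906 people answered the question
watchers = answered - TV[0]   # 852 people watch at least some TV

print(pct(TV[0], answered))                         # 6.0  watch none
print(pct(count(lambda h: h <= 2), answered))       # 53.1 two hours or less
print(pct(count(lambda h: h >= 5), answered))       # 16.7 five hours or more
print(pct(TV[1], watchers))                         # 22.2 one hour (watchers only)
print(pct(count(lambda h: 1 <= h <= 4), watchers))  # 82.3 four hours or less (watchers only)
```

Note that the last two figures use 852 watchers as the denominator, so they differ from the Valid Percent column, which divides by all 906 respondents.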
(c) From the frequency table, estimate the 25th, 50th, 75th, 95th percentiles. What
is the value for the Median, Mode?
A percentile is the value in a data distribution below which a given percentage
of values falls. There are several ways, and several different formulae, for
calculating percentiles in SPSS. Our group is going to calculate the 25th, 50th, 75th
and 95th percentiles for the variable tvhours using the Frequencies option, which
calculates percentiles with a weighted-average formula (as shown in the SPSS data
view above).

Statistics
Hours per day watching TV
N             Valid      906
              Missing    513
Median                  2.00
Mode                       2
Percentiles   25        1.00
              50        2.00
              75        4.00
              95        8.00

As can be seen from the results in the SPSS output view:
+ The 25th percentile is 1.00
+ The 50th percentile is 2.00
+ The 75th percentile is 4.00
+ The 95th percentile is 8.00
+ Both the Median and the Mode are 2
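For readers without SPSS, the same percentiles can be reproduced in Python. As far as we know, the default "exclusive" method of `statistics.quantiles` places the p-th percentile at position (n + 1)p, which corresponds to the weighted-average definition used by SPSS Frequencies; treat this equivalence as our assumption rather than a documented SPSS guarantee.

```python
import statistics

# Valid answers to "Hours per day watching TV": {hours: frequency}.
TV = {0: 54, 1: 189, 2: 238, 3: 159, 4: 115, 5: 54, 6: 30, 7: 10, 8: 22,
      10: 13, 11: 3, 12: 13, 14: 2, 15: 2, 20: 1, 24: 1}

# One entry per respondent, in ascending order (906 values).
hours = [h for h, f in sorted(TV.items()) for _ in range(f)]

# 99 cut points; cuts[p - 1] is the p-th percentile, located at (n + 1) * p / 100.
cuts = statistics.quantiles(hours, n=100, method="exclusive")
percentiles = {p: cuts[p - 1] for p in (25, 50, 75, 95)}

print(percentiles)                                       # {25: 1.0, 50: 2.0, 75: 4.0, 95: 8.0}
print(statistics.median(hours), statistics.mode(hours))  # 2.0 2
```

For example, the 25th percentile sits at position 907 × 0.25 = 226.75, between the 226th and 227th sorted answers, and both of those are 1 hour.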
(d) Make a bar chart of the hours of TV watched. What problem do you see with
this display?
As can be seen from the bar graph below, most respondents watch TV for 1 to 4
hours per day, whereas only a small minority watch TV for more than 10 hours. As a
result, the data are not evenly distributed.
Moreover, the values “9, 13, 16, 17, 18, 19, 21, 22, 23” are not included in the
bar chart because these values do not appear in any of the survey answers (this
might have occurred because the number of respondents is not large enough).
Therefore, the problem with the bar chart below is that it does not show a gap for
these values that nobody reported, which can mislead readers at first glance.

BAR GRAPH
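The gap problem described in (d) can be checked programmatically. The sketch lists every whole number of hours between the smallest and largest answers that nobody reported, then prints a crude text bar chart that keeps those empty positions (one '#' per 10 respondents, rounded up). The layout is only our illustration, not SPSS output.

```python
# Valid answers to "Hours per day watching TV": {hours: frequency}.
TV = {0: 54, 1: 189, 2: 238, 3: 159, 4: 115, 5: 54, 6: 30, 7: 10, 8: 22,
      10: 13, 11: 3, 12: 13, 14: 2, 15: 2, 20: 1, 24: 1}

all_hours = range(min(TV), max(TV) + 1)       # 0, 1, ..., 24
gaps = [h for h in all_hours if h not in TV]  # values nobody reported
print(gaps)  # [9, 13, 16, 17, 18, 19, 21, 22, 23]

# Text bar chart that keeps a row for every hour, so the gaps stay visible.
for h in all_hours:
    f = TV.get(h, 0)
    bar = "#" * -(-f // 10)                   # ceiling of f / 10
    print(f"{h:>2} | {bar} {f if f else ''}")
```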

(e) Make a histogram of the hours of TV watched. What causes all of the values to
be clumped together? Compare this histogram to the bar chart you generated in
question 2d. Which is a better display for these data?
HISTOGRAM



- All of the values in the histogram are clumped together because this dataset is
POSITIVELY SKEWED (most values are clustered at the low end, on the left of the
distribution, while the right tail is long), which means most of the people who took
this survey watch television for 1 to 4 hours per day, whereas only a few watch TV
for more than 10 hours.
- In our opinion, a histogram does a better job than a bar chart in this case. As we
stated in part (d), the bar chart does NOT show gaps for the values that nobody
reported, whereas the histogram does. Hence, the histogram is the better display of
these data, since it does both jobs: it shows the distribution of the values collected
and leaves a gap where no data were collected.
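The clumping and the positive skew can also be shown numerically. The sketch below bins the answers into equal-width intervals of 2 hours (the bin width is our own choice, not necessarily the one SPSS used for the histogram above) and computes the moment coefficient of skewness g1 = m3 / m2^1.5. SPSS reports a small-sample adjusted version of this statistic, so only the sign should be compared with SPSS output.

```python
import statistics

# Valid answers to "Hours per day watching TV": {hours: frequency}.
TV = {0: 54, 1: 189, 2: 238, 3: 159, 4: 115, 5: 54, 6: 30, 7: 10, 8: 22,
      10: 13, 11: 3, 12: 13, 14: 2, 15: 2, 20: 1, 24: 1}
hours = [h for h, f in sorted(TV.items()) for _ in range(f)]


def histogram(values, width, upper):
    """Count values in bins [0, width), [width, 2*width), ... up to the bin holding `upper`."""
    counts = [0] * (upper // width + 1)
    for v in values:
        counts[v // width] += 1
    return counts


def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean((v - mean) ** 2 for v in values)
    m3 = statistics.fmean((v - mean) ** 3 for v in values)
    return m3 / m2 ** 1.5


print(histogram(hours, width=2, upper=24))
# [243, 397, 169, 40, 22, 16, 13, 4, 0, 0, 1, 0, 1]
print(skewness(hours) > 0)  # True: long right tail, positively skewed
```

Three quarters of the answers fall into the first two bins, while the empty bins for 16 to 19 and 22 to 23 hours show up as visible gaps.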

------------------------------------Thank you for reading------------------------------------
