M01 Berenson 15 SSM CH01
1.2 (a) Business size represents a categorical variable because each size represents a particular
category.
(b) The measurement scale is ordinal because the different sizes have a natural order.
1.4 (a) The telephone number assigned to the smartphone is a categorical variable.
(b) The data usage for a current month (in GB) is a numerical variable that is continuous
because any value within a range of values can occur.
(c) The length (in minutes and seconds) of the last voice call made using the smartphone is a
numerical variable that is continuous because time can have any value from 0 to any
reasonable unit of time.
(d) The number of apps installed on the smartphone is a numerical variable that is discrete
because the outcome is a count.
(e) Whether a device protection plan exists is a categorical variable because the answer can
be only yes or no.
1.10 The variable test score would be numerical, and presumably, in the range of 0 through 100. If
fractional credit for an answer is possible, the variable would need to be continuous and not
discrete.
1.14 A simple random sample would be less practical for personal interviews because of travel costs,
unless interviewees are paid to attend a central interviewing location.
1.16 Here all members of the population are equally likely to be selected and the sample selection
mechanism is based on chance. But the selection of two elements is not independent; for example,
if A is in the sample, we know that B is also, and that C and D are not.
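The point about independence can be checked by enumeration. The sketch below assumes a four-item population A, B, C, D in which a fair coin flip selects either the pair {A, B} or the pair {C, D}. That design is inferred from the wording of the answer, and the names design, inclusion_probability, and joint_probability are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

population = ["A", "B", "C", "D"]

# Assumed design: one of two fixed pairs is chosen by a fair coin flip.
design = {
    frozenset("AB"): Fraction(1, 2),
    frozenset("CD"): Fraction(1, 2),
}

def inclusion_probability(item):
    """Probability that a single item ends up in the sample."""
    return sum(p for sample, p in design.items() if item in sample)

def joint_probability(a, b):
    """Probability that both items end up in the same sample."""
    return sum(p for sample, p in design.items() if a in sample and b in sample)

inclusion = {item: inclusion_probability(item) for item in population}
joint = {pair: joint_probability(*pair) for pair in combinations(population, 2)}

for pair, p in joint.items():
    print(pair, p)
```

Every item has the same inclusion probability of 1/2, yet the pair (A, B) is selected with probability 1/2 and the pair (A, C) with probability 0. In a simple random sample of size 2 from 4 items, each of the 6 possible pairs would have probability 1/6.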
1.18 (a) Row 16: 2323 6737 5131 8888 1718 0654 6832 4647 6510 4877
Row 17: 4579 4269 2615 1308 2455 7830 5550 5852 5514 7182
Row 18: 0989 3205 0514 2256 8514 4642 7567 8896 2977 8822
Row 19: 5438 2745 9891 4991 4523 6847 9276 8646 1628 3554
Row 20: 9475 0899 2337 0892 0048 8033 6945 9826 9403 6858
Row 21: 7029 7341 3553 1403 3340 4205 0823 4144 1048 2949
Row 22: 8515 7479 5432 9792 6575 5760 0408 8112 2507 3742
Row 23: 1110 0023 4012 8607 4697 9664 4894 3928 7072 5815
Row 24: 3687 1507 7530 5925 7143 1738 1688 5625 8533 5041
Row 25: 2391 3483 5763 3081 6090 5169 0546
Note: All sequences above 5000 are discarded. There were no repeating sequences.
(b) 0089 0189 0289 0389 0489 0589 0689 0789 0889 0989
1089 1189 1289 1389 1489 1589 1689 1789 1889 1989
2089 2189 2289 2389 2489 2589 2689 2789 2889 2989
3089 3189 3289 3389 3489 3589 3689 3789 3889 3989
4089 4189 4289 4389 4489 4589 4689 4789 4889 4989
(c) With the single exception of invoice 0989, the invoices selected in the simple random
sample are not the same as those selected in the systematic sample. It would be highly
unlikely that a random process would select the same units as a systematic process.
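The two selection procedures in (a) and (b) can be sketched in code. The sketch assumes invoices are numbered 0001 through 5000 (inferred from the note in (a)) and that the systematic sample starts at 0089 with an interval of 100 (read from the list in (b)). The function names are illustrative, and only Row 16 of the random number table is used to keep the example short.

```python
def keep_valid(codes, population_size=5000):
    """Keep four-digit codes from 0001 to population_size, dropping repeats."""
    seen, kept = set(), []
    for code in codes:
        n = int(code)
        if 1 <= n <= population_size and code not in seen:
            seen.add(code)
            kept.append(code)
    return kept

def systematic_sample(start, interval, population_size):
    """Take every interval-th invoice number, beginning at start."""
    return [f"{n:04d}" for n in range(start, population_size + 1, interval)]

# Row 16 of the random number table, as listed in (a).
row_16 = "2323 6737 5131 8888 1718 0654 6832 4647 6510 4877".split()
srs_from_row_16 = keep_valid(row_16)

systematic = systematic_sample(start=89, interval=100, population_size=5000)

print(srs_from_row_16)
print(len(systematic), systematic[0], systematic[-1])
```

From Row 16, the codes 6737, 5131, 8888, 6832, and 6510 are discarded because they exceed 5000. The systematic sample contains 50 invoices, from 0089 through 4989.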
1.20 Before accepting the results of a survey of college students, you might want to know, for
example:
Who funded the survey? Why was it conducted? What was the population from which the sample
was selected? What sampling design was used? What mode of response was used: a personal
interview, a telephone interview, or a mail survey? Were interviewers trained? Were survey
questions field-tested? What questions were asked? Were the questions clear, accurate, unbiased,
valid? What operational definition of immediately and effortlessly was used? What was the
response rate?
1.22 The results are based on a survey of bank executives. If the frame is supposed to be banking
institutions, how is the population defined? There is no information about the response rate, so
there is an undefined nonresponse error.
1.24 Before accepting the results of the survey, you might want to know, for example:
Who funded the survey? Why was it conducted? What was the population from which the sample
was selected? What sampling design was used? What mode of response was used: a personal
interview, a telephone interview, or a mail survey? Were interviewers trained? Were survey
questions field-tested? What questions were asked? Were the questions clear, accurate, unbiased,
valid? What was the response rate? What was the margin of error? What was the sample size?
What frame was used?
1.26 (a) Invalid values include Appel, Samsun, APPLE, Apple iPhone, and mOTOROLA.
(b) Appel should be Apple. Samsun should be Samsung. APPLE should be Apple. Apple
iPhone should be Apple. mOTOROLA should be Motorola.
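The corrections in (b) amount to a lookup from each invalid value to its valid form. A minimal sketch follows; the list raw is hypothetical sample data, and the names CORRECTIONS and clean_brand are illustrative.

```python
# Invalid value -> corrected value, taken from the answer to (b).
CORRECTIONS = {
    "Appel": "Apple",
    "Samsun": "Samsung",
    "APPLE": "Apple",
    "Apple iPhone": "Apple",
    "mOTOROLA": "Motorola",
}

def clean_brand(value):
    """Return the corrected brand name, or the value unchanged if it is not listed."""
    value = value.strip()
    return CORRECTIONS.get(value, value)

# Hypothetical raw column mixing valid and invalid entries.
raw = ["Apple", "Appel", "Samsun", "APPLE", "Apple iPhone", "mOTOROLA", "Samsung"]
cleaned = [clean_brand(v) for v in raw]
print(cleaned)
```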
1.28 (a) Fund Number contains the wrong data. Category should do a domain check. 5-Yr
Return should format the number as a percentage. 10-Yr Return should mark missing
value. Net Expense Ratio needs a domain check and should check outliers. Rating
should do a domain check and check outliers. Assets needs a domain check and should
mark missing value.
(b) The last three values in Net Expense Ratio; the eleventh value in Assets.
(c) For both Net Expense Ratio and Assets, a maximum value could be defined.
1.30 No, because the categories do not seem to be mutually exclusive. The categories need specific
ranges, such as younger than 21, 21 to 34, 35 to 54, and 55 or older.
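One way to see what mutually exclusive categories require is to write the assignment rule as code. The sketch below assumes whole-number ages, and its cut points are assumptions chosen so that every age falls into exactly one category with no gaps or overlaps. The function name age_group is illustrative.

```python
def age_group(age):
    """Assign a whole-number age to exactly one category."""
    if age < 0:
        raise ValueError("age cannot be negative")
    if age < 21:
        return "younger than 21"
    if age <= 34:
        return "21 to 34"
    if age <= 54:
        return "35 to 54"
    return "55 or older"

# Ages on either side of each boundary land in different, single categories.
for age in (20, 21, 34, 35, 54, 55):
    print(age, age_group(age))
```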
1.32 (a) The times for each of the hotels would be arranged in separate columns.
(b) The hotel names would be in one column and the times would be in a second column.
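The two arrangements hold the same data and can be converted into each other. In the sketch below, the hotel names and times are hypothetical, and stack and unstack are illustrative helper names rather than functions from any particular package.

```python
# Arrangement (a): one column of times per hotel (hypothetical values).
unstacked = {
    "Hotel A": [12, 15, 9],
    "Hotel B": [20, 18],
}

def stack(columns):
    """Convert per-hotel columns into (hotel, time) rows, as in arrangement (b)."""
    return [(name, value) for name, values in columns.items() for value in values]

def unstack(rows):
    """Convert (hotel, time) rows back into one column of times per hotel."""
    columns = {}
    for name, value in rows:
        columns.setdefault(name, []).append(value)
    return columns

stacked = stack(unstacked)
for row in stacked:
    print(row)
```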
1.34 A population contains all the items of interest whereas a sample contains only a portion of the
items in the population.
1.36 Categorical random variables yield categorical responses such as yes or no answers. Numerical
random variables yield numerical responses such as your height in inches.
1.38 Both nominal and ordinal variables are categorical variables, but no ranking is implied in a
nominal variable such as male or female, while ranking is implied in an ordinal variable such as a
student’s grade of A, B, C, D, and F.
1.40 A list of values defines a domain for a categorical variable, whereas a range defines a domain for
a numerical variable.
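The two kinds of domain lead to two kinds of check. The sketch below is illustrative only: the size labels are hypothetical, the 0 through 100 range borrows the test score range mentioned in 1.10, and the function names are assumptions.

```python
def in_categorical_domain(value, allowed):
    """A categorical value is valid if it appears in the list of allowed values."""
    return value in allowed

def in_numerical_domain(value, low, high):
    """A numerical value is valid if it is present and lies within the range."""
    return value is not None and low <= value <= high

def violations(values, is_valid):
    """Return the positions of values that fail the domain check."""
    return [i for i, v in enumerate(values) if not is_valid(v)]

SIZE_DOMAIN = {"small", "medium", "large"}   # hypothetical list of values
sizes = ["small", "large", "huge", "medium"]
scores = [88, 101, None, 0, 72.5]

bad_sizes = violations(sizes, lambda v: in_categorical_domain(v, SIZE_DOMAIN))
bad_scores = violations(scores, lambda v: in_numerical_domain(v, 0, 100))
print(bad_sizes, bad_scores)
```

Here the value "huge" fails the categorical check, while the score 101 and the missing score fail the numerical check.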
1.42 Missing values are values that were not collected for a variable. Outliers are values that seem
excessively different from most of the other values.
1.44 Coverage error arises when the frame improperly or inappropriately represents the population,
which can result in a sample that may not be representative of the population that one wishes to
study. Nonresponse error arises when members of a chosen sample are not contacted, even after
repeated attempts, so that information that should be provided is missing.
Excel also includes many other statistical capabilities that can be further explored on the
Microsoft Office Excel official website.
1.48 The answers are based on an article titled “U.S. Satisfaction Still Running at Improved Level,”
written by Lydia Saad (August 15, 2018). The article is located on the following site:
https://news.gallup.com/poll/240911/satisfaction-running-improved-level.aspx?g_source=link_NEWSV9&g_medium=NEWSFEED&g_campaign=item_&g_content=U.S.%2520Satisfaction%2520Still%2520Running%2520at%2520Improved%2520Level
(a) The population of interest includes all individuals aged 18 and older who live within the
50 U.S. states and the District of Columbia.
(b) The collected sample includes a random sample of 1,024 individuals aged 18 and older
who live within the 50 U.S. states and the District of Columbia.
(c) A parameter of interest is the percentage of the population of individuals aged 18 and
older who live within the 50 U.S. states and the District of Columbia who are satisfied
with the direction of the U.S.
(d) A statistic used to estimate the parameter in (c) is the percentage of the 1,024
individuals included in the sample who are satisfied with the direction of the U.S. In this
case, the statistic is 36%.
1.50 (a) One variable collected with the American Community Survey is marital status with the
following possible responses: now married, widowed, divorced, separated, and never
married.
(b) The variable in (a) represents a categorical variable.
(c) Because the variable in (a) is categorical, this question is not applicable. If one had
chosen age in years from the American Community Survey as the variable, the answer to
(c) would be discrete.
1.52 (a) The population of interest consisted of 10,000 benefited employees of the University of Utah.
(b) The sample consisted of 3,095 employees of the University of Utah.
(c) Gender, marital status, and employment category represent categorical variables. Age in years,
education level in years completed, and household income represent numerical variables.