EMPTY - Practice Test

Practice test

Question 1:

A researcher conducts a survey to measure people's opinions on a political issue. The options for
responses are "Agree," "Disagree," and "Neutral." What is the measurement level of this variable?

a. Interval
b. Nominal
c. Ordinal
d. Ratio

Question 2:

Given the following table, calculate the chi-square statistic:

x/y      0     1   Total
0      120    80     200
1       80   120     200
Total  200   200     400
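
A hand calculation for a table like this can be checked in R. The sketch below is only one possible way to do it: it assumes the Pearson chi-square statistic without a continuity correction is intended, and the object names (`observed`, `expected`, `chi_sq`) are illustrative.

```r
# Observed counts from the table above (rows: x = 0, 1; columns: y = 0, 1)
observed <- matrix(c(120,  80,
                      80, 120),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(x = c("0", "1"), y = c("0", "1")))

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected
chi_sq <- sum((observed - expected)^2 / expected)
chi_sq

# Cross-check with the built-in test (assumption: no Yates continuity correction)
builtin <- unname(chisq.test(observed, correct = FALSE)$statistic)
builtin
```

The same steps apply to any other two-way table of counts: only the `observed` matrix changes.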

Question 3:

Which of the following statements about correlation is correct?

I: Correlation measures the strength of the linear relationship between two variables.
II: Correlation can only range from -1 to +1.

a. Both I and II are correct
b. Both I and II are incorrect
c. Only II is correct
d. Only I is correct

Question 4:

A group of researchers conducted a study to examine the academic performance of students in a
particular course. They collected data from a random sample of 180 students who took the course
and calculated the mean score to be 20, with a variance of 9.

Now, they are interested in estimating the range within which the population mean score is likely to
fall. Using a 95% confidence level, what is the confidence interval for the population mean score?
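
To verify a hand-computed interval, the calculation can be sketched in R. This assumes the large-sample normal approximation (a z critical value rather than a t value), which is a simplification; with 180 observations the two are very close. Variable names are illustrative.

```r
n      <- 180   # sample size
mean_x <- 20    # sample mean
var_x  <- 9     # sample variance

se <- sqrt(var_x / n)     # standard error of the mean
z  <- qnorm(0.975)        # critical value for a 95% level (about 1.96)

# Confidence interval: mean plus/minus critical value times standard error
ci <- mean_x + c(-1, 1) * z * se
ci
```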

Question 5: Linear equation.

a- The value of the intercept is:
b- The value of the slope is:
c- The sign of the relationship is:
d- If the x value is 2, the predicted y value is:

Question 6:

After the introduction of a new mobile app, a company receives feedback from its users. The product
development team wants to determine the proportion of users who preferred the previous version
of the app compared to the new one (assuming all users have a preference and none are indifferent).
The team asks you to design a study using a random sample and determine the required sample size.
They are willing to accept a margin of error of 3 percentage points.

How large should the sample be? (Rounding errors will be accepted).
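
One way to check the required sample size in R is sketched below. The question does not state a confidence level or an expected proportion, so the sketch assumes a 95% confidence level and the most conservative proportion of 0.5; both are assumptions, not part of the question.

```r
margin <- 0.03            # accepted margin of error (3 percentage points)
p      <- 0.5             # assumed proportion: 0.5 gives the largest required n
z      <- qnorm(0.975)    # assumed 95% confidence level

# Solve margin = z * sqrt(p * (1 - p) / n) for n, then round up
n_required <- ceiling(z^2 * p * (1 - p) / margin^2)
n_required
```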

Question 7:

Which measure is most appropriate for assessing the relationship between study hours and exam
scores?

a. Pearson’s or Spearman's correlation
b. Kendall's tau-c
c. Cramer's V
d. Kendall's tau-b

Question 8:

A researcher conducted a multiple regression analysis to examine the relationship between customer
satisfaction (Var satisfaction) and three predictor variables: service quality (Var quality), price (Var
price), and advertising expenditure (Var advertising). The researcher obtained the following
information:

- Variance of customer satisfaction (Var satisfaction) = 9.6
- Variance of service quality (Var quality) = 12.3
- Variance of price (Var price) = 8.9
- Variance of advertising expenditure (Var advertising) = 6.5
- Variance of residuals (Var residuals) = 4.2
- Variance of predicted values (Var predicted) = 16.8

What is the unadjusted R squared of the model?

Question 9:

Given:

In a large country some people want to support farmers in the transition towards a more sustainable
operation of their farm, while others think farmers should not get that support. Imagine selecting a
large number of samples (all sized 100) from a population. For each of the samples the percentage of
people who want to support farmers is calculated. You put all percentages in a histogram. The middle
of the histogram is at 0.5 (50%).

Two statements:

I: The middle of the histogram is the same as the population proportion.
II: Its shape closely resembles a normal distribution.

a. Both I and II are correct
b. Both I and II are incorrect
c. Only I is correct
d. Only II is correct

Question 10: R output: expected, predicted and residual values

Call:
lm(formula = exam_grade ~ study_time, data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-1.4200 -0.6884 -0.1655  0.5024  3.2341

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.07059    0.18992   5.637 1.67e-07 ***
study_time   0.58888    0.03311  17.784  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.939 on 98 degrees of freedom
Multiple R-squared: 0.7634, Adjusted R-squared: 0.761
F-statistic: 316.3 on 1 and 98 DF, p-value: < 2.2e-16

- What is the expected grade of a student who studied 5 hours?
- Suppose a student who studied 10 hours for the exam had a residual of 2.5. What is the
observed (real) grade of this student?
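
Both answers follow from the fitted line, and the arithmetic can be checked with a short R sketch. The coefficients are copied from the output above; the variable names are illustrative, and the sketch relies on the definition residual = observed value minus predicted value.

```r
# Coefficients copied from the regression output above
intercept <- 1.07059
slope     <- 0.58888

# Expected (predicted) grade for 5 hours of study
predicted_5 <- intercept + slope * 5

# Observed grade = predicted grade + residual (10 hours, residual of 2.5)
predicted_10 <- intercept + slope * 10
observed_10  <- predicted_10 + 2.5

c(predicted_5 = predicted_5, observed_10 = observed_10)
```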

Question 11:

A team of researchers conducted a study to assess the fitness levels of individuals in a particular
population. They found that the mean fitness score in the population is 120, with a standard
deviation of 15. Now, they are interested in estimating the percentage of people in the population
who are likely to have a lower fitness level than John. John's fitness score is 150.

Using this information, what is the estimated percentage of people in the population who have a
lower fitness level than John?
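
A possible check in R is sketched below. It assumes that fitness scores are normally distributed in the population, which the question does not state explicitly; the variable names are illustrative.

```r
mu    <- 120   # population mean
sigma <- 15    # population standard deviation
john  <- 150   # John's score

# Standardize John's score, then look up the share of the population below it
z_score     <- (john - mu) / sigma
share_below <- pnorm(z_score)   # assumes a normal distribution

round(100 * share_below, 2)     # expressed as a percentage
```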

Question 12:

A researcher is studying the proportion of smartphone users who have installed a specific social
media app. In a random sample of 300 smartphone users, it was found that 180 of them had the app
installed.

Given this information, what is the 95% confidence interval for the proportion?
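
A hand calculation can be checked with the R sketch below. It assumes the simple normal-approximation (Wald) interval, which is one common textbook choice; variable names are illustrative.

```r
n <- 300   # sample size
x <- 180   # number of users with the app installed

p_hat <- x / n                           # sample proportion
se    <- sqrt(p_hat * (1 - p_hat) / n)   # standard error of the proportion

# 95% Wald interval: estimate plus/minus critical value times standard error
ci <- p_hat + c(-1, 1) * qnorm(0.975) * se
ci
```

R's built-in `prop.test(x, n)` reports a slightly different interval, because by default it uses a score-based method with a continuity correction.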

Question 13: Linear equation:

- Reference category
- Group 1

a- The value of the intercept (reference category) is:
b- The value of the statistic associated with group 1 is:
c- The value of the general slope is:

Question 14:

A researcher investigates the association between political affiliation (measured with three
categories: Democrat, Republican, Independent) and voting preference for a specific policy (four
options are considered, focusing on the "most preferred policy" only). The researcher utilizes the chi-
square statistic to analyze the significance of this relationship.

How many degrees of freedom are associated with the chi-square statistic in this test?

Question 15:

The variable ‘income’ is measured in euros.
What is the measurement level of the variable ‘income’?

a. Interval
b. Nominal
c. Ordinal
d. Ratio

Question 16:

Given the following table, calculate the chi-square statistic:

x/y      0     1   Total
0       40    10      50
1       30    20      50
Total   70    30     100

Question 17:

Suppose you calculated the chi-square statistic for a 3x3 table by hand. The outcome of your
calculation is 7.14.

a- Using R, what is the associated p-value (2 decimals only)?
b- Does this mean the association is significant at the 95% confidence level?
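
The p-value can be obtained from the upper tail of the chi-square distribution. The sketch below assumes the usual degrees of freedom for a two-way table, (rows - 1) * (columns - 1); variable names are illustrative.

```r
chi_sq <- 7.14                  # statistic calculated by hand
df     <- (3 - 1) * (3 - 1)     # degrees of freedom for a 3x3 table

# Probability of a value at least this large if there is no association
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)
round(p_value, 2)

# Compare with alpha = 0.05 (the 95% confidence level)
p_value < 0.05
```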

Question 18: Output Regression and confidence interval

Coefficients:
             Estimate Std. Error  t value Pr(>|t|)
(Intercept)  -0.27690    0.25924   -1.068    0.287
x1            2.03253    0.02767   73.447   <2e-16 ***
x2           -2.96574    0.02677 -110.767   <2e-16 ***
x3            0.04029    0.02616  ……….       0.125
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.05 on 232 degrees of freedom
Multiple R-squared: 0.9874, Adjusted R-squared: 0.9872
F-statistic: 6046 on 3 and 232 DF, p-value: < 2.2e-16

a- Create a 95% confidence interval for the variable x1.
   a. What is the lower bound:
   b. What is the upper bound:
b- Create a 95% confidence interval for the variable x3.
   a. What is the lower bound:
   b. What is the upper bound:
c- Which of the following statements are correct?
   a. A high (and significant) F value means that the model estimated here is correct.
   b. Using the estimate and SE, you can calculate t. Try to do it yourself for variable x3.
      i. Calculation of t for x3:
   c. In this model, all variables are associated with the dependent variable.

Question 19: Linear equation

- Reference category
- Group 1
- Group 2

a- The value of the intercept (reference category) is:
b- The value of the b-coefficient associated with the dummy of group 1 is:
c- The value of the b-coefficient associated with the dummy of group 2 is:
d- If x is -4, the predicted y value for people in group 1 is:

Question 20:

A sociologist wants to examine the association between marital status (e.g., single, married,
divorced) and job satisfaction among employees in a company. The researcher aims to determine the
strength and nature of this relationship.

Which measure is the most suitable choice for analyzing the association between marital status and
job satisfaction?

a. Spearman's correlation
b. Pearson's correlation
c. Cramer's V
d. Kendall's tau-b

For the next part, please open R and copy and paste the following code:

library(tidyverse)
library(haven)
library(broom)
library(modelr)
library(car)
library(lmtest)
library(dplyr)

## Made-up assignment:

# Set seed for reproducibility
set.seed(123)

# Number of observations
n <- 236

# Create a modified dataset
dataset <- tibble(
  marital_status = rep(c("married", "single"), each = n / 2),
  yoga = sample(c(0, 1), n, TRUE),
  age = sample(18:70, n, TRUE),
  weight_september = round(runif(n, 50, 100), 1),
  weight_december = weight_september - runif(n, 0, 2),
  happiness = ifelse(
    marital_status == "married",
    ifelse(yoga == 0, round(runif(n, 6.0, 7.0), 1), round(runif(n, 7.0, 8.5), 1)),
    ifelse(yoga == 0, round(runif(n, 4.0, 6.5), 1), round(runif(n, 4.0, 6.0), 1))
  ),
  diet = sample(c("No diet", "Vegan", "Vegetarian"), size = n, replace = TRUE)
)

Now answer the following questions:

Question 1: A researcher is interested in exploring the relationship between marital status and the
level of happiness in individuals. She believes that marital status could play a significant role in
shaping people's happiness levels. To investigate this relationship, the researcher gathers a dataset
consisting of information on individuals' marital status (married or single) and their corresponding
happiness scores, which are measured on a scale from 1 to 10. The researcher hypothesizes that
individuals who are married may experience higher levels of happiness compared to those who are
single.

a- Briefly describe which test should be used to assess the impact of marriage.
b- Explain in a few (two or three) lines why you selected this test:
c- Show the output and commands.
d- Perform the test. Do you conclude that being married has an effect on the level of
happiness? Explain briefly. A simple yes/no answer will not earn you points (but omitting a
simple yes/no answer to the question will also cost you points). Also briefly discuss the
confidence interval.

Question 2: A researcher is interested in exploring the participation in yoga classes among individuals
in a certain population. Yoga is known to offer various benefits for physical and mental well-being.
The researcher conducts a survey to collect data on individuals' involvement in yoga classes, with
response options indicating whether they have applied to yoga classes (1) or not (0).

a- Briefly describe which statistical analysis would be appropriate to assess the participation in
yoga classes.
b- Explain in a few lines why you selected this analysis.
c- Upload a screenshot displaying the relevant output.
d- Upload a screenshot of the commands or steps you would use to perform the statistical
analysis (even if you were unable to execute the analysis).
e- Based on the output from the dataset:
   a. Estimate the percentage of individuals who have applied to yoga classes.
   b. Provide the lower bound of the confidence interval for the estimated percentage.
   c. Provide the upper bound of the confidence interval for the estimated percentage.

Question 3:

"After the summer season, a group of people decided to focus on their weight (measured in kg) and
get healthier. They started a weight management program in September, which included changes in
their diet and exercise habits over four months, until December.

To assess the effectiveness of the program, the participants' weights were measured first in
September and then in December. Let's analyze the data and find out if the weight management
program led to noticeable weight loss."

a- Briefly describe which statistical analysis would be appropriate to assess the effectiveness of
the program.
b- Explain in a few lines why you selected this analysis.
c- Upload a screenshot displaying the relevant output.
d- Upload a screenshot of the commands or steps you would use to perform the statistical
analysis (even if you were unable to execute the analysis).
e- Based on your analysis, provide a conclusion regarding whether there was a significant
change in weight over the specified time period.
f- Use a 95% confidence interval to answer the question and explain what this interval
means.

Question 4:

Remember the weight management program from the previous question? Before starting the
program, some participants were already practicing yoga while others were not. Now, a question
arises: some researchers claim that the weight loss was actually due to the fact that some individuals
were already practicing yoga before the program started (that is, in September). To investigate this,
the dataset contains weights measured at different times. By examining this dataset, we may
gain insights into the potential impact of pre-existing yoga practice.

a- Briefly describe which statistical analysis would be appropriate in this scenario.
b- Explain in a few lines why you selected this analysis.
c- Upload a screenshot displaying the relevant output and commands.
d- Based on your analysis, provide a conclusion regarding the potential impact of pre-existing
yoga practice.
e- Compute a 95% confidence interval and explain what this interval means in this context.

Question 5: A company offers five different training programs to its employees: Program A to
Program E. The company is interested in assessing whether the distribution of employees across
these programs matches the expected distribution based on their preferences. A random sample of
employees is taken, and their program preferences are recorded. The company believes that the
distribution of program preferences in the sample may deviate from the expected distribution.

a- Briefly describe which test is used to answer this question (name of the test).
b- Explain in a few (two or three) lines why you selected this test:
c- Upload a screenshot displaying the relevant (numerical, not graphical) output of the test.
d- Upload a screenshot of the commands used to perform the test (even if you were unable to
execute the test).
e- Based on the output, do you conclude that the distribution of employees across training
programs matches the expected distribution?

Question 6: A researcher is interested in investigating the potential benefits of yoga practice on
individuals' happiness levels. It is believed that yoga can promote relaxation, reduce stress, and
enhance overall well-being. The researcher gathers a dataset consisting of information on individuals'
participation in yoga classes (yes or no) and their corresponding happiness scores, measured on a
scale from 1 to 10. The researcher hypothesizes that individuals who participate in yoga classes may
experience higher levels of happiness compared to those who do not engage in yoga practice.

a- Briefly describe which test is used to assess the impact of participating in yoga classes on
happiness levels.
b- Explain in a few lines why you selected this test for evaluating the relationship between yoga
participation and happiness.
c- Provide the output and commands used to conduct the test.
d- Perform the test and draw conclusions regarding the effect of participating in yoga classes on
happiness levels. Avoid giving a simple yes/no answer and instead provide a brief
explanation based on the statistical analysis. Additionally, discuss the confidence interval and
its role in interpreting the results.

Question 7: The management of a wellness center is interested in understanding whether there are
significant differences in the happiness levels of its clients based on their diets. The center offers
various wellness programs and wants to explore if there is a connection between dietary choices and
happiness. The center believes that people with different dietary preferences may experience varying
levels of happiness and wants to validate this claim:

a- Which statistical test would you use to explore whether there are significant differences in
happiness levels among individuals with different diets?
b- Explain in a few lines why you selected this test.
c- Show (copy paste) the commands or steps you would use to perform the statistical analysis
(even if you were unable to execute the analysis). Also upload the output.
d- Based on your analysis, provide a conclusion regarding whether there is a significant
difference in happiness levels.

A popular health magazine published a claim that individuals following a Vegan diet tend to exhibit
higher motivation levels compared to individuals with Vegetarian diets.

a- Which statistical test would you use to answer this question and why?
b- Show (copy paste) the commands or steps you would use to perform the statistical analysis
(even if you were unable to execute the analysis). Also upload the output.
c- Based on your analysis, provide a conclusion regarding the claim.
