PRACTICE QUIZ (1)

PRACTICE QUIZ

SECTION 1 – R Programming
1. What is R primarily used for?
A) Web development
B) Data analysis and statistics
C) Mobile application development
D) Graphic design
Answer: B
Explanation: R is a programming language primarily used for statistical
computing and data analysis.

2. Which of the following is used to assign a value to a variable in R?


A) <=
B) <-
C) =
D) Both B & C
Answer: D
Explanation: Both <- and = assign a value to a variable in R; <= is the
less-than-or-equal comparison operator, not an assignment.

3. What will the following command return: class(TRUE)?


A) "numeric"
B) "logical"
C) "character"
D) "factor"
Answer: B
Explanation: TRUE is a logical value in R, so the class() function will return
"logical".
4. What is the output of sum(1:5)?
A) 10
B) 15
C) 20
D) Error
Answer: B
Explanation: The colon operator 1:5 creates a sequence from 1 to 5, and the
sum() function adds them up (1+2+3+4+5=15).

5. How do you create a data frame in R?


A) data.frame()
B) dataframe()
C) create.dataframe()
D) new.data.frame()
Answer: A
Explanation: The data.frame() function is used to create data frames in R.

6. How do you add a new column to an existing data frame?


A) df$new_column <- values
B) append(df, values)
C) add.column(df, values)
D) new_column(df, values)
Answer: A
Explanation: You can add a new column to a data frame using the $ operator,
e.g., df$new_column <- values.

7. Which of the following is not a valid data type in R?


A) Numeric
B) Logical
C) Matrix
D) ArrayList
Answer: D
Explanation: R does not have an ArrayList data type. It supports numeric,
logical, matrix, and other types.

8. How do you access the third element in a vector named v?


A) v[3]
B) v(3)
C) v[2]
D) v{3}
Answer: A
Explanation: You access elements of a vector in R using square brackets [], so
v[3] retrieves the third element.

9. What is the output of mean(c(2, 4, 6, 8))?


A) 5
B) 6
C) 4
D) 8
Answer: A
Explanation: The mean() function calculates the average, and (2 + 4 + 6 + 8) /
4 = 20 / 4 = 5.

10. What is the purpose of the library() function?


A) Install a new package
B) Unload a package
C) Load an installed package
D) Check for available packages
Answer: C
Explanation: The library() function is used to load an installed package into the
current R session.

11. Which function in R is used to read a CSV file?


A) read.table()
B) read.csv()
C) read.file()
D) read.dataset()
Answer: B
Explanation: The read.csv() function is used to import CSV files in R.

12. What does is.na() function check for?


A) Negative numbers
B) Missing values (NA)
C) Zeros
D) Data type
Answer: B
Explanation: The is.na() function checks whether a value is missing (NA) in R.

13. How do you create a matrix in R?


A) matrix()
B) mat()
C) mtx()
D) create.matrix()
Answer: A
Explanation: The matrix() function is used to create matrices in R.

14. What is the output of class(c(1, 2, 3))?


A) "character"
B) "list"
C) "numeric"
D) "factor"
Answer: C
Explanation: The class() function returns the class of an object, and a vector
of plain numbers such as c(1, 2, 3) has class "numeric".
15. How do you check the dimensions of a data frame?
A) length()
B) dim()
C) nrow()
D) size()
Answer: B
Explanation: The dim() function returns the dimensions of an object like a
matrix or data frame.

16. Which of the following is an example of a logical operator in R?


A) &&
B) ||
C) !
D) All of the above
Answer: D
Explanation: &&, ||, and ! are all logical operators in R.

17. What will typeof(42L) return?


A) "numeric"
B) "logical"
C) "integer"
D) "double"
Answer: C
Explanation: The L suffix forces R to interpret 42 as an integer, so typeof() will
return "integer".

18. What does summary() do in R?


A) Provides a statistical summary of an object
B) Displays the structure of an object
C) Creates a plot
D) Merges two data frames
Answer: A
Explanation: The summary() function provides descriptive statistics like mean,
median, and range for an object.

19. From the given list:

What is the output of student[[1]][2] and student[[3]][2]?


A) pinkey and hyd
B) Rohith and kakumanu
C) bobby and 78
D) 100 and pinkey

20. How do you combine two data frames by rows in R?


A) rbind()
B) cbind()
C) merge()
D) join()
Answer: A
Explanation: The rbind() function combines data frames by rows.

SECTION 2 – MISSING VALUES


1. Which method would you use when the data has a significant amount of
missing values in a feature, and the feature is not very important for the
model?
A) Mean imputation
B) K-Nearest Neighbors (KNN) imputation
C) Dropping the feature
D) Interpolation
Answer: C
Explanation: If a feature has a significant proportion of missing values and is not
critical for the analysis, dropping the feature may be an appropriate approach.

2. Which imputation technique is most suitable for categorical variables?


A) Mean imputation
B) Mode imputation
C) Median imputation
D) Regression imputation
Answer: B
Explanation: Mode imputation replaces missing values with the most frequent
category, which is ideal for categorical data.

3. Which of the following is a potential drawback of using mean
imputation for handling missing values?
A) It introduces bias into the dataset
B) It is computationally expensive
C) It can only be applied to categorical data
D) It leads to loss of variance in the feature
Answer: D
Explanation: Mean imputation reduces variability in the data since all missing
values are replaced by a single constant (the mean), which can distort the
relationships between variables.
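
The variance loss described above can be seen on a small example. The sketch below uses Python's standard library and a made-up feature (the values and the use of None for a missing entry are assumptions for illustration): it fills the gaps with the mean of the observed values and compares the variance before and after.

```python
from statistics import mean, pvariance

# Hypothetical feature; None marks a missing value.
raw = [4.0, 8.0, None, 6.0, 10.0, None, 2.0]

observed = [x for x in raw if x is not None]
fill = mean(observed)

# Mean imputation: every gap receives the same constant.
imputed = [fill if x is None else x for x in raw]

var_observed = pvariance(observed)
var_imputed = pvariance(imputed)

print(f"variance of observed values: {var_observed:.3f}")
print(f"variance after imputation:   {var_imputed:.3f}")
```

The mean is unchanged by the fill, but the imputed points add nothing to the squared deviations while increasing the count, so the variance can only shrink.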

5. In which of the following cases would median imputation be preferred
over mean imputation?
A) When the data contains outliers
B) When the dataset is normally distributed
C) When the missing values are categorical
D) When the missing values are in time series data
Answer: A
Explanation: Median imputation is more robust to outliers than mean imputation
because the median is not affected by extreme values.
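
A quick way to see the difference is to compute both candidate fill values on data with one extreme point. This is a minimal Python sketch; the income figures are invented for illustration.

```python
from statistics import mean, median

# Hypothetical incomes (in thousands); the last value is an extreme outlier.
observed = [30, 32, 35, 31, 34, 33, 500]

mean_fill = mean(observed)      # dragged upward by the outlier
median_fill = median(observed)  # stays with the bulk of the data

print(f"mean fill value:   {mean_fill:.1f}")
print(f"median fill value: {median_fill}")
```

Here the mean lands far above every typical value, so using it as the fill would place imputed points where almost no real observation lies.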

6. Which of the following is a problem caused by using mean imputation
on skewed data?
A) It increases the variance
B) It introduces a strong bias toward the extremes
C) It does not capture the central tendency accurately
D) It increases computation time
Answer: C
Explanation: In skewed data, mean imputation may not accurately capture the
true central tendency, leading to biased estimates.

7. Why is it essential to consider the mechanism of missing data (MCAR,
MAR, MNAR) before applying imputation techniques?
A) To select the most computationally efficient imputation method
B) To avoid overfitting the model
C) To apply an appropriate imputation technique based on the missing data
pattern
D) To increase the number of missing data points
Answer: C
Explanation: Understanding the missing data mechanism helps choose the right
imputation technique, as different techniques are better suited for different patterns
(MCAR, MAR, MNAR).

SECTION 3 – OUTLIER DETECTION AND HANDLING


1. What is an outlier in machine learning?
A) A data point that lies within the expected range
B) A data point that deviates significantly from the majority of the data
C) A data point that has missing values
D) A data point that perfectly fits the model
Answer: B
Explanation: Outliers are data points that differ significantly from other
observations. They may represent errors, variability in the data, or rare events.

2. Which of the following methods is commonly used to detect outliers?


A) Cross-validation
B) Z-Score
C) Gradient Descent
D) Regularization
Answer: B
Explanation: Z-Score measures how many standard deviations a data point is from
the mean. Data points with a Z-Score higher than a certain threshold (e.g., 3) are
considered outliers.
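
The rule in this explanation can be sketched in a few lines of Python. The helper name zscore_outliers and the sample data are hypothetical, and the sketch assumes the population standard deviation and a non-constant sample (so the standard deviation is not zero).

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute Z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    return [x for x in values if abs(x - mu) / sigma > threshold]

# Hypothetical measurements with one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 11, 10, 13, 12, 50]
outliers = zscore_outliers(data)
print(outliers)  # prints [50]
```

Note that the outlier itself inflates the mean and standard deviation, so in very small samples no point can reach a Z-score of 3 at all.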

3. What is the main disadvantage of using Z-Score for outlier detection?


A) It is computationally expensive
B) It assumes the data is normally distributed
C) It requires labeled data
D) It can only be used for categorical data
Answer: B
Explanation: Z-Score assumes the data follows a normal distribution. If the data is
not normally distributed, this method may fail to identify true outliers.

4. Which of the following is a non-parametric method for outlier detection?


A) Z-Score
B) Linear Regression
C) K-Nearest Neighbors (KNN)
D) t-test
Answer: C
Explanation: K-Nearest Neighbors is a non-parametric method that can be used for
detecting outliers by considering the distances between neighboring points.
5. Which metric is typically used in Distance-based outlier detection?
A) Mean
B) Median
C) Euclidean Distance
D) Variance
Answer: C
Explanation: Euclidean distance is a common metric used in distance-based outlier
detection methods to measure the distance between data points.

6. Which method uses the interquartile range (IQR) for detecting outliers?
A) Box plot
B) Decision Trees
C) Z-Score
D) Random Forest
Answer: A
Explanation: Box plots visualize the spread of the data and use the IQR to detect
outliers. Values outside 1.5 times the IQR from the quartiles are considered outliers.
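
The 1.5 × IQR rule used by box plots can be written out directly. In this Python sketch the helper name iqr_outliers and the data are hypothetical; quartiles come from statistics.quantiles with its default method, which may differ slightly from the quartile definition used by a particular plotting tool.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]

# Hypothetical measurements with one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 11, 10, 13, 12, 50]
outliers = iqr_outliers(data)
print(outliers)  # prints [50]
```

Unlike the Z-score, the quartiles barely move when a single extreme value is present, which is why this rule makes no normality assumption.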

7. Which method would be least appropriate for categorical data outlier
detection?
A) One-Hot Encoding
B) Z-Score
C) Frequency Analysis
D) Mode-Based Detection
Answer: B
Explanation: Z-Score is designed for continuous data, not categorical data, making
it less suitable for detecting outliers in categorical datasets.

8. What is a common way to handle outliers in regression models?


A) Drop the outliers
B) Assign them a different weight
C) Use a non-linear model
D) Convert the data to categorical
Answer: A
Explanation: A common approach is to remove or drop the outliers if they are
deemed to be noise or errors, as they can negatively affect regression models.
Q 1. Consider the following probability distribution. Which of the following is
true?
x 0 1 2 3 4 5 6
p(x) 0.1 0.19 0.23 0.16 0.14 0.14 0.04

A. P(X< 3) = 0.68
B. P(X < 6) =1
C. P(2< X< 5) = 0.30
D. P(X > 3) = 0.48

Answer : c
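
Each option can be checked by summing the probabilities of the values that satisfy the stated inequality. The Python sketch below does this for the table in the question; the helper name prob is hypothetical.

```python
# Probability mass function from the question.
pmf = {0: 0.10, 1: 0.19, 2: 0.23, 3: 0.16, 4: 0.14, 5: 0.14, 6: 0.04}

def prob(event):
    """Sum p(x) over every x for which event(x) is true."""
    return sum(p for x, p in pmf.items() if event(x))

checks = {
    "P(X < 3)": prob(lambda x: x < 3),
    "P(X < 6)": prob(lambda x: x < 6),
    "P(2 < X < 5)": prob(lambda x: 2 < x < 5),
    "P(X > 3)": prob(lambda x: x > 3),
}
for label, value in checks.items():
    print(f"{label} = {value:.2f}")
```

Only P(2 < X < 5) = p(3) + p(4) = 0.30 matches the value stated in its option.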

Q 2. Thirty percent of all households have a DVD player. If you select 20 houses at
random, what is the probability that six or more of them have a DVD player?
A. BINOM.DIST(6, 20, 0.30, 1)
B. 1 – BINOM.DIST(6, 20, 0.30, 1)
C. 1 – BINOM.DIST(5, 20, 0.30, 1)
D. BINOM.DIST(5, 20, 0.30, 1)

Answer : c
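
The spreadsheet expression 1 – BINOM.DIST(5, 20, 0.30, 1) is the complement of the cumulative probability of five or fewer successes. As a rough cross-check, the same quantity can be computed in Python from the binomial formula; the helper names binom_pmf and binom_cdf are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# Six or more households out of 20 is the complement of five or fewer.
p_six_or_more = 1 - binom_cdf(5, 20, 0.30)
print(f"P(X >= 6) = {p_six_or_more:.4f}")
```

Using 6 instead of 5 in the cumulative term would drop the case X = 6 from the answer, which is the usual off-by-one mistake in "at least" questions.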

Q 3. Consider the following probability distribution function.


x 0 1 2 3 4 5 6
p(x) 0.1 0.19 0.23 0.16 0.14 0.14 0.04

What is the expected value of X?


A. 3
B. 0.143
C. 2.63
D. 9.73

Answer : c
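
The expected value of a discrete random variable is the probability-weighted sum of its values, E[X] = Σ x · p(x). A short Python check using the table from the question:

```python
xs = [0, 1, 2, 3, 4, 5, 6]
ps = [0.10, 0.19, 0.23, 0.16, 0.14, 0.14, 0.04]

# A valid distribution must sum to 1 (up to floating-point rounding).
total = sum(ps)

# E[X] = sum of x * p(x)
expected = sum(x * p for x, p in zip(xs, ps))
print(f"sum of p(x) = {total:.2f}, E[X] = {expected:.2f}")
```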

Q 4. A computer store receives a shipment of 14 computers, 5 of which already
have a modem installed. Unfortunately, the boxes are not labelled, so you are
unsure which computers are which. Assume you select 4 computers. What is the
probability that exactly two of them have modems?
A. 5C2 / 14C4
B. 5/14
C. (9C2 × 5C2) / 14C4
D. 5C4 / 14C4
Answer : c
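
Sampling without replacement from 5 computers with modems and 9 without follows the hypergeometric distribution: choose 2 of the 5 with modems and 2 of the 9 without, out of all ways to choose 4 of 14. A minimal Python sketch (the helper name hypergeom_pmf is hypothetical):

```python
from math import comb

def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k successes) when drawing without replacement."""
    failures = population - successes
    return comb(successes, k) * comb(failures, draws - k) / comb(population, draws)

p_two_modems = hypergeom_pmf(k=2, population=14, successes=5, draws=4)
print(f"P(exactly 2 modems) = {p_two_modems:.4f}")
```

The numerator is 10 × 36 = 360 and the denominator is 1001, so the probability is roughly 0.36.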

Q 5. Which of the following is an example of a binomial experiment?

A. A business firm introducing a new product wants to know how many
purchases its clients will make each year.
B. A shopping mall is interested in the income level of its customers and is
taking a survey to gather information.
C. A sociologist is researching an area in an effort to determine the
proportion of households with a male head of household.
D. A study is concerned with the average number of hours worked by high
school students.

Answer : c

Q 6. The following table presents the probability distribution for the number of
claims processed per hour at an insurance agency.
No. of Claims 2 3 4 5 6 7
p(x) 0.11 0.16 0.27 0.23 0.13 0.10

What is the average number of claims processed?


A. 4.5
B. 21.53
C. 4.41
D. 0.17

Answer : c

Q 7. Which of the following statements is always true for any two events A and B
defined on a sample space S?
A. If events A and B are mutually exclusive, then A & B will always happen
together.
B. If events A and B are collectively exhaustive, then A & B will always
happen together.
C. The union of A and B is the set of all outcomes in either A or B or
both.
D. The complement of event A is event B.
Answer : c

Q 8. The finishing process on new furniture leaves slight blemishes. The table
shown below displays a manager's probability assessment of the number of
blemishes in the finish of new furniture.
Number of Blemishes 0 1 2 3 4 5

Probability 0.34 0.25 0.19 0.11 0.07 0.04

Let event A be that there are more than three blemishes and let event B be that there
are four or fewer blemishes. Which of the following statements is true?
A. Events A and B are mutually exclusive.
B. P(A OR B) = 0
C. Events A and B are collectively exhaustive.
D. P(A AND B) = 0.18

Answer : c

Q 9. The finishing process on new furniture leaves slight blemishes. The table
shown below displays a manager's probability assessment of the number of
blemishes in the finish of new furniture.
Number of Blemishes 0 1 2 3 4 5

Probability 0.34 0.25 0.19 0.11 0.07 0.04

Let event E1 be that there are less than three blemishes and let event E2 be
that there are more than two blemishes. Which of the following statements is
true?
A. Events E1 and E2 are mutually exclusive.
B. Events E1 and E2 are collectively exhaustive.
C. Both (A) and (B) are true.
D. Neither (A) nor (B) is true.

Answer : c

Q 10. The probability that an employee at a company uses illegal drugs is 0.08. The
probability that an employee is male is 0.55. Assuming that these events are
independent, what is the probability that a randomly chosen employee is a male
drug user?
A. 0.586
B. 0.630
C. 0.044
D. 0.470
Answer : c

Q 11. It was found that 84% of all stockbrokers drink coffee each day.
Furthermore, 64% eat a candy bar each day. Finally, 8% of stockbrokers do not
drink coffee but eat a candy bar. What is the probability that a stockbroker
drinks coffee given that he/she eats a candy bar? [Hint: P(A^c|B) = 1 – P(A|B)]
A. 0.560
B. 0.280
C. 0.875
D. 0.360

Answer : c

Q 12. A survey of recent e-commerce start-up firms was undertaken at an industry
convention. Representatives of the firms were asked for the geographic location
of the firm as well as the firm's outlook for growth in the coming year. The
results are provided below.

What is the probability that one of these start-up firms was from the Northeast
provided the firm’s expected growth is medium?
A. 0.42
B. 0.12
C. 0.16
D. 0.31

Answer : c

Q 13. A survey of recent e-commerce start-up firms was undertaken at an industry
convention. Representatives of the firms were asked for the geographic location
of the firm as well as the firm's outlook for growth in the coming year. The
results are provided below.

What is the probability that a start-up firm’s expected growth is medium,
provided the firm is from the Northeast?
A. 0.16
B. 0.31
C. 0.42
D. 0.12

Answer : c

Q 14. Which of the following statements is not true?


A. When events A and B are independent, then P(A OR B) = P(A) + P(B) -
P(A) × P(B).
B. When events A and B are independent, then P(A AND B) = P(A) × P(B).
C. When events A and B are independent, then P(A AND B) =P(A) + P(B).
D. Events are independent when the occurrence of one event has no
effect on the probability that another will occur.

Answer : c

Q 15. An automatic pesticide packing machine is used to fill 500 g of pesticide
into a carton. The amount filled is a continuous random variable and cannot be
exactly 500 g; there is some variability in the amount filled by the machine. To
ensure better customer satisfaction, the management programmes the filling
machine to fill a mean amount of 510 g with a standard deviation of 4 g. The
amount filled by the machine follows a normal distribution. The management thinks
that now no customer is getting less than 500 g of pesticide in a carton. The
percentage of customers getting less than 500 g of pesticide in a carton is
A. 50.621
B. 62.10
C. 0.621
D. 49.379

Answer : c
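
With a mean of 510 g and a standard deviation of 4 g, 500 g lies 2.5 standard deviations below the mean, so the share of under-filled cartons is the normal CDF at 500. A small Python check using statistics.NormalDist from the standard library:

```python
from statistics import NormalDist

# Fill amount in grams, as described in the question.
fill = NormalDist(mu=510, sigma=4)

# Percentage of cartons containing less than 500 g.
pct_below_500 = fill.cdf(500) * 100
print(f"{pct_below_500:.3f}% of cartons fall below 500 g")
```

The result is small but not zero, so the management's belief that no customer receives less than 500 g does not hold exactly.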

Q1.A corporation states that 60% of its employees participate in team initiatives. 30% of
employees participate in team initiatives and receive leadership training. What is the
probability that an employee will attend leadership training if s/he is involved in a team
project?
A. 0.30
B. 0.18
C. 0.50*
D. Cannot be determined from the information

Answer: c

Q2.A firm observes that 70% of their orders are online. Of these online orders, 90% are
delivered on time, while only 60% of in-store orders are delivered on time. If an
order is delivered on time, what is the probability that it was placed online?
A. 0.78*
B. 0.63
C. 0.81
D. 0.87

Answer: a
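
Questions of this type apply Bayes' theorem: the probability of a category given the outcome is that category's joint probability divided by the total probability of the outcome. The Python sketch below works through the online-order numbers; the helper name posterior and the dictionary keys are hypothetical.

```python
def posterior(priors, likelihoods, target):
    """P(target | evidence) via Bayes' theorem over a set of categories."""
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    return priors[target] * likelihoods[target] / evidence

# Share of orders by channel, and on-time rate within each channel.
priors = {"online": 0.70, "in_store": 0.30}
on_time = {"online": 0.90, "in_store": 0.60}

p_online_given_on_time = posterior(priors, on_time, "online")
print(f"P(online | on time) = {p_online_given_on_time:.2f}")
```

Here the numerator is 0.70 × 0.90 = 0.63 and the denominator is 0.63 + 0.30 × 0.60 = 0.81. The same helper works for any number of categories, as long as the priors sum to 1.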

Q3.A software company tracks the success of its advertising campaigns. Campaign A
reaches 60% of the market; Campaign B reaches the remaining 40%. Campaign A and
Campaign B have conversion rates of 25% and 35%, respectively. When a customer
makes a purchase, what is the probability that they were reached by Campaign B?
A. 0.35
B. 0.41
C. 0.48*
D. 0.59

Answer: c

Q4.A delivery service uses three types of vehicles: vans (50%), motorcycles (30%), and
trucks (20%). The probabilities of late delivery for vans, motorcycles, and trucks are
5%, 10%, and 15%, respectively. If a delivery is late, what is the probability it was
made by a motorcycle?
A. 0.15
B. 0.25
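C. 0.29
D. 0.35*
Answer: d
Explanation: P(motorcycle | late) = (0.30 × 0.10) / (0.50 × 0.05 + 0.30 × 0.10
+ 0.20 × 0.15) = 0.03 / 0.085 ≈ 0.35.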

Q5. A marketing team launches an ad campaign targeting a market segment. The
probability of a customer engaging with the ad is 20%. If 10 customers from the
segment are targeted, what is the probability that exactly 2 engage with the ad?
A. BINOM.DIST(2,10,0.2,0)*
B. BINOM.DIST(2,10,0.2,1)
C. BINOM.DIST(2,10,0.8,0)
D. BINOM.DIST(2,10,0.8,1)

Answer: a

Q6.A software company estimates a 50% chance that a project will require an
extension. If the company manages 8 projects simultaneously, what is the
probability that at least half of them will require an extension?
A. BINOM.DIST(4,8,0.5,1)
B. 1 – BINOM.DIST(4,8,0.5,1)
C. 1 – BINOM.DIST(3,8,0.5,1)*
D. 0.5

Answer: c

Q7.The delivery times for Swiggy orders in Bengaluru follow a normal distribution with
a mean of 35 minutes and a standard deviation of 5 minutes. What is the probability
that an order is delivered within 30 to 40 minutes?
A. NORM.DIST(41,35,5,1) – NORM.DIST(29,35,5,1)
B. NORM.DIST(40,35,5,1) – NORM.DIST(30,35,5,1)*
C. NORM.DIST(40,35,5,0) – NORM.DIST(30,35,5,0)
D. NORM.DIST(39,35,5,1) – NORM.DIST(31,35,5,1)

Answer: b
Q8.A courier company has delivery times that are normally distributed with a mean of
2.5 days and a standard deviation of 0.6 days. What is the delivery time that
corresponds to the fastest 1% of deliveries?
A. NORM.INV(0.99,2.5,0.6)
B. NORM.INV(0.01,2.5,0.6,1)
C. NORM.INV(0.01,2.5,0.6)*
D. NORM.INV(0.99,2.5,0.6,1)
Answer: c
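
The fastest 1% of deliveries are the shortest times, so the cutoff is the 1st percentile of the distribution. The spreadsheet call NORM.INV(0.01, 2.5, 0.6) corresponds to the inverse CDF, which Python's standard library exposes as NormalDist.inv_cdf:

```python
from statistics import NormalDist

# Delivery time in days, as described in the question.
delivery = NormalDist(mu=2.5, sigma=0.6)

# Time below which the fastest 1% of deliveries fall.
cutoff = delivery.inv_cdf(0.01)
print(f"fastest 1% of deliveries take under {cutoff:.2f} days")
```

Passing 0.99 instead would give the cutoff for the slowest 1%, which lies above the mean rather than below it.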

Q9.A retailer claims that the average amount spent by a customer is less than ₹1,000. A
sample of 100 customers shows a sample mean of ₹940 and a standard deviation of
₹150. What is the p-value for testing the retailer’s claim?
A. Close to zero*
B. Close to 1
C. Close to 0.5
D. Can’t obtain

Answer: a
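
The p-value follows from the test statistic (sample mean minus hypothesised mean, divided by the standard error). The Python sketch below computes the statistic and a left-tailed p-value. As an assumption, it uses the standard normal distribution in place of the t distribution with 99 degrees of freedom, which is a close approximation at this sample size.

```python
from math import sqrt
from statistics import NormalDist

n, sample_mean, sample_sd, mu0 = 100, 940, 150, 1000

standard_error = sample_sd / sqrt(n)
t_stat = (sample_mean - mu0) / standard_error

# Left-tailed test (H1: mean < 1000); normal approximation to the t distribution.
p_value = NormalDist().cdf(t_stat)
print(f"t = {t_stat:.1f}, approximate p-value = {p_value:.6f}")
```

A statistic four standard errors below the hypothesised mean leaves almost no probability in the tail, which is why the p-value is close to zero.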

Q10.Which of the following is a key assumption when conducting a one-sample t-test?


A. The sample size must be greater than 30.
B. The sample mean should be exactly equal to the population mean.
C. The sample must come from a normal population.*
D. The population variance must be known.

Answer: c

Q11.A retail store in Mumbai claims that the average spending of a customer is less than
₹1,000. A sample of 20 customers has a sample mean of ₹950, with a sample
standard deviation of ₹100. What is the primary purpose of performing the one-
sample t-test in this case?
A. To verify if the store's claim is valid based on the sample data.*
B. To calculate the exact sample mean for the entire population.
C. To find the population standard deviation.
D. To decide if the sample is a true representative of the population.
Answer: a
Q12.A retailer in Mumbai compares sales performance across three stores. They perform
an ANOVA and find a significant result. What is the next step for the retailer?
A. Perform a post-hoc test to determine which stores' sales are significantly
different from each other.*
B. Immediately implement changes based on the store with the highest sales.
C. Recalculate the ANOVA test using a larger sample size.
D. Fail to reject the null hypothesis, as no further analysis is needed.
Answer: a

Q13.A company in Delhi collects data on whether customers prefer online or in-store
shopping based on their income group (low, medium, high). After performing the Chi-
Square Test of Independence, the p-value is 0.12. What is the appropriate conclusion?
A. There is a significant association between income group and shopping
preference.
B. There is no significant association between income group and shopping
preference.*
C. The data does not meet the conditions for a Chi-Square Test.
D. Cannot conclude, as the information is inadequate.

Answer:B

Q14.A car dealership in Mumbai wants to verify if the distribution of cars sold by color
matches their expected proportions (30% white, 20% black, 25% red, 25% blue).
They observe sales of 200 cars. If the p-value obtained from the Chi-Square Test of
Goodness of Fit is 0.04, what is the conclusion?
A. There is no significant difference between the observed and expected
distribution of car colors.
B. The observed car color distribution is significantly different from the
expected distribution.*
C. The dealership should increase the number of cars sold.
D. The dealership should not perform the test because the sample size is too
small.

Answer:B

Q15.Which of the following conclusions can be drawn from a Two-Way ANOVA test that
shows significant effects for both marketing strategy (online vs. print) and customer
demographics (age groups), with no significant interaction effect?
A. The type of marketing strategy and customer demographics have a combined
effect on customer purchase decisions.
B. The type of marketing strategy and customer demographics each have an
independent effect on purchase decisions.*
C. There is no significant effect of marketing strategy or customer
demographics on customer purchase decisions.
D. Only marketing strategy, not customer demographics, significantly influences
purchase decisions.

Answer:B
