2a. Exploratory Data Analysis
Guidelines:
1. An assignment submission is considered complete only when correct, executable code is
submitted along with documentation explaining the method and results. A submission missing
either of these will be treated as invalid.
2. Ensure that you submit your assignments correctly. Resubmission is not allowed.
3. After submission, you can evaluate your work against the answer keys provided (the keys become
available only after submission).
Make a table as shown above and provide information about each feature, such as its data
type and its relevance to model building. If a feature is not relevant, provide the reasons
along with a description of the feature.
Problem Statements:
Q3) Three coins are tossed. Find the probability that two heads and one tail are obtained.
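One way to check an answer to Q3 is to enumerate every outcome and count the favourable ones. The sketch below assumes fair, independent coins; the function name `prob_exact_heads` is a hypothetical helper, not part of the assignment.

```python
from fractions import Fraction
from itertools import product


def prob_exact_heads(n_coins=3, n_heads=2):
    """Probability of exactly n_heads heads when n_coins fair coins are tossed."""
    # Every sequence of H/T is equally likely for fair, independent coins.
    outcomes = list(product("HT", repeat=n_coins))
    # Keep only the sequences with the requested number of heads.
    favourable = [o for o in outcomes if o.count("H") == n_heads]
    return Fraction(len(favourable), len(outcomes))


p_q3 = prob_exact_heads()
print(p_q3)  # 3/8
```

Working with `Fraction` keeps the result exact, which makes it easier to compare against a hand calculation.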
Q4) Two dice are rolled. Find the probability that the sum is:
a) Equal to 1
b) Less than or equal to 4
c) Divisible by both 2 and 3
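The three parts of Q4 can be checked by listing all 36 ordered rolls. This is a minimal sketch assuming two fair six-sided dice; `prob` is a hypothetical helper that takes a condition on the sum.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of two fair six-sided dice.
rolls = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability that the sum of the two dice satisfies `event`."""
    hits = sum(1 for a, b in rolls if event(a + b))
    return Fraction(hits, len(rolls))


p_a = prob(lambda s: s == 1)                     # impossible: the minimum sum is 2
p_b = prob(lambda s: s <= 4)                     # sums 2, 3 and 4
p_c = prob(lambda s: s % 2 == 0 and s % 3 == 0)  # sums 6 and 12

print(p_a, p_b, p_c)  # 0 1/6 1/6
```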
Q5) A bag contains 2 red, 3 green, and 2 blue balls. Two balls are drawn at random. What is the
probability that none of the balls drawn is blue?
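For Q5, a counting argument with combinations is one possible approach. The sketch below assumes the two balls are drawn without replacement, which the question does not state explicitly; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

red, green, blue = 2, 3, 2
total = red + green + blue   # 7 balls in the bag
non_blue = red + green       # 5 balls that are not blue

# Assumption: both balls are drawn without replacement, so we count
# unordered pairs of non-blue balls over all unordered pairs.
p_no_blue = Fraction(comb(non_blue, 2), comb(total, 2))

print(p_no_blue)  # 10/21
```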
Q6) Calculate the Expected number of candies for a randomly selected child:
Below are the probabilities of the count of candies for children (ignoring the nature of the
child; generalized view).
i. Child A – the probability of having 1 candy is 0.015.
ii. Child B – the probability of having 4 candies is 0.2.
Q7) Calculate the mean, median, mode, variance, standard deviation, and range for the given
dataset, and comment on the values or draw inferences.
Dataset: Refer to the hands-on material in the LMS (Data Types EDA assignment). A snapshot of
the dataset is given above.
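The measures requested in Q7 can be computed with Python's standard `statistics` module. The sketch below is only illustrative: the `sample` list is hypothetical placeholder data, not the LMS dataset, and `describe` is a made-up helper name. It reports the sample variance and sample standard deviation (n - 1 denominator); use `pvariance` and `pstdev` if population measures are wanted.

```python
import statistics


def describe(values):
    """Return basic measures of centrality and spread for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.multimode(values),      # list, since ties are possible
        "variance": statistics.variance(values),   # sample variance (n - 1)
        "std_dev": statistics.stdev(values),       # sample standard deviation
        "range": max(values) - min(values),
    }


# Hypothetical placeholder values; replace with one column of the real dataset.
sample = [2, 4, 4, 5, 7, 8]
summary = describe(sample)

for name, value in summary.items():
    print(f"{name}: {value}")
```

Calling `describe` once per numeric column keeps the calculation in one place, in line with the modular-code guideline.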
Q9) Look at the data given below. Plot the data, find the outliers, and find out: μ, σ, σ²
Hint: [Use a plot that shows the data distribution, and skewness along with the outliers; also
use Python code to evaluate measures of centrality and spread]
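The numerical part of Q9 might look like the sketch below. It uses hypothetical placeholder data (the actual data is not reproduced here), treats the data as a full population when computing μ, σ and σ², and flags outliers with the common 1.5 × IQR rule, which is the same rule a box plot typically uses for its whiskers. The plot itself is left out to keep the example standard-library only.

```python
import statistics


def summarize_with_outliers(values):
    """Return (mu, sigma, sigma squared, outliers) using the 1.5 * IQR rule."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)        # population standard deviation
    sigma_sq = statistics.pvariance(values)  # population variance

    # Quartiles; method="inclusive" treats the data as the whole population.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    outliers = [v for v in values if v < low or v > high]
    return mu, sigma, sigma_sq, outliers


# Hypothetical placeholder values; replace with the data given in the question.
data = [10, 12, 11, 13, 12, 11, 40]
mu, sigma, sigma_sq, outliers = summarize_with_outliers(data)
print(outliers)  # [40]
```

A single extreme value such as 40 pulls μ upward and inflates σ, which is why the question asks for the outliers alongside these measures.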
Q10) AT&T was running commercials in 1990 aimed at luring back customers who had switched
to one of the other long-distance phone service providers. One such commercial shows a
businessman trying to reach Phoenix and mistakenly getting Fiji, where a half-naked native on a
beach responds incomprehensibly in Polynesian. When asked about this advertisement, AT&T
admitted that the portrayed incident did not actually take place but added that this was an
enactment of something that “could happen.” Suppose that one in 200 long-distance telephone
calls is misdirected.
What is the probability that at least one in five attempted telephone calls reaches the wrong
number? (Assume independence of attempts.)
Hint: [Using the probability formula, evaluate the probability of one call being wrong out of five
attempted calls]
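Because the question asks for "at least one" misdirected call, one convenient route is the complement rule: P(at least one wrong) = 1 - P(all five correct). The sketch below assumes independent attempts, as stated in the question; the function name is hypothetical.

```python
def prob_at_least_one_misdirected(p_wrong, n_calls):
    """P(at least one misdirected call) for independent attempts."""
    # Complement rule: 1 minus the probability that every call is correct.
    return 1 - (1 - p_wrong) ** n_calls


p_q10 = prob_at_least_one_misdirected(1 / 200, 5)
print(round(p_q10, 5))  # 0.02475
```

So roughly 2.5% of five-call sequences would contain at least one misdirected call under these assumptions.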
Q11) Returns on a certain business venture, to the nearest $1,000, are known to follow the
following probability distribution.
X        P(x)
-2,000   0.1
-1,000   0.1
0        0.2
1,000    0.2
2,000    0.3
3,000    0.1
(i) What is the most likely monetary outcome of the business venture?
Hint: [The outcome is most likely the expected returns of the venture]
(iv) What is a good measure of the risk involved in a venture of this kind? Compute
this measure.
Hint: [Risk here stems from the possible variability in the expected returns,
therefore, name the risk measure for this venture]
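The quantities relevant to parts (i) and (iv) can be computed directly from the distribution table. The sketch below computes the expected return, the single most probable return (the mode), and the variance and standard deviation as one possible measure of risk. It reports both the expected value and the mode so the two can be compared; the choice of standard deviation as the risk measure is an assumption based on the hint about variability.

```python
from math import sqrt

# Probability distribution of returns, taken from the table in Q11.
returns = {-2000: 0.1, -1000: 0.1, 0: 0.2, 1000: 0.2, 2000: 0.3, 3000: 0.1}

# Sanity check: probabilities must sum to 1 (within floating-point error).
assert abs(sum(returns.values()) - 1.0) < 1e-9

# Expected return: probability-weighted average of the outcomes.
expected = sum(x * p for x, p in returns.items())

# Single most probable outcome (the mode of the distribution).
most_probable = max(returns, key=returns.get)

# Variance and standard deviation measure the spread around the expected return.
variance = sum((x - expected) ** 2 * p for x, p in returns.items())
std_dev = sqrt(variance)

print(round(expected), most_probable, round(variance), round(std_dev, 2))
# 800 2000 2160000 1469.69
```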
Hints:
For each assignment, the solution should be submitted in the format below.
1. Research and perform all possible steps for obtaining the solution.
2. For statistics calculations, an explanation of the solutions should be documented in detail
along with the code. Use the same Word document to fill in your explanation.
Must follow these guidelines:
2.1 Be thorough with the concepts of Probability, Probability Distributions, Business
Moments, and Univariate & Bivariate visualizations.
2.2 For true/false or short-answer questions, an explanation is a must.
2.3 Python code for univariate analysis (histogram, box plot, bar plots, etc.) of the data
distribution must be attached.
3. All code (executable programs) should execute without errors.
4. Code modularization should be followed.
5. Each line of code should have comments explaining the logic and why you are using that
function.