From sample mean
to population mean
F O U N D AT I O N S O F P R O B A B I L I T Y I N P Y T H O N
Alexander A. Ramírez M.
CEO @ Synergy Vision
Sample mean review
LAW OF LARGE NUMBERS
The sample mean approaches the
expected value as the sample size
increases.
FOUNDATIONS OF PROBABILITY IN PYTHON
Sample mean review (Cont.)
¯ x1 + x2
Sample mean = X2 =
2
FOUNDATIONS OF PROBABILITY IN PYTHON
Sample mean review (Cont.)
¯ x1 + x2 + x3
Sample mean = X3 =
3
FOUNDATIONS OF PROBABILITY IN PYTHON
Sample mean review (Cont.)
¯ x1 + x2 + ⋯ + xn
Sample mean = Xn =
n
FOUNDATIONS OF PROBABILITY IN PYTHON
Sample mean review (Cont.)
¯ x1 + x2 + ⋯ + xn
Sample mean = Xn = → E(X)
n
FOUNDATIONS OF PROBABILITY IN PYTHON
Generating the sample
# Import binom and describe
from scipy.stats import binom
from scipy.stats import describe
# Sample of 250 fair coin flips
samples = binom.rvs(n=1, p=0.5, size=250, random_state=42)
# Print first 100 values from the sample
print(samples[0:100])
[0 1 1 1 0 0 0 1 1 1 0 1 1 0 0 0 0 1 0 0 1 0 0 0 0 1 0 1 1 0 1 0 0 1 1 1 0
0 1 0 0 0 0 1 0 1 0 1 1 0 1 1 1 1 1 1 0 0 0 0 0 0 1 0 0 1 0 1 0 1 1 0 0 1
1 1 1 0 0 0 1 1 0 0 0 0 1 1 1 0 0 1 1 1 1 0 1 0 0 0]
FOUNDATIONS OF PROBABILITY IN PYTHON
Calculating the sample mean
# Calculate the sample mean
print(describe(samples[0:10]).mean)
0.6
FOUNDATIONS OF PROBABILITY IN PYTHON
FOUNDATIONS OF PROBABILITY IN PYTHON
FOUNDATIONS OF PROBABILITY IN PYTHON
Plotting the sample mean
from scipy.stats import binom
from scipy.stats import describe
import matplotlib.pyplot as plt
# Define our variables
coin_flips, p, sample_size , averages = 1, 0.5, 1000, []
# Generate the sample
samples = binom.rvs(n=coin_flips, p=p, size=sample_size, random_state=42)
FOUNDATIONS OF PROBABILITY IN PYTHON
Plotting the sample mean (Cont.)
# Calculate the sample mean
for i in range(2,sample_size+1):
averages.append(describe(samples[0:i]).mean)
# Print the first values of averages
print(averages[0:10])
[0.5, 0.6666666666666666, 0.75, 0.6, 0.5, 0.42857142857142855, 0.5,
0.5555555555555556,0.6, 0.5454545454545454]
FOUNDATIONS OF PROBABILITY IN PYTHON
Plotting the sample mean (Cont.)
# Add population mean line and sample mean plot
plt.axhline(binom.mean(n=coin_flips, p=p), color='red')
plt.plot(averages, '-')
# Add legend
plt.legend(("Population mean","Sample mean"), loc='upper right')
plt.show()
FOUNDATIONS OF PROBABILITY IN PYTHON
Sample mean plot
FOUNDATIONS OF PROBABILITY IN PYTHON
Let's practice!
F O U N D AT I O N S O F P R O B A B I L I T Y I N P Y T H O N
Adding random
variables
F O U N D AT I O N S O F P R O B A B I L I T Y I N P Y T H O N
Alexander A. Ramírez M.
CEO @ Synergy Vision
The central limit theorem (CLT)
The sum of random variables tends to a normal distribution as the number of them grows to
in nity.
Conditions:
The variables must have the same distribution.
The variables must be independent.
FOUNDATIONS OF PROBABILITY IN PYTHON
FOUNDATIONS OF PROBABILITY IN PYTHON
FOUNDATIONS OF PROBABILITY IN PYTHON
FOUNDATIONS OF PROBABILITY IN PYTHON
FOUNDATIONS OF PROBABILITY IN PYTHON
Poisson population plot
# Add the imports
from scipy.stats import poisson, describe
from matplotlib import pyplot as plt
import numpy as np
# Generate the population
population = poisson.rvs(mu=2, size=1000, random_state=20)
# Draw the histogram with labels
plt.hist(population, bins=range(9), width=0.8)
plt.show()
FOUNDATIONS OF PROBABILITY IN PYTHON
FOUNDATIONS OF PROBABILITY IN PYTHON
Sample means plot
# Generate 350 sample means, selecting
# from population values
np.random.seed(42)
# Define list of sample means
sample_means = []
for _ in range(350):
# Select 10 from population
sample = np.random.choice(population, 10)
# Calculate sample mean of sample
sample_means.append(describe(sample).mean)
FOUNDATIONS OF PROBABILITY IN PYTHON
Sample means plot (Cont.)
# Draw histogram with labels
plt.xlabel("Sample mean values")
plt.ylabel("Frequency")
plt.title("Sample means histogram")
plt.hist(sample_means)
plt.show()
FOUNDATIONS OF PROBABILITY IN PYTHON
Let's add random
variables
F O U N D AT I O N S O F P R O B A B I L I T Y I N P Y T H O N
Linear regression
F O U N D AT I O N S O F P R O B A B I L I T Y I N P Y T H O N
Alexander A. Ramírez M.
CEO @ Synergy Vision
Linear functions
FOUNDATIONS OF PROBABILITY IN PYTHON
Linear function parameters
y = slope ∗ x + intercept
FOUNDATIONS OF PROBABILITY IN PYTHON
Linear function with random perturbations
y = slope ∗ x + intercept + random_number
FOUNDATIONS OF PROBABILITY IN PYTHON
Start from the data and find a model that fits
FOUNDATIONS OF PROBABILITY IN PYTHON
What model will fit the data?
What would be the criteria to determine which is the best model?
FOUNDATIONS OF PROBABILITY IN PYTHON
What model will fit the data? (Cont.)
FOUNDATIONS OF PROBABILITY IN PYTHON
FOUNDATIONS OF PROBABILITY IN PYTHON
FOUNDATIONS OF PROBABILITY IN PYTHON
Probability and statistics in action
FOUNDATIONS OF PROBABILITY IN PYTHON
Calculating linear model parameters
# Import LinearRegression
from sklearn.linear_model import LinearRegression
# sklearn linear model
model = LinearRegression()
model.fit(hours_of_study, scores)
# Get parameters
slope = model.coef_[0]
intercept = model.intercept_
# Print parameters
print(slope, intercept)
(1.496703900384545, 52.44845266434719)
FOUNDATIONS OF PROBABILITY IN PYTHON
Predicting scores based on hours of study
# Score prediction
score = model.predict(np.array([[15]]))
print(score)
[74.89901117]
FOUNDATIONS OF PROBABILITY IN PYTHON
Plotting the linear model
import matplotlib.pyplot as plt
plt.scatter(hours_of_study, scores)
plt.plot(hours_of_study_values, model.predict(hours_of_study_values))
plt.show()
FOUNDATIONS OF PROBABILITY IN PYTHON
FOUNDATIONS OF PROBABILITY IN PYTHON
Let's practice with
linear models
F O U N D AT I O N S O F P R O B A B I L I T Y I N P Y T H O N
Logistic regression
F O U N D AT I O N S O F P R O B A B I L I T Y I N P Y T H O N
Alexander A. Ramírez M.
CEO @ Synergy Vision
Original data
FOUNDATIONS OF PROBABILITY IN PYTHON
New data
FOUNDATIONS OF PROBABILITY IN PYTHON
Where would you draw the line?
FOUNDATIONS OF PROBABILITY IN PYTHON
Solution based on probability
FOUNDATIONS OF PROBABILITY IN PYTHON
The logistic function
logistic(t) = logistic(slope ∗ x + intercept)
FOUNDATIONS OF PROBABILITY IN PYTHON
Changing the slope
FOUNDATIONS OF PROBABILITY IN PYTHON
Changing the intercept
FOUNDATIONS OF PROBABILITY IN PYTHON
From data to probability
FOUNDATIONS OF PROBABILITY IN PYTHON
Outcomes
FOUNDATIONS OF PROBABILITY IN PYTHON
Misclassifications
FOUNDATIONS OF PROBABILITY IN PYTHON
Logistic regression
# Import LogisticRegression
from sklearn.linear_model import LogisticRegression
# sklearn logistic model
model = LogisticRegression(C=1e9)
model.fit(hours_of_study, outcomes)
# Get parameters
beta1 = model.coef_[0][0]
beta0 = model.intercept_[0]
# Print parameters
print(beta1, beta0)
(1.3406531235010786, -15.05906237996095)
FOUNDATIONS OF PROBABILITY IN PYTHON
Predicting outcomes based on hours of study
hours_of_study_test = [[10]]
outcome = model.predict(hours_of_study_test)
print(outcome)
array([False])
FOUNDATIONS OF PROBABILITY IN PYTHON
Probability calculation
# Put value in an array
value = np.asarray(9).reshape(-1,1)
# Calculate the probability for 9 hours of study
print(model.predict_proba(value)[:,1])
array([0.04773474])
FOUNDATIONS OF PROBABILITY IN PYTHON
Let's practice!
F O U N D AT I O N S O F P R O B A B I L I T Y I N P Y T H O N
Wrapping up
F O U N D AT I O N S O F P R O B A B I L I T Y I N P Y T H O N
Alexander A. Ramírez M.
CEO @ Synergy Vision
Fundamental concepts
FOUNDATIONS OF PROBABILITY IN PYTHON
Important probability distributions
FOUNDATIONS OF PROBABILITY IN PYTHON
The most important results
FOUNDATIONS OF PROBABILITY IN PYTHON
Linear and logistic regression
FOUNDATIONS OF PROBABILITY IN PYTHON
Keep learning at
DataCamp!
F O U N D AT I O N S O F P R O B A B I L I T Y I N P Y T H O N