Fundamentals of Data Science


Statistics and Probability in Data Science

Presentation Summary
Introduction

▶ Statistics and probability are foundational to data science
▶ Essential for AI, machine learning, and deep learning
▶ Mathematics is embedded in every aspect of our lives
Data Types

▶ Qualitative Data
  ▶ Nominal: Categories with no inherent order (e.g., gender, race)
  ▶ Ordinal: Categories with a meaningful order (e.g., ratings)
▶ Quantitative Data
  ▶ Discrete: Countable values (e.g., number of students)
  ▶ Continuous: Any value within a range (e.g., weight)
Variable Types

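▶ Discrete variables (countable or categorical values)
▶ Continuous variables (measured on a continuous scale)
▶ Independent variables (inputs that are varied or observed)
▶ Dependent variables (outcomes that respond to the independent variables)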
Statistics Overview

▶ Definition: Applied mathematics for data collection, analysis, interpretation, and presentation
▶ Types:
  ▶ Descriptive Statistics: Summarizes the features of a data set
  ▶ Inferential Statistics: Draws conclusions about a population from a sample
▶ Key concepts: Population (the full group of interest) and Sample (the subset actually observed)
Sampling Techniques

▶ Probability Sampling
  ▶ Random sampling
  ▶ Systematic sampling
  ▶ Stratified sampling
  ▶ Cluster sampling
▶ Non-probability Sampling
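
The four probability sampling techniques can be sketched as follows. This is a minimal illustration, not a reference implementation: the `students` population, the `group` field, and all function names are hypothetical and chosen only for the example.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Random sampling: every member has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, step, start=0):
    """Systematic sampling: take every `step`-th member from `start`."""
    return population[start::step]

def stratified_sample(population, key, fraction, seed=0):
    """Stratified sampling: sample the same fraction from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

def cluster_sample(population, key, n_clusters, seed=0):
    """Cluster sampling: pick whole clusters at random and keep all their members."""
    rng = random.Random(seed)
    clusters = {}
    for item in population:
        clusters.setdefault(key(item), []).append(item)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for c in chosen for item in clusters[c]]

# Hypothetical population: 100 students split into 4 groups of 25.
students = [{"id": i, "group": i % 4} for i in range(100)]

random_pick = simple_random_sample(students, 10)
every_tenth = systematic_sample(students, step=10)
by_group = stratified_sample(students, key=lambda s: s["group"], fraction=0.2)
two_groups = cluster_sample(students, key=lambda s: s["group"], n_clusters=2)
```

Stratified sampling draws a few members from every group, while cluster sampling keeps every member of a few randomly chosen groups.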
Information Gain and Entropy

▶ Entropy: Measure of uncertainty (impurity) in data
▶ Information Gain: How much knowing a feature reduces uncertainty about the final outcome
▶ Used in decision trees and random forests
▶ Example: Predicting whether a match can be played based on weather conditions
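
As a rough sketch of the weather example, the code below computes entropy and information gain for a single feature. The 14-row toy data set, the `outlook` and `play` field names, and the function names are assumptions made for illustration; they do not come from the slides.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, feature, target):
    """Entropy of the target minus the weighted entropy after splitting on `feature`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Hypothetical toy data: 9 "yes" and 5 "no" outcomes across three outlooks.
rows = (
    [{"outlook": "sunny", "play": "yes"}] * 2
    + [{"outlook": "sunny", "play": "no"}] * 3
    + [{"outlook": "overcast", "play": "yes"}] * 4
    + [{"outlook": "rainy", "play": "yes"}] * 3
    + [{"outlook": "rainy", "play": "no"}] * 2
)

base_entropy = entropy([r["play"] for r in rows])       # about 0.940 bits
outlook_gain = information_gain(rows, "outlook", "play")  # about 0.247 bits
```

A decision tree would compute this gain for each candidate feature and split on the one with the highest value.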
Probability Theory

▶ Probability: Measure of how likely an event is to occur, from 0 (impossible) to 1 (certain)
▶ Key concepts:
  ▶ Random experiment
  ▶ Sample space
  ▶ Event
▶ Types of events: Disjoint (cannot occur together) and Non-disjoint (can occur together)
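
These terms can be made concrete with a single roll of a fair die, used here as an assumed example of a random experiment. The event names are hypothetical.

```python
from fractions import Fraction

# Random experiment: roll one fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

def probability(event):
    """P(event) when every outcome in the sample space is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}
odd = {1, 3, 5}
greater_than_three = {4, 5, 6}

# Disjoint events share no outcomes, so P(A or B) = P(A) + P(B).
p_even_or_odd = probability(even) + probability(odd)

# Non-disjoint events overlap, so the shared outcomes are subtracted once:
# P(A or B) = P(A) + P(B) - P(A and B).
p_even_or_high = (
    probability(even)
    + probability(greater_than_three)
    - probability(even & greater_than_three)
)
```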
Types of Probability

▶ Marginal Probability: Probability of an event, unconditional on any other event
▶ Joint Probability: Probability of two events occurring together
▶ Conditional Probability: Probability of an event given that another event has occurred
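
The three types can be compared on one small experiment. The sketch below assumes two fair dice and two illustrative events, A (the sum is 8) and B (the first die is even); neither comes from the slides.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(predicate):
    """Fraction of outcomes that satisfy the predicate."""
    return Fraction(sum(1 for o in outcomes if predicate(o)), len(outcomes))

def sum_is_eight(o):
    return o[0] + o[1] == 8

def first_is_even(o):
    return o[0] % 2 == 0

marginal = prob(sum_is_eight)                                # P(A)       = 5/36
joint = prob(lambda o: sum_is_eight(o) and first_is_even(o))  # P(A and B) = 1/12
conditional = joint / prob(first_is_even)                    # P(A | B)   = 1/6
```

The conditional probability is the joint probability rescaled by P(B), which is why it differs from the marginal here.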
Probability Distribution Functions

▶ Probability Density Function (PDF)
▶ Normal Distribution
▶ Central Limit Theorem
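
The Central Limit Theorem says that means of sufficiently large independent samples are approximately normally distributed, centered on the population mean with spread close to sigma / sqrt(n), even when the underlying distribution is not normal. A small simulation can illustrate this. The exponential source distribution, the sample sizes, and the seed are arbitrary assumptions for the sketch.

```python
import random
import statistics

def sample_means(draw, sample_size, n_samples, seed=0):
    """Draw `n_samples` samples of `sample_size` values and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([draw(rng) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

# Exponential distribution with rate 1: skewed, with mean 1 and standard deviation 1.
means = sample_means(lambda rng: rng.expovariate(1.0), sample_size=50, n_samples=2000)

center = statistics.fmean(means)   # should sit near the population mean, 1
spread = statistics.stdev(means)   # should sit near 1 / sqrt(50), roughly 0.14
```

Plotting a histogram of `means` would show a roughly bell-shaped curve even though individual draws are strongly skewed.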
Bayes’ Theorem

▶ Shows relation between conditional probability and its inverse
▶ Formula: $P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$
▶ Used in naive Bayes algorithm (e.g., spam filtering)
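
A short worked example of the formula in a spam-filtering setting. All probabilities below are made-up illustrative numbers, and the function name is hypothetical.

```python
def bayes_posterior(prior, likelihood, likelihood_given_not):
    """P(A|B) from P(A), P(B|A), and P(B|not A).

    P(B) is expanded with the law of total probability:
    P(B) = P(B|A) P(A) + P(B|not A) (1 - P(A)).
    """
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence

# Assumed numbers: 20% of mail is spam; a given word appears in 60% of spam
# messages and in 5% of legitimate messages.
p_spam_given_word = bayes_posterior(
    prior=0.2, likelihood=0.6, likelihood_given_not=0.05
)
# 0.12 / (0.12 + 0.04) = 0.75
```

A naive Bayes classifier applies the same update to many words at once, under the simplifying assumption that the words are conditionally independent given the class.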
Inferential Statistics

▶ Forms inferences and predictions about a population based on a sample
▶ Point Estimation vs. Interval Estimation
▶ Confidence Interval and Margin of Error
▶ Methods of Estimation:
  ▶ Method of Moments
  ▶ Maximum Likelihood
  ▶ Bayes Estimator
  ▶ Best Unbiased Estimator
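
The difference between a point estimate and an interval estimate can be sketched for a sample mean. The code uses the normal approximation (margin of error = z × s / √n); for a sample this small, a t-distribution critical value would usually be more appropriate. The `weights` data and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Return (point_estimate, margin_of_error) for the population mean.

    Uses the normal approximation: margin = z * s / sqrt(n).
    """
    n = len(sample)
    point_estimate = mean(sample)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin_of_error = z * stdev(sample) / sqrt(n)
    return point_estimate, margin_of_error

# Hypothetical sample of ten weights in kilograms.
weights = [68.2, 71.5, 69.8, 70.1, 72.3, 67.9, 70.6, 69.4, 71.0, 70.2]

estimate, margin = mean_confidence_interval(weights)
interval = (estimate - margin, estimate + margin)
```

The point estimate is the single value `estimate`; the interval estimate is the range `interval`, whose half-width is the margin of error.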
Conclusion

▶ Statistics and probability are crucial for data science
▶ Understanding these concepts helps in:
  ▶ Data analysis
  ▶ Machine learning model development
  ▶ Interpreting results
  ▶ Making informed decisions
