Lecture 5: Stats & Probability
Population vs Sample
population: all possible values that could've been collected
sample: each singular data point actually collected
rand num gen: pop = range of values that could've been generated, sample = values actually generated
Calculate Stats & Discuss their Meaning
if np.mean & np.median are similar → distribution is not skewed
np.std(name, ddof=1): measurements fall +/- one std away from the mean
range: np.max() - np.min(); if large relative to the mean → possible outliers
scipy.stats.mode: helpful if data = discrete values, unhelpful if data = decimals
scipy.stats.skew: negative = tail to left, positive = tail to right
scipy.stats.kurtosis(name, fisher=False): 3 = normal, <3 = flatter (platykurtic), >3 = peaked (leptokurtic)
Plotting Histogram w/ Correct Bins
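The original histogram code isn't shown; a minimal sketch (dataset and bin count here are hypothetical):
import numpy as np
import matplotlib.pyplot as plt
data = np.random.normal(6.5, 1.0, 500)  # hypothetical dataset
bins = np.linspace(np.min(data), np.max(data), 20)  # bin edges span the full data range
plt.hist(data, bins=bins)
plt.xlabel("value")
plt.ylabel("count")
plt.show()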
Lecture 7: Hypothesis Testing
Central Limit Theorem
Distro of sample mean as sample size increases → approaches normal
Small N: sampling distro resembles original pop distro
Moderate N (~8): distro smooths, clusters toward true pop mean (bell)
Large N (>30): distro approaches normal
Distro of raw data → approaches original pop distro
Drawing Random Samples
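The original sampling code isn't shown; a minimal sketch using NumPy's uniform and normal generators (parameter values are hypothetical):
import numpy as np
N = 100
uniform_sample = np.random.rand(N)  # uniform draws on [0, 1]
normal_sample = np.random.normal(6.5, 1.0, N)  # normal draws, mean 6.5, std 1.0
print(normal_sample.mean(), normal_sample.std(ddof=1))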
Manipulating Random Samples
np.random.rand(N): draws from uniform distro with default interval [0, 1]
0.5 * np.random.rand(N): multiplying by a decimal makes the interval smaller [0, 0.5]
6.0 + np.random.rand(N): adding a number shifts the interval [6, 7]
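The two manipulations combine, e.g. scaling then shifting maps [0, 1] onto [6.0, 6.5]:
import numpy as np
x = 6.0 + 0.5 * np.random.rand(1000)  # uniform on [6.0, 6.5]
print(x.min(), x.max())  # both stay within [6.0, 6.5]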
Calculate Bounds for 99% Confidence Interval:
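The original code isn't shown; a minimal sketch assuming a t-based interval with df = n - 1 (sample values are hypothetical):
import numpy as np
from scipy import stats
data = np.random.normal(6.5, 1.0, 30)  # hypothetical sample
sem = stats.sem(data)  # sigma / sqrt(n), computed with ddof=1
lower, upper = stats.t.interval(0.99, df=len(data) - 1, loc=np.mean(data), scale=sem)
print(lower, upper)  # bounds of the 99% confidence interval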
Occurrence Probability for Theoretical Distros:
Prob that a sample from a norm distro w/ mean 6.5 will be > 5.5:
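The original worked example isn't shown and no std is given; assuming std = 1.0 for illustration, the survival function gives P(X > 5.5):
from scipy import stats
p = stats.norm.sf(5.5, loc=6.5, scale=1.0)  # P(X > 5.5) = 1 - CDF(5.5)
print(p)  # ≈ 0.84 under these assumed parameters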
Performing Hypothesis Test for 2 Samples: comparing 2 slices within a dataset
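A minimal sketch assuming scipy.stats.ttest_ind and two hypothetical slices (morning vs. afternoon pH, invented values):
import numpy as np
from scipy import stats
morning = np.random.normal(8.0, 0.2, 50)  # hypothetical slice 1
afternoon = np.random.normal(8.1, 0.2, 50)  # hypothetical slice 2
t_stat, p_value = stats.ttest_ind(afternoon, morning)
print(t_stat, p_value)  # compare t_stat to the critical value at the chosen sig level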
Sampling Distribution, Sample Size & Number of Samples:
Population distro: total set of measurements
Sampling distro of the sample mean: distro of means collected from diff samples
Number of samples = # of sets of data → increasing will make the distro converge to normal, no effect on mean
Sample size = # of measurements w/in each set → increasing will make the sampling distro narrower & decrease uncertainty of the mean: SEM = sigma/sqrt(n)
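A quick check of the SEM formula: increasing the sample size n shrinks sigma/sqrt(n) (population values are hypothetical):
import numpy as np
from scipy import stats
for n in (10, 100, 1000):
    sample = np.random.normal(0.0, 1.0, n)
    print(n, stats.sem(sample))  # SEM shrinks roughly as 1/sqrt(n)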
Practice Problems:
select data along specific coordinate values → .sel()
ds = xr.open_dataset("path")
average over lon & lat to get a time series:
timeseries = temp.mean(dim=('lon','lat'))
Best way to select data at specific lon & lat:
ds.temperature.sel(lat=34.05, lon=-118.25, method="nearest")
plot time-averaged spatial heatmap using temp variable from ds:
ds.temperature.mean(dim="time").plot()
"The t-stat x > the crit value y at a 90% significance level. At this sig level, we reject the null hypothesis that noon mean pH is similar to or lower than in the morning and adopt the alt hypo that pH is higher in the afternoon."
Lecture 6: Time Series Analysis
Fitting Polynomial Functions to Data:
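The original fitting code isn't shown; a sketch with np.polyfit at several degrees (hypothetical data), previewing the over/underfitting notes below:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 30)
y = np.sin(x) + np.random.normal(0, 0.2, x.size)  # hypothetical noisy data
for deg in (1, 3, 15):  # too low → underfit, too high → overfit
    coeffs = np.polyfit(x, y, deg)
    plt.plot(x, np.polyval(coeffs, x), label=f"deg {deg}")
plt.scatter(x, y, s=10)
plt.legend()
plt.show()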
Overfitting: model too complex & captures noise → poor generalization
to new data.
Underfitting: model too simple & fails to capture true pattern
Linear Interpolation:
easy to implement & no extreme oscillations, use on sparse data points (see the interpolation sketch after the spline notes below)
Spline Interpolation:
Same as linear, but pass kind="cubic" to interp1d (the 3rd line of the code changes)
Use when data has natural continuous variation & a smooth curve is needed
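A minimal sketch covering both the linear and cubic cases above (points are hypothetical):
import numpy as np
from scipy import interpolate as interp
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.8])  # hypothetical sparse points
f_lin = interp.interp1d(x, y)  # linear is the default
f_cub = interp.interp1d(x, y, kind="cubic")  # cubic spline: one-argument change
print(f_lin(2.5), f_cub(2.5))  # interpolated values between known points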
Global Fit & Applying It to a Value:
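The original code isn't shown; a sketch of fitting all the data globally and then evaluating the fit at one value (data and degree are hypothetical):
import numpy as np
x = np.linspace(0, 10, 50)
y = 0.5 * x**2 + np.random.normal(0, 1.0, x.size)  # hypothetical data
coeffs = np.polyfit(x, y, deg=2)  # global 2nd-order fit over all points
print(np.polyval(coeffs, 4.2))  # fitted value applied at x = 4.2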
Extrapolation:
interp.interp1d(x, y, bounds_error=False, fill_value="extrapolate")
passing fill_value="extrapolate" lets the interpolant be evaluated outside the range of the original x values
How Polynomial Functions Fit Data to Curves (LSR):
1 specify functional form (polynomial, exponential, constant)
2 guess initial values for the constants in the function
3 define a squared-error residual metric quantifying the mismatch between observed data & current function values
4 use an algorithm to change coefficient values to minimize the error metric → finds the least-squares solution best fitting the data (see sketch below)
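A minimal sketch of these four steps using scipy.optimize.curve_fit (model and data are hypothetical):
import numpy as np
from scipy.optimize import curve_fit
def model(x, a, b):  # step 1: specify the functional form
    return a * x + b
x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + np.random.normal(0, 1.0, x.size)  # hypothetical observed data
p0 = [1.0, 0.0]  # step 2: initial guesses for the constants
params, cov = curve_fit(model, x, y, p0=p0)  # steps 3-4: minimizes the squared-error residual
print(params)  # least-squares estimates of a, b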
Quality of Functional Fit:
improves when the quantity of data points increases or noise decreases
Higher-order fits have extreme oscillations between data points, even if the data seem perfectly matched by a higher-order fit → default is to choose the SIMPLEST fit matching the data → less prone to high-frequency oscillations
Calculate Correlation Coefficient between Datasets:
measures only linear relationships; >0.7 strong, 0.3-0.7 moderate, <0.3 weak
2 independent datasets can still have strong correlation, indicating they
are impacted by a common 3rd variable
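A minimal sketch with np.corrcoef (hypothetical datasets; scipy.stats.pearsonr also returns a p-value):
import numpy as np
x = np.random.normal(0, 1, 200)  # hypothetical dataset 1
y = 0.8 * x + np.random.normal(0, 0.5, 200)  # hypothetical dataset 2
r = np.corrcoef(x, y)[0, 1]  # off-diagonal entry is the correlation coefficient
print(r)  # > 0.7 here → strong linear correlation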
Lecture 7: Hypothesis Testing Continued
SubPlot Sample Distr of Sample Mean @ Sample Sizes:
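The original figure code isn't shown; a sketch plotting the sampling distro of the sample mean at several sample sizes on subplots (population and sizes are hypothetical):
import numpy as np
import matplotlib.pyplot as plt
sizes = (2, 8, 30)  # small, moderate, large N
fig, axes = plt.subplots(1, len(sizes), figsize=(12, 3))
for ax, n in zip(axes, sizes):
    # 1000 samples of size n from a skewed (exponential) population
    means = np.random.exponential(1.0, size=(1000, n)).mean(axis=1)
    ax.hist(means, bins=30)
    ax.set_title(f"sample size {n}")
plt.show()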
Lecture 8: Multi-Dimensional Data Analysis
Using xarray .plot(), .contour(), etc.
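A minimal sketch assuming the ds.temperature variable from the practice problems (dims time, lat, lon; "path" is the placeholder from the notes):
import xarray as xr
ds = xr.open_dataset("path")
ds.temperature.mean(dim="time").plot()  # 2-D field → heatmap (pcolormesh) by default
ds.temperature.mean(dim="time").plot.contour()  # contour lines instead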
Other
ddof: if population std → ddof=0 (divide by n); if sample std → ddof=1 (divide by n-1)
-matrices in format (#rows, #columns)
Calculating Degrees of Freedom
For confidence interval → dof = n - 1
For 2-sample t-test → dof = n1 + n2 - 2
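E.g. (hypothetical numbers): two samples with n1 = 12 and n2 = 15 → dof = 12 + 15 - 2 = 25.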