Data Science Course Agenda
Course Agenda
Statistical Analytics
• Data types and their measures
• Random Variable, its applications and exercises
• Probability – Applications with examples
• Probability distribution with examples
• Sampling Funnel – why and how
• Measures of Central Tendency
• Mean
• Median
• Mode
• Measures of dispersion
• Variance
• Standard Deviation
• Range – its derivation
• Measures of Skewness and Kurtosis – Graphical representation and application
• Various graphical representations of data for analysis
• Bar Chart
• Histogram
• Box Plot
• Scatter Plot
• Continuous Probability distribution
• Standard Normal distribution / Z distribution
• F distribution
• Student’s t distribution
• Chi-square distribution
• Discrete probability distribution
• Binomial distribution
• Negative Binomial distribution
• Poisson distribution
• Computing probability from Normal Distribution
• Building Normal Q-Q plots & its interpretation
• Central Limit Theorem for sampling variations
• Confidence Interval – Computation and analysis
Hypothesis testing – what & how
• Formulating a hypothesis statement
• Parametric tests
• 1 sample, 2 sample test
• 1 sample Z test
• 1 Proportion, 2 Proportion test
• Paired t test
• One way ANOVA
• Chi-square test
• Nonparametric tests
• 1 sample sign test
• Mann-Whitney test
• Kruskal-Wallis test
• Mood’s Median test
Regression Analysis
• Measure of correlation coefficient and its analysis
• Regression model using “Ordinary Least Squares”
• Coefficient of determination as a strength of a model
• Prediction interval and Confidence interval
• Prerequisites to Regression
• Regression techniques
• Linear Regression
• Simple
• Multiple
• Logistic Regression
• Simple
• Multiple
• Advanced Regression
• Negative Binomial
• Poisson
• Zero-Inflated
• Hurdle
• LOESS
• Polynomial
• Logit and Probit analysis
• Model building using regression
• Measures of accuracy
• Model improvement techniques
• Analysis of regression output with case studies
• Imputation Techniques
• Listwise, Pairwise Deletion
• Mean / Mode Substitution
• Regression Imputation
• Hot Deck, KNN Imputation
• Survival Analysis
• Time to event data
• Prediction techniques
Data Mining / Machine Learning
• Supervised vs Unsupervised
• Basic Matrix Algebra
• Data Mining Unsupervised
• Clustering – its applications and limitations
• Hierarchical
• Non-Hierarchical (K-Means)
• Network Analysis
• Measures of strength
• Introduction to Webpage ranking
• Affinity Analysis / Association Rules
• Measures of association: Support, Confidence, Lift Ratio
• Sequential pattern mining
• Recommender Systems
• Methods and tricks of the trade
• Dimension Reduction Techniques
• Principal Component Analysis
• Singular Value Decomposition
• Data Mining – Supervised
• Black Box demystified
• Neural Networks
• Support Vector Machines
• Classification / Pattern mining
• K Nearest Neighbor
• Naïve Bayes
• Decision Tree & Random Forest
• Decision Tree C5.0
• Ensemble Techniques
• Boosting
• Bagging
• Gradient Boosting & Extreme Gradient Boosting
Text Mining & Natural Language Processing
• Text extraction from webpage
• Word clouds – analysis with context
• Negative and Positive words
• NLP
• Latent Dirichlet Allocation (LDA)
• Structured Extraction
• Emotion Mining
Forecasting
• Strategy for Forecasting
• Analysis by Graphical Representation
• Components in a time series data
• Plots of Time series data
• Autocorrelation function / Correlogram
• Visualizations – How to perform
• Methods of Forecast
• Naïve methods
• Simple and Moving Average
• Model driven
• Regression Model – Linear, Exponential, Quadratic
• Econometric models
• Seasonality factored model
• Autoregressive model
• Random walk
• Data Driven
• Smoothing
• Exponential Smoothing
• Advanced Exponential Smoothing
i. Holt’s Method
ii. Winters’ Method
• AR, MA, ARIMA models
• Analysis of errors in forecast
• Skewness of Error
• Types of error measure
• Mean Error (ME)
• Mean Absolute Deviation (MAD)
• Mean Squared Error (MSE)
• Root Mean Squared Error (RMSE)
• Mean Percentage Error (MPE)
• Mean Absolute Percentage Error (MAPE)
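As a rough illustration of the error measures listed above, the following Python sketch computes each one from a list of actual values and a list of forecasts. The function name forecast_errors and the sample numbers are made up for illustration, and the sign convention (error = actual minus forecast) is an assumption; some texts use the opposite sign.

```python
import math


def forecast_errors(actual, forecast):
    """Return common forecast error measures for two equal-length sequences.

    Hypothetical helper for illustration; error is defined as actual - forecast.
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and of equal length")

    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n

    return {
        "ME": sum(errors) / n,                    # Mean Error: signed, shows bias
        "MAD": sum(abs(e) for e in errors) / n,   # Mean Absolute Deviation
        "MSE": mse,                               # Mean Squared Error
        "RMSE": math.sqrt(mse),                   # Root Mean Squared Error
        # Percentage measures divide by the actual value,
        # so they are undefined when any actual value is zero.
        "MPE": 100 * sum(e / a for e, a in zip(errors, actual)) / n,
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


# Made-up sample data, not taken from the course material.
measures = forecast_errors([100, 200, 400], [110, 190, 400])
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

In this sample the signed errors cancel, so ME is zero even though the forecasts are not perfect; this is why the absolute and squared measures are usually reported alongside it.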
Data Visualization
• 3 important principles of visualization
• Lie Factor
• Using consistent scales
• Presenting data in context
• Data-ink Ratio
• Tufte’s Graphical Integrity Rules
• Tufte’s Principles for Analytical Design
• Various kinds of chart junk & how to avoid them
• Dashboards – Good, Bad & Ugly
• Affordance Theory
Tableau
• Introduction to the various file types
• How to access help
• Quick Introduction to the user interface in Tableau
• How to connect to the data sources
• How to join the various data sources
• How to create data visualizations using the Tableau feature “Show Me”
• Reorder & remove visualization fields
• How to sort & filter data
• How to create a calculated field
• How to perform operations using cross-tab
• Working with workbook data & worksheets
• How to create a packaged workbook
• Creating various charts
• Creating maps & setting map options
• Creating dashboards & working with dashboards
R & R Studio
• Introduction to R
• Working with Packages
• Performing various regression and data mining techniques using R Studio
NodeXL
Introduction to NodeXL and its application in Network Analysis
XLMiner
Using XLMiner for performing various forecasting techniques
Python
Performing various regression and data mining techniques using Python
Minitab
Performing Hypothesis testing using Minitab
Thank You