Data Science Course Agenda

Data Science

Course Agenda

Statistical Analytics

• Data types and their measures
• Random Variable, its applications and exercises
• Probability – Applications with examples
• Probability distribution with examples
• Sampling Funnel – why and how
• Measures of Central Tendency
    • Mean
    • Median
    • Mode
• Measures of dispersion
    • Variance
    • Standard Deviation
    • Range – its derivation
• Measures of Skewness and Kurtosis – Graphical representation and application
• Various graphical representations of data for analysis
    • Bar Chart
    • Histogram
    • Box Plot
    • Scatter Plot
• Continuous Probability distribution
    • Standard Normal distribution / Z distribution
    • F-distribution
    • Student's t distribution
    • Chi-square distribution

• Discrete probability distribution
    • Binomial distribution
    • Negative Binomial distribution
    • Poisson distribution
• Computing probability from Normal Distribution
• Building Normal Q-Q plots & its interpretation
• Central Limit Theorem for sampling variations
• Confidence Interval – Computation and analysis

Hypothesis Testing – what & how

• Formulating a hypothesis statement
• Parametric tests
    • 1 sample, 2 sample test
    • 1 sample Z test
    • 1 Proportion, 2 Proportion test
    • Paired t test
    • One-way ANOVA
    • Chi-Square test
• Nonparametric tests
    • 1 sample sign test
    • Mann-Whitney test
    • Kruskal-Wallis test
    • Mood’s Median test

Regression Analysis

• Measure of correlation coefficient and its analysis
• Regression model using “Ordinary Least Squares”
• Coefficient of determination as a measure of model strength
• Prediction interval and Confidence interval
• Prerequisites to Regression
• Regression techniques
    • Linear Regression
        • Simple
        • Multiple
    • Logistic Regression
        • Simple
        • Multiple

• Advanced Regression
    • Negative Binomial
    • Poisson
    • Zero-Inflated
    • Hurdle
    • LOESS
    • Polynomial
• Logit and Probit analysis
• Model building using regression
• Measures of accuracy
• Model improvement techniques
• Analysis of regression output with case studies

• Imputation Techniques
    • Listwise, Pairwise Deletion
    • Mean / Mode Substitution
    • Regression Imputation
    • Hot Deck, KNN Imputation
• Survival Analysis
    • Time to event data
    • Prediction techniques

Data Mining / Machine Learning

• Supervised vs Unsupervised
• Basic Matrix Algebra
• Data Mining – Unsupervised
• Clustering – its applications and limitations
    • Hierarchical
    • Non-hierarchical (K-Means)
• Network Analysis
    • Measures of strength
    • Introduction to webpage ranking
• Affinity Analysis / Association Rules
    • Measures of association: Support, Confidence, Lift Ratio
• Sequential pattern mining
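
The three association measures listed above (Support, Confidence, Lift Ratio) can be sketched in a few lines of Python. This is a minimal illustration and not course material: the function name `rule_measures` and the toy baskets are hypothetical, and it assumes the antecedent and the consequent each appear in at least one transaction.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Assumes both itemsets occur in at least one transaction,
    otherwise the divisions below would fail.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    # Count the transactions that contain each itemset.
    n_a = sum(1 for t in transactions if antecedent <= set(t))
    n_c = sum(1 for t in transactions if consequent <= set(t))
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_ac / n              # P(A and C)
    confidence = n_ac / n_a         # P(C | A)
    lift = confidence / (n_c / n)   # P(C | A) / P(C)
    return support, confidence, lift


# Hypothetical toy data: four shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence, lift = rule_measures(baskets, {"bread"}, {"milk"})
```

A lift ratio above 1 suggests the two itemsets occur together more often than independence would predict; a value below 1 suggests the opposite.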

• Recommender Systems
    • Methods and tricks of the trade
• Dimension Reduction Techniques
    • Principal Component Analysis
    • Singular Value Decomposition
• Data Mining – Supervised
• Black Box demystified
    • Neural Networks
    • Support Vector Machines
• Classification / Pattern mining
    • K Nearest Neighbor
    • Naïve Bayes

• Decision Tree & Random Forest
    • Decision Tree C5.0
• Ensemble Techniques
    • Boosting
    • Bagging
    • Gradient Boosting & Extreme Gradient Boosting

Text Mining & Natural Language Processing

• Text extraction from webpage
• Word clouds – analysis with context
• Negative and Positive words
• NLP
• Latent Dirichlet Allocation (LDA)
• Structured Extraction
• Emotion Mining

Forecasting

• Strategy for Forecasting
• Analysis by Graphical Representation
• Components in a time series data
• Plots of Time series data
• Autocorrelation function / Correlogram
• Visualizations – How to perform
• Methods of Forecast
• Naïve methods
• Simple and Moving Average
• Model driven
• Regression Model – Linear, Exponential, Quadratic
• Econometric models
• Seasonality factored model
• Autoregressive model
• Random walk
• Data Driven
• Smoothing
• Exponential Smoothing
• Advanced Exponential Smoothing
i. Holt’s Method
ii. Winters’ Method
• AR, MA, ARIMA models
• Analysis of errors in forecast
• Skewness of Error

• Types of error measure
    • Mean Error (ME)
    • Mean Absolute Deviation (MAD)
    • Mean Squared Error (MSE)
    • Root Mean Squared Error (RMSE)
    • Mean Percentage Error (MPE)
    • Mean Absolute Percentage Error (MAPE)
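
As a rough illustration of the six error measures listed above, the sketch below computes them for a pair of equal-length series. It is not taken from the course: the function name `forecast_errors` and the sample numbers are hypothetical, the error is assumed to be defined as actual minus forecast (some texts use the opposite sign), MAD is taken to mean the mean absolute forecast error, and the percentage measures assume no actual value is zero.

```python
import math


def forecast_errors(actual, forecast):
    """Return common forecast error measures as a dict.

    Assumes error = actual - forecast and that no actual value is zero
    (needed for the percentage-based measures).
    """
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "ME": sum(errors) / n,                    # signed average, shows bias
        "MAD": sum(abs(e) for e in errors) / n,   # average absolute error
        "MSE": mse,                               # penalises large errors
        "RMSE": math.sqrt(mse),                   # same units as the data
        "MPE": 100 * sum(e / a for e, a in zip(errors, actual)) / n,
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


# Hypothetical sample values.
measures = forecast_errors([100, 200, 400], [110, 190, 400])
```

ME and MPE keep the sign of the errors, so positive and negative errors can cancel out; MAD, MSE, RMSE and MAPE do not cancel and therefore reflect the overall size of the errors.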

Data Visualization

• 3 important principles of visualization
• Lie Factor
• Using consistent scales
• Presenting data in the context
• Data-ink Ratio
• Tufte’s Graphical Integrity Rules
• Tufte’s Principles for Analytical Design
• Various kinds of chartjunk & how to avoid them
• Dashboards – Good, Bad & Ugly
• Affordance Theory

Tableau

• Introduction to the various file types
• How to access help
• Quick Introduction to the user interface in Tableau
• How to connect to the data sources
• How to join the various data sources
• How to create data visualization using Tableau feature “Show Me”
• Reorder & remove visualization fields
• How to sort & filter data
• How to create a calculated field
• How to perform operations using cross-tab

• Working with workbook data & worksheets
• How to create a packaged workbook
• Creating various charts
• Creating maps & setting map options
• Creating dashboards & working with dashboard

R & RStudio
• Introduction to R
• Working with Packages
• Performing various regression and data mining techniques using RStudio

NodeXL
Introduction to NodeXL and its application in Network Analysis

XLMiner
Using XLMiner for performing various forecasting techniques

Python
Performing various regression and data mining techniques using Python

Minitab
Performing Hypothesis testing using Minitab

Thank You
