AI & ML Syllabus
OBJECTIVES
• Acquire advanced data analysis skills.
• Stay industry-relevant and grow in your career.
• Create AI/ML solutions for various business problems.
• Build and deploy production-grade AI/ML applications.
• Apply AI/ML methods, techniques and tools.
COVERAGE
Descriptive Statistics
• Data exploration (histograms, bar chart, box plot, line graph, scatter plot)
• Qualitative and Quantitative Data
• Measures of Central Tendency (Mean, Median and Mode)
• Measures of Position (Quartiles, Deciles, Percentiles and Quantiles)
• Measures of Dispersion (Range, Median absolute deviation about the median, Variance and Standard deviation), Anscombe's quartet
• Other Measures: Quartile and Percentile, Interquartile Range
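As an illustration of the measures listed above, here is a minimal Python sketch using only the standard library. The sample values are made up for the example, and the "inclusive" quantile method is just one of several conventions in use.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 11]

# Measures of central tendency.
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of position: the three quartile cut points.
# method="inclusive" is an assumption; other conventions give slightly different cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range

# Measures of dispersion.
value_range = max(data) - min(data)
mad = statistics.median(abs(x - median) for x in data)  # median absolute deviation about the median
variance = statistics.variance(data)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(data)      # sample standard deviation

print(mean, median, mode, q1, q3, iqr, value_range, mad, variance, std_dev)
```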
Probability
• Probability (Joint, marginal and conditional probabilities)
• Probability distributions (Continuous and Discrete)
• Density Functions and Cumulative functions
This module is foundational for data scientists. It requires a nontrivial understanding of real-world problems
and involves judgments such as those about the relevance and representativeness of the data. The module helps
participants gain a good understanding of methods, methodologies and techniques, starting from the basics of
statistics and probability, so that they can obtain supporting evidence through data and isolate or identify
factors to construct models that can uncover relationships and variation in processes.
• Science of Visualization
• Visualization Periodic Table
• Aesthetics and Storytelling
• Concepts of measurement - scales of measurement
• Design of data collection formats with illustration
• Principles of data visualization - different methods of presenting data in business analytics
• Concepts of Size, Shape, Color
• Various Visualization types
• Bubble charts
• Geo-maps (Choropleths)
• Gauge charts
• Tree map
• Heat map
• Motion charts
• Force Directed Charts, etc.
Unit 10: Sampling and Estimation
• Sample versus population
• Sampling techniques (simple, stratified, clustered, random)
• Sampling Distributions
• Parameter Estimation
• Unbalanced data treatment
Unit 11: Inferential Statistics
• Develop an intuition for how to understand the data, attributes, distributions
• Procedure for statistical testing, etc.
• Test of Hypothesis (Concept of Hypothesis testing, Null Hypothesis and Alternative Hypothesis)
• Cross Tabulations (Contingency tables and their use, Chi-Square test, Fisher's exact test)
• One Sample t test (Concept, Assumptions, Hypothesis, Verification of assumptions, Performing the test and interpretation of results)
• Independent Samples t test
• Paired Samples t test
• One way ANOVA (Post hoc tests: Fisher's LSD, Tukey's HSD)
• z-test and F-test
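The one-sample t test named in the inferential statistics topics can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the sample values, the hypothesized mean `mu0`, and the helper name `one_sample_t` are made up for the example, and the critical value is read from a standard t table rather than computed.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return the t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)      # sample standard deviation (n - 1 denominator)
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1

# Hypothetical measurements, chosen only for illustration.
sample = [12, 14, 15, 13, 16, 14, 15, 13]
t_stat, df = one_sample_t(sample, mu0=13)

# Two-tailed 5% critical value for df = 7, taken from a standard t table.
T_CRIT = 2.365
reject_null = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_null}")
```

In this made-up example the statistic falls below the critical value, so the null hypothesis is not rejected at the 5% level. The test also assumes approximately normal data, which would need to be checked separately.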
Predictive analytics is an area of statistics that deals with extracting information from data and using it to
predict trends and behavior patterns. Predicting an outcome, predicting counts, predicting a value - all these
have innumerable use cases in CRM, Fraud detection, Portfolio Management, Sales and Marketing. Predictive
Analytics is approached from Regression (glm) and Time Series models in this module.
K-Nearest Neighbors
• Computational geometry; Voronoi Diagrams; Delaunay Triangulations
• K-Nearest Neighbor algorithm; Wilson editing and triangulations
• Aspects to consider while designing K-Nearest Neighbor
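A minimal sketch of the K-Nearest Neighbor idea, assuming Euclidean distance and simple majority voting, might look like this in Python. The function name `knn_predict` and the training points are hypothetical. Typical design aspects include the choice of k, the distance metric, and feature scaling.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs; distance is Euclidean (an assumption).
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical two-class training set, chosen only for illustration.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 5.5), "B"), ((6.5, 7.0), "B"),
]

print(knn_predict(train, (1.2, 1.4)))
print(knn_predict(train, (6.4, 6.1)))
```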
Decision Trees
• ID4, C4.5, CART
Ensemble methods
• Bagging & boosting and their impact on bias and variance
• C5.0 boosting
• Random forest
• Gradient Boosting Machines and XGBoost
Case study 3: Sentiment Analysis or Topic Mining from New York Times
• Similarity measures (Cosine Similarity, Chi-Square, N Grams)
• Part-of-Speech Tagging
• Stemming and Chunking
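To illustrate two of the similarity measures named in this case study, the sketch below computes cosine similarity over word n-gram counts in plain Python. The helper names `word_ngrams` and `cosine_similarity`, the whitespace tokenization, and the example sentences are assumptions made for the example, not part of the case study material.

```python
import math
from collections import Counter

def word_ngrams(text, n=1):
    """Count word n-grams using naive lowercase whitespace tokenization (an assumption)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors stored as Counters."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical headlines, chosen only for illustration.
doc1 = "stocks rose sharply today"
doc2 = "stocks fell sharply today"

unigram_sim = cosine_similarity(word_ngrams(doc1, 1), word_ngrams(doc2, 1))
bigram_sim = cosine_similarity(word_ngrams(doc1, 2), word_ngrams(doc2, 2))
print(unigram_sim, bigram_sim)
```

Bigrams are stricter than single words here because they only match when adjacent word pairs agree, so the two headlines score lower on bigram similarity than on unigram similarity.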