Data Science Master Course
Python Specialisation
35,000+ Participants | 3,000+ Trainings | 55+ Countries | 9+ Years
Data Science Master Course - Our Offering
6 Courses
60+ Live Class Hours
3 Capstone Projects
50+ Placement Partners
10+ Industry Experts
100+ Assignment Hours
www.digitalvidya.com
Course Highlights
This Course is for Anyone with Programming Knowledge
Salient Features
3 Hrs/Week Live Instructor-Led Online Sessions
3 Weeks of Project Work
Active Q/A Forum
Class Labs/Home Assignment (10 hours/Week Learning Time)
Govt. of India (Vskills Certified Course)
Placement Support
Individual Attention to Each Learner
Lifetime Access to Updated Content and Videos
Industry and Academia Faculty
Top Python Tools Covered
Internal Competitions with Prizes
Industry’s Top Python Advisors
Industry Relevant Curriculum
Career Mentoring
Hands-on Approach
Money Back Guarantee
Data Science using Python (18 Sessions)
Instructor-Led Online Course
Introduction to Data Science
Getting started with Jupyter Notebook
Introduction to the Open Data Science learning and competitive platforms

Python Programming
Introduction
Operators
Data Types
Loops: while & for
Conditionals: if-else
Functions: defining functions, anonymous functions

Scientific computing with Python
Numerical Python (NumPy)
Array Creation
Data Types
Shape Manipulation
Array Indexing
Broadcasting
Universal Functions
Statistical Methods

Introduction to Pandas
Data Analysis workflow in Python using Pandas Data Structures
Indexing and selecting data
Statistical Operations
Applying Functions
Groupby: split-apply-combine
Handling missing data
Merging multiple datasets

Data Visualization
Simple & multi-line plots, Multiple figures
Simple plot with X and Y axis
Linestyles and color
Multiple lines on same plot
Controlling line properties
Adding Labels, gridlines, annotations
X and Y ticks and rotations
Splines
Legends
Working with Multiple figures and axes
Share X and Y axis
Adding subplots

Matplotlib and Seaborn
Line Graphs
Bar plots
Histograms
Box plot
Stacked plots
Scatter plot
Pie Chart

Statistics
Normal Distribution
Hypothesis testing
Introduction to z-test and t-test
Introduction to Chi-Square distribution
Machine Learning

Simple Linear Regression
Hypothesis testing in Linear regression
Interpreting slope and intercept coefficients
Cost Function in Linear regression
Residuals analysis
Interpreting R-square
Dummy variables encoding

Multiple Linear regression
Multicollinearity issue
Interpreting Adjusted R-square
Outlier detection and treatment
Missing values treatment

Decision Trees
Gini Index
Entropy concept
Classification mechanism
Issues – Overfitting
bias-variance trade-off
Different types of decision trees
Bagging and Boosting concept
Random Forest
Introduction to boosting algorithms

Logistic Regression
Introduction
Sigmoid function
Logistic regression Model Evaluation
Scoring

Evaluation Metrics
Confusion Matrix
Gain and Lift Chart
Concordant – Discordant Ratio
Kolmogorov Smirnov Chart
AUC – ROC Curve

Text Mining

Clustering
K Means Clustering
Elbow method
Hierarchical clustering
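
The Gini Index and Entropy topics listed under Decision Trees can be illustrated with a few lines of plain Python. This is only a minimal sketch, not course material: the function names and the sample labels are hypothetical, and it uses the standard textbook definitions (Gini impurity as 1 minus the sum of squared class proportions, entropy in bits).

```python
import math
from collections import Counter


def gini_index(labels):
    """Gini impurity: 1 - sum(p_k ** 2) over the class proportions p_k."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical class labels at two tree nodes
mixed = ["yes", "yes", "no", "no"]    # evenly split node
pure = ["yes", "yes", "yes", "yes"]   # node with a single class

print(gini_index(mixed), entropy(mixed))  # 0.5 1.0
print(gini_index(pure))                   # 0.0
```

Both measures are highest for the evenly split node and zero for the pure node, which is why a decision tree prefers splits that lower them.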
Introduction to Tableau (3 Sessions)
Instructor-led Online Course
Introduction to BI
Connecting to Data
Getting Started with Data
Managing Extracts
Saving and Publishing Data Sources
Data Prep with Text and Excel Files
Join Types with Union
Cross-database Joins
Data Blending

Dashboards and Stories
Getting Started with Dashboards and Stories
Building a Dashboard
Dashboard Objects
Dashboard Formatting
Dashboard Interactivity Using Actions
Story Points

Visual Analytics
Drill Down and Hierarchies
Sorting
Grouping
Additional Ways to Group
Creating Sets
Working with Sets
Parameters
Formatting
The Formatting Pane
Tooltips
Trend Lines
Reference Lines
Forecasting
Clustering

Calculations
Getting Started with Calculations
Python Programming Foundation
(10 Chapters) Self Study Course

Python – Starting your journey in Programming
Install Jupyter Notebook
Basic Data type and Variables
Basic Math Operators
Comparison Operators

String Manipulations
String datatype
String operations
Date and Time

Compound DataTypes
Set
Tuples
Lists
List Comprehension
Dictionary
Generators & Iterators

Databases
Connection with database
Import data from CSV to database
Create, read, update, delete (CRUD)
Orderby
Groupby
Where
String operations
Joins

Functions
Lambda Expressions
Why Function
Writing simple functions
Advanced functions – map, reduce, filter and zip
Input, Output parameters

Errors, File handling
Syntax Errors
Exceptions
Input/Output
File Handling

Object-Oriented Programming
Introduction to Object Oriented Programming
Creating a Class

Regular Expressions
Introduction to regular expressions
How to write a regular expression
Various operations on strings using re module
re.search vs re.findall

Conditionals and Control Flow
If else
For Loop
Range
Break and continue
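
The "re.search vs re.findall" topic in the Regular Expressions chapter can be previewed with a short sketch. The sample text and the date pattern below are hypothetical and chosen only for illustration; the two calls are from Python's standard `re` module.

```python
import re

# Hypothetical sample text containing two dates
text = "Batch A starts 2021-01-15; batch B starts 2021-02-20."
pattern = r"\d{4}-\d{2}-\d{2}"

# re.search returns a Match object for the first hit, or None if nothing matches
first = re.search(pattern, text)
first_date = first.group() if first else None

# re.findall returns a list with every non-overlapping match
all_dates = re.findall(pattern, text)

print(first_date)  # 2021-01-15
print(all_dates)   # ['2021-01-15', '2021-02-20']
```

In short, `re.search` suits a "does it occur, and where first?" question, while `re.findall` suits collecting all occurrences.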
Statistics Foundation
(17 Chapters) Self Study Course
Data and Statistics
Elements, Variables, and Observations
Scales of Measurement
Categorical and Quantitative Data
Cross-Sectional and Time Series Data
Descriptive Statistics
Statistical Inference

Descriptive Statistics: Tabular and Graphical
Summarizing Categorical Data
Summarizing Quantitative Data
Crosstabulations and Scatter Diagrams

Introduction to Probability

Descriptive Statistics: Numerical Measures
Measures of Location
Measures of Variability
Measures of Distribution Shape, Relative Location, and Detecting Outliers
Box Plot
Measures of Association Between Two Variables

Discrete Probability Distributions
Random Variables
Discrete Probability Distributions
Expected Value and Variance
Binomial Probability Distribution
Poisson Probability Distribution

Continuous Probability Distributions
Uniform Probability Distribution
Normal Curve
Standard Normal Probability Distribution
Computing Probabilities for Any Normal Probability Distribution

Sampling and Sampling Distributions
Uniform Probability Distribution
Normal Curve
Standard Normal Probability Distribution
Computing Probabilities for Any Normal Probability Distribution
Simple random sample and its importance
Difference between descriptive and inferential statistics
Sampling distribution
Mean and standard deviation
Central Limit Theorem and its importance
Mean and standard deviation for the sampling distribution of the sample proportion
Sampling distributions of sample variances

ANOVA
An Introduction to Analysis of Variance
Analysis of Variance: Testing for the Equality of k Population Means
Multiple Comparison Procedures
An Introduction to Experimental Design
Completely Randomized Designs
Randomized Block Design
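
The topic "Computing Probabilities for Any Normal Probability Distribution" listed above comes down to converting a value to a z-score and reading the standard normal curve. The sketch below shows this with `statistics.NormalDist` from Python's standard library; the exam-score figures (mean 70, standard deviation 10) are hypothetical and used only for illustration.

```python
from statistics import NormalDist

# Hypothetical example: exam scores assumed normal with mean 70 and std dev 10
scores = NormalDist(mu=70, sigma=10)

# P(X <= 80) via the standard normal: z = (80 - 70) / 10 = 1
z = (80 - scores.mean) / scores.stdev
p_standard = NormalDist().cdf(z)

# The same probability computed directly on the original distribution
p_direct = scores.cdf(80)

# P(60 <= X <= 80) is the difference of two cumulative probabilities
p_between = scores.cdf(80) - scores.cdf(60)

print(round(p_direct, 4))   # 0.8413
print(round(p_between, 4))  # 0.6827
```

The standardised and direct calculations agree, which is the reason one standard normal table is enough for any normal distribution.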
Interval Estimation
Point estimate and confidence interval estimate
Construct and interpret confidence interval estimate
Form and interpret confidence interval estimate
Confidence Intervals for the Population Mean, μ
Confidence Intervals for the Population Proportion (large samples)

Confidence Interval
Point estimate and confidence interval estimate
Construct and interpret confidence interval estimate
Form and interpret confidence interval estimate
Confidence Intervals for the Population Mean, μ
Confidence Intervals for the Population Proportion (large samples)

Hypothesis Tests
Developing Null and Alternative Hypotheses
Type I and Type II Errors
Population Mean: σ Known
Population Mean: σ Unknown

Inference About Means and Proportions with Two Populations
Inferences About the Difference Between Two Population Means
Inferences About a Population Variance
Inferences About Two Population Variances

Multiple Regression
Multiple Regression Model
Least Squares Method
Multiple Coefficient of Determination
Model Assumptions
Testing for Significance
Categorical Independent Variables
Residual Analysis

Simple Linear Regression
Simple Linear Regression Model
Regression Model and Regression Equation
Estimated Regression Equation
Least Squares Method
Coefficient of Determination
Correlation Coefficient
Model Assumptions
Testing for Significance
Using the Estimated Regression Equation for Estimation and Prediction
Residual Analysis: Validating Model Assumptions
Residual Analysis: Outliers and Influential Observations

Model building in Regression
Regression model-building methodology
Dummy variables for categorical variables with more than two categories
Dummy variables usage in experimental design models
Lagged values of the dependent variable as regressors
Specification bias and multicollinearity
Heteroscedasticity and autocorrelation

Nonparametric Methods
Sign Test
Wilcoxon Signed-Rank Test
Mann-Whitney-Wilcoxon Test
Kruskal-Wallis Test
Rank Correlation

Tests of Goodness of Fit and Independence
Goodness of Fit Test: A Multinomial Population
Test of Independence
SQL Foundation
(6 Chapters) Self Study Course
Database Basics: Concepts and need of a database
What is a database?
What is SQL?
Database Learn Data Modeling
What is Normalization? 1NF, 2NF, 3NF & BCNF

SQL Fundamental: Selecting and Filtering Data
MySQL Installation
MySQL Create Database & MySQL Data Types
MySQL SELECT Statement
MySQL WHERE Clause with AND, OR, IN, NOT IN

SQL Fundamental: Updating Data
MySQL query INSERT INTO Table
MySQL UPDATE & DELETE Query
Sorting in MySQL ORDER BY, DESC and ASC

SQL: Data Aggregation and Functions
MySQL GROUP BY and HAVING Clause
MySQL Wildcards: LIKE, NOT LIKE, Escape, ( % ), ( _ )
MySQL Regular Expressions (REGEXP)
MySQL Functions
MySQL Aggregate Functions: SUM, AVG, MAX, MIN, COUNT, DISTINCT
MySQL IS NULL & IS NOT NULL
MySQL AUTO_INCREMENT
MySQL – ALTER, DROP, RENAME, MODIFY
MySQL LIMIT & OFFSET

SQL: Complex Query Building
MySQL SubQuery
MySQL Joins: INNER, OUTER, LEFT, RIGHT, CROSS
MySQL UNION – Complete

SQL: Query Optimization
Views in MySQL: Create, Join & Drop
MySQL INDEXES – Create, Drop & Add Index
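
The "GROUP BY and HAVING Clause" topic from the Data Aggregation chapter can be tried without installing a database server. The sketch below is an assumption-laden stand-in: it uses Python's built-in `sqlite3` module rather than MySQL, and the `enrolments` table and its rows are hypothetical. The query itself is plain SQL of the kind the chapter covers.

```python
import sqlite3

# In-memory SQLite database used here as a stand-in for MySQL
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrolments (course TEXT, fee INTEGER)")
conn.executemany(
    "INSERT INTO enrolments (course, fee) VALUES (?, ?)",
    [("Python", 100), ("Python", 150), ("SQL", 80), ("R", 120), ("R", 90)],
)

# GROUP BY aggregates per course; HAVING filters the groups after aggregation
rows = conn.execute(
    """
    SELECT course, COUNT(*) AS learners, SUM(fee) AS total_fee
    FROM enrolments
    GROUP BY course
    HAVING COUNT(*) >= 2
    ORDER BY total_fee DESC
    """
).fetchall()
conn.close()

print(rows)  # [('Python', 2, 250), ('R', 2, 210)]
```

WHERE filters individual rows before grouping, whereas HAVING filters whole groups, which is why the single-row SQL course drops out of the result.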
Data Science using R
(15 Chapters) Self Study Course
Introduction to Data Science
Briefing about analytics domain
Business solving day to day problems using data
Technology platforms

Introduction to R programming
The basics of coding on RStudio platform
R nuts and bolts
Basics of R programming
Installing predefined packages
Inputs and R objects (vector, matrix, dataframes and factors)
R datatypes
Using dplyr package
Text manipulations using stringr
Reading data (csv file) in R

Data manipulations and looping in R
Data manipulations Subsetting dataset
Date and time in R
Loops: while & for
Conditionals: if-else
Functions: defining functions, anonymous functions
Apply family of functions

Exploratory analysis in R
Descriptive Statistical analysis
Sampling in R
Merging data
Reshaping data
Central tendencies
Measurements of Dispersion
Test of Normality
Null value treatment
Outlier treatment
Correlation analysis

Visualization
RStudio Visualizations
Categorical data: Barplot, Pie chart
Numeric: boxplot
Histogram
Scatter plot
Line chart
Libraries like ggplot2, RColorBrewer

Interactive dashboard
Shiny for interactive graphical dashboards

Inferential Analysis in R
Parametric Statistical tests
Basics theory of inferential statistics
Hypothesis tests using Z test
T-statistics test
Two sampled z test and T test
ANOVA
Post-hoc test
Non-Parametric Statistical Test
Wilcoxon test
Mann-Whitney U test
K.S. test
Runs Test
Chi-square test

Data Loading and file formats
Loading JSON files
XML and HTML web scraping
Interacting with HTML and web APIs
Interacting with Databases

Text mining/text analytics in R

Machine Learning
What is Machine Learning
Machine Learning Real World Example

Supervised learning techniques
Linear regression
Linear regression assumptions checks
Building Linear Regression model
Case Study- Linear Regression
Logistic Regression
Understanding Logistic Regression
Classification model building using logistic model
Confusion matrix
Random Forest
Decision Tree
Support Vector Machines
Naïve Bayes

Unsupervised learning techniques
Clustering
K-means Clustering
Hierarchical Clustering

Time series analysis
Course Advisors and Instructors
Course Advisors
SHWETA GUPTA
Vice President
Shweta Gupta has 19+ years of Technology Leadership experience. She holds a patent and a number of publications in ACM, IEEE and IBM journals like Redbook and developerWorks.

MANAS GARG
Tech. Architect
Manas Garg heads the Analytics for Marketing at Paypal. He takes Data Driven Decisions for Marketing Success.

VISHAL MISHRA
CEO & Co-Founder
Vishal is a Technology Influencer and CEO of Right Relevance (a platform used by millions for content & influencer discovery).

Course Instructors

GANESH NAIK
Ganesh Naik is the author of several books such as “Learning Linux Shell Scripting”, “Bash Cookbook” and “Mastering Python Scripting for System Administrators”. He is an awesome techie working on various Smart City Projects in India. He has also worked as a corporate trainer for ISRO, Intel, GE, Samsung, Motorola, PSDC (Malaysia), and various companies in Singapore, Malaysia and India.

VAISHALI GARG
Vaishali Garg is a self-taught data analyst with a health-care background. She uses Python with Pandas, NumPy, Matplotlib and Scikit. She has a keen interest in data analysis using Pandas and actively answers Pandas-related questions on StackOverflow (Vaishaligarg, alias: A-Za-z). Some of her analysis is available on Kaggle.

PRITESH SHRIVASTAVA
Pritesh is a Data Science enthusiast with an ability to turn data into actionable insights and meaningful stories. He possesses solid knowledge and hands-on experience of both quantitative and qualitative analysis and data mining. Apart from his profession, he also has a passion and talent for dramatics, travel, storytelling and martial arts.
Capstone Projects (3 Weeks)
Every participant is mandated to solve one Capstone Project for
Certification. The learner is encouraged to solve all available projects to
sharpen the skills across several domains.
Natural Language Processing
Project Description:
This is one of the most applied areas for AI, Data Science, and Machine Learning across domains and industries. The real world is filled with mostly messy text data, and handling text is an important step towards making smarter algorithms. Using IMDB dataset from the movie domain, the learner will apply the most common concepts of NLP.
Key Takeaway:
This project will empower the learners to build intermediate skills in the natural language processing domain. A few of the fundamentals of working with textual data covered in this project are:
Remove stop words
Apply Stemming and Lemmatization
Create a cluster of words
Build a sentiment analysis model and a clustering model

Bank Marketing
Project Description:
The banking industry is working in a very competitive environment and needs to strategize to grow its business. This project is related to the marketing campaigns related to term deposits, making an interesting multi-disciplinary work that mixes both the finance and the marketing domain.
Key Takeaway:
The approach to this project is to think, define, design, code, test and tune your solution, in such a way that you apply all aspects of the data science process. The data is a real-world data with unclean and null values.
Build the model to predict if a customer will
Identify influential factors to form marketing
Improve long-term relationship with the clients
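
As a taste of the "Remove stop words" step listed for the Natural Language Processing project, here is a minimal sketch in plain Python. It is not the project's actual solution: the stop-word list is a deliberately tiny, hypothetical one and the review sentence is made up, whereas real work would normally rely on a fuller list from an NLP library.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only
STOP_WORDS = {"a", "an", "the", "is", "was", "and", "of", "this", "it", "but"}


def remove_stop_words(text):
    """Lower-case the text, split it into word tokens and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


review = "This movie was a delight, and the acting is superb."
print(remove_stop_words(review))  # ['movie', 'delight', 'acting', 'superb']
```

The remaining words carry most of the sentiment, which is why this step usually precedes stemming, lemmatization and model building.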
Healthcare Analysis
Project Description:
Electroencephalography (EEG) is an electrophysiological monitoring method to record the electrical activity of the brain. For this project, we will use the large EEG database at UCI Machine learning repository. This data arises from a large study to examine EEG correlates of genetic predisposition to alcoholism. One fascinating question is whether the patterns are different for an alcoholic and a regular subject.
Key Takeaway:
This capstone project focuses on EEG data analysis, giving an opportunity for students to learn through complexities in dealing with such complex real-world data. The project contains the following exercises:
Parse and store in an easily understandable and readable form
Exploratory data analysis to better understand the data
Using statistical concepts like hypothesis testing
Identify features to predict whether a subject is alcoholic or not
Use machine learning algorithms to develop a suitable classifier

Deep Learning Based Project
Project Description:
E-Commerce has experienced considerable growth since the dawn of the internet as a commercial enterprise. Deep Learning excels at identifying patterns in unstructured data and can predict the class of an uploaded image applied on eCommerce context. This project is an attempt to replicate virtual store assistance through image recognition over an eCommerce Fashion MNIST dataset.
Key Takeaway:
This project focuses on the implementation of Neural Networks to solve complex unstructured data problems. The objective is to:
Build the model to classify the various categories (analytic vertical) of clothing/fashion related images.
Understanding the implementation of deep learning concepts through Tensorflow and Keras.
Model optimization by tuning hyper-parameters and implementing dropout layers.
Duration: 3 Weeks
Price: ₹5000 (Including Tax)
Tools Covered

Language: Python
Python is becoming the first choice for Data Scientists. The learners will be learning to use all the relevant libraries: NumPy, Pandas, scikit-learn, Matplotlib.

Tool: Jupyter Notebook
An open-source web application that contains live code, visualizations and narrative text. Learners will be using this for all their data science work.

Platform: Kaggle
Kaggle is an online community of data scientists and machine learners, owned by Google. Learners will be introduced and mentored to use the platform for practice and competitions.

Language: R
R is a language and environment for statistical computing and graphics. Learners will have the opportunity to build skills for using R for data science.

Tool: RStudio
RStudio provides open source and enterprise-ready professional software for the R environment. Learners will be using this editor for Data Science using R assignments.

Tool: Tableau
Tableau is the analytics platform that disrupted the world of business intelligence. Learners will be introduced to this tool.

Placement Services
We partner with 10+ organizations who directly source their Data Science manpower needs from us. From resume creation to helping you crack the final interview, our dedicated placement team is always on its toes to connect talent with the right opportunity.

The Placement Process
The candidate's resume is refined and polished as per Market Standards to help them be searchable.
The resume is shared with relevant organisations by our placement team.
The candidates are prepared for an initial quiz and a coding test.
What Makes us Proud?
“Good to see Digital Vidya becoming increasingly more involved in covering data science vertical, look forward to collaborate with DV to help shape this industry.”
- Naresh Mehta
AVP – Data Science & Analytics

“Yes, I like the huge investment Digital Vidya is doing to create the next generation of talent. Initial feedback suggests Digital Vidya produces high-quality Data Analysts.”
- Ajay Ohri
Data Scientist

“I can see a good course structure and well-designed syllabus for those who are passionate enough to enter into the analytics world. The platform helps people grow professionally and in very less time.”
- Madhu Vadlamani
Lead Analytics
rthis Speak
“I was looking for customized content and I found the same in Digital Vidya. Content is structured and well planned. Classes were very interactive and trainer’s presentation skills were very good. People who are new to the subject can also understand clearly. Thank you so much!”
- Vani Ananthamurthy
(Business Operations Senior Analyst, Accenture)

“This course gets you started from very basics, makes you think and solve the assignments, and suddenly you find yourself doing Data Science all by yourself!”
- Nanddeep Nasnodkar
(Sr. Software Developer - Remote Software Solutions)
Interested? Contact Us!
+91-84680-02880
info@digitalvidya.com
www.digitalvidya.com

Duration: 18 Weeks
Fee: Rs. 34,900 + GST
Batch Options: Weekend