Data Science
Introduction to Data Science
Copyright IntelliPaat. All rights reserved
Agenda
01 Need of Data Science
02 Life Cycle of Data Science
03 Applications of Data Science
04 Introduction to R
05 Introduction to R-Studio
06 Data Types & Operators in R
Data
What is Data?
Well, it’s just a collection of facts!
Examples: -0.879; “My name is Sam”; 348; 0 1 1 0; A = πr²
Data Back Then
Small Sized
Structured
Single Format
Data Today
Unstructured
Multiple Formats
Humongous Sized
Need of Data Science
Now, what should I do with all of this huge unstructured data?
The need to understand and analyze data to make better decisions is what gave birth to Data
Science.
Understand the data
Find interesting insights
Make informed decisions
What is Data Science?
Applying science on data to make the data talk to us
Data + Science = Data Science
Data Science is an umbrella term which encompasses multiple domains.
Data Visualization
Data Manipulation
Statistical Analysis
Machine Learning
Types of Data Analytics
Descriptive: Comprehensive, accurate and effective visualization
Diagnostic: Ability to drill down to the root cause
Predictive: Historical patterns being used to predict specific outcomes using algorithms
Prescriptive: Applying advanced analytical algorithms to make specific recommendations and strategies
Life Cycle of Data Science
1 Data Acquisition
2 Data Preprocessing
3 Model Building
4 Pattern Evaluation
5 Knowledge Representation
Data Acquisition
Data comes from multiple sources and is present in multiple formats. This data has to be integrated and
stored in one single location.
Data from multiple sources → Data Warehouse → Target Data
Data Preprocessing
Once data acquisition is done, the raw data has to be processed to bring it to the right format.
Summarize, Transform, Normalize, Aggregate
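As a rough illustration of these four preprocessing steps, here is a minimal sketch in base R. The `raw` data frame and its column names are made up for this example and are not part of the course dataset.

```r
# Hypothetical raw data: monthly charges for customers on two plans
raw <- data.frame(
  plan    = c("basic", "basic", "premium", "premium"),
  charges = c(20, 40, 80, 100)
)

# Summarize: basic descriptive statistics of one column
print(summary(raw$charges))

# Transform: derive a new column on a log scale
raw$log_charges <- log(raw$charges)

# Normalize: rescale charges to the range 0..1 (min-max scaling)
rng <- range(raw$charges)
raw$charges_scaled <- (raw$charges - rng[1]) / (rng[2] - rng[1])

# Aggregate: mean charges per plan
per_plan <- aggregate(charges ~ plan, data = raw, FUN = mean)
print(per_plan)
```

Min-max scaling is only one possible way to normalize; standardizing with `scale()` is another common choice.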
Model Building
Model building is the process where we apply different scientific algorithms to find interesting insights
from the data.
Linear Regression, K-Means, Random Forest
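To make model building concrete, the sketch below fits one of the listed algorithms, linear regression, with base R's `lm()`. It uses the built-in `mtcars` dataset as a stand-in, since the slides do not show any data at this point; the choice of columns is an assumption made for illustration.

```r
# mtcars ships with R: mpg = miles per gallon, wt = weight in 1000 lbs
fit <- lm(mpg ~ wt, data = mtcars)

# Estimated intercept and slope
print(coef(fit))

# Predict mpg for a hypothetical car weighing 3000 lbs
new_car <- data.frame(wt = 3)
pred <- predict(fit, newdata = new_car)
print(pred)
```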
Pattern Evaluation
The model gives us some patterns or information. These patterns have to be evaluated: we check whether
the obtained information is new, correct and useful.
Model → Pattern Evaluation
Knowledge Representation
Once the information is validated, it can be represented with simple aesthetic graphs.
Application of Data Science in Different Industries
Application of Data Science in Telecom
Analytical Customer Relationship Management (ACRM)
Fraud Reduction
Bad Debt Reduction
Price Optimization
Application of Data Science in Banking
Acquire and Retain Customers
Detect Fraud
Improve Risk Control
Optimize Product and Portfolio Model
Application of Data Science in E-commerce
Enhance Customer Engagement
Customize Offers and Promotions
Maintain Effective Supply Chain Management
Improve User Experience
Introduction to R
R is a language for data analysis and statistical computing.
R is also a tool for data visualization.
R is open-source, cross-platform software.
R is a Turing complete language.
Installing R
You can install R from https://cran.r-project.org/
R-Studio
R-Studio is a set of integrated tools designed to help you be more productive with R. It includes a console,
syntax-highlighting editor that supports direct code execution and a variety of robust tools for plotting, viewing
history, debugging and managing your workspace.
Setting Working Directory
Change your working directory with the setwd() function, such as:
setwd("~/mydirectory")
Note that path separators should be forward slashes, even on a Windows system; a backslash works only when escaped as \\.
For Windows, the command might look something like: setwd("C:/Sham/Documents/RProjects")
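A small sketch of changing and restoring the working directory follows. It uses `getwd()` and `tempdir()`, which are not mentioned on the slide, so that the example runs on any machine without assuming a particular folder exists.

```r
old_dir <- getwd()    # remember the current working directory
setwd(tempdir())      # switch to R's per-session temporary directory
print(getwd())        # confirm the change
setwd(old_dir)        # switch back to where we started
```

Saving the old directory first makes it easy to undo the change at the end of a script.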
Customizing R-Studio
R-Studio options are available from the Tools > Options menu (R-Studio > Preferences on a Mac) and include
the following categories:
General R Options: Default CRAN mirror, initial working directory, workspace and history behavior
Source Code Editing: Enable/disable line numbers, selected word and line highlighting, soft-wrapping for R files, paren matching, right margin display and console syntax highlighting; configure tab spacing; set default text encoding
Appearance & Themes: Specify the font size and visual theme for the console and source editor
Pane Layout: Locations of console, source editor and tab panes; set which tabs are included in each pane
Packages: Set default CRAN repository and specify package development options
Sweave: Configure Sweave compiling options and PDF previewing
Spelling: Choose main dictionary language and specify spell checking options
Git/SVN: Configure locations of Git and SVN binaries and create and/or view SSH RSA keys
Publishing: Enable publishing apps and documents from IDE and set account
R-Studio GUI
The script editor window is in the top-left corner of the screen. You edit your R script in this pane.
The Console window, in the bottom-left corner of the screen, displays the results of script execution together with
the script lines that generated them.
The Environment window, in the top-right pane of the screen, provides information about the variables and data
structures used or generated by the script.
The window in the bottom-right corner of the screen shows the files and packages used by the project, displays plots
(visualizations) generated by R and provides access to help for various elements of R syntax.
R Packages
Packages are collections of R functions, data and compiled code in a well-defined format.
The directory where packages are stored is called the library.
R comes with a standard set of packages. Others are available for download and installation.
R Package Function
.libPaths() # Get library location
library() # See all packages installed
search() # See packages currently loaded
detach("package:pkg") # Unload the loaded package
install.packages("package") # Install the package
library("package") # Load the package
library(help = "package") # List package contents
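The load/unload cycle from the table can be tried without installing anything. This sketch uses `splines`, one of the packages that ships with R, purely as a convenient example; `requireNamespace()` is an extra function not listed in the table above.

```r
# Load (attach) a package that comes with every R installation
library(splines)
print("package:splines" %in% search())   # is it on the search path?

# Unload it again
detach("package:splines")
still_attached <- "package:splines" %in% search()
print(still_attached)

# Check whether a package is installed before trying to load it
is_installed <- requireNamespace("stats", quietly = TRUE)
print(is_installed)
```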
Steps to Install R Packages
1 Run R-Studio.
2 Click on the Packages tab in the bottom-right section and then click on Install. The Install Packages dialog box will appear.
3 In the Install Packages dialog, write the name of the package you want to install under the Packages field and then click Install. This will install the package you searched for or give you a list of matching packages based on your package text.
Getting Help with R
The help() function and ? help operator in R provide access to the documentation pages for R functions, data
sets and other objects, both for packages in the standard R distribution and for contributed packages.
The help() function can be used to access information about a package in your library—for
example, help(package="MASS")—which displays an index of available help pages for the package, along
with some other information.
Help Command Function
help.start() # General help
help(lm) # Help about function lm
example(lm) # Show an example of function lm
help(package = "MASS") # List the help pages for package MASS
?lm # Short form of help(lm)
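When you do not remember the exact name of a function, searching by pattern can help. The sketch below is one possible workflow; `apropos()` is an additional base R function that is not listed in the table above.

```r
# help() returns an object describing the matching help page;
# typing it at the console (or printing it) opens the page
h <- help("lm", package = "stats")

# apropos() lists objects whose names match a pattern
hits <- apropos("^read\\.csv")
print(hits)
```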
Variables in R
A variable is a named storage space whose value you can keep changing.
Data Types in R
Every variable is associated with a data type.
Numeric: 5, 1000, -33
Character: "hello world", "z", "This is Sparta"
Logical: TRUE, FALSE
Complex: 30-2i, 2+5i, -0.45i
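The four basic types can be checked with `class()`. This is a minimal sketch; the variable names are arbitrary.

```r
n <- 5               # numeric
s <- "hello world"   # character
b <- TRUE            # logical
z <- 2+5i            # complex

types <- c(class(n), class(s), class(b), class(z))
print(types)

# A variable can be reassigned, even to a value of another type
n <- "five"
print(class(n))
```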
Operators in R
Operators perform operations on data and variables.
Assignment Operators
Arithmetic Operators
Relational Operators
Logical Operators
Assignment Operators
Assignment operators are used to assign a value to an object.
Operators
= <- ->
Example
x = 10
y <- 20
30 -> z
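The three assignment forms from the example can be run as follows; all of them create an ordinary variable.

```r
x = 10     # assignment with =
y <- 20    # leftward assignment, the most common style in R code
30 -> z    # rightward assignment

print(c(x, y, z))
```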
Arithmetic Operators
Arithmetic operators are used to perform basic mathematical operations.
+ Addition
− Subtraction
* Multiplication
/ Division
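A quick sketch of the four operators in use. The last line also shows that arithmetic in R works element by element on vectors, which goes slightly beyond the table above.

```r
a <- 7
b <- 2

print(a + b)   # 9
print(a - b)   # 5
print(a * b)   # 14
print(a / b)   # 3.5

# Arithmetic is applied to each element of a vector
v <- c(1, 2, 3) * 10
print(v)
```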
Relational Operators
Relational operators test a relationship between two operands and return TRUE or FALSE.
< Less than
<= Less than or equal to
> Greater than
>= Greater than or equal to
== Is equal to
!= Not equal to
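Each relational operator returns a logical value, as this small sketch shows; the values of `x` and `y` are arbitrary.

```r
x <- 5
y <- 8

# One result per operator, in the order of the table above
results <- c(x < y, x <= y, x > y, x >= y, x == y, x != y)
print(results)

# Comparisons also work element by element on vectors
above <- c(3, 9, 12) > 8
print(above)
```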
Logical Operators - AND
Logical operators are used to make a decision on the basis of a condition.
& AND
FALSE & FALSE = FALSE
FALSE & TRUE = FALSE
TRUE & FALSE = FALSE
TRUE & TRUE = TRUE
Logical Operators - OR
Logical operators are used to make a decision on the basis of a condition.
| OR
FALSE | FALSE = FALSE
FALSE | TRUE = TRUE
TRUE | FALSE = TRUE
TRUE | TRUE = TRUE
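Both truth tables can be reproduced in a few lines, because `&` and `|` operate element by element on logical vectors. The variable names here are arbitrary.

```r
left  <- c(FALSE, FALSE, TRUE, TRUE)
right <- c(FALSE, TRUE, FALSE, TRUE)

and_result <- left & right
or_result  <- left | right

# Show all four input combinations side by side
truth <- data.frame(left, right, and_result, or_result)
print(truth)
```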
Project-based Data Science Course
Data Science Project
We’ll learn Data Science with this “customer churn” dataset.
Problem Statement
You are the Data Scientist at a telecom company “Neo” whose customers are churning out to its
competitors. You have to analyse the data of your company and find insights.
I’ll analyse my company’s data completely to find why customers are churning out.
Tasks to be Performed
1 Data Manipulation
Find out hidden patterns in the “customer_churn” dataset by using the apply family of functions and the dplyr package.
I’ll start off by manipulating the data.
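As a hedged preview of the apply family, the sketch below works on a tiny made-up data frame. The name `churn_demo` and its columns are assumptions for illustration and are not the real “customer_churn” data; dplyr is left out so that the example runs with base R alone.

```r
# Hypothetical stand-in for a few rows of customer data
churn_demo <- data.frame(
  tenure         = c(1, 24, 60, 5),
  MonthlyCharges = c(30, 55, 90, 70),
  Churn          = c("Yes", "No", "No", "Yes")
)

# sapply: apply a function to each numeric column
col_means <- sapply(churn_demo[, c("tenure", "MonthlyCharges")], mean)
print(col_means)

# tapply: mean monthly charges within each churn group
by_churn <- tapply(churn_demo$MonthlyCharges, churn_demo$Churn, mean)
print(by_churn)
```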
2 Data Visualization
Represent the data with graphs by using the ggplot2 package.
I’ll depict the data pictorially to get a better understanding.
3 Linear Regression
Understand how the Monthly Charges of the customers vary with respect to other factors.
I’ll build a linear regression algorithm on top of the “customer_churn” data.
4 Logistic Regression
Get the probability of customers churning out with respect to other factors.
I’ll build a logistic regression algorithm on top of the ‘customer_churn’ data.
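To show what "getting a probability" looks like in practice, here is a minimal logistic regression sketch with base R's `glm()`. Because the churn data is not shown in these slides, it uses the built-in `mtcars` dataset as a stand-in, with the binary column `am` playing the role that a churn flag would play.

```r
# am is a 0/1 outcome (transmission type), wt is a numeric predictor
fit <- glm(am ~ wt, data = mtcars, family = binomial)
print(coef(fit))

# Predicted probability that am == 1 for a hypothetical car with wt = 3
p <- predict(fit, newdata = data.frame(wt = 3), type = "response")
print(p)
```

With `type = "response"` the prediction is returned on the probability scale rather than as log-odds.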
5 Decision Tree & Random Forest
Classify whether the customer will churn or not on the basis of other factors.
I’ll build decision tree and random forest algorithms.
6 Clustering
Divide the customers into different clusters with k-means.
I’ll build k-means clustering on top of the “customer_churn” dataset.
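A minimal k-means sketch with base R's `kmeans()` follows. It clusters the built-in `iris` measurements as a stand-in for customer data; the choice of three clusters and of the seed is arbitrary and made only for illustration.

```r
set.seed(42)                         # k-means starts from random centers

# Scale the four numeric columns so that each contributes equally
features <- scale(iris[, 1:4])

# Fit 3 clusters, keeping the best of 10 random starts
km <- kmeans(features, centers = 3, nstart = 10)

# How many observations ended up in each cluster
print(table(km$cluster))
```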
Individual Modules
Individual modules that are not based on the ‘customer_churn’ dataset:
Market Basket Analysis
Recommendation Engine
Time Series
Deep Learning
Quiz
Which of the following is not a type of analytics?
a. Descriptive
b. Business Intelligence
c. Predictive
d. Prescriptive
e. None of the above
Solution:
b. Business Intelligence
Which of the following are the 4 Vs or dimensions of Big Data?
a. Volume, Velocity, Variable & Vacuum
b. Volume, Velocity, Variety & Veracity
c. Volume, Vaccine, Variety & Variable
d. All of the above
Solution:
b. Volume, Velocity, Variety & Veracity
Thank You