
DATA SCIENCE: TOOLS, TECHNIQUES and APPLICATIONS

Dr. Meenakshi Srivastava
Dr. Ranjana Rajnish
Assistant Professor
Amity University
msrivastava@lko.amity.edu
What and Why?
• WHAT is Data Science?
• WHY is Data Science Important?
• WHY are Data Scientists in High Demand?
• WHY Data Science in Academia?

Applications of Data Science: Some Examples
I. HEALTHCARE
• Survival analysis
– Analyze survival statistics for different patient attributes
(age, blood type, gender, etc.) and treatments.
• Medication (dosage) effectiveness
– Analyze the effects of administering different types and dosages of
medication for a disease.
• Re-admission risk
– Predict the risk of re-admission based on patient attributes,
medical history, diagnosis, and treatment.
II. MARKETING
• Predicting Lifetime Value (LTV)
What for: if you can predict the characteristics of
high-LTV customers, this supports customer
segmentation, identifies up-sell opportunities,
and supports other marketing initiatives.
• Demand Forecasting
III. LOGISTICS
• How many of which items will customers need,
and where will they need them?
(Enables lean inventory and prevents out-of-stock
situations.)
MOST IMPORTANT QUESTION:
HOW DOES DATA SCIENCE DO ALL THIS?
What is Data Science?

Data Science is a broad umbrella term whereby
scientific methods, mathematics, statistics, etc.
are applied to data sets in order to extract
KNOWLEDGE and INSIGHT.
DATA SCIENCE: A MASH-UP OF DISCIPLINES
• Another View
THE DATA SCIENCE UNICORN
• In medieval times, a unicorn was a rare
and mythical creature with great powers.
• In today’s world, a similar mythical
creature is the Data Science Unicorn,
who knows Technology, Data Science,
and Business equally well.
• Such a professional is the most valuable
resource of any data science team.
• Many data professionals are experts in
the first two areas (technology and data
science) but lack business/domain skills.
You All Are OUR FUTURE UNICORNS
How To Become A Data Science UNICORN?
Data Science UNICORN: Do Whatever Is
Necessary To Extract Value from the Data
• Statistics: Take a sample (data) and answer questions about the process
that produced it. Is it a normal distribution? Estimate its mean.
• Machine Learning: Take a sample (data) and build a model to answer
questions about future samples.
– Given a sample of named faces, design a model for naming a new, unseen face.
• Data Mining: Mine huge data stores for interesting patterns or
relationships.
– Given a DB of transactions, apply tools and algorithms to find frequent product
bundles.
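The Statistics bullet above can be sketched in a few lines of standard-library Python. The sample below is hypothetical, drawn from a simulated normal process, so the "true" mean and standard deviation are known in advance:

```python
import random
import statistics

# Hypothetical sample: 1,000 draws from a simulated normal process
# with true mean 170 and standard deviation 10.
random.seed(42)
sample = [random.gauss(170, 10) for _ in range(1000)]

# Statistics: answer questions about the process that produced the sample.
mean_est = statistics.mean(sample)   # should land near 170
sd_est = statistics.stdev(sample)    # should land near 10
print(round(mean_est, 1), round(sd_est, 1))
```

With a sample this size, the estimates land close to the true parameters, which is exactly the "answer questions about the process" step.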
Machine Learning
Machine Learning refers to a computer’s ability to
learn from a dataset and adapt accordingly without
having been explicitly programmed to do so.
Examples: Regression, Decision Trees, Neural
Networks, etc.
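A minimal sketch of "learning from a dataset without being explicitly programmed": a one-split decision stump fitted to a hypothetical toy dataset. A real decision tree applies this idea recursively; nothing here comes from the slides themselves:

```python
# Toy labelled data: (feature value, class label).
data = [(1.2, 0), (2.0, 0), (2.8, 0), (6.1, 1), (7.4, 1), (8.0, 1)]

def fit_stump(points):
    # Try a threshold between each pair of neighbouring x values and
    # keep the one that classifies the training points best.
    xs = sorted(x for x, _ in points)
    best = None
    for a, b in zip(xs, xs[1:]):
        t = (a + b) / 2
        correct = sum((x > t) == bool(y) for x, y in points)
        if best is None or correct > best[1]:
            best = (t, correct)
    return best[0]

threshold = fit_stump(data)
print(threshold)  # the split is learned from the data, not programmed in
```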
Data Mining
• To most people, data mining goes
something like this: tons of data are
collected, then quant wizards work
their arcane magic, and then they
know all of this amazing stuff.
• BUT WHAT DO THEY DO?
• They can tell us that "one of these
things is not like the other", or they
can show us categories and then sort
things into pre-determined
categories/classes.
HOW TO DO ALL THIS?
COMPUTATIONAL TOOLS
• With the help of existing computational tools,
you can very easily analyze your data.
• No programming skills required.
• No in-depth knowledge of Statistics, Machine
Learning, Data Mining, etc. is required.
Common Computational Tools
• RapidMiner (Open Source and Free):
This is very popular since it is ready-made,
open-source, no-coding-required software that
provides advanced analytics. Written in Java, it
incorporates multifaceted data mining functions
such as data preprocessing, visualization, and
predictive analysis, and it can be easily integrated
with WEKA and R to build models directly
from scripts written in those two.
• WEKA (Open Source and Free):
This is a Java-based customization tool that is
free to use. It includes visualization, predictive
analysis and modeling techniques, clustering,
association, regression, and classification.
• R Programming Tool (Open Source and Free):
This is written in C and Fortran and allows
data miners to write scripts just like a programming
language/platform. Hence, it is used to build
statistical and analytical software for data mining. It
supports graphical analysis, both linear and
nonlinear modeling, classification, clustering, and
time-series analysis.
• Python-based Orange and NLTK:
Python is very popular due to its ease of use and
powerful features. Orange is an open-source tool
written in Python with useful data analytics,
text analysis, and machine-learning features
embedded in a visual programming interface.
NLTK, also written in Python, is a powerful
language-processing data mining tool that
includes data mining, machine learning, and
data scraping features, and it can easily be
built up for customized needs.
• Rattle (Open Source and Free):
Rattle is a GUI tool that uses the R statistical
programming language. Rattle exposes the
statistical power of R by providing considerable
data mining functionality through an extensive
and well-developed UI. It also has a built-in log
tab that generates the R code for any activity
performed in the GUI.
• DataMelt (Open Source and Free):
DataMelt, also known as DMelt, is a computation
and visualization environment. It provides an
interactive framework for data analysis and
visualization and is designed mainly for engineers,
scientists, and students.
How Computational Tools Work
• Methods developed using Statistics, Machine
Learning, and Data Mining come pre-built.
• These pre-developed methods can be easily
applied to your data set.
• They provide built-in support for data
visualization.
What ALL CAN I DO WITH MY DATA?

• Regression:
In statistics, regression is a classic technique to
identify the relationship between two or more
variables by fitting a straight line to the
variable values.
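A regression sketch using the closed-form least-squares line. The data points are made up for illustration and roughly follow y = 2x; this is the same fit that tools like R or RapidMiner compute for you:

```python
# Least-squares fit of a straight line y = a + b*x.
# Hypothetical data points, roughly following y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope: covariance(x, y) divided by variance(x).
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
# Intercept: the fitted line passes through the point of means.
a = mean_y - b * mean_x

print(round(a, 2), round(b, 2))  # the slope comes out close to 2
```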
Cont…
• Classification:
This is a machine-learning technique used to label
a set of observations, given training examples.
With it, we can classify observations into one
or more labels. The likelihood of sales, online
fraud detection, and cancer classification (in
medical science) are common applications of
classification. Google Mail uses this technique to
classify e-mails as spam or not.
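The Gmail example can be sketched as a tiny word-frequency (naive-Bayes-style) classifier. The training messages are invented for illustration and the smoothing is deliberately crude:

```python
from collections import Counter

# Hypothetical training messages for each label.
spam = ["win cash now", "free prize win", "cash prize now"]
ham = ["meeting at noon", "project report attached", "see you at noon"]

def word_probs(messages):
    words = Counter(w for m in messages for w in m.split())
    total = sum(words.values())
    # Add-one smoothing so unseen words do not zero out the score.
    return lambda w: (words[w] + 1) / (total + 1)

p_spam, p_ham = word_probs(spam), word_probs(ham)

def classify(message):
    # Multiply per-word probabilities under each label; pick the larger.
    score_spam = score_ham = 1.0
    for w in message.split():
        score_spam *= p_spam(w)
        score_ham *= p_ham(w)
    return "spam" if score_spam > score_ham else "ham"

print(classify("free cash prize"))    # labelled spam
print(classify("report for meeting")) # labelled ham
```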
• Clustering:
This technique is all about organizing similar items
into groups from a given collection of items.
User segmentation and image compression are the
most common applications of clustering. Market
segmentation, social network analysis, organizing
computer clusters, and astronomical data
analysis are further applications.
• Google News
uses these techniques to group similar news items
into the same category.
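A minimal k-means sketch on one-dimensional data (hypothetical values): points are repeatedly assigned to their nearest centre, and each centre is then moved to the mean of its cluster:

```python
# Two natural groups of hypothetical 1-D values.
points = [1.0, 1.5, 1.2, 8.0, 8.4, 7.9]

def kmeans_1d(data, centers, rounds=10):
    for _ in range(rounds):
        # Assignment step: each point joins its nearest centre's cluster.
        clusters = [[] for _ in centers]
        for p in data:
            i = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[i].append(p)
        # Update step: move each centre to its cluster's mean
        # (keep the old centre if a cluster ends up empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

print(sorted(kmeans_1d(points, [0.0, 10.0])))
```

The centres settle near the means of the two groups, around 1.23 and 8.1.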
Cont…
• Recommendation:
Recommendation algorithms are used in
recommender systems, which are the most
immediately recognizable machine learning
techniques in use today. Web content
recommendations may include similar websites,
blogs, videos, or related content. Recommendation
of online items is also helpful for cross-selling
and up-selling.
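A "customers who bought X also bought Y" recommender can be sketched from co-occurrence counts alone. The purchase baskets are made up for illustration:

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories (one set of items per customer).
baskets = [
    {"laptop", "mouse"},
    {"laptop", "mouse", "keyboard"},
    {"laptop", "keyboard"},
    {"phone", "charger"},
]

# Count how often each ordered pair of items appears in the same basket.
together = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        together[(a, b)] += 1
        together[(b, a)] += 1

def recommend(item, k=2):
    # Recommend the k items most often bought alongside `item`.
    scores = Counter({b: n for (a, b), n in together.items() if a == item})
    return [b for b, _ in scores.most_common(k)]

print(recommend("laptop"))
```

Real recommender systems refine this with ratings, similarity weighting, or matrix factorization, but the co-occurrence idea is the starting point.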
• Association Rules:
This data mining technique helps find
associations between two or more items and
discovers hidden patterns in the data set.
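Support and confidence, the two standard measures for an association rule, can be computed directly. The transactions and the rule ("bread implies butter") are hypothetical:

```python
# Hypothetical shopping transactions.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]

# Rule under test: customers who buy bread also buy butter.
both = sum(1 for t in transactions if {"bread", "butter"} <= t)
bread = sum(1 for t in transactions if "bread" in t)

support = both / len(transactions)  # both items in 2 of 4 baskets
confidence = both / bread           # 2 of 3 bread buyers also take butter
print(support, confidence)
```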
• Outlier Detection:
This data mining technique identifies data items
in a dataset that do not match an expected
pattern or expected behavior. It can be used in a
variety of domains, such as intrusion detection,
fraud detection, or fault detection. Outlier
detection is also called Outlier Analysis or
Outlier Mining.
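A simple outlier-detection sketch using z-scores, with hypothetical transaction amounts: any value more than two standard deviations from the mean is flagged as not matching the expected pattern:

```python
import statistics

# Hypothetical transaction amounts; one value is clearly anomalous.
amounts = [12.0, 14.5, 13.2, 15.1, 13.8, 250.0]

mean = statistics.mean(amounts)
sd = statistics.stdev(amounts)

# Flag values whose z-score exceeds 2.
outliers = [a for a in amounts if abs(a - mean) / sd > 2]
print(outliers)
```

Real systems use more robust methods (the mean and standard deviation are themselves distorted by extreme outliers), but the "distance from expected behavior" idea is the same.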
• Prediction:
Prediction uses a combination of the other
data mining techniques: trends, sequential
patterns, clustering, classification, etc. It
analyzes past events or instances in the right
sequence to predict a future event.
ADVANTAGES
 Use computational tools to predict the
behavior of your compound.
 Use computational tools to analyze the same
data with a different vision.
 Cost cutting.
 Time saving.
 A very clear, precise vision for your research.
QUESTIONS ?
THANKS
