
DATA SCIENCE: TOOLS, TECHNIQUES and APPLICATIONS

Dr. Meenakshi Srivastava
Dr. Ranjana Rajnish
Assistant Professor
Amity University
msrivastava@lko.amity.edu
What and Why?
• WHAT is Data Science?
• WHY is Data Science Important?
• WHY are Data Scientists in High Demand?
• WHY Data Science in Academia?

Applications of Data Science: Some Examples
I. HEALTHCARE
• Survival analysis
– Analyze survival statistics for different patient attributes
(age, blood type, gender, etc.) and treatments.
• Medication (dosage) effectiveness
– Analyze the effects of administering different types and dosages of
medication for a disease.
• Re-admission risk
– Predict the risk of re-admission based on patient attributes,
medical history, diagnosis, and treatment.
II. MARKETING
• Predicting Lifetime Value (LTV)
What for: if you can predict the characteristics of
high-LTV customers, this supports customer
segmentation, identifies up-sell opportunities,
and supports other marketing initiatives.
• Demand Forecasting
III. LOGISTICS
• How many of which items will customers need,
and where will they need them?
(Enables lean inventory and prevents out-of-stock
situations.)
MOST IMPORTANT QUESTION:
HOW DOES DATA SCIENCE DO ALL THIS?
What is Data Science?

Data Science is a broad umbrella term whereby
scientific methods, mathematics, statistics, etc.
are applied to data sets in order to extract
KNOWLEDGE and INSIGHT.
DATA SCIENCE: A MASH-UP OF DISCIPLINES
• Another View
THE DATA SCIENCE UNICORN
• In medieval times, a unicorn was a rare
and mythical creature with great powers.
• In today’s world, a similar mythical
creature is the Data Science Unicorn,
who knows Technology, Data Science,
and Business equally well.
• Such a professional is the most valuable
resource of any data science team.
• Many data professionals are experts in
the first two areas (technology and data
science) but lack business/domain skills.
You All Are OUR FUTURE UNICORNS
How To Become A Data Science UNICORN?
Data Science UNICORN: Do Whatever Is
Necessary To Extract Value from the Data
• Statistics: Take a sample (data) and answer questions about the process
that produced it. Is it a normal distribution? Estimate its mean.
• Machine Learning: Take a sample (data) and build a model to answer
questions about future samples.
– Given a sample of named faces, design a model for naming a new, unseen face.
• Data Mining: Mine huge data stores for interesting patterns or
relationships.
– Given a DB of transactions, apply tools and algorithms to find frequent product
bundles.
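The Statistics bullet above can be sketched in a few lines of standard-library Python. The sample below is hypothetical, drawn from a simulated normal process, so the "true" mean and standard deviation are known in advance:

```python
import random
import statistics

# Hypothetical sample: 1,000 draws from a simulated normal process
# with true mean 170 and standard deviation 10.
random.seed(42)
sample = [random.gauss(170, 10) for _ in range(1000)]

# Statistics: answer questions about the process that produced the sample.
mean_est = statistics.mean(sample)   # should land near 170
sd_est = statistics.stdev(sample)    # should land near 10
print(round(mean_est, 1), round(sd_est, 1))
```

With a sample this size, the estimates land close to the true parameters, which is exactly the "answer questions about the process" step.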
Machine Learning
Machine Learning refers to a computer’s ability to
learn from a dataset and adapt accordingly without
having been explicitly programmed to do so.
Examples: Regression, Decision Trees, Neural
Networks, etc.
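A minimal sketch of "learning from a dataset without being explicitly programmed": a one-split decision stump fitted to a hypothetical toy dataset. A real decision tree applies this idea recursively; nothing here comes from the slides themselves:

```python
# Toy labelled data: (feature value, class label).
data = [(1.2, 0), (2.0, 0), (2.8, 0), (6.1, 1), (7.4, 1), (8.0, 1)]

def fit_stump(points):
    # Try a threshold between each pair of neighbouring x values and
    # keep the one that classifies the training points best.
    xs = sorted(x for x, _ in points)
    best = None
    for a, b in zip(xs, xs[1:]):
        t = (a + b) / 2
        correct = sum((x > t) == bool(y) for x, y in points)
        if best is None or correct > best[1]:
            best = (t, correct)
    return best[0]

threshold = fit_stump(data)
print(threshold)  # the split is learned from the data, not programmed in
```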
Data Mining
• To most people, data mining goes
something like this: tons of data are
collected, then quant wizards work
their arcane magic, and then they
know all of this amazing stuff.
• BUT WHAT DO THEY DO?
• They can tell us that "one of these
things is not like the other", or they
can show us categories and then sort
things into pre-determined
categories/classes.
HOW TO DO ALL THIS?
COMPUTATIONAL TOOLS
• With the help of existing computational tools,
you can very easily analyze your data.
• No programming skills required.
• No in-depth knowledge of Statistics, Machine
Learning, Data Mining, etc. is required.
Common Computational Tools
• RapidMiner (Open Source and Free):
This is very popular since it is ready-made,
open-source, no-coding-required software that
provides advanced analytics. Written in Java, it
incorporates multifaceted data mining functions
such as data preprocessing, visualization, and
predictive analysis, and it can be easily integrated
with WEKA and R to build models directly
from scripts written in those two.
• WEKA (Open Source and Free):
This is a Java-based customization tool that is
free to use. It includes visualization, predictive
analysis and modeling techniques, clustering,
association, regression, and classification.
• R Programming Tool (Open Source and Free):
This is written in C and Fortran and allows
data miners to write scripts just like a programming
language/platform. Hence, it is used to build
statistical and analytical software for data mining. It
supports graphical analysis, both linear and
nonlinear modeling, classification, clustering, and
time-series analysis.
• Python-based Orange and NLTK:
Python is very popular due to its ease of use and
powerful features. Orange is an open-source tool
written in Python with useful data analytics,
text analysis, and machine-learning features
embedded in a visual programming interface.
NLTK, also written in Python, is a powerful
language-processing data mining tool that
includes data mining, machine learning, and
data scraping features, and it can easily be
built up for customized needs.
• Rattle (Open Source and Free):
Rattle is a GUI tool that uses the R statistical
programming language. Rattle exposes the
statistical power of R by providing considerable
data mining functionality through an extensive
and well-developed UI. It also has a built-in log
tab that generates the R code for any activity
performed in the GUI.
• DataMelt (Open Source and Free):
DataMelt, also known as DMelt, is a computation
and visualization environment. It provides an
interactive framework for data analysis and
visualization and is designed mainly for engineers,
scientists, and students.
How Computational Tools Work
• Methods developed using Statistics, Machine
Learning, and Data Mining come pre-built.
• These pre-developed methods can be easily
applied to your data set.
• They provide built-in support for data
visualization.
What ALL CAN I DO WITH MY DATA?

• Regression:
In statistics, regression is a classic technique to
identify the relationship between two or more
variables by fitting a straight line to the
variable values.
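A regression sketch using the closed-form least-squares line. The data points are made up for illustration and roughly follow y = 2x; this is the same fit that tools like R or RapidMiner compute for you:

```python
# Least-squares fit of a straight line y = a + b*x.
# Hypothetical data points, roughly following y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope: covariance(x, y) divided by variance(x).
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
# Intercept: the fitted line passes through the point of means.
a = mean_y - b * mean_x

print(round(a, 2), round(b, 2))  # the slope comes out close to 2
```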
Cont…
• Classification:
This is a machine-learning technique used to label
a set of observations, given training examples.
With it, we can classify observations into one
or more labels. The likelihood of sales, online
fraud detection, and cancer classification (in
medical science) are common applications of
classification. Google Mail uses this technique to
classify e-mails as spam or not.
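The Gmail example can be sketched as a tiny word-frequency (naive-Bayes-style) classifier. The training messages are invented for illustration and the smoothing is deliberately crude:

```python
from collections import Counter

# Hypothetical training messages for each label.
spam = ["win cash now", "free prize win", "cash prize now"]
ham = ["meeting at noon", "project report attached", "see you at noon"]

def word_probs(messages):
    words = Counter(w for m in messages for w in m.split())
    total = sum(words.values())
    # Add-one smoothing so unseen words do not zero out the score.
    return lambda w: (words[w] + 1) / (total + 1)

p_spam, p_ham = word_probs(spam), word_probs(ham)

def classify(message):
    # Multiply per-word probabilities under each label; pick the larger.
    score_spam = score_ham = 1.0
    for w in message.split():
        score_spam *= p_spam(w)
        score_ham *= p_ham(w)
    return "spam" if score_spam > score_ham else "ham"

print(classify("free cash prize"))    # labelled spam
print(classify("report for meeting")) # labelled ham
```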
• Clustering:
This technique is all about organizing similar items
into groups from a given collection of items.
User segmentation and image compression are the
most common applications of clustering. Market
segmentation, social network analysis, organizing
computer clusters, and astronomical data
analysis are further applications.
• Google News
uses these techniques to group similar news items
into the same category.
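A minimal k-means sketch on one-dimensional data (hypothetical values): points are repeatedly assigned to their nearest centre, and each centre is then moved to the mean of its cluster:

```python
# Two natural groups of hypothetical 1-D values.
points = [1.0, 1.5, 1.2, 8.0, 8.4, 7.9]

def kmeans_1d(data, centers, rounds=10):
    for _ in range(rounds):
        # Assignment step: each point joins its nearest centre's cluster.
        clusters = [[] for _ in centers]
        for p in data:
            i = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[i].append(p)
        # Update step: move each centre to its cluster's mean
        # (keep the old centre if a cluster ends up empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

print(sorted(kmeans_1d(points, [0.0, 10.0])))
```

The centres settle near the means of the two groups, around 1.23 and 8.1.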
Cont…
• Recommendation:
Recommendation algorithms are used in
recommender systems, which are the most
immediately recognizable machine learning
techniques in use today. Web content
recommendations may include similar websites,
blogs, videos, or related content. Recommendation
of online items is also helpful for cross-selling
and up-selling.
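A "customers who bought X also bought Y" recommender can be sketched from co-occurrence counts alone. The purchase baskets are made up for illustration:

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories (one set of items per customer).
baskets = [
    {"laptop", "mouse"},
    {"laptop", "mouse", "keyboard"},
    {"laptop", "keyboard"},
    {"phone", "charger"},
]

# Count how often each ordered pair of items appears in the same basket.
together = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        together[(a, b)] += 1
        together[(b, a)] += 1

def recommend(item, k=2):
    # Recommend the k items most often bought alongside `item`.
    scores = Counter({b: n for (a, b), n in together.items() if a == item})
    return [b for b, _ in scores.most_common(k)]

print(recommend("laptop"))
```

Real recommender systems refine this with ratings, similarity weighting, or matrix factorization, but the co-occurrence idea is the starting point.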
• Association Rules:
This data mining technique helps find
associations between two or more items and
discovers hidden patterns in the data set.
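Support and confidence, the two standard measures for an association rule, can be computed directly. The transactions and the rule ("bread implies butter") are hypothetical:

```python
# Hypothetical shopping transactions.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]

# Rule under test: customers who buy bread also buy butter.
both = sum(1 for t in transactions if {"bread", "butter"} <= t)
bread = sum(1 for t in transactions if "bread" in t)

support = both / len(transactions)  # both items in 2 of 4 baskets
confidence = both / bread           # 2 of 3 bread buyers also take butter
print(support, confidence)
```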
• Outlier Detection:
This data mining technique identifies data items
in a dataset that do not match an expected
pattern or expected behavior. It can be used in a
variety of domains, such as intrusion detection,
fraud detection, or fault detection. Outlier
detection is also called Outlier Analysis or
Outlier Mining.
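A simple outlier-detection sketch using z-scores, with hypothetical transaction amounts: any value more than two standard deviations from the mean is flagged as not matching the expected pattern:

```python
import statistics

# Hypothetical transaction amounts; one value is clearly anomalous.
amounts = [12.0, 14.5, 13.2, 15.1, 13.8, 250.0]

mean = statistics.mean(amounts)
sd = statistics.stdev(amounts)

# Flag values whose z-score exceeds 2.
outliers = [a for a in amounts if abs(a - mean) / sd > 2]
print(outliers)
```

Real systems use more robust methods (the mean and standard deviation are themselves distorted by extreme outliers), but the "distance from expected behavior" idea is the same.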
• Prediction:
Prediction uses a combination of the other
data mining techniques: trends, sequential
patterns, clustering, classification, etc. It
analyzes past events or instances in the right
sequence to predict a future event.
ADVANTAGES
 Use computational tools to predict the
behavior of your compound.
 Use computational tools to analyze the same
data with a different vision.
 Cost cutting.
 Time saving.
 A very clear, precise vision for your research.
QUESTIONS ?
THANKS
