Data Science 5
MACHINE LEARNING
B. TECH
II YEAR – II SEM (Sec-A & B)
Academic Year 2022-23
Pre-requisite:
Database Management Systems, Data Structures
Course Objectives:
This course will enable students to:
• Know about the fundamental concepts and technologies of Data Science.
• Explore the various Data collection and storage methods.
• Understand data analysis, statistics, and various machine learning algorithms.
• Investigate the visualization of data and apply coding techniques to secure data.
• Study the applications of Data Science, technologies for visualization, and handling of
variables using Python.
Textbooks:
1. Cathy O’Neil, Rachel Schutt, Doing Data Science: Straight Talk from the Frontline. O’Reilly,
2013.
2. Jure Leskovec, Anand Rajaraman, Jeffrey Ullman, Mining of Massive Datasets. v 2.1,
Cambridge University Press, 2014.
Reference Books:
1. Joel Grus, “Data Science from Scratch”, O'Reilly, 2015.
2. Gupta, S.C. and Kapoor, V.K., “Fundamentals of Mathematical Statistics”, Sultan
Chand & Sons, New Delhi, 11th Ed., 2002.
3. Hastie, Trevor, et al., “The Elements of Statistical Learning”, Springer, 2009.
4. Wes McKinney, “Python for Data Analysis”, O'Reilly Media, 2012.
Course Outcomes:
The student will be able to
• Identify the basic concepts of data science and the types of data.
• Analyse how to collect, manage, explore, and store data.
• Implement the basic measures of central tendency and classify data using SVM and
naive Bayes.
• Interpret visualizations of data and apply coding techniques to secure data.
• Analyse the various concepts of data science and handle simple
applications of data science using Python.
UNIT– V
• Applications of Data Science
• Technologies & Tools for Visualisation
• Visualisation Techniques
• Recent Data Collection & Analysis
• DS Case Study
DR. G. ARUN SAMPAUL THOMAS
Associate Professor & HOD – Department of AI&ML
J.B. Institute of Engineering and Technology
Hyderabad, Telangana
arunsam.infotech@gmail.com arunthomas.ai_ml@jbiet.edu.in
Data All Around
What To Do With These Data?
Statistical and Critical Thinking
Analyzing Data: Potential Pitfalls
• Misleading Conclusions
When forming a conclusion based on a statistical analysis, we should make statements that are clear
even to those who have no understanding of statistics and its terminology.
• Sample Data Reported Instead of Measured
When collecting data from people, it is better to take measurements yourself instead of asking
subjects to report results.
• Loaded Questions
If survey questions are not worded carefully, the results of a study can be misleading.
• Order of Questions
Sometimes survey questions are unintentionally loaded by the order of the items being considered.
• Nonresponse
A nonresponse occurs when someone either refuses to respond or is unavailable.
• Percentages
Some studies cite misleading percentages. Note that 100% of some quantity is all of it, but if there
are references made to percentages that exceed 100%, such references are often not justified.
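The percentage pitfall can be checked with simple arithmetic. The sketch below is only an illustration: the helper names `percent_change` and `value_after_reduction` and all the figures are hypothetical, not taken from any study. It shows that an increase can exceed 100%, while a "reduction" of more than 100% would leave a negative amount, which is impossible for a count.

```python
def percent_change(old, new):
    """Return the change from old to new as a percentage of old."""
    if old == 0:
        raise ValueError("percent change is undefined when the old value is 0")
    return (new - old) / old * 100.0


def value_after_reduction(old, percent):
    """Return what is left after reducing old by the given percentage."""
    return old - old * percent / 100.0


# Hypothetical figures, chosen only for illustration.
print(percent_change(200, 100))         # -50.0  (a 50% decrease)
print(percent_change(200, 0))           # -100.0 (all of it is gone)
print(percent_change(200, 500))         # 150.0  (increases can exceed 100%)

# A claimed "120% reduction" would leave a negative quantity.
print(value_after_reduction(200, 120))  # -40.0
```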
Types of Data, Key Concept
A major use of statistics is to collect and use sample data to draw conclusions about populations.
• Parameter
a numerical measurement describing some characteristic of a population
• Statistic
a numerical measurement describing some characteristic of a sample
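The difference between a parameter and a statistic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the population is simulated (heights in cm with an assumed mean of 170 and standard deviation of 10), and the sample size of 100 is arbitrary.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: simulated heights (cm) of 10,000 people.
population = [random.gauss(170, 10) for _ in range(10_000)]

# Parameter: computed from every member of the population.
population_mean = statistics.mean(population)

# Statistic: computed from a sample drawn from that population.
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two numbers are usually close but not identical. In practice the parameter is unknown, and the statistic is used to estimate it.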
Types of Data
Types of Data, Quantitative Data
Data
• Qualitative: categorical
• Quantitative: numerical, can be ranked
– Discrete: countable (5, 29, 8000, etc.)
– Continuous: can be decimals (2.59, 312.1, etc.)
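The classification above can be approximated in Python by looking at the type of the values. This is a rough heuristic, not a rule: the function name `classify` is hypothetical, and the type of a value does not always match its meaning (for example, postal codes are stored as integers but are categorical).

```python
def classify(values):
    """Rough heuristic: classify a column by the Python type of its values."""
    if not values:
        return "mixed or unknown"
    if all(isinstance(v, str) for v in values):
        return "qualitative (categorical)"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "quantitative, discrete"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "quantitative, continuous"
    return "mixed or unknown"


print(classify(["red", "blue", "green"]))  # qualitative (categorical)
print(classify([5, 29, 8000]))             # quantitative, discrete
print(classify([2.59, 312.1]))             # quantitative, continuous
```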
Types of Data, Levels of Measurement:
Another way of classifying data: 4 levels of measurement: nominal, ordinal, interval, and ratio.
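The four levels differ in which comparisons and summaries are meaningful. The sketch below encodes the standard properties of each level in a lookup table. The names `LEVELS` and `meaningful_summaries` are hypothetical, and the example variables in the comments are illustrative only.

```python
# Properties of each level of measurement.
# Examples (illustrative): nominal = shirt colour, ordinal = course grade,
# interval = temperature in Celsius, ratio = weight in kg.
LEVELS = {
    "nominal":  {"ordered": False, "differences": False, "true_zero": False},
    "ordinal":  {"ordered": True,  "differences": False, "true_zero": False},
    "interval": {"ordered": True,  "differences": True,  "true_zero": False},
    "ratio":    {"ordered": True,  "differences": True,  "true_zero": True},
}


def meaningful_summaries(level):
    """List the summaries that make sense at the given level."""
    props = LEVELS[level]
    summaries = ["mode"]
    if props["ordered"]:
        summaries.append("median")
    if props["differences"]:
        summaries.append("mean")
    if props["true_zero"]:
        summaries.append("ratios such as 'twice as much'")
    return summaries


for level in LEVELS:
    print(level, "->", meaningful_summaries(level))
```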
Example 2:
Example 3:
Parameter or Statistic?
Statistic
Parameter
Example 4:
Discrete or Continuous?
Continuous
Discrete
Example 5:
Determine the measurement level.
Nominal
Ratio
Ordinal
Interval
Example 6:
Determine the measurement level & what’s wrong with the conclusion?
Structured vs Unstructured
https://www.youtube.com/watch?v=WBU7sW1jy2o
Big Data & Data Science
Data Science Applications
Data Science: Case Study
Cancer Research
• Cancer is an incredibly complex disease; a single tumor can have
more than 100 billion cells, and each cell can acquire mutations
individually. The disease is always changing, evolving, and adapting.
• Employ the power of big data analytics and high-performance
computing.
• Leverage sophisticated pattern-recognition and machine learning algorithms to
identify patterns that are potentially linked to cancer.
• This requires a huge amount of data processing and pattern recognition.
Data Science: Case Study
Health Care
http://med.stanford.edu/news/all-news/2016/08/stanford-medicine-google-team-up-to-harness-power-of-data-science.html
Data Science: Case Study
Elections
• The Obama campaigns in 2008 and 2012 are credited for their
successful use of social media and data mining.
• Micro-targeting in 2012
– http://www.theatlantic.com/politics/archive/2012/04/the-creepiness-factor-how-obama-and-romney-are-getting-to-know-you/255499/
– http://www.mediabizbloggers.com/group-m/How-Data-and-Micro-Targeting-Won-the-2012-Election-for-Obama---Antony-Young-Mindshare-North-America.html
• Micro-profiles were built from multiple sources accessed by apps, with data
updated in real time from door-to-door visits, and with highly targeted
media buys, e-mails, and Facebook messages.
• 1 million people installed the Obama Facebook app, which gave
access to information on their “friends”.
Data Science: Case Study
Internet of Things (IoT)
• The Internet of Things is rapidly growing. It is predicted that more than 25 billion devices
will be connected by 2020.
• The Internet of Things (IoT) will soon produce a massive volume and variety of data at
unprecedented velocity. If "Big Data" is the product of the IoT, "Data Science" is its
soul.
Data Science: Case Study
Customer Analytics
Case Study - How Recommender Systems Work
(Netflix/Amazon)
https://www.youtube.com/watch?v=n3RKsY2H-NE
INTRODUCTION TO DATA SCIENCE
UNIT– V
• Apps, Business Development in DS
• Case Study
INTRODUCTION TO DATA SCIENCE
UNIT– V
Supplementary Notes
• DS Case Study - How People Are Leveraging ChatGPT