Lecture 1 - Introduction to Data Science

The document outlines a course on Data Science, emphasizing the importance of data analysis in extracting valuable information from large datasets across various sectors. It covers the data science pipeline, skills required, and specific applications in India, highlighting a significant skill gap in the field. The course includes practical components such as labs and a project focused on data visualization and analysis.

EC2023E: Foundations of Data Science
ANUP APREM
Big Data Phenomenon
• We are collecting and storing data at an unprecedented rate.
• Examples:
• YouTube, Facebook, MOOCs, news sites.
• Credit card transactions and Amazon purchases.
• Transportation data (Google Maps, Waze, Uber).
• Gene expression data and protein interaction assays.
• Maps and satellite data.
• Large Hadron Collider and surveying the sky.
• Phone call records and speech recognition results.
• Video game worlds and user actions.
Data Science
• What to do with all this data?
• Too much data to search through manually.
• But there is valuable information in the data.
• How can we use it for fun, profit, and/or the greater good?
• The process of extracting information from raw data is called data analysis.
• Why data analytics?
• Understand the mechanism of data generation.
• Forecast response ➔ Machine learning model.
• Data visualization: Communicate hidden information to the business.
Data Science Pipeline
Classic Example: Consumer churn in wireless market
• Churn: Customers who switch from one wireless provider to another at the end of their contract.
• Incentives: Offers made to a particular customer to stay in the contract.
• Data extraction? History, frequent callers.
• Data cleaning? Missing address, age, occupation.
• Data exploration & visualization? Difference in churn between males and females, or based on occupation and address.
• Predictive model: Can we build an ML model that predicts whether churn occurs?
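The extraction, cleaning and exploration steps of the churn example can be sketched in a few lines of Pandas. This is only an illustrative sketch: the records, the column names (`customer_id`, `gender`, `age`, `occupation`, `churned`) and the choice to fill a missing age with the median are assumptions made for the example, not part of the course material. The predictive-model step is left out.

```python
import pandas as pd

# Hypothetical toy records; real data would come from the provider's systems.
records = [
    {"customer_id": 1, "gender": "F", "age": 34.0, "occupation": "engineer", "churned": 0},
    {"customer_id": 2, "gender": "M", "age": float("nan"), "occupation": "teacher", "churned": 1},
    {"customer_id": 3, "gender": "F", "age": 45.0, "occupation": None, "churned": 0},
    {"customer_id": 4, "gender": "M", "age": 29.0, "occupation": "engineer", "churned": 1},
    {"customer_id": 5, "gender": "F", "age": 52.0, "occupation": "teacher", "churned": 1},
    {"customer_id": 6, "gender": "M", "age": 41.0, "occupation": "engineer", "churned": 0},
]

# Data extraction: load the raw records into a DataFrame.
customers = pd.DataFrame(records)

# Data cleaning: fill the missing age with the median age and
# label the missing occupation explicitly (one possible choice among many).
customers["age"] = customers["age"].fillna(customers["age"].median())
customers["occupation"] = customers["occupation"].fillna("unknown")

# Data exploration: fraction of churned customers per gender and per occupation.
churn_by_gender = customers.groupby("gender")["churned"].mean()
churn_by_occupation = customers.groupby("occupation")["churned"].mean()

print(churn_by_gender)
print(churn_by_occupation)
```

On real data the same group-by summaries would feed the visualization step, for example as bar charts of churn rate per group.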
Python and Data Science
• Generic Programming and Scripting
• Large libraries for data analytics and predictive modelling
• NumPy, Pandas, Matplotlib
• Interface to databases
• SQL, NoSQL
• API programming, web scraping
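As a small illustration of the database interface point, the sketch below uses `sqlite3` from the Python standard library. The table name `calls`, its columns and the inserted rows are made up for the example; the slides do not prescribe any particular database.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table of call records.
conn.execute("CREATE TABLE calls (customer_id INTEGER, minutes REAL)")
conn.executemany(
    "INSERT INTO calls (customer_id, minutes) VALUES (?, ?)",
    [(1, 12.5), (1, 3.0), (2, 7.5)],
)

# Total call minutes per customer, computed in SQL.
rows = conn.execute(
    "SELECT customer_id, SUM(minutes) FROM calls "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()
conn.close()

print(rows)
```

The `?` placeholders let the driver handle quoting of values, which is generally safer than building SQL strings by hand.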


Data Science and India
• NITI Aayog recognizes data science as a technology that can address Indian needs across sectors such as education and health.
• According to Coursera's 2021 Global Skills Report, India ranks 66th globally in data science, with an estimated 58% skill gap.
• Reports of 1 lakh (100,000) job vacancies in the data analytics domain.
• Increased focus on data analytics in healthcare, agriculture, personalized education, smart cities and transportation.
• This Course: Knowledge of data science and tools in Python
Data Science – Skills
• Python and Data Science (This course)
• Foundations of Machine Learning (EC2011E)
• Deep Learning (EC3057E)
• Artificial Intelligence (EC3051E)
• Application areas: Computer Vision (EC3055E), Autonomous Intelligent Systems (EC3052E), Reinforcement Learning (EC3059E)
• Domain Knowledge
Course Outline
• Core Database concepts for Data science
• Structured, Unstructured, Big Data
• Data Cleaning (Pandas)
• Data Pre-processing (Pandas)
• Data Visualization – Grammar of Graphics (Matplotlib, Seaborn)
• Exploratory data analysis
• Statistics for data science
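To give a flavour of the statistics topic, the sketch below computes a few descriptive statistics with the standard-library `statistics` module. The list of ages is invented for the illustration; in the course itself such summaries would more likely be computed with Pandas.

```python
import statistics

# Hypothetical sample of customer ages.
ages = [29, 34, 41, 45, 52]

mean_age = statistics.mean(ages)      # arithmetic mean
median_age = statistics.median(ages)  # middle value of the sorted sample
stdev_age = statistics.stdev(ages)    # sample standard deviation (n - 1 in the denominator)

print(mean_age, median_age, round(stdev_age, 2))
```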
Course Schedule
• Credit: 2-0-2-3
• Lectures (NLHC 301) (I reserve the right to use any 2 out of 3)
• Mon 5-6pm
• Thu 5-6pm
• Wed 8-9am (Used in lieu)
• Lab (IC Lab, ECED Block II)
• TA1, TA2, TB1, TB2 (20 students per slot) – Please give your choice in the spreadsheet in Eduserver
• Individual lab (10 computers in IC Lab)
• Google Classroom for lab and submission
• Evaluation: 75% lab assessment, 25%
Course Evaluation
Evaluation                         Contribution (%)
Midterm                            25
Labs (Individual)                  30
Course Project (including viva)    20
End Exam                           25

Course Project: Identify a suitable data problem, obtain the dataset, create a
database and perform data visualization on the problem
Proposal due: One week after Midterm
Course Project due: Last but one week of class (one week for evaluation/viva)
Acknowledgement
• Course developed in 2021 through a British Council Going Global Exploratory Grant in partnership with Oxford Brookes University, UK
