BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI

WORK INTEGRATED LEARNING PROGRAMMES

Digital
Part A: Content Design
Course Title Introduction to Data Science

Course No(s)
Credit Units 5

Content Authors Ms. Seetha Parameswaran

Version 2.0c (May 24)

Date August 5th 2022

Course Objectives
No Course Objective

CO1 Gain a basic understanding of the role of Data Science in various real-world scenarios
in business, industry and government.

CO2 Understand various roles and stages in a Data Science Project and ethical issues to
be considered.

CO3 Explore the processes, tools and technologies for collection and analysis of structured
and unstructured data.

CO4 Appreciate the importance of techniques such as data visualization and storytelling
with data for effectively presenting outcomes to stakeholders.

CO5 Understand techniques of preparing real-world data for data analytics.

CO6 Implement data analytic techniques for discovering interesting patterns from data.
Text Book(s)
T1 Introduction to Data Mining, by Tan, Steinbach and Vipin Kumar, 2nd Ed, Pearson, 2021

T2 Introducing Data Science by Cielen, Meysman and Ali

T3 Storytelling with Data: A data visualization guide for business professionals, by
Cole Nussbaumer Knaflic; Wiley

T4 Data Mining: Concepts and Techniques, 4th Edition, by Jiawei Han and others,
Morgan Kaufmann Publishers, 2023

Reference Book(s) & other resources

R1 The Art of Data Science by Roger D Peng and Elizabeth Matsui
(https://bookdown.org/rdpeng/artofdatascience/)

R2 Ethics and Data Science by DJ Patil, Hilary Mason, Mike Loukides

R3 Python Data Science Handbook: Essential tools for working with data by Jake
VanderPlas

R4 KDD, SEMMA and CRISP-DM: A Parallel Overview, Ana Azevedo and M.F. Santos,
IADS-DM, 2008

Content Structure
1. Fundamentals of Data Science (2 hrs)
1.1 Real World applications
1.2 Data Science Challenges
1.3 Data Science Teams and Roles
1.4 Data Science Process
a) CRISP-DM Methodology
b) SEMMA
c) BIG DATA LIFE CYCLE
d) SMAM
1.5 Software Engineering for Data Science
1.5.1 DataOps
1.5.2 MLOps

2. Data Quality and Data Infrastructure (2 hrs)

2.1. Types of Data and Datasets
2.2. Data Quality and Issues: An overview
2.3. Data Models
2.4. Data Pipelines and patterns
2.5. Data Pipeline Stages
2.6 Modern Data Infrastructure
2.6.1 Diverse data sources
2.6.2 Cloud data warehouses and lakes
3. Data Preprocessing (6 hrs)
3.1 Data cleaning
3.2 Data Aggregation, Sampling
3.3 Statistical descriptions of data
3.4 Measuring data similarity & dissimilarity
3.5. Handling Numeric Data
3.5.1 Discretization, Binarization
3.5.2 Normalization
3.5.3 Data Smoothening
3.6 Feature Engineering
3.7 Managing Categorical Attributes
3.7.1 Transforming Categorical to Numerical Values
3.7.2 Encoding techniques
3.8 Overview of visualization techniques for exploratory data analysis
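
As an illustration of topics 3.5 and 3.7, the sketch below shows min-max and z-score normalization, equal-width discretization and one-hot encoding using only the Python standard library. The function names and the sample values are assumptions made for this example; they are not taken from the course material, and the lab sessions may use Pandas or Scikit learn instead.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant attribute: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_normalize(values):
    """Centre values on the mean and scale by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def equal_width_bins(values, k):
    """Discretize values into k equal-width bins, returning a bin index per value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0 for _ in values]
    # The maximum value falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def one_hot(labels):
    """Encode categorical labels as 0/1 vectors, one column per sorted category."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


ages = [20, 30, 40, 60]  # hypothetical sample attribute
print(min_max_normalize(ages))         # [0.0, 0.25, 0.5, 1.0]
print(equal_width_bins(ages, 4))       # [0, 1, 2, 3]
print(one_hot(["red", "blue", "red"])) # [[0, 1], [1, 0], [0, 1]]
```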

4. Classification and Prediction (6 hrs)

4.1. Concepts of classification and prediction
4.2. Decision trees for classification - ID3 algorithm using entropy and Gini Index
4.3 Rule based classification
4.4. Feature Subset Selection Methods
4.5. Evaluation of classification algorithms
4.6. Prediction using Regression
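
The impurity measures named in topic 4.2 can be sketched as follows. This is a minimal example of entropy, Gini index and information gain for a candidate split, not a full decision tree learner; the function names and the toy records are assumptions for illustration only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the records on one attribute.

    rows is a list of dicts mapping attribute names to values.
    """
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the partitions after the split.
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


# Hypothetical toy data: the attribute separates the classes perfectly.
rows = [{"windy": True}, {"windy": True}, {"windy": False}, {"windy": False}]
labels = ["no", "no", "yes", "yes"]

print(entropy(labels))                          # 1.0
print(gini(labels))                             # 0.5
print(information_gain(rows, labels, "windy"))  # 1.0
```

A tree learner would compute such a score for every candidate attribute and split on the best one, then repeat on each partition.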

5. Association Analysis (4 hrs)

5.1. Association analysis concepts
5.2. Apriori Algorithm for frequent itemsets
5.3 FP Growth for frequent itemsets
5.4. Mining association rules
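
Topics 5.2 and 5.4 can be illustrated with a compact Apriori sketch in plain Python. It assumes an absolute support count as the threshold and a small hypothetical market-basket dataset; the function names are illustrative, and the candidate generation is kept simple rather than optimized.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # Count single items to build the frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(s) in current for s in combinations(union, k - 1)):
                candidates.add(union)
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from support counts."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
frequent = apriori(baskets, min_support=3)
print(len(frequent))                              # 8
print(confidence(frequent, {"diaper"}, {"beer"})) # 0.75
```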

6. Clustering (6 hrs)
6.1. Cluster analysis concepts.
6.2. Partitioning methods – k-Means algorithm
6.3. Hierarchical methods for cluster analysis
6.4. Density based methods for cluster analysis - DBSCAN
6.5. Evaluation of clustering algorithms
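
The k-Means algorithm from topic 6.2 alternates an assignment step and a centroid update step until the centroids stop changing. The sketch below is a minimal standard-library version that assumes the initial centroids are supplied by the caller; the function name, the sample points and the starting centroids are all hypothetical.

```python
from math import dist


def k_means(points, centroids, max_iter=100):
    """Cluster points around the given initial centroids.

    Returns the final centroids and the list of points in each cluster.
    """
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(1, 1), (8, 8)])
print(clusters[0])  # [(1, 1), (1, 2), (2, 1)]
print(clusters[1])  # [(8, 8), (9, 8), (8, 9)]
```

In practice the result depends on the initial centroids, which is one reason library implementations usually run several random initializations.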

7. Anomaly Detection (2 hrs)

7.1. Concepts of Outliers
7.2. Statistical approaches
7.3. Proximity and Density based outlier detection
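
One simple statistical approach of the kind listed in topic 7.2 is to flag values whose z-score exceeds a threshold. The sketch below covers only that case, not proximity-based or density-based detection; the function name, the sample readings and the threshold of 2.0 are assumptions chosen for this small example.

```python
from statistics import mean, pstdev


def z_score_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # all values identical: nothing can be an outlier
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


readings = [10, 11, 9, 10, 12, 11, 10, 50]  # hypothetical measurements
print(z_score_outliers(readings, threshold=2.0))  # [50]
```

With very small samples an extreme value inflates the standard deviation, so a threshold of 3.0 may flag nothing; that is why a lower threshold is used here.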

8. Storytelling with Data (1 hr)

8.1. The final deliverable
8.2. The Narrative - report / presentation structure
8.3. Building narrative with Data
8.4. Effective storytelling
9. Ethics for Data Science (1 hr)
9.1. Bias and Fairness in Data
9.2 Being a data skeptic – examples of misuse of Data
9.3 Five C’s
9.4 Ethical guidelines for Data Scientist
9.5 Ethics of data scraping and storage
9.6 Case Study
Part B: Learning Plan
Academic Term

Course Title Introduction to Data Science

Course No
Lead Instructor

Session No. Topic Title Resource Reference

Session 1: Introduction to Data Science
• Fundamentals of Data Science
• Real World applications
• Data Science Challenges
• Data Science Teams and Roles
• Data Science Process
◦ CRISP-DM Methodology
◦ SEMMA
◦ BIG DATA LIFE CYCLE
◦ SMAM
• Software Engineering for Data Science
◦ DataOps
◦ MLOps (intro)
Resource Reference: T3 – Ch 1; T4 – Ch 1; T1 – Ch 1; Class Room Discussion; Class Notes; Additional Reading (AR) material provided on LMS

Session 2: Data Quality and Data Infrastructure
• Types of Data and Datasets
• Data Quality and Issues: An overview
• Data Models
• Data Pipelines and patterns
• Data Pipeline Stages
• Modern Data Infrastructure
◦ Diverse data sources
◦ Cloud data warehouses and lakes
Resource Reference: T1 – Ch 2.1, 2.2; R1 – Ch 2, Ch 7; Class room discussions

Sessions 3-5: Data Preprocessing
• Data cleaning
• Data Aggregation, Sampling
• Statistical descriptions of data
• Measuring data similarity & dissimilarity
• Handling Numeric Data
◦ Discretization, Binarization
◦ Normalization
◦ Data Smoothening
• Feature Engineering
• Managing Categorical Attributes
◦ Transforming Categorical to Numerical Values
◦ Encoding techniques
• Overview of visualization techniques for exploratory data analysis
Resource Reference: T1 – Ch 2.3, 2.4; T4 – Ch 2

Session 6: Classification and Prediction (2 hrs)
• Concepts of classification and prediction
• Decision trees for classification - ID3 algorithm using entropy and Gini Index, Occam’s razor
• (Mutual Information and Gini Index are used as Feature subset selection techniques.)
Resource Reference: T4 – Ch 6.1, 6.2, 6.3, 6.6, 6.7

Session 7: Classification and Prediction (2 hrs)
• Rule Based Classification
• Feature subset selection methods
Resource Reference: T4 – Ch 6.1, 6.2, 6.3, 6.6, 6.7

Session 8: Classification and Prediction (2 hrs)
• Evaluation of classification algorithms
• Prediction Approaches
Resource Reference: T4 – Ch 6.1, 6.2, 6.3, 6.6, 6.7; Class Notes

Session 9: Association Analysis (2 hrs)
• Association analysis concepts
• Apriori Algorithm for frequent itemsets
Resource Reference: T1 – Ch 4; T4 – Ch 4

Session 10: Association Analysis (2 hrs)
• FP Growth for frequent itemsets
• Mining association rules
Resource Reference: T1 – Ch 4; T4 – Ch 4

Session 11: Clustering
• Cluster analysis concepts
• Partitioning methods – k-Means algorithm
Resource Reference: T1 – Ch 5; T4 – Ch 8

Session 12: Clustering
• Density based methods for cluster analysis – DBSCAN
• Hierarchical methods for cluster analysis
Resource Reference: T1 – Ch 5; T4 – Ch 8

Session 13: Clustering
• Evaluation of clustering algorithms
Resource Reference: T1 – Ch 5

Session 14: Anomaly Detection
• Concepts of Outliers
• Statistical approaches
• Proximity and Density based outlier detection
Resource Reference: T1 – Ch 9; T4 – Ch 11

Session 15: Storytelling with Data
• The final deliverable
• The Narrative - report / presentation structure
• Building narrative with Data
• Effective storytelling
Resource Reference: T3 – Ch 10

Session 16: Ethics for Data Science
• Bias and Fairness
◦ Types of Bias
◦ Identifying Bias
◦ Evaluating Bias
• Being a data skeptic – examples of misuse of Data
• Five C’s
• Ethical guidelines for Data Scientist
• Ethics of data scraping and storage
• Case Study: IBM AI Fairness 360
(PS: Ethics for Data is the focus.)
Resource Reference: https://hbr.org/2013/04/the-hidden-biases-in-big-data; https://www.oreilly.com/data/free/files/being-a-data-skeptic.pdf; T4 – Ch 12.4; R2 – Ch 1, Ch 3

Detailed Plan for Lab work

Lab No. | Lab Objective | Lab Sheet Access URL | Session Reference
1 | Introduction to Python, Numpy, Scipy, Python Pandas | | 2
2 | Data ingestion and extraction, data aggregation techniques | | 3
3 | Exploration and Visualizing using Matplotlib, Seaborn | | 4
4 | Data pre-processing in Python - Discretization, Binarization, Normalization, Data Smoothening, Managing Categorical Attributes | | 5
5 | Feature Engineering using Filter methods, wrapper methods, PCA | | 7
6 | Data pre-processing and Feature Engineering techniques for text, images, audio, video | | 8
7 | Decision trees for classification using Scikit learn | | 9
8 | Association Analysis using Scikit learn | | 11
9 | Clustering analysis by kmeans, hierarchical methods, DBScan using Scikit learn | | 13

Evaluation Scheme:
Legend: EC = Evaluation Component; AN = After Noon Session; FN = Fore Noon Session

No | Name | Type | Duration | Weight | Day, Date, Session, Time
EC-1(a) | Quizzes | Online | | 10% |
EC-1(b) | Assignments | Take Home | | 20% |
EC-2 | Mid-Semester Test | Closed Book | | 25% |
EC-3 | Comprehensive Exam | Open Book | | 45% |
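
As a worked example of how the weights above combine, the sketch below computes an overall percentage as a simple weighted sum. The handout does not state the aggregation formula, so the weighted-sum rule, the function name and the sample scores are all assumptions for illustration.

```python
# Weights (in percent) copied from the evaluation scheme table.
WEIGHTS = {
    "EC-1(a) Quizzes": 10,
    "EC-1(b) Assignments": 20,
    "EC-2 Mid-Semester Test": 25,
    "EC-3 Comprehensive Exam": 45,
}


def weighted_total(percent_scores):
    """Combine per-component scores (each out of 100) using the weights above.

    Assumes a plain weighted sum; the actual grading rule may differ.
    """
    return sum(WEIGHTS[name] * percent_scores[name] / 100 for name in WEIGHTS)


# Hypothetical scores for one student.
scores = {
    "EC-1(a) Quizzes": 80,
    "EC-1(b) Assignments": 90,
    "EC-2 Mid-Semester Test": 60,
    "EC-3 Comprehensive Exam": 70,
}
print(weighted_total(scores))  # 72.5
```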

Note:
Syllabus for Mid-Semester Test (Closed Book): Topics in Session Nos. 1 to 8
Syllabus for Comprehensive Exam (Open Book): All topics (Session Nos. 1 to 16)
Important links and information:

Elearn portal: https://elearn.bits-pilani.ac.in or Canvas

Students are expected to visit the Elearn portal on a regular basis and stay up to date with
the latest announcements and deadlines.

Contact sessions: Students should attend the online lectures as per the schedule provided
on the Elearn portal.

Evaluation Guidelines:
1 EC-1(a) consists of two Quizzes. Students will attempt them through the course pages
on the Elearn portal. Announcements will be made on the portal in a timely manner.
2 EC-1(b) consists of either one or two Assignments. Students will attempt them through
the course pages on the Elearn portal. Announcements will be made on the portal in
a timely manner.
3 For Closed Book tests: No books or reference material of any kind will be permitted.
4 For Open Book exams: Use of books and any printed / written reference material
(filed or bound) is permitted. However, loose sheets of paper will not be allowed.
Use of calculators is permitted in all exams. Laptops/Mobiles of any kind are not
allowed. Exchange of any material is not allowed.
5 If a student is unable to appear for the Regular Test/Exam due to genuine exigencies,
the student should follow the procedure to apply for the Make-Up Test/Exam which
will be made available on the Elearn portal. The Make-Up Test/Exam will be
conducted only at selected exam centres on the dates to be announced later.

It shall be the responsibility of the individual student to be regular in maintaining the self-
study schedule as given in the course hand-out, attend the online lectures, and take all the
prescribed evaluation components such as Assignment/Quiz, Mid-Semester Test and
Comprehensive Exam according to the evaluation scheme provided in the hand-out.
