B.Sc Computer Science (Artificial Intelligence and Data Science) 2021-2022
Semester V
21ADUSOL Advanced Data Science 4H-4C
Instruction Hours/week: L:4 T:0 P:0
Marks: Internal:40 External:60 Total:100
End Semester Exam: 3 Hours
Course Objectives
The goal of this course is for the students:
• To learn the fundamentals of data science and big data.
• To gain in-depth knowledge of descriptive data analytical techniques.
• To gain knowledge to implement simple to complex analytical algorithms in big data frameworks.
• To learn the use of big data processing methods in data science.
• To understand and perform data visualization, web scraping and machine learning using various Data Science tools.
• To build data science products that can be used by a broad audience.
Course Outcomes (COs)
Upon Completion of this course the students will be able to:
1. Understand and describe the fundamentals of advanced data science.
2. Understand the Hadoop file system.
3. Apply suitable statistical testing by converting any real-world decision-making
problem to hypothesis
4. Analyze various open-source frameworks for modelling and storing data, and data
analytics methods, to choose the best approaches.
5. Develop simple applications involving analytics using Hadoop and MapReduce.
6. Build data science products that can be used by a broad audience
Unit I - Data Science Fundamentals
Data Science - Fundamentals and Components - Data Scientist - Terminologies Used in
Big Data Environments - Types of Digital Data - Classification of Digital Data -
Introduction to Big Data - Characteristics of Data - Evolution of Big Data - Big Data
Analytics - Classification of Analytics - Top Challenges Facing Big Data - Importance of
Big Data Analytics - Data Analytics Tools. Linear Regression - Polynomial Regression -
Multivariate Regression
Unit II - Introduction to Hadoop
Introducing Hadoop - Hadoop Overview - RDBMS versus Hadoop - HDFS (Hadoop
Distributed File System): Components and Block Replication - Processing Data with
Hadoop - Introduction to MapReduce - Features of MapReduce
Unit III - Introduction to NoSQL
Introduction to NoSQL: CAP theorem - MongoDB: RDBMS vs MongoDB - MongoDB
Database Model - Data Types and Sharding - Introduction to Hive - Hive Architecture -
Hive Query Language (HQL).
Unit IV - Introduction to Essential Data Science Packages
Introduction to Essential Data Science Packages: NumPy, SciPy, Jupyter, Statsmodels and
Pandas Package - Data Munging: Introduction to Data Munging, Data Pipeline and Machine
Learning in Python
Unit V - Data Visualization Using Matplotlib
Data Visualization Using Matplotlib - Interactive Visualization with Advanced Data
Learning Representation in Python.
Suggested Readings:
1. Frank Kane. (2017). Hands-On Data Science and Python Machine Learning, 1st Edition, Packt
Publishers.
2. Yuxi (Hayden) Liu. (2017). Python Machine Learning by Example, 2nd Edition, Packt
Publishers.
3. Alberto Boschetti and Luca Massaron. (2016). Python Data Science Essentials, 2nd Edition,
Packt Publishers.
4. Seema Acharya and Subhashini Chellappan. (2015). Big Data and Analytics, 2nd Edition, Wiley
Publishers.
5. DT Editorial Services. (2015). Big Data, Black Book, 1st Edition, Dreamtech Press.
Websites:
1. www.nptel.ac.in/courses/106/106/106106179/
2. www.nptel.ac.in/courses/106/106/106106212/
3. www.nptel.ac.in/noc/courses/noc17/SEM2/noc17-mg24/
4. www.nptel.ac.in/courses/106/104/106104189/
5. www.coursera.org/specializations/advanced-data-science-ibm