Diploma in Data Science Syllabus
II. Problem analysis: Identify, formulate, review research literature, and analyze complex engineering
III. Design solutions for complex engineering problems and design system components or processes
that meet the specified needs with appropriate consideration for the public health and safety, and
IV. Use research-based knowledge and research methods including design of experiments, analysis
and interpretation of data, and synthesis of the information to provide valid conclusions. Manage
Construction Projects for Planning, Analyzing, Costing, Scheduling, Predicting and complete
V. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling to complex engineering activities
VI. Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and
cultural issues and the consequent responsibilities relevant to the professional engineering
practice.
VII. Understand the impact of the professional engineering solutions in societal and environmental
contexts, and demonstrate the knowledge of, and need for, sustainable development. Function
settings.
VIII. Communicate effectively on complex engineering activities with the engineering community and
with society at large, such as, being able to comprehend and write effective reports and design
documentation, make effective presentations, and give and receive clear instructions.
IX. Demonstrate knowledge and understanding of the engineering and management principles and
apply these to one’s own work, as a member and leader in a team, to manage projects and in
multidisciplinary environments
X. Recognize the need for, and have the preparation and ability to engage in independent and life-
PROGRAMME GUIDELINES
1 Credit = 10 hours of effort (10 hours of learning time, which includes everything a learner has to do
to achieve the outcomes in a qualification, including the assessment procedures and practicals).
Guided Learning Hours are 480 hours for the first year and 480 hours for the second year.
Total Guided Learning Hours for the Diploma in Data Science are 960 hours.
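
The arithmetic behind these guidelines can be sketched in a few lines of Python. This is an illustration only, not part of the programme specification; the function names `learning_hours` and `total_guided_hours` are hypothetical, and the only figures used are the ones stated above (10 hours per credit, 480 guided learning hours per year).

```python
HOURS_PER_CREDIT = 10  # stated in the guidelines: 1 credit = 10 hours of effort


def learning_hours(credits):
    """Learning time in hours implied by a number of credits."""
    return credits * HOURS_PER_CREDIT


def total_guided_hours(hours_per_year):
    """Sum the guided learning hours given for each year of the programme."""
    return sum(hours_per_year)


print(learning_hours(20))              # a 20-credit unit -> 200
print(total_guided_hours([480, 480]))  # first year + second year -> 960
```

Under this rule a 20-credit unit corresponds to 200 hours of learning time, and the two yearly figures add up to the stated total of 960 guided learning hours.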
DIPLOMA IN DATA SCIENCE
COURSE STRUCTURE
YEAR  SEMESTER    UNIT SPECIFICATION  NO. OF UNITS  UNIT CREDIT  CREDIT/YEAR
I     SEMESTER 1  Essential unit      1             20           120
                  Essential unit      1             20
                  Essential unit      1             20
                  Essential unit      1             20
LIST OF UNITS
S. No.  Subject Code  UNIT                                                              UNIT SPECIFICATION         CREDIT
1       I/719/2021    Technical Drawings with Engineering Graphics                      Essential unit             20
2       I/719/2022    Applied Mathematics for Information Technology                    Essential unit             20
3       I/719/2023    Foundations of Data Management                                    Essential unit             20
4       I/719/2024    Data Management in Machine Learning Workflow                      Essential unit             20
5       I/719/2025    Distributed Data Processing and Machine Learning Experimentation  Essential unit             20
6       I/719/2026    Data Privacy and Anonymity                                        Essential unit             20
7       I/719/2027    Streaming Data Systems Architecture                               Essential unit             20
8       I/719/2028    Streaming Data Frameworks                                         Essential unit             20
9       I/719/2029    Data Pipelines and Data Models                                    Essential unit             20
10      I/719/2030    Kafka Fundamentals and Programming                                Essential unit             20
11      I/719/4041    Project                                                           Special unit (Essential)*  60
Elective Units
12      I/719/6511    Programming in Data Science                                       Elective Unit              20
13      I/719/6512    Advanced Streaming Applications                                   Elective Unit              20
14      I/719/6513    Streaming Analytics with Cloud                                    Elective Unit              20
15      I/719/6514    Systems for Data Analytics                                        Elective Unit              20
16      I/719/6515    Storytelling with Data and Ethics for Data Science                Elective Unit              20
17      I/719/6516    Machine Learning                                                  Elective Unit              20
Semester : I
Year : 1
Credit : 60
UNIT CODE     UNIT                                            UNIT SPECIFICATION         CREDIT
I/719/2021    Technical Drawings with Engineering Graphics    Essential unit             20
I/719/2022    Applied Mathematics for Information Technology  Essential unit             20
I/719/2023    Foundations of Data Management                  Essential unit             20
I/719/2024    Data Management in Machine Learning Workflow    Essential unit             20
I/719/6511    Programming in Data Science                     Elective Unit              20
Semester : II
Year : 1
Credit : 60
UNIT CODE     UNIT                                                              UNIT SPECIFICATION         CREDIT
I/719/2025    Distributed Data Processing and Machine Learning Experimentation  Essential unit             20
I/719/2026    Data Privacy and Anonymity                                        Essential unit             20
I/719/2027    Streaming Data Systems Architecture                               Essential unit             20
I/719/2028    Streaming Data Frameworks                                         Essential unit             20
I/719/6512    Advanced Streaming Applications                                   Elective Unit              20
Semester : III
Year : 2
Credit : 60
UNIT CODE     UNIT                                                UNIT SPECIFICATION         CREDIT
I/719/2029    Data Pipelines and Data Models                      Essential unit             20
I/719/2030    Kafka Fundamentals and Programming                  Essential unit             20
I/719/6513    Streaming Analytics with Cloud                      Elective Unit              20
I/719/6514    Systems for Data Analytics                          Elective Unit              20
I/719/6515    Storytelling with Data and Ethics for Data Science  Elective Unit              20
Semester : IV
Year : 2
Credit : 60
UNIT CODE     UNIT     UNIT SPECIFICATION         CREDIT
I/719/4041    Project  Special unit (Essential)*  60
UNIT CODE : I/719/2021
UNIT TITLE : Technical Drawings with Engineering Graphics
HOURS : 200
SPECIFICATION : Essential Unit
UNIT DESCRIPTION
This unit enables students to understand technical drawing and its importance. It teaches the vital
role of technical drawings in engineering documents and communication. The unit covers angles of
projection, multiview, section and detail drawings, and symbols.
ULO3 - Ability to provide required information in technical drawing according to process and
operation.
MAPPING
PLO1 PLO2 PLO3 PLO4 PLO5 PLO6 PLO7 PLO8 PLO9 PLO10
ULO1 M M M M M M M
ULO2 M M M M
ULO3 M M M M M
UNIT CODE : I/719/2022
UNIT TITLE : Applied Mathematics for Information Technology
UNIT DESCRIPTION
This course intends to provide an overview of analytical and numerical techniques to solve ordinary
and partial differential equations, which we apply to solve many engineering problems.
ULO1 - Determine the solution of second and higher order linear differential equations (LDEs) and
apply knowledge of LDEs to solve engineering problems.
ULO2 - Classify, formulate and solve first order and second order linear and non-linear partial
differential equations and apply the knowledge of partial differential equations
ULO3 - Find approximate solutions of first order ordinary differential equations and assess the
convergence and stability of the approximate solutions.
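
As an illustration of the third outcome, the sketch below applies the forward Euler method to a first order ordinary differential equation and shows the error shrinking as the step size is reduced. The syllabus does not name a specific numerical method, so the choice of Euler's method, the test problem y' = y with y(0) = 1, and the function names `euler` and `growth` are all assumptions made for this example.

```python
import math


def euler(f, x0, y0, x_end, steps):
    """Approximate y(x_end) for y' = f(x, y), y(x0) = y0, using forward Euler."""
    h = (x_end - x0) / steps  # fixed step size
    x, y = x0, y0
    for _ in range(steps):
        y += h * f(x, y)  # follow the slope at the current point for one step
        x += h
    return y


def growth(x, y):
    """Right-hand side of the test problem y' = y, whose exact solution is e**x."""
    return y


# The exact value at x = 1 is e; the error should fall as the number of steps grows.
for n in (10, 100, 1000):
    approx = euler(growth, 0.0, 1.0, 1.0, n)
    print(n, approx, abs(approx - math.e))
```

Each tenfold increase in the number of steps reduces the error by roughly a factor of ten, which is the first order convergence expected of this method.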
MAPPING
PLO1 PLO2 PLO3 PLO4 PLO5 PLO6 PLO7 PLO8 PLO9 PLO10
ULO1 M M M M M M M
ULO2 M M M M
ULO3 M M M M M
UNIT CODE : I/719/2023
UNIT TITLE : Foundations of Data Management
CREDIT : 20
UNIT DESCRIPTION
This unit walks the learner through the data engineering lifecycle and shows how to stitch together
a variety of cloud technologies to serve the needs of downstream data consumers. Learners will
understand how to apply the concepts of data generation, ingestion, orchestration, transformation,
storage, and governance that are critical in any data environment regardless of the underlying
technology. The unit gives a concise overview of the entire data engineering landscape and teaches
learners to assess data engineering problems using an end-to-end framework of best practices. It
helps learners see past marketing hype when choosing data technologies, architectures, and
processes, and to use the data engineering lifecycle to design and build a robust architecture that
incorporates data governance and security throughout.
MAPPING
PLO1 PLO2 PLO3 PLO4 PLO5 PLO6 PLO7 PLO8 PLO9 PLO10
ULO1 M M M M M M M
ULO2 M M M M
UNIT CODE : I/719/2024
UNIT TITLE : Data Management in Machine Learning Workflow
CREDIT : 20
UNIT DESCRIPTION
This unit aims to develop learners’ knowledge of data management for machine learning. Learners
understand the processes and practices involved in organizing, storing, preprocessing, and
maintaining data to support machine learning tasks. It encompasses various activities to ensure that
data is high quality, accessible, and suitable for training and validating machine learning models.
Effective data management is crucial for successful machine learning projects as the performance
and accuracy of models heavily rely on the quality and availability of data. By effectively managing
data for machine learning, organizations can enhance the accuracy and reliability of their models,
improve decision-making processes, and derive valuable insights from data-driven solutions.
MAPPING
PLO1 PLO2 PLO3 PLO4 PLO5 PLO6 PLO7 PLO8 PLO9 PLO10
ULO1 M M M M M M M
ULO2 M M M M
UNIT CODE : I/719/2025
UNIT TITLE : Distributed Data Processing and Machine Learning Experimentation
CREDIT : 20
UNIT DESCRIPTION
This unit teaches learners to build big data systems using an architecture that takes advantage of
clustered hardware along with new tools designed specifically to capture and analyse web-scale data.
It describes a scalable, easy-to-understand approach to big data systems that can be built and run
by a small team. The unit guides learners through the theory of big data systems, how to implement
them in practice, and how to deploy and operate them once they are built. In addition to discovering
a general framework for processing big data, learners study specific technologies like Hadoop, Storm,
and NoSQL databases.
ULO1 - Understand the Real-time processing of web-scale data using tools like Hadoop,
Cassandra, and Storm
ULO2 - Deploy Machine Learning models in production
MAPPING
PLO1 PLO2 PLO3 PLO4 PLO5 PLO6 PLO7 PLO8 PLO9 PLO10
ULO1 M M M M M M M M
ULO2 M M M
ULO3 M M M M M M M
UNIT CODE : I/719/2026
UNIT TITLE : Data Privacy and Anonymity
CREDIT : 20
UNIT DESCRIPTION
This unit aims to help learners understand data privacy and anonymity: how to stop top search
engines, social media and other powerful Internet players from tracking and profiling online
activities, how to gain unrestricted access to the content and downloads the Internet has to offer,
and how to use social media to stay connected with friends in ways that do not compromise privacy or
safety. The unit helps learners use effective privacy, anonymity and security apps, and mask an IP
address with a proxy, The Onion Router (Tor) or a virtual private network (VPN). Learners use
encryption to keep digital items, downloads and personal information hidden and safe, and learn to
prevent surveillance and monitoring of their activities by Internet service providers (ISPs),
governments, adversaries and other unwelcome snoops.
ULO1 - Understand the data privacy and anonymity on top search engines
ULO2 - Understand the Causes of ML System failure and Problems with ML Production Monitoring
MAPPING
PLO1 PLO2 PLO3 PLO4 PLO5 PLO6 PLO7 PLO8 PLO9 PLO10
ULO1 M M M M M M M
ULO2 M M M M
UNIT CODE : I/719/2027
UNIT TITLE : Streaming Data Systems Architecture
CREDIT : 20
SPECIFICATION : Essential Unit
UNIT DESCRIPTION
This unit aims to help learners understand the components of streaming data systems, with their
capabilities and characteristics. Learners study the relevant architecture and best practices for
processing and analysing streaming data, and gain knowledge of developing systems for data
aggregation, delivery and storage using open-source tools. They become familiar with advanced
streaming applications such as streaming SQL and streaming machine learning. Through relevant
examples and illustrated use cases, learners explore designs for applications that read, analyze,
share, and store streaming data, and discover the roles of key technologies like Spark, Storm,
Kafka, and more.
ULO1 - Understand the streaming of data systems and various data platforms
ULO2 - Understand the algorithms for streaming data application for data aggregation, delivery and
storage
MAPPING
PLO1 PLO2 PLO3 PLO4 PLO5 PLO6 PLO7 PLO8 PLO9 PLO10
ULO1 M M M M M M M
ULO2 M M M M
UNIT CODE : I/719/2028
UNIT TITLE : Streaming Data Frameworks
CREDIT : 20
SPECIFICATION : Essential Unit
UNIT DESCRIPTION
The unit aims to help learners understand Apache Samza, a stream processing framework that can
process data in real time from multiple sources, including Apache Kafka, in conjunction with which
Samza was developed. It is written in Java and Scala, uses Apache YARN for resource management, and
provides exactly-once processing semantics. The unit covers the core principles and concepts behind
robust out-of-order data processing: how watermarks track progress and completeness in infinite
datasets, and how data processing techniques ensure correctness. The concepts of streams and tables
form the foundations of both batch and streaming data processing.
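
To make the idea of watermarks concrete, here is a minimal sketch in plain Python; it does not use the Samza API. It counts events in fixed-size event-time windows and treats the watermark as the largest event time seen so far minus an allowed lateness. That heuristic, the function name `windowed_counts`, and the sample timestamps are assumptions made for this example.

```python
from collections import defaultdict


def windowed_counts(event_times, window_size, allowed_lateness):
    """Count events per event-time window, emitting a window once the watermark passes its end.

    event_times: integer event timestamps, in the order they arrive (possibly out of order).
    Returns (results, late): emitted (window_start, count) pairs and the dropped late timestamps.
    """
    open_windows = defaultdict(int)
    results, late = [], []
    watermark = float("-inf")  # assumed heuristic: max event time seen minus allowed lateness
    for ts in event_times:
        start = (ts // window_size) * window_size
        if start + window_size <= watermark:
            late.append(ts)  # its window has already been emitted, so the event is dropped
            continue
        open_windows[start] += 1
        watermark = max(watermark, ts - allowed_lateness)
        # Emit every open window whose end is at or before the watermark.
        for s in sorted(w for w in open_windows if w + window_size <= watermark):
            results.append((s, open_windows.pop(s)))
    # End of input: flush whatever is still open.
    for s in sorted(open_windows):
        results.append((s, open_windows.pop(s)))
    return results, late


results, late = windowed_counts([1, 4, 12, 3, 15, 27, 8, 31], window_size=10, allowed_lateness=5)
print(results)  # [(0, 3), (10, 2), (20, 1), (30, 1)]
print(late)     # [8]
```

The event with timestamp 3 arrives after 12 but is still counted, because the watermark has not yet passed the end of its window; the event with timestamp 8 arrives after that window was emitted and is reported as late.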
MAPPING
PLO1 PLO2 PLO3 PLO4 PLO5 PLO6 PLO7 PLO8 PLO9 PLO10
ULO1 M M M M M M M
ULO2 M M M M
UNIT CODE : I/719/2029
UNIT TITLE : Data Pipelines and Data Models
CREDIT : 20
UNIT DESCRIPTION
This unit aims to help the learner understand data pipelines, which are the foundation for success
in data analytics. Moving data from numerous diverse sources and transforming it to provide context
is the difference between having data and actually gaining value from it. The unit defines data
pipelines and explains how they work in the modern data stack. Learners will study common
considerations and key decision points when implementing pipelines, such as batch versus streaming
data ingestion and build versus buy. The unit addresses the most common decisions made by data
professionals and discusses foundational concepts that apply to open-source frameworks, commercial
ULO1- Understand the products and design for building data pipelines
ULO2- Understand the Data Ingestion from Extracted Data
MAPPING
PLO1 PLO2 PLO3 PLO4 PLO5 PLO6 PLO7 PLO8 PLO9 PLO10
ULO1 M M M M M M M
ULO2 M M M M
UNIT CODE : I/719/2030
UNIT TITLE : Kafka Fundamentals and Programming
UNIT DESCRIPTION
This unit helps learners understand the data that enterprise applications create, whether it
consists of log messages, metrics, user activity, or outgoing messages. Moving all this data is just
as important as the data itself. Application architects, developers, and production engineers new to
the Kafka streaming platform will learn how to handle data in motion. Topics include Kafka's Admin
Client API, transactions, security features, and tooling. Learners work with Kafka and learn how to
deploy production Kafka clusters, write reliable event-driven microservices, and build scalable
stream processing applications with this platform. Through detailed examples, learners study Kafka's
design principles, reliability guarantees, key APIs, and architecture details, including the
replication protocol, the controller, and the storage layer.
MAPPING
PLO1 PLO2 PLO3 PLO4 PLO5 PLO6 PLO7 PLO8 PLO9 PLO10
ULO1 M M M M M M M
ULO2 M M M M
UNIT CODE : I/719/4041
UNIT TITLE : Project
HOURS : 600
SPECIFICATION : Essential Unit
UNIT DESCRIPTION
The module aims to enable you to complete a substantial piece of individual work and build on your
expertise in a selected area of study. It aims to develop your research, time management,
presentation and written communication skills.
ULO1 - Identify a research question, problem or hypothesis and establish aims and objectives to
support the investigation.
ULO2 - Communicate the planned project work using standard methods and tools.
ULO3 - Develop a research and data collection strategy appropriate to the research question /
problem posed.
ULO4 - Critically evaluate the research findings using reasoned and logical arguments within a
structured written framework.
MAPPING
PLO1 PLO2 PLO3 PLO4 PLO5 PLO6 PLO7 PLO8 PLO9 PLO10
ULO1 M M M M M M M M
ULO2 M M M
ULO3 M M M M M M M
ULO4 M M M M M M
UNIT CODE : I/719/6511
UNIT TITLE : Programming in Data Science
UNIT DESCRIPTION
This course will cover the main principles of computer programming with a focus on data science
applications by following the entire pathway from raw data to databases, data wrangling and
ULO1- Gain knowledge on the main principles of programming in the Data science context
MAPPING
PLO1 PLO2 PLO3 PLO4 PLO5 PLO6 PLO7 PLO8 PLO9 PLO10
ULO1 M M M M M M M
ULO2 M M M M
ULO3 M M M M
UNIT CODE : I/719/6512
UNIT TITLE : Advanced Streaming Applications
CREDIT : 20
UNIT DESCRIPTION
This unit aims to help the learner work with unbounded and fast-moving data streams, which has
historically been difficult. With Kafka Streams and ksqlDB, building stream processing applications
becomes much more approachable. The unit shows learners how to use these tools to build highly
scalable stream processing applications for moving, enriching, and transforming large amounts of
data in real time. Learners study important stream processing concepts against a backdrop of several
business problems, and learn the strengths of both Kafka Streams and ksqlDB in order to choose the
best tool for each stream processing project. Non-Java developers will find the ksqlDB path to be an
especially gentle introduction to stream processing.
ULO1 - Understand Kafka communication patterns and build stateless and stateful stream processing
applications.
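
The difference between stateless and stateful processing named in this outcome can be sketched in plain Python. The example does not use the Kafka Streams or ksqlDB APIs; the record fields (`user`, `type`, `cents`) and the function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def stateless_filter_map(records):
    """Stateless step: each record is handled on its own (keep purchases, convert cents)."""
    for record in records:
        if record["type"] == "purchase":
            yield {"user": record["user"], "amount": record["cents"] / 100}


def stateful_running_total(purchases):
    """Stateful step: the output depends on earlier records (running total per user)."""
    totals = defaultdict(float)  # state that must be kept between records
    for purchase in purchases:
        totals[purchase["user"]] += purchase["amount"]
        yield purchase["user"], totals[purchase["user"]]


stream = [
    {"user": "a", "type": "purchase", "cents": 250},
    {"user": "b", "type": "view", "cents": 0},
    {"user": "a", "type": "purchase", "cents": 100},
    {"user": "b", "type": "purchase", "cents": 400},
]
for user, total in stateful_running_total(stateless_filter_map(stream)):
    print(user, total)
```

The first function could process records in any order and on any machine, while the second has to remember totals between records. In a real stream processor that remembered state is what needs to be stored, partitioned and recovered after failures.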
MAPPING
PLO1 PLO2 PLO3 PLO4 PLO5 PLO6 PLO7 PLO8 PLO9 PLO10
ULO1 M M M M M M M
ULO2 M M M M
UNIT CODE : I/719/6513
UNIT TITLE : Streaming Analytics with Cloud
CREDIT : 20
UNIT DESCRIPTION
This unit introduces Azure Stream Analytics and shows how to use the tools and functions in Azure to
develop streaming analytics solutions. Over the course, learners receive comparative guidance on
using Azure Stream Analytics with other Microsoft Data Platform resources, such as Big Data Lambda
Architecture integration for real-time data analysis, and on the differences between scenarios when
designing architectures with Azure HDInsight Hadoop clusters with Storm or with Stream Analytics. The
unit also shows how to manage, monitor, and scale a solution for optimal performance. Learners will
become well-versed in using Azure Stream Analytics to develop an efficient analytics solution that
can work with any type of data. Style and approach: comprehensive guidance on developing real-time
ULO1- Understand the Azure Stream Analytics to develop an efficient analytics solution
ULO2- Understand the serverless streaming data service that makes it easy to capture, process,
MAPPING
PLO1 PLO2 PLO3 PLO4 PLO5 PLO6 PLO7 PLO8 PLO9 PLO10
ULO1 M M M M M M M
ULO2 M M M M
UNIT CODE : I/719/6514
UNIT TITLE : Systems for Data Analytics
UNIT DESCRIPTION
The unit introduces the systems perspective of data analytics: how to leverage systems effectively
and how to understand, measure, and improve performance while performing data analytics tasks. It
enables learners to develop a working knowledge of how to use parallel and distributed systems for
data analytics, to apply best practices in storing and retrieving data for analytics, and to
leverage commodity infrastructure (such as scale-out clusters, distributed datastores, and the
cloud) for data analytics. The unit also builds the basis for effective decisions: whoever has the
data has the ability to extract information promptly and effectively to make pertinent decisions.
The premise of this unit is to empower users and tool developers with the appropriate collection of
formulas and techniques for data analytics and to serve as a quick reference that keeps pertinent
formulas within the learner's reach.
ULO1 - Understand the Systems Attributes and Data Storage for Data Analytics
ULO2 - Understand the Strategies for data access: Partition, Replication, and Messaging
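
Partitioning and replication, two of the strategies named above, can be sketched as follows. This is a simplified illustration rather than the scheme of any particular datastore: hashing the key to choose a partition and placing replicas on consecutive nodes are assumptions, as are the function names `partition_for` and `replica_nodes`.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a record key to a partition using a stable hash (same key, same partition)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def replica_nodes(partition, num_nodes, replication_factor):
    """Place copies of a partition on consecutive nodes, wrapping around the cluster."""
    if replication_factor > num_nodes:
        raise ValueError("replication factor cannot exceed the number of nodes")
    return [(partition + i) % num_nodes for i in range(replication_factor)]


for key in ("user-17", "user-42", "user-17"):
    p = partition_for(key, num_partitions=4)
    print(key, "-> partition", p, "on nodes", replica_nodes(p, num_nodes=4, replication_factor=3))
```

Partitioning spreads keys across machines so that work can proceed in parallel, and replication keeps each partition on several nodes so that data survives the loss of one of them.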
MAPPING
PLO1 PLO2 PLO3 PLO4 PLO5 PLO6 PLO7 PLO8 PLO9 PLO10
ULO1 M M M M M M M M
ULO2 M M M
UNIT CODE : I/719/6515
UNIT TITLE : Storytelling with Data and Ethics for Data Science
CREDIT : 20
UNIT DESCRIPTION
This unit teaches the learner the fundamentals of data visualization and how to communicate
effectively with data. Learners will discover the power of storytelling and how to make data a
pivotal point in a story. The content is grounded in theory but made accessible through numerous
real-world examples, ready for immediate application to the learner's next graph or presentation.
Storytelling is not an inherent skill, especially when it comes to data visualization, and the
available tools do not make it any easier. As the impact of data science on society continues to
grow, there is an increased need to discuss how data is appropriately used and how to address
misuse. Ethical principles for working with data have been available for decades; the real issue
today is how to put those principles into action.
ULO2- Understand the deliberate practice of data ethics for better products, better teams, and
better outcomes
MAPPING
PLO1 PLO2 PLO3 PLO4 PLO5 PLO6 PLO7 PLO8 PLO9 PLO10
ULO1 M M M M M M M
ULO2 M M M M
UNIT CODE : I/719/6516
UNIT TITLE : Machine Learning
CREDIT : 20
UNIT DESCRIPTION
The objective of the course is to understand the basic theory underlying machine learning and be able
to formulate machine learning problems corresponding to different applications. After completing this
course, the student will be able to understand a very broad collection of machine learning algorithms
and problems.
MAPPING
PLO1 PLO2 PLO3 PLO4 PLO5 PLO6 PLO7 PLO8 PLO9 PLO10
ULO1 M M M M M M M
ULO2 M M M M
ASSESSMENT METHODS AND TECHNIQUES FOR
DIPLOMA IN DATA SCIENCE
Assessment technique: Case studies
Type of assessment: Oral / Problem based / Practical
Formative or summative: Formative
Description: Students are required to work through a case study to identify the problem(s) and to
offer potential solutions; useful for assessing students’ understanding and for encouraging students
to see links between theory and practice. Case studies could be provided in advance of a
time-constrained assessment.

Assessment technique: Concept maps
Type of assessment: Written / Oral
Formative or summative: Formative
Description: Students map out their understanding of a particular concept. This is a useful (and
potentially quick) exercise to provide feedback to staff on students’ understanding.

Assessment technique: ‘Doing it’ exam
Type of assessment: Written
Formative or summative: Formative / Summative
Description: An exam which requires students to do something, like read an article, analyze and
interpret data etc.

Assessment technique: Field report
Type of assessment: Written / Oral
Formative or summative: Formative
Description: Students are required to produce a written / oral report relating to a field / site
visit.

Assessment technique: Laboratory books / Reports
Type of assessment: Practical / Written
Formative or summative: Summative
Description: Students are required to write a report for all (or a designated sample) of practicals
in a single lab book. A sample of lab books will be collected each week to mark any reports of labs
done in previous weeks; this encourages students to keep their lab books up to date. Each student
should be sampled the same number of times throughout the module, with a designated number
contributing to the assessment mark.

Assessment technique: Multiple choice questions (MCQs)
Type of assessment: Written
Formative or summative: Formative / Summative
Description: Can be useful for diagnostic, formative assessment, in addition to summative
assessment. Well-designed questions can assess more than factual recall of information, but do take
time to design.

Assessment technique: Online discussion boards
Type of assessment: Written
Formative or summative: Formative
Description: Students are assessed on the basis of their contributions to an online discussion, for
example with their peers; this could be hosted on a virtual learning environment (VLE).

Assessment technique: Open book exams
Type of assessment: Written
Formative or summative: Summative
Description: Students have the opportunity to use any or specified resources to help them answer set
questions under time constraints. This method removes the over-reliance on memory and recall and
models the way that professionals manage information.

Assessment technique: Oral presentations
Type of assessment: Oral / Written
Formative or summative: Summative
Description: Students are asked to give an oral presentation on a particular topic for a specified
length of time and could also be asked to prepare associated handout(s). Can usefully be combined
with self- and peer-assessment.

Assessment technique: Problem sheets
Type of assessment: Written
Formative or summative: Formative
Description: Students complete problem sheets, e.g. on a weekly basis. This can be a useful way of
providing students with regular formative feedback on their work and/or involving elements of self-
and peer assessment.

Assessment technique: Research projects / Group projects
Type of assessment: Written / Practical / Oral / Performance / Problem based / Work placement
Formative or summative: Formative / Summative
Description: Potential for sampling a wide range of practical, analytical and interpretative skills.
Can assess wide application of knowledge, understanding and skills.

Assessment technique: Short answer questions
Type of assessment: Written
Formative or summative: Summative
Description: Useful to assess a wide range of knowledge/skills across a module.

Assessment technique: Simulations
Type of assessment: Practical / Written / Oral / Problem-based
Formative or summative: Formative
Description: Text or virtual computer-based simulations are provided for students, who are then
required to answer questions, resolve problems, perform tasks and take actions etc. according to
changing circumstances within the simulation. Useful for assessing a wide range of skills, knowledge
and competencies.

Assessment technique: Viva voce
Type of assessment: Oral
Formative or summative: Summative
Description: Often used for assessing ‘borderline’ degree classifications but also useful to explore
students’ understanding of a wide range of topics. Depending on class size, however, they can be
time consuming for staff.