TB-Data Engineering - Syllabus-2024

The document outlines a 40+ hour cloud and data engineering syllabus covering topics such as data warehousing, SQL, Python, Linux, Apache Hadoop, HDFS, Spark, HBase, Airflow, Kafka, cloud computing on AWS, S3, EC2, EMR, Athena, the CLI, DynamoDB, Lambda, Glue, and Step Functions. Students can enrol in the course by contacting the phone numbers or email address provided to learn data engineering basics, core concepts, and how to use AWS services for storing, processing, and querying data and orchestrating workflows.

Cloud & Data Engineering Syllabus (40+ hours)

Data Engineering Roadmap:

Data Engineering Roadmap Video: https://youtu.be/8uVRbry5A2U?feature=shared

Data Engineering basics: (The following will be covered as part of Data Engineering basics)

 Data warehousing
 SQL
 Python
 Linux
Data Engineering Core:

Apache Hadoop - (HDFS - Landing Zone for all the incoming files)

 Introduction to Big Data & Hadoop fundamentals
 Dimensions of Big Data
 Types of data generation
 Apache Hadoop ecosystem & its projects
 Hadoop distributors
 HDFS core concepts
 Modes of Hadoop deployment
 HDFS flow architecture
 HDFS MRv1 vs MRv2 architecture
 Types of data compression techniques
 Rack topology/awareness
 HDFS utility commands with usage (see the sketch below)
 Minimum hardware requirements for a cluster & property file changes
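
A minimal sketch for the HDFS utility commands item above, written in Python for consistency with the other examples; it assumes a working Hadoop client on the PATH, and the landing-zone path and file name are hypothetical. The same commands can be run directly from a Linux shell.

```python
import subprocess

# Hypothetical landing-zone directory; adjust to your cluster layout.
LANDING_DIR = "/data/landing"

def hdfs(*args):
    """Run an 'hdfs dfs' utility command and return its output."""
    result = subprocess.run(
        ["hdfs", "dfs", *args], capture_output=True, text=True, check=True
    )
    return result.stdout

# Create the landing-zone directory (no error if it already exists).
hdfs("-mkdir", "-p", LANDING_DIR)

# Copy a local file into HDFS and list the directory contents.
hdfs("-put", "-f", "sales_2024.csv", LANDING_DIR)
print(hdfs("-ls", LANDING_DIR))

# Check space usage of the directory.
print(hdfs("-du", "-h", LANDING_DIR))
```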

Apache Spark - (Spark job deployment with Python programming)

 Introduction to Spark & its features
 Spark Core & Spark SQL concepts
 Actions & transformations logic
 Spark script to read & write tables in HBase & S3 buckets (see the sketch below)
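
A minimal PySpark sketch for the read/write item above, assuming a hypothetical S3 bucket named my-landing-bucket and S3 access already configured (for example on EMR); writing to HBase additionally requires a Spark-HBase connector, so only the S3 side is shown here.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("s3-read-write-demo").getOrCreate()

# Read raw CSV files from the landing zone (hypothetical bucket/prefix).
orders = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("s3://my-landing-bucket/raw/orders/")
)

# A simple transformation: total order value per customer.
totals = (
    orders.groupBy("customer_id")
    .agg(F.sum("order_amount").alias("total_amount"))
)

# Write the transformed data back to S3 in Parquet format.
totals.write.mode("overwrite").parquet("s3://my-landing-bucket/curated/order_totals/")

spark.stop()
```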

Apache HBase - (NoSQL database)

 Introduction to HBase concepts & features
 Introduction to NoSQL/CAP theorem concepts
 HBase design/architecture flow
 HBase table commands (see the sketch below)
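
A minimal sketch of basic HBase table commands from Python, assuming the happybase client and an HBase Thrift server reachable on localhost; the table and column-family names are hypothetical. The equivalent create/put/get/scan commands can also be run from the hbase shell.

```python
import happybase

# Connect to the HBase Thrift server (assumed to be running locally).
connection = happybase.Connection("localhost")

# Create a table with one column family (roughly: create 'customers', 'info').
connection.create_table("customers", {"info": dict()})

table = connection.table("customers")

# Put a row (roughly: put 'customers', 'cust001', 'info:name', 'Asha').
table.put(b"cust001", {b"info:name": b"Asha", b"info:city": b"Chennai"})

# Get a single row back (roughly: get 'customers', 'cust001').
print(table.row(b"cust001"))

# Scan all rows (roughly: scan 'customers').
for key, data in table.scan():
    print(key, data)

connection.close()
```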

Apache Airflow - (Workflow orchestration)

 Airflow Introduction
 Installation
 Architecture
 Sample Project (see the sketch below)
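
A minimal sketch for the Sample Project item above: a daily DAG with two PythonOperator tasks, assuming a recent Airflow 2.x installation; the task logic is a hypothetical placeholder.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: pull data from a source system.
    print("extracting data ...")


def load():
    # Placeholder: load the extracted data into a target store.
    print("loading data ...")


with DAG(
    dag_id="sample_etl_dag",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",          # run once per day
    catchup=False,              # do not backfill past runs
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task   # load runs only after extract succeeds
```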

Apache Kafka + Streaming - (Streaming data from data sources)

 Introduction to Kafka & what streaming data is
 Installing & working with Kafka
 Projects in Kafka (see the sketch below)
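
A minimal producer/consumer sketch for the Kafka project item above, assuming the kafka-python package and a broker on localhost:9092; the topic name and message contents are hypothetical.

```python
import json

from kafka import KafkaProducer, KafkaConsumer

TOPIC = "orders"  # hypothetical topic name

# Produce a few JSON messages to the topic.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
for i in range(3):
    producer.send(TOPIC, {"order_id": i, "amount": 100 + i})
producer.flush()

# Consume the messages back from the beginning of the topic.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating after 5s with no new messages
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.value)
```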

Cloud Computing Services

 Basic overview of the cloud
 Different types of cloud models
 Different types of cloud services
 Different vendors of cloud implementations
 Why choose AWS?
 Features of AWS and key offerings

AWS S3 (create buckets to hold the ingested data & the transformed data)

 What is AWS S3 & what is it used for?
 What are AWS S3 buckets and how to create buckets in the AWS Console?
 How to upload and manage files in AWS S3 (see the sketch below)
 Features & advantages of S3
 How does AWS S3 work?
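
A minimal boto3 sketch for the upload-and-manage item above, assuming AWS credentials are already configured; the bucket name is hypothetical (bucket names must be globally unique), and the create_bucket call as written applies to us-east-1.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "tb-data-landing-demo"  # hypothetical, must be globally unique

# Create a bucket (in regions other than us-east-1 a LocationConstraint is required).
s3.create_bucket(Bucket=BUCKET)

# Upload a local file into a "raw/" prefix.
s3.upload_file("sales_2024.csv", BUCKET, "raw/sales_2024.csv")

# List the objects under that prefix.
response = s3.list_objects_v2(Bucket=BUCKET, Prefix="raw/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Download and then delete an object.
s3.download_file(BUCKET, "raw/sales_2024.csv", "sales_copy.csv")
s3.delete_object(Bucket=BUCKET, Key="raw/sales_2024.csv")
```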

AWS EC2 - (Instance creation with VPC, Networking, Security Groups etc.)

 What is EC2 and its important features?
 Types of EC2 compute instances
 How to create EC2 instances, selecting the AMI, Security Groups and VPC, and connect using PuTTY (see the sketch below)
 What are the advantages of EC2 instances?
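
A minimal boto3 sketch for the EC2 creation item above; the AMI ID, key pair, and security group are hypothetical placeholders, and connecting with PuTTY/SSH happens outside the script using the printed public IP.

```python
import boto3

ec2 = boto3.client("ec2")

# Launch a single t2.micro instance (AMI, key pair and security group are placeholders).
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",             # hypothetical Amazon Linux AMI ID
    InstanceType="t2.micro",
    KeyName="my-keypair",                        # existing key pair for SSH/PuTTY access
    SecurityGroupIds=["sg-0123456789abcdef0"],   # security group allowing inbound SSH (port 22)
    MinCount=1,
    MaxCount=1,
)

instance_id = response["Instances"][0]["InstanceId"]
print("launched:", instance_id)

# Wait until the instance is running, then print its public IP for the SSH client.
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
described = ec2.describe_instances(InstanceIds=[instance_id])
print(described["Reservations"][0]["Instances"][0].get("PublicIpAddress"))
```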

AWS EMR - (Spin up a cluster for deploying Spark jobs)

 What is the usage of EMR and big data concepts?
 How to launch and configure the EMR service
 Run a sample Spark program and view the job details to analyse the big data (see the sketch below)
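
A minimal boto3 sketch for launching an EMR cluster and submitting a Spark step, roughly matching the items above; the release label, S3 script path, and the default EMR roles are assumptions that must already exist in the account.

```python
import boto3

emr = boto3.client("emr")

response = emr.run_job_flow(
    Name="tb-spark-demo",
    ReleaseLabel="emr-6.15.0",              # assumed EMR release that bundles Spark
    Applications=[{"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,  # terminate the cluster when the step finishes
    },
    Steps=[
        {
            "Name": "sample-spark-job",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "s3://my-landing-bucket/scripts/job.py"],  # hypothetical script
            },
        }
    ],
    JobFlowRole="EMR_EC2_DefaultRole",      # default EC2 instance profile
    ServiceRole="EMR_DefaultRole",          # default EMR service role
)

print("cluster id:", response["JobFlowId"])
```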

AWS Athena (Query data in S3 buckets)

 What is Amazon Athena and its features?
 How to create databases & tables in Athena from S3 buckets and from DDL?
 How to use Athena with other AWS services, with a use case (see the sketch below)
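
A minimal boto3 sketch for querying S3 data with Athena as outlined above; the database, table, and result-location bucket are hypothetical and assumed to already exist in the Glue Data Catalog.

```python
import time

import boto3

athena = boto3.client("athena")

# Start a query against a table whose data lives in S3 (names are hypothetical).
started = athena.start_query_execution(
    QueryString="SELECT customer_id, SUM(order_amount) AS total "
                "FROM orders GROUP BY customer_id LIMIT 10",
    QueryExecutionContext={"Database": "sales_db"},
    ResultConfiguration={"OutputLocation": "s3://tb-athena-results-demo/"},
)
query_id = started["QueryExecutionId"]

# Poll until the query finishes, then fetch the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```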

AWS CLI (To query data using CLI commands)

 What is the AWS CLI and its features?
 How to use CloudShell to access AWS services?
 How to use the command line interface for triggering & querying datasets in S3 buckets (see the sketch below)
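
A minimal sketch for the CLI items above, driven from Python for consistency with the other examples; it assumes the AWS CLI is installed and configured (as it is in CloudShell), and the bucket name is hypothetical. The same commands can be typed directly at the shell prompt.

```python
import subprocess

def aws(*args):
    """Run an AWS CLI command and return its standard output."""
    return subprocess.run(
        ["aws", *args], capture_output=True, text=True, check=True
    ).stdout

# List all buckets, then list and copy objects in a hypothetical bucket.
print(aws("s3", "ls"))
print(aws("s3", "ls", "s3://tb-data-landing-demo/raw/"))
print(aws("s3", "cp", "sales_2024.csv", "s3://tb-data-landing-demo/raw/sales_2024.csv"))
```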

AWS DynamoDB (NoSQL DB to hold the configuration mapping)

 What is Amazon DynamoDB and its features?
 How to create, insert into, and query a table in DynamoDB (see the sketch below)
 How to integrate DynamoDB with other AWS services
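
A minimal boto3 sketch for the create/insert/query item above, using a hypothetical configuration-mapping table keyed on a single partition key.

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")

# Create a small on-demand table keyed by config_name (hypothetical schema).
table = dynamodb.create_table(
    TableName="config_mapping",
    KeySchema=[{"AttributeName": "config_name", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "config_name", "AttributeType": "S"}],
    BillingMode="PAY_PER_REQUEST",
)
table.wait_until_exists()

# Insert a configuration item.
table.put_item(Item={"config_name": "s3_landing_path",
                     "value": "s3://tb-data-landing-demo/raw/"})

# Query it back by key.
result = table.query(KeyConditionExpression=Key("config_name").eq("s3_landing_path"))
print(result["Items"])
```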

AWS Lambda (Serverless compute service)

 What is AWS Lambda and its features?
 How to write a simple & basic Lambda function (see the sketch below)
 How to integrate Lambda + S3 with other AWS services
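
A minimal sketch of a basic Lambda handler for the items above; it assumes the function is wired to S3 event notifications, so the incoming event follows the standard S3 "Records" structure.

```python
import json


def lambda_handler(event, context):
    """Triggered by an S3 event notification; logs each uploaded object."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(f"new object uploaded: s3://{bucket}/{key}")

    return {"statusCode": 200, "body": json.dumps("processed")}
```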

AWS Glue (for file format conversion)


 Use AWS Glue Crawlers to discover the schema of your data in S3.
 Create an AWS Glue Data Catalog to store metadata information.
 Develop AWS Glue ETL jobs to transform the data using Spark SQL (see the sketch below).
 Utilize AWS Glue Dynamic Frames for schema flexibility.
 Schedule and orchestrate ETL jobs using AWS Glue Triggers and Workflows.
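
A minimal Glue ETL job sketch for the items above; it only runs inside the AWS Glue job environment (the awsglue library is provided there, not via pip), and the catalog database, table, and output path are hypothetical. It reads a crawled table and converts it to Parquet, i.e. a file format conversion.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])

sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from a table discovered by a Glue Crawler (hypothetical catalog names).
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_orders"
)

# Write the data out as Parquet to a curated S3 location.
glue_context.write_dynamic_frame.from_options(
    frame=dyf,
    connection_type="s3",
    connection_options={"path": "s3://my-landing-bucket/curated/orders/"},
    format="parquet",
)

job.commit()
```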

AWS Step Functions (to orchestrate all the workflows step by step)

 What is Step Functions and its features?
 How to orchestrate workflows with different AWS services?
 How to define Tasks, States and create State Machines in AWS (see the sketch below)
 How to integrate Step Functions in AWS with other services
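
A minimal boto3 sketch for the Step Functions items above: it defines a tiny state machine in Amazon States Language with a single Task state that invokes a Lambda function, then starts an execution; the role ARN and Lambda ARN are hypothetical placeholders.

```python
import json

import boto3

sfn = boto3.client("stepfunctions")

# A one-step state machine: invoke a Lambda function, then finish.
definition = {
    "Comment": "Minimal workflow: one Lambda task",
    "StartAt": "TransformData",
    "States": {
        "TransformData": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:transform",  # placeholder ARN
            "End": True,
        }
    },
}

machine = sfn.create_state_machine(
    name="tb-demo-workflow",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsExecutionRole",  # placeholder role
)

# Start an execution with a small input payload.
execution = sfn.start_execution(
    stateMachineArn=machine["stateMachineArn"],
    input=json.dumps({"source": "s3://tb-data-landing-demo/raw/"}),
)
print(execution["executionArn"])
```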

How to enrol in this course: If you're interested in joining this course, please feel free to contact us:

Call: +91 90424 63272, +91 93422 72961

WhatsApp - +91 96196 63272

Email ID: admin@tamilboomi.com
