DE Python

Uploaded by

subrahmanya02_203915

Data Engineering and Machine Learning Using Python

Module 1: Introduction to Machine Learning

▪ Introduction To Machine Learning

▪ Life Cycle of Machine Learning
▪ Skills required for Machine Learning
▪ Career Paths in Machine Learning
▪ Applications of Machine Learning

Module 3: Python for Machine Learning

▪ Python programming:
▪ Environment Setup
▪ Jupyter Notebook Overview
▪ Data types: Numbers, Strings, Printing, Lists, Dictionaries, Booleans, Tuples, Sets
▪ Comparison Operators
▪ if, elif, else Statements
▪ Loops: for Loops, while Loops
▪ range()
▪ list comprehension
▪ functions
▪ lambda expressions
▪ map and filter
▪ methods
▪ Programming Exercises.
▪ Object Oriented Programming
▪ Modules and packages
▪ Errors and Exception Handling
▪ Python Decorators
▪ Python generators
▪ Collections
▪ Regular Expressions
▪ Python for Exploratory Data Analysis:
▪ NumPy:
▪ Installing numpy
▪ Using numpy
▪ NumPy arrays
▪ Creating numpy arrays from python list
▪ Creating arrays using built-in methods (arange(), zeros(), ones(), linspace(), eye(), rand(), etc.)
▪ Array attributes: shape, dtype
▪ Array methods: reshape(), min(), max(), argmax(), argmin(), etc.
▪ Pandas:
▪ Introduction to Pandas
▪ Series
▪ DataFrames
▪ Missing Data
▪ GroupBy
▪ Merging, Joining and Concatenating
▪ Operations
▪ Data Input and Output
▪ Python for Data Visualization:
▪ Matplotlib:
▪ Installing Matplotlib, Basic Matplotlib commands
▪ Creating multiple plots on the same canvas
▪ Object-Oriented Method: figure(), plot(), add_axes(), subplots(), etc.
▪ Matplotlib Exercise
▪ Seaborn:
▪ Categorical plot
▪ Distribution plot
▪ Regression plot
▪ Seaborn Exercise
▪ Pandas built-in visualization:
▪ Scatter plot
▪ Histograms
▪ Box plot
▪ CAPSTONE PROJECT FOR DATA ANALYSIS

Module 4: Deep dive into Machine Learning

▪ Introduction To Machine Learning:

▪ Relationship between Data Science and Machine Learning
▪ Supervised Learning
▪ Unsupervised Learning

Supervised Learning (Regression and Classification Algorithms):

▪ Linear Regression
▪ Ridge Regression
▪ Lasso Regression
▪ Polynomial Regression
▪ Support vector regression
▪ Decision Tree Regression
▪ Random Forest Regression
▪ Logistic Regression
▪ Support Vector Machines
▪ Kernel SVM
▪ Decision Trees and Random Forest
▪ Ensemble Of Decision Trees
▪ Model Evaluation and Improvement

Unsupervised Learning:

▪ Challenges in Unsupervised Learning

▪ Preprocessing and Scaling
▪ Dimensionality Reduction, Feature Extraction
▪ Principal Component Analysis (PCA)
▪ Clustering
▪ K-Means
▪ Model evaluation and improvement
▪ Cross validation, Grid search, Evaluation metrics and scoring
▪ Working with text data

Module 5: NLP & Recommender Systems:

▪ Corpus
▪ Text preprocessing using Bag of words technique
▪ TF(Term Frequency)
▪ IDF(Inverse Document Frequency)
▪ Normalization
▪ Vectorization
▪ NLP with Python
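
The TF, IDF, and normalization topics above can be illustrated with a minimal sketch. It assumes whitespace tokenization, the unsmoothed definition idf = log(N / df), and L2 normalization; these choices and the names `tf`, `idf`, and `tf_idf` are illustrative assumptions, not the course's own material, and library implementations often use a smoothed variant.

```python
import math
from collections import Counter


def tf(document):
    """Term frequency: share of the document's tokens taken by each term."""
    tokens = document.lower().split()
    counts = Counter(tokens)
    return {term: count / len(tokens) for term, count in counts.items()}


def idf(documents):
    """Inverse document frequency, assumed here as log(N / df)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each term once per document, however often it occurs.
        doc_freq.update(set(doc.lower().split()))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(documents):
    """Vectorize each document as an L2-normalized term -> weight mapping."""
    idf_scores = idf(documents)
    vectors = []
    for doc in documents:
        weights = {t: f * idf_scores[t] for t, f in tf(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: (w / norm if norm else 0.0) for t, w in weights.items()})
    return vectors


# Hypothetical two-document corpus for illustration.
corpus = ["spark streams data", "hadoop stores data"]
vectors = tf_idf(corpus)
```

Under these assumptions, a term that occurs in every document (here "data") gets weight 0, while terms unique to one document carry all of that document's weight.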

Hadoop Developer Course

During this course you will learn:

• Linux (Ubuntu/CentOS) - Tips and Tricks

• Basic Java Programming – Core Java OOP Concepts
• Introduction to Big Data and Hadoop
• Hadoop ecosystem concepts
• Hadoop MapReduce concepts and features
• Developing MapReduce applications
• Pig concepts
• Hive concepts
• Impala
• Oozie workflow concepts
• Sqoop Data Ingestion
• Flume Agents
• Tableau Visualization
• HBase concepts
• Real Time tools like Hue, PuTTY, FileZilla, Cloudera Manager
• Real Time Projects

Linux (Ubuntu/CentOS) - Tips and Tricks

Basic (Core) Java Programming Concepts – OOP

Introduction to Big Data and Hadoop

• What is Big Data?
• What are the challenges for processing big data?
• What is Hadoop?
• Why Hadoop?
• History of Hadoop
• Hadoop ecosystem
• HDFS
• MapReduce

Understanding the Cluster

• Hadoop 2.x Architecture
• Typical workflow
• HDFS Commands
• Writing files to HDFS
• Reading files from HDFS
• Rack awareness
• Hadoop daemons

Let's talk MapReduce

• Before MapReduce
• MapReduce overview
• Word count problem
• Word count flow and solution
• MapReduce flow
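
The word count flow listed above can be sketched in plain Python to show the map, shuffle-and-sort, and reduce stages. This is an in-memory illustration only, not Hadoop code; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` and the sample input are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle and sort: group emitted values by key, ordered by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in sequence over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: each string stands in for one record of an input split.
counts = word_count(["big data big cluster", "data node"])
```

In a real cluster the mappers and reducers run as separate tasks on different nodes and the framework performs the shuffle; here all three stages run in a single process to make the data flow visible.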

Developing the MapReduce Application

• Data Types
• File Formats
• Explain the Driver, Mapper and Reducer code
• Configuring development environment - Eclipse
• Writing unit tests
• Running locally
• Running on cluster
• Hands on exercises

How MapReduce Works

• Anatomy of MapReduce job run
• Job submission
• Job initialization
• Task assignment
• Job completion
• Job scheduling
• Job failures
• Shuffle and sort
• Hands on exercises

MapReduce Types and Formats

• File Formats – Sequence Files
• Compression Techniques
• Input Formats - Input splits & records, text input, binary input
• Output Formats - text output, binary output, lazy output
• Hands on exercises

MapReduce Features

• Counters
• Side data distribution
• MapReduce combiner
• MapReduce partitioner
• MapReduce distributed cache
• Hands-on exercises

Hive
• Hive Architecture
• Types of Metastore
• Hive Data Types
• HiveQL
• File Formats – Parquet, ORC, Sequence and Avro Files Comparison
• Partitioning & Bucketing
• Hive JDBC Client
• Hive UDFs
• Hive SerDes
• Hive on Tez
• Hands-on exercises
• Integration with Tableau

Pig
• Pig Architecture
• Pig Data Types
• Load/Store Functions
• PigLatin
• Pig UDFs

HBase

• HBase architecture and concepts

• HBase Data Model
• HBase Shell Interface
• HBase Java API

Sqoop
• Sqoop Architecture
• Sqoop Import Command Arguments, Incremental Import
• Sqoop Export
• Sqoop Jobs
• Hands-on exercises

Flume
• Flume Architecture
• Flume Agent Setup
• Types of sources, channels, sinks
• Multi-Agent Flow
• Hands-on exercises

Oozie
• Oozie Fundamentals
• Oozie workflow creation
• Oozie Job submission, monitoring, debugging
• Concepts on Coordinators and Bundles
• Hands-on exercises
Case Study Discussions

Any one of the Four Projects

• Log File Analysis covering Flume, HDFS, MR/Pig, Hive, Tableau
• Crime Data Analysis covering Oozie, Sqoop, HDFS, Hive, HBase, RESTful Client

• Hadoop Use Cases in Insurance Domain

• Hadoop Use Cases in Retail Domain

Scala or Python, Spark
➢ Understand the difference between Apache Spark and Hadoop
➢ Learn Scala and its programming implementation

✓ Why Scala or Python

✓ Scala Installation
✓ Get deep insights into the functioning of Scala
✓ Execute Pattern Matching in Scala
✓ Functional Programming in Scala – Closures, Currying, Expressions, Anonymous Functions
✓ Know the concepts of classes in Scala
✓ Object Orientation in Scala – Primary, Auxiliary Constructors, Singleton & Companion Objects
✓ Traits and Abstract classes in Scala
✓ Scala Simple Build Tool – SBT
✓ Building with Maven

➢ Spark Basics

✓ What is Apache Spark?

✓ Spark Installation
✓ Spark Configuration
✓ Spark Context
✓ Using Spark Shell
✓ Resilient Distributed Datasets (RDDs) – Features, Partitions, Tuning Parallelism
✓ Functional Programming with Spark

➢ Working with RDDs

✓ RDD Operations - Transformations and Actions
✓ Types of RDDs
✓ Key-Value Pair RDDs – Transformations and Actions
✓ MapReduce and Pair RDD Operations
✓ Serialization

➢ Spark on a cluster

✓ Overview
✓ A Spark Standalone Cluster
✓ The Spark Standalone Web UI
✓ Executors & Cluster Manager
✓ Spark on YARN Framework

➢ Writing Spark Applications

✓ Spark Applications vs. Spark Shell

✓ Creating the SparkContext
✓ Configuring Spark Properties
✓ Building and Running a Spark Application
✓ Logging
✓ Spark Job Anatomy

➢ Caching and Persistence

✓ RDD Lineage
✓ Caching Overview
✓ Distributed Persistence

➢ Improving Spark Performance

✓ Shared Variables: Broadcast Variables

✓ Shared Variables: Accumulators
✓ Per Partition Processing
✓ Common Performance Issues

➢ Spark API for different File Formats & Compression Codecs

✓ Text
✓ CSV
✓ Sequence
✓ Parquet
✓ ORC
✓ Compression Techniques – Snappy, Zlib, Gzip

➢ Spark SQL
✓ Spark SQL Overview
✓ HiveContext
✓ SQL Datatypes
✓ DataFrames vs RDDs
✓ Operations on DataFrames
✓ Parquet Files with Spark SQL – Read, Write, Partitioning, Merging Schema
✓ ORC Files
✓ JSON Files
✓ Inferring Schema programmatically
✓ Custom Case Classes
✓ Temp Tables vs Persistent Tables
✓ Writing UDFs
✓ Hive Support
✓ JDBC Support - Examples
✓ HBase Support - Examples
➢ Spark Streaming

✓ Spark Streaming Overview

✓ Example: Streaming Word Count
✓ Other Streaming Operations
✓ Sliding Window Operations
✓ Developing Spark Streaming Applications – Integration with Kafka and HBase

Complementary Course: AWS
