
Data Engineering Notes

The document provides an overview of a 20-day foundational training program covering topics related to business analytics, data engineering, cloud technologies, and programming. It discusses key concepts like data analysis processes, types of data analysts, requirements for becoming a data analyst, and characteristics of data pipelines. Example code is also provided to define a car class with attributes and methods in Python.

###Day 1 of Foundational Training###

11 Topics to learn in 20 Days:

1) Business Analytics
2) Real-world Scenarios, Case Studies
3) SQL/Databases/Data Modelling
4) Cloud Service Providers
5) Data Warehouse, Data Lake and their Architecture
6) ETL (Extract, Transform, Load) - Informatica, Data Quality
7) Data Engg. - Big Data, PySpark, Kafka
8) Cloud Engg. - AWS, Azure, GCP, Data Visualization
9) Power BI
10) Adv. Python Programming
11) GitHub repositories

Data Analyst: someone who collects, cleans and interprets data in order to solve a
particular problem.
They can work in different industries such as business, finance, justice,
science, medicine, government, etc.

Data Analysis: getting insights from data. Steps include:

1. Identify
2. Collect
3. Clean
4. Analyze
5. Represent/interpret
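
As a rough illustration of steps 2 to 5, here is a minimal sketch in Python. The sample records, the field names and the helper names (clean, analyze, represent) are made up for this example and are not part of the training material.

```python
from statistics import mean

# Hypothetical raw records, as if they had already been collected (step 2)
raw_sales = [
    {"region": "north", "amount": "120.5"},
    {"region": "south", "amount": ""},  # missing value
    {"region": "north", "amount": "79.5"},
    {"region": "south", "amount": "200"},
]


def clean(records):
    """Step 3: drop rows with a missing amount and convert amounts to float."""
    return [
        {"region": r["region"], "amount": float(r["amount"])}
        for r in records
        if r["amount"].strip()
    ]


def analyze(records):
    """Step 4: compute the average amount per region."""
    by_region = {}
    for r in records:
        by_region.setdefault(r["region"], []).append(r["amount"])
    return {region: mean(amounts) for region, amounts in by_region.items()}


def represent(summary):
    """Step 5: present the result as simple text lines, sorted by region."""
    return [f"{region}: {avg:.2f}" for region, avg in sorted(summary.items())]


report = represent(analyze(clean(raw_sales)))
print("\n".join(report))
```

In practice the cleaning and analysis would more likely be done with a library or a BI tool; plain Python is used here only to keep the sketch self-contained.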

Types of data analysts:

1. Business Analyst
2. Market Research Analyst
3. Medical and Healthcare Analyst

Requirements for becoming a data analyst:

1. Database Technologies
2. Programming Languages (Python), used mainly in the cleaning step
3. Visualization
4. Statistical and Mathematical methods
5. Industry Knowledge
6. Problem solving

Business Analyst: someone who looks into business insights.

Their day-to-day roles include:

- understanding the strategies, goals and requirements of the business
- creating financial models to support business decisions
- data visualization/representation (charts, pie charts)
- identifying and prioritizing the requirements
- analyzing large data sets with Excel, Microsoft Power BI, SQL, Tableau, Python
(Jupyter notebooks), etc.

DATA ENGINEERING

This includes building systems for collecting, storing and analyzing data.
Organizations have the ability to collect massive amounts of data,
and they need the right people and technology to ensure the data is in a highly
usable state by the time it reaches the data scientists and analysts.
The difference between data scientists and analysts is that a scientist thinks about
the future probabilities/possibilities of the data, etc.

Hence a data engineer works in a variety of settings, building systems that collect,
store and manipulate data and convert this raw data into usable information.
They make data accessible so that organizations can use it to evaluate and optimize
their performance.

Tasks:

1. Acquire data
2. Develop algorithms
3. Build, test, and maintain database pipeline architectures
4. Ensure compliance with data governance and security policies

Typical flow in an analysis project:

Define goal --> get the data --> clean the data --> enrich the data --> find
insights and visualize --> deploy ML --> iterate

BI tools - application software which collects and processes large amounts of
unstructured data from internal and external systems, e.g., Power BI

Data Warehousing - process for collecting and managing data from varied sources to
provide meaningful business insights.

Data Pipeline - essentially the series of steps involved in aggregating,
organizing and moving data.
Modern data pipelines automate many of the manual steps involved in
transforming and optimizing continuous data loads.

Data pipelines are especially important for organizations that:

- rely on real-time data analysis
- store data in the cloud
- house data in multiple sources

Elements of a data pipeline:

1. Source
2. Processing steps
3. Destination
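
The three elements can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV sample, the function names (source, drop_incomplete, to_numeric, run_pipeline) and the list standing in for a warehouse table are all hypothetical.

```python
import csv
import io


def source(csv_text):
    """Source: yield rows from CSV text (stands in for a file, API or database)."""
    yield from csv.DictReader(io.StringIO(csv_text))


def drop_incomplete(rows):
    """Processing step: skip rows that contain an empty field."""
    for row in rows:
        if all(value.strip() for value in row.values()):
            yield row


def to_numeric(rows):
    """Processing step: convert the 'amount' column from text to float."""
    for row in rows:
        yield {**row, "amount": float(row["amount"])}


def run_pipeline(rows, steps, destination):
    """Chain the processing steps in order and load the result into the destination."""
    for step in steps:
        rows = step(rows)
    destination.extend(rows)
    return destination


SAMPLE = "id,amount\n1,10.5\n2,\n3,4.5\n"  # made-up input; row 2 has no amount
warehouse = []  # Destination: a plain list standing in for a warehouse table
run_pipeline(source(SAMPLE), [drop_incomplete, to_numeric], warehouse)
print(warehouse)
```

Each step is a generator, so rows flow through one at a time instead of being loaded all at once, which loosely mirrors the continuous processing that real pipelines aim for.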

Characteristics to look for when considering a data pipeline:

1. Continuous and extensible data processing
2. High availability and disaster recovery
3. The elasticity and agility of the cloud
4. Self-service management

Snowflake is a popular cloud data platform and one of the important services used for data pipelining.

COMPUTATION COST IN CLOUD:

Pay as you go: you are billed only for the resources you actually use.
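
A small arithmetic sketch of the pay-as-you-go idea follows. The hourly rate and the usage figures are invented for illustration and do not reflect any provider's actual pricing; the function name is also hypothetical.

```python
def pay_as_you_go_cost(hours_used, rate_per_hour):
    """Bill only for the hours actually used."""
    return hours_used * rate_per_hour


HYPOTHETICAL_RATE = 0.10  # made-up price per compute hour, not a real tariff

# A job that runs 40 hours in a month versus a machine kept on all month
on_demand = pay_as_you_go_cost(hours_used=40, rate_per_hour=HYPOTHETICAL_RATE)
always_on = pay_as_you_go_cost(hours_used=24 * 30, rate_per_hour=HYPOTHETICAL_RATE)

print(f"40 hours used: {on_demand:.2f}")
print(f"running all month: {always_on:.2f}")
```

The point of the comparison is that under this model the bill scales with usage, so shutting resources down when they are idle directly lowers the cost.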

5 Vs of Big Data:
1. Variety
2. Velocity
3. Volume
4. Value
5. Veracity

###Day 2 of Foundational Training###

login credentials for google rps cloud; 18

Car and specifics:
class = Car; attributes = cubic capacity, no. of seats
functions = FWD (front-wheel drive), RWD (rear-wheel drive), parking
subclass = model name; properties = Hyundai, Toyota

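class Car:
    def __init__(self, capacity, number_of_seats):
        # Attributes: cubic capacity and number of seats.
        # __init__ cannot return a string, so the messages live in describe().
        self.capacity = capacity
        self.number_of_seats = number_of_seats

    def describe(self):
        # Report both attributes in one message
        return (f'the capacity is {self.capacity}, '
                f'the no. of seats are {self.number_of_seats}')

    def fwd(self, drive_type1):
        # Record the drive type and report front-wheel drive
        self.drive_type = drive_type1
        return 'the car is a FW drive'

    def rwd(self, drive_type2):
        # Record the drive type and report rear-wheel drive
        self.drive_type = drive_type2
        return 'the car is an RW drive'

    def parking(self, parking):
        # Stored under a different name so the attribute does not replace this method
        self.parking_status = parking


class Model_name(Car):
    # Subclass of Car that records the model name for a given make
    def Hyundai(self, name1):
        self.name = name1

    def Toyota(self, name2):
        self.name = name2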

Big Data: characterized by the 3 Vs:

Variety
Velocity
Volume

HDFS - Hadoop Distributed File System

Topics for Assessment:

1. Data Engineering - done
2. Advanced Python - done
3. Big Data - done
4. Hadoop - done
5. Spark - done
6. PySpark - done
7. Cloud Technology - done
8. GCP - done
9. AWS - done
10. Azure - done
