Data Engineering Notes
1) Business Analytics
2) Real-world scenarios, case studies
3) SQL/Databases/Data Modelling
4) Cloud Service providers
5) Data Warehouse, Data Lake and their Architecture
6) ETL (Extract, Transform, Load) - Informatica, Data Quality
7) Data Engg. - Big Data, PySpark, Kafka
8) Cloud Engg. - AWS, Azure, GCP, Data Visualization
9) Power BI
10) Adv. Python Programming
11) GitHub repositories
Data Analyst: someone who collects, cleans and interprets data in order to solve a
particular problem.
They can work in different industries such as business, finance, justice,
science, medicine, govt., etc.
1. Identify
2. Collect
3. Clean
4. Analyze
5. Represent/interpret
1. Business Analyst
2. Market Research Analyst
3. Medical and Healthcare Analyst
1. Database Technologies
2. Programming Languages (Python) (this step is cleaning)
3. Visualization
4. Statistical and Mathematical methods
5. Industry Knowledge
6. Problem solving
DATA ENGINEERING
This includes building systems for collecting, storing and analyzing data.
Organizations have the ability to collect massive amounts of data,
and they need the right people and technology to ensure the data is in a highly
usable state by the time it reaches the data scientists and analysts.
Diff. between data scientists and analysts: an analyst interprets existing data,
while a scientist also thinks about the future probabilities/possibilities of the data.
Tasks:
1. Acquiring Data
2. Develop Algorithms
3. Build, test, and maintain database pipeline architectures
4. Ensure compliance with data governance and security policies
Define goal --> get the data --> clean the data --> enrich the data --> find
insights and visualize --> deploy ML --> Iterate
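The clean, enrich and insight stages of the workflow above can be sketched on a toy dataset. This is only a minimal illustration: the records, the field names (region, sales), the 200.0 threshold and the helper names (clean, enrich, find_insights, run_pipeline) are all assumptions made for the example, not part of any library or of these notes.

```python
from collections import defaultdict

# Hypothetical raw records; None and untidy strings stand in for dirty data.
RAW_ROWS = [
    {"region": " north ", "sales": "100"},
    {"region": "South", "sales": "250"},
    {"region": "north", "sales": None},
    {"region": "south", "sales": "50"},
]


def clean(rows):
    """Drop rows with missing sales and normalise text and types."""
    cleaned = []
    for row in rows:
        if row["sales"] is None:
            continue
        cleaned.append({
            "region": row["region"].strip().lower(),
            "sales": float(row["sales"]),
        })
    return cleaned


def enrich(rows, threshold=200.0):
    """Add a derived field flagging large sales (threshold is an assumption)."""
    return [{**row, "is_large": row["sales"] >= threshold} for row in rows]


def find_insights(rows):
    """Aggregate total sales per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["sales"]
    return dict(totals)


def run_pipeline(rows):
    """Chain the stages: clean --> enrich --> find insights."""
    return find_insights(enrich(clean(rows)))


if __name__ == "__main__":
    print(run_pipeline(RAW_ROWS))
```

In practice each stage would be rerun as the goal or the data changes, which is the "Iterate" step.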
Data Warehousing - the process of collecting and managing data from varied sources to
provide meaningful business insights.
1. source
2. processing steps
3. destination
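The source, processing and destination stages listed above can be sketched as a small extract-transform-load run. This is a rough sketch under assumptions: the CSV text, the orders table and the function names (extract, transform, load, run_etl) are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source: CSV text standing in for a file or an API export.
SOURCE_CSV = """order_id,amount
1,10.50
2,abc
3,4.25
"""


def extract(text):
    """Source: read rows from CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Processing: keep only rows whose fields parse as numbers."""
    out = []
    for row in rows:
        try:
            out.append((int(row["order_id"]), float(row["amount"])))
        except ValueError:
            continue  # skip malformed rows such as amount "abc"
    return out


def load(rows, conn):
    """Destination: write the rows to a table in the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run all three stages and return (row count, total amount)."""
    load(transform(extract(SOURCE_CSV)), conn)
    return conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a warehouse
    print(run_etl(conn))
    conn.close()
```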
PAY AS YOU GO
5 Vs of Big Data:
1. Variety
2. Velocity
3. Volume
4. Value
5. Veracity
###Day 2 of FT###
Car and specifics: class = Car; cubic capacity, no. of seats = attributes
functions = fwd, rear driving, parking
classes - properties = Hyundai, Toyota
model name
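class Car:
    def __init__(self, capacity, number_of_seats):
        self.capacity = capacity
        self.number_of_seats = number_of_seats

    # __init__ must return None, so the messages move into methods
    def capacity_info(self):
        return f'the capacity is {self.capacity}'

    def seats_info(self):
        return f'the no. of seats are {self.number_of_seats}'

class Model_name(Car):
    # Inherits capacity and number_of_seats from Car
    def Hyundai(self, name1):
        self.name = name1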
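The car notes above map onto classes, attributes, methods and inheritance. The following is one possible sketch, not the only way to model it: the describe method, the Hyundai subclass, the model_name attribute and the example values are assumptions introduced for illustration.

```python
class Car:
    """Base class: attributes shared by every car."""

    def __init__(self, capacity, number_of_seats):
        self.capacity = capacity                # cubic capacity in cc
        self.number_of_seats = number_of_seats

    def describe(self):
        return f"capacity {self.capacity} cc, {self.number_of_seats} seats"


class Hyundai(Car):
    """Subclass: inherits Car's attributes and adds a model name."""

    def __init__(self, capacity, number_of_seats, model_name):
        # Let the base class set the shared attributes first.
        super().__init__(capacity, number_of_seats)
        self.model_name = model_name

    def describe(self):
        # Extend, rather than replace, the base-class description.
        return f"Hyundai {self.model_name}: {super().describe()}"


if __name__ == "__main__":
    car = Hyundai(1197, 5, "i20")  # example values only
    print(car.describe())
```

Calling super().__init__ keeps the shared attributes defined in one place, so a Toyota subclass could reuse them the same way.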
Veracity
Velocity
Volume