Big Data
  Volume
  Velocity
  Variety: unstructured data, images, videos, text, tables
Data Lake
  Traditional Storage Platform: hard disk, computer, mobile
  Cloud Storage Platforms: Google Drive, S3 buckets
Scalability
  Horizontal: increase the number of computers, so total processing power increases
  Vertical: increase the processing power of your current computer
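
A minimal sketch of horizontal scaling: the same job is split into shards and handed to more workers. The names `process_shard` and `scale_out` are hypothetical, and threads stand in for separate computers here, so this shows only the structure of scaling out, not a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def process_shard(shard):
    # Work done independently by one worker on its share of the data.
    return sum(x * x for x in shard)


def scale_out(data, workers):
    # Split the data into one shard per worker, process the shards
    # concurrently, then combine the partial results.
    shards = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_shard, shards))


data = list(range(1000))
one_worker = scale_out(data, 1)    # a single machine does everything
four_workers = scale_out(data, 4)  # same job spread over four workers
print(one_worker == four_workers)
```

Vertical scaling would leave `workers` at 1 and make that single worker faster (more CPU, more RAM) instead.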
Traditional Data Warehouse
SQL
  MySQL
  Microsoft SQL Server
  PL/SQL
Type of Activities
  OLAP: Online Analytical Processing, column-major format
  OLTP: Online Transactional Processing, row-major format (e.g. MySQL)
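
The row-major vs column-major difference can be sketched with plain Python containers. The order records below are made-up sample data, and real engines store these layouts on disk rather than in lists.

```python
# Three hypothetical order records: (order_id, customer, amount).
rows = [
    (1, "alice", 30.0),
    (2, "bob", 45.5),
    (3, "alice", 12.5),
]

# Row-major (OLTP style): each record is stored together,
# so reading or updating one whole order touches a single entry.
order_2 = rows[1]

# Column-major (OLAP style): each column is stored together,
# so an aggregate reads only the one column it needs.
columns = {
    "order_id": [r[0] for r in rows],
    "customer": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}
total_amount = sum(columns["amount"])

print(order_2)
print(total_amount)
```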
Structured Data (RDBMS)
  Ways to define the schema: Data Model (OLAP)
    Star Schema
    Snowflake Schema
    Normalization
    Denormalization
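
A small star schema sketch using Python's built-in `sqlite3`: one fact table pointing at one dimension table. The table and column names (`fact_sales`, `dim_product`) and the rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: descriptive attributes, one row per product.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        category   TEXT NOT NULL
    );
    -- Fact table: numeric measures plus a key into the dimension.
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
        quantity   INTEGER NOT NULL,
        revenue    REAL NOT NULL
    );
    INSERT INTO dim_product VALUES (1, 'pen', 'stationery'), (2, 'lamp', 'home');
    INSERT INTO fact_sales VALUES (1, 1, 10, 15.0), (2, 2, 1, 40.0), (3, 1, 4, 6.0);
""")

# Typical analytical query: join the fact to its dimension and aggregate.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(revenue_by_category)
```

Here `category` is kept as a plain column inside `dim_product` (denormalized). A snowflake schema would normalize it into its own `dim_category` table that `dim_product` references.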
Data Engineering
How To Store the Data?
  Schema on Write
    Ways to define the schema in OLTP (RDBMS)
      No dimensional data model (star or snowflake) is required
      You must still have a schema design
      ER diagrams: we design the schema
Data Warehouse
  Cloud Based Data Warehouse
    Snowflake
    Google BigQuery
    Amazon Redshift
  Hive: Hadoop-based SQL platform
  Databricks: Spark SQL
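
Schema on write means the schema is defined first and every row is checked against it when it is stored. A minimal sketch with `sqlite3`, assuming a hypothetical `customers` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema is fixed before any data arrives.
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        email       TEXT NOT NULL,
        age         INTEGER CHECK (age >= 0)
    )
""")

# A row that matches the schema is accepted.
conn.execute("INSERT INTO customers VALUES (1, 'a@example.com', 31)")

# A row that breaks the schema (missing email) is rejected at write time.
rejected = False
try:
    conn.execute("INSERT INTO customers VALUES (2, NULL, 40)")
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(rejected, row_count)
```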
Types of Programming
  Declarative: you define the output, and SQL takes care of the steps
  Imperative: you define the steps in Python or another programming language to get the output
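
The two styles can be compared on the same task, total amount per region. The `sales` data is made up, and `sqlite3` is used only as a convenient SQL engine.

```python
import sqlite3

sales = [("north", 10), ("south", 7), ("north", 5)]

# Imperative: we spell out every step of the computation.
totals_imperative = {}
for region, amount in sales:
    totals_imperative[region] = totals_imperative.get(region, 0) + amount

# Declarative: we describe the result we want; the SQL engine picks the steps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
totals_declarative = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
)

print(totals_imperative == totals_declarative)
```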
Schema on Read
  Structured + semi-structured + unstructured data
    NoSQL Database (no fixed schema)
      MongoDB
      Cassandra
      HBase
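
Schema on read means the data is stored as-is and a structure is applied only when it is read. A minimal sketch with JSON documents, where the documents and the `read_age` helper are hypothetical:

```python
import json

# Documents are stored without a shared schema: fields differ per record.
raw_documents = [
    '{"user": "alice", "age": 31}',
    '{"user": "bob", "tags": ["new"]}',
    '{"user": "carol", "age": "27", "city": "Pune"}',
]


def read_age(raw):
    # The reader decides how to interpret each document:
    # a missing age becomes None, a string age is converted to int.
    doc = json.loads(raw)
    age = doc.get("age")
    return int(age) if age is not None else None


ages = [read_age(raw) for raw in raw_documents]
print(ages)
```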
Data Pipeline (ETL)
  1. [E] We extract the data from the source: website/Excel/API
  2. [T] We transform the data: Python/Alteryx/Power Query
  3. [L] We load the data: SQL/warehouse
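
The three ETL steps above can be sketched with the standard library. The CSV text stands in for a real source, an in-memory SQLite database stands in for the warehouse, and the function and table names are assumptions.

```python
import csv
import io
import sqlite3

# Stand-in for a source file or API response (made-up data).
SOURCE_CSV = "name,amount\n alice ,10\nbob,not_a_number\nCarol,5\n"


def extract(text):
    # [E] Read raw records from the source.
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    # [T] Clean names and drop rows whose amount is not numeric.
    clean = []
    for rec in records:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue
        clean.append((rec["name"].strip().lower(), amount))
    return clean


def load(conn, rows):
    # [L] Write only the cleaned rows into the warehouse table.
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute("SELECT name, amount FROM payments ORDER BY name").fetchall()
print(loaded)
```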
ELT Pipeline
  1. [E] We extract the data from the source: website/Excel/API
  2. [L] We load the raw data: data lake/S3/hard disk
  3. [T] We transform the data: Python/Alteryx/Power Query
  4. [L] We load the transformed data: SQL/warehouse
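
In ELT the raw data lands in storage first and is transformed afterwards. A minimal sketch, assuming a temporary directory as the data lake and an in-memory SQLite database as the warehouse. All function names and records are hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def extract():
    # [E] Stand-in for pulling records from a website, Excel file, or API.
    return [
        {"name": " Alice ", "amount": "10"},
        {"name": "bob", "amount": "oops"},
        {"name": "Carol", "amount": "5"},
    ]


def load_raw(lake_dir, records):
    # [L] Store the records untouched in the data lake.
    path = Path(lake_dir) / "payments_raw.json"
    path.write_text(json.dumps(records))
    return path


def transform(raw_path):
    # [T] Read back from the lake, clean names, drop non-numeric amounts.
    clean = []
    for rec in json.loads(raw_path.read_text()):
        try:
            clean.append((rec["name"].strip().lower(), float(rec["amount"])))
        except ValueError:
            continue
    return clean


def load_warehouse(conn, rows):
    # [L] Write the transformed rows into the warehouse table.
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)


with tempfile.TemporaryDirectory() as lake_dir:
    raw_path = load_raw(lake_dir, extract())
    raw_count = len(json.loads(raw_path.read_text()))  # raw copy keeps every record
    conn = sqlite3.connect(":memory:")
    load_warehouse(conn, transform(raw_path))
    loaded = conn.execute("SELECT name, amount FROM payments ORDER BY name").fetchall()

print(raw_count, loaded)
```

Because the raw copy keeps all three records, the transform step can be changed and re-run later without extracting from the source again.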