
How To Build Data Pipelines On AWS - Reference Workflow

This document provides a seven-step process for building a data pipeline in AWS using various AWS services: 1. Use AWS Database Migration Service to load data into AWS from other databases or systems. 2. Store the data in Amazon S3 for scalability and low cost. 3. Use the AWS Glue Data Catalog to organize the data into tables and provide schema on read. 4. Query the tables using Amazon Athena for interactive analysis. 5. Process and analyze the data using Amazon EMR and big data frameworks. 6. Automate the workflow using Amazon MWAA, which manages Apache Airflow workflows. 7. Build visualizations of metrics and dashboards using Amazon QuickSight for business analytics.


Kamran Ali

@aTechGuide

Let’s look into how to build data pipelines in AWS using 7 simple services [2023 Edition]

atech.guide
Buckle up to build an end-to-end data pipeline!
First Question …

How to load data into AWS?
1. AWS Database Migration Service

1. It can be used for a snapshot import of data or for setting up continuous replication.

2. It supports both homogeneous and heterogeneous data migrations.

3. It offers features such as schema conversion, data validation, and error handling.
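The snapshot-versus-replication choice maps onto the `MigrationType` parameter of the DMS `create_replication_task` API. The sketch below only assembles that request as a Python dict, so it runs without AWS access. It assumes a replication instance and source/target endpoints already exist; the helper name `build_replication_task_params`, the task name, the ARNs, and the schema are hypothetical placeholders.

```python
import json

# MigrationType values accepted by AWS DMS create_replication_task:
#   "full-load"         -> one-off snapshot import
#   "cdc"               -> ongoing change data capture only
#   "full-load-and-cdc" -> snapshot first, then continuous replication
MIGRATION_TYPES = {
    "snapshot": "full-load",
    "continuous": "full-load-and-cdc",
}


def build_replication_task_params(task_id, source_arn, target_arn,
                                  instance_arn, schema, mode="snapshot"):
    """Hypothetical helper: kwargs for DMS create_replication_task.

    All ARNs, the task id, and the schema name are caller-supplied placeholders.
    """
    if mode not in MIGRATION_TYPES:
        raise ValueError(f"unknown mode: {mode!r}")
    # One selection rule: include every table ("%") in the source schema.
    table_mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-all-tables",
                "object-locator": {"schema-name": schema, "table-name": "%"},
                "rule-action": "include",
            }
        ]
    }
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        "MigrationType": MIGRATION_TYPES[mode],
        # DMS expects the table mappings as a JSON string, not a dict.
        "TableMappings": json.dumps(table_mappings),
    }
```

With boto3 installed and credentials configured, the dict would be passed as `boto3.client("dms").create_replication_task(**params)`.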

We have a mechanism to pull data.

But …
Where to save the data?
2. Amazon S3

1. S3 stands for Simple Storage Service.

2. It is a distributed object store.

3. It is highly available, scalable, and secure.

4. We can build a data lake and do data backups at low cost.
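A data lake on S3 is usually organised through key naming alone, since S3 has no real directories. A minimal sketch, assuming a Hive-style `year=/month=/day=` partition layout (a common convention that Athena and Glue can use for partitions, not something S3 enforces); the helper name `data_lake_key`, the zone, the dataset, and the bucket name are hypothetical.

```python
from datetime import date


def data_lake_key(zone, dataset, day, filename):
    """Hypothetical helper: build an S3 key with Hive-style date partitions.

    S3 treats the whole key as an opaque string; the zone/dataset/
    year=/month=/day= layout is only a naming convention.
    """
    return (
        f"{zone}/{dataset}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
        f"{filename}"
    )


# Example: key for one day's raw orders file.
key = data_lake_key("raw", "orders", date(2023, 5, 1), "part-0000.parquet")
```

The file could then be uploaded with `boto3.client("s3").upload_file("part-0000.parquet", "my-data-lake", key)`, where `my-data-lake` is a placeholder bucket.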

Can we build tables on top of raw data?
A schema on read will be helpful.

How to do that?
3. AWS Glue Data Catalog

1. The Glue Data Catalog is a scalable collection of tables organised into databases.

2. It is a uniform repository where disparate systems can store and find metadata to keep track of data in data silos.
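Schema on read means the catalog stores only a description of the files (columns, format, S3 location) while the data itself stays untouched in S3. As a sketch, assuming Parquet files and the hypothetical table, database, and path names below, this builds the `TableInput` argument of the Glue `create_table` API using the standard Hive Parquet input/output formats and SerDe. In practice a Glue crawler can infer the same metadata automatically.

```python
def parquet_table_input(name, location, columns, partition_keys=()):
    """Hypothetical helper: TableInput dict for Glue create_table.

    columns and partition_keys are sequences of (name, hive_type) pairs.
    EXTERNAL_TABLE: Glue holds only metadata; the files remain in S3
    and are interpreted with this schema at query time.
    """
    return {
        "Name": name,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "parquet"},
        "PartitionKeys": [{"Name": n, "Type": t} for n, t in partition_keys],
        "StorageDescriptor": {
            "Columns": [{"Name": n, "Type": t} for n, t in columns],
            "Location": location,
            "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
            },
        },
    }


table_input = parquet_table_input(
    "orders",
    "s3://my-data-lake/raw/orders/",  # placeholder location
    [("order_id", "string"), ("amount", "double")],
    [("year", "string"), ("month", "string"), ("day", "string")],
)
```

Registering it would look like `boto3.client("glue").create_table(DatabaseName="sales_lake", TableInput=table_input)`, with `sales_lake` as a placeholder database.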

How to query the tables?
4. Amazon Athena

1. Amazon Athena is an interactive query service to analyse data in S3.

2. We can use standard SQL to run ad-hoc queries and get results in seconds.
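Athena queries run asynchronously: you start a query, poll its state, and read the results from an S3 output location. A minimal sketch of that loop, assuming a boto3 Athena client is passed in (injected so the function can also be exercised with a stub); the helper name `run_athena_query`, the database name, and the output bucket are hypothetical.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Hypothetical helper: start an Athena query and block until it ends.

    `client` is expected to be boto3.client("athena").
    Returns the QueryExecutionId on success; raises RuntimeError otherwise.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            return query_id
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "")
            raise RuntimeError(f"Athena query {query_id} {state}: {reason}")
        # QUEUED or RUNNING: wait before polling again.
        time.sleep(poll_seconds)
```

For example, `run_athena_query(boto3.client("athena"), "SELECT count(*) FROM orders", "sales_lake", "s3://my-athena-results/")`, after which rows can be fetched with `get_query_results(QueryExecutionId=query_id)`. A production version would also cap the number of polls.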

Now, we need to

1. Clean the data

2. Join it with other tables

3. Aggregate data

4. …

How to do the compute?
5. Amazon EMR

1. Amazon EMR is a managed cluster platform that simplifies running big data frameworks such as Apache Hadoop and Apache Spark.

2. We can use these frameworks and other open-source projects (e.g. Zeppelin) to process data for analytics purposes and business intelligence workloads.
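The clean, join, and aggregate work is typically packaged as a Spark script and submitted to a running EMR cluster as a step. As a sketch, assuming a PySpark script already uploaded to S3 (the helper name `spark_step`, the script path, the step name, and the cluster id are hypothetical), this builds one step definition for the EMR `add_job_flow_steps` API, using `command-runner.jar` to invoke `spark-submit` on the cluster.

```python
def spark_step(name, script_uri, script_args=(), action_on_failure="CONTINUE"):
    """Hypothetical helper: one step definition for EMR add_job_flow_steps.

    command-runner.jar is provided by EMR and runs the given command
    (here spark-submit) on the cluster; script_uri is a PySpark script in S3.
    """
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     script_uri, *script_args],
        },
    }


step = spark_step(
    "daily-clean-join-aggregate",
    "s3://my-data-lake/jobs/clean_join_aggregate.py",  # placeholder script
    ["--date", "2023-05-01"],
)
```

It would be submitted with `boto3.client("emr").add_job_flow_steps(JobFlowId="j-XXXXXXXXXXXX", Steps=[step])`, where the cluster id is a placeholder.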

We have all the pieces, but how do we run everything automatically and in sequence?
How to automate the workflow?
6. Amazon MWAA

1. Amazon MWAA stands for Amazon Managed Workflows for Apache Airflow.

2. It is a managed service for Apache Airflow that we can use to build and manage our workflows in the cloud.
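What Airflow (and therefore MWAA) contributes is a DAG: each task declares which tasks must finish first, and the scheduler runs them in an order that respects those dependencies. The sketch below is not Airflow code; it models the same idea with Python's standard-library `graphlib` (Python 3.9+) so the ordering can be checked locally. The task names and branch structure are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks that must
# finish before it can start.
PIPELINE = {
    "dms_load": set(),
    "glue_catalog_update": {"dms_load"},
    "emr_transform": {"glue_catalog_update"},
    "athena_validate_raw": {"glue_catalog_update"},
    "quicksight_refresh": {"emr_transform", "athena_validate_raw"},
}


def run_order(dependencies):
    """Return one valid execution order for the task graph.

    Raises graphlib.CycleError (a ValueError subclass) if the graph has a cycle,
    which is also why Airflow requires workflows to be acyclic.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(PIPELINE)
```

In an Airflow DAG file deployed to MWAA, the same edges would typically be written with Airflow's bitshift syntax, e.g. `dms_load >> glue_catalog_update >> [emr_transform, athena_validate_raw] >> quicksight_refresh`, with each name bound to an operator.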

Also, we need to build graphs on top of the aggregated metrics, and dashboards containing those graphs.
How to build visualisations?
7. Amazon QuickSight

1. It is a fast business analytics service for building visualisations.

2. It can discover AWS data sources and enables organisations to scale to hundreds of thousands of users.

3. It uses SPICE (Super-fast, Parallel, In-memory Calculation Engine) to deliver fast, responsive query performance.
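Dashboards backed by SPICE only show data as fresh as the last SPICE refresh, so a pipeline usually triggers one after the aggregates land. A minimal sketch, assuming an existing SPICE dataset and an injected boto3 QuickSight client; the helper name `refresh_spice_dataset`, the account id, and the dataset id are placeholders. It calls `create_ingestion`, which requires a caller-chosen `IngestionId`.

```python
import uuid


def refresh_spice_dataset(client, account_id, dataset_id):
    """Hypothetical helper: start a SPICE ingestion (refresh) for a dataset.

    `client` is expected to be boto3.client("quicksight"). A random UUID is
    used as the IngestionId; it is returned so the caller can poll status.
    """
    ingestion_id = str(uuid.uuid4())
    client.create_ingestion(
        AwsAccountId=account_id,
        DataSetId=dataset_id,
        IngestionId=ingestion_id,
    )
    return ingestion_id
```

Progress can then be polled with `client.describe_ingestion(AwsAccountId=..., DataSetId=..., IngestionId=ingestion_id)`, for example as the last task of the workflow.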

We have an end-to-end pipeline 🥳
Over to you:

If we are using Tableau, how should our pipeline change?

Let me know in the comments.
Want to be part of the community?

Join the Discord: https://discord.gg/Pc9ed8krYK

That’s a Wrap

1. Follow me on LinkedIn for more valuable content - I’m Kamran Ali

2. Turn on the bell notification on my profile
