
Intro to ETL


ETL Testing

• The data in a Data Warehouse system is loaded with an ETL
(Extract, Transform, Load) tool. As the name suggests, it
performs the following three operations:
• Extracts the data from your transactional system which can be
an Oracle, Microsoft, or any other relational database
• Transforms the data by performing data cleansing operations,
and then
• Loads the data into the OLAP data warehouse.
• You can also extract data from flat files, such as spreadsheets
and CSV files, using an ETL tool and load it into an OLAP data
warehouse for data analysis and reporting. Let us take an example
to understand it better.
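
As a minimal sketch of the three operations, assuming a small in-memory CSV as the flat-file source and an in-memory SQLite database standing in for the data warehouse (the column names, the sample rows, and the `orders` table are all hypothetical):

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source: a small CSV export with messy values.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,20
3,carol,not_a_number
"""


def extract(text):
    """Extract: read rows from the CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and normalize names, drop rows with invalid amounts."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # cleansing step: skip rows that fail validation
        name = row["customer"].strip().title()
        clean.append((int(row["order_id"]), name, amount))
    return clean


def load(rows, conn):
    """Load: write the cleansed rows into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the DW
load(transform(extract(RAW_CSV)), conn)
```

A real ETL tool adds scheduling, logging, and error handling around these steps; the sketch only shows the order in which the data moves.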
Difference between ETL and BI Tools
• An ETL tool is used to extract data from
different data sources, transform the data, and
load it into a DW system. A BI tool, by contrast,
is used to generate interactive and ad-hoc
reports for end-users, dashboards for senior
management, and data visualizations for monthly,
quarterly, and annual board meetings.

• The most common ETL tools include SAP
BusinessObjects Data Services (BODS), Informatica
PowerCenter, Microsoft SSIS, Oracle Data
Integrator (ODI), Talend Open Studio, and
CloverETL (open source).
• Some popular BI tools include SAP
BusinessObjects, SAP Lumira, IBM Cognos, Jaspersoft,
Microsoft BI Platform, Tableau, and Oracle
Business Intelligence Enterprise Edition.
ETL vs ELT
• ELT is another method of data integration, where
instead of transforming the data before loading, the
data is first copied to the target and then
transformed.
• The key differences between ETL and ELT are:
• While ETL stands for Extract, Transform, and Load, ELT
stands for Extract, Load, and Transform.
• ETL loads data from the data source into the staging
server and thereafter into the target system. But with
ELT, data gets directly copied into the target system.
• ETL cleanses sensitive and secure data before loading
it into the data warehouse, thereby ensuring data
privacy and data compliance.
• An ETL tool is usually used for data that is on-premises,
relational, and structured, while an ELT tool is used for
scalable, cloud-based data, both structured and
unstructured.
• ETL is generally easier to implement, whereas ELT requires
expert skills for implementation and maintenance.
• ETL can be more cost-effective compared to ELT.
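
To make the ordering difference concrete, here is a minimal ELT sketch. It assumes an in-memory SQLite database as the target system and hypothetical table names (`raw_orders`, `orders`): the raw rows are copied into the target unchanged, and the transformation then runs inside the target using its own SQL engine.

```python
import sqlite3

# Hypothetical raw rows as they arrive from a source system (all text,
# inconsistent casing, one exact duplicate).
raw_rows = [("1", " alice ", "10.50"), ("2", "BOB", "20"), ("2", "BOB", "20")]


def run_elt(conn, rows):
    # Load: copy the raw data into the target without changing it.
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)
    # Transform: performed afterwards, inside the target system.
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT DISTINCT
            CAST(order_id AS INTEGER) AS order_id,
            LOWER(TRIM(customer))     AS customer,
            CAST(amount AS REAL)      AS amount
        FROM raw_orders
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the target
run_elt(conn, raw_rows)
```

Note that the uncleansed `raw_orders` table remains in the target, which is why the privacy point above favours ETL when sensitive fields must never land in the warehouse.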
Other Data Integration Methods

• Besides ETL and ELT, some other data
integration methods include:
• Change Data Capture (CDC) – captures only the source data that
has changed and moves it to the target system.
• Data Replication – replicates changes in data sources, in real
time or batch by batch, to a central repository.
• Data Virtualization – uses a software abstraction layer
to create an integrated view of data without actually loading
or copying source data.
• Stream Data Integration (SDI) – ingests data streams in real
time, transforms them, and loads them into the target system.
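
Of these methods, CDC is the easiest to sketch in a few lines. The example below assumes a timestamp-based approach, which is only one of several ways to implement CDC (log-based capture is another): each source row carries a hypothetical `updated_at` field in ISO 8601 format, and only rows newer than the last synchronization point are moved.

```python
def capture_changes(source_rows, last_sync):
    """Return the rows changed after last_sync and the new watermark.

    Assumes ISO 8601 timestamps, which sort correctly as plain strings.
    """
    changed = [r for r in source_rows if r["updated_at"] > last_sync]
    new_watermark = max((r["updated_at"] for r in changed), default=last_sync)
    return changed, new_watermark


def apply_changes(target, changed):
    """Upsert the changed rows into the target, keyed by the 'id' field."""
    for row in changed:
        target[row["id"]] = row


# Hypothetical source table and an initially empty target.
source = [
    {"id": 1, "name": "alice", "updated_at": "2024-01-01T00:00:00"},
    {"id": 2, "name": "bob", "updated_at": "2024-01-03T00:00:00"},
]
target = {}
changed, watermark = capture_changes(source, "2024-01-02T00:00:00")
apply_changes(target, changed)
```

Running `capture_changes` again with the returned watermark yields no rows until the source changes, which is what keeps each run small.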
How Does ETL Work?

• Step 1: Extract
• The extraction process involves copying or
exporting raw data from multiple locations,
called source locations, and storing it in a
staging location for further processing.
• Source locations can hold any type of
data, including SQL or NoSQL servers, flat files,
emails, logs, web pages, CRM and ERP systems,
spreadsheets, etc.
• Common data extraction methods are:
• Partial extraction with update notification
• Partial extraction without update notification
• Full extraction
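
The three extraction methods can be contrasted with a small sketch. The interpretations here are assumptions rather than definitions from the slides: "with update notification" is modelled as the source handing over a list of changed keys, and "without update notification" as comparing the source against the previous snapshot. The source is a hypothetical dictionary keyed by record id.

```python
def full_extraction(source):
    """Copy every record, regardless of whether it changed."""
    return list(source.values())


def partial_with_notification(source, changed_ids):
    """Copy only the records the source system reported as changed."""
    return [source[i] for i in changed_ids if i in source]


def partial_without_notification(source, previous_snapshot):
    """Find new or modified records by diffing against the last snapshot."""
    return [
        row for key, row in source.items()
        if previous_snapshot.get(key) != row
    ]


# Hypothetical data: record 2 was modified and record 3 is new.
previous = {1: {"name": "alice"}, 2: {"name": "bob"}}
current = {1: {"name": "alice"}, 2: {"name": "bobby"}, 3: {"name": "carol"}}

everything = full_extraction(current)
notified = partial_with_notification(current, [2, 3])
diffed = partial_without_notification(current, previous)
```

The snapshot diff does not detect deleted records; handling those would need a second pass over the keys of the previous snapshot.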
Step 2: Transform

• In the transformation stage of the ETL process,
data in the staging area is processed to make it
suitable for analytics. Raw data is
converted to a consolidated, meaningful data
set.
• Several tasks are performed on the data like:
• Cleaning and standardization
• Verification and validation
• Filtering and sorting
• De-duplication
• Data audits
• Calculations and translations
• Formatting
• Data encryption and protection
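
Several of the tasks listed above can be chained in a single pass over the staged rows. This is a minimal sketch, assuming hypothetical fields (`email`, `qty`, `unit_price`) and a deliberately simple validation rule; it covers cleaning, validation, de-duplication, a calculation, and sorting, but not audits or encryption.

```python
def transform(rows):
    seen = set()
    out = []
    for row in rows:
        # Cleaning and standardization: trim whitespace, lowercase.
        email = row["email"].strip().lower()
        # Validation: a deliberately naive check, for illustration only.
        if "@" not in email:
            continue
        # De-duplication: keep the first row seen for each email.
        if email in seen:
            continue
        seen.add(email)
        # Calculation: derive a total from quantity and unit price.
        total = row["qty"] * row["unit_price"]
        out.append({"email": email, "total": total})
    # Sorting: largest total first.
    return sorted(out, key=lambda r: r["total"], reverse=True)


# Hypothetical staged rows with a duplicate and an invalid entry.
staged = [
    {"email": " A@Example.com ", "qty": 2, "unit_price": 5.0},
    {"email": "a@example.com", "qty": 1, "unit_price": 5.0},
    {"email": "not-an-email", "qty": 1, "unit_price": 1.0},
    {"email": "b@example.com", "qty": 3, "unit_price": 10.0},
]
result = transform(staged)
```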
Step 3: Load

• Loading involves writing the transformed data
into a DW system for analytical reporting. The
target system can be a simple delimited flat
file or a data warehouse.
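
For the simplest target mentioned above, a delimited flat file, the load step can be sketched with Python's standard `csv` module. The pipe delimiter, the field names, and the helper name `load_to_delimited` are assumptions made for illustration; an in-memory buffer stands in for the file on disk.

```python
import csv
import io


def load_to_delimited(rows, fieldnames, out, delimiter="|"):
    """Write transformed rows to a delimited text target; return the row count."""
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


# Hypothetical transformed rows ready for loading.
rows = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
buffer = io.StringIO()  # stands in for open("target.txt", "w", newline="")
loaded = load_to_delimited(rows, ["id", "name"], buffer)
```

Loading into a data warehouse follows the same shape, with the writer replaced by bulk inserts into the target table.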
How Does ETL Tool Function
Architecture of ETL
Benefits and Challenges of ETL (Extract, Transform, and Load)

• The ETL process improves data quality because data is
cleansed before being loaded into the final
repository for further analytics. It provides an
automated data processing pipeline that collects
and formats data without handing
data transformation tasks off to other tools.
• However, ETL is a time-consuming batch
operation, so it is recommended for building
smaller data repositories that do not need to
be updated frequently. Other data integration
methods like ELT, CDC, and data virtualization are
better suited to integrating larger
volumes of data that require real-time
updating.
Why Is ETL Important?

• ETL helps ensure good data hygiene and improved data
analytics. In addition, ETL tools perform many important
business functions like:
• Reconciling different data formats to move data from
legacy systems to modern technology
• Syncing external data from vendors, customers, and
suppliers
• Consolidating data from multiple overlapping systems
• Combining transactional data so that it can be
understood by users
ETL Tools

• Integrate.io
• Talend
• MarkLogic
• Oracle Data Integrator
• Amazon Redshift
• AWS Glue
• Matillion
• Azure Data Factory
• FlyData
ETL Challenges
