A Generalized Lesson in ETL Architecture

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 20

Company A Generalized Lesson in

LOGO
ETL Architecture

Presented by Wes Dumey


Durable Impact Consulting, Inc.
June 11, 2007
Agenda

1.
1. ETL
ETLOverview
Overview

2.
2. When
When is
isETL
ETLAppropriate?
Appropriate?

3.
3. Tools
Toolsvs.
vs.Hard
HardCoding
Coding

4.
4. ETL
ETLArchitecture
Architecture
• ETL Overview
– 20 Minutes
• ETL Design Tips
– 20 Minutes
• Demonstration
– 20 Minutes

• Ask questions at any time


Speaker Biography
• Senior Consultant, Durable Impact Consulting,
Inc.
• Experience on high-performance data
warehouses
• Education
– B.S. in Computer Information Systems
• Missouri State University
– M.A. in Business Economics (in progress)
• University of South Florida
• External interests: aviation (Private pilot) and
economics
ETL Overview
• Extract Transform and Load is used to populate a data
warehouse
• Extract is where data is pulled from source systems
– SQL connect over networks
– Flat files
– Transaction messaging (MSMQ)
• Transformations can be the most complex part of data
warehousing
– Convert text to numbers
– Apply business logic in this stage
• Load is where data is loaded into the data warehouse
– Sequential or bulk loading
ETL?

• Many companies find value in the


graphical representation of data and use it
in other applications as well
• ETL is very efficient when designed
correctly
ETL Tools vs. Hard Coding

• Many shops still use hard code (triggers,


procedures, code blocks)
• Hard to maintain code
• Hard to scale properly
• ETL tools easy to visualize flows
• With SSIS, there is no good reason to not
use an ETL tool
What is going on here?
ETL Design Methodology

• Steps for successful ETL Design


1. Clear and concise requirements
2. Modularized design
3. Data cleansing capability
4. High Emphasis on Data Quality
5. Functional Testing
6. Sufficient Documentation
• See the methodology document
ETL Methodology Steps

1. Extract the data – pulls data


2. Load PSA and audit tables
3. Source Load Temp – sources and cleanses data
4. Lookup Dimensions – extract records for update
5. Lookup Facts
6. Transform Facts
7. Transform Dimensions
8. Quality Check
9. Load
Design Considerations
• Naming conventions and comments
• Standard approaches allow for:
– Quick, micro-batch processing (if desired)
– Ability to pause/resume, resurrection
• Data cleansing
• Legal requirements (HIPAA, SOX)
• Industry-standard best practices
• Data retention
– Archive vs Purge
• Quality
Demonstration
Let’s Get Started

• Gather Functional Requirements


• Build the Data Model
• Write Technical Specification
• Construct
• Test

Follows the systems development lifecycle


IECT
Closing Info
• Presenter Information
• Blog
– www.thedamndata.com “A techies’ discussion of databases,
datawarehouses, and the damn data itself”
– Covering SQL Server 2005, Oracle, IBM Websphere DataStage
ETL tool, SSIS, and whatever the hell else is on my mind
– Check it out – funny and hopefully informative
• Corporate Information
– www.durableimpact.com – Durable Impact Consulting
• Presenting finalized EDW at Tampa Code Camp
Cycle Diagram
Add Your Text

Text
Text

Text Cycle Name

Text

Text
Progress Diagram

Phase
Phase 11 Phase
Phase 22 Phase
Phase 33
Block Diagram

TEXT TEXT TEXT TEXT

TEXT TEXT TEXT TEXT


Table

TEXT TEXT TEXT TEXT TEXT

Title A

Title B

Title C

Title D

Title E

Title F
3-D Pie Chart

TEXT

TEXT
TEXT

TEXT

TEXT
TEXT
Marketing Diagram

Title

TEXT TEXT TEXT TEXT

You might also like