Technet Etl Design Questionnaire
Technet Etl Design Questionnaire
Technet Etl Design Questionnaire
Applies to:
Informatica PowerCenter
Summary
This article provides a questionnaire that can be useful when being involved in the design process of an
interface.
Author Bio
Author(s): Matthias Urech
Company: interface-development.com
Created on: January 8, 2009
Matthias Urech, founder and project leader of interface - development.com, has a proven track
record of applying data integration solutions for several companies across the industry. He
played a key role to implement projects in the area of data integration, data migration, data
consolidation and data warehousing.
Matthias contributed several articles about data integration best practices for Informatica
Technology Network, resulting in nomination as member of developer wall of fame and
receiving an award as contributor of the year in 2008.
Table of Contents
Introduction .........................................................................................................................................................3
ETL Design Questionnaire..................................................................................................................................4
Question 1: Who is involved? .........................................................................................................................4
Question 2: What is the scope of the interface? .............................................................................................5
Question 3: How is the data food chain? ........................................................................................................5
Question 4: How is the relationship between the systems? ...........................................................................6
Question 5: How is the data life cycle? ...........................................................................................................7
Question 6: How are changes captured? .......................................................................................................7
Question 7: What are the load frequencies? ..................................................................................................8
Question 8: What are the load types?.............................................................................................................8
Question 9: Which data volume is extracted? ................................................................................................9
Question 10: What is the strategy to load data?.............................................................................................9
Question 11: What is the load order? ...........................................................................................................10
Question 12: How is the interface data flow? ...............................................................................................10
Question 13: How is the target layout? .........................................................................................................11
Question 14: How is the logical mapping?....................................................................................................11
Question 15: Where are data quality issues addressed? .............................................................................12
Question 16: What are the operational actions?...........................................................................................12
Question 17: How is the scheduling configuration?......................................................................................13
Question 18: What level of documentation is provided?...............................................................................14
Disclaimer and Liability notice ..........................................................................................................................15
Introduction
Do not design a bridge by counting the number of people who swim across the river today. That’s also true
for ETL projects. Depending on your role in the ETL project, your work starts sometimes when the data flow
needs to be built. Fine, someone will tell you what you have to do. But sometimes you will be involved earlier
in the project to design the interface. As a matter of fact, you are faced then with the challenge of gathering
requirements. That’s where the ETL Design Questionnaire comes in place. Asking the right questions is not
only essential, but will put you also in a position of controlling the design process. As a side effect, you will be
recognized as a professional ETL developer that has a plan.
Sounds that the questionnaire is an exciting and useful tool to work with. But what makes the difference
compared to other methodologies or frameworks (i.e. Informatica Velocity)? Each of the provided questions
is supported by figures, tables or graphical elements (let’s call them diagrams). It is neither about creating
comprehensive documentation nor strictly answering all questions. Consider this set of questions rather as a
presentation of multiple views in order to get a common understanding about the interface for all involved
parties. The main goal is to gather as much information with a simple approach. The key word here is
“tailoring”. Decide yourself what you want to use!
Like everybody has a schema for getting rich that will not work, using the proposed questions in this article
will neither prevent you for going through the design process nor is the list of questions complete. At the end
of the day, your goal should be to provide the requirements for developing a working interface, respond to
changes, and have a good costumer communication and collaboration. I hope nonetheless that the provided
ETL Design Questionnaire will be useful to you and to your challenges.
These are just some of the tasks and the list is by far not complete. However, all those tasks have
to be done by someone. The objective of the role/task diagram is to define the involved people and
their responsibilities. In short: you can simple ask "who does what?". For example: subject matter
expert (who) provides business rules (what).
• Master / Slave
This is the most common relationship. Data will be maintained in system 1 and
provided to system 2.
Of course, the road still doesn’t end here since some systems are connected to more than one
(see data food chain diagram). In such case you should prioritize the data flow order and check if
the data food chain makes sense at all.
Please note that the given actions in system 1 are just examples. Some systems only allow to flag data
inactive instead of delete them.
that is most reliable. It’s essential that you only take an exact copy of the target system. Any other
attempt (e.g. copy of previous extraction) will have the risk to miss some data.
Choosing the approach that best fit to the environment is key. It’s important that you know all constraints
and consider them during development. Spending enough research time will prevent you from troubles
running the interface in production.
Determining the data volume may take some detective work. You might have to perform source
system analysis, study the history concept of the source system or simply talk to subject matter
experts in order to find out the criteria that are applied to the dataset.
Having a proper load strategy will help you in getting a clear understanding about what data the
interface will Extract, Transform and Load (ETL). In addition, the load strategy table also contains
the parameters that can be used in order to make the interface more flexible. This can reduce the
overhead of creating multiple interfaces when only certain attributes of an interface need to be
changed.
Moreover, a scheduling configuration table is also provided explaining in detail how the scheduling job is
setup and executed.