ETL Testing Real Time Interview Questions
15 answers
Can anyone let me know where I can find the list of ETL and reporting tools available in
the market? Thanks in advance
Answered by: Michael on: Feb 27th, 2013
Add InetSoft's Style Report to the list of BI reporting tools.
Answered by: sajid on: Feb 22nd, 2013
Hi, I would like to share my knowledge on this. ETL tools: Ab Initio, BusinessObjects
Data Integrator, IBM InfoSphere DataStage, Informatica, Oracle Warehouse Builder,
SQL Server In...
ETL testing in Informatica
Asked By: phani.nekkkalapudi | Asked On: Nov 28th, 2011
2 answers
As an ETL tester, what are the things to test in the Informatica tool? What types of
testing have to be done in Informatica, and what should be tested in each type? Are
there any documents on ETL testing?
Answered by: lakshmi on: Jan 31st, 2013
ETL testing in Informatica: 1. First check that the workflow exists in the specified folder. 2.
Run the workflow. If the workflow succeeds, then check that the target table is loaded with
proper data; else we can nee...
Answered by: Lokesh M on: Feb 18th, 2012
- Test ETL software.
- Test ETL data warehouse components.
- Execute backend data-driven tests.
- Create, design and execute test plans, test harnesses and test cases.
- Identify, troubleshoot and pro...
What types of testing are performed in ETL testing?
Asked By: sindhugowda | Asked On: Jan 23rd, 2013
1 answer
Answered by: jatin on: Feb 7th, 2013
Data-centric testing: Data-centric testing revolves around testing the quality of the
data. The objective of data-centric testing is to ensure that valid and correct data is
in the system. Following are th...
If a lookup (LKP) on the target table is used, can we update the rows without an Update
Strategy transformation?
Yes, by using a dynamic lookup.
The Update Strategy transformation determines whether to insert, update, delete or
reject a record for the target. We can bypass the Update Strategy transformation by
creating a Router to divide rows based on insert, update, etc. and connecting each group
to one of multiple instances of the target. In the session, for each target instance we can
check the appropriate box to mark records for insert, update or delete.
Even if you use a dynamic lookup, you should make use of an Update Strategy
transformation to mark the records either to insert or to update in the target using the
lookup cache (the lookup will be cached on the target).
Your requirement is not clear here as to whether a dynamic lookup or session
properties should be used.
Note: When you create a mapping with a Lookup transformation that uses a dynamic
lookup cache, you must use Update Strategy transformations to flag the rows for
the target tables.
Clean data before loading
Asked By: riyazz.shaik | Asked On: Sep 17th, 2008
3 answers
Why is it necessary to clean data before loading it into the warehouse?
Answered by: Prashant Khare on: Oct 3rd, 2012
Data cleansing is the process of detecting and correcting corrupt and inaccurate
data in a table or database.
The following steps are used:
1) Data auditing
2) Workflow specification
3) Workflow execution
4) Post-processing and controlling
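For illustration, here is a minimal SQL sketch of the detection and correction steps; the staging table staging.customers and its email, name and phone columns are hypothetical placeholders:

-- Data auditing: detect records with obviously invalid values.
SELECT *
FROM staging.customers
WHERE email NOT LIKE '%@%'
   OR TRIM(name) = '';

-- Correction: normalize empty strings to NULL before the load,
-- so that downstream NOT NULL checks remain meaningful.
UPDATE staging.customers
SET phone = NULL
WHERE TRIM(phone) = '';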
Informatica: if flat file name varies day by day ...
Asked By: Mahendra | Asked On: Nov 16th, 2011
5 answers
If we are using a flat file in our loading, and the flat file name changes daily, how can we
handle this without changing the file name manually every day? For example, the file name
changes depending on the date, so what should I do? Please help.
Answered by: Charu on: Jan 24th, 2012
Use the indirect file list option at the Informatica session. Say the file list name is
daily_file.txt; put a shell script daily_filename.sh at the pre-session command. The
content of daily_filename.sh is ...
Answered by: SHIV on: Dec 28th, 2011
You can use the Informatica file list option in the target to get dynamic file names, along
with the Transaction Control transformation, so that you can create dynamic
file names based on some transaction properties.
What are the different lookup methods used in Informatica?
Asked By: Interview Candidate | Asked On: Sep 9th, 2005
11 answers
Answered by: SHIV on: Dec 28th, 2011
The main difference between a connected and an unconnected lookup is that we can call
an unconnected lookup based on some condition, but not a connected lookup.
Answered by: SreedharLokaray on: Nov 2nd, 2011
A lookup can be used as connected or unconnected. Apart from caching and receiving
input values from the pipeline, there is one more difference: if you want to use the
same lookup more than once in a mapping ...
How do you implement the concept of reusability in MicroStrategy? (In
Informatica it is done by creating mapplets.)
Asked By: arjunsarathy | Asked On: Jan 2nd, 2006
1 answer
Answered by: Doyel Ghosh Dutta on: Sep 16th, 2011
I think in MicroStrategy a view can be a reusable object.
How to check the existence of a lookup file in a graph?
Asked By: debashish29 | Asked On: Sep 9th, 2011
How do we check the existence of a lookup file in a graph? The requirement is: if the lookup
file is present, then a search will be carried out in it; else a default value
will be set. Please note we need to check the existence of the lookup file at graph
level only.
Cache files are mainly used in Lookup transformations in Informatica. There are two types:
static cache and dynamic cache. Both can be used by a connected lookup; an unconnected
lookup uses only a static cache. If a lookup SQL override is used, the lookup uses an index
cache and a data cache: the index cache stores key columns (i.e. the columns on which the
lookup condition is defined) and the data cache stores output values. Sufficient memory
allocation for these two files is an important aspect of lookup optimization. If the memory
allocation is Auto, make sure that Maximum Memory Allowed For Auto Memory Attributes and
Maximum Percentage of Total Memory Allowed For Auto Memory Attributes are defined properly.
Actually, the above answer is only partially related to the question: transformations
like Joiner, Lookup, Aggregator and Rank use caches.
How can we run shell commands in an Informatica workflow?
Hi, there are two ways to do this; they are as follows: 1) We can use a Command task
anywhere in the workflow or worklet to run the shell commands. 2) In the Session
task, we can call a reusable command ...
1. What is a Data warehouse?
Bill Inmon is known as the father of data warehousing. He defined it as follows: "A data
warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data
in support of management's decision-making process".
Subject oriented: means that the data addresses a specific subject, such
as sales or inventory.
Integrated: means that data from different, heterogeneous sources is brought
together in a consistent form.
Time variant: implies that the data is stored together with its point in time, so
that changes to the data over time are preserved.
Non volatile: implies that data is never removed, i.e., historical data is
also kept.
In a database we maintain only current data (typically not more than about 3 years),
but in a data warehouse we maintain historical data from the starting day of the
enterprise. DML commands (insert, update, delete) are performed in a database; in a
data warehouse, once the data has been loaded, it is generally only read, not modified.
A database is used for insert, update and delete operations, whereas a data warehouse
is mainly used for select (analysis) operations. The differences in design:
Database: the tables and joins are complex, since they are normalized. Data warehouse:
the tables and joins are simple, since they are de-normalized.
Database: ER modeling techniques are used for design. Data warehouse: dimensional
modeling techniques are used for design.
Database: uses the OLTP concept. Data warehouse: uses the OLAP concept and stores
historical data.
A database is a collection of related data. A data warehouse, on the other hand, is a
collection of data integrated from different sources and stored in one container for
making (knowledge-driven) managerial decisions.
In a database we use CRUD operations (create, read, update, delete), but in a
data warehouse we mainly use select operations.
3. What are the benefits of data warehousing?
4. What are the types of data warehouse?
There are mainly three types of data warehouse:
Enterprise data warehouse
Operational data store (ODS)
Data mart
5. What is the difference between data mining and data warehousing?
In data mining, the operational data is analyzed using statistical and clustering
techniques to find hidden patterns and trends. So, data mining does some kind of
summarization of the data, which can be used by data warehouses for faster
analytical processing for business intelligence.
A data warehouse may make use of data mining for analytical processing of the
data in a faster way.
Q. 1) What is ETL?
Ans. ETL stands for extract, transform, and load: extracting data from outside source
systems, transforming raw data to make it fit for use by different departments, and
loading transformed data into target systems like a data mart or data warehouse.
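As a minimal illustration (not part of the original answer), the transform-and-load part can be pictured as a single SQL statement; the tables staging.sales_raw and dw.fact_sales and their columns are hypothetical:

-- Load transformed rows from a staging (extracted) table into the warehouse.
INSERT INTO dw.fact_sales (sale_id, sale_date, amount_usd)
SELECT sale_id,
       CAST(sale_ts AS DATE),           -- transform: timestamp to daily grain
       ROUND(amount_cents / 100.0, 2)   -- transform: cents to dollars
FROM staging.sales_raw
WHERE amount_cents IS NOT NULL;         -- drop rows failing a basic quality rule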
Q. 2) Why is ETL testing required?
Ans.
To verify the correctness of data transformation against the signed-off business
requirements and rules.
To verify that the expected data is loaded into the data mart or data warehouse
without loss of any data.
To validate the accuracy of reconciliation reports (if any; e.g., comparing the report
of transactions made via a bank ATM: ATM report vs. bank account report).
To make sure the complete process meets performance and scalability requirements.
Data security is also sometimes part of ETL testing.
To evaluate the reporting efficiency.
Q. 3) What is a data warehouse?
Ans. A data warehouse is a database used for reporting and data analysis.
Generally, basic testing concepts remain the same across all domains, so the basic testing questions will also
remain the same. The only addition would be some questions on the domain; e.g., in the case of ETL testing
interview questions, these would be some concepts of ETL, how-tos on specific types of checks/tests in SQL, and
a set of best practices.
Each organization categorizes testing types in its own way, based on the testing practice or testing strategy
built at the organization level. This holds true for ETL testing as well; sometimes, for larger projects/programs,
it varies from client to client. Generally, below are the main types of testing that are covered under ETL testing:
Reconciliation testing: Sometimes also referred to as source-to-target count testing. In this check,
the counts of records in source and target are matched. Although this is not the best way, it helps in case of a
time crunch (see the SQL sketch after this list).
Constraint testing: Here the test engineer maps data from source to target and identifies whether the data is
mapped or not. The key checks are: UNIQUE, NULL, NOT NULL, PRIMARY KEY, FOREIGN KEY, DEFAULT,
CHECK.
Validation testing (source-to-target data): Generally executed in mission-critical or financial projects.
Here the test engineer validates each data point and matches source data to target data.
Testing for duplicate check: Done to ensure that there are no duplicate values in columns expected to be
unique. Duplicate data can arise for any reason, such as a missing primary key; one example query is shown after this list.
Testing for attribute check: To check that all attributes of the source system are present in the target table.
Logical or transformation testing: To test for any logical gaps in the transformation. Here, depending upon
the scenario, the following methods can be used: boundary value analysis, equivalence partitioning, comparison
testing, error guessing or, sometimes, graph-based testing methods. It also covers testing of look-up conditions.
Incremental and historical data testing: Tests that check the data integrity of old and new data when new
data is added. It also covers the validation of purging-policy-related scenarios.
GUI / navigation testing: To check the navigation or GUI aspects of the front end reports.
In the case of ETL or data warehouse testing, re-testing or regression testing is also part of this effort.
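A minimal SQL sketch of the reconciliation (count) check and the duplicate check referenced in the list above; the table names source_db.orders and target_dw.fact_orders and the column order_id are hypothetical placeholders:

-- Reconciliation / source-to-target count check:
-- the two counts are expected to match.
SELECT COUNT(*) AS src_count FROM source_db.orders;
SELECT COUNT(*) AS tgt_count FROM target_dw.fact_orders;

-- Duplicate check on a column expected to be unique:
-- any row returned by this query is a defect.
SELECT order_id, COUNT(*) AS occurrences
FROM target_dw.fact_orders
GROUP BY order_id
HAVING COUNT(*) > 1;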
Each company has a different business structure and model, so a clear analysis is needed
before choosing an ETL tool for the business. With the help of open-source ETL tools,
businesses have the opportunity to try out free software tools without any huge investment.
All the software giants have introduced their own BI tools.
Some of the most used ETL tools are as follows:
Talend Open Studio,
Clover ETL,
Elixir,
Pentaho,
Informatica,
IBM Cognos Data Manager,
Oracle Data Integrator,
SAS Data Integration Studio,
Ab Initio,
SAP BusinessObjects Data Integrator.
Once an ETL tool has been selected, the next logical step is to test using that tool. Here,
the company gets to know whether they are on the right path in their selection of the tool.
As these tools help in dealing with huge amounts of data and with historical data, it is
necessary to carry out ETL testing. ETL testing is very important to keep a check on the
accuracy of the data.
There are two types of ETL testing available:
Application testing
Summary report: Upon the completion of the test run, a brief summary report is prepared for
review and for drawing conclusions.
To keep a check on the data being transferred from one system to the other.
To keep track of the efficiency and speed of the process.
To be well acquainted with the ETL process before it gets implemented into your business
and production.
Mapping doc validation: verifying that the ETL information is provided in the mapping doc.
Data check: every aspect of the data, such as data checks, number checks and null checks,
is tested in this case.
Correctness issues: misspelled data, inaccurate data and null data are tested.
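A small SQL sketch of the null check and number check mentioned above; dw.fact_orders and its columns are hypothetical, and the valid range is an assumed business rule:

-- Null check: a mandatory business key should never be NULL in the target.
SELECT COUNT(*) AS null_keys
FROM dw.fact_orders
WHERE order_id IS NULL;

-- Number check: flag amounts outside the assumed valid business range.
SELECT *
FROM dw.fact_orders
WHERE amount < 0
   OR amount > 1000000;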
Question.17 What is an ODS (Operational Data Store)?
Answer: It is a collection of integrated databases designed to support
operational monitoring. Unlike OLTP databases, the data in the ODS is
integrated, subject-oriented and enterprise-wide.
Question.18 What is Denormalization?
Answer: Denormalization means allowing duplicate (redundant) data in a table to reduce
joins. The dimension table follows the denormalization method, together with the
surrogate key technique.
Question.19 What is a Surrogate Key?
Answer: A surrogate key is a sequence-generated key which is assigned to be the
primary key in the system (table).
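A minimal sketch of a surrogate key in SQL, using a standard identity column (syntax supported by e.g. PostgreSQL and Oracle 12c+); the table dim_customer and its columns are hypothetical:

-- customer_key is the surrogate key: a system-generated sequence number
-- used as the primary key instead of the natural/business key customer_id.
CREATE TABLE dim_customer (
    customer_key  INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id   VARCHAR(20) NOT NULL,   -- natural key from the source system
    customer_name VARCHAR(100)
);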