ETL Interview Questions

What is Database Testing?
Testing the backend databases like comparing the actual results with expected results.
Data base testing basically include the following.
1) Data validity testing.
2) Data Integrity testing
3) Performances related to database.
4) Testing of Procedure, triggers and functions.
For doing data validity testing you should be good in SQL queries
For data integrity testing you should know about referential integrity and different constraint.
For performance related things you should have idea about the table structure and design.
For testing Procedure triggers and functions you should be able to understand the same.
*******************************************************************************
Difference between Database Testing and Data Warehouse Testing?

There is a popular misunderstanding that database testing and data warehouse is similar while the fact is
that both hold different direction in testing.
Database testing is done using smaller scale of data normally with OLTP (Online transaction processing)
type of databases while data warehouse testing is done with large volume with data involving OLAP
(online analytical processing) databases.
In database testing normally data is consistently injected from uniform sources while in data warehouse
testing most of the data comes from different kind of data sources which are sequentially inconsistent.
We generally perform only CRUD (Create, read, update and delete) operation in database testing while in
data warehouse testing we use read-only (Select) operation.
Normalized databases are used in DB testing while demoralized DB is used in data warehouse testing.
There are number of universal verifications that have to be carried out for any kind of data warehouse
testing.
Below is the list of objects that are treated as essential for validation in ETL testing:
Verify that data transformation from source to destination works as expected
Verify that expected data is added in target system
Verify that all DB fields and field data is loaded without any truncation
Verify data checksum for record count match
Verify that for rejected data proper error logs are generated with all details
Verify NULL value fields
Verify that duplicate data is not loaded
Verify data integrity
***************************************************************************
What is ETL TESTING?
ETL basically stands for Extract Transform Load - which simply implies the process where you extract
data from Source Tables, transform them in to the desired format based on certain rules and finally load
them onto Target tables. There are numerous tools that help you with ETL process - Informatica,
Control-M being a few notable ones.
So ETL Testing implies - Testing this entire process using a tool or at table level with the help of test
cases and Rules Mapping document.
In ETL Testing, the following are validated 1) Data File loads from Source system on to Source Tables.
2) The ETL Job that is designed to extract data from Source tables and then move them to staging
tables. (Transform process)
3) Data validation within the Staging tables to check all Mapping Rules / Transformation Rules are
followed.
4) Data Validation within Target tables to ensure data is present in required format and there is no data
loss
from
Source
to
Target
tables.
Extract
In this step we extract data from different internal and external sources, structured and/or unstructured.
Plain queries are sent to the source systems, using native connections, message queuing, ODBC or OLEDB middleware. The data will be put in a so-called Staging Area (SA), usually with the same structure as
the source. In some cases we want only the data that is new or has been changed, the queries will only
return the changes. Some tools can do this automatically, providing a changed data capture (CDC)
mechanism.
Transform
Once the data is available in the Staging Area, it is all on one platform and one database. So we can
easily join and union tables, filter and sort the data using specific attributes, pivot to another structure
and make business calculations. In this step of the ETL process, we can check on data quality and cleans
the data if necessary. After having all the data prepared, we can choose to implement slowly changing
dimensions. In that case we want to keep track in our analysis and reports when attributes changes over
time, for example a customer moves from one region to another.
Load
Finally, data is loaded into a central warehouse, usually into fact and dimension tables. From there the
data can be combined, aggregated and loaded into datamarts or cubes as is deemed necessary.
*******************************************************************************
What are cubes?

Multi dimensional data is logically represented by Cubes in data warehousing. The dimension and the
data are represented by the edge and the body of the cube respectively. OLAP environments view the
data in the form of hierarchical cube. A cube typically includes the aggregations that are needed for
business intelligence queries.
*******************************************************************************
OLTP vs. OLAP

We can divide IT systems into transactional (OLTP) and analytical (OLAP). In general we can assume that
OLTP systems provide source data to data warehouses, whereas OLAP systems help to analyze it.
- OLTP (On-line Transaction Processing) is characterized by a large number of short on-line transactions
(INSERT, UPDATE, DELETE). The main emphasis for OLTP systems is put on very fast query processing,
maintaining data integrity in multi-access environments and an effectiveness measured by number of
transactions per second. In OLTP database there is detailed and current data, and schema used to store
transactional databases is the entity model (usually 3NF).
- OLAP (On-line Analytical Processing) is characterized by relatively low volume of transactions. Queries
are often very complex and involve aggregations. For OLAP systems a response time is an effectiveness
measure. OLAP applications are widely used by Data Mining techniques. In OLAP database there is
aggregated, historical data, stored in multi-dimensional schemas (usually star schema).
OLTP System
Online Transaction Processing
(Operational System)
OLAP System
Online Analytical Processing
(Data Warehouse)
Source of data
Operational data; OLTPs are the

original source of the data.
Consolidation data; OLAP data comes

from the various OLTP Databases
Purpose of data
To control and run fundamental

business tasks
To help with planning, problem solving,

and decision support
What the data
Reveals a snapshot of ongoing

business processes
Multi-dimensional views of various kinds

of business activities
Inserts and Updates
Short and fast inserts and updates

initiated by end users
Periodic long-running batch jobs refresh

the data
Queries
Relatively standardized and simple

queries Returning relatively few
records
Often complex queries involving

aggregations
Processing Speed
Space Requirements
Database Design
Typically very fast
Depends on the amount of data

involved; batch data refreshes and
complex queries may take many hours;
query speed can be improved by creating
indexes
Can be relatively small if historical

data is archived
Larger due to the existence of

aggregation structures and history data;
requires more indexes than OLTP
Typically de-normalized with fewer

Highly normalized with many tables tables; use of star and/or snowflake
schemas
Backup religiously; operational data

is critical to run the business, data
Backup and Recovery
loss is likely to entail significant
monetary loss and legal liability
Instead of regular backups, some

environments may consider simply
reloading the OLTP data as a recovery
method
*******************************************************************************
What is Business Intelligence?

Business intelligence, or BI for short, is an umbrella term that refers to competencies, processes,
technologies, applications and practices used to support evidence-based decision making in organizations.
In the widest sense it can be defined as a collection of approaches for gathering, storing, analyzing and
providing access to data that helps users to gain insights and make better fact-based business decisions.
BI used for?
Organizations use Business Intelligence to gain data-driven insights on anything related to business
performance. It is used to understand and improve performance and to cut costs and identify new
business opportunities, this can include, among many other things:
Analyzing customer behaviors, buying patterns and sales trends.

Measuring, tracking and predicting sales and financial performance
Budgeting and financial planning and forecasting
Tracking the performance of marketing campaigns
Optimizing processes and operational performance

Improving delivery and supply chain effectiveness
Web and e-commerce analytics
Customer relationship management
Risk analysis
Strategic value driver analysis
Basics of Business Intelligence

Gathering Data
Gathering data is concerned with collecting or accessing data which can then be used to inform decision
making. Gathering data can come in many formats and basically refers to the automated measurement
and collection of performance data. For example, these can come from transactional systems that keep
logs of past transactions, point-of-sale systems, web site software, production systems that measure and
track quality, etc. A major challenge of gathering data is making sure that the relevant data is collected in
the right way at the right time. If the data quality is not controlled at the data gathering stage then it can
harm the entire BI efforts that might follow always remember the old adage - garbage in garbage out
Storing Data
Storing Data is concerned with making sure the data is filed and stored in appropriate ways to ensure it
can be found and used for analysis and reporting. When storing data the same basic principles apply that
you would use to store physical goods say books in a library you are trying to find the most logical
structure that will allow you to easily find and use the data. The advantages of modern data-bases (often
called data warehouses because of the large volumes of data) is that they allow multi-dimensional
formats so you can store the same data under different categories also called data marts or datawarehouse access layers. Like in the physical world, good data storage starts with the needs and
requirements of the end users and a clear understanding of what they want to use the data for.
Analyzing Data
The next component of BI is analysing the data. Here we take the data that has been gathered and
inspect, transform or model it in order to gain new insights that will support our business decision
making. Data analysis comes in many different formats and approaches, both quantitative and
qualitative. Analysis techniques includes the use of statistical tools, data mining approaches as well as
visual analytics or even analysis of unstructured data such as text or pictures.
Providing Access
In order to support decision making the decision makers need to have access to the data. Access is
needed to perform analysis or to view the results of the analysis. The former is provided by the latest
software tools that allow end-users to perform data analysis while the latter is provided through
reporting,
dashboard
and
scorecard
applications.
*******************************************************************************
What is Metadata?
Metadata is defined as data that describes other data. Metadata can be divided into two main types:
structural and descriptive.
Structural metadata describes the design structure and their specifications. This type of metadata
describes the containers of data within a database.
Descriptive metadata describes instances of application data. This is the type of metadata that is
traditionally spoken of and described as data about the data.
A third type is sometime identified called Administrative metadata. Administrative metadata provides
information that helps to manage other information, such as when and how a resource was created, file
types and other technical information.
Metadata makes it easier to retrieve, use, or manage information resources by providing users with
information that adds context to the data theyre working with. Metadata can describe information at any
level of aggregation, including collections, single resources, or component part of a single resource.
Metadata can be embedded into a digital object or can be stored separately. Web pages contain
metadata called metatags.
Metadata at the most basic level is simply defined as data about data. An item of metadata describes
the specific characteristics about an individual data item. In the database realm, metadata is defined as,
data about data, through which the end-user data are integrated and managed. Metadata in a
database typically store the relationships that link up numerous pieces of data. Metadata names these
fields, describes the size of the fields, and may put restrictions on what can go in the field (for example,
numbers
only).
Therefore, metadata is information about how data is extracted, and how it may be transformed. It is
also about indexing and creating pointers into data. Database design is all about defining metadata
schemas. Meta data can be stored either internally, in the same file as the data, or externally, in a
separate area. If the data is stored internally, the metadata is together with the data, making it more
easily accessible to view or change. However, this method creates high redundancy. If metadata is stored
externally, the searches can become more efficient. There is no redundancy but getting to this metadata
may
be
a
little
more
technical.
All the metadata is stored in a data dictionary or a system catalog. The data dictionary is most typically
an external document that is created in a spreadsheet type of document that stores the conceptual
design ideas for the database schema. The data dictionary also contains the general format that the
data, and in effect the metadata, should be. Metadata is an essential aspect to database design, it allows
for increased processing power, due to the fact that it can help create pointers and indexes.
*******************************************************************************
How to Change Primary Index in Teradata?
If data exists or not in a table Primary Index once created in Teradata cannot be changed. We have to
drop the table and recreate the table with the column you want to be created as Primary Index
*******************************************************************************
Explain
To
the
Parallel
Shared
Process
everything
nothing
is
architecture?
to Share
nothing
Shared nothing architecture (SNA) is a distributed computing architecture which consists of multiple
nodes such that each node has its own private memory, disks and input/output devices independent of
any other node in the network. Each node is self sufficient and shares nothing across the network.
Therefore, there are no points of contention across the system and no scope for data sharing or system
resources. This type of architecture is highly scalable and has become quite popular especially in the
context
of
web
development.
For instance, Google has implemented an SNA which evidentially enables it to scale web applications
effectively by simply adding nodes in its network of servers without slowing down the system.
*******************************************************************************
Difference
between
SMP
and
MPP?
Symmetric Multiprocessing (SMP) is the processing of programs by multiple processors that share a
common operating system and memory. This SMP is also called as "Tightly Coupled Multiprocessing". A
Single copy of the Operating System is in charge for all the Processors Running in an SMP. This SMP
Methodology doesnt exceed more than 16 Processors. SMP is better than MMP systems when Online
Transaction Processing is Done, in which many users can access the same database to do a search with a
relatively simple set of common transactions. One main advantage of SMP is its ability to dynamically
balance the workload among computers ( As a result Serve more users at a faster rate )
Massively Parallel Processing (MPP)is the processing of programs by multiple processors that work on
different parts of the program and share different operating systems and memories. These Different
Processors which run communicate with each other through message interfaces. There are cases in
which there are upto 200 processors which run for a single application. An Interconnect arrangement of
data paths allows messages to be sent between different processors which run for a single application or
product. The Setup for MPP is more complicated than SMP. An Experienced Thought Process should to be
applied when u setup these MPP and one shold have a good in-depth knowledge to partition the
database among these processors and how to assign the work to these processors. An MPP system can
also be called as a loosely coupled system. An MPP is considered better than an SMP for applications that
allow a number of databases to be searched in parallel.
*******************************************************************************
How Many ways Teradata can be communicated?

Teradata System can be communicated in two ways
Channel Attached System [Mainframe, Unix] [Through Third Party]
Network Attached System [LAN]
*******************************************************************************
To Find How many AMP s are there in TERADATA Architecture?

SELECT HASHAMP()
*******************************************************************************
To Find AMP_NUMBER corresponding with data in table
SELECT HASHROW(EMPLOYEE_ID) AS HASH_ROW,

HASHBUCKET( HASHROW(EMPLOYEE_ID) ) AS HASH_BUCKET_NO,
HASHAMP(HASHBUCKET( HASHROW(EMPLOYEE_ID) )) AS HASH_AMP_NO
FROM SAMPLES.EMPLOYEES
ORDER BY HASH_AMP_NO
*******************************************************************************
How much space are my tables using?
SELECT tablename, sum(currentperm)/1024/1024 AS MB

FROM dbc.allspace
WHERE databasename='SAMPLES'
GROUP BY tablename
ORDER BY 2 DESC;
*******************************************************************************
How many Types of Indexes are there in Teradata?

Five Types of Indexes are Present in Teradata
1. PRIMARY INDEX
UNIQUE PRIMARY INDEX[UPI]
NON-UNIQUE PRIMARY INDEX[NUPI]
2. SECONDARY INDEX
UNIQUE SECONDARY INDEX[USI]
NON-UNIQUE SECONDARY INDEX[NUSI]
3. PARTITION PRIMARY INDEX
RANGE BASED [range_n]
CASE BASED[case_n]
4. HASH INDEX
5. JOIN INDEX
*******************************************************************************
Indexes in Teradata are created at which level?

Indexes in Teradata are created at column level. Indexes check the duplication based on that particular
column on which Index is created.
ROWID is generated based on the column on which Index is created.
INDEXES
SET / MULTISET
COLUMN LEVEL
ROW LEVEL
*******************************************************************************
What is the condition for a table to have data without redundancy?

Table
Should
PI + SET
UPI + MULTI SET
Both
the
be
created
with
the
below
combinations
COLUMN LEVEL+ ROW LEVEL

COLUMN LEVEL+ ROW LEVEL
above
conditions
check
the
duplication
of
*******************************************************************************
How Many types of Tables in Teradata?

PERMANENT TABLES
TWO TYPES OF PERMANENT TABLES
SET TABLES [CREATED AS DEFAULT] DOES NOT ACCEPT DUPLICATE RECORDS
MULTISET TABLES - ACCEPTS DUPLICATE RECORDS
TEMPORARY TABLES
THREE TYPES OF TEMPORARY TABLES
VOLTAILE TABLES
data.
GLOBAL TABLES
DERIVED TABLES
Permanent and Temporary Tables

Permanent storage of tables is necessary when different sessions and users must share table contents.
When tables are required for only a single session, we can request the system to create temporary tables.
Using this type of table, we can save query results for use in subsequent queries within the same session.
We can break down complex queries into smaller queries by storing results in a temporary table for use
during the same session. When the session ends, the system automatically drops the temporary table.
Global Tables
They exist only for the duration of the SQL session in which they are used.
The contents of these tables are private to the session, and System Automatically drops the table at the
end of that session.
System saves the Global Temporary Table Definition Permanently in the Data Dictionary.
The Saved Definition may be Shared by Multiple Users and Sessions with Each Session getting its Own
Instance of the Table.
Volatile Tables
If you need a temporary table for a single use only, you can define a volatile table.
The definition of a volatile table resides in memory (RAM) but does not survive across a system restart.
It improves performance even more than using global temporary tables because the system does not store
the definitions of volatile tables in the Data Dictionary.
Access-rights checking is not necessary because only the creator can access the volatile table.
Derived Tables
A special type of temporary table is the derived table. It is specified in SQL SELECT statement.
A Derived Table is Obtained from One or More Other Tables as the Result of a Sub-Query.
Scope of A Derived Table is only Visible to the Level of the SELECT statement calling the Sub-Query.
Using Derived Tables avoids having to use the CREATE and DROP TABLE Statements for Storing Retrieved
Information and Assists in Coding More Sophisticated, Complex Queries.
*******************************************************************************
DIFFERENCES BETWEEN Global Tables and Volatile Tables?

Global Temporary Tables
Syntax : Create Global Temporary
Table<TNAME>
(
)
on commit preserve rows;
ON COMMIT PRESERVE ROWS allows the data to
Volatile Temporary Tables

Syntax : Create Volatile Temporary
Table<TNAME>
(
)
on commit preserve rows;
ON COMMIT PRESERVE ROWS allows the data to
store in the Global table within the session. The

default statement is ON COMMIT DELETE ROWS,
which means the data is deleted when the query is
committed.
Table is Created in Temporary Space
Table Structure remains Constant. Data will be lost
[within the session login and logout]
Collect statistics can be applied

In a single session 2000 Global temporary table
can be materialized.
store in the Volatile table within the session. The

default statement is ON COMMIT DELETE ROWS,
which means the data is deleted when the query is
committed.
Table is Created in Spool Space (User Level)
Table Structure and Data will be lost [within the
session login and logout]
Structure will be created in user space
Data will created in spool space
No Collect statistics can be applied
In a single session 1000 Volatile tables can be
materialized.
*******************************************************************************
what are different types of spaces in teradata?

SYSTEM SPACES
USER SPACES
PERMANENT SPACE -- stores PERMANENT TABLES
SYSTEM TABLES
This is disk space used for storing user data rows in any tables located on the
database.
Both Users & databases can be given perm space.
This Space is not pre-allocated , it is used up when the data rows are stored on
disk.
TEMPORARY SPACE -- stores GLOBAL TABLES
It is allocated to any databases/users where Global temporary tables are created
and data is stored in them.
Unused perm space is available for TEMP space
SPOOL SPACE
stores VOLTAILE TABLES
-It is a temporary workspace which is used for processing Rows for given SQL
statements.
-Spool space is assigned only to users . -Once the SQL processing is complete the spool is freed and given to some other
query.
-Unused Perm space is automatically available for Spool .
*******************************************************************************
Types of ETL Bugs
1. User interface bugs/cosmetic bugs:Related to GUI of application

Navigation, spelling mistakes, font style, font size, colors, alignment.
2. BVA Related bug:Minimum and maximum values
3. ECP Related bug:Valid and invalid type
4. Input/output bugs:Valid values not accepted

Invalid values accepted
5. Calculation bugs:Mathematical errors

Final output is wrong
6. Load condition bugs:Does not allows multiple users

Does not allows customer expected load
7. Race condition bugs:System crash & hang

System cannot run client plat forms
8. Version control bugs:No logo matching

No version information available
This occurs usually in regression testing
9. H/W bugs:Device is not responding to the application
10. Source bugs:Mistakes in help documents
*******************************************************************************
Types of ETL Testing:-
1)
Constraint Testing:
In the phase of constraint testing, the test engineers identifies whether the data is mapped from source
to target or not.
The Test Engineer follows the below scenarios in ETL Testing process.
a)
NOT NULL
b)
UNIQUE
c)
Primary Key
d)
Foreign key
e)
Check
f)
Default
g)
NULL
2)
Source to Target Count Testing:
In the Source to Target data is matched or not. A Tester can check in this view whether it is ascending
order or descending order it doesnt matter .Only count is required for Tester.
Due to lack of time a tester can follow this type of Testing.
3)
Source to Target Data Validation Testing:

In this Testing, a tester can validate the each and every point of the source to target data.
Most of the financial projects, a tester can identify the decimal factors.
4)
Threshold/Data Integrated Testing:
In this Testing, the Ranges of the data, A test Engineer can usually identifies the population calculation
and share marketing and business finance analysis (quarterly, halferly, Yearly)
MIN
4
MAX
10
RANGE
6
5)
Field to Field Testing:
In the field to field testing, a test engineer can identify that how much space is occupied in the database.
The data is integrated in the table cum datatypes.
NOTE: To check the order of the columns and source column to target column.
6)
Duplicate
Check
Testing:
In this phase of ETL Testing, a Tester can face duplicate value very frequently so, at that time the tester
follows database queries why because huge amount of data is present in source and Target tables.
Select ENO, ENAME, SAL, COUNT (*) FROM EMP GROUP BY ENO, ENAME, SAL HAVING COUNT (*) >1;
1)
2)
Note:
There are no mistakes in Primary Key or no Primary Key is allotted then the duplicates may
arise.
Sometimes, a developer can do mistakes while transferring the data from source to target
at that time duplicates may arise.
3)
Due to Environment Mistakes also duplicates arise (Due to improper plugins in the tool).
7)
Error/Exception Logical Testing:

1)
Delimiter is available in Valid Tables
2)
Delimiter is not available in invalid tables(Exception Tables)
8)
Incremental and Historical Process Testing:
In the Incremental data, the historical data is not corrupted. When the historical data is corrupted then
this is the condition where bugs raise.
9)
Control Columns and Defect Values Testing:
This is introduced by IBM
10) Navigation Testing:
Navigation Testing is the End user point of view testing. An end user cannot follow the friendly of the
application that navigation is called as bad or poor Navigation.
At the time of Testing, A tester can identify this type of navigation scenarios to avoid
unnecessary navigation.
11) Initialization testing:

A combination of hardware and software installed in platform is called the Initialization Testing
12) Transformation Testing:

At the time of mapping from source table to target table, Transformation is not in mapping condition,
then the Test Engineer raises bugs.
13) Regression Testing:

Code modification to fix a bug or to implement a new functionality which makes us to to find errors.
These introduced errors are called regression . Identifying for regression effect is called regression
testing.
14) Retesting:
Re executing the failed test cases after fixing the bug.
15)
System Integration Testing:
Integration testing: After the completion of programming process . Developer can integrate the modules
there are 3 models
a)
Top Down
b)
Bottom Up
c)
Hybrid
*******************************************************************************
What
is
secondary
index?
Whats
are
its
uses?
A secondary index is an alternate path to the data. Secondary indexes are used to improve performance
by allowing the user to avoid scanning the entire table during a query. A secondary index is like a primary
index in that it allows the user to locate rows. Unlike a primary index, it has no influence on the way rows
are distributed among amps. Secondary indexes are optional and can be created and dropped
dynamically. Secondary indexes require separate subtables which require extra i/o to maintain the
indexes.
Comparing to primary indexes, secondary indexes allow access to information in a table by alternate,
less frequently used paths. Teradata automatically creates a secondary index subtable. The subtable will
contain:
Secondary index value
Secondary index row id
Primary index row id
When a user writes an sql query that has a si in the where clause, the parsing engine will hash
the secondary index value. The output is the row hash of the si. The pe creates a request containing the
row hash and gives the request to the message passing layer (which includes the bynet software and
network). The message passing layer uses a portion of the row hash to point to a bucket in the hash
map. That bucket contains an amp number to which the pe's request will be sent. The amp gets the
request and accesses the secondary index subtable pertaining to the requested si information. The amp
will check to see if the row hash exists in the subtable and double check the subtable row with the actual
secondary index value. Then, the amp will create a request containing the primary index row id and send
it back to the message passing layer. This request is directed to the amp with the base table row, and
the amp easily retrieves the data row.
Secondary indexes can be useful for :
Satisfying complex condition

Processing aggregates
Value comparison
Matching character combination
Joining tables
*********************************************************************
What are the different types of locks in teradata?

Locking prevents multiple users who are trying to change the same data at the same time from
violating the data's integrity. Locks are automatically acquired during the processing of a request and
released at the termination of the request.
There are four types of locks:
Exclusive lock:exclusive locks are only applied to databases or tables, never to
rows.
They are the most restrictive type of lock; all other users are locked out.
Exclusive locks are used rarely, most often when structural changes are being made to the database.
Read lock:read locks are used to ensure consistency during read operations. Several users may
hold concurrent read locks on the same data, during which no modification of the data is permitted.
Write lock:write locks enable users to modify data while locking out all other users except
readers not concerned about data consistency (access lock readers). Until a write lock is released, no
new read or write locks are allowed.
Access lock:access locks can be specified by users who are not concerned about data
consistency. The use of an access lock allows for reading data while modifications are in process. Access
locks are designed for decision support on large tables that are updated only by small single row
changes. Access locks are sometimes called stale read locks, i.e. You may get stale data that hasnt
been updated.
Locks may be applied at three levels:
Database applies to all tables/views in the database
Table/view applies to all rows in the table/views
Row hash applies to all rows with same row hash
Lock types are automatically applied based on the sql command:
Select applies a read lock
Update applies a write lock
Create table applies an exclusive lock
********************************************************************

ETL Interview Questions

Uploaded by

Copyright:

Available Formats

ETL Interview Questions

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

ETL Interview Questions

Uploaded by

Copyright:

Available Formats

What is Database Testing?

Difference between Database Testing and Data Warehouse Testing?

What is ETL TESTING?

What are cubes?

OLTP vs. OLAP

Operational data; OLTPs are the

Consolidation data; OLAP data comes

To control and run fundamental

To help with planning, problem solving,

What the data

Reveals a snapshot of ongoing

Multi-dimensional views of various kinds

Inserts and Updates

Short and fast inserts and updates

Periodic long-running batch jobs refresh

Relatively standardized and simple

Often complex queries involving

Typically very fast

Depends on the amount of data

Can be relatively small if historical

Larger due to the existence of

Typically de-normalized with fewer

Backup religiously; operational data

Instead of regular backups, some

What is Business Intelligence?

Analyzing customer behaviors, buying patterns and sales trends.

Optimizing processes and operational performance

Basics of Business Intelligence

How to Change Primary Index in Teradata?

How Many ways Teradata can be communicated?

To Find How many AMP s are there in TERADATA Architecture?

To Find AMP_NUMBER corresponding with data in table

SELECT HASHROW(EMPLOYEE_ID) AS HASH_ROW,

How much space are my tables using?

SELECT tablename, sum(currentperm)/1024/1024 AS MB

How many Types of Indexes are there in Teradata?

Indexes in Teradata are created at which level?

ROWID is generated based on the column on which Index is created.

What is the condition for a table to have data without redundancy?

COLUMN LEVEL+ ROW LEVEL

How Many types of Tables in Teradata?

Permanent and Temporary Tables

DIFFERENCES BETWEEN Global Tables and Volatile Tables?

Volatile Temporary Tables

store in the Global table within the session. The

Collect statistics can be applied

store in the Volatile table within the session. The

what are different types of spaces in teradata?

stores VOLTAILE TABLES

Types of ETL Bugs

1. User interface bugs/cosmetic bugs:Related to GUI of application

3. ECP Related bug:Valid and invalid type

4. Input/output bugs:Valid values not accepted

5. Calculation bugs:Mathematical errors

6. Load condition bugs:Does not allows multiple users

7. Race condition bugs:System crash & hang

8. Version control bugs:No logo matching

9. H/W bugs:Device is not responding to the application

10. Source bugs:Mistakes in help documents