Informatica Advanced Training

Informatica PowerCenter 7 Advance Training
Course Objectives
PowerCenter/PowerMart Architecture PowerCenter Advanced Transformations Informatica Performance Tuning Informatica Templates
PowerCenter Architecture
This section will cover Informatica Architecture & Connectivity - 6.x and 7.x Informatica Architecture & Connectivity 5.x
Architectural Differences
Informatica Architecture & Connectivity 6.x and 7.x

native
RDBMS
Server
native ODBC
Targets
ODBC
TCP/IP
Heterogeneous Sources
Repository Server
Heterogeneous Targets
TCP/IP
Repository Agent
native
Repository Designer Workflow Workflow Rep Server Manager Manager Monitor Administration Console
Repository
Informatica Architecture 5.x

Designer Server Manager Repository Manager
Sources 1-n
Targets 1-n
Repository
Server
6
Informatica Connectivity 5.x

Network Protocol Native Drivers ODBC
Source(s) Target(s)
Native / ODBC
Client Repository Manager
Native / ODBC
Designer Repository Server Manager ODBC Native
Network Protocol
Server
7
Architectural Differences
Sr. No. 1 Informatica 5.x Only one server component - Informatica server which is ETL engine of Informatica suite Informatica 6.x and 7.x versions Two server components - Repository Server (for managing communication of different Informatica client and server tools with Metadata repository) and Informatica Server (ETL engine of Informatica suite) All client tools and Informatica server communicate with Repository server over TCP/IP. Repository server connects to Metadata repository using Repository Agent processes which use native connectivity for accessing metadata Workflows are executable components Workflow Manager tool used for registering Informatica Server with repository, creation of tasks, worklets and workflows and scheduling of workflows. Workflow Monitor used for load-monitoring Versioning is in-built in the suite. In addition, there are built-in configuration management tools in order to migrate components between environments such as Dev, Test and Prod supporting a lights out migration Repository Server Administrator Console is a new client tool for maintaining and managing repository PMREPAGENT Command-line utility now available for Repository maintenance 8 PMREP used for User Management
Native or ODBC connectivity used for communication between Metadata repository and client tools and Informatica server Sessions are executable components Server Manager client tool used for registering Informatica Server with repository, creation of sessions and batches and scheduling the same as well as load-monitoring Third-party tools required for versions of Informatica components managing
3 4
Repository Manager used for all maintenance tasks like Backup and Restore, copy, creating repository contents, etc PMREP Command-line utility used for all User management as well as Repository Maintenance
PowerCenter Advanced Transformations
This section will cover Transformation Concepts Recap Stored Procedure Transformation (Lab 1)
Filter Aggregator
Pivoting Data (Rows to columns) (Lab 2)
Incremental Aggregation (Lab 3,4) Implementation using mapping variables and Aggregator
Implementation using SP and Aggregator Implementation using Filter and Aggregator
Joiner
Self Join (Lab 5) Joiner to emulate Lookup Contd
10
Normalizer
Converting Columns to Rows (Lab 6) Sourcing data from VSAM files
Understanding Transaction Control Transaction Control (Lab 7)

Transaction Control to restart sessions
Lookup
Dynamic Lookup Unconnected Lookup
Union Transformation (Lab 8)
11
What is a transformation?
A transformation is repository object that generates, modifies and/or passes data to other repository object
12
Transformation Types
Active Transformation The number of rows that pass through this transformation change, i.e. Number of rows in the transformation may or may not be equal to number of rows out of the transformation Example Number of rows coming in Filter transformation may be 10. But depending on condition, number of rows out of Filter can be anywhere between Zero and 10
Passive Transformation
The number of rows into the transformation is always equal to number of rows out of the transformation Example Number of rows into Expression is always equal to number of rows out of the transformation
13
Stored Procedure Transformation

A stored procedure is a precompiled collection of Transact-SQL, PL/SQL or other database procedural and flow control statements. To call this from within Informatica, Stored Procedure Transformation is used This passive transformation can be connected or unconnected The stored procedure transformation can be scheduled to run Normal Runs like any other inline transformation Pre-load of the Source Runs before session retrieves data from the Source Post-load of the Source Runs after session retrieves data from the Source Pre-load of the Target Runs before session sends data to the Target Contd
14

Post-load of the Target Runs after session sends data to the target
Informatica supports following databases for Stored Procedure Transformation:

Oracle Informix Sybase SQL Server Microsoft SQL Server IBM DB2
Teradata
Contd
15

Stored Procedure Transformation Ports
Input/Output Parameters send and receive input and output parameters using ports, variables or values hard-coded in expression Return Values Status Codes The stored procedure issues a status code that notifies whether or not it completed successfully. PowerCenter server uses these to determine weather to continue running the session or stop it
Stored Procedure Transformation Properties

Stored Procedure Name Name of the procedure/function as in the database Connection Information Database containing the stored procedure. Default value is $Target Call Text This is used only when transformation type is not Normal. Input parameters passed to SP can be included within the call text
16
17
Filter - Basics
Filter transformation allows to filter rows in a mapping It is active transformation and is always connected
All ports in Filter transformation are input/output ports

Rows from multiple transformations cannot be concatenated into Filter Transformation. Thus input ports for Filter must come from a single transformation
Any expression that returns Boolean value can be used as a filter condition
Filter condition is case sensitive
18
Filter Performance Tips

Place the filter as close as possible to the Source to maximize session performance
19
Aggregator - Basics
Aggregator transformation performs aggregate calculations like sum, max, min, count on groups of data
Aggregate transformation uses Informatica Cache to perform these aggregations

Typically Aggregator has Aggregate expressions in output port
Group By port to create groups. The port can be I, I/O, V or O port

Aggregator is Active transformation and is always connected
20
Aggregator Cache
21
Aggregator Performance Tips

Connect only necessary input/output ports to subsequent transformations, reducing the size of data cache
Pass data to Aggregator sorted by Group By ports, in ascending or descending order

Use Incremental Aggregation if changes in Source are less than half the target
22
Aggregator Pivoting Data (Rows to Columns)

In many scenarios, source data rows need to be converted to columns
Example: Source contains two fields, field_name and field_value. These are to be converted into a target table which has column names as given by list of field_names in Source
Aggregator transformation and FIRST function can be used for converting rows into columns Elaborating above example, target field names are First_Name, Last_Name, Middle_Name and Salutation o_First_Name = FIRST(field_value, field_name = FIRSTNAME)
1
Contd
23
Aggregator Pivoting Data (Rows to Columns)
24
Incremental Aggregation
When to use Incremental Aggregation? New source data can be captured Incremental changes do not significantly change the target When aggregation does not contain Percentile or median 2 functions When mapping includes aggregator with Transaction Transformation scope How to capture Incremental Data? Using Filter and Variables
1
Using Stored Procedure
25
Session failure scenarios when using Incremental Aggregation When PowerCenter Server data movement mode is changed from ASCII to Unicode or vise-versa When PowerCenter Server code page is changed to incompatible code page When session sort order is changed for PowerCenter server in Unicode mode When Enable High Precision option is changed in a session To avoid session failures for above, reinitialize the cache by deleting the incremental aggregation files
26
Configuring Session Set location of cache files. By default index and data files are created in $PMCacheDir Cache filenames can be written to session log by setting tracing level to Verbose Init. In session log, path after TE_7034 gives the location of index file and path after TE_7035 gives the location of data file Incremental Aggregation Session Properties can be set in Session -> Properties -> Performance
27
Incremental Aggregation Using Mapping Variable
Mapping Variable
28
Incremental Aggregation Using Filter
29
Joiner - Basics
Joiner Transformation joins two sources with at least one matching port Joiner requires inputs from two separate pipelines or two branches from one pipeline Joiner is an Active transformation and is always connected Joiner transformation cannot be used in following cases: Either input pipeline contains an Update Strategy A Sequence Generator transformation is directly connected before a Joiner transformation Joiner conditions for string comparison are case sensitive Join types supported are Normal, Master Outer, Detail Outer and Full Outer
30
Joiner - Cache
31
Joiner Debugging Common Errors

"Assertion/Unexpected condition at file tengine.cpp line 4221" running session with multiple Joiners Product : PowerMart Problem Description: When running a session with multiple Joiner Transformations, the following error occurs: On NT: MAPPING> CMN_1141 ERROR: Unexpected condition at file:[d:\cheetah\builder\powrmart\server\dmapper\trans\tengine.cpp] line:[4221]. Application terminating. Contact Informatica Technical Support for assistance. On Unix: Session fails with no error and the following assertion appears on the standard output (stdout) of the pmserver machine. Assertion failed: ((m_teFlags) & (0x00000002)), file /export/home1/build50/pm50n/server/dmapper/trans/tengine.cpp, line 4221
32
Joiner Debugging Common Errors

Solution: This is because of an invalid master/detail relationship for one or more of the Joiner Transformations. Informatica reads all the master rows before it starts reading the detail. So if there are multiple joiners in the mapping, make sure the master/detail relationships are defined in such a way that the server will be able to define the DSQ order.
33
Self Join
Scenario of self join is encountered when some aggregated information is required along with Transaction data in target table
Using Joiner transformation, this can be achieved in Informatica PowerCenter version 7.x
This feature was not available in earlier Informatica versions
34
Self Join
Aggregator and sorter outputs combined in Joiner transformation
35
Joiner to emulate Lookup

A joiner can be used to emulate the lookup, but with certain limitations: - Only equi- conditions are supported. - Duplicate matching is not supported. - Joiner will always cache, Non-Cached is not an option. Here are the settings that need to be made on the Joiner for it to emulated the lookup: 1. The table that is used for the lookup must be brought into the mapping as a source. 2. The "lookup" table should be joined to the other table with an outer join. The outer join must be based on the "lookup" table, i.e. if it is the Detail table then it should be a Detail Outer Join, if it is the Master table then it should be a Master Outer Join.
36

3. The Condition must set in the same way that it would have been in the Lookup. Note: Only equi- conditions are allowed. 4. The lookup values must be mapped downstream from the Joiner just as it would be with a Lookup Transformation.
37
38
Normalizer - Basics
Normalizer Transformation is used to organize the data in a mapping COBOL Sources For COBOL sources, Normalizer acts as a Source Qualifier These sources are often stored in normalized form, where OCCUR statement nests multiple records in single record For relational sources, Normalizer can be used to create multiple rows from a single row of data Normalizer is an Active transformation and is always connected
39
Normalizer Ports
When using normalizer, two keys are generated Generated Key One port for each REDEFINES clause Naming convention - GK_<redefined_field_name> At the end of each session the Power Center updates the GK value to the last value generated for the session plus one Generated Column ID One port for each OCCURS clause Naming convention - GCID_<occurring_field_name>
Note: These are default ports created by Normalizer and cannot be deleted
40
Normalizer Debugging Common Errors

A mapping that contains a Normalizer transformation fails to initialize with the following error: "TE_11017: Cannot match OFOSid with IFOTid"? Sol: This error will occur when the inputs to the Normalizer and outputs from it do not match. Resolutions: 1. Make sure there are no unconnected input ports to the Normalizer transformation. 2. If the Normalizer has an Occurs in it, make sure number of input ports matches the number of Occurs. Note: Second case is more common in mappings where Normalizer is not a Source Qualifier where definition is somehow changed by hand or damaged. To fix this, drop and then recreate the Normalizer transformation.
41

A session with a COBOL source crashes with the following error: WRITER_1_*_1> WRT_8167 Start loading table *********** FATAL ERROR : Caught a fatal signal/exception *********** *********** FATAL ERROR : Aborting the DTM process due to fatal signal/exception. *********** *********** FATAL ERROR : Signal Received: SIGSEGV (11) Sol: This error will occur when the normalizer output ports are connected to two downstream transformations. Contd
42

Resolution:
1. Add an Expression Transformation right after the Normalizer

2. Connect all the ports from the Normalizer to the Expression Transformation 3. Connect this Expression Transformation to the existing downstream transformations as needed
43
Normalizer Converting Columns to Rows
Set Occurrence Change Level to 2
44
Normalizer VSAM Sources
Built-in seq
45
Understanding Transaction Control

A transaction is a set of rows bound by COMMIT or ROLLBACK, i.e. the transaction boundaries
Some rows may not be bound by Transaction boundaries and is an open transaction
Transaction boundaries originate from transaction control points Transaction Control Point is a transformation that defines or 1 redefines the transaction boundary
46
Transformation Scope
PowerCenter server can be configured on how to apply transformation logic to incoming data with the Transformation Scope transformation property. Following values can be set: Row Applies the transformation logic to one row of data at a time. Examples are expression, external procedure, filter, lookup, normalizer (relational), router, sequence generator, stored procedure, etc.
Transaction Applies the transformation logic to all rows in a transaction. With this scope, PC server preserves incoming transaction boundaries. Examples are aggregator, joiner, Custom transformation, Rank, Sorter, etc.
All Input Applies the transformation logic on all incoming data. With this scope, PC server drops incoming transaction boundaries and outputs all rows from the transformation as an open transaction. For transformations like aggregator, joiner, Rank, Sorter and Custom transformation, this is default scope
47
Transaction Control Transformation Basic

Transaction Control transformation is used to define conditions to commit or rollback transactions from relational, XML and dynamic IBM MQ series targets. Following built-in variables can be used to create transaction control expression TC_CONTINUE_TRANSACTION This is the default value of the expression TC_COMMIT_BEFORE TC_COMMIT_AFTER TC_ROLLBACK_BEFORE TC_ROLLBACK_AFTER Transaction Control Transformation is an Active transformation and is always connected
48
Transaction Control Transformation Guidelines

If a mapping includes an XML target, and you choose to append or create a new document on commit, the input groups must receive data from the same transaction control point Transaction Control transformation connected to any target other than relational, XML or dynamic IBM MQSeries targets is ineffective for those targets Each target instance must be connected to Transaction Control transformation in a mapping where multiple targets are present Multiple targets can be connected to single Transaction Control transformation Only one effective Transaction Control transformation can be connected to a target Transaction Control transformation cannot be connected in a pipeline branch that starts with a Sequence Generator transformation Contd
49
Transaction Control Transformation Guidelines

If Dynamic Lookup transformation and Transaction Control transformation are used in same mapping, a rolled-back transaction might result in unsynchronized target data A Transaction Control transformation may be effective for one target and ineffective for another target. If each target is connected to an effective Transaction Control transformation, mapping is valid Examples of Effective/Ineffective Transaction Control transformation Please refer Advanced training folder on Informatica Repository Transaction Control transformation can be used to Restart Failed sessions
Constraint-based delete of rows with foreign key relations
50
Transaction Control Transformation
51
Lookup - Basics
Lookup transformation is created when data is to be looked up Lookup transformation is passive and can be connected or unconnected Lookup is used to perform tasks like Getting a related value (keys from dimension table) Perform a calculation
Update slowly changing dimension tables

Lookup can be configured as Connected or unconnected
Relational or flat file lookup

Cached or un-cached For un-cached lookups, PowerCenter server queries the Lookup table for each input row
52
Lookup - Cache
The PowerCenter Server builds a cache in memory when it processes the first row of data in a cached Lookup transformation
The PowerCenter Server stores Condition Values in the Index Cache and Output Values in Data Cache
Cache files are created in $PMCacheDir by default. These files are deleted and cache memory is released on completion of session, exception being Persistent Caches Lookup cache can be Persistent Cache Static Cache Dynamic Cache Shared Cache
53
Lookup - Cache
Persistent Cache With this option, lookup cache files are created by Informatica Server the first time and can be reused the next time Informatica Server processes Lookup transformation, thus eliminating the time required to read the Lookup table. Lookup can be configured to use persistent cache when table does not change between sessions. Static Cache By default, PowerCenter Server creates static or read-only cache for any lookup table. The PowerCenter Server does not update the cache while it processes the Lookup Transformation Dynamic Cache Usual scenario where dynamic cache is used is when Lookup table is also a target table. The PowerCenter Server dynamically inserts or updates data in Lookup cache
Shared Cache Lookup cache can be shared between multiple transformations. Unnamed cache can be shared between transformations of single mapping. Named cache can be shared between transformations of single mapping or across mappings
54
Lookup - Cache
55
Unconnected Lookup
An Unconnected Lookup transformation receives input values from the result of :LKP expression in another transformation Unconnected Lookup transformation can be called more than once in a mapping Unconnected Lookup transformation can return only single column as against connected lookup. This single column is designated as Return Port
Unconnected Lookup does not support user-defined default value for return port. By default, return value is NULL
Unconnected Lookup cannot have dynamic cache Unconnected Lookup can be used in scenarios where Slowly changing dimension tables are to be updated Lookup is called multiple times Conditional Lookup
56
Dynamic Lookup Cache Advantages

When the target table is also the Lookup table, cache is changed dynamically as the target load rows are processed in the mapping New rows to be inserted into the target or for update to the target will affect the dynamic Lookup cache as they are processed Subsequent rows will know the handling of previous rows Dynamic Lookup cache and target load rows remain synchronized throughout the Session run
57
Update Dynamic Lookup Cache

NewLookupRow port values
0 static lookup, cache is not changed 1 insert row to Lookup cache 2 update row in Lookup cache
Does NOT change row type
Use the Update Strategy transformation before or after Lookup, to flag rows for insert or update to the target
Ignore NULL Property
Per port Ignore NULL values from input row and update the cache only with non-NULL values from input
58
Example: Dynamic Lookup Configuration

Router Group Filter Condition should be: NewLookupRow = 1
This allows isolation of insert rows from update rows

59
Dynamic Lookup Cache Common Errors

"Lookup field has no associated port" when validating dynamic Lookup Transformation Sol: This message occurs when the Dynamic Lookup Cache lookup property is selected and one or more lookup/output ports have not been associated with an input port. To resolve this, go to the Ports tab and choose an Associated Port for each output/lookup port. Additional Information When a dynamic lookup cache is used, each lookup/output port must be associated with an appropriate input port or a sequence id. The session inserts or updates rows in the lookup cache based on data in associated ports. When a sequence-id is selected in Associated Port column, Informatica Server generates a key for new records inserted in lookup cache.
60
Dynamic Lookup Cache Common Errors

"CMN_1650 A duplicate row was inserted into a dynamic lookup cache" running session Sol: This error occurs when the table on which the lookup is built has duplicate rows. Since a dynamic cached lookup cannot be created with duplicate rows, the session fails with this error.
Additional Information Following options are available to work around this: Make sure there are no duplicate rows in the table before starting the session Use a static cache instead of dynamic cache Do a SELECT DISTINCT in the lookup cache SQL
61
Union Transformation - Basics

Union Transformation is a multiple input group transformation that merges data from multiple pipelines or pipeline branches into a single pipeline branch Union transformation is an active transformation and is always connected. This transformation merges data similar to the Union All SQL statement, i.e. the transformation does not remove duplicates. Update Strategy or Sequence Generator transformations cannot be used upstream from a Union transformation. Union transformation is an example of Custom transformation. Editable tabs Transformation, Properties, Groups, Group Ports Non-editable tabs Ports, Initialization Properties, Metadata Extensions, Port Attribute Definitions Designer creates one Output group by default. You cannot edit or delete the output group Input ports can be created by copying ports from upstream transformations. Designer uses the port names specified on the Group Ports tab for each input and output ports, and it appends a number to make each port name in the transformation unique.
62
Union Transformation Common Errors

New order of group ports in a Union Transformation is not reflected in the links Problem Description: After re-ordering the group ports in a Union Transformation the new order is not reflected in the links from the preceding transformation. Example: Suppose there are two ports A and B in a Source Qualifiers and UA and UB in a Union Transformation with port A connected to port UA and port B connected to port UB. Changing the order of the ports UA and UB in the Union Transformation links A to UB and B to UA.
Solution: This is a known issue (CR 83107) with PowerCenter 7.1.x.

Workaround: Delete the incorrect links and create new links.
63

Union Transformation and XML Transformations are grayed out in the Designer
Problem Description: While editing a mapping, the following transformations are grayed out and cannot be added to a mapping or created in Transformation Developer Union, Mid-Stream XML Generator, Midstream XML Parser
Solution: The transformations are grayed out because they need to be registered in the PowerCenter repository. These transformation types were introduced in PowerCenter 7.x and are treated as "native" plug-ins to the PowerCenter Repository. Like any plugin, they need to be registered in the repository. In normal operations when the repository is created, they should be registered automatically. If the registration did not occur, these transformations are not available in the Designer. Contd
64

Resolution: To resolve this issue, you need to manually register these plug-ins. You can do so using pmrepagent commands from the command line. To register the plug-in, you need to locate the XML file for the plug-in. The XML file for a native plug-in is located in PowerCenter Repository Server native subdirectory. First navigate to the directory where the pmrepserver binary is installed. Then, change the command argument for database type to the database used for your repository. The following example shows how to register the Union transformation on Microsoft SQL Server and Oracle: On Windows, enter the following commands:
pmrepagent registerplugin -r RepositoryName -n Administrator -x Administrator -t "Microsoft SQL Server" -u DatabaseUserName -p DatabasePassword -c DatabaseConnectString -i native\pmuniontransform.xml
On UNIX, enter the following commands: pmrepagent registerplugin -r RepositoryName -n Administrator -x Administrator -t "Oracle" -u DatabaseUserName -p DatabasePassword -c DatabaseConnectString -i native/pmuniontransform.xml
65
Union Transformation
66
Datatype Sizes for calculating Cache

Datatype Binary Date/Time Decimal, high precision off (all precision) Decimal, high precision on (precision <=18) Decimal, high precision on (precision >18, <=28) Decimal, high precision on (precision >28) Decimal, high precision on (negative scale) Aggregator, Rank precision + 2 18 10 18 22 10 10 Joiner, Lookup
precision + 8 Round to nearest multiple of 8

24 16 24 32 16 16
Double
Real Integer Small integer
10
10 6 6
16
16 16 16 Unicode mode: 2*(precision + 5) ASCII mode: precision + 9
NString, NText, String, Text
Unicode mode: 2*(precision + 2) ASCII mode: precision + 3
67
Datatype Sizes for calculating Sorter Cache
68
Cache Sizes
Transformation Name AGGREGATOR Min Index Cache Size Number of Groups * [(Summation( Column size) + 17] 200 * [(Summation of Column size) + 16] Number of input rows * [(Summation of Column size) + 16] (Sum of master column sizes in join condition + 16) X number rows in master table Number OF Groups * [(Sum of Column size) + 17] (Sum of master column sizes NOT in join condition but on output ports + 8) * number of rows in master table Number of Groups * [(number of ranks * ((Sum of Column size) + 10)) + 20] Max Index Cache Size Double Min Index Cache size Number of rows in Lookup table * [(Summation of Column size) + 16] * 2 Data Cache Size Number of Groups * [(Summation( Column size) + 7] Number of rows in Lookup table * [(Summation of Column size) + 8]
LOOKUP
SORTER
JOINER
RANK
69
Informatica Performance Tuning
70
This section will cover What is Performance Tuning Measuring Performance
Bottlenecks Mapping Optimization Session Task Optimization Partitioning (Lab 9)
71
Performance Tuning
There are two general areas to optimize and tune External components to Informatica (OS, memory, etc.) Internal to Informatica (tasks, mapping, workflows, etc.) Getting data through the Informatica engine This involves optimizing at the task and mapping level This involves optimizing the system to make sure Informatica runs well Getting data into and out of the Informatica engine This usually involves optimizing non-Informatica components The engine cant run faster than the source or target
72
Measuring Performance
For the purpose of identifying bottlenecks we will use:
Wall Clock time as a relative measurement time
Number of rows loaded over the period of time (rows per
second)
Rows per second (rows/sec) will allow performance measurement of a session over a period of time and with changes in our environment. Rows per sec can have a very wide range depending on the size of the row (number of bytes), the type of source/target (flat file or relational) and underlying hardware.
73
Identifying Bottlenecks
Reader Processes Configure session to read from Flat File target instead of relational target Measure the performance Writer Processes Configure session to read from Flat File target instead of relational target Measure the performance Mapping Generally if the bottleneck is not with the reader or writer process then the next step is to review the mapping Mapping bottlenecks can be created by improper configuration of aggregator, joiner, sorter, rank and lookup caches
74
Mapping Optimizing
Single-Pass Read Use a single SQL when reading multiple tables from the same database. Data type conversions are expensive Watch out for hidden port to port data type conversions Over use of string and character conversion functions Use filters early and often Filters can be used as SQL overrides (Source Qualifies, Lookups) and as transformations to reduce the amount of data processed
Simplify expressions Factor out common logic Use variables to reduce the number of time a function is used
75
Mapping Optimizing
Use operators instead of functions when possible
The concatenation operator (||) is faster than the CONCAT
function
Simplify nested IIFs when possible Use proper cache sizing for Aggregators, Rank, Sorter, Joiner and Lookup transformations
Incorrect cache sizing creates additional disk swaps that
can have a large performance degradation Use the performance counters to verify correct sizing
76
Session Task Optimizing

Run partitioned sessions
Improved performance Better utilization of CPU, I/O and data source/target
bandwidth
Use Incremental Aggregation when possible

Good for rolling average type aggregation
Reduce transformation errors

Put logic in place to reduce bad data such as nulls in a
calculation
Reduce the level of tracing

Tip: See the Velocity Methodology Document for further information
77
Partitioned Extraction and Load

Key Range Round Robin
Hash Auto Keys

Hash User Keys Pass Through
78

Key Range Partition
Data is distributed between partitions according to
pre-defined range values for keys Available in PowerCenter 5, but only for Source Qualifier. Key Range partitioning can be applied to other transformations in 6.x and up Common use for this new functionality:
Apply to Target Definition to align output with physical partition scheme of target table. Apply to Target Definition to write all data to a single file to stream into a database bulk loader that does not support concurrent loads (e.g. Teradata, DB2)
79

Key Range Partition (Continued)
You can select input or input/output ports for the keys,
not variable ports and output only ports.
Remember, the partition occurs BEFORE the transformation;
hence variable and output only are not allowed because they have not yet been evaluated
You can select multiple keys to form a Composite Key Range specification is: Start Range and End Range You can specify an Open Range also NULL values will go to the First Partition All unmatched rows will also go into First Partition, user will see the following warning message once in the log file:
80
TRANSF_1_1_2_1> TT_11083 WARNING! A row did not match any of the key ranges specified at transformation [EXPTRANS]. This row and all subsequent unmatched rows will be sent to the first target partition.

Round Robin Partitioning
The Informatica Server evenly distributes the data to
each partition. The user need not specify anything because key values are not interrogated Common use:
Apply to a flat file source qualifier when dealing with unequal
input file sizes Use user hash when there are downstream lookups/joiners Trick: All but one of the input files can be empty you no longer have to physically partition input files Note: There are performance implications with doing this. Sometimes its better, sometimes its worse.
81

Hash Partitioning
Data is distributed between partitions according to a hash
function applied to the key values PowerCenter supports Auto Hash Partitioning automatically for aggregator and rank transformations Goal: Evenly distribute data, but make sure that like key values are always processed by the same partition A hash function is applied to a set of ports.
Hash function returns a value between 1 and the number of partitions. A good hash function provides a uniform distribution of return values
Not all 1s or all 2s, but an even mix.

Based on this functions return value, the row is routed to
the corresponding partition (e.g. if 1, send to partition 1, if 2, send to partition 2, etc.)

82

Hash Auto Key
No need to specify the keys to hash on. Automatically uses all
key ports (ex, Group By key or Sort key) as a composite key Only valid for Unsorted Aggregation, Rank and Sorter The default partition type for unsorted aggregator and rank NULL values will get converted to zero for hashing.
Hash Key
Just like Auto Hash Key, but the user explicitly specifies the ports
to be used as the key.
Only input and input/output ports allowed
Common use: When dealing with input files of unequal sizes, hash partition (vs. round-robin) data into downstream lookups and joiners to improve locality of reference of caches (hash on ports used in the lookup/join condition) Override the auto hash for performance reasons
Hashing is faster on numeric values vs. strings
83

Pass Through Partitioning
Data is passed through to the next stage within the current
partition Since data is not distributed, the user need not specify anything Common use:
Create additional stages (processing threads) within pipeline
to improve performance
84

Partition tab appears in Session Task from within the Workflow Manager
85

By default, session tasks have the following partition points and partition schemes: Relational Source, Target (Pass Through) File Source, Target (Pass Through) Unsorted Aggregator, Rank (Auto Hash) NOTE: You cannot delete the default partition points for sources and targets.
You can NOT run debug session with number of partition >1
86
Partitioning Dos
Cache as much as possible in memory Spread cache files across multiple physical devices both within and across partitions Unless the directory is hosted on some kind of disk array, configure disk based caches to use as many disk devices as possible Round Robin or Hash partition a single input file until you determine this is bottleneck Range Partition to align source/target with physical partitioning scheme of source/target tables Pass through partition to apply more CPU resources to a pipeline (when TX is bottleneck)
87
Partitioning Donts
Dont add partition points if the session is already source or target constrained Tune the source or target to eliminate the bottleneck Dont add partition points if the CPUs are already maxed out (%idle < ~5%) Eliminate unnecessary processing and/or buy more CPUs Dont add partition points if system performance monitor shows regular page out activity Eliminate unnecessary processing and/or buy more memory Dont add multiple partitions until youve tested and tuned a single partition Youll be glad you did
88
Default Partition Points

Default partitioning points
Reader
Transformation
Transformation
Writer
Default Partition Points

Default Partition Point Default Partition Type Description
Source Qualifier or Normalizer Transformation Rank and unsorted Aggregator Transformation Target Instances
Pass-through
Controls how the Server reads data from the source and passes data into the source qualifier Ensures that the Server group rows before it sends them to the transformation Control how the instances distribute data to the targets
Hash auto-keys
Pass-through
89
Informatica Templates
90
This section will cover Template for generating Mapping Specifications Template for generating sessions/workflow specifications Unit Test Plan for Informatica Mapping Standards and Best Practices
91
Mapping Specifications
92
Session/Workflow Specifications
93
94

Informatica Advanced Training

Uploaded by

Copyright:

Informatica Advanced Training

Uploaded by

Document Information

Copyright

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Informatica Advanced Training

Uploaded by

Copyright:

Informatica PowerCenter 7 Advance Training

Informatica Architecture & Connectivity 6.x and 7.x

Informatica Architecture 5.x

Informatica Connectivity 5.x

Client Repository Manager

Designer Repository Server Manager ODBC Native

PowerCenter Advanced Transformations

Understanding Transaction Control Transaction Control (Lab 7)

Union Transformation (Lab 8)

Stored Procedure Transformation

Stored Procedure Transformation

Informatica supports following databases for Stored Procedure Transformation:

Stored Procedure Transformation

Stored Procedure Transformation Properties

Stored Procedure Transformation

All ports in Filter transformation are input/output ports

Filter Performance Tips

Aggregate transformation uses Informatica Cache to perform these aggregations

Group By port to create groups. The port can be I, I/O, V or O port

Aggregator Performance Tips

Pass data to Aggregator sorted by Group By ports, in ascending or descending order

Aggregator Pivoting Data (Rows to Columns)

Aggregator Pivoting Data (Rows to Columns)

Using Stored Procedure

Incremental Aggregation Using Mapping Variable

Incremental Aggregation Using Filter

Joiner Debugging Common Errors

Joiner Debugging Common Errors

Aggregator and sorter outputs combined in Joiner transformation

Joiner to emulate Lookup

Joiner to emulate Lookup

Joiner to emulate Lookup

Normalizer Debugging Common Errors

Normalizer Debugging Common Errors

Normalizer Debugging Common Errors

1. Add an Expression Transformation right after the Normalizer

Normalizer Converting Columns to Rows

Set Occurrence Change Level to 2

Normalizer VSAM Sources

Understanding Transaction Control

Transaction Control Transformation Basic

Transaction Control Transformation Guidelines

Transaction Control Transformation Guidelines

Constraint-based delete of rows with foreign key relations

Transaction Control Transformation

Update slowly changing dimension tables

Relational or flat file lookup

Dynamic Lookup Cache Advantages

Update Dynamic Lookup Cache

Does NOT change row type

Example: Dynamic Lookup Configuration

This allows isolation of insert rows from update rows

Dynamic Lookup Cache Common Errors

Dynamic Lookup Cache Common Errors

Union Transformation - Basics

Union Transformation Common Errors

Solution: This is a known issue (CR 83107) with PowerCenter 7.1.x.

Union Transformation Common Errors

Union Transformation Common Errors

Datatype Sizes for calculating Cache

precision + 8 Round to nearest multiple of 8

NString, NText, String, Text

Unicode mode: 2*(precision + 2) ASCII mode: precision + 3

Datatype Sizes for calculating Sorter Cache