Informatica Powercenter 8.6: Basics Training Course
Informatica Powercenter 8.6: Basics Training Course
Informatica Powercenter 8.6: Basics Training Course
6
Basics Training Course
Introduction
At the end of this course you will Understand how to use all major PowerCenter 8.6 components Be able to perform basic administration tasks Be able to build basic ETL Mappings and Mapplets Understand the different Transformations and their basic attributes in PowerCenter 8.6 Be able to create, run and monitor Workflows Understand available options for loading target data Be able to troubleshoot common development problems
ETL Basics
Decision Support
Data Warehouse
Cleanse Data Transaction level data Apply Business Rules Optimized for Transaction Aggregate Data Response Time Consolidate Data De-normalize Current Normalized or DeTransform Normalized data
Extract
ETL
Load
This section includes The purpose of the Repository and Repository Service The Administration Console
Version Control
10
PowerCenter Repository
It is a relational database managed by the Repository Service
Stores metadata about the objects (mappings, transformations etc.) in database tables called as Repository Content
The Repository database can be in Oracle, IBM DB2 UDB, MS SQL Server or Sybase ASE To create a repository service one must have full privileges in the Administrator Console and also in the domain Integration Service uses repository objects for performing the ETL
11
Repository Service
A Repository Service process is a multi-threaded process that fetches, inserts and updates metadata in the repository Manages connections to the Repository from client applications and Integration Service Maintains object consistency by controlling object locking Each Repository Service manages a single repository database. However multiple repositories can be connected and managed using repository domain
It can run on multiple machines or nodes in the domain. Each instance is called a Repository Service process
12
Repository Connections
Each Repository has a repository service assigned for the management of the physical Repository tables
PowerCenter
Client
1 Node A
Service Manager Application Service Repository Service
Node B (Gateway)
Service Manager
3
TCP/IP
Application Service
2 4
Repository Administration Manager Console Native Connectivity or ODBC Driver
13
Repository Database
Navigation Window Other Important Task Include Managing Objects like services, nodes, grids, licenses etc. and security within a domain Managing Logs Create &Manage Users Upgrading Repository & Server Shutdown Domain
Main Window
**Upgrading Repository means updating configuration file and also the repository tables
15
Repository Management
Perform all Repository maintenance tasks through Administration Console
Create Contents Delete Contents Backup Contents Copy Contents from Upgrade Contents Disable Repository Service View Locks
Manage Connections
Notify Users Propagate Register/UnRegister Local Repositories Edit Database properties
16
Navigator Window
Main Window
Shortcut Dependencies
17
Repository Manager
Use Repository manager to navigate through multiple folders and repositories. Perform following tasks: Add/Edit Repository Connections Search for Repository Objects or Keywords Implement Repository Security(By changing the password only) Perform folder functions ( Create , Edit , Delete ,Compare) Compare Repository Objects Manage Workflow/Session Log Entries View Dependencies Exchange Metadata with other BI tools
18
Add Repository
19
20
21
Managing Privileges
Check box assignment of privileges
22
Folder Permissions
Assign one user as the folder owner for first tier permissions Select one of the owners groups for second tier permissions All users and groups in the Repository will be assigned the third tier permissions
23
Object Locking
Object Locks preserve Repository integrity Use the Edit menu for Viewing Locks and Unlocking Objects
24
Object Searching
Menu -> Analyze > Search
Keyword search
Used when a keyword is associated with a target definition
Search all
Filter and search objects
25
Object Sharing
Reuse existing objects Enforces consistency Decreases development time Share objects by using copies and shortcuts
COPY Copy object to another folder Changes to original object not captured SHORTCUT Link to an object in another folder or repository Dynamically reflects changes to original object
Duplicates space
Copy from shared or unshared folder
Preserves space
Created from a shared folder
Mapping Components
30
Design Process
1. Create Source definition(s) 2. Create Target definition(s)
3. Create a Mapping
4. Create a Session Task 5. Create a Workflow from Task components 6. Run the Workflow 7. Monitor the Workflow and verify the results
31
Client Tools
Navigator
Workspace
Output
Status Bar
32
Mapping Components
Each PowerCenter mapping consists of one or more of the following mandatory components Sources Transformations Targets The components are arranged sequentially to form a valid data flow from SourcesTransformationsTargets
33
Transformation Toolbar
Iconized Mapping
35
Foreign Key
37
39
40
This section introduces to Different Source Types Creation of ODBC Connections Creation of Source Definitions Source Definition properties Data Preview option
42
Source Analyzer
Navigation Window
Analyzer Window
43
Import from Database Import from File Import from Cobol File Import from XML file Import from third party software like SAP, Siebel, PeopleSoft etc Create manually
Repository
Source Analyzer
Relational
XML file
Flat file
COBOL file
44
ODBC
TCP/IP
Repository Service
native
Repository
45
DEF
47
48
Flat File
DEF
native
Repository
49
DEF
.DTD File
DEF
TCP/IP
DATA
Repository Service
native
DEF Repository
In addition to the DTD file, an XML Schema or XML file can be used as a Source Definition
51
.CBL File
DEF
TCP/IP
DATA
Repository Service
DEF
53
55
Target Designer
56
Manual Creation
Automatic Creation
57
58
Target Designer
Database
ODBC
TCP/IP
DEF
Repository Service
native
Repository DEF
59
61
62
DEF
DEF
DEF
63
Use Preview Data to verify the results (right mouse click on object)
64
Transformation Concepts
By the end of this section you will be familiar with: Transformation types
66
Types of Transformations
Active/Passive Active : Changes the numbers of rows as data passes through it Passive: Passes all the rows through it Connected/Unconnected
67
Transformation Types
PowerCenter 8.6 provides 24 objects for data transformation Aggregator: performs aggregate calculations Application Source Qualifier: reads Application object sources as ERP Custom: Calls a procedure in shared library or DLL Expression: performs row-level calculations External Procedure (TX): calls compiled code for each row Filter: drops rows conditionally Mapplet Input: Defines mapplet input rows. Available in Mapplet designer Java: Executes java code Joiner: joins heterogeneous sources Lookup: looks up values and passes them to other objects Normalizer: reads data from VSAM and normalized sources Mapplet Output: Defines mapplet output rows. Available in Mapplet designer
68
Transformation Types
Rank: limits records to the top or bottom of a range Router: splits rows conditionally Sequence Generator: generates unique ID values Sorter: sorts data Source Qualifier: reads data from Flat File and Relational Sources Stored Procedure: calls a database stored procedure Transaction Control: Defines Commit and Rollback transactions Union: Merges data from different databases Update Strategy: tags rows for insert, update, delete, reject XML Generator: Reads data from one or more Input ports and outputs XML through single output port XML Parser: Reads XML from one or more Input ports and outputs data through single output port XML Source Qualifier: reads XML data
69
Passive T T T
Active T
Example holds true with Normalizer in lieu of Source Qualifier. Exceptions are: Mapplet Input and Joiner transformations
70
Transformation Views
A transformation has three views:
Iconized - shows the transformation in relation to the rest of the mapping Normal - shows the flow of data through the transformation Edit - shows transformation ports and properties; allows editing
71
Edit Mode
Allows users with folder write permissions to change or create transformation ports and properties
Define port level handling Define transformation level properties
Enter comments
Make reusable
72
Expression Editor
An expression formula is a calculation or conditional statement Used in Expression, Aggregator, Rank, Filter, Router, Update Strategy Performs calculation based on ports, functions, operators, variables, literals, constants and return values from other transformations
73
Specific to the source and target database types Display in source and target tables within Mapping Designer
PowerCenter internal datatypes based on ANSI SQL-92 Display in transformations within Mapping Designer
Native
Transformation
Native
Transformation datatypes allow mix and match of source and target database types When connecting ports, native and transformation datatypes must be compatible (or must be explicitly converted)
74
Datatype Conversions
Integer, Small Int Integer, Small Integer Decimal Double , Real String , Text Date/Time Binary X X X X Decimal X X X X Double, Real X X X X String , Text X X X X X X X X Date/ Time Binary
All numeric data can be converted to all other numeric datatypes, e.g. - integer, double, and decimal All numeric data can be converted to string, and vice versa Date can be converted only to date and string, and vice versa Raw (binary) can only be linked to raw Other conversions not listed above are not supported These conversions are implicit; no function is necessary
75
ASCII CHR CHRCODE CONCAT INITCAP INSTR LENGTH LOWER LPAD LTRIM RPAD RTRIM SUBSTR UPPER REPLACESTR REPLACECHR
Character Functions
76
PowerCenter Functions
TO_CHAR (numeric) TO_DATE TO_DECIMAL TO_FLOAT TO_INTEGER TO_NUMBER
Conversion Functions Used to convert datatypes
ADD_TO_DATE DATE_COMPARE DATE_DIFF GET_DATE_PART LAST_DAY ROUND (date) SET_DATE_PART TO_CHAR (date) TRUNC (date)
Date Functions Used to round, truncate, or compare dates; extract one part of a date; or perform arithmetic on a date To pass a string to a date function, first use the TO_DATE function to convert it to an date/time datatype
77
PowerCenter Functions
Numerical Functions
ABS CEIL CUME EXP FLOOR LN LOG MOD MOVINGAVG MOVINGSUM POWER ROUND SIGN SQRT TRUNC
78
PowerCenter Functions
ERROR ABORT DECODE IIF
Special Functions Used to handle specific conditions within a session; search for certain values; test conditional statements
IIF(Condition,True,False)
Test Functions Used to test if a lookup result is null Used to validate data
SOUNDEX METAPHONE
79
Expression Validation
The Validate or OK button in the Expression Editor will: Parse the current expression Remote port searching (resolves references to ports in other transformations) Parse transformation attributes e.g. - filter condition, lookup condition, SQL Query Parse default values Check spelling, correct number of arguments in functions, other syntactical errors
80
Types of Ports
Four basic types of ports are there Input Output Input/Output
Variable
Apart from these Look-up & Return ports are also there that are specific to the Lookup transformation
81
82
Connection Validation
Examples of invalid connections in a Mapping: Connecting ports with incompatible datatypes Connecting output ports to a Source Connecting a Source to anything but a Source Qualifier or Normalizer transformation Connecting an output port to an output port or an input port to another input port Connecting more than one active transformation to another transformation (invalid dataflow)
83
Mapping Validation
Mappings must: Be valid for a Session to run
84
Determines how the source database binds data when the Integration Service reads it If mismatch between the source definition and source qualifier datatypes then mapping is invalid All ports by default are Input/Output ports
85
Used as Joiner for homogenous tables using a where clause Filter using a where clause
86
Can use Mapping Parameters and Variables in SQL executed against the source Use a semi-colon (;) to separate multiple statements Informatica Server ignores semi-colons within single quotes, double quotes or within /* ...*/ To use a semi-colon outside of quotes or comments, escape it with a back slash (\) Workflow Manager does not validate the SQL
87
88
Filter Transformation
Drops rows conditionally
Active Transformation Connected Ports
All input / output
90
Joiner Transformation
By the end of this sub-section you will be familiar with: When to use a Joiner Transformation
Homogeneous Joins
Heterogeneous Joins Joiner properties Joiner Conditions Nested joins
91
Homogeneous Joins
Joins that can be performed with a SQL SELECT statement: Source Qualifier contains a SQL join
92
Heterogeneous Joins
Joins that cannot be done with a SQL statement: An Oracle table and a Sybase table Two Informix tables on different database servers Two flat files A flat file and a database table
93
Joiner Transformation
Performs heterogeneous joins on records from two tables on same or different databases or flat file sources
Active Transformation Connected Ports
All input or input / output M denotes port comes from master source
Joiner Conditions
95
Joiner Properties
Join types: Normal (inner) Master outer Detail outer Full outer
The flow of sorted data must me maintained by avoiding the use of transformations like Rank, Custom, Normalizer etc. which alter the sort order
Enable the sorted input option is properties tab The order of the ports used in joining condition must match the order of the ports at the sort origin When joining the Joiner output with another pipeline make sure that the data from the first joiner is sorted
97
98
Expression Transformation
Passive Transformation
Connected
Ports Mixed Variables allowed Create expression in an output or variable port Usage Perform majority of data manipulation
Click here to invoke the Expression Editor
99
Introduction To Workflows
This section will include Integration Service Concepts The Workflow Manager GUI interface Setting up Server Connections
Relational FTP External Loader Application
101
Integration Service
Application service that runs data integration sessions and workflows
102
Integration Service
103
106
107
User Name/Password Database connectivity information Optional Environment SQL (executed with each use of database connection) Optional Environment SQL (executed before initiation of each transaction)
108
FTP Connection
Create an FTP connection Instructions to the Integration Service to ftp flat files Used in Session Tasks
109
110
Task Developer
Create basic Reusable building blocks to use in any Workflow Reusable Tasks Session - Set of instructions to execute Mapping logic Command - Specify OS shell / script command(s) to run during the Workflow Email - Send email at any point in the Workflow
111
Session Tasks
s_CaseStudy1
After this section, you will be familiar with: How to create and configure Session Tasks
112
Session Task
Integration Service instructs to runs the logic of ONE specific Mapping
s_CaseStudy1
e.g. - source and target data location specifications, memory allocation, optional Mapping overrides, scheduling, processing and load instructions Becomes a component of a Workflow (or Worklet) If configured in the Task Developer, the Session Task is reusable (optional)
113
Session Task
Created to execute the logic of a mapping (one mapping only)
s_CaseStudy1
Session Tasks can be created in the Task Developer (reusable) or Workflow Developer (Workflow-specific) Steps to create a Session Task Select the Session button from the Task Toolbar or Select menu Tasks -> Create
114
115
116
117
118
119
Allows overrides of some transformation properties Does not change the properties in the Mapping
120
121
Command Task
Command
Specify one (or more) Unix shell or DOS (NT, Win2000) commands to run at a specific point in the Workflow Becomes a component of a Workflow (or Worklet) If configured in the Task Developer, the Command Task is reusable (optional)
Commands can also be referenced in a Session through the Session Components tab as Pre- or Post-Session commands
122
Command Task
Command
123
Email Task
Email
Sends email during a workflow Becomes a component of a Workflow (or Worklet) If configured in the Task Developer, the Email Task is reusable (optional) Email can be also sent by using post-session email option and suspension email options of the session. (Non-reusable)
124
Email Task
Email
125
Workflow Structure
A Workflow is set of instructions for the Integration Service to perform data transformation and load Combines the logic of Session Tasks, other types of Tasks and Worklets The simplest Workflow is composed of a Start Task, a Link and one other Task
Link
Start Task
Session Task
126
127
Developing Workflows
Create a new Workflow in the Workflow Designer
Customize Workflow name
Configure Workflow
129
Workflow Properties
Customize Workflow Properties
Workflow log displays
130
Workflows Properties
Create a User-defined Event which can later be used with the Raise Event Task
Define Workflow Variables that can be used in later Task objects (example: Decision Task)
131
Select the workflows Note: All the folders should be closed for assigning workflows to Integration Service
132
133
Workflows Administration
This section details The Workflow Monitor GUI interface Monitoring views
135
Monitoring Workflows
Perform operations in the Workflow Monitor Restart -- restart a Task, Workflow or Worklet
Monitor Workflows
The Workflow Monitor is the tool for monitoring Workflows and Tasks Review details about a Workflow or Task in two views Gantt Chart view Task view
138
Monitoring Workflows
Task View
Completion Time Workflow Start Time Status
Status Bar
139
Right-click on Session to retrieve the Session Log (from the Integration Service to the local PC Client)
140
Debugger Features
Debugger is a Wizard driven tool View source / target data
Configure the Debugger Port to 6010 as thats the default port configured by the Integration Service for the Debugger
142
Debugger Interface
Debugger windows & indicators
Debugger Mode indicator
Lookup
Update Strategy Sequence Generator Rank Normalizer Stored Procedure External Procedure Custom Transformation Transaction Control
145
Router Transformation
Multiple filters in single transformation
Adds a group
146
SQ_TARGET_O RDERS_COST
RTR_OrderCostt
147
Router
Tests rows for one or more condition
Routes the rows not meeting the filter condition to default group
In case of multiple filter transformation the Integration service processes rows for each transformation but in case of router the incoming rows are processed only once.
148
Sorter Transformation
Sorter Transformation
Discard duplicate rows by selecting Distinct option
Aggregator Transformation
Performs aggregate calculations
Mixed
Variables allowed Group By allowed
151
152
Aggregate Expressions
Aggregate functions are supported only in the Aggregator Transformation Conditional Aggregate expressions are supported
Aggregator Properties
Sorted Input Property
Instructs the Aggregator to expect the data to be sorted
154
The Integration Service will cache data from each group and release the cached data -- upon reaching the first record of the next group Data must be sorted according to the order of the Aggregator Group By ports Performance gain will depend upon varying factors
155
Incremental Aggregation
Trigger in Session Properties -> Performance Tab
ORDER_ITEMS (Oracle)
ORDERS (Oracl e)
SQ_ORDERS_I TEMS
EXP_GET_INC REMENTAL_DA TA
AGG_INCREME NTAL_DATA
Upon next run, files are overwritten with new cache information Functions like median ,running totals not supported as system memory is used for these functions
Example: When triggered, Integration Service will save new MTD totals. Upon next run (new totals), Service will subtract old totals; difference will be passed forward
Best Practice is to copy these files in case a rerun of data is ever required. Reinitialize when no longer needed, e.g. at the beginning new month processing
156
Lookup Transformation
By the end of this sub-section you will be familiar with: Lookup principles
Lookup properties
Lookup conditions Lookup techniques Caching considerations
157
If a match is found, one or more table values are returned to the Mapping. If no match is found, NULL is returned Look Up Transformation
Look-up Values
SQ_TARGET_ITEMS_OR... Source Qualifier Name Datatype Len... 38 72 72 10 38 38 38 38 38 Name ITEM_ID decimal ITEM_NAME string ITEM_DESC string WHOLESALE_CO... decimal DISCONTINUED_... decimal MANUFACTURER...decimal DISTRIBUTOR_ID decimal ORDER_ID decimal TOTAL_ORDER_... decimal LKP_OrderID Lookup Procedure Datatype IN_ORDER_ID decimal DATE_ENTERED date/ time DATE_PROMISED date/ time DATE_SHIPPED date/ time EMPLOYEE_ID decimal CUSTOMER_ID decimal SALES_TAX_RATE decimal STORE_ID decimal 38 19 19 19 38 38 5 38
Return Values
No Yes Yes Yes Yes Yes Yes Yes No No No No No No No No
TARGET_ORDERS_COS... Target Definition Datatype ORDER_ID number(p,s) DATE_ENTERED date DATE_PROMISED date DATE_SHIPPED date EMPLOYEE_ID number(p,s) CUSTOMER_ID number(p,s) SALES_TAX_RATE number(p,s) STORE_ID number(p,s) TOTAL_ORDER_... number(p,s)
3 1 1 1 3 3 5 3 3
158
Lookup Transformation
Looks up values in a database table or flat files and provides data to downstream transformation in a Mapping
Passive Transformation Connected / Unconnected Ports
Mixed L denotes Lookup port R denotes port used as a return value (unconnected Lookup only)
Lookup Properties
160
161
Lookup Conditions
Multiple conditions are supported
162
Connected Lookup
SQ_TARGET_ITEMS_OR... Source Qualifier Name Datatype Len... 38 72 72 10 38 38 38 38 38 Name ITEM_ID decimal ITEM_NAME string ITEM_DESC string WHOLESALE_CO... decimal DISCONTINUED_... decimal MANUFACTURER...decimal DISTRIBUTOR_ID decimal ORDER_ID decimal TOTAL_ORDER_... decimal
LKP_OrderID Lookup Procedure Datatype IN_ORDER_ID decimal DATE_ENTERED date/ time DATE_PROMISED date/ time DATE_SHIPPED date/ time EMPLOYEE_ID decimal CUSTOMER_ID decimal SALES_TAX_RATE decimal STORE_ID decimal 38 19 19 19 38 38 5 38 No Yes Yes Yes Yes Yes Yes Yes No No No No No No No No
TARGET_ORDERS_COS... Target Definition Len... Loo... Ret... Associated K...Name ... Datatype ORDER_ID number(p,s) DATE_ENTERED date DATE_PROMISED date DATE_SHIPPED date EMPLOYEE_ID number(p,s) CUSTOMER_ID number(p,s) SALES_TAX_RATE number(p,s) STORE_ID number(p,s) TOTAL_ORDER_... number(p,s)
3 1 1 1 3 3 5 3 3
Unconnected Lookup
Will be physically unconnected from other transformations
There can be NO data flow arrows leading to or from an unconnected Lookup
Lookup function can be set within any transformation that supports expressions
Lookup data is called from the point in the Mapping that needs it Function in the Aggregator calls the unconnected Lookup
164
Condition
IIF ( ISNULL(customer_id),0,:lkp.MYLOOKUP(order_no))
Lookup function
Conditional statement is evaluated for each row Lookup function is called only under the pre-defined condition
165
167
Part of the mapping data flow Returns multiple values (by linking output ports to another transformation) Executed for every record passing through the transformation More visible, shows where the lookup values are used Default values are used
Separate from the mapping data flow Returns one value (by checking the Return (R) port option for the output port that provides the return value) Only executed when the lookup function is called Less visible, as the lookup is called from an expression within another transformation Default values are ignored
168
Uncached
Each Mapping row needs one SQL SELECT
Rule Of Thumb: Cache if the number (and size) of records in the Lookup table is small relative to the number of mapping rows requiring lookup or large cache memory is available for Integration Service
169
171
Use the Update Strategy transformation before or after Lookup, to flag rows for insert or update to the target
Ignore NULL Property
Per port
Ignore NULL values from input row and update the cache using only with non-NULL values from input
172
Persistent Caches
By default, Lookup caches are not persistent When Session completes, cache is erased Cache can be made persistent with the Lookup properties When Session completes, the persistent cache is stored on the machine hard disk files The next time Session runs, cached data is loaded fully or partially into RAM and reused
Smart aggregation
175
176
Usage
Generate sequence numbers Shareable across mappings
178
Increment Value
To repeat values
Number of Cached Values
179
Rank Transformation
RNKTRANS
Active Connected
180
Normalizer Transformation
NRMTRANS
Active Connected
Used to organize data to reduce redundancy primarily with the COBOL sources
A single long record with repeated data is converted into separate records.
181
Stored Procedure
GET_NAME_US ING_ID
182
External Procedure
Passive Connected/ Unconnected
Used to run the procedures created outside of the Designer Interface in other programming languages like c , c++ , visual basic etc.
Using this transformation we can extend the functionality of the transformations present in the Designer
183
Custom Transformation
Active/Passive Connected
Can be bound to a procedure that is developed using the functions described under custom transformations functions
Using this we can create user required transformations which are not available in the PowerCenter like we can create a transformation that requires multiple input groups, multiple output groups, or both.
184
Transaction Control
TC_EMPLOYEE
Active Connected
Used to control commit and rollback transactions based on a set of rows that pass through the transformation
Can be defined at the mapping as well as the session level
185
Reusability
Mapplets
Tasks
187
188
System Variables
SYSDATE
$$$SessStartTime
Returns the system date value as a string when a session is initialized. Uses system clock on machine hosting Integration Service
format of the string is database type dependent Used in SQL override Has a constant value
SESSSTARTTIME
189
190
Userdefined names
SetMinVariable -- Evaluates the value of a mapping variable to the lower of two values (compared against the value specified)
SetVariable -- Sets the value of a mapping variable to a specified value
192
Transformation Developer
Transformations used in multiple mappings are called Reusable Transformations
Mapplet Developer
When a group of transformation are to be reused in multiple mappings then we develop mapplets
194
Reusable Tasks
Tasks can be created in
Task Developer (Reusable)
Tasks can be made reusable my checking the Make Reusable checkbox in the general tab of sessions Following tasks can be made reusable:
Session Email
Command
When a group of tasks are to be reused then use a worklet (in worklet designer )
195
Queries???
Thank You!!!