ODI Scheduler Source Control Performance


AGENDA

ODI Performance
ODI Scheduling
ODI Deployment/Release
ULI BETHKE
Dublin based
Blog www.bi-q.ie
ODI 2007
Reviewer of two ODI books
ODI articles on OTN
Deputy chair OUG BI SIG. Next event 11th June
ODI advanced trainer
ODI PERFORMANCE

ODI is a metadata-driven (SQL) code generator using code templates (knowledge modules). It uses a Java agent to communicate and send data between source and target systems and the repository over the network.
SQL

- More than 80% of ODI performance issues are SQL issues
  => SQL is the main ODI skill
- Perfect your SQL. Advanced SQL. Analytic functions
- Know your database(s) inside out, in particular the target
- Understand, write, and modify Knowledge Modules
AGENT

- Lightweight Java-based application
- Tied to host OS
- Generates code based on ODI metadata
- Communicates with source, target, and repository
- JDBC data transport
- XML
- Jetty
- Interpreters: Jython, Java BeanShell, JavaScript, Groovy
- HSQLDB in-memory database
- Scheduler
- Sizing
AGENT

Agent on the target server
- Fewest round trips. Network (JDBC, XML)
- One target database server only (DW)
Agent on another server
- ODBC drivers
- JEE agent on WebLogic
- No support for target OS
- Resources on target
- DBA
INTERFACES

- No KMs that use row-by-row processing
- Use ODI functions rather than DB functions
- Don't overuse the CKM (especially for large data volumes)
- Temp indexes (I$)
- Gather statistics (C$, I$, TGT when applicable)
- Rule of thumb: use loader KMs or DB link KMs rather than JDBC KMs
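
To illustrate the first point, here is a minimal sketch contrasting row-by-row processing with a single set-based statement. It uses Python's built-in SQLite module purely as a stand-in database; the table names and the 1.2 multiplier are hypothetical, and the code is not what any ODI Knowledge Module generates.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a real source/target (assumption)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE tgt_row_by_row (id INTEGER PRIMARY KEY, amount_adj REAL);
    CREATE TABLE tgt_set_based (id INTEGER PRIMARY KEY, amount_adj REAL);
""")
conn.executemany(
    "INSERT INTO src VALUES (?, ?)",
    [(i, float(i)) for i in range(1, 1001)],
)

# Row by row: every row travels to the client, is transformed there,
# and is written back with its own INSERT statement.
rows = conn.execute("SELECT id, amount FROM src").fetchall()
for row_id, amount in rows:
    conn.execute(
        "INSERT INTO tgt_row_by_row VALUES (?, ?)",
        (row_id, amount * 1.2),
    )

# Set based: one statement, the database engine does all the work.
conn.execute("INSERT INTO tgt_set_based SELECT id, amount * 1.2 FROM src")
conn.commit()

count_row_by_row = conn.execute("SELECT COUNT(*) FROM tgt_row_by_row").fetchone()[0]
count_set_based = conn.execute("SELECT COUNT(*) FROM tgt_set_based").fetchone()[0]
```

Both targets end up with the same rows; the difference is the number of statements and round trips needed to get there.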
SOURCE/TARGET

- Schemas on the same database server: define them as physical schemas of one data server, not as separate data servers
- Have sources physically close to target
- Minimize impact on source
- Chunking
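
One possible way to chunk an extract is by key range, sketched below. The function name `key_ranges`, the numeric key, and the chunk size are all hypothetical; the slide does not say which chunking strategy is meant.

```python
def key_ranges(min_key, max_key, chunk_size):
    """Yield inclusive (low, high) key ranges that cover min_key..max_key."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    low = min_key
    while low <= max_key:
        high = min(low + chunk_size - 1, max_key)
        yield low, high
        low = high + 1


# Each range could drive one bounded extract,
# e.g. a filter of the form: id BETWEEN low AND high
chunks = list(key_ranges(1, 25, 10))
```

Smaller ranges keep each extract short, which limits the load placed on the source at any one time.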
CRITICAL PATH

NETWORK PATHS:   PATH DURATIONS:
B>E>H            6 + 2 + 11 = 19
B>D>F            6 + 4 + 14 = 24
B>D>G            6 + 4 + 10 = 20
A>C>G            9 + 8 + 10 = 27   CRITICAL PATH
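
The path arithmetic above can be reproduced in a few lines. This is a sketch only: the per-activity durations are read off the sums on the slide, and the variable names are illustrative.

```python
# Activity durations, taken from the sums listed on the slide
durations = {"A": 9, "B": 6, "C": 8, "D": 4, "E": 2, "F": 14, "G": 10, "H": 11}

# The four network paths listed on the slide
paths = [
    ["B", "E", "H"],
    ["B", "D", "F"],
    ["B", "D", "G"],
    ["A", "C", "G"],
]

# Total duration of each path is the sum of its activities
path_totals = {">".join(path): sum(durations[step] for step in path) for path in paths}

# The critical path is the longest one: it sets the minimum overall run time
critical_path = max(path_totals, key=path_totals.get)
```

Shortening any activity outside the critical path does not shorten the overall run; only work on A, C, or G does.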
MICRO TUNING

JDBC drivers
JVM
Type 4 or 5 JDBC drivers (Data Direct)
Array fetch size.
DB packet size.
Network packet size.
PERFORMANCE MONITORING

ODI Log Data Mart
Facts
Dimensions
Metrics
Frontend
DBMS_SQLTUNE_UTIL0

dbms_sqltune_util0.sqltext_to_sqlid
Link to Data Dictionary Tables
MACIEJ KOCON
Dublin based
ODI 2005 (Sunopsis)
Reviewer of two ODI books
Blog www.bi-q.ie
maciek@bi-q.ie
ORCHESTRATING DWH PROCESSES
Orchestration of Data Process Flow
Standard DWH Process flow orchestration
Packages in Oracle Data Integrator 10g
Load Plans in Oracle Data Integrator 11g
Process Flow use cases - efficiency analysis
Alternative scheduling benefits
TYPICAL DATA FLOW in DWH

step 1: STAGE (DATA EXTRACT) - loads data from sources
step 2: DIMs (LABEL) - provides structured labeling information
step 3: FACTS (FACTS) - consists of measurements, metrics or facts

E-LT: data transport & transform units
Orchestration: Packages (ODI 10g), Load Plans (ODI 11)
ORCHESTRATION ODI PACKAGES

Using objects directly:
PKG_ABC: INT_A, PRC_B, INT_C
PKG_DE: INT_D, INT_E

Using scenarios (compiled code):
SYNCHRONOUS PKG_ABCDE: INT_A, PRC_B, INT_C, PKG_DE
ASYNCHRONOUS PKG_ABCDE: INT_A, PRC_B, INT_C, PKG_DE
ODI 10g vs. ODI 11

                            STAGE                 DIMs           FACTS
ODI 10g Packages (PKG_DM)   PKG_ABC               PKG_DE         PKG_FG
                            INT_A, PRC_B, INT_C   INT_C, PRC_D   INT_F, PRC_G
ODI 11 Load Plans           A, B, C               D, E           F, G

SAME EFFECT!
PROCESS FLOW EFFICIENCY ANALYSIS
Standard Flow Orchestration:
Stage-(stop)-DIMs-(stop)-Facts

[Diagram: processes A, B, C (stage), D, E (DIMs), and F, G (facts), each with a duration of 10 or 30; the phases run sequentially, the processes within a phase run in parallel]

30 + 30 + 10 = 70

DOWNSIDES:
POSSIBLE INEFFICIENCIES (IDLE RESOURCES)
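
The arithmetic of the standard flow can be sketched as follows. The per-process durations are an assumption: they are one reading of the diagram that reproduces the slide's 30 + 30 + 10 = 70, not values confirmed by the source.

```python
# Assumed durations (one reading of the diagram that yields 30 + 30 + 10 = 70)
phases = {
    "STAGE": {"A": 30, "B": 10, "C": 10},
    "DIMs": {"D": 10, "E": 30},
    "FACTS": {"F": 10, "G": 10},
}

# Each phase must wait for its slowest process before the next phase starts
phase_durations = {name: max(procs.values()) for name, procs in phases.items()}
total_duration = sum(phase_durations.values())

# Idle time per process: how long its slot waits for the slowest one in the phase
idle_time = {
    proc: phase_durations[name] - duration
    for name, procs in phases.items()
    for proc, duration in procs.items()
}
```

The non-zero entries in `idle_time` are the idle resources the slide refers to.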
PROCESS FLOW EFFICIENCY ANALYSIS

OPTIMIZATION ATTEMPT

[Diagram: the same processes A to G regrouped so that dependent processes run sequentially in chains and the chains run in parallel; the longest chain takes 30 + 10 + 10 = 50]

70 / 50 = 1.4 times quicker!

UPSIDE:
EFFICIENCY IMPROVED
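
A small sketch of the regrouped flow. Both the durations and the chains (A>D>F, B>E>G, and C on its own) are assumptions chosen to reproduce the slide's totals of 50 and 70; the diagram itself is not recoverable in detail.

```python
# Assumed durations and chains, chosen to match the slide's totals
durations = {"A": 30, "B": 10, "C": 10, "D": 10, "E": 30, "F": 10, "G": 10}
chains = [["A", "D", "F"], ["B", "E", "G"], ["C"]]

# Within a chain the processes run one after another
chain_durations = {
    ">".join(chain): sum(durations[proc] for proc in chain) for chain in chains
}

# Chains run in parallel, so the slowest chain sets the overall run time
optimized_total = max(chain_durations.values())

standard_total = 70  # phase-by-phase total from the standard flow
speedup = standard_total / optimized_total
```

Building the chains requires knowing both the timings and the full dependency picture up front.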
ADVANCED DATA FLOW EXAMPLE
ENTERPRISE DWH DATA FLOW EXAMPLE
PROCESS FLOW EFFICIENCY ANALYSIS

OPTIMIZATION ATTEMPT

[Diagram: processes A to G regrouped into parallel chains of dependent processes; 30 + 10 + 10 = 50]

70 / 50 = 1.4 times quicker!

UPSIDE:
EFFICIENCY IMPROVED
DOWNSIDES:
TIMINGS KNOWLEDGE REQUIRED
OVERALL DEPENDENCY KNOWLEDGE REQUIRED
PROCESS FLOW EFFICIENCY ANALYSIS

OPTIMIZATION ATTEMPT

[Diagram: a flow of processes A to G in which both the sequential and the parallel arrangement take 30 + 30 + 10 = 70]

DOWNSIDE:
INEFFICIENCY EXISTS BUT CAN'T BE RESOLVED
CONSUMER WAITING & IMPACT
TRADITIONAL SCHEDULING - LIMITATIONS

Possible inefficiencies (idle resources)
Timings knowledge required
Overall dependency knowledge required
Inefficiency exists but can't be resolved
Consumer waiting & impact

SCHEDULER
DEPENDENCY DRIVEN SCHEDULING

[Diagram: processes A to E and their dependencies, scheduled on top of PACKAGES & LOAD PLANS]
PROCESS FLOW EFFICIENCY ANALYSIS

[Diagram: the same processes A to G (durations of 10 or 30) take 30 + 30 + 10 = 70 under standard orchestration and 30 under dependency-driven scheduling]

70 / 30 = 2.3 times faster!

DEPENDENCY DRIVEN SCHEDULING

Simplifies orchestrating the flow
only immediate upstream definition required
execution timings not relevant
self-adapts in the most effective way
Improves overall E-LT performance
Fewer idle resources, better utilization
Independence
unveils its full potential in complex Enterprise-class DWHs (Inmon)
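
The idea that only the immediate upstream needs to be defined can be sketched as below. This is not the presenters' scheduler: the process names, durations, and dependencies are hypothetical, and the sketch assumes unlimited parallel capacity. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical durations and immediate upstream dependencies
durations = {"A": 30, "B": 10, "C": 10, "D": 10, "E": 30, "F": 10, "G": 10}
upstream = {
    "A": [], "B": [], "C": [],
    "D": ["A"],
    "E": ["B"],
    "F": ["D"],
    "G": ["E"],
}


def schedule(durations, upstream):
    """Start each process as soon as all of its immediate upstreams have finished."""
    start, finish = {}, {}
    # static_order() yields every process after all of its predecessors
    for proc in TopologicalSorter(upstream).static_order():
        start[proc] = max((finish[u] for u in upstream[proc]), default=0)
        finish[proc] = start[proc] + durations[proc]
    return start, finish


start, finish = schedule(durations, upstream)
makespan = max(finish.values())
```

No process needs to know the timings or the overall flow: E starts at 10, as soon as B is done, without waiting for the rest of the stage phase.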
DEPENDENCY DRIVEN SCHEDULING

Notifications
errors (+auto-restartability)
finish summary
logging
Multiple/overlapping E-LT streams
load with different frequencies
Parameterization
improved system stress control
process prioritization
FIRST RUN: 10 processes
TODAY: 584 processes, 1389 DEPENDENCIES, 132 231 SCENARIOS RUN

TIME
LOAD PLANS: 12h43m
SCHEDULER: 4h21m
2.9 TIMES FASTER
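
The 2.9 figure follows from the two run times. A quick check, with a hypothetical helper `to_minutes` for parsing the durations as written on the slide:

```python
import re


def to_minutes(text):
    """Convert a duration written like '12h43m' into minutes."""
    match = re.fullmatch(r"(\d+)h(\d+)m", text)
    if match is None:
        raise ValueError(f"unexpected duration format: {text!r}")
    hours, minutes = map(int, match.groups())
    return hours * 60 + minutes


load_plans_minutes = to_minutes("12h43m")
scheduler_minutes = to_minutes("4h21m")
speedup = round(load_plans_minutes / scheduler_minutes, 1)
```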
ENTERPRISE DWH DATA FLOW
RELEASE 1.0
RELEASE 2.0 TST
TESTING RELEASE 2.0
DEPLOY RELEASE 2.0 PRD
THE HOT FIX SITUATION
RELEASE FREQUENTLY
CI ENVIRONMENT
THE BUILD MASTER
AUTOMATE STUFF
ODI VS. SOURCE CONTROL
ODI STRUCTURE
BEYOND INTRA BUILD DEPENDENCIES
