Best Practices - PDI Design Guidelines
Software Version
Pentaho 5.4, 6.x, 7.x
Name Consistently
• Recommendation: Name transformations, jobs, and steps consistently using the same
conventions.
• Rationale: A consistent naming convention lets you see what type of task is being
performed when reviewing logs, files, and database logging.
• Solution: Prefix the name of each job entry with the entry type, followed by descriptive text.
Prefix the name of each transformation step with the step type, followed by descriptive text.
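As a rough illustration of the "type prefix, then descriptive text" pattern, the sketch below builds and checks step names. The prefix abbreviations (TI, TO, JI, JS), the " - " separator, and the helper names are assumptions made for this example only; the guideline itself does not prescribe specific abbreviations.

```python
import re

# Hypothetical prefix table: step type -> short prefix.
# These abbreviations are illustrative, not mandated by the guideline.
STEP_PREFIXES = {
    "Table input": "TI",
    "Table output": "TO",
    "JSON Input": "JI",
    "Modified Java Script Value": "JS",
}


def expected_name(step_type, description):
    """Build a step name as '<prefix> - <descriptive text>'."""
    return f"{STEP_PREFIXES[step_type]} - {description}"


def follows_convention(step_type, name):
    """Return True if the name starts with the prefix for its step type
    and is followed by non-empty descriptive text."""
    prefix = STEP_PREFIXES.get(step_type)
    if prefix is None:
        return False  # unknown step type: no prefix has been agreed on
    return re.fullmatch(rf"{re.escape(prefix)} - \S.*", name) is not None


name = expected_name("Table input", "Customers")
ok_default = follows_convention("Table input", "Table input 2")
```

With a shared table like this, a default name such as "Table input 2" is flagged, while a name such as "TI - Customers" shows at a glance in a log line what kind of step produced it.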
Using Variables
This section provides information on parameters for variables, KETTLE_HOME variables, and variable
use for external references.
Logging
Here, you will find guidance on logging operations such as redirecting output, tracking
audits, and handling errors when the root job fails. You will also find steps for Kettle logging,
row-level logging, and more.
Mondrian Cache
In this section, you will find information on clearing and priming the Mondrian cache, as well as JVM
job execution.
JSON Parsing
This section provides steps for separating tasks with JSON parsing, using JavaScript to handle
multiple levels of JSON, and expediting parsing.
Expedite Parsing
• Recommendation: Use multiple copies of JSON/JavaScript steps to speed up JSON parsing.
• Rationale: The JSON step can only pass one level at a time, and JSON parsing is CPU intensive.
If you can split the task across multiple cores, it will be faster. This is only possible if you are
reading each row as a self-contained object.
• Solution: Enable multiple copies by right-clicking on a step and choosing CHANGE NUMBER
OF COPIES TO START.
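The sketch below models, outside of PDI, why this works only when each row is a complete JSON object. Rows are dealt out to several step copies, and each copy parses its share without needing anything from the others. The round-robin distribution and the function names are assumptions for illustration. The sketch runs sequentially and only models how the work divides; in PDI the copies run concurrently.

```python
import json


def distribute(rows, copies):
    """Deal rows round-robin into one bucket per step copy
    (assumed distribution scheme for this sketch)."""
    buckets = [[] for _ in range(copies)]
    for i, row in enumerate(rows):
        buckets[i % copies].append(row)
    return buckets


def parse_bucket(bucket):
    """Work done by a single copy: parse each row on its own.
    This only succeeds because every row is a complete JSON object."""
    return [json.loads(row) for row in bucket]


# Each input row holds one self-contained JSON object.
rows = [
    '{"id": 1, "tags": ["a"]}',
    '{"id": 2, "tags": []}',
    '{"id": 3, "tags": ["b", "c"]}',
]

buckets = distribute(rows, copies=2)
parsed = [parse_bucket(b) for b in buckets]
```

If a single JSON document were spread across several rows, no copy could parse its share in isolation, which is why the split depends on reading each row as an object. Note also that once rows are divided among copies, the output order across copies is not guaranteed to match the input order.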