01 Ab Initio Advance Concepts E2
• EME Administration
• Features of EME
• Sandbox and projects
• Checking out graphs, files and projects
• Checking in projects, sandboxes, graphs and files
• Dependency analysis
• Using the web to access the EME datastore
• Reports, Versioning and Tagging
• Air commands
• Custom Components
• Continuous flows
• XML
July 6, 2010
Introduction
What is EME?
EME (Enterprise Meta Environment) is an object-oriented data storage
system that version-controls and manages various kinds of information
associated with Ab Initio applications, ranging from design information
to operational data. In simple terms, it is a repository that contains
data about data: metadata.
Why EME?
• Use EME parameters instead of setup scripts, for more
standardisation
• Source control
• Dependency and impact analysis for the graphs in the
repository
Project structure :
Typically, a project should contain at most 5 to 10 graphs. This
helps to organise the code efficiently within the EME. As the number
of graphs in a Project increases, the time taken to perform
dependency analysis on the graphs and related data also increases.
Before adding a Project to an existing application that already has
several Projects in place, consider the impact it might have on the
other Projects and on the Application as a whole.
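To see why analysis time grows with project size, here is a toy impact analysis in Python. It is not how the EME implements dependency analysis, and the graph names, dataset names and reads/writes layout are all hypothetical. Given a changed dataset, the sketch finds every graph that reads it, then every graph that reads what those graphs write, and so on. Each step scans every graph in the project, so the work grows with the number of graphs.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical project: each graph lists the datasets it reads and writes.
GRAPHS: Dict[str, Dict[str, Set[str]]] = {
    "load_customers.mp": {"reads": {"customers.raw"}, "writes": {"customers.clean"}},
    "join_orders.mp": {"reads": {"customers.clean", "orders.raw"}, "writes": {"orders.joined"}},
    "report.mp": {"reads": {"orders.joined"}, "writes": {"report.out"}},
}

def impacted_graphs(changed_dataset: str, graphs: Dict[str, Dict[str, Set[str]]]) -> List[str]:
    """Return every graph downstream of changed_dataset, in discovery order."""
    found: List[str] = []
    seen_datasets = {changed_dataset}
    queue = deque([changed_dataset])
    while queue:
        dataset = queue.popleft()
        # Scans every graph for every dataset: cost grows with project size.
        for name, io in graphs.items():
            if dataset in io["reads"] and name not in found:
                found.append(name)
                for out in io["writes"] - seen_datasets:
                    seen_datasets.add(out)
                    queue.append(out)
    return found

print(impacted_graphs("customers.clean", GRAPHS))  # ['join_orders.mp', 'report.mp']
```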
[Figure: Check-in and Check-out]
Project parameters are inherited by all the graphs in the Project. In
the GDE, they are edited with the sandbox parameter editor, opened
from Project > Edit Sandbox > Parameters. A dialog box first prompts
for the sandbox path. Choose the correct host and sandbox path, then
press OK to open the sandbox parameter editor, which looks exactly
like the graph parameter editor shown above.
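Parameter inheritance can be pictured as a lookup chain. The Python sketch below is only an analogy. The parameter names and values are hypothetical, and it assumes that a parameter defined on the graph itself takes precedence over the inherited Project value.

```python
from collections import ChainMap
from typing import Dict

# Hypothetical Project parameters, inherited by every graph in the Project.
PROJECT_PARAMS: Dict[str, str] = {"DATA_DIR": "/data/in", "RUN_DATE": "20100706"}

def graph_parameters(graph_params: Dict[str, str]) -> ChainMap:
    """Resolve a graph's parameters: its own definitions first, then the Project's.

    Assumption: a graph-level definition overrides the inherited Project value.
    """
    return ChainMap(graph_params, PROJECT_PARAMS)

params = graph_parameters({"RUN_DATE": "20100718"})
print(params["DATA_DIR"])  # /data/in (inherited from the Project)
print(params["RUN_DATE"])  # 20100718 (defined on the graph)
```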
• Host : This is the name of a host profile, which you can select
from the drop-down list. You can edit the currently selected profile
by clicking Hosts > Edit, and create a new profile by clicking
Hosts > New.
May 18, 2010
Dependency Analysis in the GDE
• Directory : The location you specify here is the directory in which
the air command is run. It should generally be the pathname of the
sandbox location of the objects you want to analyze.
• Project Directory or File : It should be the full pathname of the
project you wish to analyze.
Click the Advanced button to access the settings for the
actual analysis. This opens the Advanced Options dialog.
• In the Home Object field, type the datastore location you want to
be able to navigate to quickly.
• Click Login.
Check out of files using GDE
Check out :
The Ab Initio GDE provides wizards to check out code from the EME
to a sandbox. Check out updates the sandbox with the particular
version of code being checked out from the EME. By default, the
latest version of an object is checked out, but any other version
can be checked out instead. Any object that is version controlled in
the EME datastore can be checked out to a sandbox, which may be
pre-existing or may be created during the check-out process itself.
When checking out a Project, or any objects belonging to it, to a
sandbox, stdenv and any common Projects associated with it must also
be referenced in the sandbox. An existing sandbox already records
where to find the common projects (the stdenv sandbox and the public
sandboxes). For a new sandbox, the stdenv and public sandbox (if any)
paths must be supplied during check out.
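The check-out rule above (the latest version by default, any other version on request) can be modelled with a toy version store. This is plain Python, not the EME API. VersionStore, check_in and check_out are hypothetical names.

```python
from typing import Dict, List, Optional

class VersionStore:
    """Toy model of a version-controlled store; not the EME API."""

    def __init__(self) -> None:
        self._history: Dict[str, List[str]] = {}

    def check_in(self, path: str, content: str) -> int:
        """Store a new version of path and return its version number (1-based)."""
        versions = self._history.setdefault(path, [])
        versions.append(content)
        return len(versions)

    def check_out(self, path: str, version: Optional[int] = None) -> str:
        """Return the latest version by default, or the requested version."""
        versions = self._history[path]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise ValueError(f"{path} has no version {version}")
        return versions[version - 1]

store = VersionStore()
store.check_in("mp/load.mp", "first")
store.check_in("mp/load.mp", "second")
print(store.check_out("mp/load.mp"))             # second
print(store.check_out("mp/load.mp", version=1))  # first
```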
• air object access : Tests whether you can read, write, execute or
see the existence of the object specified by rpath, given your
permissions and the current state of the datastore being accessed.
The command returns either 1 (True) or 0 (False).
• air project tag : Tags a project and all the files it contains.
air project tag project-name tag [comment]
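Conceptually, air project tag project-name tag [comment] records which version of each file in the project belongs to the tag. A toy Python model of that idea follows. tag_project and the file-to-version dictionary are hypothetical, and the model says nothing about how the EME actually stores tags.

```python
from typing import Dict

def tag_project(tags: Dict[str, dict], tag: str,
                file_versions: Dict[str, int], comment: str = "") -> None:
    """Store a frozen copy of the current file versions under the tag name."""
    tags[tag] = {"files": dict(file_versions), "comment": comment}

tags: Dict[str, dict] = {}
current = {"mp/load.mp": 2, "dml/in.dml": 1}   # hypothetical files and versions
tag_project(tags, "release_1", current, "first release")
current["mp/load.mp"] = 3                      # a later check-in
print(tags["release_1"]["files"]["mp/load.mp"])  # 2: the tag still points at version 2
```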
Building a Custom Component (GDE)
• Building a Custom Component for use in the Ab Initio GDE is
straightforward.
• A user can build a program specification file that has all the
capabilities of a built-in Ab Initio component: all the visual
capabilities expected within the GDE, and the ability to run in a
graph at run time, with the capabilities of the program wrapped
within the file.
• Program specification files provide the Co>Operating System with
the information it needs to run a program or shell script.
• Program specification files should have the .mpc extension.
• All program specification files must start with <mpcfile>
• The <mpcfile> line is followed by a series of attribute: value
lines that describe the attributes of the program.
mpname: "unitool"
image: "name-of-executable"
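The header rules above (the file starts with <mpcfile>, followed by attribute: value lines) can be checked with a short Python sketch. It assumes a simplified grammar in which every value is a double-quoted string on one line. The real .mpc syntax has more forms than this, so treat the sketch as an illustration only.

```python
import re
from typing import Dict

# Simplified assumption: name: "value" on a single line.
ATTR_LINE = re.compile(r'^(\w+):\s*"(.*)"\s*$')

def parse_mpc_header(text: str) -> Dict[str, str]:
    """Check for the <mpcfile> first line and collect attribute: "value" lines."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "<mpcfile>":
        raise ValueError("program specification must start with <mpcfile>")
    attrs: Dict[str, str] = {}
    for ln in lines[1:]:
        match = ATTR_LINE.match(ln)
        if match is None:
            break  # stop at the first line outside the simplified grammar
        attrs[match.group(1)] = match.group(2)
    return attrs

spec = '<mpcfile>\nmpname: "unitool"\nimage: "name-of-executable"\n'
print(parse_mpc_header(spec))  # {'mpname': 'unitool', 'image': 'name-of-executable'}
```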
Port usage
"=metadata-string"
"&Remote-File-Path"
"LLocal-File-Path"
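Based on the three forms listed above, the first character of a port value identifies its kind: = for a metadata string, & for a remote file path and L for a local file path. The hypothetical Python helper below splits a value on that prefix. It illustrates the notation only and is not part of any Ab Initio tool.

```python
from typing import Tuple

# The three prefix forms listed under "Port usage".
PORT_PREFIXES = {"=": "metadata string", "&": "remote file path", "L": "local file path"}

def classify_port_value(value: str) -> Tuple[str, str]:
    """Return (kind, payload), where payload is the value without its prefix."""
    if value and value[0] in PORT_PREFIXES:
        return PORT_PREFIXES[value[0]], value[1:]
    raise ValueError(f"unrecognised port value: {value!r}")

print(classify_port_value("&/remote/dir/in.dat"))  # ('remote file path', '/remote/dir/in.dat')
```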
[Figures: Port Examples 1 to 3]
Parameter usage :
• The placement is either positional or keyword.
• Positional parameters are sent to the executable in the given
order.
• The GDE may re-order keyword parameters, and passes each value to
the executable preceded by a dash (-) and the keyword name.
• The name is the name of the parameter as seen on the Parameters
tab of the component properties. It is also the switch used for
keyword parameters.
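The rules above can be sketched in Python: positional values are passed in order, and each keyword value follows a dash and its keyword name. This is a hypothetical model, not the GDE's own code, and it assumes positional parameters come before keyword parameters. Because the GDE may re-order keyword parameters, the executable should look them up by switch name rather than by position, which is what parse_keywords does.

```python
from typing import Dict, List, Tuple

def build_argv(executable: str, positional: List[str],
               keyword: List[Tuple[str, str]]) -> List[str]:
    """Positional values in the given order, then '-name value' for each keyword."""
    argv = [executable, *positional]
    for name, value in keyword:
        argv += [f"-{name}", value]
    return argv

def parse_keywords(args: List[str]) -> Dict[str, str]:
    """Collect '-name value' pairs by switch name, so their order does not matter."""
    opts: Dict[str, str] = {}
    it = iter(args)
    for token in it:
        if token.startswith("-"):
            opts[token[1:]] = next(it)  # the value follows its switch
    return opts

argv = build_argv("unitool", ["in.dat"], [("limit", "100"), ("verbose", "1")])
print(argv)  # ['unitool', 'in.dat', '-limit', '100', '-verbose', '1']
```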
Descriptions of Attributes & Variables
[Figures: Parameter Examples 1 to 3]
Argument usage :
argument: literal $1
//send the first parameter
Outline
Performance
We don’t have to pay job startup costs for every new batch of data
Latency
Results are available sooner
Flexibility
Enables processing of data from unreliable sources.
• Compute Points
• Checkpoints
• Queues
Subscribing
cfqueue/subscribe.sub1:
cursor data.00000003 data.00000004 data.00000005
cfqueue/subscribe.sub2:
cursor data.00000001 data.00000002 data.00000003 data.00000004 data.00000005
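One way to read the listing above is that each subscriber directory holds the data files that subscriber has not yet consumed: sub1 still has data.00000003 to data.00000005, while sub2 still has all five. The toy Python model below reproduces that state. It is not the Ab Initio queue implementation. ToyQueue and its methods are hypothetical, and the cursor file is not modelled.

```python
from typing import Dict, Iterable, List, Set

class ToyQueue:
    """Toy multi-subscriber queue; NOT the Ab Initio queue implementation.

    Assumption: each subscriber keeps its own list of pending data files, and a
    data file is still needed while at least one subscriber has not consumed it.
    """

    def __init__(self, subscribers: Iterable[str]) -> None:
        self.pending: Dict[str, List[str]] = {name: [] for name in subscribers}
        self.next_seq = 1

    def publish(self) -> str:
        """Add one numbered data file and make it pending for every subscriber."""
        name = f"data.{self.next_seq:08d}"
        self.next_seq += 1
        for files in self.pending.values():
            files.append(name)
        return name

    def consume(self, subscriber: str, count: int = 1) -> List[str]:
        """Take up to count of the oldest pending files for one subscriber."""
        files = self.pending[subscriber]
        taken, self.pending[subscriber] = files[:count], files[count:]
        return taken

    def retained(self) -> Set[str]:
        """Data files still needed by at least one subscriber."""
        return set().union(*self.pending.values())

q = ToyQueue(["sub1", "sub2"])
for _ in range(5):
    q.publish()
q.consume("sub1", 2)
print(q.pending["sub1"])  # ['data.00000003', 'data.00000004', 'data.00000005']
```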
• Restart by re-running the job
• The job restarts from the last committed checkpoint
• Tracking information is cumulative from job start
• Checkpoints, queues, etc. are all cleaned up when the job exits
(for example, when it is shut down with m_shutdown)
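The restart rules above can be imitated with a toy job in Python. Output is buffered and committed only at each checkpoint, so a restart resumes from the last committed checkpoint without duplicating committed output. Meanwhile the processed counter, like the tracking information, keeps accumulating from the original job start. The function, its parameters and the state dictionary are hypothetical, and this is not how the Co>Operating System implements checkpoints.

```python
from typing import Dict, List, Optional

def run(records: List[int], state: Dict, fail_at: Optional[int] = None,
        checkpoint_every: int = 2) -> None:
    """Process records from the last committed checkpoint, committing every few records."""
    pending: List[int] = []          # output produced since the last checkpoint
    i = state["committed"]           # restart point: the last committed checkpoint
    while i < len(records):
        if i == fail_at:
            raise RuntimeError(f"simulated failure at record index {i}")
        pending.append(records[i] * 10)
        state["processed"] += 1      # cumulative, like tracking information
        i += 1
        if i % checkpoint_every == 0 or i == len(records):
            state["output"].extend(pending)  # commit the buffered output
            pending.clear()
            state["committed"] = i

records = [1, 2, 3, 4, 5]
state = {"committed": 0, "processed": 0, "output": []}
try:
    run(records, state, fail_at=3)   # first run fails after committing 2 records
except RuntimeError:
    pass
run(records, state)                  # restart by re-running
print(state["output"])     # [10, 20, 30, 40, 50]
print(state["processed"])  # 6: the third record was processed twice
```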
• publish_style
queue - output to an Ab Initio queue (default)
files_after - output to a sequence of files
appended - append results to a single file
• In-memory only
No spilling to disk.
Fail if they run out of memory.
Read XML
Runtime Behavior :
Case Study 1
Create a graph which
• reads from the file insource with the record
format:
record
decimal(",") f1;
decimal("\n") f2;
end;
• checkpoints every 30 seconds
• writes to a queue with subscribers sub1 and
sub2.
Examine the state of the queue at various
intermediate points. Try killing and restarting the job.
Generate insource using the provided dribble
program.
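Two pieces of the case study can be prototyped in plain Python before building the graph. The first splits insource records according to the record format above (f1 ends at a comma, f2 at a newline); the second decides when a 30-second checkpoint is due. This is a hypothetical sketch that assumes each complete record is one line. It does not replace the dribble program or the graph's own checkpointing.

```python
import io
import time
from decimal import Decimal
from typing import Callable, Iterator, TextIO, Tuple

def read_insource(stream: TextIO) -> Iterator[Tuple[Decimal, Decimal]]:
    """Yield (f1, f2): f1 is terminated by ',' and f2 by a newline."""
    for line in stream:
        if not line.endswith("\n"):
            break  # incomplete last record; assume more data will arrive later
        f1, f2 = line[:-1].split(",", 1)
        yield Decimal(f1), Decimal(f2)

class CheckpointTimer:
    """Reports True at most once per interval seconds (30 in the case study)."""

    def __init__(self, interval: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self.clock = clock
        self.last = clock()

    def due(self) -> bool:
        now = self.clock()
        if now - self.last >= self.interval:
            self.last = now
            return True
        return False

records = list(read_insource(io.StringIO("1,2\n3.5,4\n5,")))
print(records)  # [(Decimal('1'), Decimal('2')), (Decimal('3.5'), Decimal('4'))]
```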