01 Ab Initio Advance Concepts E2


Ab Initio E2

Advanced Training Course


Course Content

• EME Administration
• Features of EME
• Sandbox and projects
• Checking out graphs, files and projects
• Checking in projects, sandboxes, graphs and files
• Dependency analysis
• Using web to access EME datastore
• Reports, Versioning and Tagging
• Air commands
• Custom Components
• Continuous flows
• XML

May 18, 2010


EME Administration

July 6, 2010
Introduction
What is EME?
EME (Enterprise Meta Environment) is an object-oriented
data storage system that version-controls and manages
various kinds of information associated with Ab Initio
applications, ranging from design information to
operational data. In simple terms, it is a repository that
contains data about data, i.e. metadata.

Why EME?
• Avoid setup scripts by using EME parameters, which gives more
standardisation
• Source Control
• Dependency and impact analysis for the graphs in the
repository



EME Datastore Settings
An EME Datastore is a specific instance of the EME in the
environment. It is a repository in which the different versions of
the code and its related data, such as record formats and
transformations, are maintained. At any point in time a
user can connect to only one such EME repository instance.
To access an EME Datastore, go to Project > EME Datastore
Settings in the GDE menu and fill in the details in the
dialog box that opens.



EME Datastore Settings
The following details are to be filled in under EME Datastore Settings:

Method : Remote Execution (Rexec)/Telnet
Host : The host where the EME Datastore resides
Login and Password : Unix login credentials for the host
Co>Operating System Location : Path to where the Ab Initio
Co>Operating System is installed
EME Datastore Location : Path to where the EME Datastore is
located
Mode : Source Code Control

After filling in the details, click the Connect button to test the
connection. If the details are correct, a message box confirms
the connection.



Project in EME
Project :
A Project is a collection of related graphs and their associated
elements, such as dml and xfr files, in the EME Datastore.

Project structure :
Typically a Project should contain a maximum of 5 to 10 graphs. This
helps in organising the code efficiently within the EME. As the number
of graphs in a Project increases, so does the time taken to perform
dependency analysis on the graphs and related data.
Before adding a Project to an existing application that already has
a number of Projects in place, consider the impact it might have on the
other Projects and on the application as a whole.



Structure of a project in EME
/Projects
    Project1
    Project2

Each project contains the following sub directories:

mp    Sub directory for AI graphs
run   Sub directory for deployed shell scripts
xfr   Sub directory for transform files
dml   Sub directory for record format files
db    Sub directory for database interface files
bin   Sub directory for tools and utilities
sql   Sub directory for sql queries



Different Types of Project
Private vs. Public Project

• There is often information common to multiple Projects. For
instance, several Projects may share some record format files or
transform files. Elements that are used across Projects can
be made widely reusable by making them part of one Project and
including that Project in other Projects to access the common
elements. A Project that is included by other Projects is termed a
Public Project, and the Projects that include public Projects are known
as Private Projects.

• A public Project is public in the sense that its data and
metadata are expected to be shared with other Projects. A
private Project is private in the sense that its data and metadata
are not expected to be shared with other Projects.



Different Types of Project
The Environment Project (Stdenv)

There is a special Project associated with every instance of the Ab Initio
environment, known as the Environment Project or stdenv (Standard
Environment). Its structure is no different from that of a regular
Project. It contains machine-specific and application-specific settings,
such as the data directory mount points and max-core settings, and
application-wide parameters, such as the current date, which are used
across all Projects. When any Project is created, stdenv is included in
it by default. A single stdenv is required for an entire set of
applications that run on a single machine and share a single EME
Datastore.



Concept of Sandbox
What is a Sandbox?

Projects held in the EME Datastore can’t be manipulated
directly. To work on Projects, they must be checked out to a
working area on the file system where we can develop and
modify code. This working area on the file system is known
as a Sandbox. It has exactly the same directory structure
as a Project in the Datastore.

Each object that needs to be worked on is checked out to a
sandbox, where modifications or enhancements are carried
out. After the changes are complete, the code is checked in
from the sandbox area to the EME Datastore.
This action creates a new version of the code in the EME
Datastore.



Sandbox Projects vs. EME Datastore Projects
Sandboxes are work areas used to develop, test or run code
associated with a given project. Only one version of the
code can be held within a sandbox at any time. The EME
Datastore contains all versions of the code that have been
checked in to it. A particular sandbox is associated with only
one Project, whereas a Project can be checked out to any
number of sandboxes.

[Figure: Check-in and Check-out between a sandbox and the EME Datastore]



Parameters
A parameter is a name-value pair with some additional attributes that
determine when and how to interpret or resolve its value. Parameters
are used to give logical names to physical locations and should
always be used instead of hard-coded paths in graphs. There are
two types of parameters: graph parameters and Project parameters.
Graph parameters
Graph parameters, as the name suggests, are specific to individual
graphs and are private to them. They affect the execution of the graph
for which they have been defined. Graph parameters are defined by
navigating to Edit > Parameters in the GDE, which opens the graph
parameters editor.



Parameters
Project parameters

Project parameters are inherited by all the graphs in the Project and
are accessed from the GDE through the sandbox parameter editor under
Project > Edit Sandbox > Parameters. This shows a dialog box
prompting for the sandbox path. Choose the correct host and
the sandbox path and press OK to open the sandbox parameter
editor, which looks exactly like the graph parameter editor.



Parameters
Editing parameters

To add a new Project parameter or to modify the value of an
existing one, first lock the parameters in the sandbox
parameter editor by clicking the lock button on the menu. If nobody
has locked them in their own sandbox, the lock symbol turns green,
indicating a successful lock. The parameters can now be added or
modified. If a lock is already in place, opening the parameter editor
shows a warning that the parameters are already locked, and the lock
symbol is red. Once a user holds the lock, other users cannot edit the
parameters.



Major Parameters Attributes
Scope: The scope of a parameter can be formal or local. A local parameter
is internal to the sandbox, and most parameters have local scope. Its
value is taken from the value column in the parameter editor. A formal
parameter is one whose value can be set from outside, i.e. from the
environment where the graph is run. Its value is supplied from the
command line. Formal parameters are identified by a green diamond with
an arrow mark.
Kind: If the scope is local, kind is left unspecified; if it is formal,
the kind is automatically set to keyword.
Type: This determines the nature of the parameter. Project parameters
have four types: string, common Project, switch and dependent.
Graph parameters have a different set of types.
Export: When this check box is checked, the corresponding parameter
value is exported as an environment variable; otherwise it is generated
as a local shell variable.
Private Value: If a parameter is specified as a private value, any
subsequent changes to it remain private to the local sandbox and are
not checked in to the EME.
This is useful when different users want different values for the same
parameter.
Value: This column specifies the value of the parameter.

Interpretation: This determines how the parameter is going to be
evaluated.

Constant: The value is taken literally.
$ Substitution: Variables with $ prefixes are replaced with their values.
${} Substitution: Variables within {} and with $ prefixes are replaced
by their values, but other occurrences of $ are ignored.
Shell: Korn shell syntax is used to evaluate the value of the parameter.

Required: This attribute can take two values, required (the default) or
optional. If it is required, the value column can’t be left blank; if it is
optional, it can be left blank.
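
The difference between the Constant, $ Substitution and ${} Substitution interpretations can be sketched in Python. This is only an illustration of the rules as stated above, not Ab Initio code: the function `resolve`, the parameter name `AI_DML` and the path are hypothetical, and it assumes that $ substitution accepts both the `$NAME` and `${NAME}` forms. The Shell interpretation is not modelled.

```python
import re

# Hypothetical helper for illustration; not part of any Ab Initio API.
_DOLLAR = re.compile(r"\$(?:\{(\w+)\}|(\w+))")  # matches $NAME or ${NAME}
_BRACES = re.compile(r"\$\{(\w+)\}")            # matches ${NAME} only


def resolve(value, interpretation, variables):
    """Resolve a parameter value under one of three interpretations."""
    if interpretation == "constant":
        return value  # taken literally, no replacement at all
    if interpretation == "dollar":
        pattern = _DOLLAR
    elif interpretation == "braces":
        pattern = _BRACES
    else:
        raise ValueError(f"unsupported interpretation: {interpretation}")

    def substitute(match):
        # Exactly one capture group is non-empty for any match.
        name = next(group for group in match.groups() if group)
        return variables[name]

    return pattern.sub(substitute, value)


variables = {"AI_DML": "/proj/dml"}  # made-up name and path
print(resolve("$AI_DML/cust.dml", "constant", variables))  # $AI_DML/cust.dml
print(resolve("$AI_DML/cust.dml", "dollar", variables))    # /proj/dml/cust.dml
print(resolve("${AI_DML}/cost_$5", "braces", variables))   # /proj/dml/cost_$5
```

In the last call the bare `$5` is left untouched, which is the behaviour described for ${} substitution.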



Version Control and Tagging
Each object under EME source control, which may be a file, a
directory or a Project, exists as a series of versions, each of which is
a representation of what was checked in by some user. A version can
optionally have a tag (a short textual label) and a comment
(a description) attached to it. Each version is separately
numbered and can be accessed by either its version number or the
tag attached to it. Version numbers (which are integers) and tags
are global to the whole EME Datastore. Tags are the basic units
used when migrating code across EME Datastore instances.



Dependency Analysis
• Using the EME, you can conduct project analysis of the
dependencies within and between graphs. The EME examines
the project and develops an analytical survey of it in its
entirety, tracing out how data is transformed and transferred,
field by field, from component to component.
• Using dependency analysis we can observe the following
things:
  • How operations that occur later in a graph (“downstream”)
    are affected by components earlier (“upstream”) in the
    graph.
  • What data is operated on by each component, and how.
  • All details of every component in the project.
  • What happens to each field in a dataset throughout a
    graph.
  • All the graphs that use a particular dataset.



Dependency Analysis
There are four ways in which dependency analysis can be
controlled :

1) It can be invoked directly in the GDE, by selecting Project >
Analysis... to start the Analysis wizard.
2) It can be invoked directly from the command line, by running the
air project analyze-dependencies command.
3) It can be invoked implicitly at checkin, depending on the settings
you select in the Checkin wizard’s Advanced Options under the
Analysis tab.
4) You can also create an analysis_level parameter and use it to
specify the amount of dependency analysis you wish to take place
during checkin. The setting of this parameter (if it exists) restricts all
other analysis level settings.



Dependency Analysis from command line
The air commands directly involved with dependency analysis are the
following :
• air project analyze-dependencies : invokes dependency analysis.
• air project enqueue : adds objects to the analysis queue.
• air project dequeue : removes objects from the analysis queue.
• air analysis expand : pre-computes dependency information for a
graph.
• air analysis uses : prints out the names of all objects that use a
specified object.
• air analysis conditions : prints out conditionalization information
about all components in a graph.



Dependency Analysis in the GDE
• This is exactly equivalent to invoking the air project
analyze-dependencies command from the command line.
• The initial dialog contains the following fields.

• Host : This is the name of a host profile, which you can select
from the drop-down list. You can edit the currently selected profile
by clicking Hosts > Edit. You can create a new profile by clicking
Hosts > New.
• Directory : The location you specify here will be where the air
command is run. It should generally be the pathname of the
sandbox location of the objects you wish to analyze.
• Project Directory or File : It should be the full pathname of the
project you wish to analyze.
Click the Advanced button to access the settings for the
actual analysis. It will open the Advanced Options dialog.



Dependency Analysis in the GDE
Analysis Level :
1) None : No translation or dependency analysis is performed.
2) Translation Only : Graphs are translated from GDE to datastore
format, but no error checking is done.
3) Translation with Checking : Graphs are translated into
datastore format, and errors in the graphs that will interfere with
dependency analysis are checked for.
4) Full Dependency Analysis : Full dependency analysis is
performed.
The difference between this and Translation with Checking is
that when Full Dependency Analysis is specified, the analysis results
are saved to the datastore; with Translation with Checking, they are
not.
The default is Translation with Checking.
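
As a compact summary of the four levels, the following Python sketch records what each level does according to the list above. The class and property names are hypothetical and exist only for this illustration; they do not correspond to any Ab Initio setting names.

```python
from enum import Enum


class AnalysisLevel(Enum):
    """Illustrative model of the four analysis levels described above."""

    NONE = "None"
    TRANSLATION_ONLY = "Translation Only"
    TRANSLATION_WITH_CHECKING = "Translation with Checking"
    FULL_DEPENDENCY_ANALYSIS = "Full Dependency Analysis"

    @property
    def translates(self):
        # Every level except None translates graphs to datastore format.
        return self is not AnalysisLevel.NONE

    @property
    def checks_errors(self):
        return self in (
            AnalysisLevel.TRANSLATION_WITH_CHECKING,
            AnalysisLevel.FULL_DEPENDENCY_ANALYSIS,
        )

    @property
    def saves_results(self):
        # Only full dependency analysis saves results to the datastore.
        return self is AnalysisLevel.FULL_DEPENDENCY_ANALYSIS


DEFAULT_LEVEL = AnalysisLevel.TRANSLATION_WITH_CHECKING

for level in AnalysisLevel:
    print(level.value, level.translates, level.checks_errors, level.saves_results)
```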



Dependency Analysis in the GDE
The tag page allows you to specify a tag that will be created and
used to tag the objects analyzed.
A tag is a piece of descriptive text, made up by you, which
you wish to associate with the version of the objects you are
checking in.
A tag must begin with a letter. Tags cannot contain forward
slashes (“/”), nulls, newlines, or tabs. They can be of any length.
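
The tag rules above can be checked before a tag is typed into the wizard. The following Python function is a minimal sketch of those rules only; `is_valid_tag` is a hypothetical name, and it assumes that "letter" means an ASCII letter. The EME may apply further checks that are not listed here.

```python
FORBIDDEN_CHARACTERS = ("/", "\0", "\n", "\t")  # slash, null, newline, tab


def is_valid_tag(tag):
    """Return True if the tag satisfies the rules listed above."""
    if not tag:
        return False
    first = tag[0]
    # Assumption: "letter" is read as an ASCII letter.
    if not (first.isascii() and first.isalpha()):
        return False
    return not any(ch in tag for ch in FORBIDDEN_CHARACTERS)


print(is_valid_tag("REL_1.0"))      # True
print(is_valid_tag("1st_release"))  # False: does not begin with a letter
print(is_valid_tag("rel/1.0"))      # False: contains a forward slash
```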



Dependency Analysis in the GDE
Clicking Next > in the wizard’s initial dialog brings you to the
Analysis wizard’s confirmation dialog, which displays the message
“Analysis is ready” and offers a Details button that you can click
to see a summary of the operations the wizard is about to perform.

Click Do Analysis to perform the analysis.



How to access EME
Ways to access EME in BT environment :
• GDE
• EME web Interface using valid user and password



Setting up the GDE for EME Access
• The Ab Initio GDE must be set up to connect to both the EME and the
development server.
• The following screen shots are taken from GDE 1.13.
• In the GDE, choose the menu option Project | EME Datastore Settings



EME web Interface using valid user and password
Logging into the web interface :
• Launch either Microsoft Internet Explorer or Netscape Navigator.
• Specify the URL of the Web interface login page. You need to
know the name of the server host and its port.
• The server-host identifies the host where the AIW is installed.
• The server-port identifies the port on which the HTTP server is listening.
• If the HTTP server on server-host listens on the default port 80, the
URL of the AIW login page is : http://server-host/abinitio
• If it listens on any other port, the URL of the AIW
login page is : http://server-host:port-number/abinitio
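
The two URL forms above differ only in whether the port is written out. A small Python sketch of that rule follows; the function name `aiw_login_url` and the host name in the example are hypothetical.

```python
def aiw_login_url(server_host, port=80):
    """Build the AIW login page URL following the two cases above."""
    if port == 80:
        # Default HTTP port: the port number is omitted from the URL.
        return f"http://{server_host}/abinitio"
    return f"http://{server_host}:{port}/abinitio"


print(aiw_login_url("eme-host.example.com"))        # default port
print(aiw_login_url("eme-host.example.com", 8080))  # explicit port
```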



EME web Interface using valid user and password
• Click Show to show the log-in options.
• Enter a valid User ID and Password to log in to the datastore host.
• Enter the full pathname to the EME datastore you want to connect to.

• In the Home Object field type the datastore location you would like to
be able to navigate to quickly.
• Click Login.

Check out of files using GDE
Check out :

The Ab Initio GDE provides wizards to check out code from the EME
to a sandbox. Check out updates the sandbox with the particular
version of the code that is being checked out from the EME. By default
the latest version of any object is checked out, but any other
version can be checked out instead. Any object that is version
controlled in the EME Datastore can be checked out to a sandbox,
which may already exist or may be created during the check out
process itself. When checking out a Project, or any objects belonging
to it, to a sandbox, stdenv and any common Projects
associated with it also need to be referenced in the sandbox. An
existing sandbox already holds the information on where to find the
common projects (the stdenv sandbox and the public sandboxes). For
a new sandbox, the paths of the stdenv and public sandboxes (if any)
must be supplied during check out.



Check out of files using GDE
The Check out wizard is invoked by navigating to Project > Check Out.
It is used as follows:
• Select the Project, directory or file you want to check out by
browsing to it.
• In the sandbox host dropdown list, select the host on which the
sandbox resides.
• In the directory field, enter the path to an existing sandbox (the
sandbox must be associated with the Project that is being checked out)
or enter a new path, in which case the sandbox is created
during check out.



Check out of files using GDE
The advanced options dialog can be seen by clicking on the Advanced
button.
• The first two options specify whether to check out the required
files from the parent project and whether to check out the required
files from the common Projects. The default is to check out the required
files from the parent project. A file is required if it is directly
referenced in a graph or if it is referenced in an include in a dml or
xfr. While checking out a whole project these two options are
disabled.
• “Run host setup script” runs the host profile’s setup
script before check out, and “Mark files read only on check out” does
exactly what it says. The default is on for both of these options.



Check out of files using GDE
• We can select a particular tagged version of the object we want to
check out from the tag drop down list. By default the latest version
is checked out.

• On clicking Next, if the sandbox doesn’t exist, the wizard asks
for confirmation to create the new sandbox. Clicking Yes
creates the sandbox and checks out the selected object to this
sandbox.



Check out of files using GDE
• You will be prompted to enter the sandbox locations of stdenv
and any common projects associated with the project, unless the
sandbox already has these values specified, as is the case for a
pre-existing sandbox.



Check out of files using GDE
• Clicking on Do Checkout performs the checkout operation and on
its completion a window shows the operations performed.



Locking
• A lock must be acquired on the object to be modified in the sandbox
after successful completion of check out. To modify a graph that has
been checked out, first open the graph in the GDE and then click on
the lock symbol on the menu. This checks whether the version in the
sandbox is the latest version of the object in the Datastore and if it is,
the lock symbol turns green showing that the graph is now locked
and is editable.
• If the graph has already been locked in some other sandbox, after
opening the graph in the GDE the lock is red in colour denoting that
there is already a lock on it. A lock can be acquired on an object only
if the sandbox version and the current version of the object in the
EME are the same.
• Once a lock is acquired and the changes are complete the object
must be checked in to the Datastore to create a new version in the
Datastore.
• For non-Ab Initio objects which can’t be locked from the GDE, a lock
can be obtained from the Unix command line using the air commands
available to obtain a lock on the particular object.
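
The two locking conditions above (no lock held in another sandbox, and the sandbox copy matching the current datastore version) can be sketched as a small Python model. This is a toy model for illustration only: the class `DatastoreObject`, the function `try_lock` and the sandbox names are hypothetical and do not reflect how the EME stores locks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatastoreObject:
    """Toy stand-in for a versioned object in the datastore."""

    current_version: int
    locked_by: Optional[str] = None  # name of the sandbox holding the lock


def try_lock(obj, sandbox, sandbox_version):
    """Return True if the sandbox obtains (or already holds) the lock."""
    if obj.locked_by is not None and obj.locked_by != sandbox:
        return False  # already locked in some other sandbox
    if sandbox_version != obj.current_version:
        return False  # sandbox copy is not the current datastore version
    obj.locked_by = sandbox
    return True


graph = DatastoreObject(current_version=3)
print(try_lock(graph, "sandbox_a", 3))  # True: lock acquired
print(try_lock(graph, "sandbox_b", 3))  # False: sandbox_a holds the lock
```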



Check in of files using GDE
Once the project files have been edited and updated, they need to be
checked in to create a new version in the EME datastore, which will be
available to other users. The Check in wizard is invoked by navigating to
Project > Check in. Before the wizard starts, the GDE checks for any
unsaved files in the sandbox and asks whether to save them.
The check in wizard is used as follows:



Check in of files using GDE
• Choose the Sandbox host from the drop down list.
• In the Directory or file field, browse to the particular file in
the sandbox that you want to check in. You may select a file
under the sandbox or you may also select the whole sandbox
in which case the whole project would be checked in to the
EME datastore.
• Browse to the parent Project in Project Directory field, which
points to the Project directory in the EME datastore where the
object would be checked in.
• To go to the advanced options in check in click on the
advanced button.
• The check in tab indicates how you want the check in to be
performed. By default “Force overwrite” is unchecked. When it
is checked, the object is checked in even if there are conflicts
and becomes the latest version in the datastore. “Run Host
Setup script” runs the host profile’s setup script
before each check in. It is advised not to change any settings
here.



Check in of files using GDE
The analysis tab specifies how much dependency analysis is done and on
which objects during check in.



Check in of files using GDE
• A tag (a descriptive piece of text) and a comment can be
attached to the version that will be checked in. Both are entered
in the tag tab of the advanced options dialog box. The tagging standards
are described in another document.



Check in of files using GDE
• After filling in the tag information, on clicking next in the check in
wizard a check in ready dialog is displayed.

• Clicking on “Do Checkin” performs the actual check in and
displays a window similar to the “check out finished” window
with the results of the check in and of the dependency analysis (if
specified in the advanced options).

Working with previous versions of Graph/Object
Often a previous version of a graph has to be checked out and
updated, rather than the latest or current version of the
graph available in the EME data store. Using the check out wizard in the
GDE, you may check out a tagged version of a graph that is not the latest
version available, but the GDE doesn’t allow such versions to be locked.
In such a case, the procedure to be followed is:
• Check out the required previous tagged version of the graph to
your sandbox. (V1 in figure below).
• Check it back in with “Force Overwrite” selected in the advanced
options of the check in wizard.
• This will make it the current version in the data store.
(V4 in figure below).
• Lock the graph now to make the changes.
• Check the graph back in to the EME data store. This updated
version will become the latest version in the EME data store.
(V5 in figure below)
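
The V1, V4, V5 sequence in the procedure above can be traced with a toy version history in Python. This is a simplified model, not the EME's implementation: `ObjectHistory` is a hypothetical class, versions are numbered per object here (in the EME they are global to the datastore), and a "conflict" is modelled simply as checking in a copy that is not based on the latest version.

```python
class ObjectHistory:
    """Toy per-object version history; version n is stored at index n - 1."""

    def __init__(self):
        self._contents = []

    @property
    def latest(self):
        return len(self._contents)

    def check_out(self, version=None):
        """Return (version, content); the latest version by default."""
        version = self.latest if version is None else version
        return version, self._contents[version - 1]

    def check_in(self, content, base_version, force=False):
        """Append a new version; refuse stale copies unless forced."""
        if base_version != self.latest and not force:
            raise RuntimeError("sandbox copy is not based on the latest version")
        self._contents.append(content)
        return self.latest


history = ObjectHistory()
for text in ("v1 logic", "v2 logic", "v3 logic"):
    history.check_in(text, base_version=history.latest)

base, content = history.check_out(1)                           # old version V1
v4 = history.check_in(content, base_version=base, force=True)  # becomes V4
v5 = history.check_in(content + " + fix", base_version=v4)     # edited, V5
print(v4, v5)  # 4 5
```

Without `force=True` the second-to-last step raises, which corresponds to the reason “Force Overwrite” is needed in the procedure.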



Working with previous versions of Graph/Object
[Figure: version sequence V1 to V5 referenced in the procedure above]



Air Commands
• The air utility is a command-line or shell-script interface for
administering and managing the EME.
• All air commands have the following command-line syntax.
air [-root root] [-version version | tagname] [-remote]
  • -root root (Optional) specifies the root of the datastore.
  • -version version (Optional) specifies the version number
    of the datastore to be accessed by the air command. The
    version number can be specified as an integer or as a tag.
  • A tag is a symbolic name that you can associate with a
    specific version when you check in a project.
  • -remote specifies that the air utility should start the EME
    server remotely even if a direct connection is possible.



Air object group
The air object commands manipulate objects and directories in the
datastore.

• air object access - Tests whether or not you can read, write,
execute or see the existence of the object specified as rpath, given
your permissions and the current state of the datastore being accessed.
The command returns either 1 (True) or 0 (False).

air object access rpath [r | w | x | f]

rpath Path to a datastore object, or object ID.
r Optional. Tests if you can read the specified object.
w Optional. Tests if you can write the specified object.
x Optional. Tests if you can execute the specified object.
f Optional. Tests if you can see the existence of the specified
object.

The shortcut for this command is air access.


• air object cat - Writes the literal contents of a datastore object
to standard output.

air object cat rpath

rpath Path to a datastore object, or object ID.

The shortcut for this command is air cat.

• air object cd - Changes the current working directory within
the datastore. This command is available only in the batch and
interactive modes.

air object cd rpath

rpath Path to the datastore directory you want to change to.



Air object group
• air object chmod – Changes the permissions of one or more
datastore objects.
air object chmod [-Rf] [[+|-]permissions |
+nonversioned | -nonversioned | +inherit | -inherit]
rpath1 rpath2 ...
-R Optional. Recursively traverses subdirectories of rpath.
-f Optional. Attempts to continue even if the operation fails
for one or more rpaths.
permissions Specify permissions as an octal number, using
standard UNIX conventions.
If the octal permissions number is preceded by a “+”,
the specified bits are added to (ORed with) the
existing bits. If it is preceded by a “-”, the given
bits are removed from the existing bits.



Air object group

nonversioned As an alternative to the octal notation, you can
specify nonversioned, which labels an object as
“nonversioned” : only the most recent version of the
object (if not removed) will be saved. You set it by
specifying “+nonversioned” and clear it by specifying
“-nonversioned”.
inherit Specifies that any future objects created in the
directory specified by rpath should automatically
inherit the permission bits of the directory at
creation time.
rpath Path to a datastore object, or object ID.
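
The effect of the “+” and “-” prefixes on octal permissions is plain bit arithmetic, which the following Python sketch illustrates. The function `apply_permissions` is hypothetical and only models the octal forms; it assumes that an unsigned octal number replaces the existing bits, as with UNIX chmod, and it does not model nonversioned or inherit.

```python
def apply_permissions(existing, spec):
    """Apply an octal permission spec such as '+011', '-022' or '644'."""
    if spec.startswith("+"):
        return existing | int(spec[1:], 8)   # add (OR in) the given bits
    if spec.startswith("-"):
        return existing & ~int(spec[1:], 8)  # remove the given bits
    return int(spec, 8)  # assumption: no sign means replace, as in UNIX chmod


print(oct(apply_permissions(0o640, "+011")))  # 0o651
print(oct(apply_permissions(0o775, "-022")))  # 0o755
print(oct(apply_permissions(0o600, "644")))   # 0o644
```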



Air object group
• air object chown – Changes the ownership of one or more
datastore objects. Only the EME administrator can perform this
operation.

air object chown [-Rf] user[.group] rpath...

-R Optional. Recurses into rpaths if they refer to a directory.
-f Optional. Attempts to continue even if the operation fails
for one or more rpaths.
user New owner expressed as a valid user name.
.group Optional. Group designation, preceded by a period.
rpath Path to a datastore object, or object ID, whose
ownership you want to change. You can specify multiple
paths.

The shortcut for this command is air chown.
• air object cp – Creates a copy of the object known as rpath1 in
another location within the datastore. The command copies rpath1,
retaining links to any other objects it references.

air object cp rpath1 [-R] rpath2

rpath1 Path to a datastore object, or object ID, to be copied.
-R Optional. If specified, the command copies the specified object
rpath1 and all objects referred to by rpath1. If not specified,
the specified object rpath1 is copied, but objects referred to
by rpath1 will be linked but not copied.
rpath2 New datastore path for the copied object.

The shortcut for this command is air cp.



Air object group
• air object create-parameter-set – Creates a
parameter set file from the parameters associated with a datastore
object.

air object create-parameter-set file graph_path

file Name of the parameter set file. It should have an
extension of .pset
graph_path Path to the graph in the datastore.
switch The name of a switch parameter.
binding A value specified for switch.

The shortcut for this command is air create-parameter-set.



Air object group
• air object delete – Marks one or more objects, and other
objects in the same domains, for permanent removal from the
datastore.

air object delete -R -whole-domain [-force] rpath [rpath2...]

-R If rpath is a directory, all of its child and
grandchild directories and all objects in the descendants’
domains will be marked for deletion.
-whole-domain All objects in the same domain as the object
specified by rpath will be similarly removed.
-force Do not prompt for confirmation of deletion.
rpath rpath2... Datastore paths to the objects you want to delete.

The shortcut for this command is air delete.



Air object group
• air object ln : Creates an additional name, rpath2, for the
object rpath1.

air object ln rpath1 rpath2

rpath1 Path to a datastore object, or an object ID.
rpath2 New name for the existing object.

The shortcut for this command is air ln.

• air object mv : Renames the object rpath1 to rpath2.

air object mv rpath1 rpath2

rpath1 Path to the datastore object you want to rename.
rpath2 New name for the object rpath1.

The shortcut for this command is air mv.


• air object load : Loads objects from a file in portable
interchange format. This file was created by running the air object save
command. The objects are loaded to the datastore locations
corresponding to the ones from which they were saved.

air object load [-table-of-contents] file-name

file-name Standard UNIX file name of the file storing the
objects you want to load. If file-name is -, then
the data is read from standard input.
-table-of-contents Specifies that nothing be loaded, but instead
a listing of the file’s contents be printed.

The shortcut for this command is air load.



Air object group
• air object revert : Reverts one or more objects to an earlier
version. Objects newer than the specified version are deleted from the
datastore.

air object revert [-revert-links] version rpath...

-revert-links Optional; if specified, then objects linked to rpath will
also be reverted. If not specified, no action is taken on linked objects.
version Version number (integer) or tag (string) of the
datastore that you wish to back out to.
rpath... Path to an object, or object ID.

The shortcut for this command is air revert.



Air object group
• air object mkdir : Creates a datastore directory. Parent
directories are created as necessary.

air object mkdir rpath

rpath Path of the directory you want to create.

The shortcut for this command is air mkdir.

• air object unlink : Unlinks a directory pathname of an object.

air object unlink [-f] [-no-project-remove] rpath

-f Specifies that no error be generated if rpath
does not exist.
-no-project-remove Specifies that the object be removed without
removing it from source code control.
rpath Datastore pathname of the directory.



Air object group
• air object rm : Removes an entry from a directory. If the entry
is in a datastore project, then the system performs an implicit air project
remove.

air object rm [-f] [-r] [-no-project-remove] rpath...

-no-project-remove Specifies that the object be removed without
removing it from source code control.
-f Specifies that no error be generated if
rpath does not exist.
-r Specifies that subdirectories of rpath be
removed recursively.
rpath Path to the directory entry you want to
remove.

The shortcut for this command is air rm.



Air object group
• air object save : Saves datastore objects to a file in portable
interchange format. You can subsequently load the file by running air
object load. You can use air object save to migrate objects from one
datastore to another.

air object save file-name rpath...
[-include {common | local | rpath...}]...
[-external {common | local | erpath...}]...
[-format {1 | 2 | 3}]
[-export epath...]
[-settings common | project...]
[-save-jobs]
[-analyzed]
[-no-default]
[-tag tag]
[-comment comment]
[-Stag tag]
[-Scomment comment]
[-Dtag tag]
[-Dcomment comment]
• air object versions : Displays the revision history of all
tagged versions or all versions of the specified object or the datastore.
History includes version number, date, user, tag, and comment for the
specified object.

air object versions [-verbose] [rpath]

-verbose Optional. Displays all versions for an object or the entire
datastore. (If not specified, only tagged versions will be
displayed.)

rpath Optional. Path to the object about which you want
version information. If not specified, displays
information for the datastore as a whole.


Air project group
The air project group of commands manages datastore projects in a
variety of ways, including creation and modification.

• air project add : Manually adds one or more files to a
datastore project.

air project add project-name relative-path...

project-name Name of the project to which you want to add
files.
relative-path... Relative path of the file to be added. You can
add multiple files.


Air project group
• air project clone : Copies the settings, but not the content, of
the specified project to a new project. Settings include the project
parameter, the directory list, the extension list, and common projects.

air project clone source-project dest-project

source-project Name of the project whose settings you want to
copy.
dest-project Name of the project into which you want to copy
the settings.

• air project convert : Upgrades a project to the latest format.

air project convert project-name [-basedir basedir] [-dry-run]

project-name Name of the project to be converted.
-basedir basedir Base path of the associated sandbox. If this is
specified, the sandbox will be converted too.
-dry-run Specifies that no actual changes be made to
anything.

Air project group
• air project create : Creates a new project and specifies
various project attributes, including its location parameter name, a
project parameter prefix, included common projects, and additional
entries in the directory list, import list, or extension list.

air project create project-name
-location location
[-prefix prefix]
[-common project-name...]
[-import-map name project subdir [name project subdir]...]
[-extension pattern type [pattern type]...]
[-format n]
[-nodefault]
[-remove]


Air project group
• air project default : Restores the default values of a
project’s directory list, extension list, or project parameter, or all three.

air project default project-name
{[-extensions] [-parameters] | [-all]}
[-location location] [-prefix prefix]

-extensions Optional. Restores the default extension list.
-parameters Optional. Restores the default project parameters.
-all Optional. Restores the directory list, the extension
list, and the project parameters to their defaults.
-location Specifies location of the project.
-prefix Specifies a prefix for the set of predefined project
parameters.


Air project group
• air project export : Exports a project from the datastore into
a file system sandbox.

air project export project-name
[-basedir root-dir]
[-dry-run]
[-force]
[-no-read-only]
[-cofiles]
[-parameters]
[-find-required-files]
[-files relative-rpath...]
[-set parm-name value]...
[-common rpath common-root-dir]...

-cofiles Optional. In the event of a conflict between the datastore
and the file system, export creates a conflict file named
file.conflict.extension. If not specified, the system displays
error messages and halts the operation.


Air project group
• air project files : Lists information about one or more files in
a datastore project.

air project files project-name [-basedir basedir]
[-all] [-verbose] [-sizes] [-files file-name...]

-basedir basedir Optional. Path to your sandbox directory. If not
specified, the output includes only the datastore
time stamp and version.
-verbose Specifies that additional timestamp-related
information be reported. This may be useful in
diagnosing problems with the source code control
system.
-all If specified, shows information about all files in the
project.
-sizes Specifies that the sizes of files be displayed.
-files file-name... If specified, shows information about one or more
files in the project.

Air project group
• air project find : Lists project files, optionally filtered by either
MIME type or name, and displays their project-relative paths.

air project find project-name
[-type type-pattern] | [-name name-pattern]

-type type-pattern Optional. Filters objects by MIME type. The
filter can select either a single object class or a
set of MIME types.
-name name-pattern Optional. Filters objects by name, expressed as
a string. The filter supports wildcards.

• air project get-required-files : Finds the files that the
specified graph depends on. The graph must previously have been
analyzed by the GDE. This command finds the record format files,
transformation files, database configuration files and .sql files accessed
within a graph. The output consists of a sequence of alternating
project-name relative-path pairs.

air project get-required-files project-name [-force] relative-path

-force Causes the command to produce a possibly incomplete
set of required files if an error is encountered.
relative-path Relative path of the graph whose files you want to find.

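The alternating project-name relative-path output described above can be regrouped into pairs for further processing. The following is a minimal Python sketch, assuming the command prints whitespace-separated tokens; the sample project names and paths are hypothetical and are not taken from a real datastore.

```python
from typing import List, Tuple


def parse_required_files(output: str) -> List[Tuple[str, str]]:
    """Group alternating tokens into (project-name, relative-path) pairs.

    Assumption: tokens are separated by whitespace and contain no spaces.
    """
    tokens = output.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of tokens")
    return list(zip(tokens[0::2], tokens[1::2]))


# Hypothetical captured output of air project get-required-files.
sample = "/Projects/demo dml/customer.dml /Projects/common xfr/clean.xfr"
pairs = parse_required_files(sample)
for project, path in pairs:
    print(project, path)
```

An odd token count is treated as an error here, because it would mean the pairing assumption does not hold for that output.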
Air project group
• air project import : Imports data from the file system into
an existing EME project.

air project import project-name
-basedir basedir [-dry-run] [-force] [-lock]
[-auto-add] [-create]
[-no-read-only] [-noparse]
[-files file...] [-lock-project]
[-wait] [-tag tagname [-comment tag_comment]]

-basedir basedir Specifies the path to the sandbox. The default
sandbox is the value of the environment variable
named by the project’s location parameter.
-dry-run Optional. If specified, generates diagnostics but
does not change the file system or datastore.
-force Optional. If specified, does not check for datastore
conflicts. The import succeeds even if changes made
in the datastore are overwritten. If not specified
and you have conflicts, import fails.

Air project group
-lock Optional. If specified, import retains your locks on files in the
project. By default, import unlocks files after importing.
-auto-add Specifies that files found in the file system be automatically
added to the project.
-create Specifies that the project be created before the import takes
place. The location parameter and prefix are determined
from the sandbox.
-no-read-only Specifies that the imported files be left writable, if they
were writable previously. Normally, files are not made
writable after import until an air sandbox lock command
is issued.
-noparse Specifies that DML and XFR files in the import be marked
for reparsing at a later time. Normally, these files are
reparsed during the import, in order that dependency
structures can be built and errors checked for.
-lock-project Specifies that a lock on the project be held while the
import proceeds. The command will fail if the lock is not
obtained immediately, unless -wait is specified.
-wait Specifies that the command wait until a lock is available.


Air project group
• air project map-url : Displays the location to which a URL is
mapped by a project.

air project map-url project-name URL ...

project-name Name of the project.
URL ... Path of the file specified (which can contain a dollar
sign).

• air project mkdir : Creates a directory in the specified
project.

air project mkdir project-name [-no-export] directory-name

-no-export Optional. If specified, the directory is not exportable.
project-name Name of the project.
directory-name Name of the directory to be created.


Air project group
• air project move : Moves a project from one place to
another within the datastore and fixes up internal references to the
project name.

air project move old-name new-name

old-name Original location of the project.
new-name Location to which the project is to be moved.

• air project reparse : Reparses some or all of the record
formats and transforms in a project.

air project reparse project-name [-all] [-force] file ...

-force Optional. Tries to check in the result of reparsing even if a
serious error is encountered.
-all Reparses all known record formats and transforms in the
specified project.

Air project group
• air project revert : Reverts one or more objects in a project
to an earlier version.

air project revert project-name -version version [relative-path…]

-version version Version number or tag of the datastore to which
you want to revert the object.
relative-path… Optional. Relative path of the object to be
reverted. If you omit relative-path, the entire
project is reverted.

• air project remove : Manually removes one or more files
from a project. This sets the file’s type to “remove”, which will cause
subsequent exports to remove the file from sandboxes.

air project remove project-name [-r] rpath…

-r Specifies that the contents of directories be recursively
removed. If this is not specified, only empty directories can
be removed.
rpath… Path to the file you want to remove. You can specify
multiple files.

Air project group
• air project reset : Removes all entries from the list of
common projects, the directory list, the import list, and the extension
list, as well as all project parameters except the location parameter.

air project reset project-name
[-common]
[-directories]
[-import-map]
[-extensions]
[-parameters]
[-all]

-common Removes all references to common projects.
-directories Removes all entries from the directory list.
-import-map Removes all entries from the import list.
-extensions Removes all entries from the extension list.
-parameters Removes all project parameters.
-all Removes all entries from the common projects
list, the directory list, and the extension list, as
well as the project parameters.

Air project group
• air project set-executable : Ensures a file will be marked
“executable” when it is exported.
air project set-executable project-name rpath [-not]

rpath Relative path of the object within the project.
-not Specifies that the file be marked not executable.

• air project set-type : Overrides the MIME type assigned to a
single datastore object.

air project set-type project-name relative-path MIME-type

relative-path Relative path to the object whose MIME type you
want to change.
MIME-type Valid MIME type.

• air project show : Displays project attributes in
human-readable form.

air project show project-name

Air project group
• air project show-common : Displays a space-separated
list of the specified project’s common projects in the order in which they
need to be loaded. Each common project is described by a full datastore
path.
air project show-common project-name

• air project show-queue : Displays files queued for analysis.

air project show-queue project-name

• air project tag : Tags a project and all the files it contains.

air project tag project-name tag [comment]

project-name Project to be tagged.
tag Tag (string) to be assigned.
comment Optional. Associated comment for this version.


Custom Components

Building a Custom Component (GDE)
• Building a Custom Component for use in the Ab
Initio GDE is straightforward.
• The user can build a program specification file that has
all the capabilities of a built-in Ab Initio component: the
visual behaviour expected within the GDE, and the ability to
run in a graph with everything the wrapped program can do.
• Program specification files provide the Co>Operating System
with the information it needs to run a program or shell script.
• Program specification files should have the .mpc extension.
• All program specification files must start with <mpcfile>
• The <mpcfile> line is followed by a series of attribute:
value lines that describe the attributes of the program.


Syntax of .mpc file
<mpcfile>
[ label: label ]
[ author: author-name ]
[ version: version-number ]
[ comment: comment ]
image: path
[ exit: code ]
[ port: type direction name [location] [ordering]
[fan-preference] [min-flows] [max-flows] [record-format] ]
[ environment : env-variable value ]
[ argument: literal value1 [value2 ...] ]
[ argument: flow portname ]
[ argument: partition ]
[ argument: depth ]
[ argument: file filename ]
[ argument: expression file filename ]
[ argument: expression string ]
[ argument: transform file filename ]
[ argument: transform string ]
[ parameter: placement name type default-value
description [restrictions] ]
[ metadata type: dest-portname = {source-portname | value} ]

Example of .mpc file
A typical .mpc file wrapping a custom program
is simple in form, for example:
<mpcfile>
label: "Parameterized dd"
author: "Dil Bert"
version: "1.0"
comment: "Does a unix dd across a file or multifile"
image: "/u/dilbert/develop/run/par_dd.ksh"
exit: 0
port: npipe in infile 1 1
port: npipe out outfile 1 1
port: std out stdout 0 1
parameter: positional blocksize integer "" "*** bs ***" required
parameter: positional count integer "" "*** count ***" required
argument: flow infile
argument: flow outfile
argument: literal $1
argument: literal $2

Building a Custom Component (GDE)
The previous .mpc file, par_dd.mpc, describes a
component that uses the executable:
/u/dilbert/develop/run/par_dd.ksh

The executable Korn shell program par_dd.ksh
reads as follows:
#!/bin/ksh
exec dd if=$1 of=$2 bs=$3 count=$4

The script receives the two flow arguments as $1 and $2,
followed by the blocksize and count parameters as $3 and $4.

The component has one input named infile, an
output named outfile, and an output named stdout
for the output that would normally go to the UNIX
stdout when running the UNIX program dd.


Program Specification File
• Must include the <mpcfile> line and the image line, all the
other lines are optional.
• Can use as many port, argument, parameter, and metadata
type lines as you need to describe your particular program.
• Comments in a program specification file are denoted by the
following:

# Comments are indicated by a # sign at the
# beginning of a line or
// to the end of the line is a comment or
/*
A block of text using "C"-like comments
*/


Creating new .mpc file from the GDE
To write a program specification file in the Component
Organiser:

1. In the GDE, open the Component Organiser.
2. Right-click My Components.
3. From the shortcut menu, choose New > Program. A New
Component icon appears under My Components.
4. Do one of the following:
• Enter a name in the New Component icon.
• Press Enter to accept New Component as the name.
5. Right-click the New Component icon. A pop-up menu
appears.
6. From the pop-up menu, choose Edit As Text. The Edit
Program Component window opens.
7. Write your own program specification file by editing the
template in the window.


Steps to create new Component



Descriptions of Attribute & Variables

The label tag :

This line specifies the name you want to appear for
the component in the Component Organiser and
on the component icon in the GDE.

label: "Parameterized dd"


Descriptions of Attribute & Variables

The mpname tag :

The mpname tag is only used for components
built into the Co>Operating System.

mpname: "unitool"


Descriptions of Attribute & Variables

The documentation tags :

These three tags are optional.

author: "John Smith"
version: "1.0"
comment: "Comments visible in GDE..."

If the version tag contains an Ab Initio
Co>Operating System version, the GDE can
determine at runtime whether there is a
version match.


Descriptions of Attribute & Variables

The image tag :

The executable image of your process must be
supplied. This can be any executable (including a
shell script) and must be the absolute path to the
image source. This path can be determined at
runtime with a UNIX environment variable.

image: "name-of-executable"


Descriptions of Attribute & Variables

The exit tag :

The optional exit tag tells Ab Initio what exit codes
indicate a successful program termination.

exit: 0 // the default

Use either an integer value or one of the following:
any, positive, non_positive, negative,
non_negative, even or odd.
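To make the exit values concrete, the following Python sketch shows one way a wrapper could decide whether a given exit code counts as success. It assumes the keywords carry their plain arithmetic meaning (for example, positive means greater than zero); the function name exit_is_success is hypothetical and is not part of Ab Initio.

```python
def exit_is_success(spec: str, code: int) -> bool:
    """Return True if `code` satisfies the exit specification `spec`.

    Assumption: each keyword has its ordinary arithmetic meaning.
    """
    predicates = {
        "any": lambda c: True,
        "positive": lambda c: c > 0,
        "non_positive": lambda c: c <= 0,
        "negative": lambda c: c < 0,
        "non_negative": lambda c: c >= 0,
        "even": lambda c: c % 2 == 0,
        "odd": lambda c: c % 2 == 1,
    }
    if spec in predicates:
        return predicates[spec](code)
    # Otherwise the spec is a single integer, as in "exit: 0".
    return code == int(spec)


print(exit_is_success("0", 0))
print(exit_is_success("even", 3))
```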


Descriptions of Attribute & Variables

The environment tag :

The environment tag is used to define the
runtime variables available to your program.
environment: variable value

These may be literals or variables that exist in the
graph’s runtime environment:
environment: LAST_KEY "1234"
environment: DISPLAY $DISPLAY


Descriptions of Attribute & Variables

The port tag :

The port tag describes the inputs and outputs of
the component. Use one line per port.

port: type direction name [location]
[ordering] [fan-preference] [min-flows]
[max-flows] [metadata]


Descriptions of Attribute & Variables

Port usage

The type indicates how the input or output is
implemented. Use one of the following:

• soc : Use the Ab Initio Socket interface, part of
the Ab Initio C++ Development Environment
• file : Write a temporary file (required for
random access operations on the port)
• npipe : Use a named pipe
• std : Use UNIX stdin or stdout


Descriptions of Attribute & Variables
Port usage

• The direction indicates whether this is an in or
out port.

• The optional ordered argument indicates that
the order in which flows are attached is important.

• The optional multiple argument indicates
whether the port permits fans in or out. The
default is to allow straight flows only.

• The default location for an 'in' port is the left,
and for an 'out' port the right.

Descriptions of Attribute & Variables

Port usage

• The optional location argument indicates where
the GDE should draw the port. Use one of top,
left, bottom, or right.

• The optional min-flows and max-flows
arguments indicate how many flows can be
attached to the port.

• A value of zero for min-flows indicates that a
flow connection is optional.


Descriptions of Attribute & Variables

Port usage

The optional metadata argument sets the
metadata on the port. Use one of the following,
including the quotes:

"=metadata-string"
"&Remote-File-Path"
"LLocal-File-Path"


Descriptions of Attribute & Variables

Port Example 1

The following input port definition indicates that a
named pipe is used for input, the port name is
“indata”, the minimum number of connections is 1
(which means that the input must have a flow
attached), and the maximum number of
connections is 1.

port: npipe in indata 1 1


Descriptions of Attribute & Variables

Port Example 2

The following output port definition indicates that
stdout is used for output, the port name is
“stdout”, the minimum number of connections is 0
(which means that the output is optional), and the
maximum number of connections is 1.

port: std out stdout 0 1


Descriptions of Attribute & Variables

Port Example 3

The following input port definition indicates that a
named pipe is used for input, the port name is
“indat”, the port will be drawn on the top of the
component, the port must have exactly 1
connection, and the metadata for the port is
predefined.

port: npipe in indat top 1 1 "=string('\n')"
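The three port examples can be read field by field. The Python sketch below splits a port line into its parts; it is an illustration only, covering just the forms used in the examples (optional location, min-flows, max-flows, optional metadata) and ignoring the ordering and fan-preference fields. The Port tuple and parse_port function are hypothetical names.

```python
import shlex
from typing import NamedTuple, Optional

LOCATIONS = {"top", "left", "bottom", "right"}


class Port(NamedTuple):
    kind: str
    direction: str
    name: str
    location: Optional[str]
    min_flows: int
    max_flows: int
    metadata: Optional[str]


def parse_port(line: str) -> Port:
    """Parse a 'port:' line of the shape shown in Port Examples 1 to 3."""
    key, _, rest = line.partition(":")
    if key.strip() != "port":
        raise ValueError("not a port line")
    # shlex keeps the quoted metadata string together as one token.
    kind, direction, name, *extra = shlex.split(rest)
    location = extra.pop(0) if extra and extra[0] in LOCATIONS else None
    min_flows = int(extra.pop(0))
    max_flows = int(extra.pop(0))
    metadata = extra.pop(0) if extra else None
    return Port(kind, direction, name, location, min_flows, max_flows, metadata)


example1 = parse_port("port: npipe in indata 1 1")
example2 = parse_port("port: std out stdout 0 1")
example3 = parse_port(r'''port: npipe in indat top 1 1 "=string('\n')"''')
print(example1)
print(example2)
print(example3)
```

A min_flows of 0, as in the second example, is what marks the connection as optional.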


Descriptions of Attribute & Variables

The parameter tag :

The parameter tag defines the parameters which
the user will set within the GDE. These will
become input arguments to the executable in a
defined order at runtime.

parameter: placement name type default
[description] [restrictions]


Descriptions of Attribute & Variables

Parameter usage :
• The placement is either positional or
keyword.
• positional parameters are sent to the
executable in the given order.
• The GDE may re-order keyword parameters
and will pass the values to the executable
preceded by a dash (-) and the keyword
name.
• The name indicates the name of the parameter
as seen in component properties on the
parameters tab. It is also the switch used for
keyword parameters.

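The difference between the two placements can be sketched as follows. This Python example is a simplified model, not the GDE's actual logic: it passes every value as a string and keeps keyword parameters in the order given, whereas the GDE may re-order them. The function name and the sample parameter names are hypothetical.

```python
from typing import List, Tuple


def build_parameter_args(params: List[Tuple[str, str, str]]) -> List[str]:
    """Turn (placement, name, value) triples into command-line arguments."""
    args: List[str] = []
    for placement, name, value in params:
        if placement == "positional":
            # Positional values are passed bare, in the given order.
            args.append(value)
        elif placement == "keyword":
            # Keyword values are preceded by a dash and the keyword name.
            args.extend(["-" + name, value])
        else:
            raise ValueError("unknown placement: " + placement)
    return args


argv = build_parameter_args([
    ("positional", "blocksize", "512"),
    ("positional", "count", "10"),
    ("keyword", "mode", "fast"),
])
print(argv)
```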
Descriptions of Attribute & Variables

Parameter usage :

type describes what kind of parameter this is. It
must be one of the following:

expression, transform, integer, float, string,
expression file, dataset, metadata, transform,
layout, date, mode, protection, bool, special,
choice, literal, infile, outfile


Descriptions of Attribute & Variables

Parameter usage :

• The default value for the parameter is given
within a quoted string. For choice parameters, the
first listed is the default.
• The description is an optional, short, quoted
string describing the parameter’s meaning. It is
displayed within the GDE.
• The restriction indicates whether this
parameter is optional or required (the default).


Descriptions of Attribute & Variables

Parameter Example 1 :

A positional parameter named collator which takes
an expression as its value. It is defaulted to a null
string and carries the description “Aggregation
key”.

parameter: positional collator expression "" "Aggregation key"


Descriptions of Attribute & Variables

Parameter Example 2 :

A keyword parameter named force which is a
boolean, has a default value of “True” and carries
the description “Adds -force to…”

parameter: keyword force bool "True" "Adds -force to mp line if set to TRUE"


Descriptions of Attribute & Variables

Parameter Example 3 :

A positional parameter named count which takes
an integer as its value, defaults to a null string,
has the description “*** count ***”, and is
required to be set in the GDE.

parameter: positional count integer "" "*** count ***" required


Descriptions of Attribute & Variables

The argument tag :

• The argument tag defines the arguments which
are presented to the executable at runtime.

argument: type [value]

• The argument type must be one of the
following:

literal, flow, partition, file, depth, expression
file, transform, transform file


Descriptions of Attribute & Variables

Argument usage :

• literal arguments in quotes are sent to the
executable as an exact text string:
argument: literal "-file_in"

• literal arguments such as $n, where n is an
integer, refer to the defined parameters. The
executable will receive the parameter value as an
argument:

argument: literal $1
//send the first parameter


Descriptions of Attribute & Variables

Argument usage :

• flow arguments send the runtime definition of
the named port:
argument: flow inport
argument: flow outport

• For npipe and file ports, the full filename will be
sent as an argument.
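Putting the literal and flow rules together, the argument lines of the earlier par_dd example resolve to the four values that par_dd.ksh sees as $1 to $4. The Python sketch below models only those two argument types; the pipe paths and parameter values are made up for illustration, and resolve_arguments is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def resolve_arguments(
    arguments: List[Tuple[str, str]],
    parameters: List[str],
    port_files: Dict[str, str],
) -> List[str]:
    """Resolve (type, value) argument lines into the executable's argv."""
    argv: List[str] = []
    for kind, value in arguments:
        if kind == "literal":
            if value.startswith("$") and value[1:].isdigit():
                # $n refers to the n-th defined parameter (1-based).
                argv.append(parameters[int(value[1:]) - 1])
            else:
                # Any other literal is passed through as exact text.
                argv.append(value)
        elif kind == "flow":
            # For npipe and file ports the full filename is passed.
            argv.append(port_files[value])
        else:
            raise ValueError("argument type not modelled: " + kind)
    return argv


argv = resolve_arguments(
    [("flow", "infile"), ("flow", "outfile"), ("literal", "$1"), ("literal", "$2")],
    parameters=["512", "10"],  # blocksize, count (hypothetical values)
    port_files={"infile": "/tmp/in.pipe", "outfile": "/tmp/out.pipe"},
)
print(argv)
```

With these inputs the script would effectively run dd if=/tmp/in.pipe of=/tmp/out.pipe bs=512 count=10.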


Continuous Flows

Outline

• What are continuous flows?
• Why continuous flows?
• How do they work?
• How can you use them?
• Practice exercise


What are Continuous Flows ?
• A continuous job is a job which produces usable
output before it ends.
• Continuous flow graphs are intended to run for an
indefinite period of time, continually taking in new
input, and continually producing new output.



Why Continuous Flows ?

Performance
We don’t have to pay for job startup costs

Latency
Results are available sooner

Flexibility
Enables processing from unreliable data sources.



Key Concepts

• Publisher and Subscriber

• Compute Points

• Checkpoints

• Queues



Publisher and Subscriber
• Subscriber : a component which reads data from
some source and originates Checkpoints and
Compute points.
• Publisher : produces output

All data starts in a Subscriber and flows to a Publisher.



Compute point

• Causes components in a job to do any
pending computations
(sort sorts, in-memory components do their thing, etc.)
• Causes publisher-type components to
make data available.
• Many active at once in same graph
• No launcher or agent involvement


Checkpoint
Is a compute point which...
• Saves enough state so that a job can be
restarted there.
• Heavyweight (only one at a time)
• Potentially large latency penalty
• Point at which job can be cleanly
shut down
• Involves the launcher and agents

Queue
• Queues are data storage for continuous
flows
• Used by our publisher/subscriber
components
• Look like a multidirectory
• Managed by m_queue command
• Not yet visible in GUI



How queues work ?
Publishing

• Publisher writes file into main queue directory
• Publisher makes hard link to file just created in each
subscriber subdirectory
• Publisher deletes file in main directory
• Publisher updates cursor file in main directory

Subscribing

• Subscriber looks for next file
• Waits for cursor in main directory to be “greater” than
desired file
• Starts reading new file when allowed by cursor
• Deletes file when data from it is fully committed

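The publish and subscribe steps above can be imitated with ordinary files and hard links. The Python sketch below is a toy model rather than the real queue implementation: it assumes the cursor file holds a plain integer (the actual cursor format is not described in this course) and it ignores locking and the per-subscriber cursor files. All function names are hypothetical.

```python
import os
import tempfile


def create_queue(qdir, subscriber_ids):
    """Create the main queue directory and one subdirectory per subscriber."""
    os.makedirs(qdir)
    with open(os.path.join(qdir, "cursor"), "w") as f:
        f.write("0")
    for sid in subscriber_ids:
        os.makedirs(os.path.join(qdir, "subscribe." + sid))


def publish(qdir, seq, payload):
    """Write, hard-link into each subscriber directory, delete, move cursor."""
    name = "data.%08d" % seq
    main_path = os.path.join(qdir, name)
    with open(main_path, "w") as f:
        f.write(payload)
    for entry in sorted(os.listdir(qdir)):
        if entry.startswith("subscribe."):
            os.link(main_path, os.path.join(qdir, entry, name))
    os.remove(main_path)
    with open(os.path.join(qdir, "cursor"), "w") as f:
        f.write(str(seq + 1))  # assumed format: next sequence number


def consume(qdir, sid, seq):
    """Return the data for `seq`, or None if the cursor does not allow it yet."""
    with open(os.path.join(qdir, "cursor")) as f:
        cursor = int(f.read())
    if cursor <= seq:
        return None
    path = os.path.join(qdir, "subscribe." + sid, "data.%08d" % seq)
    with open(path) as f:
        data = f.read()
    os.remove(path)  # in this toy model the data counts as committed here
    return data


qdir = os.path.join(tempfile.mkdtemp(), "cfqueue")
create_queue(qdir, ["sub1", "sub2"])
publish(qdir, 1, "a,1\n")
first = consume(qdir, "sub1", 1)
second = consume(qdir, "sub1", 2)
print(repr(first), second)
```

Because each subscriber holds its own hard link, sub1 deleting its copy leaves the file in sub2's directory untouched until sub2 has consumed it.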
Queue Structure
cfqueue/:
cursor subscribe.sub1 subscribe.sub2

cfqueue/subscribe.sub1:
cursor data.00000003 data.00000004
data.00000005

cfqueue/subscribe.sub2:
cursor data.00000002 data.00000004
data.00000001 data.00000003
data.00000005



m_queue Command
Used to create, delete and maintain queues

m_queue create [-f] directory
m_queue subscribe [-f] [-dynamic [-quiet] | -static]
[-timeout seconds] directory id
m_queue reset-subscriber [-f] [-dynamic [-quiet] | -static]
[-timeout seconds] directory id
m_queue reset-publisher [-f] directory
m_queue unlock directory [id]
m_queue unsubscribe [-f] directory id
m_queue stop directory id
m_queue end directory id
m_queue touch directory id [generation]
m_queue delete [-f] directory


Restrictions on Continuous Graphs

• All components must be continuously enabled.
• All components with no outgoing flows must be
publishers.
• All subscribers must issue checkpoints and
compute points in the same sequence.
  Issuing them near in time is probably also a good idea.
  They must issue the same sequence on recovery.
• There must be at least one publisher.
• Job must be single-phase.


Shutting down jobs

• m_shutdown <job name>
  Causes subscribers to exit cleanly after next
  checkpoint

• m_queue stop <queue URL>
  Causes queue subscribers to exit “soon” (not
  recommended)

• m_kill <job name>
  Kills job immediately; can be recovered by re-running.


Job Recovery

• Restart by re-running
• Job restarts from last committed checkpoint
• Tracking information is cumulative from job
start
• Checkpoints, queues, etc. are all cleaned up
when the job exits (for example, when shut down
with m_shutdown)


Publisher Component
Writes to a queue (or a file)

• queue - the URL of the output queue (or file)

• publish_style
  queue - output to an Ab Initio queue (default)
  files_after - output to a sequence of files
  appended - append results to a single file

Note: Layout of the component needs to match layout of the queue.


Multipublish Component
Like publish but…

• Multiple multipublishers may write to same queue
• queue - the URL of the output queue (or file)
• Useful for cases in which multiple, dynamic or
unreliable data sources need to be collected
together for processing


Subscriber Component
The Subscribe component has the following options:

• infile - file or queue to read from
• id - which subscriber to the queue am I? (for
queue subscriptions)
• more - what to do after finishing with current file
• checkpoint_trigger - how are checkpoints
generated
• package - DML functions for Checkpointing
• wait - (default true) wait for additional data
• nodelete - (default false) don’t delete consumed
data


Subscribe more Parameter
The Subscribe component’s more parameter must
have one of the following values:

• appended - one file, data written to the end of
file
• files_after - (default) get next file in lexical
sequence
• rotating - Assume current file will be renamed
and reopened. Connect to new file when it
appears.
• none - exit at end of file


Subscribe checkpoint_trigger Parameter
The Subscribe component’s checkpoint_trigger
parameter must have one of the following values:

• time_interval - checkpoint every
AB_SUBSCRIBE_CHECKPOINT_INTERVAL seconds
• record_count - checkpoint every
checkpoint_interval records
• infile_boundary - (default) checkpoint at end of
file/queue element
• dml_driven - use the DML package
• none - never checkpoint


Subscribe dml_driven types
The Subscribe package has the following record types:

• temp_type - (user defined) available for user use
• info - contains useful information about state of the
subscriber
  records_since_compute - number of records since the last
  compute point or checkpoint
  records_since_checkpoint - number of records since the last
  checkpoint
  at_eof - true if the subscriber has reached the end of a file
• out - output type from checkpoint function
  checkpoint - if true, do a checkpoint now
  computepoint - if true, do a computepoint now
  repeat_check - if true, repeat the checking after doing the
  checkpoint or computepoint


Subscribe dml_driven Functions
The Subscribe package has the following DML
functions:
temp_type::initialize(in, prev_temp, info, prev_out)
• Called on record immediately following computepoint or
checkpoint
out::check_before(in, temp, info, prev_out)
• Called after reading record, but before sending it to the
output flow.
• Allows “peeking ahead”
out::check_after(in, temp, info, prev_out)
• Called after sending current record to the output flow
out::check_event(temp, info, prev_out)
• Called while waiting at end-of-file

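The decision that a dml_driven check function makes can be illustrated outside DML. The Python sketch below is only an analogy: it mirrors the info and out record fields listed earlier, but the real functions are written in DML and also receive in, temp and prev_out, which are omitted here. The thresholds checkpoint_every and compute_every are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Info:
    records_since_compute: int
    records_since_checkpoint: int
    at_eof: bool


@dataclass
class Out:
    checkpoint: bool = False
    computepoint: bool = False
    repeat_check: bool = False


def check_after(info: Info, checkpoint_every: int = 1000,
                compute_every: int = 100) -> Out:
    """Decide what to request after a record has been sent downstream."""
    if info.at_eof or info.records_since_checkpoint >= checkpoint_every:
        # A checkpoint is also a compute point, so only one flag is set.
        return Out(checkpoint=True)
    if info.records_since_compute >= compute_every:
        return Out(computepoint=True)
    return Out()


print(check_after(Info(10, 10, False)))
print(check_after(Info(100, 400, False)))
print(check_after(Info(5, 1000, False)))
```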


Continuous Scan/Rollup
These components are like in-memory scan and rollup but

• In-memory only
  No spilling to disk.
  Fail if they run out of memory.
• Maintain state since the start of the job.
• DML functions enable trimming of state at
checkpoint boundaries


Universal Subscribe/Publish
These components allow a user to write a program adhering to a
simple protocol to connect to any continuous stream of data.

• Components are like subscribe and publish
  Same parameter names/meaning where appropriate
  Command line - the command line of the program to
  execute
• Program can be written for any
source/target
  Programs have been written in C++, Java, Python
  Used to connect to Vitria, MQ Series, …


Other enabled components
• Broadcast, Copy, Gather, Gather Logs, Replicate
• Reformat, Merge, Merge-runs, Filter by Expression
• Partition by {Key, Expression, Round-robin}
• Sort, Aggregate, Rollup, Scan, Normalize,
Denormalize
• Continuous {Scan, Rollup, Update Table}
• UniversalPublish, UniversalSubscribe



XML

Read XML

• Read XML reads an XML document from its in port
and writes records of data to its out port.
• There is one output record for each element of the
document.
• The record includes the type of element and the
hierarchical level at which it occurs.
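The one-record-per-element behaviour can be pictured with a small Python analogy using the standard library's ElementTree. This is not the Read XML component: the record layout (element, level) and the choice to number the root as level 0 are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def element_records(xml_text):
    """Return one record per element, with its type and hierarchical level."""
    root = ET.fromstring(xml_text)
    records = []

    def walk(elem, level):
        records.append({"element": elem.tag, "level": level})
        for child in elem:
            walk(child, level + 1)

    walk(root, 0)  # assumption: the document element is level 0
    return records


records = element_records("<order><item><sku>A1</sku></item><note/></order>")
for rec in records:
    print(rec)
```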


Read XML
Parameters :



Read XML
Runtime Behavior :



Write XML

• Write XML reads records from its in port and writes
an XML document body without a DTD (Document
Type Definition) to its out port.


Write XML
Parameters :



Write XML

Runtime Behavior :



XML Reformat

• XML Reformat parses or constructs XML documents.

• XML Reformat works like Reformat, but provides
additional built-in types and functions to support
function-based processing of XML documents.


XML Reformat
Built-in Types :

• xml_id : Represents a reference to a piece of
storage associated with an XML document, element, or
attribute.
• xml_element : Represents a single XML element. It
directly holds the element type and character data,
and has references to the element’s attributes and
child elements.
• xml_attribute : Holds an attribute name and value.


XML Reformat
Built-in Functions to parse XML Document :



XML Reformat
Built-in Functions to construct XML Document :



Case Studies

Case Study 1
Create a graph which
• reads from the file insource with the record
format:
record
decimal(",") f1;
decimal("\n") f2;
end;
• checkpoints every 30 seconds
• writes to a queue with subscribers sub1 and
sub2.
Examine the state of the queue at various
intermediate points. Try killing and restarting the job.
Generate insource using the provided dribble
program.


Case Study 2
Leave Case Study 1 running.
Create a graph which subscribes to the queue from
Case Study 1 as id “sub1”. Rollup on field f1, and sum the
contents of field f2.
Send the result to a Publish Component that is
appending to a normal file.
Checkpoint on infile_boundary.
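To check the output of this graph, the expected rollup result can be computed independently. The Python sketch below assumes the insource data consists of integer values in the "f1,f2" line format from Case Study 1; the sample data is made up, and the function is a reference calculation only, not a replacement for the Rollup component.

```python
from collections import defaultdict


def rollup_sum(text):
    """Sum f2 per distinct f1 for lines of the form 'f1,f2'."""
    totals = defaultdict(int)
    for line in text.splitlines():
        if not line:
            continue
        f1, f2 = line.split(",")
        totals[int(f1)] += int(f2)
    return dict(totals)


result = rollup_sum("1,10\n2,5\n1,7\n")
print(result)
```

In a continuous graph the totals would be emitted at each compute point or checkpoint, so the file written by the Publish component would hold one set of sums per interval rather than a single final total.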


Case Study 3
Write a shell script and corresponding .mpc file to
replace all uppercase characters with lowercase
characters using tr.
Build a graph that reads visits.dat, runs it through
your new component, and writes it out.
Use named pipes for the input and output ports.
Use “tr [A-Z] [a-z]” to do the translation.



THANK YOU
