FINANCE TRANSFORMATION
TARDIS (Transparent Actuarial Reporting Database Insight – UNISURE)
By:
Sagar Agrawal
(2013HW70753)
November, 2017
Abstract:
This document is a Finance Transformation deliverable and represents the interface macro design for UNISURE, covering the movement and transformation of data from the UNISURE source system extract through to the Persistent Actuarial Database (PAD) for the Product Families Unitized NP Pensions, Unitized WP Pensions and Term Assurance. It will serve as the major source input into the development of the offshore technical specification for the same functional scope.
Acknowledgements
I am highly indebted to Priyanka Katare for all the guidance and constant supervision, for providing the necessary information regarding the project, and for the support in completing the project.
I would like to express my gratitude towards my examiners, Miss Sona Jain and Mr. Naman Pandey, for their kind co-operation and encouragement, which helped me in the completion of this project.
Version History:
Reviewer(s):
Contents
1 Introduction
1.1 Purpose
1.2 Business Background
1.3 Approach
1.4 Scope
3 Tools Used
5 Archival Strategy
7 Non-Functional Requirements
9 Source Data
9.1 Source Data Specification
9.2 Extract Process
9.3 Staging Source Data
9.4 Source Data Dictionary
9.5 Identify the Movement Logic
11 Error Handling
11.1 Source to Staging
11.2 Staging to PAD
11.3 PAD to MPF Error Handling
11.4 Error Notification Approach
12 Database Overview
14 References
15 Appendix
16 Plan of Work
1 Introduction
1.1 Purpose
This document is a Finance Transformation deliverable and represents the interface macro design for UNISURE, covering the movement and transformation of data from the UNISURE source system extract through to the Persistent Actuarial Database (PAD) for the Product Families Unitized NP Pensions, Unitized WP Pensions and Term Assurance. It will serve as the major source input into the development of the offshore technical specification for the same functional scope.
This macro design is one of a number of Finance Transformation designs ultimately concerned
with the production of in-force and movement Model Point Files (MPFs) for consumption by
the Prophet application. For simplicity, the scope of this particular design is highlighted in
yellow in the diagram below:
Figure 1: Holistic view of UNISURE in the Finance Transformation solution. Source systems (P1L70 Product Families 1 to 3, P1L70 Product Families 4 to 13, UNISURE, Paymaster (Refresh), Alpha Administrator and other sources such as manual policy feeds) supply in-force and AR movement data into the PAD, from which Model Point Files are produced for Prophet.
This design therefore covers the extraction, transformation and loading of data items from the
UNISURE extracts into the target PAD database. The data transformed and stored is that
required to fulfil the stated requirements of the Finance Transformation Prophet Enhancement
team for the valuation of Product Families Unitized NP Pensions, Unitized WP Pensions and
Term Assurance.
The onward transformation of UNISURE data into Model Point Files (MPFs) for consumption by
Prophet will be covered within a separate macro design.
1.2 Business Background
To meet this challenge, CUSTOMER is planning to transform its Finance function and the way it serves its customers. The Finance Transformation programme has been established to address this need through delivering the following outcomes:
Significantly simplify very complex data feeds, thereby reducing complexity in Prophet modelling and reducing reconciliation effort.
The PAD forms a key foundation of the Solution Architecture developed to support these programme outcomes, providing:
Data transformations which are documented, agreed and visible to the business via Informatica's Metadata Manager tool.
Below are the product families to which UNISURE data goes as model point files.
Figure 2
1.3 Approach
This macro design has been generated from the following primary inputs:
2 – The specification of required signed off business logic based on stakeholder engagement
with the Finance Data Team and consequent validation with the reporting teams
3 – The representation of that business logic in ETL mappings, applied using the following
design principles to the physical and logical models of the PAD and its data transformations:
4 – Specific design activities in accordance with the principles listed above to determine a) the physical PAD model, and b) whether a particular piece of logic should be applied between source extract and PAD or between PAD and MPF. The scope of this design excludes PAD-to-MPF data transformations for now; these will be covered in a later design within the next few weeks.
1.4 Scope
1.4.1 In-Scope
Consume the UNISURE source extract and populate the PAD according to stated
requirements and agreed business rules. The stated requirements are explicitly taken
from Prophet IFAD for Unitized NP Pensions, Unitized WP Pensions and Term Assurance.
Application of errors and controls processing in line with the Errors and Controls macro
design
Movements capture for Non New business and Non Incremental business.
1.4.2 Out-of-Scope
Any Model Point File data production (In-force and Movement MPF, GMPF, XMPF).
The production of any Data marts or the consumption of any Prophet results
Any interaction with Prophet Automation (the processes which control the interface
between the PAD and Prophet)
Any storing/archiving of source extracts once these files have been read into the PAD
staging tables
Reporting of controls data, trend analysis and reconciliation. This design supports the
storage of that data and the subsequent macro design will incorporate the reporting
functionality.
2.1 Assumptions
Assumption: The UNISURE source extract, ICMS Commissions and Asset Share Factors extract files will be produced on a monthly basis and pushed to a location on the Unix file system from where they can be read by Informatica and the GOMD scheduling tool.
Justification: Consistent with Solution Architecture expectations.
Assumption: The UNISURE extracts will be in fixed-width ASCII format.
Justification: Specified in the UNISURE source extract specifications.
Assumption: Multiple extracts will be provided.
Justification: Specified in the UNISURE extract specifications.
Assumption: Business rules for MPF variables which are not present in UNISURE will be applied as defaults during MPF generation, so they are not covered as part of loading into the PAD.
Justification: These variables are defaulted to maintain the consistency of MPF variables across Product Families.
Assumption: The movements provided on the contract engine extract are understood to represent a super-set of Finance Transformation requirements. These will be grouped and filtered as required between PAD and MPF to derive the movements in which Prophet is interested.
Justification: Confirmed with the FT BD team (Harj Cheema).
Assumption: As part of DFR requirements, headers and footers will be included in the UNISURE Inforce and Movement extracts from September 2009 onwards, and accordingly the Pre- and Post-DFR files shall be loaded into STG.
Justification: Based on communication from the Customer DFR team.
Assumption: There is a potential chance of duplicate records occurring in the Inforce files due to the structure of the Inforce IFAD; these shall be considered genuine duplicates and shall be processed to the stage tables.
Justification: Based on communication from the CUSTOMER team.
Approach 2:
A slim long temporary table will be created to
store all the surrogate keys for the data
corresponding to the current run/month and later
a table-minus-table is computed between the PAD
table and this temporary table to infer the logical
deletes from the source.
This temporary table would be partitioned based
on the mapping identifier and source to facilitate
a partition truncate once the logical deletes have
been made.
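The sketch below illustrates the table-minus-table idea in Python, assuming the surrogate keys are already available in memory; in the actual design this is an Informatica/SQL set difference against the partitioned temporary table, and the key values shown here are purely illustrative.

# Hypothetical sketch of the "table minus table" logical-delete inference.
# pad_keys: surrogate keys currently active in the PAD table
# run_keys: surrogate keys captured in the slim temporary table for this run/month
def infer_logical_deletes(pad_keys, run_keys):
    """Keys present in the PAD but absent from the current run are logical deletes."""
    return set(pad_keys) - set(run_keys)

# Example usage with dummy keys
pad_keys = {"POL001", "POL002", "POL003", "POL004"}
run_keys = {"POL001", "POL003", "POL004"}            # POL002 missing from this month's extract
print(infer_logical_deletes(pad_keys, run_keys))      # {'POL002'} -> mark as logically deleted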
Figure 3: Data flow from source to PAD. The UNISURE source files (fixed-width ASCII) are placed on the Informatica UNIX server, the GOMD PERL script drives the ETL jobs, and data flows through Inbound Staging and the intermediate tables used to store the UNISURE data sets, into the PAD and on to Outbound Staging and the target Model Point Files (139 MPFs).
The description of the above components and process flow is given below.
GENERIC OMD
Generic OMD is a CUSTOMER scheduling tool. As part of GOMD, a PERL script is the basic driver which invokes and executes each ETL job throughout the process, as explained in the OMD Macro Design document.
The OMD Operational Meta Data table will contain data about batch identifiers, for instance: source system name, received date of the source file, processed date and the status of each processing state (such as Pre-PAD Data Stage successful, Loaded to PAD, Post-PAD Data Stage successful, and MPF created). Data in this table is inserted/updated by the PERL script only. There will not be any direct interface to this table from an Informatica mapping.
e.g.: PAD is associated with a Batch Identifier, enabling roll back to be handled by the Batch
Identifier when a session fails abruptly. It is expected that a particular ETL job should be re-
started on failure. More details can be found in the OMD Macro Design Document.
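Purely as an illustration, the Python sketch below mimics the kind of batch record GOMD might maintain; the field and status names are hypothetical, and in the real solution this table is maintained solely by the GOMD PERL script.

from datetime import date

# Hypothetical, simplified view of one OMD batch record; names are illustrative only.
omd_batch = {
    "batch_id": 20171101,
    "source_system": "UNISURE",
    "file_received_date": date(2017, 11, 1),
    "statuses": {
        "PRE_PAD_STAGE": "PENDING",
        "LOADED_TO_PAD": "PENDING",
        "POST_PAD_STAGE": "PENDING",
        "MPF_CREATED": "PENDING",
    },
}

def mark_stage(batch, stage, status):
    """Record the outcome of one processing state; a failed stage becomes the restart point."""
    batch["statuses"][stage] = status
    batch["processed_date"] = date.today()
    return batch

mark_stage(omd_batch, "PRE_PAD_STAGE", "SUCCESS")
mark_stage(omd_batch, "LOADED_TO_PAD", "FAILED")   # restart / roll back this batch_id from here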
UNPROCESSED DATA:
Error handling and controls data structures and processes are represented in the Errors and
Controls macro design. The source specific implementation of that generic design is contained
in Section 11 of this document.
LOOKUP DATA:
Reference data is stored in multiple database tables. These tables will be looked up and the required reference values will be retrieved during the processes ETL-3 and ETL-4, which load data into the PAD and outbound staging tables respectively.
ETL-1:
It consists of Informatica mappings which read the data from reference files/tables and load it into the respective lookup database tables.
SOURCE:
The CUSTOMER valuation system will generate the UNISURE extract files on a monthly basis. The ETL source will be the UNISURE source extract files generated by this valuation system. The same files will be copied onto the Informatica UNIX server at a specified location, from where the Informatica service reads and processes the source data.
The ICMS Commission system will generate its extract on a monthly basis. This will be copied onto the Informatica server in the Unix environment at a specified location and then picked up by the Informatica service for processing.
Asset Share Factors data will be used for referencing the Fund, Asset Share and Unit Price calculations.
ETL-2:
ETL-2 consists of Informatica mappings which read the data from the source UNISURE files and load it into the inbound staging table. During the process, if any source record violates a business or database constraint rule, that record will be populated into the unprocessed data table using the process described in the Error Handling Macro Design.
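A minimal Python sketch of the ETL-2 behaviour is shown below; the field layout, field names and the premium rule are hypothetical stand-ins for the real UNISURE extract specification and Informatica mappings.

# Hypothetical fixed-width layout: (field name, start position, end position).
LAYOUT = [("polref", 0, 8), ("annual_premium", 8, 18)]

def parse_record(line):
    """Slice one fixed-width line into a dict of named fields."""
    return {name: line[start:end].strip() for name, start, end in LAYOUT}

def load_to_staging(lines, batch_id):
    staged, unprocessed = [], []
    for line in lines:
        record = parse_record(line)
        try:
            # Example business/constraint rule: premium must be numeric and non-negative.
            if float(record["annual_premium"]) < 0:
                raise ValueError("negative premium")
            record["batch_id"] = batch_id                 # metadata added in staging
            staged.append(record)
        except ValueError as err:
            unprocessed.append({"raw": line, "reason": str(err), "batch_id": batch_id})
    return staged, unprocessed

staged, unprocessed = load_to_staging(["POL00001    120.50", "POL00002    -10.00"], 20171101)
print(len(staged), len(unprocessed))   # 1 1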
INBOUND STAGING:
The inbound staging table has a similar structure to the source data, with the addition of metadata fields (e.g. Extract Date, Source File Name, Batch Identifier).
Where required, Inbound Staging will contain the data from the current and prior extract (i.e. two consecutive months of data), allowing movements to be detected via the delta between the two extracts where movement events or transactions are not provided on the source extract.
The stage table will be partitioned in order to enable historical data to be easily deleted (by truncating the partition) when no longer required.
ETL-3:
ETL-3 consists of Informatica mappings which read the data from the inbound staging table and load it into intermediate tables specific to UNISURE. These intermediate tables will contain the data sets derived from the UNISURE inbound staging data.
ETL-4:
ETL-4 consists of Informatica mappings which read the data from the inbound staging table and load it into the PAD tables. During the process, if any record violates a business or database constraint rule, that record will be populated into the unprocessed data table according to the Error Handling Macro Design.
PAD:
PAD (Persistent Actuarial Database) consists of multiple database tables with predefined
relationships.
The following stages are explicitly out of scope for this design:
ETL-5:
ETL-5 consists of Informatica mappings which read the data from the PAD and load it into the outbound staging table. More details are provided as a part of the PAD-MPF Macro.
OUTBOUND STAGING:
Data will be staged in the outbound staging before creating the MPF files. More details are
provided as a part of PAD-MPF Macro.
ETL-6:
ETL-6 consists of mappings that will create the MPF files. More details are provided as a part of
PAD-MPF Macro.
TARGET:
The target is Model Point ASCII CSV files with column and business header information. More
details are provided as a part of PAD-MPF Macro.
Figure Description
Figure 1 Holistic view of UNISURE in Finance Transformation Solution
Figure 2 ETL Architecture flow
Figure 3 Data flow from Source to PAD
Figure 4 Data flow from Source to PAD with usage of tools at every stage
Figure 5 Model point file generation for Unisure Source System
Figure 6 PAD to MPF E2E Architecture
Figure 7 Detail Architecture & flow of PAD to MPF
Figure 8 Types of generated MPFs
Figure 9 Closure ETL Architecture for PAD to Pre POL stage
Figure 10 Data Flow from POL to Grain stage
Figure 11 Data Flow from Grain to Outbound tables on split logic
Figure 12 MPF generation from Outbound Staging tables
3 Tools Used
Informatica 9.1, 10.1 – ETL tools used to perform all the Extract, Transform and Load functions.
The control of specific code releases, and the management of the Informatica repository, is out of scope of this design. As Release Management activities, they will be specified elsewhere.
5 Archival Strategy
PAD.
7 Non-Functional Requirements
All Non-functional requirements are captured in Requisite Pro. They include response times,
data storage, tracking of data, data lineage, access levels and security etc.
In addition to the documents referenced above, the following ‘Issue Resolution’ requirement
was captured during the design review:
"In the event that reference data used to load a given source system data feed into the PAD is
found to be corrupt, incorrect or incomplete then it must be possible to re-supply or amend the
reference data and re-run the staging to PAD mapping to resolve the errors. In practice, this
would mean:
Policies that had previously been written to the Unprocessed Data table being successfully
loaded to the PAD
Policies that had previously been loaded to the PAD being written to the Unprocessed Data
table
Policies that had previously been written to the PAD being again written to the PAD but with
changed data values
The re-running of Staging to PAD processes will first require the manual deletion of inserted
records on the basis of batch identifier, and the re-setting of record end dates and current flags
for those records which have been updated. This is expected to be a manual BAU process,
though the necessary SQL statements are expected to be provided to aid testing.
Any PAD batch activities which are dependent on the successful load of this source system feed
(e.g. MPF production) should not commence until users confirm that the data loaded to the
PAD for the given source system is of sufficient quality (i.e. that the Unprocessed Data has been
examined and any reprocessing of the kind described above has taken place). This requirement
should be handled by the creation of GOMD meta data and the configuration of GOMD for
Finance Transformation."
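As an illustration of the clean-up described above, the Python sketch below issues hypothetical roll-back SQL keyed on the batch identifier; the table and column names are invented for the example, an Oracle-style DB-API connection with named bind variables is assumed, and the real statements are expected to be supplied as part of the manual BAU process.

# Illustrative only: WH_PAD_POLICY and its columns are hypothetical names.
ROLLBACK_INSERTS_SQL = """
DELETE FROM WH_PAD_POLICY
 WHERE BATCH_ID = :batch_id
"""

RESET_UPDATED_ROWS_SQL = """
UPDATE WH_PAD_POLICY
   SET RECORD_END_DATE = NULL,
       CURRENT_FLAG    = 'Y'
 WHERE SUPERSEDED_BY_BATCH_ID = :batch_id
"""

def undo_staging_to_pad_load(connection, batch_id):
    """Remove rows inserted by one batch and reset rows it superseded, so the mapping can be re-run."""
    cursor = connection.cursor()
    try:
        cursor.execute(ROLLBACK_INSERTS_SQL, {"batch_id": batch_id})
        cursor.execute(RESET_UPDATED_ROWS_SQL, {"batch_id": batch_id})
        connection.commit()
    finally:
        cursor.close()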
Figure 4: Data flow from source to PAD with usage of tools at every stage
9 Source Data
9.1 Source Data Specification
The UNISURE source data is transmitted in fixed width ASCII files. Data is split across 3 extracts,
each containing multiple files.
The Inforce and Movement files come in two sets: one set, without headers and footers, covering the period up to August 2009 (termed Pre-DFR files), and the files that arrive after August 2009, which have headers and footers (termed Post-DFR files). The two sets are processed by two different ETL processes.
The metadata stored in the staging tables facilitates the identification of the source file and
batch identifier from which the policy got loaded.
Using transformation logic (based on assumptions made from the available UNISURE SAS code, and validated by CUSTOMER), the 3 UNISURE source extract files listed in section 13.1 of this document will be merged and 10 data sets will be created. These data sets will be stored in intermediate tables for further processing to the PAD.
WH_STG_UNISURE
WH_STG_UNISURE_ICMS_DLEL
WH_STG_UNISURE_ICMS_DEAL
WH_STG_UNISURE_ICMS_CNDL
WH_STG_UNISURE_MOV
WH_STG_UNISURE_ASF_GNGG
WH_STG_UNISURE_ASF_GIPP
WH_STG_UNISURE_ASF_GNG
WH_STG_UNISURE_POL
WH_STG_UNISURE_INCCOV
WH_STG_UNISURE_BENCOMDA
WH_STG_UNISURE_ASSNG
WH_STG_UNISURE_UNTS
WH_STG_UNISURE_ICMS
WH_STG_UNISURE_ASH_FCTR
WH_STG_UNISURE_FPHA_AGR_PRMM
WH_STG_UNISURE_AGR_BNFT
In the process of loading the data from inbound staging to the intermediate tables, the data is split and grouped; this is discussed in detail in the Data Mapping sheet in section 15.
Variable Code | Source Name
polref | Policy number *
inrref | Increment reference *
Movements will be captured by comparing the current extract against the previous extract, rather than from source-supplied transactions. At any point in time, staging will hold one month of historical extract data to allow the comparison of two sequential extracts.
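The following Python sketch illustrates this delta-based movement capture, assuming each extract is keyed by policy reference; the movement type names are illustrative rather than the actual activity codes held in the PAD.

# Hypothetical sketch of delta-based movement capture between two monthly extracts.
def detect_movements(previous, current):
    """Compare two extracts (dicts keyed by polref) and classify simple movements."""
    movements = []
    for polref in current.keys() - previous.keys():
        movements.append((polref, "NEW_BUSINESS"))
    for polref in previous.keys() - current.keys():
        movements.append((polref, "EXIT"))               # e.g. lapse, claim or transfer out
    for polref in current.keys() & previous.keys():
        if current[polref] != previous[polref]:
            movements.append((polref, "ALTERATION"))
    return movements

previous = {"P001": {"premium": 100.0}, "P002": {"premium": 50.0}}
current  = {"P001": {"premium": 120.0}, "P003": {"premium": 75.0}}
print(detect_movements(previous, current))
# [('P003', 'NEW_BUSINESS'), ('P002', 'EXIT'), ('P001', 'ALTERATION')]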
The Activity table (WH_PAD_ACTVTY) will hold all the data related to movements for a
particular policy. The data mapping sheet for the target WH_PAD_ACTVTY table contains details
of how to populate this table.
PAD to MPF processing is common for all the Source Systems in TARDIS. Organization uses an
actuarial engine Prophet, which consumes Model Point Files as its input.
PAD to MPF deals with the generation of these Prophet–ready Model Point Files (MPFs) from
PAD data.
Two kinds of MPFs are generated every month: Inforce and Movement.
The inforce and movement MPFs differ only in terms of the PAD extraction criterion. The
transformation logic applied is the same.
Figure 5: Model point file generation for the UNISURE source system
Figure 6: PAD to MPF end-to-end (E2E) architecture
Figure 7: Detailed architecture and flow of PAD to MPF
All Unitized agreements (invested in Unitized product) should have at least one fund
investment or have annual premium greater than zero (for conventional agreements). All
agreements should have at least one life attached.
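A small Python sketch of this validity rule is shown below; the field names are illustrative and not the actual PAD column names.

# Hypothetical sketch of the agreement-level validity rule described above.
def is_valid_for_valuation(agreement):
    if not agreement["lives"]:                         # every agreement needs at least one life
        return False
    if agreement["product_type"] == "UNITIZED":
        return len(agreement["fund_investments"]) >= 1
    # Conventional agreements: annual premium must be greater than zero.
    return agreement["annual_premium"] > 0

example = {"product_type": "UNITIZED", "lives": ["LIFE1"],
           "fund_investments": ["MANAGED_FUND"], "annual_premium": 0.0}
print(is_valid_for_valuation(example))   # True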
Figure 8: Types of generated MPFs
Based upon source specific business conditions, agreements in PAD are evaluated for “Inforce
status” and processed for actuarial valuation.
As shown in the figure above:
• Policy marked as (1) would be present in the Inforce MPF because it started in December 2009, which is the standard period of start of policy, and there have been no transactions in it to date, which means it holds the snapshot of the policy data from the beginning until now.
• Policy marked as (2) would be present in the Movement MPF because it lapsed part-way through, before the end date.
• Policy marked as (3) would be present in the Inforce as well as the Movement MPF as it started after December 2009, in which case it would be considered in the next financial year, i.e. 2010. In the Movement MPF it would be marked as "New Business".
• Policy marked as (5) would be present only in the Movement MPF as it started after the financial year 2009 and also terminated before the end time.
Figure 9
The figure shows the flow for loading data from the PAD to the Pre-POL stage, where the Pre-POL stage consists of junction tables. Junction tables are joins of a few PAD tables based on specific conditions.
Figure 10
The figure shows the data load from the POL stages to the GRAIN ATTR table, depending upon business conditions.
Figure 11: Data flow from Grain to Outbound tables based on the split logic
Figure 12
The figure shows the last step: the generation of MPFs from the outbound (OB) staging tables.
11 Error Handling
Generic Error Handling for all the source systems is described in the Component Macro Design
for Process Optimization and Governance (Controls-Error Handling) Macro Design Document.
The table below provides information about the specific checks that should be performed within the UNISURE feed into the PAD. For checks which may result in a complete stop to the data load, the "Error Code" column provides the specific error code which should be returned to indicate the issue that occurred.
E1001 – Has the extract arrived?
All of the source files should be received according to the schedule defined in section 11 of this document. The SLA table should be updated to reflect these requirements. If any of the files have not been received according to the defined schedule then an e-mail should be sent to the support team (the e-mail address and message content should be parameterised to provide flexibility for future change).
E1002 – Has the file been received before?
As defined by the control framework, upon receipt of the source files a binary comparison of each file against those processed in the previous month should be performed. In the event that a file identically matches the previous file, the batch should terminate, returning the error code, and an e-mail should be sent to the support team (the e-mail address and message content should be parameterised to provide flexibility for future change).
The UNISURE Inforce source extract consists of 20 different files split based on the product types, all of the same format, together with combined weekly movement extracts. This comparison would be performed on the Inforce extracts on a monthly basis and on the weekly extracts on a weekly basis.
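One possible realisation of the E1002 binary comparison is sketched below in Python using file digests; the control framework may of course implement the comparison differently (for example with a byte-wise compare utility).

import hashlib

def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large extracts do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_repeat_of_previous_month(current_path, previous_path):
    """True when the new extract is byte-for-byte identical to last month's file."""
    return file_digest(current_path) == file_digest(previous_path)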
E1003 – Are the file names as expected?
The file names for the 20 UNISURE extracts would be in the following format:
NUUNL.UNISURE.IFVF.xxxxxxxxx.CRE08366.DEL2600
where xxxxxxxxx specifies the actual name of the extract file.
The file names for the 3 Commission extracts would be in the following format:
NUCBL.DESIGNER.DATA.xxxxxxxx.UNLOAD
where xxxxxxxx specifies the actual name of the extract file.
In the event that any of the files do not match the defined format then the batch should terminate, returning the error code, and an e-mail should be sent to the support team (the e-mail address and message content should be parameterised to provide flexibility for future change).
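The E1003 file-name check could be realised as in the Python sketch below; the wildcard portion of the name is matched loosely here because its exact character set is not specified above.

import re

# Patterns derived from the naming conventions above; the variable name segment
# is assumed to be upper-case letters and digits, which is an assumption.
UNISURE_PATTERN = re.compile(r"^NUUNL\.UNISURE\.IFVF\.[A-Z0-9]+\.CRE08366\.DEL2600$")
COMMISSION_PATTERN = re.compile(r"^NUCBL\.DESIGNER\.DATA\.[A-Z0-9]+\.UNLOAD$")

def file_name_is_expected(name):
    return bool(UNISURE_PATTERN.match(name) or COMMISSION_PATTERN.match(name))

print(file_name_is_expected("NUUNL.UNISURE.IFVF.PENSIONS1.CRE08366.DEL2600"))  # True
print(file_name_is_expected("NUUNL.UNISURE.XXXX.PENSIONS1.CRE08366.DEL2600"))  # False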
E1004 – Is the file structure as expected?
In the event that any of the data does not conform to the structure defined by the Valuation Inforce extract specification and the Valuation Movement extract specification then the batch should terminate, returning the error code, and an e-mail should be sent to the support team (the e-mail address and message content should be parameterised to provide flexibility for future change).
E1007 – Are the header and footer found?
As of now the UNISURE extracts do not contain header and footer records, but if headers and footers are included at a later point in time, this particular control should be handled in the following manner. Each collection of files should be checked to ensure that it includes both a header and a footer record conforming to the UNISURE layout structures. In the event that either the header or footer cannot be found then the batch should terminate, returning the error code, and an e-mail should be sent to the support team (the e-mail address and message content should be parameterised to provide flexibility for future change).
E1008 – Number of records
As of now the UNISURE extracts do not contain a footer, but if footers are included at a later point in time, this particular control should be handled in the following manner. The trailer record contains the number of records that should be present for each record type. These record counts should be reconciled against the number of records actually received; any mismatch should be handled according to the error handling logic below.
Fatal errors should result in termination of the process with an error code returned to
indicate the type of error, and an e-mail transmitted to alert the support teams.
Non-fatal errors where the data should not be processed should result in the data being
logged to the unprocessed data table and a message logged to the non-fatal errors
table.
Non-fatal errors where the data should be processed should result in a message logged
to the non-fatal errors table.
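The three outcomes above can be summarised in a small Python sketch; the function, exception and error-code names are hypothetical and only illustrate the routing of fatal and non-fatal errors.

class FatalLoadError(Exception):
    """Raised to stop the whole batch; the caller returns the error code and alerts support."""
    def __init__(self, code, message):
        super().__init__(message)
        self.code = code

def handle_error(record, error_code, message, fatal, block_record, unprocessed, non_fatal_log):
    """Route one error to the appropriate outcome (all names are illustrative)."""
    if fatal:
        raise FatalLoadError(error_code, message)        # batch stops; support team is e-mailed
    non_fatal_log.append({"code": error_code, "message": message})
    if block_record:
        unprocessed.append(record)                       # parked in the unprocessed data table
        return None
    return record                                        # data still flows; error is only logged

# Example usage with a hypothetical non-fatal, blocking error code E2001
unprocessed, non_fatal_log = [], []
handle_error({"polref": "P001"}, "E2001", "missing fund code",
             fatal=False, block_record=True,
             unprocessed=unprocessed, non_fatal_log=non_fatal_log)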
12 Database Overview
A fully comprehensive overview of the database is covered in the Physical Design Model (PDM) document. The following is the list of tables applicable to the source-to-PAD macro design for UNISURE.
This section contains a mapping sheet which provides the following mappings.
In the mappings, the Extract Date is the last day of the month for which the extract is provided. For example, the extract date for January 2017 is 31/01/2017 and for February 2017 it is 28/02/2017.
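A small Python sketch of this Extract Date rule, purely for illustration:

import calendar
from datetime import date

def extract_date(year, month):
    """Extract Date is the last calendar day of the extract month."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)

print(extract_date(2017, 1))   # 2017-01-31
print(extract_date(2017, 2))   # 2017-02-28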
14 References
1. Roger S. Pressman, Software Engineering – A Practitioner's Approach, 6th Edition, McGraw-Hill International, 2005.
2. Len Bass, Paul Clements, Rick Kazman, Software Architecture in Practice, 2nd Edition, Pearson Education, 2005.
3. Harry S. Singh, Data Warehousing – Concepts, Technologies, Implementations, and Management, Prentice Hall PTR, New Jersey.
4. Douglas Hackney, Understanding and Implementing Successful Data Marts, Addison-Wesley Developers Press.
5. http://www.dw-institute.com
6. http://www.datawarehouse.org
15 Appendix
Term Description
Data Mart A logical and physical subset of the data warehouse’s presentation area.
16 Plan of Work
No. of tasks | Tasks to be done | Planned Duration (Weeks) | Specific Deliverable in terms of project | Status
1 | Requirement Gathering | 2.5 | IFAD's, Solution design | Completed
2 | Analysis & Design | 2.5 | Design Doc & Mapping sheets | Completed
S. No.  Need to Check  Y/N
1 Is the Cover page in proper format? Y
2 Is the Title page in proper format? Y
3 Is the Certificate from the Supervisor in proper format? Has it been signed? Y
4 Is Abstract included in the Report? Is it properly written? Y
5 Does the Table of Contents’ page include chapter page numbers? Y
6 Is Introduction included in the report? Is it properly written? Y
7 Are the Pages numbered properly? Y
8 Are the Figures numbered properly? Y
9 Are the Tables numbered properly? Y
10 Are the Captions for the Figures and Tables proper? Y
11 Are the Appendices numbered? Y
12 Does the Report have Conclusions/ Recommendations of the work? Y
13 Are References/ Bibliography given in the Report? Y
14 Have the References been cited in the Report? Y
15 Is the citation of References/ Bibliography in proper format? Y