AAOtoSAO_D5S1_Overview of Data Analytics


Overview of Data Analytics

Mandatory Training for AAOs to be empanelled for promotion as SAOs – Day 5 Session 1
(Common) Stream
Session Overview

 Data analytics and tools


 Data Visualization
 GIS Mapping

Learning objective
To gain an understanding of:
1. Data analysis vs data analytics
2. Data analytics features and application in IAAD
3. Data analytics and tools
 CAG’s Guidelines on Data Analytics
 Data analytics with the KNIME data analytics software tool
 Data visualization with Tableau
4. Geographic Information System (GIS)
 Spatial / geo-locational analysis
References

 CAG’s Guidelines on Data Analytics


Overview of Data Analytics

(Reference: CAG Office - Guidelines on Data Analytics)


Data Analytics

 Data analytics is the science of analysing raw data in order to draw conclusions about that information.
 Data analytics techniques can reveal trends and metrics that would otherwise be lost in the mass of information.
 Data analytics helps audit to optimize resources (cost, time and human).
 In IAAD, the Centre for Data Management and Analytics (CDMA) is the nodal body for steering data analytic activities.
Data analytics features
 Application of scientific approach to gain insights from data
 Macro analysis helps in having macro view of the entire process
 Descriptive analytics helps in drawing logical conclusions about the
existing information
 Descriptive analytics also helps bring out trends and patterns that would
otherwise be lost in the mass of information
 Diagnostic analytics helps in finding the causes
 Predictive analytics helps in making projections for future planning based on
existing data
 Prescriptive analytics comes out with recommendations to correct or
improve the existing process
Data Analysis vs Data Analytics
 Data analysis and data analytics are often treated as interchangeable
terms, but they hold slightly different meanings
 Data analytics is the application of data science approaches to gain
insights from data. Thus Data analytics is a broader term. Data analytics is
an overarching science or discipline that encompasses the complete
management of data.
 Data analytics not only includes analysis, but also data collection,
organisation, storage, and all the tools and techniques used.
 Data analysis, on the other hand, is a subcomponent of data analytics. Data
analysis does not cover the data management process.
How could Data Analytics help IAAD?
 Faster computerization of Government entities has led to the
generation of transactions in electronic format.
 Availability of data can help us unearth various patterns
hitherto unknown.
 Better understanding of the risks leads to better and more
focused utilisation of otherwise scarce (human) resources.
Purpose of Data Analytics in IAAD functionality
Audit Planning
• Identification of risks
• Identifying audit objectives
• Developing Audit Design Matrix
• Identifying the sample of audit units

Audit Execution
• Risk assessment
• Drill down analysis of exceptions
• Verification
• Dashboards at unit level
Steps involved in Data analytics
(Cycle diagram centred on Data Analytics)
• Collection of Data
• Preparation of Data
• Developing Model
• Preparing insights
Challenges

• In-house technical expertise
• Data is available from different sources
• Data is available in different forms
• Identifying patterns, trends, exceptions, inconsistencies in data sets
• Identifying the areas of interest of risk and exceptions
Data Analytic Process

Types of data

Structured Data
• Qualitative data
–Nominal data: data not amenable to ordering. Example: name, gender, etc.
–Ordinal data: data amenable to ordering. Example: ranking based on quality of service.
• Quantitative data
–Continuous data. Example: temperature, which is amenable to identifying differences.
–Discrete data. Example: expenditure of a company, which can be compared as multiples of one another.
Types of Data
 Qualitative Data - information about qualities (attributes);
information that can't actually be measured.
Some examples: high risk, pale colour etc.
 Qualitative Data can be:
 Nominal (e.g. Gender)
 Ordinal (e.g. APR Grading)
 Quantitative Data - information about quantities; that is,
information that can be measured and written down with
numbers.
Some examples: height, shoe size, age, income etc.
Types of Data
 Continuous data - have an infinite number of steps, which form a
continuum: any number from -infinity to +infinity.
 Ex: 6.8 cm, 0.0899079707 inches etc.

 Discrete data - have finite values, or buckets, that you can count.
(The number of villages in a District would be discrete: there is a
finite and countable number.)
 A count is always a whole number
Types of Data
Structured Data
 Refers to any data which is in a tabular form.
 This includes data contained in relational databases and spreadsheets.

Unstructured Data
 Unstructured data (or unstructured information) refers to information that either does not have a pre-defined data model or is not organized in a pre-defined manner.
 Examples are text, audio files, video files.
Master and Transaction data
Master data
 Master data is data that does not change often and is always needed in the same way by business.
 Ex: One-time activities like creating Company Codes, Materials, Vendors, Customers etc.

Transaction data
 Transaction data keeps on changing and deals with day-to-day activities carried out in business.
 Transactions done by or with Customers, Vendors, Materials etc. generate transaction data. So data related to Sales, Purchases, Deliveries, Invoices etc. represent transaction data.
Sources of data

Internal Source
 VLC data
 GPF and Pension data
 Data from audit process
 Finance and Revenue a/c
 Data available in the department

Sources of data
 External Source
 Auditable entity data
 Third Party data
 Census data
 Data.gov.in
 NSSO data
 From various ministries
Data access process

• Manual records
• Read only rights in the audited entity system
• Cloud services online data process
• Data shared on removable media
• Electronic transfer of data
• Real time data sharing
Data access process

Collection of data
• Authenticity, integrity, relevance, usability and security of the data sets should be ensured

Third party data
• Data sets whose ownership is not with auditable entities

Ownership of data
• Ownership of data sets remains with auditable entities or third party data sources
Data access process – contd.,

Data security
• Data security protocols applicable to the audited entity may be followed

Data reliability
• Data reliability can be affected because of the methods of generation/capture of data
Data Preparation
Steps in data preparation:
1. Data Restoration
Data from the data source should be copied and restored.
2. Data Identification
Identification of the relevant field/table/variable of interest.
3. Importing into analytical tool
Reading flat files into the software, or connecting to a database and reading tables.
4. Data cleaning
The process of detecting and correcting or removing corrupt or inaccurate records from a record
set, table, or database.
5. Data Integration
Data collected from various data sources is combined to obtain the final dataset.
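
The import, cleaning and integration steps above can be sketched in a few lines of Python. This is only an illustration and not the KNIME workflow used in the session: the two CSV extracts, the field names (voucher_no, unit_code, amount, unit_name) and the cleaning rules are all hypothetical assumptions.

```python
import csv
import io

# Hypothetical flat-file extracts standing in for two data sources.
VOUCHERS_CSV = """voucher_no,unit_code,amount
V001,U01,1500.00
V002,U02,
V003,U01,abc
V004,U03,2750.50
V004,U03,2750.50
"""

UNITS_CSV = """unit_code,unit_name
U01,District Office A
U02,District Office B
U03,District Office C
"""


def read_table(text):
    """Import step: read a flat file into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_vouchers(rows):
    """Cleaning step: reject duplicate vouchers and missing or non-numeric amounts."""
    seen, cleaned, rejected = set(), [], []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            rejected.append(row)  # kept so that rejects can be documented
            continue
        if row["voucher_no"] in seen:
            rejected.append(row)
            continue
        seen.add(row["voucher_no"])
        cleaned.append({**row, "amount": amount})
    return cleaned, rejected


def integrate(vouchers, units):
    """Integration step: attach the unit name from the master table to each voucher."""
    names = {u["unit_code"]: u["unit_name"] for u in units}
    return [{**v, "unit_name": names.get(v["unit_code"], "UNKNOWN")} for v in vouchers]


vouchers, rejected = clean_vouchers(read_table(VOUCHERS_CSV))
final_dataset = integrate(vouchers, read_table(UNITS_CSV))
for record in final_dataset:
    print(record)
print("rejected records:", len(rejected))
```

Keeping the rejected records, rather than silently dropping them, supports the later documentation of the data analytic process.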
Types of Analysis

Descriptive Analytics
 Descriptive analytics looks at data and analyzes past events
 Provides an understanding of the past transactions that occurred in
the organisation
 Raw data is summarized so that it can be understood by the user
 Summarization of data through numerical or visual descriptions
 For example - almost all management reporting such as sales, marketing,
operations, and finance, uses this type
 It is a post-mortem analysis.
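
As a small illustration of summarising raw data, the sketch below groups past transactions by department and reports count, total and mean. The department names and amounts are made-up assumptions, and plain Python stands in for an analytics tool.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical past transactions: (department, amount)
transactions = [
    ("Health", 120.0), ("Health", 80.0), ("Education", 200.0),
    ("Education", 100.0), ("Education", 300.0), ("Roads", 50.0),
]


def summarise(rows):
    """Summarise raw rows into count, total and mean per department."""
    groups = defaultdict(list)
    for dept, amount in rows:
        groups[dept].append(amount)
    return {
        dept: {"count": len(vals), "total": sum(vals), "mean": mean(vals)}
        for dept, vals in groups.items()
    }


summary = summarise(transactions)
for dept, stats in sorted(summary.items()):
    print(f"{dept:<10} count={stats['count']} "
          f"total={stats['total']:.2f} mean={stats['mean']:.2f}")
```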

Diagnostic Analytics
 Diagnostic Analytics is an advanced form of descriptive analytics and
tries to answer the question “why did it happen” or “how did it
happen”.
 Diagnostic analytics involves seeking relationship between relatable
data sets and identification of specific transactions/ transaction sets
along with their behavior and underlying reasons.
 Drill down and statistical techniques like correlation assist in this
endeavour to understand the causes of various events.
 Ex: Low sales in a zone caused by one of the three salesmen.
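
The salesman example can be sketched as a drill down followed by a correlation check. All figures, zone names and salesman labels below are hypothetical; the Pearson coefficient is computed by hand so that the sketch needs only the standard library.

```python
from math import sqrt

# Hypothetical monthly records: (zone, salesman, client_visits, sales)
records = [
    ("North", "A", 40, 400.0), ("North", "B", 38, 390.0), ("North", "C", 42, 410.0),
    ("South", "D", 41, 405.0), ("South", "E", 39, 395.0), ("South", "F", 10, 90.0),
]


def total_sales_by(rows, key_index):
    """Aggregate sales (column 3) by the column at key_index."""
    totals = {}
    for rec in rows:
        totals[rec[key_index]] = totals.get(rec[key_index], 0.0) + rec[3]
    return totals


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


# Step 1: macro view, find the weakest zone.
zone_totals = total_sales_by(records, 0)
weak_zone = min(zone_totals, key=zone_totals.get)

# Step 2: drill down inside that zone to the salesman level.
in_zone = [rec for rec in records if rec[0] == weak_zone]
salesman_totals = total_sales_by(in_zone, 1)
weak_salesman = min(salesman_totals, key=salesman_totals.get)

# Step 3: check whether sales move together with client visits.
corr = pearson([rec[2] for rec in records], [rec[3] for rec in records])

print(weak_zone, weak_salesman, round(corr, 3))
```

A high correlation here only suggests a possible cause (few client visits); it would still need verification in the field.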

Predictive Analytics
 Predictive analytics is the branch of advanced analytics used
to predict future trends and patterns.
 Predictive analytics uses data to determine the probable future outcome of
an event or the likelihood of a situation occurring.
 Predictive analytics uses many techniques from data mining, statistics,
modelling, machine learning, and artificial intelligence to analyse current
data to make predictions about the future.
Example:
 Sales forecasting, Predicting the likelihood of insolvency of a
customer/organisation etc.,
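
A very simple form of sales forecasting is a straight-line trend fitted by ordinary least squares. The sketch below is one possible illustration, assuming five years of made-up sales figures; real predictive models would normally use richer techniques and validation.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


# Hypothetical yearly sales for years 1 to 5.
years = [1, 2, 3, 4, 5]
sales = [100.0, 110.0, 120.0, 130.0, 140.0]

slope, intercept = fit_line(years, sales)
forecast_year_6 = slope * 6 + intercept
print(f"trend: {slope:.1f} per year, forecast for year 6: {forecast_year_6:.1f}")
```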
Prescriptive Analytics
 Prescriptive analytics goes beyond predicting future outcomes by also
suggesting actions to benefit from the predictions and showing the
decision maker the implications of each decision option.
 Prescriptive analytics not only anticipates what will happen and when
it will happen, but also why it will happen
 Prescriptive analytics can continually and automatically process new
data to improve prediction accuracy and provide better decision
options
Example:
 To predict the optimum inventory for a particular perishable product.
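
The inventory example can be sketched as choosing, among a few stock levels, the one with the highest expected profit under a predicted demand distribution. The demand probabilities, cost and price are hypothetical assumptions, and unsold perishable units are assumed to be a total loss.

```python
# Hypothetical demand forecast for one day: demand level -> probability
demand_probs = {10: 0.2, 20: 0.5, 30: 0.3}
UNIT_COST = 6.0    # assumed purchase cost per unit
UNIT_PRICE = 10.0  # assumed selling price per unit


def expected_profit(stock):
    """Expected profit for a stock level; unsold units are written off."""
    return sum(
        prob * (UNIT_PRICE * min(stock, demand) - UNIT_COST * stock)
        for demand, prob in demand_probs.items()
    )


# Evaluate each decision option and recommend the best one.
options = {stock: expected_profit(stock) for stock in (10, 20, 30)}
best_stock = max(options, key=options.get)

for stock, profit in options.items():
    print(f"stock {stock}: expected profit {profit:.2f}")
print("recommended stock level:", best_stock)
```

Showing the expected profit of every option, not just the best one, reflects the idea of presenting the implications of each decision option to the decision maker.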
Data Analytic Techniques
Use of statistical measures to derive insights about the dataset
using KNIME, a data analytics tool
Data analytics - benefits
 Comprehensibility:
 makes information and relationships easily understandable

 Comprehensiveness:
 presents information for the entire selected data set

 Focussed:
 facilitates concise and ‘to the point’ communication.

 Less complexity:
 simplifying the presentation of large amounts of data.

 Establishing patterns:
 enables identification of patterns in the data.

 Analysis:
 promotes thinking on ‘substance’ rather than on ‘methodology’.
Documentation of data analytic process
Areas to be documented:
 Data identification
 Data collection
 Importing data into analytic software
 Analytic technique used
 Results of analysis
 Data Analytic Model
 Feedback from use in audit
Use of Data Analytics in Audit
Annual audit planning
• Risk Analysis
• Leads for setting Audit Objectives
• Selection of sample
• Unit level Planning

Audit Execution
• Dashboards for Audit team
• Issue identification
• Risk assessment
• Identification of exceptions
• Drill down analysis

Audit Reporting
• Usage of Graphics and Visualisation for better understanding
• Simplifying the presentation of large amounts of data
• ‘to the point’ communication
Data Visualization
Data visualization
 Data visualization means presenting raw data through graphical
representations such as visuals, graphs and charts
that allow us to explore the data and uncover deep insights.
 It is much easier to comprehend information through visuals
than through raw reports.
 This visual format enables one to draw quick and effective
conclusions.
 Use of visuals, graphs and charts helps us to derive an
understanding of and insight into the dataset.
Benefits of data visualization
Some of the main benefits of data visualization techniques are:
 Live updates of visuals, since connected to source data
 Easier understanding of the domain
 Better analysis
 Grasping the Trends
 Identifying patterns
 Finding errors
 Exploring for deeper insights
Data visualization
Some of the popular data visualization tools:
 Tableau
 Power BI
 Zoho Reports
 Google Analytics
 DOMO
 IBM Watson Analytics
 Sisense
 SAP Analytics Cloud
 Chartio

(Figure: Tableau bar chart)
Visualization of dataset with column chart

Correlation with scatter plot

Data analytics and visualization
Summary:
 Data analytics and visualization are useful:
 in audit planning at a macro level
 to understand the issues in implementation
 to identify the sample for substantive audit based on risk weightage and
outlier analysis
 The dashboards given to field audit parties will assist them:
 in micro-level audit planning
 in identifying specific issues pertaining to the unit selected for substantive audit
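
One common way to carry out outlier analysis for sample selection is the interquartile range (IQR) rule. The sketch below is an illustration under assumptions: the unit codes and expenditure figures are hypothetical, and the 1.5 x IQR fence is a conventional choice rather than anything prescribed in the guidelines.

```python
from statistics import quantiles

# Hypothetical expenditure per audit unit.
unit_expenditure = {
    "U01": 100.0, "U02": 105.0, "U03": 98.0, "U04": 110.0,
    "U05": 102.0, "U06": 95.0, "U07": 400.0, "U08": 101.0,
}


def iqr_outliers(values_by_unit, k=1.5):
    """Return units whose value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = quantiles(values_by_unit.values(), n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return sorted(u for u, v in values_by_unit.items() if v < lower or v > upper)


outliers = iqr_outliers(unit_expenditure)
print("units flagged for substantive audit:", outliers)
```

Flagged units are only candidates; the final sample would also reflect risk weightage and auditor judgement.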

Geographic Information Systems (GIS)
and Remote Sensing Data (RSD)

(Reference: CAG Office - Guidance Note on Usage of Remote Sensing Data and Geographic Information System for effective audits)
GIS - introduction
 A geographic information system (GIS) is a computer system for
capturing, storing and displaying data related to positions on Earth’s
surface.
 GIS can show many different kinds of data on one map.
 This enables us to see, analyse, and understand patterns and
relationships.
 GIS provides a medium for representation, storage and
manipulation of geographic information in terms of a database
 GIS technology integrates common database operations such as
query and statistical analysis with the unique visualization and
geographic analysis benefits offered by maps.
GIS and computers
 GIS incorporates graphical features with tabular data in order to assess real-world problems.
 This system came into being with the discovery that maps could be programmed using simple code and then stored in a computer, allowing for future modification when necessary.
 This was a welcome change from the era of hand cartography, when maps
had to be painstakingly created by hand.
 In GIS, data is represented as:
 Spatial
 Attribute
GI system functions
 Digital Capture
 Digitisation of maps
 Remote sensing data
 GPS
 Digital Compilation
 Relating spatial features to
attributes
 Cleaning
 Correcting errors
 Digital Storage
 Raster
 Vector
 Image
Vector / Raster / Image
(Figure: vector, raster and image representations)
GIS data retrieval and analysis
•Manipulations
–Data retrieval
–Measuring area and perimeter
–Overlaying maps
–Performing map algebra
–Reclassifying map data

•Analysis
–Database Query
–Overlay
–Proximity Analysis
–Network Analysis
–Statistical and Tabular Analysis
GIS Data availability
•GIS data available in the
–Bihar (Forest departments)
–Karnataka (Forest departments)
–Government of UP
–Government of Tamil Nadu
–GIPL
•Use of GIS data from other service providers
–National Remote Sensing Centre (NRSC)
–NIC
–State Remote Sensing Agencies
–Geo Spatial Delhi Ltd.
Use of GIS in Audit
 Assessing relevant audit risks
 Designing the audit
 Conducting the audit
 Analysing audit results and
 Communicating audit results
Challenges:
 Use of GIS/RSD in audits involves a lot of technical expertise
 May have to be outsourced to an expert / institute
Collaboration with experts
Audit
 Identification of area
 Gather digital data from department, if available
 Identify area/location (in consultation with expert if required)
 Liaising among audited departments
 Arranging joint field visits with expert and officials of Department
 Issue of audit observation based on the final report of the expert

Expert/institute
 Obtaining satellite imageries and interpretation of the same
 Selecting sample for ground truthing/verification
 Analysis and reporting
ISSAI 5540 (under review)
 ISSAI 5540 introduces GIS as an Audit tool
 Auditors to improve and expand auditing by the use of geospatial
information.
 Application areas
 urban planning,
 Environment Impact Assessment,
 execution of public works,
 disaster management,
 infrastructure creation
 management of resources of the Earth etc.
Checklist for use of geospatial data in audit
 What geospatial data is needed to answer the audit questions?
 What accuracy is required of the geospatial data?
 What is the required timeframe of the geospatial data?
 What geospatial data is available?
 From which sources can the required geospatial data be derived and how
reliable are they?
 What is the quality of the available geospatial data?
 What are the costs of the available geospatial data?
 If the required geospatial data are not available, could they be gathered as part of the
audit process and budget?
 Do the auditors involved have the required knowledge to gather and analyse the
required geospatial data or should external expertise be sourced?
- ISSAI 5540
Audit Report / Case studies using GIS

 Case study: Odisha Disaster Recovery Project (ODRP):


 Tamil Nadu: Audit of sand mining was successfully conducted and the
observations included in the CAG’s Audit Report
 Data source, tools used :
 Remote sensing data
 Google maps
 Data from drone survey
 Case Study: Compensatory Afforestation Fund Management and Planning
Authority CAMPA (State of Bihar)

Odisha Disaster Recovery Project (ODRP)

 Cyclone Phailin hit the state of Odisha in October 2013 and affected
a densely populated area in the coastal belt
 Govt of Odisha took up the ‘Odisha Disaster Recovery Project’ (ODRP) for
implementation with loan assistance from the World Bank
 The objective was to restore and improve housing and public
services in targeted districts
 The data provided to the audit party was plotted on GIS maps and
analysed; the resulting observations were furnished to the audit party for
necessary examination in the field
Plotting of beneficiaries in Khallikote, Rangueilunda blocks

Plotting of beneficiaries in Ganjam, Chatrapur blocks

GIS maps analysis – some of the results
 In Chikiti Block:
 The villages Lunimathi, Keutakaitha, Chandanbada and Ekasingi are within the 5 km
range of high tidal wave, but they are not covered
 Kaithapadabadua village is not within the 5 km range of high tidal wave,
but 165 beneficiaries were covered under the scheme
 In Chatrapur Block:
 Many villages, namely Laxmipur, Humuribana, Kanamana, Jimi,
Damodarpurpariklo, Podapadar, Allipur, Humuri etc., are within the 5 km range of high
tidal wave, but they are not covered under the scheme

 In Rangueilunda Block:
 The villages Dhepanuapada, Kostapeta, Tulu and Kodarapalli are within
the 5 km range of high tidal wave, but they are not covered under the scheme.
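
The kind of proximity check behind these results can be sketched with the haversine great-circle distance. Everything in the sketch is an assumption for illustration: the village labels and coordinates are invented, the high tide line is crudely approximated by two reference points, and a real GIS tool would measure distance to the actual line geometry.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
RANGE_KM = 5.0  # eligibility range from the high tide line


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Hypothetical reference points approximating the high tide line.
coast_points = [(19.300, 85.000), (19.320, 85.020)]

# Hypothetical villages: name -> (lat, lon, beneficiaries covered)
villages = {
    "Village A": (19.310, 84.990, 0),
    "Village B": (19.400, 84.900, 165),
    "Village C": (19.305, 85.005, 40),
    "Village D": (19.500, 84.800, 0),
}


def distance_to_coast(lat, lon):
    """Distance to the nearest coastline reference point."""
    return min(haversine_km(lat, lon, clat, clon) for clat, clon in coast_points)


within_but_not_covered = []
outside_but_covered = []
for name, (lat, lon, covered) in villages.items():
    near = distance_to_coast(lat, lon) <= RANGE_KM
    if near and covered == 0:
        within_but_not_covered.append(name)
    elif not near and covered > 0:
        outside_but_covered.append(name)

print("within 5 km but not covered:", within_but_not_covered)
print("outside 5 km but covered:", outside_but_covered)
```

Both lists are leads for examination in the field, not conclusions in themselves.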

GIS data analysis:

Summary:
 Once an office has collected GIS data and built capacity to analyse it using
GIS tools, spatial/geographic analysis of any scheme can be easily done.
 The use of GIS tools in analysing the data:
 gives an additional edge in planning
 gives a complete picture of the performance throughout the State
 identifies the region-wise risk areas
 is best suited to geo-locational analysis
 GIS technology can also be integrated with other
information of a non-spatial nature for better analysis.

Thank you
