
UNIVERSITY OF RIZAL SYSTEM

Province of Rizal
Binangonan Campus
COLLEGE OF ACCOUNTANCY

CHAPTER 8:
Data Structures and CAATTs for
Data Extraction

Submitted by:
Anore, Louis Antoinette G.
Biteranta, Janelle B.
Grantusa, Claire Euniz E.
Haral, Ma. Veronica B.
Quisquino, Angel Rose A.

BSA 3-2

Submitted to:
Prof. Liberty T. Ocampo, CPA, MBA

LEARNING OBJECTIVES:

After studying this chapter, you should:


● Understand the concepts of data structures and how these are used to achieve
data-processing operations.
● Be familiar with structures used in flat-file systems, including sequential,
indexed, hashing, and pointer structures.
● Be familiar with relational database structures and the principles of
normalization.
● Understand the features, advantages, and disadvantages of the embedded audit
module approach to data extraction.
● Know the capabilities and primary features of generalized audit software.
● Become familiar with the more commonly used features of ACL.

This chapter delves into the intricate realm of data structures and the utilization
of Computer-Assisted Audit Techniques (CAATTs) for data extraction and analysis. It
commences with a comprehensive examination of data structures, emphasizing their
significance in both physical and logical data organization within files and databases.
Various structures such as flat-file, navigational database, and notably, relational
database structures are scrutinized, with a particular focus on relational databases due
to their prevalence in contemporary business environments.

Another important use of data extraction software is in performing substantive
tests. Most audit testing occurs in the substantive-testing phase of the audit. These
procedures, called substantive tests, include but are not limited to the following:

• Determining the correct value of inventory.
• Determining the accuracy of prepayments and accruals.
• Confirming accounts receivable with customers.
• Searching for unrecorded liabilities.

CAATTs for data extraction fall into two general categories: embedded audit
modules and generalized audit software. The chapter describes the features,
advantages, and disadvantages of the embedded audit module (EAM) approach. It then
outlines typical functions and uses of generalized audit software (GAS). The chapter
closes with a review of the key features of ACL (audit command language), the leading
product in the GAS market.

Chapter 8 - Data Structures and CAATTs for Data Extraction



DATA STRUCTURES

Data structures have two fundamental components: organization and access method.
Organization refers to the way records are physically arranged on the secondary
storage device. This may be either sequential or random. The records in sequential files
are stored in contiguous locations that occupy a specified area of disk space. Records in
random files are stored without regard for their physical relationship to other records
of the same file. Random files may have records distributed throughout a disk. The
access method is the technique used to locate records and to navigate through the
database or file. While several specific techniques are used, in general, they can be
classified as either direct access or sequential access methods.

Flat-File Structures
The flat-file approach is a single view model that characterizes legacy systems. Data files
are structured, formatted, and arranged to suit the specific needs of the owner or
primary user. Such structuring, however, may omit or corrupt data attributes that are
essential to other users, thus preventing successful integration of systems across the
organization.

SEQUENTIAL STRUCTURE
This refers to organizing data in a file in a specific sequence based on the
values of their primary keys, such as ascending or descending order. Records are
stored one after another in contiguous storage spaces.

Sequential files are efficient for tasks like reading the entire file or
processing a large portion of it at once, such as updating numerous records in
bulk. However, they become inefficient for small operations or when direct access
to specific records is required. Retrieving a single record necessitates scanning
through all preceding records, making it impractical for such tasks. Direct access
is limited in sequential files, requiring alternative data structures for tasks
needing it.
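The cost of direct retrieval from a sequential file can be sketched in a few lines of Python. This is an illustrative model only (the payroll records are hypothetical): fetching one record means scanning every record that precedes it in key order.

```python
def sequential_lookup(records, target_key):
    """Scan records (stored in primary-key order) until the key is found."""
    scanned = 0
    for record in records:
        scanned += 1
        if record["key"] == target_key:
            return record, scanned
    return None, scanned

# Hypothetical payroll file: 100 records stored contiguously, keys ascending.
payroll = [{"key": k, "name": f"EMP-{k}"} for k in range(1001, 1101)]

record, cost = sequential_lookup(payroll, 1099)
# Nearly the whole file had to be read to retrieve this one record.
```

Reading the whole file in order, by contrast, touches each record exactly once, which is why sequential structures suit bulk updates but not direct access.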


INDEXED STRUCTURE
An indexed structure comprises both the data file and a separate index file
containing record addresses. Each entry in the index file corresponds to a record
in the data file and contains the numeric location of that record's physical storage
on the disk.
In an indexed random file, records are stored scattered across the disk
without regard for their physical proximity. Searching and retrieval are facilitated
by an index that maps key values to storage locations. Adding records involves
selecting an available disk location and updating the index accordingly. Indexes
can be organized either sequentially or randomly, with sequential indexes
offering rapid search capabilities despite being more complex to maintain.
Algorithms can be utilized to expedite searches through the index, which is
particularly advantageous for large data files.
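The idea of a separate index file can be sketched as follows, using Python dictionaries to stand in for the index and the disk; all keys, locations, and record values are hypothetical.

```python
data_blocks = {}   # simulated disk: physical location -> record
index = {}         # separate index file: primary key -> physical location

def add_record(key, record, free_location):
    """Store the record at any free disk location and update the index."""
    data_blocks[free_location] = record
    index[key] = free_location

def indexed_lookup(key):
    """Resolve the key through the index, then read the record directly."""
    location = index.get(key)
    return None if location is None else data_blocks[location]

# Records land wherever space is free; the index preserves retrievability.
add_record(2001, {"desc": "bracket"}, free_location=714)
add_record(2002, {"desc": "washer"}, free_location=88)
```

Note how the data file itself stays randomly organized; only the index must be maintained as records are added or moved.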


Virtual Storage Access Method (VSAM)


VSAM is utilized for very large files requiring routine batch processing
alongside moderate individual record processing, such as customer files in public
utility companies. VSAM's sequential organization facilitates efficient batch
processing and allows direct access through the use of indexes. VSAM files often
occupy multiple cylinders of contiguous disk storage. The system uses indexes to
summarize the contents of each cylinder, facilitating direct access. While VSAM
indexes don't provide exact physical addresses for records, they identify the disk
track where the record resides, requiring a sequential search within that track.


HASHING STRUCTURE
A hashing structure utilizes an algorithm to directly convert a record's
primary key into a storage address, eliminating the need for a separate index.
This approach streamlines record retrieval by calculating addresses rather than
referencing an index.
A typical hashing algorithm divides the primary key by a prime number and
uses the remainder (residual) as the storage location, enabling faster record
retrieval than indexed structures. This method bypasses index searches by
algorithmically determining storage addresses. For instance, in an inventory file
with 100,000 items, the inventory number is divided by a prime number and the
residual determines the storage location. Hashing often results in random file
organization because record addresses are dispersed across the storage device.

● Advantage of Access Speed: The primary advantage of hashing is its fast access
speed. Calculating a record's address is quicker than searching for it through an
index.
● Disadvantages of Hashing:
1. Inefficient Storage Usage: Hashing does not utilize storage space efficiently.
The chosen storage location for a record is determined solely by its primary key
value, leading to potential wastage of disk space, with as much as one-third of the
disk pack being unused.
2. Collision Problem: Different record keys may produce the same or similar
residuals, resulting in collisions where two records attempt to occupy the same
storage location. This slows down access to records and complicates record
deletion operations.
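Both the divide-by-a-prime calculation and the collision problem can be sketched in Python. The prime, the key values, and the chaining strategy below are all hypothetical choices for illustration; the two sample keys are picked deliberately so that they collide.

```python
PRIME = 997   # assumed address-space size; any prime near file capacity works
slots = [[] for _ in range(PRIME)]   # each slot chains colliding records

def hashed_address(primary_key):
    """Divide the key by a prime; the residual is the storage address."""
    return primary_key % PRIME

def store(primary_key, record):
    slots[hashed_address(primary_key)].append((primary_key, record))

def fetch(primary_key):
    """No index search: compute the address, then check that slot."""
    for key, record in slots[hashed_address(primary_key)]:
        if key == primary_key:
            return record
    return None

store(100003, {"desc": "valve"})
store(101000, {"desc": "gasket"})   # collision: same residual as 100003
```

Because both keys hash to the same residual, the second record cannot occupy its computed location and must be resolved (here by chaining), which is exactly the access slowdown the collision problem describes.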


POINTER STRUCTURES
Figure 8.6 illustrates the pointer structure used to create linked-list files.
This method involves storing the address (pointer) of a related record in a field
within each record. Records are spread across the disk without regard for their
physical proximity, with pointers establishing connections between them. The
last record in the list typically contains an end-of-file marker.

Pointers within a file can link records together, as seen in the example
where Record 124 points to Record 125, Record 125 points to 126, and so forth.
This creates a sequence of linked records facilitating traversal through the file.
Pointers may also link records between different files, as depicted in Figure 8.7.
For instance, an accounts receivable (AR) record may have pointers linking it to
the next AR record within the AR file and to associated sales invoices and
remittance advice records. This facilitates accessing related records across
multiple files. By accessing an accounts receivable record, all associated sales
invoices and remittances can be located, aiding in account management and
transaction tracking. New transaction records are appended to the appropriate
linked-list, with their addresses stored in preceding records for future access.
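The Record 124 → 125 → 126 example can be sketched as a minimal linked-list traversal in Python; the disk addresses (901, 350, 42) are hypothetical and chosen to show that physical placement is irrelevant to the logical order.

```python
# Simulated disk: address -> record; each record stores its key and a
# pointer to the address of the next record (None = end-of-file marker).
disk = {
    901: {"key": 124, "next": 350},
    350: {"key": 125, "next": 42},
    42:  {"key": 126, "next": None},
}

def traverse(start_address):
    """Follow pointers from the head of the linked list to the EOF marker."""
    keys, address = [], start_address
    while address is not None:
        record = disk[address]
        keys.append(record["key"])
        address = record["next"]
    return keys
```

Appending a new transaction record amounts to storing it at any free address and writing that address into the pointer field of the current last record.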


● Physical Address Pointer: This type of pointer contains the actual disk
storage location (cylinder, surface, and record number) required by the disk
controller to access the record directly. It offers speed advantages as no further
manipulation is needed to determine the record's location. However, it poses two
disadvantages: (1) If the related record is moved, the pointer must be updated,
which can be problematic during disk reorganizations or copies; (2) Physical
pointers lack a logical relationship to the records they identify, meaning if a
pointer is lost, the referenced record is also lost.
● Relative Address Pointer: A relative address pointer specifies the position of
a record within the file, such as the 135th record. To access the record, further
manipulation is required to convert the relative address to the actual physical
address. Conversion software calculates this using the physical address of the
file's beginning, the record length, and the relative address being sought.
● Logical Key Pointer: This pointer contains the primary key of the related
record. A hashing algorithm converts this key value into the record's physical
address. This method allows for quick access to the record by utilizing the logical
relationship between keys and their corresponding records.


Hierarchical and Network Database Structures


● Early hierarchical and network database models employed many of the preceding
flat-file techniques as well as new proprietary database structures. A major
difference between the two approaches is the degree of process integration
and data sharing that can be achieved.
● Two-dimensional flat files exist as independent data structures that are not
linked logically or physically to other files. Database models were designed to
support flat-file systems already in place while allowing the organization to move
to new levels of data integration.

Hierarchical Database Model — a database model in which the data are
organized into a tree-like structure.
Network Database Model — a database model conceived as a flexible way of
representing objects and their relationships.

Figure 1 - Hierarchical Model


Figure 2 - Network Model

Relational Database Structure, Concepts, and Terminology


● Relational databases are based on the indexed sequential file structure. This
structure uses an index in conjunction with a sequential file organization.
● Multiple indexes can be used to create a cross-reference, called an inverted list,
which allows even more flexible access to data.

Relational Database Theory


● This model has its foundations in relational algebra and set theory, which
provide the theoretical basis for most of the data manipulation operations used.
Accordingly, a system is relational if it:
1. Represents data in the form of two-dimensional tables.
2. Supports the relational algebra functions of restrict, project, and join.

❖ Restrict: Extracts specified rows from a specified table. This operation creates a virtual table
that is a subset of the original table.
❖ Project: Extracts specified attributes (columns) from a table to create a virtual table.
❖ Join: Builds a new physical table from two tables, consisting of all concatenated pairs of
rows, one from each table.
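The three relational algebra functions can be sketched over tables modeled as lists of dictionaries (rows). This is a minimal illustration, not a database implementation, and the supplier/order data are hypothetical; the join shown matches rows on a common attribute, as a practical join would.

```python
def restrict(table, predicate):
    """Extract the rows satisfying the predicate (a virtual subset table)."""
    return [row for row in table if predicate(row)]

def project(table, columns):
    """Extract only the named attribute columns from every row."""
    return [{c: row[c] for c in columns} for row in table]

def join(left, right, key):
    """Concatenate pairs of rows whose values match on the key attribute."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

suppliers = [{"sup_num": 1, "name": "Ozment"}, {"sup_num": 2, "name": "Buell"}]
orders = [{"po": 9001, "sup_num": 2, "amount": 450.00}]

result = project(join(orders, suppliers, "sup_num"), ["po", "name"])
```

Composing the three operations this way, filtering rows, trimming columns, and combining tables, is exactly how a relational query answers a user view.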


Relational Database Concepts

Entity, Occurrence, and Attributes


● An entity is anything about which the organization wishes to capture data.
Entities may be physical, such as inventories, customers, or employees.
● This data model is the blueprint for ultimately creating the physical database.
The graphical representation used to depict the model is called an entity
relationship (ER) diagram.
● The term occurrence is used to describe the number of instances or records
that pertain to a specific entity.
● Attributes are the data elements that define an entity. For example, an
Employee entity may be defined by the following partial set of attributes: Name,
Address, Job Skill, Years of Service, and Hourly Rate of Pay.

Associations and Cardinality


● The labeled line connecting two entities in a data model describes the nature of
the association between them. This association is represented with a verb,
such as ships, requests, or receives.
● Cardinality is the degree of association between two entities. Simply stated,
cardinality describes the number of possible occurrences in one table that are
associated with a single occurrence in a related table.
○ Four basic forms of cardinality are possible: zero or one (0,1), one and
only one (1,1), zero or many (0,M), and one or many (1,M).
These are combined to represent logical associations between entities.

Figure 3 - Relational Model


The Physical Database Tables


● Physical database tables are constructed from the data model with each entity
in the model being transformed into a separate physical table. Across the top of
each table are attributes forming columns. Intersecting the columns to form the
rows of the table are tuples (record/occurrence).
● Properly designed tables possess the following four characteristics:
1. The value of at least one attribute in each occurrence (row) must be
unique. This attribute is the primary key.
2. All attribute values in any column must be of the same class.
3. Each column in a given table must be uniquely named.
4. Tables must conform to the rules of normalization.

Linkages between Relational Tables


● Logically related tables need to be physically connected to achieve the
associations described in the data model. Using foreign keys accomplishes this, as
illustrated in Figure 8.14.

User Views
● A user view is the set of data that a particular user sees. Examples of user views
are computer screens for entering or viewing data, management reports, or
source documents such as an invoice.


Anomalies, Structural Dependencies, and Data Normalization

Database Anomalies
● Improperly normalized tables can cause DBMS processing problems that restrict,
or even deny, users access to the information they need. Such tables exhibit
negative operational symptoms called anomalies.
1. Update Anomaly. The update anomaly results from data redundancy in
an unnormalized table.
2. Insertion Anomaly. Assume a new vendor enters a marketplace. The
organization does not yet purchase from the vendor, but may wish to add
the vendor to the database. This is not possible, however, because the
primary key for the Inventory table is PART NUM—the supplier data
cannot be added to the table.
3. Deletion Anomaly. The deletion anomaly involves the unintentional
deletion of data from a table.
❖ One or more of these anomalies will exist in tables that are not normalized or
are normalized at a low level, such as first normal form (1NF) or second
normal form (2NF). To be free of anomalies, tables must be normalized to the
third normal form (3NF) level.

Normalizing Tables
● The database anomalies described above are symptoms of structural problems
within tables called dependencies. Specifically, these are known as repeating
groups, partial dependencies, and transitive dependencies. The
normalization process involves identifying and removing structural dependencies
from the table(s) under review.

The resulting tables will then meet the two conditions below:
1. All nonkey (data) attributes in the table are dependent on (defined by) the
primary key.
2. All nonkey attributes are independent of the other nonkey attributes.

3NF - Third Normal Form is a database schema design approach for
relational databases that uses normalizing principles to reduce the duplication of
data, avoid data anomalies, ensure referential integrity, and simplify data
management.

Linking Normalized Tables


● When unnormalized tables are split into multiple 3NF tables, they need to be
linked together via foreign keys so the data in them can be related and made
accessible to users. The degree of association (joint cardinality) between the


resulting tables (that is, 1:1, 1:M, or M:M) determines how the linking occurs. The
key-assignment rules for linking tables are discussed next.

Normalize Database Tables - A normalized database table is structured in a way


that minimizes redundancy and dependency by organizing the data into separate
tables and establishing relationships between them. This reduces data duplication and
improves data integrity and efficiency in database operations.

Keys in 1:M Associations


Where a 1:M (or 1:0,M) association exists, the primary key of the 1 side is embedded in
the table of the M side. To demonstrate the logic behind this key-assignment rule,
consider two alternative business rules for purchasing inventory from suppliers.

● Business Rule 1. Each vendor supplies the firm with three (or fewer) different
items of inventory, but each item is supplied by only one vendor.
● Business Rule 2. Each vendor supplies the firm with any number of inventory
items, but each item is supplied by only one vendor.
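The 1:M key-assignment rule can be sketched in Python: the primary key of the Supplier (1 side) is embedded as a foreign key in each Inventory record (M side). The vendor and part values below are hypothetical.

```python
suppliers = {
    "V100": {"name": "Ozment Supply"},
    "V200": {"name": "Buell Co."},
}

# M side: each inventory record embeds the primary key of its one supplier.
inventory = [
    {"part_num": "P1", "desc": "bracket", "supplier_fk": "V100"},
    {"part_num": "P2", "desc": "washer", "supplier_fk": "V100"},
    {"part_num": "P3", "desc": "gasket", "supplier_fk": "V200"},
]

def items_for(supplier_key):
    """All inventory items supplied by one vendor, found via the foreign key."""
    return [i["part_num"] for i in inventory if i["supplier_fk"] == supplier_key]
```

Embedding the key this way works under both business rules above because each inventory item names exactly one supplier; embedding inventory keys in the Supplier table instead would require a variable number of key fields.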


Keys in M:M Associations


To represent the M:M association between tables, a link table needs to be created. The
link table has a combined (composite) key consisting of the primary keys of the two
related tables.


● Business Rule 3. Each vendor supplies the firm with any number of inventory
items, and each item may be supplied by any number of vendors.
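A link table for Business Rule 3 can be sketched as a set of composite keys, each pairing a Supplier primary key with an Inventory primary key; the values are hypothetical.

```python
suppliers = {"V100": "Ozment Supply", "V200": "Buell Co."}
inventory = {"P1": "bracket", "P2": "washer"}

# Link table: each row's composite key is (supplier key, part key).
supplier_part_link = {
    ("V100", "P1"),
    ("V100", "P2"),
    ("V200", "P1"),   # P1 is supplied by two vendors (Business Rule 3)
}

def vendors_for(part):
    """Traverse the link table from the Inventory side."""
    return sorted(s for (s, p) in supplier_part_link if p == part)

def parts_for(vendor):
    """Traverse the link table from the Supplier side."""
    return sorted(p for (s, p) in supplier_part_link if s == vendor)
```

Because the association lives entirely in the link table, neither base table needs to embed the other's key, and both directions of the M:M relationship remain navigable.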

Auditors and Data Normalization


● Database normalization is a technical matter that is usually the responsibility of
systems professionals. The subject, however, has implications for internal control
that make it the concern of auditors also.
● The auditor needs to know how the data are structured before he or she can
extract data from tables to perform audit procedures.

DESIGNING RELATIONAL DATABASES

● Database design is a component of a much larger systems development
process that involves extensive analysis of user needs.

SIX PHASES OF DATABASE DESIGN


1. Identify entities.
2. Construct a data model showing entity associations.
3. Add primary keys and attributes to the model.
4. Normalize the data model and add foreign keys.
5. Construct the physical database.
6. Prepare the user views.

Identify Entities
View modeling begins by identifying the primary entities of the business function in
question. Recall that entities are things about which the organization wishes to capture
data.

KEY features of a simplified purchasing system:


1. The purchasing agent reviews the inventory status report (Figure 8.15) for
items that need to be reordered.
2. The agent selects a supplier and prepares an online purchase order.
3. The agent prints a copy of the purchase order (Figure 8.20a) and sends it
to the supplier.
4. The supplier ships inventory to the company. Upon its arrival, the
receiving clerk inspects the inventory and prepares an online receiving
report (Figure 8.20b). The computer system automatically updates the
inventory records.


Entities are represented as nouns in a system description. A number of candidate
entities can be identified in the previous description: Purchasing Agent, Receiving Clerk,
Inventory, Supplier, Inventory Status Report, Purchase Order, and Receiving Report.
Not all of these candidates are true entities that need to be modeled in the database. To
pass as valid entities, two conditions need to be met:

● Condition 1. An entity must consist of two or more occurrences.
● Condition 2. An entity must contribute at least one attribute that is not
provided through other entities.

We need to test these conditions for each candidate to eliminate any false entities.
● Purchasing Agent. We need to determine what data about the agent, unique
to his or her order-placing role, needs to be captured.
● Receiving Clerk. We will assume that no clerk-specific data needs to be
captured that requires a dedicated table.
● Inventory. The Inventory entity meets both conditions. The description
suggests that the organization holds many items of inventory; thus this entity
would contain multiple occurrences.
● Supplier. The description states that multiple vendors supply inventory; hence
the Supplier entity meets the first condition. We can also assume that it meets the
second condition since no other entity would logically provide supplier data. The
Supplier entity, therefore, will be included in the data model.
● Inventory Status Report. The Inventory Status Report is a user view derived
from the Inventory and Supplier entities (see Figure 8.17). While it contains
multiple occurrences, it is not an entity because it does not satisfy Condition 2.
The view is derived entirely from existing entities and provides no additional data
that requires a separate entity. The view will be carefully analyzed, however, to
ensure that all the attributes needed for it are included in the existing entities.
● Purchase Order. The Purchase Order entity meets both conditions. Many
purchase orders will be processed in a period; thus the entity will have many
occurrences.
● Receiving Report. The Receiving Report meets both conditions. Many receipt
events will take place in the period; thus a Receiving Report entity will have
multiple occurrences. A Receiving Report entity will contain attributes such as
date received and quantity received that are unique to this entity and thus not
provided by other entities in the model.


Construct a Data Model Showing Entity Associations


The next step in view modeling is to determine the associations between entities and
document them with an ER diagram. Recall that associations represent business rules.
Sometimes the rules are obvious and are the same for all organizations.

The underlying business rules:


1. There is a 0,M:M association between the Purchase Order and Inventory entities.
This means that each inventory item may have been ordered many times or never
ordered in the current business period.
2. There is an M:M association between the Inventory and Supplier entities. This
means that one or more vendors supply each inventory item, and each of them
supplies one or more items of inventory.
3. There is a 1:0,M association between the Supplier and the Purchase Order
entities. This means that in the current period, each supplier may have received
zero or many purchase orders, but each order goes to only one supplier.
4. There is a 1:1 association between the Purchase Order and Receiving Report
entities. A single receiving report record reflects the receipt of goods that are
specified on a single purchase order record. Multiple purchase orders are not
combined on a single receiving report.


5. The association between the Receiving Report and Inventory entities is 0,M:M.
This signifies that within the period, each item of inventory may have been
received many times or never. Also, each receiving report is associated with at
least one and possibly many inventory items.

Add Primary Keys and Attributes to the Model

Add Primary Keys. The next step in the process is to assign primary keys to the
entities in the model. The analyst should select a primary key that logically defines the
nonkey attributes and uniquely identifies each occurrence in the entity.

Add Attributes. Every attribute in an entity should appear directly or indirectly (a


calculated value) in one or more user views. Entity attributes are, therefore, originally
derived and modeled from user views. In other words, if stored data are not used in a
document, report, or a calculation that is reported in some way, then it serves no
purpose and should not be part of the database.

Normalize Data Model and Add Foreign Keys

1. Repeating Group Data in Purchase Order. The attributes Part Number,
Description, Order Quantity, and Unit Cost are repeating group data. This means that
when a particular purchase order contains more than one item, then multiple values will
need to be captured for these attributes. To resolve this, these repeating group data were
removed to a new PO Item Detail entity. The new entity was assigned a primary key that
is a composite of Part Number and PO Number. The creation of the new entity also


resolved the M:M association between the Purchase Order and Inventory entities by
providing a link.

2. Repeating Group Data in Receiving Report. The attributes Part Number,
Quantity Received, and Condition Code are repeating groups in the Receiving Report
entity and were removed to a new entity called Rec Report Item Detail. A composite
key composed of PART NUMBER and REC REPT NUMBER was assigned.

3. Transitive Dependencies. The Purchase Order and Receiving Report entities
contain attributes that are redundant with data in the Inventory and Supplier entities.
These redundancies occur because of transitive dependencies in the Purchase Order and
Receiving Report entities, and the redundant attributes are therefore dropped.
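The repeating-group removal described in points 1 and 2 can be sketched in Python; the PO number, parts, and costs are hypothetical. One unnormalized Purchase Order row is split into a 3NF Purchase Order row plus PO Item Detail rows keyed on the composite (PO Number, Part Number).

```python
# Unnormalized: one Purchase Order row holds repeating item attributes.
unnormalized_po = {
    "po_num": 9001, "supplier": "V100",
    "items": [("P1", 10, 2.00), ("P2", 5, 4.50)],  # part, qty, unit cost
}

def split_repeating_group(po):
    """Move the repeating group into PO Item Detail rows, each identified
    by the composite key (po_num, part_num)."""
    po_row = {"po_num": po["po_num"], "supplier": po["supplier"]}
    detail_rows = [
        {"po_num": po["po_num"], "part_num": p, "order_qty": q, "unit_cost": c}
        for (p, q, c) in po["items"]
    ]
    return po_row, detail_rows

po_row, details = split_repeating_group(unnormalized_po)
```

The detail rows also serve as the link that resolves the M:M association between Purchase Order and Inventory, since each carries one key from each side.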


Construct the Physical Database


The next step is to create the physical tables and populate them with data. This is an
involved step that must be carefully planned and executed and may take many months
in a large installation. Programs will need to be written to transfer organization data
currently stored in flat files or legacy databases to the new relational tables. Data
currently stored on paper documents may need to be entered into the database tables
manually. Once this is done, the physical user views can be produced.

Prepare the User Views

The normalized tables should be rich enough to support the views of all users of the
system being modeled. To illustrate the relationship, the fields in the user view are
cross-referenced via circled numbers to the attributes in the supporting tables. Keep in
mind that these tables may also provide data for many other views not shown here, such
as the receiving report, purchase requisition listing, inventory status report, and vendor
purchases activity report.

Global View Integration

The view modeling process described previously pertained to only one business
function (the purchases system), and the resulting tables and views constitute only
a subschema of the overall database schema. A modern company, however, would need
hundreds or thousands of views and associated tables. Combining the data needs of all
users into a single schema or enterprise-wide view is called view integration. This is a
daunting undertaking when creating the entire database from scratch.

EMBEDDED AUDIT MODULE

● An EAM is a specially programmed module embedded in a host application to
capture predetermined transaction types for subsequent analysis.
● The objective of the embedded audit module (EAM), also known as
continuous auditing, is to identify important transactions while they are being
processed and extract copies of them in real time.


● The EAM approach allows selected transactions to be captured throughout the
audit period.
● Captured transactions are made available to the auditor in real time, at period
end, or at any time during the period, thus significantly reducing the amount of
work the auditor must do to identify significant transactions for substantive
testing.
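The EAM idea can be sketched in Python as a hook inside the host application that copies qualifying transactions into an audit log as they are processed. The threshold, transaction fields, and function names are hypothetical illustrations of the technique, not a real EAM interface.

```python
audit_log = []   # the auditor's file of captured transactions

def eam_filter(txn, threshold):
    """EAM test: extract a copy of material transactions in real time."""
    if txn["amount"] >= threshold:
        audit_log.append(dict(txn))

def process_transaction(txn, threshold=5000.00):
    """Host application routine with the audit module embedded in it."""
    eam_filter(txn, threshold)        # the embedded module runs first
    # ... normal application processing of txn continues here ...
    return txn

process_transaction({"id": 1, "amount": 12000.00})   # captured
process_transaction({"id": 2, "amount": 300.00})     # ignored
```

This also makes the operational-efficiency concern below concrete: every transaction pays the cost of the embedded test, whether or not it is captured.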

Disadvantages of EAMs

Operational Efficiency
● From the user’s point of view, EAMs decrease operational performance. The
presence of an audit module within the host application may create significant
overhead, especially when the amount of testing is extensive.

Verifying EAM Integrity


● The EAM approach may not be a viable audit technique in environments with a
high level of program maintenance. When host applications undergo frequent
changes, the EAMs embedded within the hosts will also require frequent
modifications. The integrity concerns raised earlier regarding application
maintenance apply equally to EAMs. The integrity of the EAM directly affects the
quality of the audit process.


GENERALIZED AUDIT SOFTWARE


Generalized audit software (GAS) is the most widely used CAATT for IS auditing. GAS
allows auditors to access electronically coded data files and perform various operations
on their contents. Some of the more common uses for GAS include:

• Footing and balancing entire files or selected data items
• Selecting and reporting detailed data contained in files
• Selecting stratified statistical samples from data files
• Formatting results of tests into reports
• Printing confirmations in either standardized or special wording
• Screening data and selectively including or excluding items
• Comparing multiple files and identifying any differences
• Recalculating data fields

The widespread popularity of GAS is due to four factors:

(1) GAS languages are easy to use and require little computer background on the
part of the auditor;
(2) many GAS products can be used on both mainframe and PC systems;
(3) auditors can perform their tests independent of the client’s computer service
staff; and
(4) GAS can be used to audit the data stored in most file structures and formats.

Using GAS to Access Simple Structures

Gaining access to flat-file structures is a relatively simple process, as illustrated in
the figure below. In this example, an inventory file is read directly by the GAS, which
extracts key information needed for the audit, including the quantity on hand, the dollar
value, and the warehouse location of each inventory item. The auditor’s task is to verify
the existence and value of the inventory by performing a physical count of a
representative sample of the inventory on hand. Thus, on the basis of a materiality
threshold provided by the auditor, the GAS selects the sample records and prepares a
report containing the needed information.
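The selection logic described above can be sketched in Python. The file layout, field names, and materiality threshold below are illustrative assumptions, not features of any particular GAS product:

```python
import csv
import io

# A small flat inventory file: item number, quantity on hand, dollar value,
# and warehouse location (contents invented for illustration).
flat_file = io.StringIO(
    "item,qty_on_hand,dollar_value,warehouse\n"
    "1001,50,12500.00,WH-A\n"
    "1002,10,800.00,WH-B\n"
    "1003,200,43000.00,WH-A\n"
)

MATERIALITY = 10_000.00  # threshold supplied by the auditor (assumed value)

def select_sample(rows, threshold):
    """Select records whose inventory value meets or exceeds the threshold."""
    return [row for row in rows if float(row["dollar_value"]) >= threshold]

sample = select_sample(csv.DictReader(flat_file), MATERIALITY)
```

The resulting sample would then be the basis for the physical count and the printed report.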


Using GAS to Access Complex Structures

Gaining access to complex structures, such as a hashed file or other form of random file, may pose a problem for the auditor. Not all GAS products on the market
may be capable of accessing every type of file structure. If the CAATT in question is
unable to deal with a complex structure, the auditor may need to appeal to systems
professionals to write a special program that will copy the records from their actual
structure to a flat-file sequential structure for easy retrieval. Figure below illustrates this
approach.


To illustrate the file flattening process, consider the complex database structure
presented in Figure 8.29.
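A minimal sketch of the flattening step, assuming a hypothetical customer/invoice structure in which each parent record points to its child records:

```python
# A navigational-style structure: each customer record holds its invoices.
# The layout and values are invented for illustration.
customers = [
    {"cust_no": "C01", "name": "Acme", "invoices": [
        {"inv_no": "I-1", "amount": 500.0},
        {"inv_no": "I-2", "amount": 250.0}]},
    {"cust_no": "C02", "name": "Bolt", "invoices": [
        {"inv_no": "I-3", "amount": 900.0}]},
]

def flatten(custs):
    """Emit one fixed-layout record per invoice, in sequential order."""
    rows = []
    for c in custs:
        for inv in c["invoices"]:
            rows.append((c["cust_no"], c["name"], inv["inv_no"], inv["amount"]))
    return rows

flat = flatten(customers)
```

The flat sequential copy can then be read by the GAS like any other flat file.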

Audit Issues Pertaining to the Creation of Flat Files

The auditor must sometimes rely on computer services personnel to produce a flat file from the complex file structures. There is a risk that data integrity will be
compromised by the procedure used to create the flat file. For example, if the auditor’s
objective is to confirm accounts receivable, certain fraudulent accounts in the complex
structure may be intentionally omitted from the flat-file copy that is created. The sample
of confirmations drawn from the flat file may therefore be unreliable. Auditors skilled in
programming languages may avoid this potential pitfall by writing their own data
extraction routines.

ACL SOFTWARE

In the past, public accounting firms developed proprietary versions of GAS, which they used in the audits of their clients. More recently, software companies have
serviced this market. Among them, ACL (Audit Command Language) is the leader in the
industry. ACL was designed as a meta-language for auditors to access data stored in
various digital formats and to test them comprehensively. In fact, many of the problems
associated with accessing complex data structures have been solved by ACL’s Open
Database Connectivity (ODBC) interface.


Data Definition

We have already established that a client’s system may store data using a number
of flat file or database structures, including sequential files, VSAM files, linked lists, and
relational tables. One of ACL’s strengths is the ability to read data stored in most
formats. ACL uses the data definition feature for this purpose. To create a data
definition, the auditor needs to know both where the source file physically resides and
its field structure layout. Small files can be imported via text files or spreadsheets. Very
large files may need to be accessed directly from the mainframe computer.
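A data definition amounts to a field-structure map telling the software where each field begins and ends. A minimal sketch, assuming a hypothetical fixed-width inventory record layout:

```python
# Field layout: (name, starting offset, width) — offsets and widths are
# assumptions for illustration, not a real GAS or ACL definition format.
layout = [("item", 0, 5), ("qty", 5, 4), ("value", 9, 10)]

def parse_record(line, field_layout):
    """Slice a fixed-width record into named fields per the layout."""
    return {name: line[start:start + width].strip()
            for name, start, width in field_layout}

record = parse_record("1000100500012500.00", layout)
```

Once the layout is defined, every record in the source file can be parsed the same way.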


Customizing a View

A view is simply a way of looking at data in a file; auditors seldom need to use all
the data contained in a file. ACL allows the auditor to customize the original view
created during data definition to one that better meets his or her audit needs. The
auditor can create and reformat new views without changing or deleting the data in the
underlying file. Only the presentation of the data is affected.
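The idea of a view as a presentation layer over unchanged data can be sketched as a simple field projection; the records and field names are illustrative assumptions:

```python
# Underlying records remain untouched; the view selects a subset of fields.
records = [
    {"item": "1001", "qty": 50, "value": 12500.0, "warehouse": "WH-A"},
    {"item": "1002", "qty": 10, "value": 800.0, "warehouse": "WH-B"},
]

def make_view(rows, fields):
    """Return a presentation containing only the chosen fields."""
    return [{f: row[f] for f in fields} for row in rows]

view = make_view(records, ["item", "value"])
```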

Filtering Data

ACL provides powerful options for filtering data that support various audit tests.
Filters are expressions that search for records that meet the filter criteria. ACL’s
expression builder allows the auditor to use logical operators such as AND, OR, NOT, and others to define and test conditions of any complexity and to process only those records that match specific conditions.
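A filter of this kind can be sketched as a predicate combining AND, OR, and NOT; the records and the filter condition below are illustrative assumptions:

```python
# Invented records for illustration.
records = [
    {"item": "1001", "value": 12500.0, "warehouse": "WH-A"},
    {"item": "1002", "value": 800.0,   "warehouse": "WH-B"},
    {"item": "1003", "value": 43000.0, "warehouse": "WH-B"},
]

# Filter: value > 1000 AND (warehouse = "WH-B" OR NOT item begins with "100")
matches = [r for r in records
           if r["value"] > 1000
           and (r["warehouse"] == "WH-B" or not r["item"].startswith("100"))]
```

Only the records satisfying the whole expression are passed on for further testing.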


Stratifying Data

ACL’s stratification feature allows the auditor to view the distribution of records
that fall into specified strata. Data can be stratified on any numeric field such as sales
price, unit cost, quantity sold, and so on. The data are summarized and classified by
strata, which can be equal in size (called intervals) or vary in size (called free).
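Equal-interval stratification can be sketched as follows; the values, range, and number of strata are illustrative assumptions:

```python
# Invented sales-price values for illustration.
values = [120.0, 340.0, 95.0, 780.0, 560.0, 410.0]

def stratify(data, low, high, n_strata):
    """Count the records falling into each of n_strata equal-width intervals."""
    width = (high - low) / n_strata
    counts = [0] * n_strata
    for v in data:
        idx = min(int((v - low) / width), n_strata - 1)  # clamp the top edge
        counts[idx] += 1
    return counts

strata_counts = stratify(values, 0.0, 800.0, 4)
```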

Statistical Analysis

ACL offers many sampling methods for statistical analysis. Two of the most
frequently used are record sampling and monetary unit sampling (MUS). Each method
allows random and interval sampling. The choice of methods will depend on the
auditor’s strategy and the composition of the file being audited. On one hand, when
records in a file are fairly evenly distributed across strata, the auditor may want an
unbiased sample and will thus choose the record sample approach. Using inventory to
illustrate, each record, regardless of the dollar amount of the inventory value field, has
an equal chance of being included in the sample. On the other hand, if the file is heavily
skewed with large value items, the auditor may select MUS, which will produce a sample
that includes all the larger dollar amounts.
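The contrast between the two methods can be sketched in Python; the dollar amounts, sample sizes, and the systematic (fixed-interval) form of MUS shown here are illustrative assumptions:

```python
import random

# One dollar amount per accounts receivable record (invented values).
amounts = [100.0, 250.0, 9000.0, 75.0, 4300.0]
rng = random.Random(42)  # fixed seed so the demonstration is repeatable

# Record sampling: every record has an equal chance of selection.
record_sample = rng.sample(range(len(amounts)), k=2)

def mus_sample(values, n, rng):
    """Systematic monetary unit sampling: select the record containing every
    k-th dollar, so items larger than the interval are certain to be chosen."""
    interval = sum(values) / n
    start = rng.uniform(0, interval)
    targets = [start + i * interval for i in range(n)]
    selected, cumulative, t = [], 0.0, 0
    for idx, v in enumerate(values):
        cumulative += v
        while t < len(targets) and targets[t] <= cumulative:
            selected.append(idx)
            t += 1
    return sorted(set(selected))

mus_selected = mus_sample(amounts, 2, rng)
```

Note that the 9,000.0 record exceeds the sampling interval, so MUS is certain to select it, whereas record sampling gives it no special weight.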
