Province of Rizal
Binangonan Campus
COLLEGE OF ACCOUNTANCY
CHAPTER 8:
Data Structures and CAATTs for
Data Extraction
Submitted by:
Anore, Louis Antoinette G.
Biteranta, Janelle B.
Grantusa, Claire Euniz E.
Haral, Ma. Veronica B.
Quisquino, Angel Rose A.
BSA 3-2
Submitted to:
Prof. Liberty T. Ocampo, CPA, MBA
LEARNING OBJECTIVES:
This chapter examines data structures and the use of computer-assisted audit tools and techniques (CAATTs) for data extraction and analysis. It begins with an examination of data structures, emphasizing their significance in both the physical and logical organization of data within files and databases. Various structures, such as flat-file, navigational database, and relational database structures, are reviewed, with particular focus on relational databases because of their prevalence in contemporary business environments.
CAATT data extraction software falls into two general categories: embedded audit
modules and generalized audit software. The chapter describes the features,
advantages, and disadvantages of the embedded audit module (EAM) approach. It then
outlines typical functions and uses of generalized audit software (GAS). The chapter
closes with a review of the key features of ACL (Audit Command Language), the leading
product in the GAS market.
DATA STRUCTURES
Data structures have two fundamental components: organization and access method.
Organization refers to the way records are physically arranged on the secondary
storage device. This may be either sequential or random. The records in sequential files
are stored in contiguous locations that occupy a specified area of disk space. Records in
random files are stored without regard for their physical relationship to other records
of the same file. Random files may have records distributed throughout a disk. The
access method is the technique used to locate records and to navigate through the
database or file. While several specific techniques are used, in general, they can be
classified as either direct access or sequential access methods.
Flat-File Structures
The flat-file approach is a single-view model that characterizes legacy systems. Data files
are structured, formatted, and arranged to suit the specific needs of the owner or
primary user. Such structuring, however, may omit or corrupt data attributes that are
essential to other users, thus preventing successful integration of systems across the
organization.
SEQUENTIAL STRUCTURE
This structure organizes the records in a file in a specific sequence, such as
ascending or descending order, based on the values of their primary keys. Records are
stored one after another in contiguous storage spaces.
Sequential files are efficient for tasks like reading the entire file or
processing a large portion of it at once, such as updating numerous records in
bulk. However, they become inefficient for small operations or when direct access
to specific records is required. Retrieving a single record necessitates scanning
through all preceding records, making it impractical for such tasks. Direct access
is limited in sequential files, requiring alternative data structures for tasks
needing it.
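To make the trade-off concrete, the short Python sketch below shows why retrieving a single record from a sequential file requires reading every record that precedes it. The record layout and key values are illustrative assumptions, not data from the chapter.

```python
# Minimal sketch: a sequential file organized in ascending primary-key order.
# Finding one record means reading every record that precedes it.

records = [  # illustrative records, stored contiguously in key order
    {"key": 1001, "desc": "Bolt"},
    {"key": 1002, "desc": "Nut"},
    {"key": 1003, "desc": "Washer"},
]

def sequential_lookup(records, target_key):
    reads = 0
    for record in records:            # contiguous storage: read in physical order
        reads += 1
        if record["key"] == target_key:
            return record, reads      # found only after scanning all predecessors
        if record["key"] > target_key:
            break                     # sorted order lets us stop early on a miss
    return None, reads

record, reads = sequential_lookup(records, 1003)
print(record, "after", reads, "reads")  # the last record costs a full scan
```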
INDEXED STRUCTURE
An indexed structure comprises both the data file and a separate index file
containing record addresses. Each entry in the index file corresponds to a record
in the data file and contains the numeric location of that record's physical storage
on the disk.
In an indexed random file, records are stored scattered across the disk
without regard for their physical proximity. Searching and retrieval are facilitated
by an index that maps key values to storage locations. Adding records involves
selecting an available disk location and updating the index accordingly. Indexes
can be organized either sequentially or randomly, with sequential indexes
offering rapid search capabilities despite being more complex to maintain.
Algorithms can be utilized to expedite searches through the index, which is
particularly advantageous for large data files.
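The sketch below illustrates the idea in Python: a separate index maps primary keys to physical addresses, so searching is done in the index rather than in the data file. The addresses, keys, and field names are illustrative assumptions.

```python
# Minimal sketch of an indexed random file: records sit at arbitrary disk
# addresses, and a separate index maps primary keys to those addresses.

data_file = {  # address -> record (records scattered, no physical order)
    9042: {"key": 1003, "desc": "Washer"},
    1277: {"key": 1001, "desc": "Bolt"},
    5530: {"key": 1002, "desc": "Nut"},
}

index = {1001: 1277, 1002: 5530, 1003: 9042}  # key -> physical address

def indexed_lookup(key):
    address = index.get(key)           # search the index, not the data file
    return None if address is None else data_file[address]

def add_record(key, record, free_address):
    data_file[free_address] = record   # store at any available disk location
    index[key] = free_address          # then update the index accordingly

print(indexed_lookup(1002))
```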
HASHING STRUCTURE
A hashing structure utilizes an algorithm to directly convert a record's
primary key into a storage address, eliminating the need for a separate index.
This approach streamlines record retrieval by calculating addresses rather than
referencing an index.
Hashing algorithms divide primary keys by a prime number to generate
storage locations, enabling faster record retrieval compared to indexed
structures. This method bypasses index searches by algorithmically determining
storage addresses. For instance, in an inventory file with 100,000 items, the
inventory number is divided by a prime number to determine the storage
location. Hashing often results in random file organization due to the dispersed
nature of record addresses across the storage device.
● Advantage of Access Speed: The primary advantage of hashing is its fast access
speed. Calculating a record's address is quicker than searching for it through an index.
● Disadvantages of Hashing:
1. Inefficient Storage Usage: Hashing does not utilize storage space efficiently.
The chosen storage location for a record is determined solely by its primary key
value, leading to potential wastage of disk space, with as much as one-third of the
disk pack being unused.
2. Collision Problem: Different record keys may produce the same or similar
residuals, resulting in collisions where two records attempt to occupy the same
storage location. This slows down access to records and complicates record
deletion operations.
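The Python sketch below illustrates division-remainder hashing and the collision problem described above. The prime divisor, the key values, and the use of linear probing to resolve collisions are illustrative assumptions; the chapter does not prescribe a particular collision-resolution method.

```python
# Minimal sketch: hashing converts a primary key directly into a storage
# address, so no index search is needed.

PRIME = 99991             # prime near the 100,000-slot file in the example (assumed)
storage = {}              # slot number -> record

def hash_address(primary_key):
    return primary_key % PRIME        # residual of dividing the key by a prime

def store(record):
    slot = hash_address(record["key"])
    while slot in storage:            # collision: another key hashed to this slot
        slot = (slot + 1) % PRIME     # probe the next slot (slows access)
    storage[slot] = record
    return slot

def retrieve(primary_key):
    slot = hash_address(primary_key)
    while slot in storage:
        if storage[slot]["key"] == primary_key:
            return storage[slot]
        slot = (slot + 1) % PRIME     # follow the probe chain past collisions
    return None

store({"key": 186452, "desc": "Gasket"})
print(retrieve(186452))
```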
POINTER STRUCTURES
Figure 8.6 illustrates the pointer structure used to create linked-list files.
This method involves storing the address (pointer) of a related record in a field
within each record. Records are spread across the disk without regard for their
physical proximity, with pointers establishing connections between them. The
last record in the list typically contains an end-of-file marker.
Pointers within a file can link records together, as seen in the example
where Record 124 points to Record 125, Record 125 points to 126, and so forth.
This creates a sequence of linked records facilitating traversal through the file.
Pointers may also link records between different files, as depicted in Figure 8.7.
For instance, an accounts receivable (AR) record may have pointers linking it to
the next AR record within the AR file and to associated sales invoices and
remittance advice records. This facilitates accessing related records across
multiple files. By accessing an accounts receivable record, all associated sales
invoices and remittances can be located, aiding in account management and
transaction tracking. New transaction records are appended to the appropriate
linked-list, with their addresses stored in preceding records for future access.
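The Python sketch below illustrates a linked-list file: each record stores the address of the next related record, and traversal follows the pointers until the end-of-file marker is reached. The addresses and field names are illustrative assumptions.

```python
# Minimal sketch of a linked-list file: each record carries a pointer field
# holding the address of the next related record.

disk = {  # address -> record; records scattered without physical ordering
    124: {"acct": "AR-124", "balance": 500.00, "next": 125},
    125: {"acct": "AR-125", "balance": 250.00, "next": 126},
    126: {"acct": "AR-126", "balance": 900.00, "next": None},  # end-of-file marker
}

def traverse(start_address):
    """Follow pointers from record to record until the end of the list."""
    address = start_address
    while address is not None:
        record = disk[address]
        yield record
        address = record["next"]        # pointer to the next related record

for rec in traverse(124):
    print(rec["acct"], rec["balance"])
```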
● Physical Address Pointer: This type of pointer contains the actual disk
storage location (cylinder, surface, and record number) required by the disk
controller to access the record directly. It offers speed advantages as no further
manipulation is needed to determine the record's location. However, it poses two
disadvantages: (1) If the related record is moved, the pointer must be updated,
which can be problematic during disk reorganizations or copies; (2) Physical
pointers lack a logical relationship to the records they identify, meaning if a
pointer is lost, the referenced record is also lost.
● Relative Address Pointer: A relative address pointer specifies the position of
a record within the file, such as the 135th record. To access the record, further
manipulation is required to convert the relative address to the actual physical
address. Conversion software calculates this using the physical address of the
file's beginning, the record length, and the relative address being sought.
● Logical Key Pointer: This pointer contains the primary key of the related
record. A hashing algorithm converts this key value into the record's physical
address. This method allows for quick access to the record by utilizing the logical
relationship between keys and their corresponding records.
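As a small illustration of the relative address pointer, the Python sketch below converts a relative record position into a physical address from the file's starting address and record length. The starting address and record length are assumed values.

```python
# Illustrative conversion of a relative address pointer to a physical address.

FILE_START = 40_000      # physical address where the file begins (assumed)
RECORD_LENGTH = 256      # fixed record length in bytes (assumed)

def relative_to_physical(relative_position):
    """The 135th record lies 134 full records past the start of the file."""
    return FILE_START + (relative_position - 1) * RECORD_LENGTH

print(relative_to_physical(135))   # physical byte address of the 135th record
```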
The most commonly used operations for extracting data from relational tables are restrict, project, and join:
❖ Restrict: Extracts specified rows from a specified table. This operation creates a virtual table
that is a subset of the original table.
❖ Project: Extracts specified attributes (columns) from a table to create a virtual table.
❖ Join: Builds a new physical table from two tables, consisting of all concatenated pairs of
rows from each table.
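The Python sketch below approximates the three operations using lists of dictionaries as stand-in tables. The table and attribute names are illustrative assumptions, not a DBMS implementation.

```python
# Illustrative restrict, project, and join over two stand-in tables.

inventory = [
    {"part_num": "P1", "desc": "Bolt",   "vendor_num": "V1"},
    {"part_num": "P2", "desc": "Washer", "vendor_num": "V2"},
]
vendors = [
    {"vendor_num": "V1", "name": "Acme Supply"},
    {"vendor_num": "V2", "name": "Brisco Co."},
]

# Restrict: extract the rows that meet a condition (a subset of the rows).
restricted = [row for row in inventory if row["vendor_num"] == "V1"]

# Project: extract specified attributes (a subset of the columns).
projected = [{"part_num": r["part_num"], "desc": r["desc"]} for r in inventory]

# Join: concatenate pairs of rows from the two tables that share a key value.
joined = [
    {**inv, **ven}
    for inv in inventory
    for ven in vendors
    if inv["vendor_num"] == ven["vendor_num"]
]

print(restricted, projected, joined, sep="\n")
```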
User Views
● A user view is the set of data that a particular user sees. Examples of user views
are computer screens for entering or viewing data, management reports, or
source documents such as an invoice.
Database Anomalies
● Improperly normalized tables can cause DBMS processing problems that restrict,
or even deny, users access to the information they need. Such tables exhibit
negative operational symptoms called anomalies.
1. Update Anomaly. The update anomaly results from data redundancy in
an unnormalized table.
2. Insertion Anomaly. Assume a new vendor enters a marketplace. The
organization does not yet purchase from the vendor, but may wish to add
the vendor to the database. This is not possible, however, because the
primary key for the Inventory table is PART NUM—the supplier data
cannot be added to the table.
3. Deletion Anomaly. The deletion anomaly involves the unintentional
deletion of data from a table.
❖ One or more of these anomalies will exist in tables that are not normalized or
are normalized at a low level, such as first normal form (1NF) or second
normal form (2NF). To be free of anomalies, tables must be normalized to the
third normal form (3NF) level.
Normalizing Tables
● The database anomalies described above are symptoms of structural problems
within tables called dependencies. Specifically, these are known as repeating
groups, partial dependencies, and transitive dependencies. The
normalization process involves identifying and removing structural dependencies
from the table(s) under review.
The resulting tables will then meet the two conditions below:
1. All nonkey (data) attributes in the table are dependent on (defined by) the
primary key.
2. All nonkey attributes are independent of the other nonkey attributes.
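The Python sketch below illustrates these two conditions by contrasting an unnormalized Inventory table, which repeats vendor data in every row, with the 3NF Inventory and Supplier tables that result when the dependency is removed. Field names and values are illustrative assumptions.

```python
# Illustrative sketch: the unnormalized table repeats vendor data in every row
# (update anomaly) and cannot hold a vendor with no parts (insertion anomaly).

unnormalized = [
    {"part_num": "P1", "desc": "Bolt", "vendor_num": "V1",
     "vendor_name": "Acme", "vendor_addr": "12 Main St"},
    {"part_num": "P2", "desc": "Nut",  "vendor_num": "V1",
     "vendor_name": "Acme", "vendor_addr": "12 Main St"},   # redundant vendor data
]

# 3NF: every nonkey attribute depends on the key and on nothing but the key.
inventory = [            # primary key: part_num; vendor_num is a foreign key
    {"part_num": "P1", "desc": "Bolt", "vendor_num": "V1"},
    {"part_num": "P2", "desc": "Nut",  "vendor_num": "V1"},
]
supplier = [             # primary key: vendor_num; vendor data stored only once
    {"vendor_num": "V1", "vendor_name": "Acme",   "vendor_addr": "12 Main St"},
    {"vendor_num": "V2", "vendor_name": "Brisco", "vendor_addr": "9 Oak Ave"},  # new vendor, no parts yet
]
```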
3NF (Third Normal Form). Third normal form is a database schema design approach for
relational databases that uses normalization principles to reduce the duplication of
data, avoid data anomalies, ensure referential integrity, and simplify data management.
The degree of association between the resulting tables (that is, 1:1, 1:M, or M:M)
determines how the linking occurs. The key-assignment rules for linking tables are discussed next.
● Business Rule 1. Each vendor supplies the firm with three (or fewer) different
items of inventory, but each item is supplied by only one vendor.
● Business Rule 2. Each vendor supplies the firm with any number of inventory
items, but each item is supplied by only one vendor.
● Business Rule 3. Each vendor supplies the firm with any number of inventory
items, and each item may be supplied by any number of vendors.
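The Python sketch below suggests how the key-assignment rules might play out for these business rules: in a 1:M association the primary key of the "one" side is embedded as a foreign key in the "many" side, while an M:M association requires a separate link table carrying both primary keys. Table and field names are illustrative assumptions.

```python
# Business Rule 2 (1:M): embed the vendor's primary key as a foreign key
# in each Inventory record.
inventory_1m = [
    {"part_num": "P1", "desc": "Bolt", "vendor_num": "V1"},   # FK -> Supplier
    {"part_num": "P2", "desc": "Nut",  "vendor_num": "V1"},
]

# Business Rule 3 (M:M): neither table can hold the other's key without
# repeating groups, so a separate link table carries both primary keys.
part_vendor_link = [
    {"part_num": "P1", "vendor_num": "V1"},
    {"part_num": "P1", "vendor_num": "V2"},   # same part from a second vendor
    {"part_num": "P2", "vendor_num": "V1"},
]
```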
Identify Entities
View modeling begins by identifying the primary entities of the business function in
question. Recall that entities are things about which the organization wishes to capture
data.
To qualify as an entity, a candidate must satisfy two conditions: (1) it must consist of two or more occurrences, and (2) it must contribute at least one attribute that is not provided through other entities. We need to test these conditions for each candidate to eliminate any false entities.
● Purchasing Agent. We need to determine what data about the agent, unique to his
or her role in placing orders, needs to be captured.
● Receiving Clerk. We will assume that no clerk-specific data needs to be
captured that requires a dedicated table.
● Inventory. The Inventory entity meets both conditions. The description
suggests that the organization holds many items of inventory; thus this entity
would contain multiple occurrences.
● Supplier. The description states that multiple vendors supply inventory; hence
the Supplier entity meets the first condition. We can also assume that it meets the
second condition since no other entity would logically provide supplier data. The
Supplier entity, therefore, will be included in the data model.
● Inventory Status Report. The Inventory Status Report is a user view derived
from the Inventory and Supplier entities (see Figure 8.17). While it contains
multiple occurrences, it is not an entity because it does not satisfy Condition 2.
The view is derived entirely from existing entities and provides no additional data
that requires a separate entity. The view will be carefully analyzed, however, to
ensure that all the attributes needed for it are included in the existing entities.
● Purchase Order. The Purchase Order entity meets both conditions. Many
purchase orders will be processed in a period; thus the entity will have many
occurrences.
● Receiving Report. The Receiving Report meets both conditions. Many receipt
events will take place in the period; thus a Receiving Report entity will have
multiple occurrences. A Receiving Report entity will contain attributes such as
date received and quantity received that are unique to this entity and thus not
provided by other entities in the model.
The association between the Receiving Report and Inventory entities is 0,M:M.
This signifies that within the period, each item of inventory may have been
received many times or never. Also, each receiving report is associated with at
least one and possibly many inventory items.
Add Primary Keys. The next step in the process is to assign primary keys to the
entities in the model. The analyst should select a primary key that logically defines the
nonkey attributes and uniquely identifies each occurrence in the entity.
A separate link table resolved the M:M association between the Purchase Order and Inventory entities by
providing a link.
The next step is to create the physical tables and populate them with data. This is an
involved step that must be carefully planned and executed and may take many months
in a large installation. Programs will need to be written to transfer organization data
currently stored in flat files or legacy databases to the new relational tables. Data
currently stored on paper documents may need to be entered into the database tables
manually. Once this is done, the physical user views can be produced.
The normalized tables should be rich enough to support the views of all users of the
system being modeled. To illustrate the relationship, the fields in the user view are
cross-referenced via circled numbers to the attributes in the supporting tables. Keep in
mind that these tables may also provide data for many other views not shown here, such
as the receiving report, purchase requisition listing, inventory status report, and vendor
purchases activity report.
The view modeling process described previously pertained to only one business
function— the purchases system—and the resulting tables and views constitute only
a subschema of the overall database schema. A modern company, however, would need
hundreds or thousands of views and associated tables. Combining the data needs of all
users into a single schema or enterprise-wide view is called view integration. This is a
daunting undertaking when creating the entire database from scratch.
Disadvantages of EAMs
Operational Efficiency
● From the user’s point of view, EAMs decrease operational performance. The
presence of an audit module within the host application may create significant
overhead, especially when the amount of testing is extensive.
The widespread popularity of GAS is due to four factors:
(1) GAS languages are easy to use and require little computer background on the
part of the auditor;
(2) many GAS products can be used on both mainframe and PC systems;
(3) auditors can perform their tests independently of the client’s computer services
staff; and
(4) GAS can be used to audit the data stored in most file structures and formats.
To illustrate the file flattening process, consider the complex database structure
presented in Figure 8.29.
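To give a sense of what flattening involves, the Python sketch below copies data from a nested (hierarchical) structure into a simple two-dimensional CSV file that audit software can then read. The structure shown is an illustrative assumption, not the one in Figure 8.29.

```python
import csv

# Illustrative nested structure: each customer record owns several invoice records.
customers = [
    {"cust_num": "C1", "name": "Alpha Co.",
     "invoices": [{"inv_num": "I10", "amount": 120.00},
                  {"inv_num": "I11", "amount": 75.50}]},
    {"cust_num": "C2", "name": "Beta Ltd.",
     "invoices": [{"inv_num": "I12", "amount": 300.00}]},
]

# Flatten: one row per invoice, with the parent customer data repeated, so the
# result is a simple two-dimensional file the audit software can define.
with open("flat_invoices.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["cust_num", "name", "inv_num", "amount"])
    for cust in customers:
        for inv in cust["invoices"]:
            writer.writerow([cust["cust_num"], cust["name"],
                             inv["inv_num"], inv["amount"]])
```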
ACL SOFTWARE
Data Definition
We have already established that a client’s system may store data using a number
of flat file or database structures, including sequential files, VSAM files, linked lists, and
relational tables. One of ACL’s strengths is the ability to read data stored in most
formats. ACL uses the data definition feature for this purpose. To create a data
definition, the auditor needs to know both where the source file physically resides and
its field structure layout. Small files can be imported via text files or spreadsheets. Very
large files may need to be accessed directly from the mainframe computer.
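The Python sketch below suggests what a field-structure layout amounts to in practice: given the starting position and length of each field, a fixed-length record can be sliced into named fields. The field names, offsets, and sample record are illustrative assumptions, not ACL's data definition syntax.

```python
# Illustrative field-structure layout for a fixed-length record file.

LAYOUT = [            # (field name, start position, length) -- assumed values
    ("cust_num",  0,  6),
    ("inv_num",   6,  8),
    ("inv_date", 14,  8),
    ("amount",   22, 10),
]

def parse_record(line):
    """Slice one fixed-length record into named fields using the layout."""
    return {name: line[start:start + length].strip()
            for name, start, length in LAYOUT}

sample = "C00123I1000042202306011234567.89"   # one hypothetical 32-byte record
print(parse_record(sample))
```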
Customizing a View
A view is simply a way of looking at data in a file; auditors seldom need to use all
the data contained in a file. ACL allows the auditor to customize the original view
created during data definition to one that better meets his or her audit needs. The
auditor can create and reformat new views without changing or deleting the data in the
underlying file. Only the presentation of the data is affected.
Filtering Data
ACL provides powerful options for filtering data that support various audit tests.
Filters are expressions that search for records that meet the filter criteria. ACL’s
expression builder allows the auditor to use logical operators such as AND, OR, NOT,
and others to define and test conditions of any complexity and to process only those
records that match specific conditions.
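The Python sketch below approximates the kind of filter an auditor might define, selecting only the records that satisfy a compound condition. It is not ACL expression syntax, and the field names and thresholds are illustrative assumptions.

```python
# Illustrative Python equivalent of a filter such as
# (amount > 5000 AND approved_by is blank) OR quantity < 0 -- not ACL syntax.

records = [
    {"inv_num": "I10", "amount": 7200.0, "approved_by": "",    "quantity": 10},
    {"inv_num": "I11", "amount":  950.0, "approved_by": "JMS", "quantity": -3},
    {"inv_num": "I12", "amount":  300.0, "approved_by": "JMS", "quantity": 12},
]

def meets_filter(rec):
    # AND / OR / NOT conditions combined exactly as the auditor defines them
    return (rec["amount"] > 5000 and not rec["approved_by"]) or rec["quantity"] < 0

exceptions = [rec for rec in records if meets_filter(rec)]
print(exceptions)   # only records matching the filter criteria are processed
```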
Stratifying Data
ACL’s stratification feature allows the auditor to view the distribution of records
that fall into specified strata. Data can be stratified on any numeric field such as sales
price, unit cost, quantity sold, and so on. The data are summarized and classified by
strata, which can be equal in size (called intervals) or vary in size (called free).
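The Python sketch below illustrates stratification over a numeric field using equal-size intervals. The field, interval boundaries, and data are illustrative assumptions, not ACL output.

```python
# Illustrative stratification of a numeric field (unit cost) into five
# equal-size intervals spanning 0 to 500.

unit_costs = [12.5, 48.0, 150.0, 305.0, 7.25, 89.0, 410.0, 33.0]

def stratify(values, low, high, n_strata):
    width = (high - low) / n_strata
    counts = [0] * n_strata
    for v in values:
        i = min(int((v - low) // width), n_strata - 1)  # top boundary is inclusive
        counts[i] += 1
    return counts

for i, count in enumerate(stratify(unit_costs, 0, 500, 5)):
    print(f"Stratum {i + 1} ({i * 100}-{(i + 1) * 100}): {count} records")
```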
Statistical Analysis
ACL offers many sampling methods for statistical analysis. Two of the most
frequently used are record sampling and monetary unit sampling (MUS). Each method
allows random and interval sampling. The choice of methods will depend on the
auditor’s strategy and the composition of the file being audited. On one hand, when
records in a file are fairly evenly distributed across strata, the auditor may want an
unbiased sample and will thus choose the record sample approach. Using inventory to
illustrate, each record, regardless of the dollar amount of the inventory value field, has
an equal chance of being included in the sample. On the other hand, if the file is heavily
skewed with large value items, the auditor may select MUS, which will produce a sample
that includes all the larger dollar amounts.
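The Python sketch below contrasts the two approaches: record sampling gives every record an equal chance of selection, while monetary unit sampling weights selection by the dollar value so that large-value items are far more likely to be included. The data and sample sizes are illustrative assumptions.

```python
import random

# Illustrative inventory file: each record carries a dollar value field.
inventory = [
    {"item": "A", "value": 120.0},
    {"item": "B", "value": 15000.0},   # large-value item
    {"item": "C", "value": 430.0},
    {"item": "D", "value": 8900.0},
    {"item": "E", "value": 75.0},
]

# Record sampling: every record has an equal chance of selection.
record_sample = random.sample(inventory, k=2)

# Monetary unit sampling: each dollar is a sampling unit, so large-value
# records dominate the selection (drawn with replacement, for simplicity).
mus_sample = random.choices(
    inventory,
    weights=[rec["value"] for rec in inventory],
    k=2,
)

print("Record sample:", [r["item"] for r in record_sample])
print("MUS sample:   ", [r["item"] for r in mus_sample])
```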