Data Structures and CAATTs for Data Extraction

CHAPTER 8

DATA STRUCTURES

• Data structures have two fundamental components: organization and access method.

• Organization refers to the way records are physically arranged on the secondary storage device. This may be either sequential or random.

• The records in sequential files are stored in contiguous locations that occupy a specified area of disk space

• Records in random files are stored without regard for their physical relationship to other records of the same file. Random files may have records distributed throughout a disk

• The access method is the technique used to locate records and to navigate through the database or file. Access methods are classified as either direct access or sequential access.
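
The difference between sequential and direct access can be illustrated with a minimal Python sketch. It assumes a hypothetical fixed-length record layout (a 4-byte key plus a 20-byte name) and uses an in-memory buffer in place of a disk file; the function names and sample data are illustrative only.

```python
import io
import struct

# Hypothetical record layout: 4-byte integer key + 20-byte name field.
RECORD = struct.Struct("<i20s")

def build_file(rows):
    """Write records one after another into contiguous storage."""
    buf = io.BytesIO()
    for key, name in rows:
        buf.write(RECORD.pack(key, name.encode().ljust(20)))
    return buf

def sequential_read(buf, wanted_key):
    """Sequential access: scan from the first record until the key is found.
    Returns (name, number of records read)."""
    buf.seek(0)
    reads = 0
    while chunk := buf.read(RECORD.size):
        reads += 1
        key, name = RECORD.unpack(chunk)
        if key == wanted_key:
            return name.decode().strip(), reads
    return None, reads

def direct_read(buf, slot):
    """Direct access: jump straight to a known record position."""
    buf.seek(slot * RECORD.size)
    key, name = RECORD.unpack(buf.read(RECORD.size))
    return key, name.decode().strip()

buf = build_file([(101, "Ann"), (102, "Bob"), (103, "Cy")])
print(sequential_read(buf, 103))  # reads every record before the match
print(direct_read(buf, 2))        # one read, given the record's position
```

Direct access needs only one read, but it depends on knowing where the record is stored, which is what the index, hashing, and pointer structures in the following slides provide.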

Flat-File Structures

• End users in this environment own their data files rather than share them with other users

• Data files are structured, formatted, and arranged to suit the specific needs of the owner or primary user

• See fig 8.1 for the sequential storage and access method

• Sequential files are simple and easy to process

• An indexed structure is so named because, in addition to the actual data file, there exists a separate index that is itself a file of record addresses

• The data file itself may be organized either sequentially or randomly

• The virtual storage access method (VSAM) structure is used for very large files that require routine batch processing and a moderate degree of individual record processing
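
The idea of a separate index of record addresses can be sketched as follows. This is an illustration only, not how VSAM or any particular file system is implemented: the data "file" is a Python dictionary keyed by made-up addresses, and the index is a sorted list searched with the standard `bisect` module.

```python
from bisect import bisect_left

# Data file with random organization: address -> record.
# The addresses and records are invented for illustration.
data_file = {
    540: ("C-103", "Cy"),
    12: ("C-101", "Ann"),
    977: ("C-102", "Bob"),
}

# The index is itself a "file" of (key, address) pairs, kept in key order.
index = sorted((record[0], address) for address, record in data_file.items())
index_keys = [key for key, _ in index]

def lookup(key):
    """Search the index for the key, then fetch the record at its address."""
    i = bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return data_file[index[i][1]]
    return None

print(lookup("C-102"))
```

Because the index is kept in key order, the data file itself can stay in whatever physical order its records were written.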

Hashing Structure

• A hashing structure employs an algorithm that converts the primary key of a record directly into a storage address

• Advantage: speed of access

• Disadvantage: this technique does not use storage space efficiently
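
A minimal sketch of the hashing idea follows. It assumes a simple division-remainder algorithm with 11 storage slots, chosen only for illustration; the slides do not specify the algorithm. The sample keys are made up so that two of them hash to the same address.

```python
SLOTS = 11  # assumed number of storage locations (a small prime)

def hash_address(key, slots=SLOTS):
    """Convert a numeric primary key directly into a storage address."""
    return key % slots

def store(keys, slots=SLOTS):
    """Place each key at its hashed address; report keys that collide."""
    table = [None] * slots
    collisions = []
    for key in keys:
        address = hash_address(key, slots)
        if table[address] is None:
            table[address] = key
        else:
            collisions.append(key)  # address already occupied
    return table, collisions

table, collisions = store([15943, 15954, 20011])
print(hash_address(15943))   # address computed without any search
print(table.count(None))     # storage locations left unused
print(collisions)            # keys that hashed to an occupied address
```

The record address is computed rather than searched for, which explains the speed. The unused locations and the collision show why storage is used inefficiently.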

Pointer Structures

• See fig 8.6

• Used to link records between files

• Types of pointers (see 8.8):

– physical address

– relative address

– logical key pointer
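
As a rough illustration of linking records between files, the sketch below stores a pointer in each customer record to that customer's first invoice, and a pointer in each invoice record to the next one. The pointers here are relative record numbers; the file contents and field names are hypothetical.

```python
# Invoice file: relative record number -> (invoice number, amount, next pointer).
invoices = {
    1: ("INV-1", 100.0, 3),     # points to record 3, the same customer's next invoice
    2: ("INV-2", 50.0, None),   # None marks the end of a chain
    3: ("INV-3", 25.0, None),
}

# Customer file: each record holds a pointer into the invoice file.
customers = {
    "C-101": {"name": "Ann", "first_invoice": 1},
    "C-102": {"name": "Bob", "first_invoice": 2},
}

def invoices_for(customer_no):
    """Follow the pointer chain from a customer record through the invoice file."""
    found = []
    pointer = customers[customer_no]["first_invoice"]
    while pointer is not None:
        invoice_no, amount, pointer = invoices[pointer]
        found.append(invoice_no)
    return found

print(invoices_for("C-101"))
```

A physical address pointer would hold the actual disk location instead of a record number, and a logical key pointer would hold the related record's primary key, which must then be converted to an address.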

Relational Database

• Relational databases are based on the indexed sequential file structure

• An entity is anything about which the organization wishes to capture data

• Occurrence is used to describe the number of instances or records that pertain to a specific entity

• Attributes are the data elements that define an entity

• Cardinality is the degree of association between two entities

• The possible forms of cardinality are zero or one (0,1), one and only one (1,1), zero or many (0,M), and one or many (1,M)

Database Anomalies

• The three database anomalies are the update anomaly, the insertion anomaly, and the deletion anomaly

Normalizing Tables

• The database anomalies described above are symptoms of structural problems within tables called dependencies.

• Specifically, these are known as repeating groups, partial dependencies, and transitive dependencies.

• The normalization process involves identifying and removing structural dependencies from the table(s) under review
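
A small sketch can make the link between dependencies and anomalies concrete. The table and field names below are invented for illustration. In the flat table, the customer name depends on the customer number rather than on the invoice number (a transitive dependency), so the name is repeated on every invoice row.

```python
# Unnormalized table: the customer name is repeated on each invoice row.
flat = [
    {"invoice": 1, "cust_no": "C-101", "cust_name": "Ann", "amount": 100.0},
    {"invoice": 3, "cust_no": "C-101", "cust_name": "Ann", "amount": 25.0},
    {"invoice": 2, "cust_no": "C-102", "cust_name": "Bob", "amount": 50.0},
]

# Normalized: customer data moves to its own table, keyed by cust_no.
customers = {row["cust_no"]: row["cust_name"] for row in flat}

# The invoice table keeps cust_no as a foreign key and drops the name.
invoices = [
    {"invoice": row["invoice"], "cust_no": row["cust_no"], "amount": row["amount"]}
    for row in flat
]

# A name change is now a single update instead of one per invoice row.
customers["C-101"] = "Ann Lee"
print(customers)
print(invoices)
```

In the flat table, the same change would have to be applied to every invoice row for that customer (update anomaly), a customer with no invoices could not be recorded (insertion anomaly), and deleting a customer's last invoice would erase the customer (deletion anomaly).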

DESIGNING RELATIONAL DATABASES

• Six phases of database design (view modeling):

1. Identify entities.
2. Construct a data model showing entity associations.
3. Add primary keys and attributes to the model.
4. Normalize the data model and add foreign keys.
5. Construct the physical database.
6. Prepare the user views.
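
The last three phases can be sketched with Python's built-in `sqlite3` module. The two tables, their columns, and the view name are hypothetical examples, not taken from the chapter; an in-memory database stands in for the physical database.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

con.executescript("""
    -- Construct the physical database from the normalized model.
    CREATE TABLE customer (
        cust_no TEXT PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE invoice (
        invoice_no INTEGER PRIMARY KEY,
        cust_no    TEXT NOT NULL REFERENCES customer(cust_no),  -- foreign key
        amount     REAL NOT NULL
    );
    -- Prepare a user view that joins the normalized tables.
    CREATE VIEW customer_balance AS
        SELECT c.cust_no, c.name, SUM(i.amount) AS total
        FROM customer AS c
        JOIN invoice AS i ON i.cust_no = c.cust_no
        GROUP BY c.cust_no, c.name;
""")

con.executemany("INSERT INTO customer VALUES (?, ?)",
                [("C-101", "Ann"), ("C-102", "Bob")])
con.executemany("INSERT INTO invoice VALUES (?, ?, ?)",
                [(1, "C-101", 100.0), (2, "C-102", 50.0), (3, "C-101", 25.0)])

rows = con.execute("SELECT * FROM customer_balance ORDER BY cust_no").fetchall()
print(rows)
```

The user sees one combined report through the view, while the underlying data stays in separate normalized tables linked by the foreign key.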

EMBEDDED AUDIT MODULE

• The objective of the embedded audit module (EAM) technique, also known as continuous auditing, is to identify important transactions while they are being processed and extract copies of them in real time.

• An EAM is a specially programmed module embedded in a host application to capture predetermined transaction types for subsequent analysis
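
The EAM concept can be sketched as a small hook called from inside the host application. Everything here is hypothetical: the selection rule (a materiality threshold of 10,000), the function names, and the transaction layout are assumptions chosen for illustration.

```python
MATERIALITY = 10_000.00  # assumed selection threshold set by the auditor

audit_file = []  # copies of captured transactions, reviewed later by the auditor

def embedded_audit_module(transaction):
    """Capture a copy of any transaction that meets the predetermined criterion."""
    if transaction["amount"] >= MATERIALITY:
        audit_file.append(dict(transaction))

def process_sale(transaction, ledger):
    """Host application logic, with the audit hook embedded in the processing path."""
    ledger.append(transaction)
    embedded_audit_module(transaction)  # runs while the transaction is processed

ledger = []
process_sale({"id": 1, "amount": 500.00}, ledger)
process_sale({"id": 2, "amount": 25_000.00}, ledger)
print(audit_file)
```

Every transaction is processed normally, but only those matching the auditor's criterion are copied to the audit file for subsequent analysis.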

GENERALIZED AUDIT SOFTWARE

• GAS allows auditors to access electronically coded data files and perform various operations on their contents

• Footing and balancing entire files or selected data items

• Selecting and reporting detailed data contained in files

• Selecting stratified statistical samples from data files

• Formatting results of tests into reports

• Printing confirmations in either standardized or special wording

• Screening data and selectively including or excluding items

• Comparing multiple files and identifying any differences

• Recalculating data fields

• The widespread popularity of GAS is due to four factors: (1) GAS languages are easy to use and require little computer background on the part of the auditor; (2) many GAS products can be used on both mainframe and PC systems; (3) auditors can perform their tests independent of the client’s computer service staff; and (4) GAS can be used to audit the data stored in most file structures and formats
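
Two of the GAS tasks listed above, footing a file and recalculating a data field, can be sketched in plain Python. This is not the syntax of any GAS product. The invoice records and the control total are made-up sample data, and one extended amount is deliberately wrong.

```python
from decimal import Decimal

# Hypothetical invoice file extracted from the client system.
invoices = [
    {"invoice": 1, "qty": 4, "price": Decimal("25.00"), "extended": Decimal("100.00")},
    {"invoice": 2, "qty": 2, "price": Decimal("25.00"), "extended": Decimal("50.00")},
    {"invoice": 3, "qty": 1, "price": Decimal("25.00"), "extended": Decimal("52.00")},
]
CONTROL_TOTAL = Decimal("202.00")  # assumed balance reported by the client

def foot(records, field):
    """Foot (total) one field across the entire file."""
    return sum((record[field] for record in records), Decimal("0"))

def recalculation_exceptions(records):
    """Recalculate qty * price and list invoices whose stored amount differs."""
    return [r["invoice"] for r in records if r["qty"] * r["price"] != r["extended"]]

print(foot(invoices, "extended") == CONTROL_TOTAL)  # does the file balance?
print(recalculation_exceptions(invoices))           # invoices needing follow-up
```

The example shows why both tests matter: a file can agree with its control total and still contain a record whose amount fails recalculation.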

ACL SOFTWARE

• Data file definition

• Customizing a view

• Filtering data

• Stratifying data

• Statistical analysis
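
The filtering, stratifying, and statistical analysis features can be approximated in plain Python to show what each one does. This is not ACL command syntax. The amounts and stratum boundaries are invented sample values.

```python
from statistics import mean

amounts = [12.5, 80.0, 150.0, 420.0, 990.0, 45.0]  # hypothetical invoice amounts

def filter_records(values, minimum):
    """Filtering: keep only the items that meet a condition."""
    return [v for v in values if v >= minimum]

def stratify(values, boundaries):
    """Stratifying: count and total the items falling in each range [low, high)."""
    strata = {(low, high): [] for low, high in zip(boundaries, boundaries[1:])}
    for v in values:
        for (low, high), bucket in strata.items():
            if low <= v < high:
                bucket.append(v)
                break
    return {bounds: (len(bucket), sum(bucket)) for bounds, bucket in strata.items()}

def statistics_summary(values):
    """Statistical analysis: basic descriptive figures for a numeric field."""
    return {"count": len(values), "min": min(values),
            "max": max(values), "mean": mean(values)}

print(filter_records(amounts, 100))
print(stratify(amounts, [0, 100, 500, 1000]))
print(statistics_summary(amounts))
```

Stratification gives the auditor a quick profile of the file, such as how many items and how much value sit in each range, before a sample is selected.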
