
THE STRATEGIC IMPLEMENTATION OF A DATA DICTIONARY WITHIN

INFORMATION SYSTEMS ARCHITECTURES: THE INSURANCE COMPANY CASE

Dino E. Browne

May 1991 WP # CIS-91-03


The Strategic Implementation of a Data Dictionary within Information Systems Architectures

The Insurance Company Case

Dino E. Browne

Bachelor of Science Thesis in Electrical Science and Engineering
Massachusetts Institute of Technology

Cambridge, MA 02139

Abstract

This thesis analyzes the use of Information Systems within the context of property and casualty insurance analysis. As part of the Composite Information Systems Laboratory (CISL) research project, this case study examines the use of a database data dictionary and its fit into a present and a new system architecture designed to provide a single data source as opposed to several.

The CISL research project is a study of different types of solutions to real problems faced by individuals and corporations who need the flexibility of data and systems integration. This paper documents the purpose, function, and progress of the CISL and relates that to the problems and data needs of The Insurance Company (TIC). Through interviews with management and a review of corporate literature, this study shows TIC's case to be one warranting particular information systems needs and resources, and it calls for the development of a system which can more effectively provide data to solve its business problems.

This study has found that recent CISL theories regarding data semantics and semantic reconciliation are applicable in the case of The Insurance Company. These theories can assist in reducing data quality and data access problems as well as in the development of a data dictionary.

Thesis Supervisor: Stuart E. Madnick
Title: John Norris Maguire Professor of Information Technology


Table of Contents

1. Introduction
2. The Insurance Company
   2.1 Information Systems Department
   2.2 Present Systems Analysis
       2.2.1 Database System Description
       2.2.2 Database Structure, Feeds and Workflow
   2.3 Problem Analysis of Present IS Structure
       2.3.1 Data Quality
       2.3.2 Data Access
       2.3.3 Data Transfer
       2.3.4 Data Semantics
3. Present Solutions to Information Systems Problems
   3.1 Data Quality
   3.2 Data Access
   3.3 Data Transfer
   3.4 Data Semantics
       3.4.1 WEBSTER
4. The Insurance Company's Proposed System Architecture
   4.1 The Corporate Data Resource (CDR) Access Facility
       4.1.1 Business Objectives
       4.1.2 System Objectives
   4.2 Strategic Systems Architecture
       4.2.1 Proposed Four-Level Information Architecture
           4.2.1.1 Level 1-Transaction Stores
           4.2.1.2 Level 2-Data Warehouse
           4.2.1.3 Level 3-Information Stores
           4.2.1.4 Level 4-Personal Data
       4.2.2 Proposed Architecture System Flow
   4.3 Data Dictionary/Data Repository
   4.4 Conceptualized Solutions to Existing Problems
       4.4.1 Data Quality
       4.4.2 Data Access
       4.4.3 Data Transfer
       4.4.4 Data Semantics
5. Analysis and Evaluation of TIC Proposed Solution
   5.1 Unresolved Problem Issues
       5.1.1 Data Transfer-Interdepartmental Workflows
       5.1.2 Data Quality
   5.2 New Problem Issues
       5.2.1 Data Quality
           5.2.1.1 Time Dependent Reports
           5.2.1.2 Other Data Quality Issues
       5.2.2 Data Dictionary/Data Repository Issues
           5.2.2.1 Digital Common Data Dictionary Plus
6. Summary and Conclusions
   6.1 Summary
   6.2 Discussion
   6.3 Possible CISL Solutions
   6.4 Semantics Issues
       6.4.1 Data Definition Change Within the Data Dictionary/Repository
       6.4.2 Multiple Definition Issues
       6.4.3 Data Value Transformations
       6.4.4 Data Source Tagging
   6.5 Conclusions
7. Bibliography
Appendices


FIGURES

Figure 1: The Insurance Company's Organizational Structure
Figure 2: HOCOMP Database Dataflow
Figure 3: COBRA Database Dataflow
Figure 4: Property System Database Dataflow
Figure 5: The Insurance Company's Strategic Systems Architecture
Figure 6: The Insurance Company's Four-Level Information Architecture
Figure 7: Proposed Architecture System Flow

Appendix A: Workers' Compensation Claim Workflow
Appendix B: Digital CDD/Plus Architecture


ACKNOWLEDGMENTS

I would like to thank Maryann Burke, John Mckenna, Carrie Blake, Grace Mayo, Debbie Brewitt, Neil Taitel, Jim Truselle, and Mitch Soivenski of The Insurance Company for their time and consideration. I would especially like to thank Professor Stuart Madnick and Dr. Yang Lee for their unlimited guidance, faith, and support. I would like to thank the MIT CISL Research Group and the MIT International Financial Services Research Center for their support of my research. I would also like to thank my family for their support.


CHAPTER 1

1. Introduction

The increasingly complex and globalized economy has driven many corporations to expand business beyond their traditional organizational and geographic boundaries. Companies have recognized that today a strong information systems environment is imperative for remaining competitive in their respective markets. It is widely recognized that many important applications require access to and integration of multiple heterogeneous database systems (referred to as composite information systems). It is through efficient connectivity of these heterogeneous systems that firms can attain a competitive advantage in their respective industries. However, many companies are suffering losses in both time and efficiency from a lack of information system cohesiveness. Several technical and organizational problems caused by poor information system structure have hindered financial service firms from achieving top performance. The questions we are left to answer are exactly what these problems caused by poor database interaction are, and how different firms should go about solving them.

The background to this type of problem is not rooted in any particular theory; however, there has been a great deal of recent research and development at the Massachusetts Institute of Technology concerning this problem. The Composite Information Systems Laboratory (CISL) project at MIT is engaged in research into different types of connectivity problems and solution methodologies. Three types of database connectivity have been researched:

1. Strategic Connectivity: The identification of the strategic requirements for easier, more efficient, integrated intra-organizational and inter-organizational access to information.

2. Organizational Connectivity: The ability to connect interdependent components of a loosely-coupled organization in the face of opposing forces of centralization (e.g., in support of strategic connectivity) and decentralization (e.g., in response to the needs of local conditions, flexibility, distribution of risk, group empowerment).

3. Technical Connectivity: The technologies that can help a loosely-coupled organization appear to be more tightly-coupled.

The CISL research uses three methodologies:

1. Field Studies: Detailed studies of several major corporations and government agencies to understand their current connectivity situation, future requirements and plans, and major problems encountered or anticipated.

2. Prototype Implementation: A prototype information system, called the Composite Information Systems/Tool Kit (CIS/TK), has been developed to test solutions to many of the problems identified from the field studies.

3. Theory Development: There are several areas in which problems have been found for which no directly relevant theories have been identified. Two particular technical connectivity theory development efforts which may apply in the case of The Insurance Company are semantic reconciliation, which deals with the integration of data semantics among disparate information systems, and source tagging, which keeps track of originating and intermediate data sources used in processing a query.

This thesis involves the case study of a major insurance company, which we will refer to as The Insurance Company (TIC). The purpose of the study was to determine exactly what types of information systems problems were being experienced at TIC, how their present system architecture was handling those problems, and whether further CISL research could benefit The Insurance Company. Information about the present and proposed system structures, and about room for improvement in the present information systems, was obtained through interviews with executives and management within the Information Systems (IS) and Information Resources departments. Information was also obtained via literature on the present system and on the changes that the IS department intended to make. After compiling information on the present system structure, the problem issues being experienced presently, and TIC's proposed system architecture, the present problem issues were analyzed with respect to the proposed system architecture and how it would conceptually solve similar problems. I then analyzed TIC's proposed system architecture in light of my own criteria to determine which problems the proposed architecture would leave unresolved and which new problems it would cause. I then analyzed TIC's present situation in order to prescribe possible CISL solutions.


CHAPTER 2

This chapter gives an overview of The Insurance Company. Particular focus is given to the Information Systems department and its need for a change in information systems structure. After an explicit description of the present system structure, the chapter documents an analysis of the different types of problems presently being experienced by The Insurance Company.

2. The Insurance Company (TIC)

The Insurance Company is a team of companies providing a wide range of insurance and financial services products to individual and business customers. Known as the industry's leader in providing workers' compensation insurance benefits, The Insurance Company is the country's fifth largest property and casualty insurer. Internationally, TIC has grown to more than 24,000 employees in 350 offices and has approximately $17 billion in assets.

After a major organizational restructuring in 1987, Strategic Business Units (SBUs) were established to serve as an organizational structure for running different businesses within an integrated organization. The SBUs are the Business Insurance Markets, the Personal Insurance Markets, the Life/Health Insurance Markets, and a separate unit for TIC's Financial Companies. A senior executive is responsible for the operations of each SBU and is accountable for its performance.

In support of the SBUs as well as the overall organization, Field and Corporate Center units were established. The Field Center units include Field Operations, Claims, and Administration. The Corporate Center units include functions that are primarily Home Office based: Actuarial, Advertising, Public Relations, Financial, Legal, Information Systems, Corporate Research, and Investments. Figure 1 shows TIC's organizational structure.

2.1 Information Systems Department

The Information Systems department at TIC has undergone several drastic changes over the past few years. Driven by deregulation and other industry changes, the Senior Vice President of Information Systems led the department through new vendor platforms as well as the development of a new system architecture and data model. Four years ago TIC's information systems infrastructure was almost completely IBM hardware; today, more than half of TIC's systems are running on different platforms. TIC is one of the first companies within the insurance industry to move towards open systems. Some examples of TIC systems running on non-IBM platforms include:

(1) Applications on the Apple Macintosh for employees selling TIC products to businesses.

(2) A DEC VAX-based report generator with a Macintosh front end.

(3) Applications developed on Sun SPARCstations.

(4) Relational databases running on a Teradata 1012 machine with access from several varied vendor platforms.

There were several reasons for these changes, according to the Senior Vice President. About five years ago the insurance industry was highly regulated and structured so that a select few individuals made all decisions. Deregulation and new business complexities caused reduced profitability and pushed decision-making down to middle management. This, in turn, led to the need for pertinent information to be made available to middle management. This need for more information, coupled with poor database and information connectivity, caused several information issues for TIC.

[Figure 1: The Insurance Company's Organizational Structure, showing the Strategic Business Units and the Field Center and Corporate Center units. Source: The Insurance Company. Diagram not reproduced in this text version.]

The identification of the need for improvement became incentive for the Information Systems department to change the way in which TIC handled and viewed its corporate information.

It is important to note that there were and still are barriers to these changes, which exist on an organizational level as well as a technical one. While I will focus primarily on the technical barriers to information systems improvements, let me mention a few of the organizational barriers. Senior executives often have limited knowledge of how difficult it is to achieve the systems capabilities they themselves desire, and this stands in the way of the IS department. In addition, there is corporate resistance among employees to changing the way in which they have been performing business tasks for several years. The IS department oftentimes finds a lack of cooperation from other departments to be an issue.

2.2 Present Systems Analysis

Presently, TIC's information systems are made up of various rather loosely coupled database systems which are task specific. Several different departments access different databases depending upon the business task they are trying to accomplish. The following is a summary of TIC's primary database systems:

2.2.1 Database System Description

BOCOMP - Branch Office Compensation database; a non-relational, bottom-level database developed on IMS. Some of BOCOMP's main functions under the present system are: primary claim maintenance, primary claim coverage maintenance, processing payments and refunds, purging old claims, and recovering old claims.

ACES - Branch Office database; parallel in level to BOCOMP but following different systems feeds. Holds additional compensation information not found in BOCOMP.

HOCOMP - Home Office relational database; developed on IMS. Main source of information to the Corporate Database. Its functions vary by department: report generation, transaction processing and editing, property loss analysis, status request maintenance, data maintenance, location code handling, workfile correction processing, weekly and monthly correction processing, provision of claim change activity, maintenance of summary data, and tape master files. HOCOMP feeds several databases. Primarily fed by BOCOMP.

COBRA - Relational database; maintains TIC's loss database for Auto Liability, Auto Physical Damage, General Liability, Miscellaneous Crime, and Burglary losses. Produces weekly and monthly extract information (claim images, transaction support) for feeding to other reporting systems (Corporate Reporting, Policyholder Reporting, Bureau Reporting, etc.). Primarily fed by ACES.

Loss Reserve Database - Relational database; maintained on a Teradata DBC/1012 system. Contains aggregate loss reserve information. Primarily fed by COBRA.

Property System Database - Tape masterfile system; maintains the loss masterfile for Fire, Homeowners, and Inland Marine losses for both personal and non-personal lines of coverage. Broken into two pieces: a "Balance and Edit" portion which runs daily, and the actual masterfile update system.

LAWPACK - Legal database; tracks attorney fees, billable time, case information, legal expenses, etc. Feeds HOCOMP and COBRA, which in turn update claim records.


Corporate Database (CDB) - Main database; holds all detailed corporate data. Fed by several databases.

These primary databases feed other reporting databases which are used by different departments: Corporate Reporting, Loss Billing, Policyholder Reporting, Bureau Reporting, Claims Reporting, Statistical Reporting, and Loss Forecasting. These reporting systems are the basis upon which business tasks are completed at The Insurance Company.

2.2.2 Database Structure, Feeds and Workflow

Presently the databases are loosely organized. Workflow, however, is extremely complex. There are two main systems flows (with which we will be concerned) which supply information to the reporting databases and the Corporate Database (CDB).

HOCOMP General Data Flow:

Figure 2 shows a detailed example of the HOCOMP dataflow. HOCOMP is fed nightly by BOCOMP with Claims, Statistical, and Financial information. The Claims department is responsible for most of the on-line input to the BOCOMP system. Information from BOCOMP is then combined with additional Statistical department data (via Lewiston Manual Input), as well as weekly and monthly workfile corrections from the Unit/Workfile correction systems. Oftentimes, depending upon business task and department, this updated information is sent back to the BOCOMP system, where it is accessed by several Claims department reporting databases. This feedback of information from HOCOMP to BOCOMP can also occur via the Bank Account Reconciliation (BAR) system, which processes checks and billing requirements. The information is usually updated once again, monthly, with Staff Legal Expense information from LAWPACK. The information is then passed again to HOCOMP, where it is accessed by several Claims processing databases. The revised HOCOMP information is input into the CDB, and from there it is recombined with processed information from other departments and aggregated to feed the various reporting systems (Corporate Reporting, Claims Reporting, Loss Billing, etc.). BOCOMP and HOCOMP are used mostly by the Claims, Actuarial, and Statistical departments. Appendix A shows the workflow for a Workers' Compensation Claim, which corresponds to a HOCOMP data flow.

COBRA General Data Flow

The other major system flow with regard to corporate information is that of the COBRA system. Figure 3 shows an example of a COBRA dataflow. Information is passed to COBRA daily in the form of transactions, which COBRA uses to update the CDB. COBRA maintains an inception-to-date image of individual claims. Automated transactions are passed to COBRA daily from ACES. These include transactions such as new claim registrations, claim closings and reopenings, and miscellaneous information. Financial transactions such as payments and refunds are passed from ACES to BAR, which in turn passes the transactions to COBRA; these are also passed on a daily basis. Monthly, COBRA receives TIC Staff Legal Expense information from LAWPACK.

On a daily basis, COBRA also receives what are termed MANUAL transactions (via Lewiston Manual Input). These are registrations, closings and reopenings, and payments for claims that are not in the ACES system (i.e., BOCOMP-type information). ACES became country-wide in February of 1991, which greatly reduced these manual input transactions. Weekly and monthly, COBRA produces various extract information from its database. The extracted claim images, and sometimes supporting transactions, are passed on to Corporate Reporting systems, Policyholder Reporting systems, Bureau Reporting systems, and several other systems. At the end of a work month, COBRA isolates copies of the claim images and maintains them until the end of the next work month. The Financial departments are the main users of ACES and COBRA.

[Figure 2: HOCOMP Database Dataflow. The diagram is not legible in this text version; recoverable labels include Claims Unit and Coverage Unit input, weekly and monthly Unit/Workfile corrections, monthly Staff Legal Expense, Lewiston Manual Input, and net losses and deductible information.]

[Figure 3: COBRA General Data Flow. Diagram not reproduced in this text version; labels include automated sources, manual sources, Staff Legal Expense, Claims, Statistical, and Loss Forecasting.]

Presently, systems management is looking for ways to take BOCOMP data and put it directly onto the CDB. This would, to an extent, eliminate the need for HOCOMP. Other recent developments include the use of a tool called "D-Prop" which generates DB2 entries and automatically updates COBRA. Another systems workflow worth mentioning is the Property system dataflow. Figure 4 shows an example of the Property system dataflow. The Property system maintains TIC's loss masterfile for fire, homeowners, and inland marine losses for personal and non-personal lines of coverage. Daily transactions such as registrations and revisions are passed from the ACES system to the initial balance and edit system, which edits the incoming transactions. Daily financial transactions are also passed from the BAR system. These transactions are also edited and balanced to ensure that all money passed to and from the system is accounted for.

2.3 Problem Analysis of Present IS Structure

While the IS department is striving to achieve vast information improvements, room for improvement remains within the present system structure. The problem issues within TIC's present information systems structure are broken down into four categories: (1) data quality, (2) data access, (3) data transfer, and (4) data semantics.

2.3.1 Data Quality

A great deal of the initial data entered into the bottom-level databases (i.e., BOCOMP, ACES) is shared later with several different systems from varying departments, as the workflow examples show. Information input by certain departments (e.g., Claims, which enters most of the data) is often tailored specifically to that department, without taking into account later use by different departments. This lack of "datacare" leads to missing information which may be imperative to other departments and results in poor data quality.

For a more vivid example, consider the Workers' Compensation Claim workflow in Appendix A. Suppose that the Claims department inputs specific claims information on a recent fire suffered by a client. There may be casualty information (e.g., lives taken, time of day, fire alarm performance) that is left out by the Claims representative simply because he is not concerned with this information. However, to an actuary, this information is critical. Data quality problems of this type usually come in the form of missing data fields within databases, errors in data entry registration, or errors in claims identification (i.e., is it a registered claim or a reported one?).
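
One way to frame this problem is as a completeness check at entry time against the needs of downstream departments. The sketch below is illustrative only: the field names, the per-department requirements, and the validate_claim helper are assumptions made up for this example, not part of TIC's actual systems.

```python
# Hypothetical map of which claim fields each downstream department needs.
REQUIRED_BY_DEPARTMENT = {
    "Claims":    {"claim_id", "policy_id", "loss_date", "paid_amount"},
    "Actuarial": {"claim_id", "loss_date", "lives_taken",
                  "time_of_day", "fire_alarm_performance"},
    "Financial": {"claim_id", "policy_id", "paid_amount", "reserve"},
}

def validate_claim(record: dict) -> dict:
    """Return, per department, the fields missing from this claim record."""
    present = {name for name, value in record.items() if value not in (None, "")}
    return {dept: sorted(needed - present)
            for dept, needed in REQUIRED_BY_DEPARTMENT.items()
            if needed - present}

# A Claims representative enters only what Claims itself needs:
fire_claim = {"claim_id": "C-1021", "policy_id": "P-88",
              "loss_date": "1991-02-14", "paid_amount": 12500.00}
print(validate_claim(fire_claim))
# {'Actuarial': ['fire_alarm_performance', 'lives_taken', 'time_of_day'],
#  'Financial': ['reserve']}
```

A check of this kind, run at capture time, would surface the fields that the Actuarial or Financial departments would otherwise later find missing while the claim is still in front of the representative.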

2.3.2 Data Access

Corporate information is spread across multiple databases, which can make accessing the appropriate data for complex report generation extremely difficult in some cases. The lack of a central data catalog also leads to time wasted in attempting to locate certain pieces of corporate data. Data access problems can also arise from the lack of organization of data at the user level, causing difficulties in accessing accurate, aggregated information.

2.3.3 Data Transfer

The workflow examples show the need for different departments to constantly share and update each other's information. Poor connectivity between the multiple relational databases makes the transfer of data between departments difficult. In many cases, data must be transferred via hardcopy between departments.

[Figure 4: Property System General Data Flow. Diagram not reproduced in this text version; labels include automated sources, manual sources, Staff Legal Expense, Claims, Statistical, and Loss Billing.]

2.3.4 Data Semantics

There are several variables and terms used at TIC (and at insurance companies in general) that have different definitions, formulas, or interpretations depending on which department is interpreting the data. These multiple definition variables become problematic within databases when information is shared between departments and the variables are viewed and used differently by the various departments. This leads to inaccurate report information which can propagate throughout several departments.

In addition to these multiple definition variables, there are also concerns with terms whose definitions have changed over time and with changes in the industry. While some departments have adapted to the definition changes, others have not, and this can lead to the same types of misinformation problems as with multiple definition variables. TIC's Information Systems department developed a "data dictionary" to work within its system architecture, aimed at ordering and defining variables to reduce these types of data problems. Chapter 3 discusses ways of handling the data semantics issues as well as those of data quality, access, and transfer. Although TIC solved some of its data semantics issues with the data dictionary, some remained; this was part of the reason why a new information system design was required.


CHAPTER 3

This chapter analyzes TIC's present solutions to the types of problems described in Chapter 2. Particular focus is given to problems dealing with data quality and data semantics. Systems problems which are left unresolved serve as the basis for this chapter.

Present Solutions to Information Systems Problems

3.1 Data Quality

A great deal of the blame for the data quality problems described in Chapter 2 is attributed to the Claims department. The Claims department inputs most of the information to several of the bottom-level and upper-level databases, and problems such as missing data fields within databases and improper registration can usually be traced back to the Claims department. Part of the problem came with the change in regulation of the industry: higher productivity demands placed on Claims department representatives led to carelessness and a lack of consideration for other departments in data entry. Several IS executives feel that an education process is necessary for Claims representatives so that they will be more responsible and comprehensive in their data entry techniques.

While there are no statistical processes to track data quality presently, there are monthly reviews to check for correct data coding, as well as data auditing reviews carried out by a couple of departments that re-examine entered data. Data quality problems can also be the result of inconsistencies between database systems; editing differences between HOCOMP and BOCOMP are a common example pointed out by IS executives. HOCOMP rounds its figures, while BOCOMP truncates them. Presently, the only control on data quality problems is the responsibility of the input departments (i.e., Claims, Financial); the problem is not otherwise being dealt with.
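
The rounding-versus-truncation mismatch is easy to reproduce. The sketch below is illustrative only; it assumes whole-dollar rounding and truncation for the two conventions, since the actual HOCOMP and BOCOMP edit rules are not documented here.

```python
import math

def rounding_edit(amount: float) -> int:
    """Round to the nearest whole dollar (the rounding convention)."""
    return int(math.floor(amount + 0.5))

def truncating_edit(amount: float) -> int:
    """Drop the cents entirely (the truncation convention)."""
    return int(math.floor(amount))

payments = [100.75, 250.50, 19.99, 42.49]

rounded   = sum(rounding_edit(p) for p in payments)    # 101 + 251 + 20 + 42 = 414
truncated = sum(truncating_edit(p) for p in payments)  # 100 + 250 + 19 + 42 = 411

print(rounded, truncated, rounded - truncated)         # 414 411 3
```

Two systems that each believe they hold the same figures can therefore disagree by a few dollars on every aggregate, and the discrepancy grows with the number of records summed.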

3.2 Data Access

The data access problems described in Chapter 2 are a function of the multiple data collection systems and their organization, as well as the nature of the insurance industry. These problems will not be solved without the design of cohesive systems which reflect the business workflows within The Insurance Company. Today, TIC uses a system called PRISM, which is a hardcopy request form for business users to send information requests to the IS department. With this system, however, time is usually lost because analysts must frequently contact the requester to clarify what is being requested. The PRISM system is not presently meeting the access needs of the users.

Presently, the Corporate Database has seen access improvements and complexity reduction; it allows Corporate Research users to access relational production databases directly using the Structured Query Language (SQL). Other departments have refused to use SQL for data access because of its lack of user-friendliness. While other attempts have been made at reducing the use of HOCOMP (e.g., the D-Prop tool described in the Present Systems Analysis section of Chapter 2) and thereby reducing complex data accessing procedures, these attempts have been inadequate.

3.3 Data Transfer

Similarly, the data transfer problems described in Chapter 2 are a result of the nature of the insurance industry. Workflows require information to travel complex paths through different departments for report generation. Attempts have not been made to reduce the amount of information transferred via hardcopy. A resistance to changing individual work habits is a barrier to moving data transfer from hardcopy to on-line information transfer. Individuals who have analyzed statistical receipts on paper for years usually do not want to be trained to perform the same task via computer. Organizational barriers such as this must be overcome, perhaps via employee education as to the need for increased productivity and for changes in business task procedures.

3.4 Data Semantics

The problem of multiple and changing definition variables is one that has not been resolved at TIC despite concentrated efforts. Here are some examples of multiple definition variables at TIC:

TABLE 1

Reserve: Defined as a loss cost estimate (Claims department) or the amount left to be paid on a claim (Actuarial department).

Net Written Premium: Defined as premium net of dividend returned. It gets calculated differently by different departments; "net" is calculated differently by department.

Claim Count: Defined as the number of claims open within a department (Actuarial and Financial definition), or the total number of claims within a department, both registered and non-registered (Claims definition).

Earned Premium: Variable which is calculated in three different manners by the Actuarial, Financial, and Corporate Research departments.

Linecode: Formally represents a line of business (Actuarial, Claims, Financial) and is used to identify specified report information. Different departments take different businesses into account when referring to a particular linecode, which leads to incorrect, non-synchronized reports.

Standard, Retro Premium: Final premium and retroactive premium, respectively. The formulas used to generate them vary depending upon department.

Incurred Losses: Amount paid by LM to casualty bearers plus "reserves". Note that reserve has multiple definitions.
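
One way a dictionary can make such multiple definitions explicit rather than implicit is to record every departmental reading of a term side by side, so that a report can at least state which definition it used. The sketch below illustrates that idea; the class and the sample entries are assumptions for this example, not TIC's actual dictionary format.

```python
from dataclasses import dataclass, field

@dataclass
class DictionaryTerm:
    """A business term with one definition per department that uses it."""
    name: str
    definitions: dict[str, str] = field(default_factory=dict)

    def define(self, department: str, text: str) -> None:
        self.definitions[department] = text

    def lookup(self, department: str) -> str:
        # Fail loudly instead of silently picking some department's definition.
        if department not in self.definitions:
            raise KeyError(f"No {self.name!r} definition registered "
                           f"for the {department} department")
        return self.definitions[department]

reserve = DictionaryTerm("Reserve")
reserve.define("Claims", "A loss cost estimate.")
reserve.define("Actuarial", "The amount left to be paid on a claim.")

print(reserve.lookup("Actuarial"))   # The amount left to be paid on a claim.
```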

3.4.1 WEBSTER

Each of these terms is important, and their accuracy is crucial to almost all of the departments within The Insurance Company. About two years ago, TIC developed a data dictionary, called WEBSTER, to prioritize multiple definition variables such as the ones in Table 1. WEBSTER was also designed to communicate business requests from different departments to the system in order to provide integrated service and information to customers. A year after its development, WEBSTER became defunct and was no longer used, for several different reasons. WEBSTER was recently evaluated to determine its compatibility with TIC's proposed system structure. The assessment pointed to several areas where incomplete, out-dated, and inaccurate information existed. WEBSTER was assessed on the following criteria, which are all considered important to the overall quality of the dictionary as a resource to the organization:

- Names must conform to current naming standards and abbreviation conventions.

- All tabled data in the Corporate Database must have corresponding information in the dictionary; conversely, all data elements in WEBSTER should match tabled data on the Corporate Database.

- Definitions should reasonably reflect clear and distinct statements of a term's meaning.

- Supplemental data must be up-to-date and useful, and provide valuable, pertinent information.

WEBSTER's major problem areas included:

- Several terms need to be renamed based upon the new naming standards enforced by TIC. The names of several current terms were not updated when the standards were redefined.

- Definitions which were not complete. Many definitions simply restated the original term.

- Several data elements exist on the CDB without corresponding names and definitions in WEBSTER.

- Conversely, data elements were found in WEBSTER that did not correspond to Teradata (CDB) data.
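
The two-way correspondence called for in the assessment criteria (every CDB data element named in the dictionary, and no dictionary entry without a CDB counterpart) amounts to a pair of set differences and could be audited mechanically. A minimal sketch, with hypothetical element names standing in for the real CDB catalog and WEBSTER contents:

```python
# Hypothetical inputs: element names catalogued on the Corporate Database
# and data element names carried in the WEBSTER dictionary.
cdb_elements     = {"CLAIM_ID", "LINECODE", "NET_WRITTEN_PREMIUM", "RESERVE"}
webster_elements = {"CLAIM_ID", "LINECODE", "EARNED_PREMIUM", "RESERVE"}

undocumented = cdb_elements - webster_elements   # on the CDB, not in WEBSTER
orphaned     = webster_elements - cdb_elements   # in WEBSTER, not on the CDB

print("CDB elements missing from the dictionary:", sorted(undocumented))
print("Dictionary entries with no CDB data:", sorted(orphaned))
```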

There are other reasons for WEBSTER's dysfunction. Aside from its misinterpretation problems, WEBSTER was found to be incompatible with certain systems, and it did not fit in with the direction in which the IS department was trying to move. One IS executive explained that WEBSTER did not have a good "public relations" agent within the department; WEBSTER obtained a bad reputation for misinformation soon after it became active.

As for the handling of multiple definition and other data semantics problems, IS executives state that presently they "wing it" and hope that experience will reduce problems of misinformation. In some cases, certain corporate reports are checked over as many as eight times by four different departments to ensure correct variable usage. In Chapter 4 we will see TIC's conceptual model for a new information systems architecture and how it plans to solve problems such as those stated in this chapter.


CHAPTER 4

This chapter looks at TIC's proposed system architecture. This architecture was designed to solve the unresolved problems left by the present system. The chapter discusses the way in which the proposed system will be developed and how it will operate to solve the unresolved problems we saw in Chapter 3.

The Insurance Company's Proposed System Architecture

4.1 The Corporate Data Resource (CDR) Access Facility

Given many of the problems examined in the last chapter, the Information Systems department set out in October of 1990 to develop a new system architecture which would enable users to directly access a single data resource that is integrated, reliable, and consistent. Let us briefly review the business issues which created the need for a new system structure:

- Several disparate systems, both manual and automated, collect data. It is difficult to find data and ensure its accuracy.

- Lack of a consolidated dictionary of data defined by business persons. It is difficult to communicate business requests.

- Lack of a central data catalog. A large amount of time is necessary to determine whether data exists and where it is located.

- Lack of integration and organization of data on the user level. It is difficult to deliver integrated service and information to users.

- Lack of a process for common definition of data. This results in multiple definitions for similar variables and poor data quality.

- Business processes have become outdated. Many business processes and systems were designed years ago to support business as it was at that time. Many of these processes and systems are ineffective in supporting business today.

- Technological changes have enhanced the business processes of competitors, customers, and regulating agencies. This results in an increased demand for data, required in much shorter time frames.

With these business issues in mind, the IS department began to develop a system structure aimed at providing the business user with a uniform data resource which would maximize user performance. The IS department established a set of business objectives and system objectives to serve as standards for the new system architecture.

4.1.1 Business Objectives

The IS business plan outlines strategies that support the primary corporate objective of reducing the loss and expense ratio. The business plan breaks this primary objective down into supporting SBU, Corporate, and Field Center objectives. The following objectives from the business plan relate to the CDR effort:

- Deliver integrated office automation systems. Automated offices will include automated information networking and sharing via electronic mail and other tools. This transfer and sharing of information will depend on integrated and consistent data.

- To provide information that helps create and manage marketing and product strategies. Creative market and product strategies will cross department boundaries and require innovative solutions. These innovative solutions will allow for competitive advantage gains and will require access to integrated data.

- Development of loss/expense management tools. Implementation of systems that will track and control expenses. TIC will soon divide itself into new profit centers and target new areas for business. This will require systems that allow SBUs to identify profitable geographic areas and seek out desirable customers.

- Productivity improvements. Provide systems that will reduce dependence on costly technical and human resources.

- Data consistency. Systems are needed to improve the quality, integrity, and availability of corporate data.

4.1.2 System Objectives

The IS department laid down the following system objectives to ensure that the Corporate Data Resource is understood and accessible and provides consistent and reliable results across business functions.

- Catalog/Dictionary. Provide a data catalog and dictionary so that business users are aware of all information resources available. This will ensure that users do not redevelop information that already exists.

- Access Mechanism. A mechanism to provide users with the ability to satisfy 95 percent of their information resource requests from the Access Facility with no IS department involvement. The mechanism will use icons and menus familiar to non-systems professionals (pull-down menus, point-and-click icons, and other user-friendly devices).

- Data Consistency. Use of the CDR data dictionary to ensure consistent names and descriptions on all data derived from the Access Facility. The data dictionary will register all changes to the methods used in deriving data so as to inform users of all possible inconsistencies in their information.

- Order Management. Provide optimum service to business users by developing an order management function which will track information requests, prioritize them according to business needs, and deliver them in specified formats.

- Performance Management. Ensure that the system continues to meet the needs of the business users by analyzing performance statistics generated during Access Facility activities.


4.2 Strategic Systems Architecture

The Corporate Data Resource is to be based on a strategic systems architecture (Figure 5) with five conceptual layers: a user interface layer, an applications layer, a utilities layer, a data layer, and a network layer. The network layer includes telecommunications hardware, network management software, and communications software (it will be called the TIC Integrated Network). The data layer and the utilities layer will work in conjunction with the CDR Access Facility and will be discussed in further detail. The applications layer includes all applications and data programs which use the accessed data. The user interface layer consists of desktop software and will be managed by the user.

The five layers allow separation of the design into modular units which can be optimized for specific purposes while at the same time being shared by multiple groups or departments. New systems or technologies will be able to be added to the layers and made to interface with existing systems on other layers so long as the new systems conform to the interface specifications of the new architecture. The current system structure has several independent systems that would operate on different layers. Let us now look more closely at the data and utilities layers of the proposed system architecture.

4.2.1 Proposed Four-Level Information Architecture

Information on the data layer of the CDR Access Facility will be broken down into four levels which help to define the data's function as well as the function and interfacing of the databases involved with data access. Figure 6 shows TIC's Four-Level Information Architecture, in which data is broken down into the following levels:

4.2.1.1 Level 1-Transaction Stores

These will be operational data status stores to be accessed by data capture systems. These systems will focus on data capture from various sources. LM will have operational systems which will pick up this data. The systems will be optimized for the performance of the local businesses.

4.2.1.2 Level 2-Data Warehouse

The data warehouse will store all master copies of historical detailed data. This data will not be optimized for any one function; it will be raw data stored in a sort of clearing house. Level 2 data will be recombined in several ways for different business purposes in Level 3. Level 2 data will remain in a relational database, but the data will not be totally normalized.

4.2.1.3 Level 3-Information Stores

Information stores will contain collections of data from the data warehouse. This information will be structured for business requirements. This aggregated data will exist for specific information processing needs which cannot be met by Level 2. Level 3 data will be provided for decision support, executive information systems, and reporting. Reports will be based upon user group requests, which will be evaluated by the IS Access Facility staff. Report requests will include data definitions, reporting frequency, and format. The reports can be special temporary reports which are generated once, or standing requests which are generated at a user-defined frequency specifying time of week or month. The reports are extracted via aggregated data from the data warehouses and will be recorded in cache databases.
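
A report request of this kind therefore carries a definition of the data wanted, an output format, and either a one-time or a standing frequency. The record sketched below is an assumption made for illustration, not the Access Facility's actual request format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReportRequest:
    """A Level 3 information-store request, as described in the text."""
    requester: str                   # business user or group
    data_definition: str             # which dictionary terms / selections
    output_format: str               # e.g. "table" or "report"
    frequency: Optional[str] = None  # None means a one-time special report;
                                     # otherwise e.g. "weekly" or "monthly"

    @property
    def is_standing(self) -> bool:
        return self.frequency is not None

loss_summary = ReportRequest(
    requester="Actuarial",
    data_definition="Incurred losses by linecode, year to date",
    output_format="report",
    frequency="monthly",
)
print(loss_summary.is_standing)   # True
```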

[Figure 5: TIC's Strategic Systems Architecture, showing the User Interface, Applications, Utilities, Data, and Network layers and the Corporate Data Resource Access Facility. Source: The Insurance Company. Diagram not reproduced in this text version.]

[Figure 6: TIC's Four-Level Information Architecture, showing the Transaction Stores, Data Warehouse, Information Stores, and Personal Information levels. Diagram not reproduced in this text version.]

4.2.1.4 Level 4-Personal Data

This is user-created data which is to be maintained by the user and his department. An example of Level 4 data is information stored on the hard disk of a user's desktop computer. This data is to be managed by the user and will not be subject to the rules and policies of the CDR.

If we look back at the systems architecture diagram in Figure 5, we see that the Access Facility will work on both the data and the utilities layers of the architecture. The utilities layer will provide a framework for storing data in an organized, efficient, and meaningful manner. This layer will also provide data integrity and security features which we will look at later in this chapter.

Looking at our information model in Figure 6, one can see that the flow of the data is in an upward motion, that is, from Level 1 (raw captured data) to Level 4 (user-maintained data). Briefly, data is to move conceptually as follows: (1) Data will be entered through Level 1 (data transaction stores). (2) These data stores feed the raw data into the data warehouses (Level 2). (3) Level 2 information is aggregated according to user request and stored in reporting databases in Level 3; aggregated data can be stored in either table or reporting format as an ad-hoc database in Level 3. (4) Table- and report-formatted data is accessed by users and taken to personal information databases (Level 4). The question of how and where the present-day databases come into play in the four-level information architecture requires some analysis.
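
Read as a pipeline, the four steps above move a record upward one level at a time. The sketch below is a conceptual illustration of that flow under assumed data shapes and function names; it is not TIC's design.

```python
from collections import defaultdict

transaction_store = []   # Level 1: captured operational records
data_warehouse    = []   # Level 2: master copies of detailed data
information_store = {}   # Level 3: aggregates built per user request

def capture(record: dict) -> None:
    """Level 1: a data capture system records a raw transaction."""
    transaction_store.append(record)

def feed_warehouse() -> None:
    """Level 2: transaction stores feed their raw detail into the warehouse."""
    data_warehouse.extend(transaction_store)
    transaction_store.clear()

def build_information_store(request: str, group_by: str, measure: str) -> dict:
    """Level 3: aggregate warehouse detail for a specific business request."""
    totals = defaultdict(float)
    for row in data_warehouse:
        totals[row[group_by]] += row[measure]
    information_store[request] = dict(totals)
    return information_store[request]

# Level 4 would simply be a user copying a Level 3 result to personal storage.
capture({"linecode": "AUTO", "paid": 1200.0})
capture({"linecode": "FIRE", "paid": 300.0})
capture({"linecode": "AUTO", "paid": 450.0})
feed_warehouse()
print(build_information_store("paid_by_line", "linecode", "paid"))
# {'AUTO': 1650.0, 'FIRE': 300.0}
```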

An interesting point to note here is that there are differing views within the IS department as to the role of the present-day databases within the proposed system architecture. This lack of a common understanding within the IS department will lead to difficulties in systems development in the future.

4.2.2 Proposed Architecture System Flow (Figure 7)

Level 1 reflects the type of information you would find in the BOCOMP or ACES database systems. Raw data will continue to be input both manually and on-line to bottom-level database systems until automation of the data capture systems is complete; this will eliminate the need for databases such as BOCOMP and ACES. The proposed system will contain multiple capture systems (Level 1) for data capture from databases at varying stages of development (e.g., an IMS capture system such as the D-Prop example, a DB2 capture system, etc.).

Level 2 will contain all corporate detailed data. All database systems will somehow feed into the Level 2 data warehouses. The IS department proposes to accomplish this with a CDR input system which will take all of the data from the data capture systems. The CDR input system will also take information from upper-level databases such as LAWPACK and the Loss Reserve database and input this information into the data warehouses (Level 2). Functions that were previously performed by upper-level relational databases such as HOCOMP and COBRA will be replaced by Level 3 of the CDR and the CDR output system. The reporting information that was held and distributed by HOCOMP and ACES will become a part of the information stores of Level 3. Level 3 will allow for on-line inquiries for aggregated and summarized data, as HOCOMP and ACES do presently. Figure 7 shows an updated version of Figure 2 under the new system structure. As stated before, there are differences in conception as to which presently existing databases will be present within the CDR.

[Figure 7: Proposed Architecture System Flow. The diagram is not legible in this text version; recoverable labels include the Unit/Workfile correction systems, monthly Staff Legal Expense, IMS change capture (D-Prop), the Level 2 CDR, and Level 3 DB2 claim changes.]

While Level 2, the data warehouse, is supposed to be a successor to the Corporate Database, there are individuals both within IS and Corporate Research who are of the position that the CDB will be a part of the Level 3 architecture. On the other hand, systems architects have informed me that the CDB will get a "bullet in the brain" and not take part in the new architecture. Some current aggregated tables and historical tables will be retained or moved into Level 3, but the CDB as a whole is supposed to disappear (e.g., loss reserve information in the CDB will be moved into Level 3, but soon thereafter loss reserve information will be built from the data warehouse rather than from an independent system). There is a similar discrepancy of opinion as to the future role of the HOCOMP system. While Figure 7 implies that HOCOMP's functions will be reassigned to Level 3 functions of the CDR, individuals in IS have spoken of "when the HOCOMP system is operating under Level 3". IS needs to come to a uniform understanding of how this new system is going to operate in relation to the present one; otherwise similar confusion can perpetuate throughout the business divisions of the firm.

4.3 Data Dictionary/Data Repository

One of the explicit system objectives of the CDR Access Facility is to provide data consistency by means of a data dictionary which will "ensure consistent names and descriptions on all data derived from Access Facility and register all changes to methods used in deriving data so as to inform users of all possible inconsistencies in their information". Presently, within the IS department, there is a CDR Data Dictionary team which maintains present data definitions and descriptions and is collecting requirements for a new, more sophisticated data dictionary. The Data Dictionary team set forth the following objectives for the new data dictionary:

- To create conceptual data and process models for the target Data Dictionary environment.

- Graphic representation of the dictionary environment using a conceptual and process data model. Textual documentation describing each object, down to the entity level, of the dictionary system and sub-systems will accompany the model.

- Documentation of standards and procedures for using and maintaining the dictionary environment.

- Descriptions, definitions, and capture of dictionary components identified in and by the release plan.

Rather than a strict data dictionary, TIC plans to develop a data repository. The dictionary/repository is a critical part of the way in which IS plans to improve upon the data quality and data semantics of the CDR. The dictionary/repository will contain the descriptions and definitions normally found in a data dictionary, and will also contain the data model, a history of variable definitions, and an inventory of information system resources.

The dictionary/repository will run on the CDD/Plus repository. This is a Digital product which has the ability to resolve and create data structures for COBOL and to tie information together from various platforms. Applications on the user level will make use of metadata to locate data in the data warehouse, as opposed to using a table of definitions maintained by application developers. The dictionary/repository is to store all definitions for data defined on the CDR. The Digital CDD/Plus was chosen because of its compatibility with the data model (the way in which the model is graphically shown) being used for the Access Facility, the Enterprise Data Model.
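
To make this metadata-driven approach concrete, the sketch below illustrates how a user-level application might look up the warehouse location of a data element through repository metadata instead of a developer-maintained definition table. The field names, repository entries, and lookup function are hypothetical illustrations of my own, not part of TIC's or Digital's actual designs.

```python
# Hypothetical sketch: locating warehouse data through repository metadata
# rather than a hard-coded table of definitions maintained by developers.

METADATA = {
    # field name -> description of where and how the data is stored (illustrative only)
    "earned_premium": {
        "description": "Premium earned over the policy period",
        "location": "LEVEL2_WAREHOUSE.PREMIUM_TABLE",
        "format": "DECIMAL(12,2)",
    },
    "claim_count": {
        "description": "Number of registered claims",
        "location": "LEVEL2_WAREHOUSE.CLAIM_TABLE",
        "format": "INTEGER",
    },
}

def locate(field_name):
    """Return the warehouse location of a field, as recorded in the repository."""
    entry = METADATA.get(field_name)
    if entry is None:
        raise KeyError(f"No repository definition for field '{field_name}'")
    return entry["location"]

print(locate("earned_premium"))   # LEVEL2_WAREHOUSE.PREMIUM_TABLE
```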


The dictionary/repository will work on Levels 1, 2, and 3 of the information architecture; in other words, all who use the CDR will use the dictionary/repository. The dictionary/repository's action on Level 1 is most important. Developers of Level 1 databases must make certain that their definitions adhere to the corporate standards which will be set in the repository. The "raw data" definitions are the most important in that they are the key to the proper location of the desired data. Business group users will use the repository first to locate the appropriate data within the system, and then later to properly interpret data received from data inquiries and system generated reports.

Perhaps the most highlighted feature of the dictionary/repository is that it will deliver to users an inventory of previously requested reports and data in order to prevent the regeneration of the same data or report.

4.4 Conceptualized Solutions to Existing Problems

4.4.1 Data Quality

Many of the data quality problems being experienced by TIC today would be handled successfully by the proposed system architecture. Poor data quality presently has its roots in the carelessness of those who input to the bottom-level databases. Conceptually, with the information architecture, every piece of data entered into a Level 1 system will be entered through an authorized process which will be recorded in the data repository. The effect of this process is to identify a business group and manager for each data entry process. This way data quality issues of this type can be traced to the owner of the process. This is potentially a viable means by which to track data quality.
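
A minimal sketch of the authorized-entry idea follows: each Level 1 entry is accepted only through a registered process, and the repository records the owning business group and manager so that quality problems can be traced back to an owner. The process names, owners, and logging structure are assumptions made for illustration.

```python
# Hypothetical sketch: every Level 1 data entry is made through an authorized
# process, and the repository records which business group and manager own
# that process, so quality problems can be traced back to the owner.

from datetime import datetime

ENTRY_PROCESSES = {
    # process id -> owning business group and manager (illustrative values)
    "CLAIMS_UNIT_INPUT": {"group": "Claims", "manager": "J. Smith"},
    "COVERAGE_UNIT_INPUT": {"group": "Underwriting", "manager": "A. Jones"},
}

audit_log = []

def enter_data(process_id, record):
    """Accept a record only through a registered process and log its owner."""
    owner = ENTRY_PROCESSES.get(process_id)
    if owner is None:
        raise PermissionError(f"'{process_id}' is not an authorized entry process")
    audit_log.append({
        "process": process_id,
        "group": owner["group"],
        "manager": owner["manager"],
        "entered_at": datetime.now().isoformat(),
        "record": record,
    })

enter_data("CLAIMS_UNIT_INPUT", {"claim_id": "C-1001", "status": "registered"})
```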

Presently data is often input using "edits". I have interpreted these to be programs or system code used by the present database system to check data for completeness. However, certain edits are stronger (check data more stringently) than others, which means that some systems cannot control or adequately restrict the data that is entered. With the CDR, uniform edits will be placed between Level 1 and Level 2 databases, according to a TIC database manager. Data that does not meet corporate standards will not be accepted by the data warehouse.
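
The sketch below illustrates one way such a uniform edit might behave: a record failing any corporate-standard check is rejected before it reaches the Level 2 warehouse. The required fields and rules shown are hypothetical, since TIC has not specified the actual edit criteria.

```python
# Hypothetical sketch of a uniform "edit" applied between Level 1 and Level 2:
# a record that fails any corporate-standard check is rejected before it
# reaches the data warehouse. Field names and rules are illustrative only.

REQUIRED_FIELDS = ["claim_id", "policy_number", "line_of_business", "loss_date"]

def uniform_edit(record):
    """Return a list of edit failures; an empty list means the record passes."""
    failures = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            failures.append(f"missing value for '{field}'")
    if record.get("paid_loss", 0) < 0:
        failures.append("paid_loss may not be negative")
    return failures

def load_to_warehouse(record, warehouse):
    failures = uniform_edit(record)
    if failures:
        raise ValueError("Rejected by uniform edit: " + "; ".join(failures))
    warehouse.append(record)

warehouse = []
load_to_warehouse(
    {"claim_id": "C-1001", "policy_number": "P-77", "line_of_business": "WC",
     "loss_date": "1991-02-15", "paid_loss": 1250.00},
    warehouse,
)
```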

Another source of data quality problems presently lies in the multiple disparate sources of corporate data. Theoretically, under the new system, all detailed data will be in the Level 2 data warehouses.

The data quality problem of regeneration of data is somewhat addressed by the proposed system. In certain instances duplicated and regenerated reports are necessary because certain data changes with time, which means that these time-dependent reports must be regenerated regularly. The problem then becomes not one of time lost in regenerating a previously generated report, but one in which one report of several similarly named reports is taken to be the most recent when in reality it is not. A simple example of this is the Updated Claim Count Report. While the CDR does not provide a built-in solution to this problem, systems architects at TIC state that they will try to overcome it with the use of a time and date stamp on each report that comes from a Level 3 information store database. While Level 2 is intended to be the sole source of data, it has not been determined how this proposed solution will work. The "time-dependent" report issue is one still to be resolved at TIC in regard to the proposed system.
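
The following sketch illustrates the time-and-date-stamp idea: each generated report carries a timestamp, and the most recent of several similarly named reports can then be selected. This is my own illustration of the architects' suggestion, not a documented TIC mechanism.

```python
# Hypothetical sketch of the time-and-date-stamp idea mentioned by the
# systems architects: each Level 3 report carries a generation timestamp,
# so the most recent version can be identified among similarly named copies.

from datetime import datetime

def generate_report(name, body):
    return {"name": name, "generated_at": datetime.now(), "body": body}

def most_recent(reports, name):
    """Pick the latest report with the given name."""
    candidates = [r for r in reports if r["name"] == name]
    return max(candidates, key=lambda r: r["generated_at"]) if candidates else None

reports = [
    generate_report("Updated Claim Count Report", "...January figures..."),
    generate_report("Updated Claim Count Report", "...February figures..."),
]
latest = most_recent(reports, "Updated Claim Count Report")
```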

4.4.2 Data Access

Data access problems presently come from complex workflows coupled with the lack of a central data catalog and data source. While the complex workflows won't change, due to their roots in the industry, the Level 2 data warehouse provides the single data resource, and the proposed dictionary/repository will provide the data catalog needed to give business users access to accurate data. Access problems caused by the lack of data organization on the user level cannot be controlled by the CDR.

4.4.3 Data Transfer

The data transfer problems which I described in Chapters 2 and 3 are very much a result of industry workflow and the nature of the insurance industry. The use of the CDR can reduce the amount of hardcopy transfer of data, dependent upon the willingness and ability of business users to adequately use the Access Facility. The main organizational barrier to changes in data transfer is the resistance of individuals to change their work habits. The CDR can only provide an alternative to data transfer via hardcopy. It is up to the business users to take advantage of the Corporate Data Resource.

4.4.4 Data Semantics

The dictionary/repository of the proposed system architecture provides some conceptual solutions to the semantics problems discussed in Chapters 2 and 3, but these solutions seem harder to conceptualize given the nature of the semantics problems. When asked how the proposed system would handle the many multiple-definition variables which are present in several databases, systems managers and systems architects tell us that common definitions will be determined on a case-by-case basis at first. A systems architect tells us that for certain cases a process will be developed to examine how each of these multiple-definition elements is derived, and then to develop an appropriate name for each derived element. For example, the variable "standard premium" will probably have six or more derived elements for the various ways in which it is interpreted (i.e., standard_premium_final as opposed to standard_premium_mean). In cases of more widespread variables such as "claim count" or "earned premium", corporate standards will be imposed for use across all departments.
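
A brief sketch of the derived-element approach follows, showing how distinctly named derived elements and their derivation rules might be registered in the repository. The element names beyond those mentioned above, and the derivation descriptions, are illustrative assumptions rather than TIC's actual definitions.

```python
# Hypothetical sketch of the "derived element" idea: a multiple-definition
# variable such as "standard premium" is split into distinctly named derived
# elements, each recorded with the rule used to derive it.

DERIVED_ELEMENTS = {
    "standard_premium_final": "standard premium after all rating adjustments",
    "standard_premium_mean": "average standard premium over the policy term",
}

def register_derived_element(catalog, name, derivation_rule):
    """Add a derived element and its derivation rule to the repository catalog."""
    if name in catalog:
        raise ValueError(f"'{name}' is already defined")
    catalog[name] = derivation_rule

register_derived_element(
    DERIVED_ELEMENTS,
    "standard_premium_manual",
    "standard premium taken directly from the rating manual",
)
```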

Business users will be supplied with a historical table of all changes in the definitions of variables within the dictionary/repository. This feature will make users aware of current variable definitions so as to control data quality.
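
The definition-history feature could be sketched as follows: each definition change is appended with an effective date, so both the current definition and its history remain visible to users. The variable and dates shown are invented for illustration.

```python
# Hypothetical sketch of a definition-history table: each change to a variable
# definition is appended with its effective date, so users can see both the
# current definition and how it has changed over time.

definition_history = []

def change_definition(variable, new_definition, effective_date):
    definition_history.append({
        "variable": variable,
        "definition": new_definition,
        "effective": effective_date,
    })

def current_definition(variable):
    entries = [e for e in definition_history if e["variable"] == variable]
    return max(entries, key=lambda e: e["effective"]) if entries else None

change_definition("claim_count", "count of registered claims only", "1989-01-01")
change_definition("claim_count", "count of registered and reopened claims", "1991-03-01")
print(current_definition("claim_count")["definition"])
```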

The data dictionary group is still developing methodologies to handle the problems of multiple and changing definitions in variables. While there was some disagreement over the "proper" manner in which to proceed, two ideas seem prevalent and will probably be followed in determining a method for defining variables across departments:

(1) All database fields must have rules or formulas defined or associated with them, and field histories (the different types of definitions for a common field) must be stored within the repository.

(2) Departments must jointly develop one-to-one relationships between each departmental database field and its associated formulas.

The former notion seems the more expedient solution, in that it will take a great deal of time for departments to get together with IS to determine which variables are the same and which are different. Potential solutions to this type of problem exist but are hard to conceptualize given that the proposed system is solely a conceptual model.


CHAPTER 5

Analysis and Evaluation of The Insurance Company's Proposed Solution

This chapter will analyze the proposed solutions of Chapter 4 to the information systems problems being experienced by TIC. We will find that while conceptually the proposed model seems sound, it is still a proposed model. There are still unresolved issues which will remain even if such a system, as described in Chapter 4, is developed. In addition, there are potential problems that could arise with the proposed system.

5.1 Unresolved Problem Issues

Many of the unresolved issues that will not be properly addressed by the proposed system remain within the data layer of the system architecture.

5.1.1 Data Transfer-Interdepartmental Workflows

Interdepartmental workflows will not be altered by the addition of the proposed system. Workflows will still be dependent upon the nature of the industry. Systems managers indicated that while the data warehouse will greatly enhance data access for departments, there will still be a great deal of hardcopy transfer, which will in turn result in re-entry of data. The unresolved issue of interdepartmental workflows will also lead to potential new problems with report generation and timeliness which I will address later.

5.1.2 Data Quality

As in the case of interdepartmental workflows, data quality problems which emanate from careless data inputting will be dependent upon those entering the data in different departments. Even with the addition of uniform edits to help screen out data which does not conform to corporate standards, there is still potential for poor quality data to be entered into the system. In addition, there is no proposed method of system management of poor data. Without some sort of education process for those entering data, stressing the importance of high-quality data, there may still be some quality problems with data entered into Levels 1 and 2 of the proposed architecture.

With the exception of the two issues just mentioned, the proposed system architecture conceptually will solve a majority of the problems. However, there will be new problems that will derive from the new architecture which will have to be addressed by IS.

5.2 New Problem Issues

The new issues which could potentially develop with a new system such as the one described in Chapter 4 result from concerns dealing with the idea of a single data resource. Database managers described an "input/output bottleneck" effect as one of the major fears with the proposed system architecture. That is, since the data will theoretically be in one place, with everyone attempting to access it at the same time, there will likely be large amounts of down-time and potential system overloads. Some questions of general concern have been:

- How many requests/queries will the system be able to handle at one time?

- How many requests can/will IS be able to field, and how quickly?

- How much efficiency will be lost during the start-up phase of the new architecture?

- How will the dictionary/repository record and manage errors found in reports generated via the Access Facility?


- Will the CDR Access Facility and its data levels be uniformly understood by all departments?

I will divide the discussion of these new problem issues between those affecting data quality/consistency and data semantics (dictionary/repository issues).

5.2.1 Data Quality

5.2.1.1 Time Dependent Reports

The problem of report regeneration, discussed as a data quality issue in Chapter 4, becomes quite serious under the proposed system, as there is no conceptual solution at this point. To recap the problem: in many cases, there are reports (i.e., the Updated Claim Count Report, the Workers Compensation Registered Report, the Number of Accidents Reported (NAR) Reports) that are formed by merging reports from different departments in Level 3 of the proposed architecture. Each of these departmental reports has data that change with time, so they must be updated constantly. What happens when a NAR for Personal Auto Bodily Injury, which is generated from reports from the Claims, Life, and Actuarial departments, is updated by reports from two departments that are current and one which is outdated? Under the proposed system, this would result in incorrect information on the NAR and recurring incorrect information when the report is updated and used again by other departments. Systems architects have expressed an interest in using a time and date stamp for all merged reports to prevent this type of problem, but the problem has not even been realized by all of IS at this time, so no significant thought has been given to a solution.
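
To illustrate how the suggested date stamps could expose this problem, the sketch below compares the ages of the departmental source reports before a merge and flags any outdated source. The departments shown and the allowable age are assumptions of mine, not TIC policy.

```python
# Hypothetical sketch of how date stamps on departmental reports could catch
# the stale-input problem described above: before merging, the ages of the
# source reports are compared and an outdated source is flagged.

from datetime import date, timedelta

def check_merge_inputs(source_reports, max_age_days=31, as_of=None):
    """Flag source reports whose date stamp is older than the allowed age."""
    as_of = as_of or date.today()
    stale = [r["department"] for r in source_reports
             if (as_of - r["generated_on"]) > timedelta(days=max_age_days)]
    return stale

sources = [
    {"department": "Claims", "generated_on": date(1991, 3, 1)},
    {"department": "Life", "generated_on": date(1991, 3, 1)},
    {"department": "Actuarial", "generated_on": date(1990, 11, 1)},  # outdated
]
print(check_merge_inputs(sources, as_of=date(1991, 3, 15)))  # ['Actuarial']
```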

5.2.1.2 Other Data Quality Concerns

Many other data quality issues have been realized under the proposed system. The regeneration of reports from Level 3 information will require a great deal of on-line data for business users. There is a question as to how the Access Facility, or more appropriately IS, plans to manage errors found in original reports which will be carried over from relational databases such as HOCOMP and COBRA, as they will be needed for the startup of the new system (IS executives have differing views as to the future role of present databases within the proposed system architecture). While the proposed system will be able to test data being entered to verify that it adheres to the CDR format, how will the system test for completeness of data entered into Levels 1 and 2 of the information architecture? For example, to what extent will the CDR scrutinize a claim record? Outside of proper registration and identification and the appropriate inputting to database fields, how will the CDR check to make sure that all necessary data is entered and correct? These data quality issues have not been fully realized by the IS department.

5.2.2 Data Dictionary/Data Repository Issues

One of the problems with performing a problem analysis on a "proposed" system is the fact that it is just that: proposed. It does not yet exist. This became apparent in the case of the dictionary/repository issues. I constantly questioned IS: How will the dictionary record newly defined or updated definitions? How will the repository work along with the Access Facility to check inputted data against corporate standards? What type of hardware requirements will you set for the dictionary/repository, which will supposedly hold definitions of all CDR data? What types of inter-departmental meeting plans have been made for the standardization of multiple-definition variables? The answer to these and several other questions regarding the function of the dictionary/repository was: "We don't know. We're looking for ways to develop it". This lack of development focus for the data dictionary/repository is itself a problem issue for the IS department. In light of this, I did a brief investigation of the proposed dictionary/repository to be used for TIC, Digital's Common Data Dictionary Plus (CDD/Plus), to determine which of the proposed dictionary/repository requirements would or would not be satisfied by the CDD/Plus.


5.2.2.1 Digital Common Data Dictionary Plus (CDD/Plus)

The CDD/Plus has the ability to resolve and create data structures for COBOL, as well as tie information together from various platforms. The CDD/Plus enables users to: (1) allow sharing of field, record and other data definitions among various VAX languages, (2) access definitions in multiple dictionaries, (3) reduce the use of inconsistent definitions, and (4) protect against unauthorized access to the dictionary. Below are some of the other highlighted features of the CDD/Plus dictionary/repository which will tailor to the needs of the CDR:

- Distributed Dictionary Implementation. CDD/Plus provides distributed access to the Common Dictionary Operator (CDO) dictionary definitions and directories. CDO metadata format tracks name, description, location, type, format, size, change history, and usage of the actual data. Definitions can be managed either centrally or locally.

- Field-Level Data Descriptions. Field definitions (the smallest unit of metadata which can be created in the dictionary) can be combined to form various record definitions and can be accessed individually. This prevents multiple storage of definitions.

- Relationships. Created by CDD/Plus when users connect two CDO definitions. Users can base definitions of new fields on that of an existing field definition in the CDO. Relationships need not be defined: CDO automatically creates them when field and record definitions are created.

- Usage Tracking. All CDO dictionary usage is recorded so that users can find out which other dictionary definitions make use of particular field definitions. In this way, when users want to change a field definition, they will be able to determine which definitions will be affected by the change and which definitions will need to be redefined before using the changed field definition (see the sketch after this list).

- Data Security and Integrity. CDD/Plus provides the data administrator with the tools to grant or deny access rights to dictionary definitions. CDD/Plus also provides journaling capabilities that automatically protect dictionary sessions from system failures.
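
The sketch below (referred to in the Usage Tracking item above) illustrates the impact-analysis idea: the repository records which record definitions use each field definition, so the effect of a definition change can be determined before it is made. The field and record names are illustrative; this is not the CDD/Plus implementation itself.

```python
# Hypothetical sketch of the usage-tracking idea: the repository records which
# record definitions use which field definitions, so the impact of changing a
# field definition can be determined before the change is made.

USAGE = {
    # field definition -> record definitions that use it (illustrative)
    "policy_number": ["claim_record", "premium_record", "loss_reserve_record"],
    "loss_date": ["claim_record"],
}

def impact_of_change(field_name):
    """Return the record definitions that would be affected by changing a field."""
    return USAGE.get(field_name, [])

print(impact_of_change("policy_number"))
# ['claim_record', 'premium_record', 'loss_reserve_record']
```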

The CDD/Plus repository is a good start for TIC in trying to achieve the repository objectives specified for the proposed dictionary/repository, but there are still issues, even with the usage of the CDD/Plus, that will remain unresolved. In my estimation, the CDD/Plus represents a huge improvement on the dictionary problems which IS experienced with WEBSTER. The CDD/Plus can feasibly handle multiple-definition variable problems, metadata storage and management, and data semantic differences. However, I do not see how the CDD/Plus will work at tracking data quality (in terms of maintaining a history of all data input into the system and assuring its accuracy), tracking generated reports (to reduce regeneration of reports and requested data), and enabling access to different types of aggregated data (Level 3). I do not think that TIC will have a problem meeting the hardware and software requirements necessary to operate the CDD/Plus (primarily due to the close relationship between DEC and LMIC), yet in order to realize the goals previously set forth for the Access Facility, more than the CDD/Plus alone is required.


A final note on new problem issues refers back to the system objective of access to information within shorter time-frames than those of the present. I do not think that this objective will be met with the proposed system. The single data resource structure of the Access Facility may possibly lead to huge time losses in the retrieval of data, in addition to a multitude of additional work for the IS department. As one systems manager stated: "All of the data will be in one place, and the whole company will be trying to access it simultaneously. There will definitely be time factors involved". I foresee this information backup taking place on Level 2 and more so on Level 3 of the information architecture. There will be massive requests for aggregated information due to the recent increased demand for corporate data. At present even the proposed system shows no method of handling and recording all of the different data requisitions. This will only result in lost time and money. In my view there are considerations within TIC's proposed system architecture which must be given a great deal of thought before proceeding with further system development.


CHAPTER 6

Summary and Conclusions

This chapter reviews the case study and documents the results and conclusions at which I arrived. Possible CISL solutions and further CISL assistance are also investigated in this chapter.

6.1 Summary

This case study took a historical look at the different types of problems which have been and are being experienced within the Information Systems department at The Insurance Company. The study begins with an analysis of the types of information systems problems that were realized about four years ago after TIC restructured its organization. It continues on to look at the present system today and how it is handling those same problems. The study then looks at the need for change and the strategy of the IS department in developing a new system architecture to handle the problems which still exist and which attempts to redefine the way the company does business from an information systems standpoint. After this, the study looks at the proposed system architecture and the conceptualized solutions it poses to the problem issues presently identified. The study then analyzes and evaluates these conceptualized solutions in light of future considerations to determine the proposed solutions' potential effectiveness.

6.2 Discussion

There are several improvements that need to be made to The Insurance Company's present information systems structure. Difficult access to corporate data, poor management of data quality, the handling of multiple-definition variables and other data semantics problems, and poor connectivity between present databases are all problems which need to be addressed in order for TIC to operate more efficiently as a company. Present database improvements taking place, such as attempts to place BOCOMP information directly onto the Corporate Database, thereby eliminating the need for HOCOMP, are a start, but more may be necessary for TIC to achieve its objectives.

TIC's proposed system architecture conceptually targets many of the problems with potential solutions. However, the fact that it is solely a proposed model, still in its planning stages, leads to many concerns as to the reality of having an architecture which can perform in the way the Corporate Data Resource supposedly will. Problems with data quality and consistency, as well as data access, are left relatively unsolved by the proposed architecture at this time. Until a more concrete model is presented by the IS department, problems such as data quality management and access to aggregate, report-formatted data will continue to exist within the firm. The proposed development of the dictionary/repository on the CDD/Plus repository is presently the most highlighted feature of the proposed system. But even the CDD/Plus does not answer all of the demands specified by the IS department in their visualization of a dictionary/repository. In addition, I am not even sure whether or not the present database systems will be compatible with the CDD/Plus or if new hardware will have to be purchased in order to transform the proposed system architecture into a reality.

Perhaps one of the most important problems with the proposed system architecture/solution to date is the lack of a uniform understanding within the IS department as to how it will operate. Of the nine people to whom I spoke at The Insurance Company, at least three had conflicting views as to the role present database systems will play once the proposed system is in place. For example, some individuals thought that the Corporate Database would be operating as a Level 3 database, while others said that it would be removed completely from the system, with only its historical tables and reports salvaged. Not only were there conflicting views as to how databases such as HOCOMP, COBRA, and the CDB will operate or not operate under the new system, but there was also confusion as to what level certain databases would be conceived as fitting into. For example, some individuals state that HOCOMP would operate as a Level 2 database, while others viewed HOCOMP as a Level 3 database. Needless to say, there was and is confusion among some of the business departments (Corporate Research) as to exactly how the proposed system will look and operate. In order for the rest of the company to grasp the concepts being presented in the proposed architecture, the designers (the IS department) must agree as to exactly how it is going to look.

One of the underlying objectives which the proposed system seems to want to accomplish is to change the work habits of the business users. This has been nearly impossible at TIC due to the nature of the insurance industry and because of the resistance to change of the business users. Data transfer will remain dominated by hardcopy as opposed to on-line so long as users refuse to learn and appreciate the need for such changes in work habits. Similarly, users want increased, accelerated access to corporate data, but do not want to take the time to learn the means by which to obtain that data (presently SQL). These "people-specific" problems will remain static unless there is some sort of education process to make these users aware of the advantages behind certain changes in work habits.

6.3 Possible CISL Solutions

Assigning potential Composite Information Systems Laboratory solutions to the TIC case is difficult because it is hard to define exactly what it is that TIC needs most at this time. A CIS/TK-type implementation probably is not appropriate in this case, in that TIC is seeking a different type of integration. I think that there is room within both the present system and the proposed system for the application of recent CISL theory development, particularly that of source tagging. The source tagging problem is one which applies to the TIC case in that it deals with users accessing data from multiple databases without prior knowledge as to where the data is. CISL field studies indicate that users want simplicity in making queries, but in addition want the ability to know the source of a data field. CISL research led to the development of a polygen model for studying heterogeneous database systems from a multiple-source perspective. The polygen model is directed at addressing issues of data source, intermediate data sources used in data access, and source tagging for information composition. Knowledge of data sources allows for better interpretation of data semantics and better data analysis. This type of theory, applied to the case of The Insurance Company, could possibly help with data access and semantics problems within the present system, as well as become a basis for data access features within the proposed Corporate Data Resource Access Facility.
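
The sketch below illustrates the spirit of source tagging: each value carries the databases it was drawn from, and composed values accumulate the source lists of their inputs. This is a simplified illustration of the idea rather than the polygen model itself; the values and database names are invented.

```python
# Hypothetical sketch of source tagging in the spirit of the polygen model:
# each value returned to a user carries the database (and any intermediate
# source) it came from, so the data's origin is known at composition time.

def tagged(value, sources):
    """Wrap a value with the list of sources it was drawn from."""
    return {"value": value, "sources": list(sources)}

def compose(a, b, combine):
    """Combine two tagged values; the result is tagged with both source lists."""
    return tagged(combine(a["value"], b["value"]), a["sources"] + b["sources"])

paid = tagged(125000.00, ["BOCOMP"])
reserve = tagged(40000.00, ["Loss Reserve database"])
incurred = compose(paid, reserve, lambda x, y: x + y)
print(incurred)   # the combined value plus the sources it was composed from
```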

6.4 Semantics Issues

The Insurance Company's proposed system architecture solves certain data semantics issues on a conceptual level. However, these semantic issues are non-trivial and must be evaluated on a higher level to determine the technical feasibility of attaining their resolution. Below are semantic issues that will become of great concern to the IS department.

6.4.1 Data Definition Change Within the Data Dictionary/Repository

The question as to how data definition changes will be dealt with by the dictionary/repository is one of great importance. In the case of TIC, we saw how variable definitions can change by way of new company or industry standards. The issue of changing definitions is one that was not dealt with by TIC's previous data dictionary, WEBSTER, and was part of the reason behind its dysfunction. Some questions remain unanswered:

- How will changes in data definitions be reflected in the dictionary metadata?


- How meaningful will data be to different applications after a particular definition change?

- Will the CDD/Plus dictionary/repository provide for a method to represent data semantics in a manner to allow for automatic determination of the effect of changing data semantics due to definition changes?

When posed with these questions, a manager of the dictionary/repository team said that TIC did not expect to see any more changes in definitions or naming standards after the development of the dictionary/repository. They stated that "rules" as to how data is used in calculations will constantly change, and that these changes would be reflected in the metadata. I am not sure whether this is a safe assumption to make in light of the recent data naming standards imposed by TIC only a few years ago. As TIC continues to develop the dictionary/repository, standard definitions are going to be imposed on certain multiple-definition variables, and changes in those definitions in the future are likely. This particular issue is not being considered within IS presently.

The CDD/Plus dictionary/repository claims to be able to handle metadata changes resulting from changes in data definition, yet it is not clear to me how this will occur. The CDD/Plus utilizes an entity, attribute, and relationship (EAR) model description of the metadata and claims to provide automatic notification of changes once a definition is modified. The issue of data definition changes and its impact on metadata deserves more attention.

6.4.2 Multiple Definition Issues

Multiple-definition variables are often causes of poor data quality, as we saw in the case of TIC (Section 2.3.4). The proposed solution discussed in Section 4.4.4 described the IS department as working along with different departments in a process to examine how each of the multiple-definition elements is derived and then deriving appropriate names for derived elements. Other, more general variables, whose definition differences are trivial, will have standard definitions imposed. IS will work along with business groups to determine which variables fall under which category. A variable such as net written premium, which has about 17 different ways of being calculated by different departments, will have derived elements to reflect the ways in which it is interpreted (i.e., net_written_premium_claims, net_written_premium_auto, net_written_premium_EOS (exclusive of sales)).

The process behind determining which variables will have standards imposed upon them and which variables will have derived elements is likely to be a longer, more difficult process than IS may assume. More importantly, it is questionable whether there will be a provision in variable metadata to allow for the addition of future derived elements from particular multiple-definition variables (i.e., net written premium). Will there be appropriate forms of metadata for the derived definition elements that can be specified for different applications and databases? In these instances, will the data warehouse (Level 2) be providing correct, meaningful data? This issue deserves further research.

6.4.3 Data Value Transformations

This issue follows along the same lines as the multiple definition issue. The proposed architecture is to provide derived data elements for multiple-definition variables. These derived data elements will have different definitions and will be tailored to different business groups. Assume that a data value used by one group is to be used by another group which defines the value differently. The issue is whether or not the repository will be capable of taking the data value used by the first group and transforming it into the appropriate value desired by the users in the second group. That is, will business users have the ability to see the data they need in the format they desire?
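
The transformation idea could be sketched as follows: the repository holds a conversion rule between two groups' versions of an element, so a value stored under one definition can be presented under the other. The element names are drawn from the example above, but the conversion rule and its parameter are purely hypothetical.

```python
# Hypothetical sketch of a definition transformation: the repository holds a
# conversion rule between two groups' versions of a data element, so a value
# stored under one definition can be presented under the other. The rule shown
# here is purely illustrative.

TRANSFORMS = {
    # (from definition, to definition) -> conversion function
    ("net_written_premium_EOS", "net_written_premium_claims"):
        lambda value, sales_portion: value + sales_portion,
}

def transform(value, from_def, to_def, **context):
    rule = TRANSFORMS.get((from_def, to_def))
    if rule is None:
        raise LookupError(f"No transformation from {from_def} to {to_def}")
    return rule(value, **context)

print(transform(900.0, "net_written_premium_EOS", "net_written_premium_claims",
                sales_portion=100.0))
```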


The dictionary/repository team representative that I spoke to did not know whether or not the CDD/Plus was capable of such value transformations. The dictionary/repository team seems set, however, on relying upon multiple derived data elements as opposed to the idea of definition transformations. This issue is well worth looking into.

6.4.4 Data Source Tagging

As we saw in Section 4.4.1, TIC's proposed system architecture provides for a check on data quality. Entry of data into a Level 1 system will be done through an authorized process which will tag entered data and subsequently identify all entered data with a business group and manager, thereby tracking data quality. While this is a good concept, there is an issue as to exactly how it can be accomplished. There is a question as to how metadata will be updated with data source information. This issue presently cannot be solved by the CDD/Plus in that it contains no such data tagging feature.

A TIC dictionary/repository representative explained the concept of data source tagging within the CDR Access Facility, but gave no information as to how, given TIC's present resources, the task would actually be accomplished. TIC plans on influencing vendors, such as Digital, to enhance their products to meet TIC's particular needs. It is questionable how quickly TIC's proposed architecture needs can be satisfied by way of this process.

6.5 Conclusions

The Insurance Company is making important advances in addressing their information systems needs. However, major challenges lie ahead in regard to satisfying their needs for improved database access and data quality. Strategic planning and integration of database systems will help TIC in overcoming the technical and organizational barriers to their needs.


7. BIBLIOGRAPHY

1. Madnick, Stuart E., Wang, Y. Richard, and Siegel, Michael, "The Composite Information Systems Laboratory (CISL) Project at MIT", Center for Information Research Working Paper No. 3157, Sloan School of Management, MIT, Cambridge, Mass., May 1990.

2. Siegel, Michael, and Madnick, Stuart E., "Identification and Reconciliation of Semantic Conflicts Using Metadata", Center for Information Research Working Paper No. 3102, Sloan School of Management, MIT, Cambridge, Mass., November 1989.


APPENDIX A

Workers' Compensation Claim Workflow

1. Claim comes directly into Claims department.

2. Claims department begins investigation on claim.

3. Claim information is passed by hardcopy to statistical department to grant coverage and in some cases do coding on claim.

4. Legal department receives claim information (via hardcopy) for registration purposes.

5. Updated claim information goes back to Claims department for official registration.

6. Registered claim information is entered into BOCOMP.

7. After registration and any other transaction (i.e., loss cost estimate, premium estimate, etc.) claim information is updated and passed from BOCOMP to HOCOMP and then to the Corporate Database.

8. Registration information passed from HOCOMP to Corporate Database.

9. Financial department accesses HOCOMP for claim information to process refund against claim. "Cut a check".

10. Claim is updated (manually) at intervals depending upon type of claim (i.e., 90-day, annually, 18 months). Upon each update (updates done by Claims department), updated information is re-entered into BOCOMP and passed to HOCOMP and the Corporate Database. Updating continues at the given frequency until the claim is closed. (Steps 6, 7, and 8 are repeated at the update frequency.)

11. Actuarial department will access BOCOMP for information to generate aggregate reports. This occurs on a monthly or yearly basis dependent upon the report.


Appendix A Figure: Workers' Compensation Claim Workflow

[Diagram showing the flow of claim information among the Claims, Statistical, Legal, and Actuarial departments and the BOCOMP (Branch Office Compensation Database), HOCOMP (Home Office Compensation Database), and Corporate Database systems. Source: The Insurance Company.]


APPENDIX B

Digital's VAX Common Data Dictionary Plus (CDD/Plus)
(Source: Digital Equipment Corporation)

Important Highlights

- Designed as a key component in Digital's information management system, which was designed to span the life cycle of data from application development, through production system implementations, to data extractions, to end-user analysis and reporting.

- Utilizes an entity, attribute, and relationship (EAR) data model to enable it to function as an active data dictionary, driving other tools at compile time and providing automatic notification of changes once a definition is modified.

- With the CDD/Plus open architecture, customers and independent software vendors can integrate user-written programs with the basic dictionary to support individual requirements. This eliminates the need to develop separate dictionaries to support proprietary tools or applications.

The dictionary provides a single, logical, controlled repository for metadata external to the many applications, databases, languages, production systems, query and report writers, and development and maintenance tools that use it.

Metadata is supported in two formats: metadata which can be manipulated by the Common Dictionary Operator (CDO) and metadata which can be manipulated by the Dictionary Management Utility (DMU).

CDO Metadata

CDO metadata format allows you to store definitions and also to store information about how the definitions are related. This metadata in VAX CDD/Plus tracks the name, description, location, type, format, size, change history, and usage of the actual data. This relationship information is essential to the special features of the CDO dictionaries. Only CDO dictionaries can produce these special features. VMS information management products that can provide full support for CDO dictionaries are also able to utilize special capabilities associated with those dictionaries.
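
As an illustration of the kinds of attributes just listed, the sketch below shows what a single field definition tracked in this metadata format might contain, along with a helper for appending to its change history. The field and its values are invented; this is not an actual CDD/Plus record.

```python
# Hypothetical sketch of the kind of information a CDO-style metadata entry
# tracks for a field definition (name, description, location, type, format,
# size, change history, and usage), as described above. Values are illustrative.

field_definition = {
    "name": "POLICY_NUMBER",
    "description": "Identifier assigned to an insurance policy",
    "location": "LEVEL2_WAREHOUSE.POLICY_TABLE",
    "type": "character",
    "format": "PIC X(10)",
    "size": 10,
    "change_history": [
        {"date": "1990-06-01", "note": "initial definition"},
    ],
    "used_by": ["claim_record", "premium_record"],
}

def record_change(definition, date_str, note):
    """Append an entry to the definition's change history."""
    definition["change_history"].append({"date": date_str, "note": note})

record_change(field_definition, "1991-03-01", "description clarified")
```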

DMU Metadata

In the DMU format, the dictionary is organized as a hierarchy of dictionary directories and dictionary objects. DMU dictionaries are supported in the CDD/Plus, and dictionary definitions are manipulated by DMU, the Data Definition Language Utility (CDDL), and the Verify/Fix Utility (CDDV).

With the CDD/Plus, existing applications which utilize DMU definitions can also read CDO definitions through the transparent services of the bridge. This bridge translates the CDO information into information that can be understood by the DMU interface. The existing DMU definitions can be read by CDO dictionaries. An illustration is given in Appendix Figure B-1.


Common Dictionary Operator Features

- Distributed Dictionary Implementation. CDD/Plus is designed for either central or distributed implementation. Allows set-up flexibility.

- Field-Level Data Descriptions. CDO dictionaries can create and access metadata in a unit as small as a field. Field definitions can be simple data structures or complex subscripted structures. These field definitions can be easily combined to form various record definitions, and can be accessed individually from supporting VMS products. Field-level data descriptions increase the level of data sharing possible.

- Automatic Relationship Support. CDD/Plus automatically creates relationships when you connect two CDO definitions in some way. For example, you can base the definition of a new field on an existing field definition. Similarly, one can relate a group of field definitions to a record definition by including the field names in the record definition. There is no need to specifically define these relationships; CDO automatically creates the connection.

- Usage Tracking. CDD/Plus is built on the entity-attribute-relationship model, so it stores relationships between data definitions and components of applications. Provides powerful impact analysis, and shows the effect of definition changes on applications.

Other CDD/Plus Highlights

- Metadata can be created and accessed at the field level for increased data sharing.

- Increased control of definition change allows users to either authorize immediate changes to an original definition or create a new version and allow other users to incorporate the change over time.

- Enhanced data security and integrity are provided by definition-protection provisions. Dictionary access and usage can be enforced to the degree determined by the data administrator.

Hardware Requirements

VAX, MicroVAX, VAXstation, or VAXserver configuration as specified in the System Support Addendum.

Software Requirements

VMS Operating System, VMS Workstation Software.


Appendix Figure B-1

DEC CDD/Plus Architecture

[Diagram: a transparent CDD/Plus bridge enables applications and information management products to read both CDO and DMU dictionary definitions. Source: Digital Equipment Corporation.]