Data Architecture
From Zen to Reality
Charles D. Tupper
AMSTERDAM • BOSTON • HEIDELBERG • LONDON
NEW YORK • OXFORD • PARIS • SAN DIEGO
SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
ELSEVIER
Morgan Kaufmann Publishers is an imprint of Elsevier
CONTENTS
Preface xxi
SECTION 1 THE PRINCIPLES
Chapter 1 Understanding Architectural Principles 3
Defining Architecture 3
Design Problems 6
Patterns and Pattern Usage 7
Concepts for Pattern Usage 8
Information Architecture 11
Structure Works! 12
Problems in Architecture 14
Architectural Solutions 16
The "Form Follows Function" Concept 17
Guideline: Composition and Environment 19
Guideline: Evolution 19
Guideline: Current and Future 19
Data Policies (Governance), the Foundation Building Codes 21
Data Policy Principles 21
Chapter 2 Enterprise Architecture Frameworks and Methodologies 23
Architecture Frameworks 23
Brief History of Enterprise Architecture 26
The Zachman Framework for Enterprise Architecture 26
The Open Group Architecture Framework 29
The Federal Enterprise Architecture 33
Conclusions 40
Enterprise Data Architectures 41
Enterprise Models 42
The Enterprise Data Model 43
The Importance of the Enterprise Data Model 44
Object Concepts: Types and Structures Within Databases 45
Inheritance 45
Object Life Cycles 45
Relationships and Collections 46
Object Frameworks 46
Object Framework Programming 47
Pattern-Based Frameworks 48
Architecture Patterns in Use 48
U.S. Treasury Architecture Development Guidance 49
TADG Pattern Content 49
TADG Architecture Patterns 50
IBM Patterns for e-Business 50
Enterprise Data Model Implementation Methods 53
Chapter 3 Enterprise-Level Data Architecture Practices 57
Enterprise-Level Architectures 57
System Architectures 58
Enterprise Data Architectures 58
Enterprise Technology Architectures 58
Enterprise Architecture Terminology—Business Terms 59
The Enterprise Model 60
The Enterprise Data Architecture from a Development Perspective 60
Subject Area Drivers 62
Naming and Object Standards 63
Data Sharing 64
Data Dictionary-Metadata Repository 66
Domain Constraints in Corporate and Non-Corporate Data 67
Organizational Control Components 67
Data Administration 68
Database Administration 68
Setting Up a Database Administration Group 70
Repository Management Areas and Model Management 72
Chapter 4 Understanding Development Methodologies 75
Design Methods 75
Why Do We Need Development Methodologies? 76
The Beginnings 77
Structured Methods 77
Structured Programming 79
Structured Design 79
Structured Analysis 80
Still Having Problems 81
Requirements Definitions 81
Problems with Structured Approaches 81
Personal Computers and the Age of Tools 82
Engineering Concepts Applied 83
Other Principles Utilized 84
The Birth of Information Engineering 84
Information Engineering as a Design Methodology 85
The Synergy of Tools and Information Engineering 87
Problems with Information Engineering 88
Implementing the Best of IE while Minimizing Expense 89
SECTION 2 THE PROBLEM
Chapter 5 Business Evolution 95
The Problem of Business Evolution 95
Expansion and Function Separation 96
Separate Function Communication 96
Manual Data Redundancy 97
Data Planning and Process Planning 99
Corporate Architecture 100
Using Nolan's Stages of Growth 102
Problems with Older Organizations 103
Business Today 104
When Will It End? 106
What Can We Do about It? 107
Generic Subject Areas for Corporate Architectures 108
Corporate Information Groupings or Functional Areas 110
Corporate Knowledge 114
Chapter 6 Business Organizations 117
Purpose and Mission of the Organization 117
Ideology, Mission, and Purpose 118
Design with the Future of the Organization in Mind 120
Generalize for Future Potential Directions 121
Organizational Structure 123
What Are the Basic Functions in an Organization? 124
The Information Needs of Management 124
Organizations Don't Know What They Don't Know 125
Information Strategy for Modern Business 127
Maximizing the Value of Information 131
Forces in the Organization 136
Chapter 7 Productivity Inside the Data Organization 139
Information Technology 139
What Is Information Technology? 139
Trends in Information Technology 140
Vendor Software Development 141
The Other Option 141
Trends in Organizational Change 142
Productivity 143
Explanations for the Anomaly in Productivity 143
Information Technology and Its Impact on Organizations 147
Why Invest in Information Technology? 148
Ineffective Use of Information Technology 149
Other Impediments to Organizational Efficiency 150
Organizational Impediments to Information Technology 151
Technological Solutions for Information Technology 151
Human Resource Issues in Information Technology 152
Quality of the Workforce 153
Summary 153
Maximizing the Use of Information Technology 154
Chapter 8 Solutions That Cause Problems 157
Downsizing and Organizational Culture 157
Downsizing Defined 158
Culture Change 158
Organizational-Level Analysis 160
Organizational/Individual-Level Analysis 161
Downsizing's Impact on Culture 162
A Different Approach to Culture Change and Downsizing 163
Summary 163
Outsourcing 164
Rapid Application Development 168
SECTION 3 THE PROCESS
Chapter 9 Data Organization Practices 175
Fundamentals of All Data Organization Practices 175
Corporate Data Architecture 175
Corporate Data Policy 176
Architecture Team 176
Design Team 177
Develop the Project Structure 177
Scope Definition 177
Project Plan 178
Data Architecture and Strategic Requirements Planning 178
Data Gathering and Classification 178
Business Area Data Modeling 179
Current Data Inventory Analysis 179
Data and Function Integration 179
Event Identification 180
Procedure Definition via Functional Decomposition 180
Process Use Identification 181
New Function Creation 181
Utilization Analysis via Process Use Mapping 182
Access Path Mapping 182
Entity Cluster Development and Logical Residence Planning 183
Application Development Templates 183
Quality Assurance Metrics 184
Maintenance Control Process 184
The Software Development Methods 184
Architectural Development Methods 185
Atomic Process Models 188
Entity Process Models 188
The Unified Method 189
Chapter 10 Models and Model Repositories 191
What Are Models and How Did They Come About? 191
Data Models Introduction 192
What Does Modeling Do for Us? 194
Process Models Introduction 194
Process Models—Why? 195
How Are Automated Models Developed? 195
How Are Models Retained? 196
Model Repository Policy and Approach 197
Shared Repository Objects 198
Model-Driven Releases 199
Supporting an Application Release 200
Version Type: Participation 201
Seamless Development Control Process 202
Test Environments, Releases, and Databases 203
Release Stacking 203
Emergency Corrections 204
Emergency Correction Procedures 204
PTF Implementation for Shared Batch and Online Objects 205
Chapter 11 Model Constructs and Model Types 207
Data Model Constructs 207
Application Audience and Services 207
Entities 208
Attributes 208
Relationships 209
Primary Identifiers 209
Entity Types 210
Entity Relationship Diagrams 210
Types of Relationships 211
Model Types 212
Physical-Level Design 216
Primary Keys 216
Normalization 217
Denormalization 218
Overnormalization 218
Domains 219
Domain Constraints 219
Reference Data 220
Generic Domain Constraint Constructs 220
Chapter 12 Time as a Dimension of the Database 223
What Is to Be Done with Historical Data? 223
Application History 223
Classes and Characteristics 225
Current Occurrence 226
Simple History 226
Bounded Simple History 227
Complex History 227
Logically Modeling History 228
Physical Design of History 229
Physical Implementation of History 230
Performance Tuning 231
Finding Patterns 231
Tips and Techniques for Implementing History 232
Types of Systems 233
Physical Structure 235
Dimensional History 237
SECTION 4 THE PRODUCT
Chapter 13 Concepts of Clustering, Indexing, and Structures 241
Cluster Analysis 241
What Is a Cluster? 241
Cluster Properties 241
Cluster Theory Applied 242
Inserts 244
Updates 245
Deletes 245
Physical Structure 245
Key History and Development 246
Primary Keys 248
Foreign Keys 248
Foreign Key Propagation 249
Candidate Keys 249
Natural Keys 250
Engineered Keys 250
Surrogate Keys 250
High Water Keys 251
One of a Kind Keys 252
Other Specialized Keys 252
Chapter 14 Basic Requirements for Physical Design 255
Requirements for Physical Design 255
How Much Data? 255
History 256
Population Quantification of Application Data 256
Concurrency 257
Security/Audit 258
Audit 260
Archive/Purge 260
Recovery/Restart 261
Sort/Search Requirements 262
Reorganization and Restructuring 262
Data Integrity 262
Referential Integrity 263
Data Access 264
Privacy Requirements 265
Chapter 15 Physical Database Considerations 267
Three-Level Architecture 267
Data Independence 270
Database Languages 271
Classification of Database Management Systems 272
Factors Impacting Physical Database Design 274
Analysis of Queries, Reporting, and Transactions 275
Queries, Reports, and Transactions 275
Interpreting the Functional Decomposition 276
Event Identification 276
Process Use Identification Reviewed 276
Utilization Analysis via Process Use Mapping 276
Time Constraints of Queries and Transactions 277
Analysis of Expected Frequency of Insert, Delete, Update 277
Other Physical Database Design Considerations 278
Population on the Database 279
Chapter 16 Interpreting Models 281
Physical Design Philosophy 281
Objectives 281
The Entity Relationship Model 283
Interaction Analysis 283
The CRUD Matrix 285
Entity Life Cycle Analysis/Entity State Transition Diagrams 285
Process Dependency Scope and Process Dependency Diagram 287
Event Analysis 288
Process Logic Diagrams 290
Interaction Analysis Summary 290
Changes to ER Models 290
ERD Denormalization 291
Actions on Supertype-Subtype Constructs 293
Actions on Multiple Relationships 294
Resolution of Circular References 295
Resolution of Duplicate Propagated Keys 296
Access-Level Denormalization 297
Movement of Attributes 297
Consolidation of Entities 297
Derived Attributes and Summary Data 298
Implement Repeating Groups 299
Introduce Redundancy 300
Introduce Surrogate or Synthetic Keys 301
Vertical or Horizontal Segmentation 301
Access Path Mapping 302
Conclusion 305
SECTION 5 SPECIALIZED DATABASES
Chapter 17 Data Warehouses I 309
Early Analysis in this Area 309
Keen and Scott-Morton 310
Decision Discussion 311
Components of Decisions 311
Responsibility 312
Report Writers and Query Engines 314
Warehouses versus Reporting Databases 314
Higher Level of Abstraction 315
Based on Perceived Business Use 315
Structure Evolution 315
Warehouse Components 316
Why Can't OLTP Data Stores Be Used? 316
DSS Requirements 317
Warehouse Characteristics 317
Warehouse Modeling 317
Warehouse Modeling Depends on Architectures 318
Enterprise-Level Data Architecture 319
Chapter 18 Data Warehouses II 321
Reprise 321
Background 321
The Many Types and Levels of Data 321
Data Modeling: Definitions 322
Logical to Physical Transformation 323
Entity Relational Models 324
Placement of Models 324
Dimensional Modeling: Definitions 325
Denormalization and the Dimensional Model 325
Dimensional Model Evaluation 326
Data Evolution 326
What Are the Choices? 326
Applicability of the Dimensional and Relational and Hybrid Models 327
Dimensional Architecture 329
Where Is the Relational Data Warehouse Best Suited? 330
Where Is the Dimensional Best Suited? 331
Hybrid ER-Dimensional 331
Problems Associated with the Hybrid Approach 333
Target Enterprise Architecture 333
Building an Enterprise Data Model 333
Current Data Inventory 335
Standard or Corporate Business Language 335
Conclusion of Hybrid Approach 336
Chapter 19 Dimensional Warehouses from Enterprise Models 337
Dimensional Databases from Enterprise Data Models 337
Warehouse Architecture 337
Dimensional Model Concepts 339
Review of Basic Components of Dimensional Models 340
Differences between Dimension and Fact Tables 341
Star Schemas 341
Star Schema Design Approach 342
Enterprise Data Warehouse Design 343
Structure Design 343
Categorize the Entities 344
Identify Dependency Chains 345
Produce Dimensional Models 346
Options for Dimensional Design 347
The Flat Table Schema 348
The Stepped Table Schema 348
Simple Star Schemas 348
Snowflake Schemas 352
Star Schema Clusters 352
Review of Design Options 355
Chapter 20 The Enterprise Data Warehouse 357
Enterprise Data Warehouses 357
Why Would You Want an Enterprise Data Warehouse? 359
Enterprise Data Warehouse Defined 359
What Are the Important EDW Driving Forces? 360
The Best Practices for EDW Implementation 362
Enterprise Data Architecture Implementation Methods 363
The Top-Down Approach 363
The Bottom-Up Approach 364
Your Choices 366
Preliminary Conclusion 366
The Hybrid Approach 366
Implementation Summary 367
Chapter 21 Object and Object/Relational Databases 369
Object Oriented Data Architecture 369
Sample Object Oriented Design Concept: Wiring Money 370
Examples of Different Actions 371
Elements of Object Oriented Design: Overriding 373
Analogy and Problem Solving 373
Coping with Complexity 374
Interconnections: The Perpetrator of Complexity 374
Assembler Languages 374
Procedures and Functions 375
Modules 375
Parameter Passing 375
Abstract Data Types 375
Objects with Parameter Passing 376
Object Oriented Architectures Summary 377
Enhanced Entity Relationship Concepts 378
Subclasses and Superclasses 378
Attribute Inheritance 378
Specialization 379
Generalization 379
Generalization Hierarchies 379
Physical Data Design Considerations 380
Messaging 381
Object Identity 381
Type "Generators" and Type Constructors 382
Summary 382
Chapter 22 Distributed Databases 385
Some Distributed Concepts 385
The Distributed Model 385
How Does It Work? 386
Distributed Data Design Concepts 387
Fragmentation 387
Replication 388
Homogeneous Distributed Model 388
Federated or Heterogeneous Distributed Model 389
Distributed DBMSs 391
Reliability and Availability 392
Controlled Data Sharing 392
Performance 393
Qualities Required in a DDBMS 393
Other Factors 394
An Overview of Client Server 394
Functionality within Client Server 395
A Typical DDBMS 396
Distribution Transparency 397
Types of DDBMSs 397
Individual Site Failure's Effect on Data Integrity 398
Individual Site Failure's Effect on Traffic Flow 399
Communication Failure 399
Distributed Commitment 399
Distributed Deadlocks 399
Summary 400
Index 401