Data architecture : from Zen to reality (GBV). Burlington, Mass., Morgan Kaufmann, 2011. Shelf mark of the original (print): T 11 B

Data Architecture

From Zen to Reality

Charles D. Tupper

AMSTERDAM • BOSTON • HEIDELBERG • LONDON

NEW YORK • OXFORD • PARIS • SAN DIEGO

SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

ELSEVIER Morgan Kaufmann Publishers is an imprint of Elsevier

CONTENTS

Preface xxi

SECTION 1 THE PRINCIPLES

Chapter 1 Understanding Architectural Principles 3

Defining Architecture 3

Design Problems 6

Patterns and Pattern Usage 7

Concepts for Pattern Usage 8

Information Architecture 11

Structure Works! 12

Problems in Architecture 14

Architectural Solutions 16

The "Form Follows Function" Concept 17

Guideline: Composition and Environment 19

Guideline: Evolution 19

Guideline: Current and Future 19

Data Policies (Governance), the Foundation Building Codes 21

Data Policy Principles 21

Chapter 2 Enterprise Architecture Frameworks and Methodologies 23

Architecture Frameworks 23

Brief History of Enterprise Architecture 26

The Zachman Framework for Enterprise Architecture 26

The Open Group Architecture Framework 29

The Federal Enterprise Architecture 33

Conclusions 40

Enterprise Data Architectures 41

Enterprise Models 42

The Enterprise Data Model 43

The Importance of the Enterprise Data Model 44

Object Concepts: Types and Structures Within Databases 45

Inheritance 45

Object Life Cycles 45

Relationships and Collections 46

Object Frameworks 46

Object Framework Programming 47

Pattern-Based Frameworks 48

Architecture Patterns in Use 48

U.S. Treasury Architecture Development Guidance 49

TADG Pattern Content 49

TADG Architecture Patterns 50

IBM Patterns for e-Business 50

Enterprise Data Model Implementation Methods 53

Chapter 3 Enterprise-Level Data Architecture Practices 57

Enterprise-Level Architectures 57

System Architectures 58

Enterprise Data Architectures 58

Enterprise Technology Architectures 58

Enterprise Architecture Terminology—Business Terms 59

The Enterprise Model 60

The Enterprise Data Architecture from a Development Perspective 60

Subject Area Drivers 62

Naming and Object Standards 63

Data Sharing 64

Data Dictionary-Metadata Repository 66

Domain Constraints in Corporate and Non-Corporate Data 67

Organizational Control Components 67

Data Administration 68

Database Administration 68

Setting Up a Database Administration Group 70

Repository Management Areas and Model Management 72

Chapter 4 Understanding Development Methodologies 75

Design Methods 75

Why Do We Need Development Methodologies? 76

The Beginnings 77

Structured Methods 77

Structured Programming 79

Structured Design 79

Structured Analysis 80

Still Having Problems 81

Requirements Definitions 81

Problems with Structured Approaches 81

Personal Computers and the Age of Tools 82

Engineering Concepts Applied 83

Other Principles Utilized 84

The Birth of Information Engineering 84

Information Engineering as a Design Methodology 85

The Synergy of Tools and Information Engineering 87

Problems with Information Engineering 88

Implementing the Best of IE while Minimizing Expense 89

SECTION 2 THE PROBLEM

Chapter 5 Business Evolution 95

The Problem of Business Evolution 95

Expansion and Function Separation 96

Separate Function Communication 96

Manual Data Redundancy 97

Data Planning and Process Planning 99

Corporate Architecture 100

Using Nolan's Stages of Growth 102

Problems with Older Organizations 103

Business Today 104

When Will It End? 106

What Can We Do about It? 107

Generic Subject Areas for Corporate Architectures 108

Corporate Information Groupings or Functional Areas 110

Corporate Knowledge 114

Chapter 6 Business Organizations 117

Purpose and Mission of the Organization 117

Ideology, Mission, and Purpose 118

Design with the Future of the Organization in Mind 120

Generalize for Future Potential Directions 121

Organizational Structure 123

What Are the Basic Functions in an Organization? 124

The Information Needs of Management 124

Organizations Don't Know What They Don't Know 125

Information Strategy for Modern Business 127

Maximizing the Value of Information 131

Forces in the Organization 136

Chapter 7 Productivity Inside the Data Organization 139

Information Technology 139

What Is Information Technology? 139

Trends in Information Technology 140

Vendor Software Development 141

The Other Option 141

Trends in Organizational Change 142

Productivity 143

Explanations for the Anomaly in Productivity 143

Information Technology and Its Impact on Organizations 147

Why Invest in Information Technology? 148

Ineffective Use of Information Technology 149

Other Impediments to Organizational Efficiency 150

Organizational Impediments to Information Technology 151

Technological Solutions for Information Technology 151

Human Resource Issues in Information Technology 152

Quality of the Workforce 153

Summary 153

Maximizing the Use of Information Technology 154

Chapter 8 Solutions That Cause Problems 157

Downsizing and Organizational Culture 157

Downsizing Defined 158

Culture Change 158

Organizational-Level Analysis 160

Organizational/Individual-Level Analysis 161

Downsizing's Impact on Culture 162

A Different Approach to Culture Change and Downsizing 163

Summary 163

Outsourcing 164

Rapid Application Development 168

SECTION 3 THE PROCESS

Chapter 9 Data Organization Practices 175

Fundamentals of All Data Organization Practices 175

Corporate Data Architecture 175

Corporate Data Policy 176

Architecture Team 176

Design Team 177

Develop the Project Structure 177

Scope Definition 177

Project Plan 178

Data Architecture and Strategic Requirements Planning 178

Data Gathering and Classification 178

Business Area Data Modeling 179

Current Data Inventory Analysis 179

Data and Function Integration 179

Event Identification 180

Procedure Definition via Functional Decomposition 180

Process Use Identification 181

New Function Creation 181

Utilization Analysis via Process Use Mapping 182

Access Path Mapping 182

Entity Cluster Development and Logical Residence Planning 183

Application Development Templates 183

Quality Assurance Metrics 184

Maintenance Control Process 184

The Software Development Methods 184

Architectural Development Methods 185

Atomic Process Models 188

Entity Process Models 188

The Unified Method 189

Chapter 10 Models and Model Repositories 191

What Are Models and How Did They Come About? 191

Data Models Introduction 192

What Does Modeling Do for Us? 194

Process Models Introduction 194

Process Models—Why? 195

How Are Automated Models Developed? 195

How Are Models Retained? 196

Model Repository Policy and Approach 197

Shared Repository Objects 198

Model-Driven Releases 199

Supporting an Application Release 200

Version Type: Participation 201

Seamless Development Control Process 202

Test Environments, Releases, and Databases 203

Release Stacking 203

Emergency Corrections 204

Emergency Correction Procedures 204

PTF Implementation for Shared Batch and Online Objects 205

Chapter 11 Model Constructs and Model Types 207

Data Model Constructs 207

Application Audience and Services 207

Entities 208

Attributes 208

Relationships 209

Primary Identifiers 209

Entity Types 210

Entity Relationship Diagrams 210

Types of Relationships 211

Model Types 212

Physical-Level Design 216

Primary Keys 216

Normalization 217

Denormalization 218

Overnormalization 218

Domains 219

Domain Constraints 219

Reference Data 220

Generic Domain Constraint Constructs 220

Chapter 12 Time as a Dimension of the Database 223

What Is to Be Done with Historical Data? 223

Application History 223

Classes and Characteristics 225

Current Occurrence 226

Simple History 226

Bounded Simple History 227

Complex History 227

Logically Modeling History 228

Physical Design of History 229

Physical Implementation of History 230

Performance Tuning 231

Finding Patterns 231

Tips and Techniques for Implementing History 232

Types of Systems 233

Physical Structure 235

Dimensional History 237

SECTION 4 THE PRODUCT

Chapter 13 Concepts of Clustering, Indexing, and Structures 241

Cluster Analysis 241

What Is a Cluster? 241

Cluster Properties 241

Cluster Theory Applied 242

Inserts 244

Updates 245

Deletes 245

Physical Structure 245

Key History and Development 246

Primary Keys 248

Foreign Keys 248

Foreign Key Propagation 249

Candidate Keys 249

Natural Keys 250

Engineered Keys 250

Surrogate Keys 250

High Water Keys 251

One of a Kind Keys 252

Other Specialized Keys 252

Chapter 14 Basic Requirements for Physical Design 255

Requirements for Physical Design 255

How Much Data? 255

History 256

Population Quantification of Application Data 256

Concurrency 257

Security/Audit 258

Audit 260

Archive/Purge 260

Recovery/Restart 261

Sort/Search Requirements 262

Reorganization and Restructuring 262

Data Integrity 262

Referential Integrity 263

Data Access 264

Privacy Requirements 265

Chapter 15 Physical Database Considerations 267

Three-Level Architecture 267

Data Independence 270

Database Languages 271

Classification of Database Management Systems 272

Factors Impacting Physical Database Design 274

Analysis of Queries, Reporting, and Transactions 275

Queries, Reports, and Transactions 275

Interpreting the Functional Decomposition 276

Event Identification 276

Process Use Identification Reviewed 276

Utilization Analysis via Process Use Mapping 276

Time Constraints of Queries and Transactions 277

Analysis of Expected Frequency of Insert, Delete, Update 277

Other Physical Database Design Considerations 278

Population on the Database 279

Chapter 16 Interpreting Models 281

Physical Design Philosophy 281

Objectives 281

The Entity Relationship Model 283

Interaction Analysis 283

The CRUD Matrix 285

Entity Life Cycle Analysis/Entity State Transition Diagrams 285

Process Dependency Scope and Process Dependency Diagram 287

Event Analysis 288

Process Logic Diagrams 290

Interaction Analysis Summary 290

Changes to ER Models 290

ERD Denormalization 291

Actions on SuperType-Subtype Constructs 293

Actions on Multiple Relationships 294

Resolution of Circular References 295

Resolution of Duplicate Propagated Keys 296

Access-Level Denormalization 297

Movement of Attributes 297

Consolidation of Entities 297

Derived Attributes and Summary Data 298

Implement Repeating Groups 299

Introduce Redundancy 300

Introduce Surrogate or Synthetic Keys 301

Vertical or Horizontal Segmentation 301

Access Path Mapping 302

Conclusion 305

SECTION 5 SPECIALIZED DATABASES

Chapter 17 Data Warehouses I 309

Early Analysis in this Area 309

Keen and Scott-Morton 310

Decision Discussion 311

Components of Decisions 311

Responsibility 312

Report Writers and Query Engines 314

Warehouses versus Reporting Databases 314

Higher Level of Abstraction 315

Based on Perceived Business Use 315

Structure Evolution 315

Warehouse Components 316

Why Can't OLTP Data Stores Be Used? 316

DSS Requirements 317

Warehouse Characteristics 317

Warehouse Modeling 317

Warehouse Modeling Depends on Architectures 318

Enterprise-Level Data Architecture 319

Chapter 18 Data Warehouses II 321

Reprise 321

Background 321

The Many Types and Levels of Data 321

Data Modeling: Definitions 322

Logical to Physical Transformation 323

Entity Relational Models 324

Placement of Models 324

Dimensional Modeling: Definitions 325

Denormalization and the Dimensional Model 325

Dimensional Model Evaluation 326

Data Evolution 326

What Are the Choices? 326

Applicability of the Dimensional and Relational and Hybrid Models 327

Dimensional Architecture 329

Where Is the Relational Data Warehouse Best Suited? 330

Where Is the Dimensional Best Suited? 331

Hybrid ER-Dimensional 331

Problems Associated with the Hybrid Approach 333

Target Enterprise Architecture 333

Building an Enterprise Data Model 333

Current Data Inventory 335

Standard or Corporate Business Language 335

Conclusion of Hybrid Approach 336

Chapter 19 Dimensional Warehouses from Enterprise Models 337

Dimensional Databases from Enterprise Data Models 337

Warehouse Architecture 337

Dimensional Model Concepts 339

Review of Basic Components of Dimensional Models 340

Differences between Dimension and Fact Tables 341

Star Schemas 341

Star Schema Design Approach 342

Enterprise Data Warehouse Design 343

Structure Design 343

Categorize the Entities 344

Identify Dependency Chains 345

Produce Dimensional Models 346

Options for Dimensional Design 347

The Flat Table Schema 348

The Stepped Table Schema 348

Simple Star Schemas 348

Snowflake Schemas 352

Star Schema Clusters 352

Review of Design Options 355

Chapter 20 The Enterprise Data Warehouse 357

Enterprise Data Warehouses 357

Why Would You Want an Enterprise Data Warehouse? 359

Enterprise Data Warehouse Defined 359

What Are the Important EDW Driving Forces? 360

The Best Practices for EDW Implementation 362

Enterprise Data Architecture Implementation Methods 363

The Top-Down Approach 363

The Bottom-Up Approach 364

Your Choices 366

Preliminary Conclusion 366

The Hybrid Approach 366

Implementation Summary 367

Chapter 21 Object and Object/Relational Databases 369

Object Oriented Data Architecture 369

Sample Object Oriented Design Concept: Wiring Money 370

Examples of Different Actions 371

Elements of Object Oriented Design: Overriding 373

Analogy and Problem Solving 373

Coping with Complexity 374

Interconnections: The Perpetrator of Complexity 374

Assembler Languages 374

Procedures and Functions 375

Modules 375

Parameter Passing 375

Abstract Data Types 375

Objects with Parameter Passing 376

Object Oriented Architectures Summary 377

Enhanced Entity Relationship Concepts 378

Subclasses and Superclasses 378

Attribute Inheritance 378

Specialization 379

Generalization 379

Generalization Hierarchies 379

Physical Data Design Considerations 380

Messaging 381

Object Identity 381

Type "Generators" and Type Constructors 382

Summary 382

Chapter 22 Distributed Databases 385

Some Distributed Concepts 385

The Distributed Model 385

How Does It Work? 386

Distributed Data Design Concepts 387

Fragmentation 387

Replication 388

Homogeneous Distributed Model 388

Federated or Heterogeneous Distributed Model 389

Distributed DBMSs 391

Reliability and Availability 392

Controlled Data Sharing 392

Performance 393

Qualities Required in a DDBMS 393

Other Factors 394

An Overview of Client Server 394

Functionality within Client Server 395

A Typical DDBMS 396

Distribution Transparency 397

Types of DDBMSs 397

Individual Site Failure's Effect on Data Integrity 398

Individual Site Failure's Effect on Traffic Flow 399

Communication Failure 399

Distributed Commitment 399

Distributed Deadlocks 399

Summary 400

Index 401