7/23/2019 DW 2.0 book
http://slidepdf.com/reader/full/dw-20-book 1/10
DW 2.0
The Architecture for the Next Generation of Data Warehousing
W. H. Inmon
Forest Rim Technology
Derek Strauss
Gavroshe
Genia Neushloss
Gavroshe
AMSTERDAM • BOSTON • HEIDELBERG • LONDON
NEW YORK • OXFORD • PARIS • SAN DIEGO
SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Morgan Kaufmann Publishers is an imprint of Elsevier.
MORGAN KAUFMANN PUBLISHERS
Contents
Preface xvii
Acknowledgments xx
About the Authors xxi
CHAPTER 1 A brief history of data warehousing and first generation data warehouses 1
Database management systems 1
Online applications 2
Personal computers and 4GL technology 3
The spider web environment 4
Evolution from the business perspective 5
The data warehouse environment 6
What is a data warehouse? 7
Integrating data—a painful experience 7
Volumes of data 8
A different development approach 8
Evolution to the DW 2.0 environment 9
The business impact of the data warehouse 11
Various components of the data warehouse environment 11
ETL—extract/transform/load 12
ODS—operational data store 13
Data mart 13
Exploration warehouse 13
The evolution of data warehousing from the business perspective 14
Other notions about a data warehouse 14
The active data warehouse 15
The federated data warehouse approach 16
The star schema approach 18
The data mart data warehouse 20
Building a real data warehouse 21
Summary 22
CHAPTER 2 An introduction to DW 2.0 23
DW 2.0—a new paradigm 24
DW 2.0—from the business perspective 24
The life cycle of data 27
Reasons for the different sectors 30
Metadata 31
Access of data 33
Structured data/unstructured data 34
Textual analytics
Blather
The issue of terminology
Specific text/general text
Metadata—a major component
Local metadata
A foundation of technology
Changing business requirements
The flow of data within DW 2.0
Volumes of data
Useful applications
DW 2.0 and referential integrity
Reporting in DW 2.0
Summary
CHAPTER 3 DW 2.0 components—about the different sectors
The Interactive Sector
The Integrated Sector
The Near Line Sector
The Archival Sector
Unstructured processing
From the business perspective
Summary
CHAPTER 4 Metadata in DW 2.0
Reusability of data and analysis
Metadata in DW 2.0
Active repository/passive repository
The active repository 1
Enterprise metadata 1
Metadata and the system of record 1
Taxonomy 1
Internal taxonomies/external taxonomies 1
Metadata in the Archival Sector 1
Maintaining metadata 1
Using metadata—an example 1
From the end-user perspective 1
Summary 1
CHAPTER 5 Fluidity of the DW 2.0 technology infrastructure
The technology infrastructure 1
Rapid business changes 1
The treadmill of change 114
Getting off the treadmill 115
Reducing the length of time for IT to respond 115
Semantically temporal, semantically static data 115
Semantically temporal data 116
Semantically stable data 117
Mixing semantically stable and unstable data 118
Separating semantically stable and unstable data 118
Mitigating business change 119
Creating snapshots of data 120
A historical record 120
Dividing data 121
From the end-user perspective 121
Summary 122
CHAPTER 6 Methodology and approach for DW 2.0 123
Spiral methodology—a summary of key features 124
The seven streams approach—an overview 129
Enterprise reference model stream 129
Enterprise knowledge coordination stream 129
Information factory development stream 133
Data profiling and mapping stream 133
Data correction stream 133
Infrastructure stream 133
Total information quality management stream 134
Summary 137
CHAPTER 7 Statistical processing and DW 2.0 141
Two types of transactions 141
Using statistical analysis 143
The integrity of the comparison 144
Heuristic analysis 145
Freezing data 146
Exploration processing 146
The frequency of analysis 147
The exploration facility 147
The sources for exploration processing 149
Refreshing exploration data 149
Project-based data 150
Data marts and the exploration facility 152
A backflow of data 152
Using exploration data internally 155
From the perspective of the business analyst 1
Summary 1
CHAPTER 8 Data models and DW 2.0 1
An intellectual road map 1
The data model and business 1
The scope of integration 1
Making the distinction between granular and summarized data 1
Levels of the data model 1
Data models and the Interactive Sector 1
The corporate data model 1
A transformation of models 1
Data models and unstructured data 1
From the perspective of the business user 1
Summary 1
CHAPTER 9 Monitoring the DW 2.0 environment 1
Monitoring the DW 2.0 environment 1
The transaction monitor 1
Monitoring data quality 1
A data warehouse monitor 1
The transaction monitor—response time 1
Peak-period processing 1
The ETL data quality monitor 1
The data warehouse monitor 1
Dormant data 1
From the perspective of the business user 1
Summary 1
CHAPTER 10 DW 2.0 and security
Protecting access to data 1
Encryption 1
Drawbacks 1
The firewall 1
Moving data offline 1
Limiting encryption 1
A direct dump 1
The data warehouse monitor 1
Sensing an attack 1
Security for near line data 1
From the perspective of the business user 1
Summary 1
CHAPTER 11 Time variant data 191
All data in DW 2.0—relative to time 191
Time relativity in the Interactive Sector 192
Data relativity elsewhere in DW 2.0 192
Transactions in the Integrated Sector 193
Discrete data 194
Continuous time span data 194
A sequence of records 196
Nonoverlapping records 197
Beginning and ending a sequence of records 197
Continuity of data 198
Time-collapsed data 198
Time variance in the Archival Sector 199
From the perspective of the end user 200
Summary 200
CHAPTER 12 The flow of data in DW 2.0 203
The flow of data throughout the architecture 203
Entering the Interactive Sector 203
The role of ETL 205
Data flow into the Integrated Sector 205
Data flow into the Near Line Sector 207
Data flow into the Archival Sector 209
The falling probability of data access 209
Exception-based flow of data 210
From the perspective of the business user 213
Summary 214
CHAPTER 13 ETL processing and DW 2.0 215
Changing states of data 215
Where ETL fits 215
From application data to corporate data 216
ETL in online mode 216
ETL in batch mode 217
Source and target 218
An ETL mapping 219
Changing states—an example 219
More complex transformations 221
ETL and throughput 222
ETL and metadata 223
ETL and an audit trail 223
ETL and data quality 2
Creating ETL 2
Code creation or parametrically driven ETL 2
ETL and rejects 2
Changed data capture 2
ELT 2
From the perspective of the business user 2
Summary 2
CHAPTER 14 DW 2.0 and the granularity manager 2
The granularity manager 2
Raising the level of granularity 2
Filtering data 2
The functions of the granularity manager 2
Home-grown versus third-party granularity managers 2
Parallelizing the granularity manager 2
Metadata as a by-product 2
From the perspective of the business user 2
Summary 2
CHAPTER 15 DW 2.0 and performance 2
Good performance—a cornerstone for DW 2.0 2
Online response time 2
Analytical response time 2
The flow of data 2
Queues 2
Heuristic processing 2
Analytical productivity and response time 2
Many facets to performance 2
Indexing 2
Removing dormant data 2
End-user education 2
Monitoring the environment 2
Capacity planning 2
Metadata 2
Batch parallelization 2
Parallelization for transaction processing 2
Workload management 2
Data marts 2
Exploration facilities 2
Separation of transactions into classes 2
Service level agreements 2
Protecting the Interactive Sector 254
Partitioning data 255
Choosing the proper hardware 255
Separating farmers and explorers 256
Physically group data together 257
Check automatically generated code 257
From the perspective of the business user 258
Summary 259
CHAPTER 16 Migration 26
Houses and cities 261
Migration in a perfect world 262
The perfect world almost never happens 262
Adding components incrementally 262
Adding the Archival Sector 264
Creating enterprise metadata 265
Building the metadata infrastructure 266
Swallowing source systems 266
ETL as a shock absorber 267
Migration to the unstructured environment 267
From the perspective of the business user 269
Summary 270
CHAPTER 17 Cost justification and DW 2.0 27
Is DW 2.0 worth it? 271
Macro-level justification 271
A micro-level cost justification 272
Company B has DW 2.0 273
Creating new analysis 273
Executing the steps 274
So how much does all of this cost? 276
Consider company B 276
Factoring the cost of DW 2.0 277
Reality of information 278
The real economics of DW 2.0 279
The time value of information 279
The value of integration 280
Historical information 280
First-generation DW and DW 2.0—the economics 281
From the perspective of the business user 282
Summary 282
CHAPTER 18 Data quality in DW 2.0 2
The DW 2.0 data quality tool set 2
Data profiling tools and the reverse-engineered data model 2
Data model types 2
Data profiling inconsistencies challenge top-down modeling 2
Summary 2
CHAPTER 19 DW 2.0 and unstructured data 29
DW 2.0 and unstructured data 2
Reading text 2
Where to do textual analytical processing 3
Integrating text 3
Simple editing 3
Stop words 3
Synonym replacement 3
Synonym concatenation 3
Homographic resolution 3
Creating themes 3
External glossaries/taxonomies 3
Stemming 3
Alternate spellings 3
Text across languages 3
Direct searches 3
Indirect searches 3
Terminology 3
Semistructured data/VALUE = NAME data 3
The technology needed to prepare the data 3
The relational database 3
Structured/unstructured linkage 3
From the perspective of the business user 3
Summary 3
CHAPTER 20 DW 2.0 and the system of record 3
Other systems of record 3
From the perspective of the business user 3
Summary 3
CHAPTER 21 Miscellaneous topics 3
Data marts 3
The convenience of a data mart 3
Transforming data mart data 3
Monitoring DW 2.0 326
Moving data from one data mart to another 327
Bad data 329
A balancing entry 330
Resetting a value 330
Making corrections 330
The speed of movement of data 331
Data warehouse utilities 332
Summary 337
CHAPTER 22 Processing in the DW 2.0 environment 339
Summary 345
CHAPTER 23 Administering the DW 2.0 environment 347
The data model 347
Architectural administration 348
Defining the moment when an Archival Sector will be needed 348
Determining whether the Near Line Sector is needed 349
Metadata administration 351
Database administration 352
Stewardship 353
Systems and technology administration 355
Management administration of the DW 2.0 environment 358
Prioritization and prioritization conflicts 358
Budget 358
Scheduling and determination of milestones 359
Allocation of resources 359
Managing consultants 359
Summary 361
Index 363