Mainframe and non-mainframe data sources
Extraction, Transformation and Loading Tool - for data quality and integrity
Using flat-files or a staging area
Relational Database on a dedicated Server:
Normalised, Snowflake or Star data schema
Static/Flexible Reports:
Batch and on-line reporting environment
Data Mining:
Using an inductive logic engine to identify correlations and hidden relationships
Analytical Tool:
Interactive analysis - OLAP tool
User Interface Deployment:
Web deployment, Power-user client interfaces
Need to provide rich presentation visualisation, user interaction and supplementary documentation
Figure: Source Systems feed Data Staging for Cleansing and then the Data Warehouse Store, which supports Scrutinising Multidimensional Summary Cubes, Surveying Transaction-level Data, Showing Reports and Analytical Results, and Discovering Surprise Relationships.
Basic principles
Life cycle of the DW
Figure: data moves from the Operational Databases to the Warehouse Database through a first-time load, followed by repeated refreshes; old warehouse data is eventually purged or archived.
Oracle Warehouse Components
Figure: Any Source, Any Data, Any Access. Operational data and external data feed the warehouse, which holds relational/multidimensional, text, image, spatial, audio, video and Web data, and is accessed through relational tools, OLAP tools and applications/Web.
Oracle Intelligence Tools
• Oracle Reports: IS develops users' views (current)
• Oracle Discoverer: business users (tactical)
• Oracle Express: analysts (strategic)
Oracle Data Mart Suite
• Data modeling: Oracle Data Mart Designer
• Data extraction: Oracle Data Mart Builder
• Data management: Oracle Enterprise Manager
• Data access & analysis: Discoverer & Oracle Reports
• Engines: OLTP databases (OLTP engines) and the data mart database (warehousing engine), running on Oracle8 with SQL*Plus
“Big Bang” Approach: Advantages and Disadvantages
• Advantages:
– Warehouse built as part of a major project (e.g., BPR)
– Provides a “big picture” of the data warehouse before the data warehousing project starts
• Disadvantages:
– Involves high risk and takes longer
– Runs the risk of the requirements needing to change
– Costly and harder to win support for from users
Incremental Approach to Warehouse Development
• Multiple iterations
• Shorter implementations
• Validation of each phase
Strategy
Definition
Analysis
Design
Build
Production
Benefits of an Incremental Approach
• Delivers a strategic data warehouse solution through incremental development efforts
• Provides an extensible, scalable architecture
• Quickly provides business benefits and ensures a much earlier return on investment
• Allows a data warehouse to be built one subject or application area at a time
• Allows the construction of an integrated data mart environment
Data Mart
• A subset of a data warehouse that supports the requirements of a particular department or business function.
• Characteristics include:
– Does not normally contain detailed operational data, unlike a data warehouse.
– May contain certain levels of aggregation.
Figure: example data marts for Marketing, Sales, Finance and Human Resources.
Dependent Data Mart
Figure: Operational Systems, External Data and Flat Files feed the Data Warehouse, which in turn feeds the Marketing, Sales and Finance data marts.
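A minimal sketch of how a dependent data mart might be derived, assuming a hypothetical warehouse table `dw_sales` with made-up rows: the mart is populated from the warehouse rather than from the operational systems, and it keeps only one department's data at a coarser level of aggregation.

```python
import sqlite3

# Hypothetical warehouse table holding detailed (transaction-level) sales rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dw_sales (dept TEXT, region TEXT, month TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO dw_sales VALUES (?, ?, ?, ?)",
    [
        ("Sales", "North", "Jan 99", 10000),
        ("Sales", "North", "Jan 99", 11000),
        ("Sales", "South", "Feb 99", 12000),
        ("Marketing", "North", "Jan 99", 5000),
        ("Finance", "East", "Jan 99", 7000),
    ],
)

# The dependent mart is filled from the warehouse: a Sales-only subset,
# aggregated to region/month level instead of keeping every detail row.
conn.execute(
    """
    CREATE TABLE sales_mart AS
    SELECT region, month, SUM(amount) AS total_amount
    FROM dw_sales
    WHERE dept = 'Sales'
    GROUP BY region, month
    """
)

mart_rows = conn.execute(
    "SELECT region, month, total_amount FROM sales_mart ORDER BY region, month"
).fetchall()
print(mart_rows)  # [('North', 'Jan 99', 21000), ('South', 'Feb 99', 12000)]
```

An independent mart would run the same kind of query directly against the operational sources instead of `dw_sales`.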
Independent Data Mart
Figure: Operational Systems, External Data and Flat Files feed a Sales or Marketing data mart directly, without a central data warehouse.
Reasons for Creating a Data Mart
• To give users more flexible access to the data they need to analyse most often.
• To provide data in a form that matches the collective view of a group of users
• To improve end-user response time.
• Potential users of a data mart are clearly defined and can be targeted for support
• To provide appropriately structured data as dictated by the requirements of the end-user access tools.
• Building a data mart is simpler than establishing a corporate data warehouse.
• The cost of implementing data marts is far less than that required to establish a data warehouse.
Data Mart Issues
• Data mart functionality
• Data mart size
• Data mart load performance
• User access to data in multiple data marts
• Data mart Internet/intranet access
• Data mart administration
• Data mart installation
Example of a DW Tool: OLAP
• Rotate and drill down to successive levels of detail.
• Create and examine calculated data interactively on large volumes of data.
• Determine comparative or relative differences.
• Perform exception and trend analysis.
• Perform advanced analytical functions, such as forecasting, modeling and regression analysis.
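The rotate and drill-down operations listed above can be sketched in plain Python. This is a minimal illustration, assuming hypothetical fact rows and a Year > Quarter > Month time hierarchy; real OLAP tools do the same thing over far larger, indexed structures.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, month, product, units)
FACTS = [
    (1999, "Q1", "Jan", "PC", 10),
    (1999, "Q1", "Feb", "PC", 5),
    (1999, "Q1", "Jan", "Server", 2),
    (1999, "Q2", "Apr", "PC", 7),
    (1999, "Q2", "Apr", "Server", 3),
]

# Depth of each level in the time hierarchy Year > Quarter > Month.
TIME_LEVELS = {"year": 1, "quarter": 2, "month": 3}


def aggregate(rows, time_level):
    """Sum units by (time path down to time_level, product)."""
    depth = TIME_LEVELS[time_level]
    totals = defaultdict(int)
    for row in rows:
        time_key = row[:depth]  # e.g. (1999,) or (1999, "Q1")
        product, units = row[3], row[4]
        totals[(time_key, product)] += units
    return dict(totals)


def rotate(view):
    """Swap the axes of a two-dimensional view: (time, product) -> (product, time)."""
    return {(product, time_key): units for (time_key, product), units in view.items()}


by_year = aggregate(FACTS, "year")        # rolled-up view
by_quarter = aggregate(FACTS, "quarter")  # drill down one level
by_product = rotate(by_year)              # same numbers, axes swapped
```

Drilling down changes only the grouping level, so the grand total is the same at every level (27 units here).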
Original OLAP Rules
1. Multidimensional conceptual view
2. Transparency
3. Accessibility
4. Consistent reporting performance
5. Client-server architecture
6. Multiuser support
7. Unrestricted cross-dimensional operations
8. Intuitive data manipulation
9. Flexible reporting
10. Unlimited dimensions and aggregation levels
Relational Database Model

        Attribute 4   Attribute 1   Attribute 2   Attribute 3
        Emp No.       Name          Age           Gender
Row 1   1001          Anderson      31            F
Row 2   1007          Green         42            M
Row 3   1010          Lee           22            M
Row 4   1020          Ramos         32            F

The table above illustrates the employee relation.
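The employee relation above maps directly onto a SQL table. A minimal sketch using Python's built-in `sqlite3`; the table and column names (`employee`, `emp_no`, ...) are illustrative choices, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each row (tuple) is one employee; each column is an attribute of the relation.
conn.execute(
    "CREATE TABLE employee ("
    "emp_no INTEGER PRIMARY KEY, name TEXT, age INTEGER, gender TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1001, "Anderson", 31, "F"),
        (1007, "Green", 42, "M"),
        (1010, "Lee", 22, "M"),
        (1020, "Ramos", 32, "F"),
    ],
)

# Rows are retrieved by attribute values, e.g. all female employees.
female = conn.execute(
    "SELECT name FROM employee WHERE gender = 'F' ORDER BY emp_no"
).fetchall()
print(female)  # [('Anderson',), ('Ramos',)]
```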
Multidimensional Database Model
The data is found at the intersection of dimensions.
Figure: a FINANCE cube with dimensions Store, GL_Line and Time, and a SALES cube with dimensions Store, Product, Time and Customer.
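"The data is found at the intersection of dimensions" can be sketched as a mapping keyed by one member of each dimension. This assumes a hypothetical SALES cube with the Store, Product and Time dimensions; the member names and values are made up.

```python
# Hypothetical SALES cube: one member from each dimension (Store, Product, Time)
# addresses exactly one cell; member names and values are made up.
SALES_CUBE = {
    ("Store1", "PC", "Jan 99"): 120,
    ("Store1", "Server", "Jan 99"): 30,
    ("Store2", "PC", "Feb 99"): 75,
}
DIMENSIONS = ("store", "product", "time")


def cell(cube, store, product, time):
    """Value at the intersection of the three members (0 for an empty cell)."""
    return cube.get((store, product, time), 0)


def slice_cube(cube, dimension, member):
    """Fix one dimension to a single member, keeping the remaining sub-cube."""
    axis = DIMENSIONS.index(dimension)
    return {key: value for key, value in cube.items() if key[axis] == member}
```

A dictionary stores only non-empty cells, which is one simple way of coping with the sparsity typical of real cubes.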
Specialised Multidimensional Tool
• Benefits:
– Quick access to very large volumes of data
– Extensive and comprehensive libraries of complex analysis functions
– Strong modeling and forecasting capabilities
– Can access multidimensional and relational database structures
– Caters for calculated fields
• Disadvantages:
– Difficulty of changing the model
– Lack of support for very large volumes of data
– May require significant processing power
MOLAP Server
• The application layer stores data in a multidimensional structure.
• The presentation layer provides the multidimensional view.
Figure: a DSS client, the application layer (MOLAP engine) and the warehouse.
• Efficient storage and processing
• Complexity hidden from the user
• Analysis using preaggregated summaries and precalculated measures
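A minimal sketch of the MOLAP idea, assuming hypothetical stores, months and unit figures: data sits in a dense array addressed by dimension position, and totals are precalculated once at load time so that summary queries are simple lookups rather than scans.

```python
# Hypothetical dimension members and their array positions.
STORES = ["North", "South"]
MONTHS = ["Jan", "Feb"]

# Dense multidimensional storage: DETAIL[store_index][month_index] holds units.
DETAIL = [
    [10, 20],  # North
    [5, 15],   # South
]

# Precalculated at load time, so later queries need no scan of DETAIL.
STORE_TOTALS = [sum(row) for row in DETAIL]
MONTH_TOTALS = [sum(row[m] for row in DETAIL) for m in range(len(MONTHS))]
GRAND_TOTAL = sum(STORE_TOTALS)


def lookup(store, month=None):
    """Return a detail cell, or the preaggregated store total if month is None."""
    s = STORES.index(store)
    if month is None:
        return STORE_TOTALS[s]
    return DETAIL[s][MONTHS.index(month)]
```

The trade-off is visible even here: any change to `DETAIL` means the stored totals must be recomputed.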
ROLAP Server
• The warehouse stores atomic data.
• The application layer generates SQL for the three-dimensional view.
• The presentation layer provides the multidimensional view.
Figure: a DSS client, the application layer (ROLAP engine) and the warehouse server, which exchange multiple SQL statements.
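A minimal sketch of the ROLAP approach, assuming a hypothetical atomic `sales` table: the engine keeps no cube of its own and instead generates a `GROUP BY` query for each view the user asks for. Dimension names are checked against a fixed list before being placed in the SQL text.

```python
import sqlite3

# Hypothetical atomic fact table; only these columns may be used as dimensions.
ALLOWED_DIMENSIONS = ("region", "month", "product")


def build_query(dimensions):
    """Generate the SQL a ROLAP engine might send for a view over `dimensions`."""
    unknown = [d for d in dimensions if d not in ALLOWED_DIMENSIONS]
    if not dimensions or unknown:
        raise ValueError(f"invalid dimensions: {dimensions}")
    cols = ", ".join(dimensions)  # safe: names come from the fixed list above
    return (
        f"SELECT {cols}, SUM(amount) FROM sales "
        f"GROUP BY {cols} ORDER BY {cols}"
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, month TEXT, product TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("North", "Jan", "PC", 10), ("North", "Feb", "PC", 20), ("South", "Jan", "Server", 5)],
)

# Each change of view in the client becomes a new generated SQL statement.
by_region = conn.execute(build_query(["region"])).fetchall()
by_region_product = conn.execute(build_query(["region", "product"])).fetchall()
```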
Figure: MOLAP and ROLAP configurations of Oracle Express.
MOLAP: the Express user sends queries to the Express Server, which answers them from a multidimensional database (MDDB) loaded periodically from the warehouse.
ROLAP: the Express user sends queries to the Express Server, which fetches data live from the warehouse and holds it in a data cache.
Also Hybrid (HOLAP)
Choosing a Reporting Architecture
• Business needs
• Potential for growth
• Interface
• Enterprise architecture
• Network architecture
• Speed of access
• Openness
Figure: MOLAP and ROLAP compared by query performance (OK to Good) and by analysis (Simple to Complex).
Data Acquisition
• Identify, extract, transform, and transport source data
• Consider internal and external data
• Perform gap analysis between source data and target database objects
• Plan move of data between sources and target
• Define first-time load and refresh strategy
• Define tool requirements
• Build, test, and execute data acquisition modules
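The extract, transform and load steps, together with the first-time load and refresh strategy, can be sketched as three small functions. This is a minimal illustration, assuming hypothetical source rows with an integer `updated` change marker and a single target table; production ETL tools add logging, error tables and scheduling.

```python
import sqlite3

# Hypothetical source rows; "updated" stands in for a change timestamp.
source = [
    {"id": 1, "region": " north", "amount": 100, "updated": 1},
    {"id": 2, "region": "south ", "amount": None, "updated": 1},
]


def extract(rows, since=None):
    """Extract all rows (first-time load) or only rows changed after `since` (refresh)."""
    return [r for r in rows if since is None or r["updated"] > since]


def transform(rows):
    """Cleanse: normalise region codes and reject rows with a missing amount."""
    return [
        (r["id"], r["region"].strip().upper(), r["amount"])
        for r in rows
        if r["amount"] is not None
    ]


def load(conn, rows):
    """Load into the target; INSERT OR REPLACE keeps refreshes idempotent per key."""
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount INTEGER)")

load(conn, transform(extract(source)))  # first-time load

source[0].update(amount=120, updated=2)  # a source row changes...
source.append({"id": 3, "region": "East", "amount": 50, "updated": 2})  # ...and one is added
load(conn, transform(extract(source, since=1)))  # refresh picks up only the changes

warehouse_rows = conn.execute("SELECT * FROM sales ORDER BY id").fetchall()
```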
Modeling
• Warehouses differ from operational structures:
– Analytical requirements
– Subject orientation
• Data must map to subject-oriented information:
– Identify business subjects
– Define relationships between subjects
– Name the attributes of each subject
• Modeling is iterative
• Modeling tools are available
1. Defining the business model
2. Creating the dimensional model
3. Modeling summaries
4. Creating the physical model
Select a business process
Modeling the Data Warehouse
Identifying Business Rules
Product
  Type: PC, Server
  Monitor: 15 inch, 17 inch, 19 inch, None
  Status: New, Rebuilt, Custom
Location
  Geographic proximity: 0-1 miles, 1-5 miles, > 5 miles
Store
  Store > District > Region
Time
  Month > Quarter > Year
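Hierarchies such as Store > District > Region and Month > Quarter > Year drive roll-up. A minimal sketch, assuming hypothetical store and district codes; each level is summed into its parent using a child-to-parent mapping.

```python
from collections import defaultdict

# Hypothetical hierarchy tables: each child member maps to its parent.
STORE_TO_DISTRICT = {"S1": "D1", "S2": "D1", "S3": "D2"}
DISTRICT_TO_REGION = {"D1": "East", "D2": "West"}


def month_to_quarter(month):
    """Months 1-12 roll up to quarters Q1-Q4."""
    return f"Q{(month - 1) // 3 + 1}"


def roll_up(values, parent_of):
    """Sum child-level values into their parent members."""
    totals = defaultdict(int)
    for member, value in values.items():
        totals[parent_of(member)] += value
    return dict(totals)


store_sales = {"S1": 100, "S2": 50, "S3": 70}
district_sales = roll_up(store_sales, STORE_TO_DISTRICT.get)     # Store -> District
region_sales = roll_up(district_sales, DISTRICT_TO_REGION.get)   # District -> Region
quarter_sales = roll_up({1: 10, 2: 20, 4: 5}, month_to_quarter)  # Month -> Quarter
```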
Creating the Dimensional Model
• Identify fact tables
– Translate business measures into fact tables
– Analyze source system information for additional measures
– Identify base and derived measures
– Document additivity of measures
• Identify dimension tables
• Link fact tables to the dimension tables
• Create views for users
Dimension Tables
Dimension tables have the following characteristics:
• Contain textual information that represents the attributes of the business
• Contain relatively static data
• Are joined to a fact table through a foreign key reference
Figure: Facts (units, price) surrounded by the Product, Channel, Customer and Time dimensions.
Fact Tables
Fact tables have the following characteristics:
• Contain numeric measures (metrics) of the business
• May contain summarized (aggregated) data
• May contain date-stamped data
• Are typically additive
• Have a key that is typically a concatenated key composed of the primary keys of the dimensions
• Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables
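Additivity decides which measures may be summed. A minimal sketch, assuming hypothetical fact rows that carry the `units` and `price` measures from the star diagram: units can be summed across any dimension, whereas unit price cannot, so an average price is derived from additive base measures instead.

```python
# Hypothetical fact rows: (product_id, day_id, units, unit_price)
FACTS = [(1, 1, 10, 2.0), (1, 2, 30, 1.0), (2, 1, 5, 4.0)]

# units is additive: summing it across days (or any dimension) is meaningful.
total_units_p1 = sum(units for product, day, units, price in FACTS if product == 1)

# unit_price is not additive: 2.0 + 1.0 would be meaningless.
# Derive the average price from additive base measures instead.
revenue_p1 = sum(units * price for product, day, units, price in FACTS if product == 1)
avg_price_p1 = revenue_p1 / total_units_p1
```

Here the derived average price for product 1 is 50.0 / 40 = 1.25, not the 1.5 a naive mean of the two prices would give.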
Dimensional Model (Star Schema)
Figure: the fact table, Facts (units, price), at the centre, joined to the Product, Channel, Customer and Time dimension tables.
Star Schema Model
• Central fact table
• Radiating dimensions
• Denormalized model
Store Table: Store_id, District_id, ...
Item Table: Item_id, Item_desc, ...
Time Table: Day_id, Month_id, Period_id, Year_id
Product Table: Product_id, Product_desc, ...
Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_dollars, Sales_units, ...
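A minimal sketch of the star layout in SQLite, using a subset of the columns listed above (the Item table and Period_id are left out for brevity, and the `_dim` table names and rows are hypothetical). The fact table's primary key concatenates the dimension keys, and every dimension is a single join away.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript(
    """
    CREATE TABLE store_dim   (store_id INTEGER PRIMARY KEY, district_id INTEGER);
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_desc TEXT);
    CREATE TABLE time_dim    (day_id INTEGER PRIMARY KEY, month_id INTEGER,
                              year_id INTEGER);
    -- Central fact table: concatenated key made of the dimension keys.
    CREATE TABLE sales_fact (
        product_id    INTEGER REFERENCES product_dim(product_id),
        store_id      INTEGER REFERENCES store_dim(store_id),
        day_id        INTEGER REFERENCES time_dim(day_id),
        sales_dollars REAL,
        sales_units   INTEGER,
        PRIMARY KEY (product_id, store_id, day_id)
    );
    INSERT INTO store_dim   VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO product_dim VALUES (1, 'PC'), (2, 'Server');
    INSERT INTO time_dim    VALUES (1, 1, 1999), (2, 2, 1999);
    INSERT INTO sales_fact  VALUES (1, 1, 1, 500.0, 5), (1, 3, 2, 300.0, 3),
                                   (2, 2, 1, 2000.0, 1);
    """
)

# Star join: every dimension is one join away from the fact table.
units = conn.execute(
    """
    SELECT p.product_desc, s.district_id, SUM(f.sales_units)
    FROM sales_fact f
    JOIN product_dim p ON p.product_id = f.product_id
    JOIN store_dim   s ON s.store_id   = f.store_id
    JOIN time_dim    t ON t.day_id     = f.day_id
    WHERE t.year_id = 1999
    GROUP BY p.product_desc, s.district_id
    ORDER BY p.product_desc, s.district_id
    """
).fetchall()
```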
Star Schema Model
• Advantages:
– Easy for users to understand
– Fast response to queries
– Simple metadata
– Supported by many front-end tools
• Disadvantages:
– Less robust to change
– Slower to build
– Does not support history
Snowflake Schema Model
Time Table: Week_id, Period_id, Year_id
Dept Table: Dept_id, Dept_desc, Mgr_id
Mgr Table: Dept_id, Mgr_id, Mgr_name
Product Table: Product_id, Product_desc
Item Table: Item_id, Item_desc, Dept_id
Sales Fact Table: Item_id, Store_id, Sales_dollars, Sales_units
Store Table: Store_id, Store_desc, District_id
District Table: District_id, District_desc
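A minimal sketch of the snowflaked Store dimension above, assuming hypothetical `_tbl` table names and made-up rows: district attributes live in their own table, so `District_desc` is stored once, but a query by district needs an extra join (fact to store to district).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Snowflaked Store dimension: district attributes live in their own table.
    CREATE TABLE district_tbl (district_id INTEGER PRIMARY KEY, district_desc TEXT);
    CREATE TABLE store_tbl (
        store_id    INTEGER PRIMARY KEY,
        store_desc  TEXT,
        district_id INTEGER REFERENCES district_tbl(district_id)
    );
    CREATE TABLE sales_fact (
        item_id       INTEGER,
        store_id      INTEGER REFERENCES store_tbl(store_id),
        sales_dollars REAL,
        sales_units   INTEGER
    );
    INSERT INTO district_tbl VALUES (1, 'North District'), (2, 'South District');
    INSERT INTO store_tbl VALUES (1, 'Store A', 1), (2, 'Store B', 1), (3, 'Store C', 2);
    INSERT INTO sales_fact VALUES (1, 1, 100.0, 1), (2, 2, 50.0, 2), (1, 3, 70.0, 1);
    """
)

# Reaching district_desc needs two joins (fact -> store -> district),
# where a star schema would hold district_desc directly in the store table.
by_district = conn.execute(
    """
    SELECT d.district_desc, SUM(f.sales_dollars)
    FROM sales_fact f
    JOIN store_tbl s    ON s.store_id    = f.store_id
    JOIN district_tbl d ON d.district_id = s.district_id
    GROUP BY d.district_desc
    ORDER BY d.district_desc
    """
).fetchall()
```

The extra join is the query-performance cost noted below; the gain is that renaming a district touches one row instead of every store in it.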
Snowflake Schema Model
• Advantages:
– Direct use by some tools
– More flexible to change
– Speeds up data loading
• Disadvantages:
– May become large and unmanageable
– Degrades query performance
– More complex metadata
Using Summary Data
• Provides fast access to precomputed data
• Reduces use of I/O, CPU, and memory
• Is distilled from source systems and precalculated summaries
• Usually exists in summary fact tables
Phase 3: Modeling summaries
Designing Summary Tables
Example layout: columns Units, Sales (€) and Store; rows Product A Total, Product B Total and Product C Total.
Typical summary functions:
• Average
• Maximum
• Total
• Percentage
Summary Tables Example
SALES FACTS
Sales     Region   Month
10,000    North    Jan 99
12,000    South    Feb 99
11,000    North    Jan 99
15,000    West     Mar 99
18,000    South    Feb 99
20,000    North    Jan 99
10,000    East     Jan 99
2,000     West     Mar 99

SALES BY MONTH/REGION
Month     Region   Tot_Sales ($)
Jan 99    North    41,000
Jan 99    East     10,000
Feb 99    South    30,000
Mar 99    West     17,000

SALES BY MONTH
Month     Tot_Sales
Jan 99    51,000
Feb 99    30,000
Mar 99    17,000
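Recomputing the summaries from the eight SALES FACTS rows is a quick consistency check, and it also shows that the coarser summary can be built from the finer one rather than from the facts. A minimal sketch in plain Python; the variable names are illustrative.

```python
from collections import defaultdict

# Rows of the SALES FACTS table: (sales, region, month)
SALES_FACTS = [
    (10000, "North", "Jan 99"), (12000, "South", "Feb 99"),
    (11000, "North", "Jan 99"), (15000, "West", "Mar 99"),
    (18000, "South", "Feb 99"), (20000, "North", "Jan 99"),
    (10000, "East", "Jan 99"), (2000, "West", "Mar 99"),
]

# SALES BY MONTH/REGION, computed from the facts.
by_month_region = defaultdict(int)
for amount, region, month in SALES_FACTS:
    by_month_region[(month, region)] += amount

# SALES BY MONTH, computed from the finer summary instead of rescanning the facts.
by_month = defaultdict(int)
for (month, _region), total in by_month_region.items():
    by_month[month] += total
```

For example, South's February total is 12,000 + 18,000 = 30,000.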
Summary Management in Oracle8i
Figure: Sales facts with Product, Region (State, City) and Time dimensions feed a Sales summary; the Summary advisor uses summary usage and space requirements to produce summary recommendations.