Building a Data Warehouse...Bring in the Sheaves
January 13, 2004
EDUCAUSE Mid-Atlantic Conference
Baltimore, Maryland
Ella Smith, U.S. Department of Agriculture
Alan Harmon, U.S. Naval Academy
Copyright Ella Smith and Alan Harmon, 2004. This work is the intellectual property of the authors. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the authors. To disseminate otherwise or to republish requires written permission from the authors.
Agenda for this Session
• Definition
• Overview of the Initial Process (Proof-of-Concept)
• Organizational Ownership
• Data Warehouse Architecture
• Project Team Composition
• The Process
• Wrap Up
• Questions
What is a Data Warehouse?
Definition: a repository of data derived from operational systems or external sources; NOT an archive
Purpose: collect and report data in a consistent, centralized manner; a mechanism for conducting longitudinal analysis
Strategy: target key applications (Admissions, Registrar, Frozen Files), then clean the data and load it.
Benefits of a Data Warehouse
• Cost Savings by reducing the amount of manual time and effort required to compile, organize, and report the data.
• Data Consistency among the different areas, since the data will be synchronized upon entry into the data warehouse.
• Access to the information will be faster, since the process will be automated and available online (versus paper reports).
Loading and Cleaning Data: Opportunity to Integrate, Correct, and Validate Data
[Figure: sources (flat file data, data in databases, live data sources, and applications such as SAP, PeopleSoft, and Oracle Apps) feed a data extraction and cleaning stage (can be very complex), which loads the data warehouse.]
Data extraction and cleaning:
• Integrate multiple data sources
• Correct data problems (cleanse)
• Validate data
• Summarize and roll up data
• Update metadata
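The extraction-and-cleaning steps above (integrate, cleanse, validate, summarize, update metadata) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' process. The two source extracts, the field names (`ssn`, `class_year`, `gpa`), and the cleansing and validation rules are all hypothetical.

```python
from collections import defaultdict

# Hypothetical extracts from two source systems (e.g., Admissions and Registrar).
admissions = [
    {"ssn": "123-45-6789", "class_year": "2006"},
    {"ssn": "987654321", "class_year": "2007"},
]
registrar = [
    {"ssn": "123456789", "gpa": "3.40"},
    {"ssn": "987-65-4321", "gpa": "4.70"},  # out of range: should fail validation
]

def clean_ssn(raw):
    """Cleanse: strip punctuation so both systems share one key format."""
    return raw.replace("-", "").strip()

def integrate(adm_rows, reg_rows):
    """Integrate: merge the two sources on the cleansed key."""
    merged = {clean_ssn(r["ssn"]): {"class_year": int(r["class_year"])} for r in adm_rows}
    for r in reg_rows:
        merged.setdefault(clean_ssn(r["ssn"]), {})["gpa"] = float(r["gpa"])
    return merged

def validate(records, rejects):
    """Validate: keep records that have a class year and a GPA in [0.0, 4.0]."""
    good = {}
    for key, rec in records.items():
        if "class_year" in rec and 0.0 <= rec.get("gpa", -1.0) <= 4.0:
            good[key] = rec
        else:
            rejects.append(key)
    return good

def summarize(records):
    """Summarize / roll up: average GPA per class year."""
    totals = defaultdict(lambda: [0.0, 0])
    for rec in records.values():
        totals[rec["class_year"]][0] += rec["gpa"]
        totals[rec["class_year"]][1] += 1
    return {year: s / n for year, (s, n) in totals.items()}

rejects = []
loaded = validate(integrate(admissions, registrar), rejects)
summary = summarize(loaded)
# Update metadata: record what this load did, for the audit trail.
metadata = {"rows_loaded": len(loaded), "rows_rejected": len(rejects)}
print(summary, metadata)  # {2006: 3.4} {'rows_loaded': 1, 'rows_rejected': 1}
```

Rejected keys go to a separate list rather than being silently dropped. That way the load itself produces something the data owners can correct at the source.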
Online Analytical Processing: Fast and Selective Access to Summarized Data
[Figure: OLAP cube of STUDENTS data with dimensions Majors, Class Year, and Time, serving Registrar, Admissions, Financial, and Ad Hoc views; a second cube labeled Prod, Market, Time also appears.]
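An OLAP cube answers questions by rolling data up along dimensions such as major, class year, and time, then slicing it by fixing one dimension. The minimal in-memory sketch below shows both operations. It assumes a hypothetical list of student fact rows; the majors, years, and counts are made up. A production OLAP tool would precompute and store these aggregates rather than building them on the fly.

```python
from collections import Counter

# Hypothetical student fact rows: (major, class_year, academic_year)
facts = [
    ("Economics", 2006, 2004),
    ("Economics", 2006, 2004),
    ("Physics", 2006, 2004),
    ("Physics", 2007, 2004),
    ("Economics", 2007, 2005),
]

# Roll up: count students in every (major, class_year) cell of the cube,
# summing away the academic_year dimension.
cube = Counter((major, cls) for major, cls, _ in facts)

def slice_by_class(cube, class_year):
    """Slice: fix the class-year dimension and return counts by major."""
    return {major: n for (major, cls), n in cube.items() if cls == class_year}

print(slice_by_class(cube, 2006))  # {'Economics': 2, 'Physics': 1}
```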
DW Development Strategy
• Think & Plan Big
  – Build in small steps; don’t build a BARN! Not an archive system
• Identify your audience
• Use the DW to address new areas, add new capabilities, and fix existing problems
• Retain existing transactional systems
• Iterative development approach
  – Address key needs
  – Rapidly deliver capability to users
  – Lower risk
Strategy (continued)
• Evolve the system in manageable phases
  – Identify questions you need to answer, OR
  – Look at the data to determine questions you can answer
• Strategy
  – Develop an overall plan
  – Develop common metadata standards
  – Implement needed pieces mindful of integration and expansion
Initial Considerations
• Vision
• Proof-of-Concept / Phased Approach
• Benefits
• Strategy
• Timeline
• Cost
• Issues
  – Data
  – Political constraints
  – Organizational factors
1st Step: Proof-of-Concept
• Develop a stand-alone proof-of-concept
• Develop a model to demonstrate use of new tools to end users
• Provide benchmarks for future planning
• Low-cost way to “test the waters”
• Exposes YOUR data and your ability to deal with it
• Define the number of tasks and deliverables
Proof-of-Concept Timeline
• 6-8 weeks for each increment
  – Requirements: gather and document
  – Data: identify source, construct model, extract data, cleanse data, transport data to database
  – Data Access: user interface, security, training, documentation
Proof-of-Concept Timeline
[Gantt chart: the tasks below plotted across Weeks 1 to 6, ending in a completed project.]
#  Task
1  Project Planning/Management
2  Development Environment
3  Software Installation
4  Data Inventory & Quality Assessment
5  Database Design
6  Application Design
7  Data Scrubbing & Loading
8  Application Development
9  Testing
11 Documentation
12 Rollout
Proof-of-Concept Logical Data Model
[Figure: fact table STUDENT_FACT joined to ADMISSION DATA, DEMOGRAPHIC DATA, and REGISTRAR DATA on SSN, and to TIME on ACYR.]
• STUDENT_FACT: SATVHI, SATMHI, H.S. RANK, H.S. CLASS SIZE
• ADMISSION DATA: SSN, CLASS YEAR
• DEMOGRAPHIC DATA: SSN, CLASS YEAR, ETHNICITY, GENDER, HIGH SCHOOL, REGION
• REGISTRAR DATA: SSN, CLASS YEAR, GPA, MAJOR
• TIME: #ACYEAR, CLASS YEAR
Proof-of-Concept Physical Data Model
[Figure: fact table STUDENT_FACT joined to ADMISSION DATA, DEMOGRAPHIC DATA, and REGISTRAR DATA on SSN, and to TIME on ACYR.]
• STUDENT_FACT: #ADMISSION_SSN, #DEMO_SSN, #REGISTRAR_SSN, #ACYEAR, SATVHI, SATMHI, H.S. RANK, H.S. CLASS SIZE
• ADMISSION DATA: #ADMISSION_SSN, SCORE_CLASS
• DEMOGRAPHIC DATA: #DEMO_SSN, DEMO_CLASS, ETHNICITY, GENDER, HIGH SCHOOL, REGION
• REGISTRAR DATA: #REGISTRAR_SSN, CLASS, GPA, MAJOR
• TIME: #ACYEAR, CLASS_YEAR
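The physical model is a small star schema. STUDENT_FACT holds the measures plus a key into each data table, and each data table acts as a dimension joined on SSN or academic year. Below is a minimal sketch using Python's built-in `sqlite3`; it is not the authors' DDL. The column names are simplified (for example `hs_rank` for "H.S. RANK", and `time_dim` for the TIME table). The identifiers "A1", "B2", and "C3" and every data value are hypothetical.

```python
import sqlite3

# Star schema sketch; names simplified from the physical data model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE admission_data   (admission_ssn TEXT PRIMARY KEY, score_class INTEGER);
CREATE TABLE demographic_data (demo_ssn TEXT PRIMARY KEY, demo_class INTEGER,
                               ethnicity TEXT, gender TEXT,
                               high_school TEXT, region TEXT);
CREATE TABLE registrar_data   (registrar_ssn TEXT PRIMARY KEY, class INTEGER,
                               gpa REAL, major TEXT);
CREATE TABLE time_dim         (acyear INTEGER PRIMARY KEY, class_year INTEGER);
-- Fact table: a key into every dimension plus the numeric measures.
CREATE TABLE student_fact (
    admission_ssn TEXT    REFERENCES admission_data,
    demo_ssn      TEXT    REFERENCES demographic_data,
    registrar_ssn TEXT    REFERENCES registrar_data,
    acyear        INTEGER REFERENCES time_dim,
    satvhi INTEGER, satmhi INTEGER, hs_rank INTEGER, hs_class_size INTEGER
);
""")

# Made-up sample rows; "A1", "B2", "C3" stand in for real identifiers.
students = [("A1", "Physics", 600), ("B2", "Physics", 700), ("C3", "History", 640)]
conn.execute("INSERT INTO time_dim VALUES (2004, 2007)")
for sid, major, sat_verbal in students:
    conn.execute("INSERT INTO admission_data VALUES (?, 2007)", (sid,))
    conn.execute("INSERT INTO demographic_data VALUES (?, 2007, NULL, NULL, NULL, NULL)", (sid,))
    conn.execute("INSERT INTO registrar_data VALUES (?, 2007, 3.0, ?)", (sid, major))
    conn.execute("INSERT INTO student_fact VALUES (?, ?, ?, 2004, ?, NULL, NULL, NULL)",
                 (sid, sid, sid, sat_verbal))

# Typical star-schema query: join the fact table to two dimensions and aggregate.
rows = conn.execute("""
    SELECT r.major, t.class_year, AVG(f.satvhi)
    FROM student_fact f
    JOIN registrar_data r ON f.registrar_ssn = r.registrar_ssn
    JOIN time_dim t       ON f.acyear = t.acyear
    GROUP BY r.major, t.class_year
    ORDER BY r.major
""").fetchall()
print(rows)  # [('History', 2007, 640.0), ('Physics', 2007, 650.0)]
```

The fact table stays narrow (keys and numbers only), and descriptive attributes such as major live in the dimensions. Reports therefore group by any dimension attribute without duplicating it on every fact row.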
Post-PoC: DW Architecture
• Many types of architecture
  – Star schema, snowflake, hybrid
• Depends on:
  – Types of queries
  – Size of database
  – Capability of hardware and software
• Basic components:
  – Logical model
  – Physical model
Physical Data Warehouse Topology
[Figure: a web server and a database server shared by Admissions, Academic Affairs, Dean of Students, Finance Office, HR, President, Institutional Research, and Public Affairs, with access for the general public and remote connectivity for laptops.]
MetaData
Definition: information about your data
• Centralized description of business rules
  – Describes data and transformations within the DW
  – Captures changes in business rules over time to provide a level playing field for comparing data
• Audit trail for data authentication
• Bottom line
  – Increased trust in DW-based analysis results
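Capturing business-rule changes over time is what lets metadata put old and new data on a level playing field. Each rule version carries the year it took effect and an audit note. The minimal sketch below uses a hypothetical rule, the minimum GPA for "good standing", whose definition changed between academic years. The rule name, years, thresholds, and notes are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RuleVersion:
    """One version of a business rule, effective from a given academic year."""
    name: str
    effective_acyear: int
    threshold: float
    note: str  # audit-trail text: who changed the rule and why

# Hypothetical metadata registry; versions listed in effective-year order.
REGISTRY = [
    RuleVersion("good_standing_min_gpa", 1998, 2.0, "original registrar rule"),
    RuleVersion("good_standing_min_gpa", 2002, 2.2, "raised by academic board"),
]

def rule_for(name, acyear):
    """Return the version of a rule that was in force in a given academic year."""
    current = None
    for version in REGISTRY:
        if version.name == name and version.effective_acyear <= acyear:
            current = version
    if current is None:
        raise LookupError(f"no version of {name} in force for {acyear}")
    return current

def in_good_standing(gpa, acyear):
    """Apply the rule that was in force that year, not today's rule."""
    return gpa >= rule_for("good_standing_min_gpa", acyear).threshold

print(in_good_standing(2.1, 2000), in_good_standing(2.1, 2003))  # True False
```

Passing one fixed year to `rule_for` instead restates all of history under a single definition. Which choice gives the fairer comparison is a policy decision, and the metadata should record it.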
Project Team Composition
• Types of Personnel and Level of Skill
  – Analysis & Design (HIGH)
  – Implementation (MED)
  – Test & Quality Assurance (LOW)
• Skill = $$$
• Vary skill by task to control cost
The Project Model: “Roles and Responsibilities”
[Figure: organization chart with a Steering Committee, Project Manager, and Quality Assurance over a team of Programmers, Modelers, DBAs, Tool Programmers, End-User Liaison, and Documentation; annotations read “Planning, Reporting, Certification”, “Joint Client and Consultant”, and “Test and Map to Requirements”.]
The Project Model: “Roles and Responsibilities”
[Figure: roles (Programmer, Modeler, DBA, Tool Programmers, End-User Liaison, Documentation) across four phases. Analysis Phase: scoping, infrastructure. Architecture Phase: modeling, cleaning, capacity planning, prototyping. Implementation Phase: ETL, implementation, building, documentation. Transition Phase: QA, training.]
The Harvest!
• Review requirements and results periodically
  – At the end of each phase
  – Annually, taken as a whole
• Optimize the data warehouse
  – Response based on queries and load
  – Bring in line with operational systems
• Review and adjust the DW mission as institutional mandates change
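One concrete way to optimize "based on queries and load" is to index the fact-table column that a frequent query filters on, then confirm the database actually uses the index. The minimal sketch below uses Python's built-in `sqlite3`. The table, column, and index names and the generated rows are hypothetical, and other database servers expose query plans through their own EXPLAIN variants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_fact (registrar_ssn TEXT, acyear INTEGER, satvhi INTEGER)")
conn.executemany(
    "INSERT INTO student_fact VALUES (?, ?, ?)",
    [(f"S{i}", 2000 + i % 5, 500 + i % 300) for i in range(1000)],
)

# A frequent report query that filters the fact table by academic year.
QUERY = "SELECT AVG(satvhi) FROM student_fact WHERE acyear = ?"

def plan(sql, params):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

before = plan(QUERY, (2003,))
# Index the column the frequent query filters on, then re-check the plan.
conn.execute("CREATE INDEX idx_fact_acyear ON student_fact (acyear)")
after = plan(QUERY, (2003,))
print(before)
print(after)
```

On a typical SQLite build, `before` reports a scan of `student_fact` and `after` reports a search using `idx_fact_acyear`. Each index also adds work to every load, so index the filters that real queries use rather than every column.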
Cost Control
• Start small and develop in phases
• Bring in skill sets as needed
  – remember: $$$ = (Skills) x (period of time)
• Institutional staff should know the data
• Organizational issues need to be resolved by the Project Manager and Steering Committee
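The rule of thumb $$$ = (Skills) x (period of time), combined with "vary skill by task", becomes a sum over tasks of each task's rate multiplied by its duration. The minimal sketch below compares two staffing plans. The weekly rates, task durations, and plans are hypothetical numbers chosen only to show the arithmetic; they are not figures from the authors.

```python
# Hypothetical weekly rates by skill level (HIGH / MED / LOW).
RATE_PER_WEEK = {"HIGH": 6000, "MED": 4000, "LOW": 2500}

# Hypothetical tasks with duration in weeks.
TASKS = {"analysis_design": 2, "implementation": 3, "test_qa": 2}

def project_cost(staffing):
    """Cost = sum over tasks of (rate for the assigned skill) x (weeks)."""
    return sum(RATE_PER_WEEK[staffing[task]] * weeks for task, weeks in TASKS.items())

# Plan 1: senior staff on every task. Plan 2: skill matched to the task.
all_high = {task: "HIGH" for task in TASKS}
matched = {"analysis_design": "HIGH", "implementation": "MED", "test_qa": "LOW"}

print(project_cost(all_high), project_cost(matched))  # 42000 29000
```

The savings come entirely from the tasks that do not need senior skills, which is why analysis and design keep the HIGH rate in both plans.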
Accountability
• MUST show results (standard or ad hoc reports)
• Ensure complete documentation to maintain responsibility and association of data to departments
• Establish a Return-on-Investment (ROI), whether tangible (number of reports) or intangible (executive support/decision making)
Issues
• Security
• Performance
• Managing the metadata
• Managing the data warehouse
• Hardware/software configuration
• Resources
• Staying in the loop!
Building a Data Warehouse