Remember Your Map!
Creating an Enterprise Data Warehouse
Alison Torres, Teradata Certified Master
In the Beginning: Formulating Business Rules
• The Obstacles
• The Promise (Data Warehousing)
– What Exactly is Data Warehousing?
    • A Place
    • A Process
    • A Methodology
– How Data Warehousing overcomes those obstacles
• The Map
– Planning
– Design & Implementation
– Usage, Support, and Enhancements
• The Future
– Active Data Warehousing
– CRM
• The Conclusion
– Delivering on the Promise
    • Success Stories
Key Business Drivers
• Flexibility to develop new revenue streams
• Improve Business Efficiency
• Promote brand strength and market position
• Improve the Customer Experience
Business Imperatives
• Must get closer to the customer!
• Must improve productivity of knowledge workers!
• Must be able to integrate new technologies quickly!
• Must become flexible to facilitate rapid market changes!
BUSINESS RULES are a foundational element of achieving these objectives.
What’s Standing in the Way of Meeting those Business Objectives?
What kind of discount should I give this customer today?
• Data Problem: Data Volume / Location (a lot of data spread across disparate systems)
– Business Challenge: Finding the answers
• Data Problem: Data Inconsistency (different values for the same fact or different facts with the same name)
– Business Challenge: Getting the wrong answer
• Data Problem: Data Availability / Currency (Data Freshness = Decision Latency = Missed Opportunities)
– Business Challenge: Answers come too late, or never
• Data Problem: Data Scope (detailed historical data does not exist or is not easily attainable)
– Business Challenge: Incomplete answers
Attributes of the Optimum Solution
• One Corporate Version of the Truth
– Speaking one language
– No dueling numbers
• Single Source
– One stop shopping
– Saves user time by eliminating need to obtain multiple interfaces and files
• Accessibility and timeliness of information
– Allows End-User access without IT dependence
– Reporting takes minutes or hours, not weeks or months
• Cost avoidance
– Multiple applications access the same data
– Offloads Legacy systems for DSS reporting
– Avoids development of standalone applications to address specific data needs
– Eliminates redundant systems
• Infrastructure to integrate future corporate acquisitions and data growth
– Platform proven, highly scalable
• Ability to anticipate and respond to changes in the competitive marketplace
Attributes of the Optimum Solution
[Diagram: On-line Transaction Systems (Operations, Billing, Customer Service, Financial, Sales, Electronic Commerce) feed the Enterprise Data Warehouse, the "Single Version of the Truth", which holds both "profitable" customer characteristics and "this" customer's characteristics to answer the question: What kind of discount should I give this customer today?]
What is an Enterprise Data Warehouse?
• A place to bring together atomic level data from disparate systems, creating one version of the corporate truth, which enables timely, accurate decision making in support of strategic and tactical business initiatives.
• A process for properly assembling and managing data from various sources for the purpose of answering business questions and making decisions that were not previously possible.
• A methodology of combining organizationally consistent data in a method that allows the enterprise to respond to market changes.
[Diagram: EDW Planning, then EDW Design & Implementation, then EDW Usage, Support & Enhancement]
How Data Warehousing Solves Information Challenges
• The Place
– Hardware Architecture
    • Distributed
    • Federated
    • Enterprise
– Software
    • Scalability
    • Manageability
    • Accessibility
    • Usability
Which Environment supports your Business Objectives?
[Diagram: Distributed (Data Marts) keeps Operations, Legal, Finance, and Marketing data in separate marts, supporting “Vertical” Business Analysis. Centralized (Enterprise) holds shared subjects such as Employees, Equipment, Movements, Locations, Expenses, and Customers across Operations, Legal, Finance, and Marketing, supporting “Horizontal” Business Analysis.]
Data Warehouse Core Issues
• Scalability
– Ability to meet increasing demands for:
    • Data Volumes
    • Concurrent Users
    • Complex Queries
• Accessibility
– Ability to ask any question, at any time, of any data
– Currency of data to meet demands
• Manageability
– Low maintenance requirements
– Integrated, parallel utilities
Scalability on all Levels
• Amount of Detailed Data
• Concurrent Users
• Complexity of Data Model
[Diagram: sample data model with entities CUSTOMER (CUSTOMER NUMBER, CUSTOMER NAME, CUSTOMER CITY, CUSTOMER POST, CUSTOMER ST, CUSTOMER ADDR, CUSTOMER PHONE, CUSTOMER FAX), ORDER (ORDER NUMBER, ORDER DATE, STATUS), ORDER ITEM BACKORDERED (QUANTITY), ITEM (ITEM NUMBER, QUANTITY, DESCRIPTION), ORDER ITEM SHIPPED (QUANTITY, SHIP DATE)]
• Query Complexity
– Simple, direct at the start
– Moderate multi-table join
– Regression analysis
– Query tool support
– Complex, 64-way table join
– 15 pages, 37 FROM clauses, 7 unions (largest table > 1 B rows, < 43 minutes)
Robust Accessibility
Access any data on the Data Warehouse Server from any location at any time!
• Mainframes: IBM, Amdahl, Hitachi, Unisys, Bull, and more...
• UNIX: NCR, Sun, HP, Pyramid, Sequent, RS/6000, Silicon Graphics, Apollo
• Desktop: MS DOS, Windows, Windows NT, OS/2, Macintosh
• Internet: Network Computers, MS Internet Explorer, Netscape, Mosaic, Java, CGI
Manageability: Complex Query Performance
• Technical Enabling Factors:
– Compiled Execution Steps
– Parallel Cost-Based Dynamic Optimizer
– Parallel Joins
– Multi-Join Look Ahead
– Parallel Sorting & Aggregation
– Sync Full File Scans
– Workspace Reuse
– Non-Volatile RAM
“NCR's Teradata RDBMS has a strong track record for solving large and complex DSS requirements through the use of parallel technology.”
-- Gartner Group
Manageability: Degree of Parallelism
• The greater the number of tasks processed in parallel, the better the system performance
• Many products are called ‘parallel’, but they only perform some tasks in parallel
• Cost based optimizer should be parallel aware
• Parallelism should be unconditional
• Each query step should be fully ‘parallelized’ with no single threaded operations
[Diagram: “Conditional Parallelism” compared with Teradata “Unconditional Parallelism” across the steps of a query: Query Starts, Query Optimization, Scan, Join, Aggregate, Sort, Convergence, Final Result Set]
Manageability: Optimizers influence the DW power
[Diagram: a User or Tool Created Query enters the Cost Based Optimizer, which takes System Configuration, Available Parallelism, and Data Demographics into account to produce a Parallel Execution Plan, executed by the VPROCS connected through the BYNET]
Benefits
• Plan based on Lowest Cost
• No Hints required
• Always uses maximum Parallelism
• Compiled Execution Steps
Why are business rules so important?
A Transportation Case Study - Asset Management
Locomotive Dwell Time... Horsepower Hours idling in rail yards
Locomotive Dwell Time
[Diagram: Inbound Trains enter the Terminal; Outbound Trains leave it]
Dwell Time = horsepower-hours spent idling in terminals
What impact can reducing dwell time have on increasing velocity?
Key Success Indicator = Velocity
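The dwell-time definition above can be sketched as a small calculation. This is only an illustration: it assumes idle time is the interval between terminal arrival and terminal departure and that each locomotive's horsepower rating is known. The function name and the sample values are hypothetical, not taken from the case study.

```python
from datetime import datetime


def dwell_horsepower_hours(arrive: datetime, depart: datetime, horsepower: float) -> float:
    """Horsepower-hours one locomotive spends idling between terminal arrival and departure."""
    if depart < arrive:
        raise ValueError("departure precedes arrival")
    idle_hours = (depart - arrive).total_seconds() / 3600.0
    return horsepower * idle_hours


# Hypothetical stop: a 4,000 hp locomotive idles from 08:00 to 14:30 (6.5 hours).
example = dwell_horsepower_hours(
    datetime(2001, 3, 1, 8, 0), datetime(2001, 3, 1, 14, 30), 4000.0
)
print(example)  # 26000.0
```

Summing this figure over all locomotives and terminals would give a fleet-wide dwell measure that can be tracked against velocity.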
Cycle Flow (Velocity) Metrics
[Diagram: Hours and Segments yield Hours per Segment; Miles and Segments yield Miles per Segment; both feed Velocity. Dwell Time is the interval from Arrive Intermediate Terminal to Depart Intermediate Terminal.]
Improving Transit Times (Bottlenecks): $22 Million
Dwell-Time Business Rules
• Term (A noun or noun phrase with an agreed upon definition)
– Schedule
    • Scheduled train departure
    • Scheduled train arrival
– Dwell Time
    • The difference between scheduled train arrival and scheduled train departure
– Excessive Dwell Time
    • The difference between scheduled train arrival/departure and actual train arrival/departure
– Reason Code
    • Reason for delay in train departure
• Fact (Connects terms into sensible business relevant observations)
– Dwell Time tracks train terminal delays
– Excessive dwell time exceeds scheduled train arrival/departure
• Business Rules
– Dwell Time is deemed excessive if it exceeds the schedule by 15 minutes (mandatory)
– Excessive dwell-time must have a reason code (mandatory)
– If Dwell Time is excessive, then notify the dispatcher (action enabler)
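The three dwell-time rules can be sketched in code as follows. This is a minimal illustration, not the railroad's implementation: it assumes "exceeds the schedule by 15 minutes" means the actual departure is more than 15 minutes after the scheduled departure, and the class, field, and action names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

EXCESS_THRESHOLD = timedelta(minutes=15)  # threshold stated in the mandatory rule


@dataclass
class TerminalStop:
    train_id: str
    scheduled_departure: datetime
    actual_departure: datetime
    reason_code: Optional[str] = None


def is_excessive(stop: TerminalStop) -> bool:
    """Constraint: dwell is excessive if it exceeds the schedule by more than 15 minutes (assumed strict)."""
    return stop.actual_departure - stop.scheduled_departure > EXCESS_THRESHOLD


def evaluate(stop: TerminalStop) -> List[str]:
    """Apply the rules and return the resulting actions (hypothetical action names)."""
    actions: List[str] = []
    if is_excessive(stop):
        # Mandatory constraint: excessive dwell time must have a reason code.
        if not stop.reason_code:
            actions.append("REJECT_MISSING_REASON_CODE")
        # Action enabler: excessive dwell time triggers a dispatcher notification.
        actions.append("NOTIFY_DISPATCHER")
    return actions


scheduled = datetime(2001, 3, 1, 10, 0)
on_time = TerminalStop("T1", scheduled, scheduled + timedelta(minutes=5))
late = TerminalStop("T2", scheduled, scheduled + timedelta(minutes=20))
late_coded = TerminalStop("T3", scheduled, scheduled + timedelta(minutes=20), reason_code="CREW")
```

Keeping the threshold in one named constant mirrors the point of the slide: the rule is stated once, in business terms, and can be changed without touching the logic around it.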
Reduced Locomotive Dwell Time Results
Equals adding 141 locomotives to fleet!
32% Improvement!
$22,000,000 directly attributable to data warehouse
So, why are business rules so important?
The Formulation, Implementation, and Adherence to Business Rules Make this kind of Business Benefit Possible!
The Map
• How can I drive that kind of benefit for my organization?
• How do we get there from here?
It’s about: Planning! Planning! Planning!...
Where Do I Start?
• What are the user requirements?
• What data should I source? How big will it grow?
• Which database is best for my needs?
• How much hardware do I need to have?
• What tools do I need?
• What services do I need?
• What resources do I need?
• How do I manage it after I build it?
• What is this going to cost?
• How long will it take?
[Diagram: the pieces to fit together: Hardware, Software, Services, OLAP Tools, System Integration, MDB, Data Mining Tools, Logical Model, Systems Mgmt, Databases, User Reqs]
How Do You Discover Business Needs?
• Business executive interviews by professional industry experts
• Draw out real business problems
• Uncover hidden agendas
• Identify business champions
• Protect company and individual interests
• Reveal bottom-line impact
• Produce and present business value and priority report
• Build solid foundation for next steps
[Diagram: Problem Diagnosis, Decision Analysis, Information Availability, Value Quantifier]
Six Key Business Questions
• What are your key business issues and opportunities?
• What information do you need to support these issues and opportunities?
• Which information needs are being satisfied today by the data warehouse?
• What process change is/would be associated with the issue and opportunity?
• What is the value of that process change?
• How will that value be measured?
[Diagram: Issues & Opportunities, Info Req., Data Whse?, Change Required]
These are the Kinds of Answers You are Looking for
• Question: What is one of your business issues and/or opportunities?
– Answer: We’d like to increase the velocity, our key indicator of success, of our trains system-wide.
• Question: What information do you need to support these issues and opportunities?
– Answer: We need to know the reason for the excessive amount of time a train spends in an intermediate terminal.
• Question: Which information needs are being satisfied today by the data warehouse?
– Answer: We know the arrival and departure times of the intermediate stop.
More of the Kinds of Answers You are Looking for
• Question: What process change is/would be associated with the issue or opportunity?
– Answer: We could change the mix of trains arriving at a given terminal at the same time, helping to decrease congestion in the yard.
• Question: What is the value of that process change?
– Answer: It would allow us to optimize fleet utilization and avoid purchasing additional equipment.
• Question: How will that value be measured?
– Answer: We’d measure dwell time and its impact on velocity.
Model the Business
• Business information MODELING
– Determined by BUSINESS RULES & VISION
– The ER MODEL NEVER changes UNLESS the underlying way of doing business changes
– Adding NEW subject areas to MODEL will not impact existing model
[Diagram: sample ER model with entities CUSTOMER (CUSTOMER NUMBER, CUSTOMER NAME, CUSTOMER CITY, CUSTOMER POST, CUSTOMER ST, CUSTOMER ADDR, CUSTOMER PHONE, CUSTOMER FAX), ORDER (ORDER NUMBER, ORDER DATE, STATUS), ORDER ITEM BACKORDERED (QUANTITY), ITEM (ITEM NUMBER, QUANTITY, DESCRIPTION), ORDER ITEM SHIPPED (QUANTITY, SHIP DATE)]
What is a Logical Data Model (LDM)?
• A Logical Data Model (LDM) is the result of information modeling
• An LDM is a diagram which shows:
– Entities (data of importance to the organization)
– Attributes (properties of the data)
– Relationships between entities
• LDMs are completely independent of any particular database or hardware platform
A schematic view of the environment and a mock-up representation of something in the real world.
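To make the three building blocks concrete, the following sketch records entities, attributes, and relationships as plain data, independent of any database. It is only an illustration: the class names and the consistency check are hypothetical, the entity and attribute names are borrowed from the sample CUSTOMER/ORDER model, and the relationship wording comes from the fact "Customer can place order".

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Entity:
    name: str
    attributes: Tuple[str, ...]


@dataclass(frozen=True)
class Relationship:
    from_entity: str
    to_entity: str
    description: str


def undefined_entities(entities: Iterable[Entity],
                       relationships: Iterable[Relationship]) -> Set[str]:
    """Return entity names used in relationships but never defined in the model."""
    known = {e.name for e in entities}
    referenced: Set[str] = set()
    for r in relationships:
        referenced.update((r.from_entity, r.to_entity))
    return referenced - known


customer = Entity("CUSTOMER", ("CUSTOMER NUMBER", "CUSTOMER NAME", "CUSTOMER CITY"))
order = Entity("ORDER", ("ORDER NUMBER", "ORDER DATE", "STATUS"))
places = Relationship("CUSTOMER", "ORDER", "customer can place order")
```

A check such as `undefined_entities` is one simple way to confirm that every relationship in the model points at an entity the business has actually defined.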
Linkage of Information Models
[Diagram: the Enterprise Information Model and Enterprise Data Standards govern the Enterprise Logical Data Model (3NF), which is organized by subject area (for example Subject Area ‘A’). Project A, Project B, and Project C each have a Logical Appl. Model and a Physical Appl. Model linked to it.]
What is a Business Rule?
• Set of conditions that govern a business event so that it occurs in a way that is acceptable to the business.
• Business people identify rules that define all possible and permissible/not permissible conditions for the business
• Business rules should be written for and understood by business people in natural language and independent of technology
• Business rules are meant to be challenged by business people and implemented in technology that allows for controlled, but spontaneous business change
“An exciting new technology called business rules is beginning to have a major impact on the IT industry, more precisely, on the way we develop and maintain computer applications. Business rules can be seen in some respects as the next (and giant) evolutionary step in implementing the original relational vision.”
-- C. J. Date, The Business Rules Approach to Application Development
Types of Business Rules
• Constrains information on behalf of the business event
– Constraints are mandatory
– Guidelines are suggestive
• Enables other action on behalf of the business event
– Action enabling rules
• Creates new information on behalf of the business event
– Computations
– Inferences
Business Rules Challenges
Operational systems: Taxation, Welfare, Education, Food Stamps, Compliance, Collections, Adjudication, Litigation
• Different keys, same data
• Same data as other apps, but uses different names for it
• Different data, but uses same names as data in other apps with different meaning
• Unique data found only here, nowhere else
Operational Systems are usually designed to solve a specific business problem and are rarely developed to a coordinated corporate plan.
“And get it done quickly; we don’t have time to worry about corporate standards ...”
Definitions
• Term: A noun or noun phrase with an agreed upon definition
– Customer
– Credit Rating Code
– Female
– Days of the American work week
• Fact: A statement that connects terms, through prepositions and verbs, into sensible business relevant observations
– Customer can place order
– Order is for line item
– Line item is for order
– Customer qualifies for customer credit rating code
Definitions and Examples of Rules
• Constraint (Mandatory)
– Definition: A complete statement that expresses an unconditional circumstance that must be true or not true for the business event to complete with integrity
– Example: A customer must not have more than 10 open orders at one time
– Example: The total dollar amount of a customer order must not be greater than the customer’s single order credit limit amount
• Constraint (Guideline)
– Definition: A complete statement that expresses a warning about a circumstance that should be true or not true
– Example: A customer should not have more than 10 open orders at one time
• Action Enabler
– Definition: A complete statement that tests conditions and upon finding them true, initiates another business event, message or other activity
– Example: If a customer is valid, then initiate the place order process
– Example: If a customer is high-risk, then notify the customer services manager
• Computation
– Definition: A complete statement that provides an algorithm for arriving at the value of a term (sum, difference, quotient, count, maximum, minimum, average)
– Example: The total amount due for an order is computed as the sum of the line-item amount(s) for the order plus tax
• Inference
– Definition: A complete statement that tests conditions and, upon finding them true, establishes the truth of a new fact
– Example: If a customer has no outstanding invoices, then customer is of preferred status
– Example: If a customer is of preferred status, then the customer’s order qualifies for a 20% discount
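The rule categories above can be sketched against the customer and order examples. This is a minimal illustration under stated assumptions: the credit-limit rule is treated as the mandatory constraint, the 10-open-orders rule as a guideline that only warns, and all class, field, and message names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MAX_OPEN_ORDERS = 10        # from the guideline example
PREFERRED_DISCOUNT = 0.20   # from the inference example


@dataclass
class Customer:
    name: str
    open_orders: int = 0
    outstanding_invoices: int = 0
    single_order_credit_limit: float = 0.0
    high_risk: bool = False


@dataclass
class Order:
    line_item_amounts: List[float] = field(default_factory=list)
    tax: float = 0.0


def total_amount_due(order: Order) -> float:
    """Computation: sum of the line-item amounts for the order plus tax."""
    return sum(order.line_item_amounts) + order.tax


def is_preferred(customer: Customer) -> bool:
    """Inference: no outstanding invoices establishes preferred status."""
    return customer.outstanding_invoices == 0


def discount_rate(customer: Customer) -> float:
    """Inference: preferred status qualifies the order for a 20% discount."""
    return PREFERRED_DISCOUNT if is_preferred(customer) else 0.0


def check_order(customer: Customer, order: Order) -> Tuple[bool, List[str]]:
    """Return (accepted, messages) after applying constraint, guideline, and action enabler."""
    messages: List[str] = []
    # Constraint (mandatory): the order must not exceed the single order credit limit.
    if total_amount_due(order) > customer.single_order_credit_limit:
        return False, ["rejected: order total exceeds single order credit limit"]
    # Constraint (guideline): warn, but do not block, above 10 open orders.
    if customer.open_orders + 1 > MAX_OPEN_ORDERS:
        messages.append("warning: customer would have more than 10 open orders")
    # Action enabler: a high-risk customer triggers a notification.
    if customer.high_risk:
        messages.append("notify customer services manager")
    return True, messages
```

The difference between the two constraint types shows up in the return value: a mandatory constraint stops the business event, while a guideline lets it complete with a warning.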
Business Rules Methodology Phases
• Scoping: Capturing high-level business requirements and boundaries
• Business context for the eventual business rules
– mission: Provide customers worldwide with the best service on the highest quality consumer electronics at competitive prices
– objective: Increase repeat customer business by 15% by the end of the year
– strategy: Ship orders as quickly as possible
– tactic: Employ shipping service to deliver 95% of orders for next day delivery
– policy: Ship all orders received before 4PM for next day arrival at customer location
– rule: If an order is entered by 4PM on a business day, if stock is available and if customer credit is okay, then the order must be shipped for arrival at the customer location by noon on the next business day.
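The shipping rule at the end of this chain is precise enough to sketch in code. The version below is an illustration only: it assumes business days are Monday through Friday with no holiday calendar, treats "by 4PM" as inclusive of 4:00 PM, and uses hypothetical function names.

```python
from datetime import date, datetime, time, timedelta
from typing import Optional

ORDER_CUTOFF = time(16, 0)      # "entered by 4PM"
ARRIVAL_DEADLINE = time(12, 0)  # "by noon on the next business day"


def is_business_day(d: date) -> bool:
    """Assumption: Monday through Friday, holidays ignored."""
    return d.weekday() < 5


def next_business_day(d: date) -> date:
    d += timedelta(days=1)
    while not is_business_day(d):
        d += timedelta(days=1)
    return d


def required_arrival(entered: datetime, stock_available: bool,
                     credit_ok: bool) -> Optional[datetime]:
    """Return the arrival deadline if the rule applies, otherwise None."""
    rule_applies = (
        is_business_day(entered.date())
        and entered.time() <= ORDER_CUTOFF
        and stock_available
        and credit_ok
    )
    if not rule_applies:
        return None
    return datetime.combine(next_business_day(entered.date()), ARRIVAL_DEADLINE)


# An order entered Friday 2 March 2001 at 15:30 must arrive by noon on Monday 5 March.
deadline = required_arrival(datetime(2001, 3, 2, 15, 30), True, True)
```

Each condition in `rule_applies` maps to one clause of the written rule, which keeps the code traceable back to the policy, tactic, and strategy it serves.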
• Planning:
– Create the project plan for building the business rules
• Discovery:
– Rules and Data:
    • tasks or activities behind each business event
    • decisions made on behalf of those tasks or activities
    • information referenced in making those decisions
    • knowledge created or judgements made by those decisions
    • sample event scenarios for testing completeness
– Discovery never stops!!!
• Analysis:
– Finding inconsistencies and redundancies
– Determine rules that are shared across organizational and application boundaries
– Input to the rules-enriched logical data model
Information Evolution in a Data Warehouse Environment - Traditional
• STAGE 1: REPORTING
– WHAT happened?
– Pre-defined Queries
– “Show items 20% or more below plan having zero inventory”
• STAGE 2: ANALYZING
– WHY did it happen?
– Ad Hoc Queries
– “Show items 20% or more below plan having zero inventory and local weather info”
• STAGE 3: PREDICTING
– WHAT will happen?
– Analytical Modeling
– “When Vendor X supply of Y drops by N tell me the projected effect on sales in each store”
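The Stage 1 example query can be sketched as a simple filter. This is an illustration in Python rather than warehouse SQL, and it assumes "below plan" compares actual sales with planned sales; the record layout and names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ItemStatus:
    item: str
    planned_sales: float
    actual_sales: float
    inventory: int


def below_plan_and_out_of_stock(rows: List[ItemStatus], shortfall: float = 0.20) -> List[str]:
    """Items whose sales are at least `shortfall` below plan and that have zero inventory."""
    return [
        r.item
        for r in rows
        if r.planned_sales > 0
        and (r.planned_sales - r.actual_sales) / r.planned_sales >= shortfall
        and r.inventory == 0
    ]


rows = [
    ItemStatus("A", planned_sales=100.0, actual_sales=70.0, inventory=0),  # 30% below plan, no stock
    ItemStatus("B", planned_sales=100.0, actual_sales=90.0, inventory=0),  # only 10% below plan
    ItemStatus("C", planned_sales=100.0, actual_sales=50.0, inventory=5),  # below plan, but in stock
]
report = below_plan_and_out_of_stock(rows)
```

The Stage 2 variant would join the same rows to another data source, such as local weather, which is where ad hoc access to detailed data starts to matter.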
Trends Along the Way...
If your warehouse is successful, you will experience:
• More complexity in the queries
• More ad-hoc queries to support new analysis
• More detailed data, both in width and depth
• More users
• Large user groups
• More flexible analysis
• More cross-functional requirements
• Close the gap between event and action
• Less certainty of what comes next
Information Evolution in an Active Data Warehouse Environment
[Diagram: five stages plotted by increasing query and workload complexity against increasing data detail, volume, integration and schema sophistication; business value and impact increase with each stage]
• Stage 1 REPORTING: WHAT happened? Primarily batch.
• Stage 2 ANALYZING: WHY did it happen? Increase in ad hoc queries.
• Stage 3 PREDICTING: WHAT will happen? Analytical modeling grows.
• Stage 4 OPERATIONALIZING: WHAT is happening? Continuous update and time sensitive queries become important.
• Stage 5 ACTIVE WAREHOUSING: MAKING it happen! Event based triggering takes hold.
Workload mix across the stages: Batch, Ad Hoc, Analytics, Continuous Update/Short Queries, Event-Based Triggering
If you don’t plan for this and provide a capable foundation, you will not easily evolve through these stages!
Evolution of Traditional to Active Data Warehousing
Traditional Warehouse | Active Warehouse
Strategic decisions only | Also drives tactical decisions
Results sometimes hard to measure | Results measured with operations
Daily, weekly, monthly data currency is acceptable; summaries often appropriate | Within minutes; only comprehensive detail data is acceptable
Moderate user concurrency | High user concurrency
Highly parameterized reporting, often using pre-built summary tables or data marts | Complex data mining to discover new hypotheses vs. confirming prior ones
Power users, knowledge workers, internal users | Operational staffs, call centers, external users
What’s Driving this Evolution?
[Diagram: forces acting on the enterprise: INTERNET, TECHNOLOGY, REGULATORS and GOVERNMENT BODIES, COMPETITION, CUSTOMERS!!]
Marketplace Dynamics DEMAND a customer centric focus!
Customer Relationship Management is the primary relationship driving the next generation of data warehousing...
What is an Active Data Warehouse?
• An operationalized, business-critical component of the enterprise system
• Cross-department, cross-channel; supports enterprise level business objectives
• Results of data analysis are translated into actionable decisions
• Shortens the time between source data and business actions taken as a result of analyzing that data
– (Dwell, Idle, Lag, Response time)
• Underlying philosophy is to increase speed and accuracy of business decisions
• It’s broader than “real-time” or “closed-loop” data warehousing
• Each step can be as near real-time as it needs to be
• There is a very mixed workload (unlike OLTP). Each type of work has its own service level requirements
• It’s an evolution, not an end point
Benefits of Active Data Warehousing
• More timely identification of non-compliant clients
– Shortens the time between event and action
• Fraud detection during a motor vehicle transaction
• At-the-gate upgrades offered to the most appropriate airline passengers based on last-minute seat availability
• Customizing complex pricing while at the customer site, based on the customer’s current value to you
• Deciding which truck to route a late-arriving package on with minimal delay to other package deliveries
• Offering upsell coupons coincident with a transaction that complement the purchase and are not duplicates
A Complex Operational Topology
[Diagram: Operational Data passes through Data Transformation into an ODS Layer and an Enterprise Warehouse & Management layer; Data Replication populates Data Mart “Spokes”; Business Users and IT Users work against these separate layers]
An Enterprise Active Warehouse
[Diagram: Operational Data passes through Data Transformation into a single Active Warehouse that combines ODS, Enterprise Warehouse, and Logical Data Marts; Replication feeds a Physical Data Mart or Departmental Warehouse; IT Users and Business Users access the Active Warehouse]
Measures of Success
• Measurable ROI
• The data warehouse is used
• User satisfaction
• Additional requests for DW functions and data
• Business performance-based benchmarks
• Goals and objectives met
• Business problems solved
• Business opportunity is realized
• DW has become an agent of change
• Delivered on time and within budget
These can be quantified!
Source: “Data Warehouse Project Management,” by Sid Adelman and Larissa Terpeluk Moss, Addison-Wesley, 2000
• Measurable ROI
– The investment in Teradata solutions paid for itself within a year at Union Bank of Norway, thanks to results from direct marketing.
• The DW is used
– At Norfolk Southern, 1700+ external users access the DW via the web, contributing to a 35% decrease in service center calls.
• User satisfaction
– At SBC, any user can ask any question of any data at any time
• Additional requests for DW functions and data
– TX Comptroller of Public Accounts DW enabled collection of $94 million in underpaid and unpaid taxes in first 2 years. Adding data sources enabled them to identify planes kept in TX, but not registered there, generating $1.5M in additional tax revenue.
• Business performance-based benchmarks
– Travelocity.com conversion rate highest in industry at 8.9%
– National Australia Bank customer retention rates at market leading 98.4%
• Goals and objectives met
– 3M’s GEDW doubled on-time performance to key customers
– 3M reduced inventory levels from 4 months to target of just over 3 months, adding $437 million to cash flow
• Business problems solved
– Charming Shoppes used their data warehouse to drive store changes that increased their earnings from $139.2M loss to $10.9M profit in two years
Texas Comptroller of Public Accounts
A Map to Success!
• Company Overview
– 3,000 employees
– Enforces tax laws and collects taxes and fees owed to one of the largest states in the U.S.
– Administers more than 30 different state taxes
– Processes 3.3 million tax returns annually
– Will collect $48.7 billion in taxes in years 2002-2003
• Business Challenge
– Outdated infrastructure made it difficult to improve the level of taxpayer compliance
• Business Objective
– Agency had to develop the systems to provide its employees with decision support capabilities and detailed information to help them better perform their audit and customer service responsibilities, and improve tax collection efficiency.
• Solution
– Advanced Database System (ADS) includes:
    • Teradata RDBMS running on a dual-node WorldMark 4700 server and 360GB of disk array storage
    • Custom-designed web access tool to access data warehouse
    • Perform data mining via custom-designed applications
    • Professional Services
• Results
– Has collected $150 million in additional revenue from leads generated by Teradata Advanced Database System
– Users continue to uncover an average of $1 million each week in additional revenue
– Improved efficiency and accuracy in detecting taxpayers’ difficulties complying with tax laws
– Productivity gains: less staff, more leads
– More accurate identification of non-compliant taxpayers, meaning the right people are being audited
• Why Teradata
– Effectively houses vast amounts of data
– Teradata’s previous tax expertise with similar state and federal organizations
– Demonstrated the value of using a data warehouse to mine detailed internal and external data
“By providing users with access to detail data that was never before available, the Teradata-based solution is delivering some substantial results for the Comptroller’s office and the residents of Texas.”
– Lisa McCormack, Area Manager, Audit Division
Thank You Very Much!
Alison Torres, Teradata Certified Master
Teradata, a division of NCR
732-809-2668