Maximize Return on Mainframe Data with Informatica
Integrating & Managing Mainframe Data for Diverse and Evolving Business Needs

Scott Hagan
Phil Line
PowerExchange Product Managers
Informatica
• 3 Companies
• 3 Verticals
• 3 Informatica Professionals
[Overview grid: three verticals, Financial Services, Transportation and Logistics, and Worldwide Resort Management; each presenter covers Business Value, Perspectives & Best Practices, and ROI.]
Lars Grønkjær Olsen
Systems Architect
BEC
What is BEC?
• Wall-to-wall provider of IT services
• Small banks in Denmark
• Other financial institutions (insurance, mortgage)
• Owned by the banks
• Top 10 IT provider in Denmark
• 600+ employees
• Heavily based on IBM mainframe
• Second "strategic platform" is Microsoft Windows
Who Am I?
• Systems Architect
• Data Architect
• "Fluent" in several RDBMSs
• "Fluent" in both Unix and Windows
• Former "technology leader" at the Danish Informatica distributor (Component Software)
• Currently data integration specialist at BEC
Mainframe at BEC
• Minimize load on the mainframe (MSUs); actively control mainframe growth and costs
• DB2 database-centric systems
• Key systems for banking, insurance & mortgages
• All mission-critical systems
• Customer analytics
• Customer reporting

Informatica at BEC
• Used Informatica since 2003, on Windows & SQL Server
• Currently on PowerCenter/PowerExchange 8.6.1
• Production: Windows 2003 (64-bit), 3x4 cores
• 3 additional environments (DEV/TEST/USERTEST)
• Data warehouse off the mainframe
• More recent (2010) data integration use cases
• 18,000 daily sessions initiated via mainframe OPC
• ~1,100 DB2 tables registered for capture with PowerExchange
• ~500 DB2 tables accessed via DB2 Connect
Why BEC and Informatica
• Main objective: offload expensive processing from the mainframe to PowerCenter
• Price per processing unit (MIPS) is more than a million times cheaper, though probably a bit less effective as well
• Before mapfwk.jar, Informatica development used to be more expensive than COBOL
• Storage prices are the same (same SAN used)
• Database response times for the user are often the same (I/O bound)
BEC BI Environment

DSA mappings
• Only inserts (no deletes/updates)
• 100% generated with the mapfwk.jar API
• 2 types:
  • CDC in real time (PowerExchange)
  • "Simulated", based on snapshots once per day

[Diagram: DB2 feeds PowerCenter DSA mappings into a DSA database (SQL Server); PowerCenter EDW mappings load an EDW database (SQL Server); PowerCenter data mart mappings load data mart databases (SQL Server), which serve users through Business Objects. All run on a PowerCenter grid with 3x4 CPUs.]

EDW/data mart mappings
• Inserts/updates/deletes
• Mapping "stub" generated with the mapfwk.jar API; the rest done by the developer
• TYPE1 (Kimball-ish definition: no historic information)
• TYPE2 (historic information is recorded with start/end dates on different records)
• 4 generators in all
About PowerExchange CDC
• The ECCR reads the DB2 logs
• The Logger writes the relevant data into files
• The PowerCenter client reads these files through the Listener
• All records have added columns available:
  • TIMESTAMP (when the change occurred)
  • ACTION (I/U/D = Insert/Update/Delete)
• We store these records "as-is" in the DSA database (see the sketch below)
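Because the change records land in the DSA "as-is", downstream mappings typically need only the latest image per business key. A minimal SQL sketch of that step, assuming a hypothetical DSA table CUST_CHANGES keyed on CUST_ID with added CHANGE_TS and CHANGE_ACTION columns (the real PowerExchange-supplied column names differ):

    -- Latest image per business key from the as-is DSA change records.
    -- Table and column names here are illustrative, not BEC's actual schema.
    SELECT c.*
    FROM CUST_CHANGES c
    JOIN (
        SELECT CUST_ID, MAX(CHANGE_TS) AS MAX_TS
        FROM CUST_CHANGES
        GROUP BY CUST_ID
    ) latest
      ON c.CUST_ID = latest.CUST_ID
     AND c.CHANGE_TS = latest.MAX_TS
    WHERE c.CHANGE_ACTION <> 'D';  -- a trailing delete means the row is gone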
BEC Findings: PowerExchange CDC
• The condense process is not in use
• A PowerCenter mapping cannot contain more than ~60 tables
  • Even this number requires heavy use of mapplets
• A Listener cannot handle more than ~20 mappings
  • You can create more than one Listener
About the "Simulated" DSA
• Generated mapping + session
• Read + sort all rows from the source
• Read + sort the "latest" rows from the target
• Compare (full outer join on the business key; see the sketch after this list):
  • Rows not in the target = insert
  • Rows not in the source = delete
  • Rows with different data columns = update
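A minimal SQL sketch of that compare, assuming hypothetical SRC_SNAPSHOT and TGT_LATEST tables keyed on BK with a single data column COL1. BEC implement this as a generated PowerCenter mapping, not hand-written SQL, so this only illustrates the join logic:

    -- Snapshot diff via full outer join on the business key (BK).
    -- Names are illustrative; NULL-safe column comparison is omitted.
    SELECT
        COALESCE(s.BK, t.BK) AS BK,
        CASE
            WHEN t.BK IS NULL THEN 'I'      -- in source only: insert
            WHEN s.BK IS NULL THEN 'D'      -- in target only: delete
            WHEN s.COL1 <> t.COL1 THEN 'U'  -- data differs:   update
        END AS ACTION,
        s.COL1
    FROM SRC_SNAPSHOT s
    FULL OUTER JOIN TGT_LATEST t ON s.BK = t.BK
    WHERE t.BK IS NULL OR s.BK IS NULL OR s.COL1 <> t.COL1;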
Findings: "Simulated" versus CDC
• Lower MIPS consumption per table in some cases
• PowerCenter does the hard part, fast
• Generates fewer rows in the DSA
  • Sometimes significantly fewer
• Shorter implementation time
  • Even though both are 100% generated; the difference lies outside PowerCenter
• Works for all data source types
BEC Multi-Tenancy and PowerCenter

Requirements
• More than one bank in the source tables
• 9 groups of banks called "koersel", each in different DB2 table schemas
• All batch on the mainframe is initiated in these groups
• Different DB2 connection per group

Our Solution
• The same grouping is maintained in BI
• PowerCenter batch initiated from the mainframe
• PowerCenter workflows
• Database partitions
• Heavy use of WF variables
• Little use of parameter files
• "Cloning" done at deploy time
  • Modifications of WF variables and DB connections in XML
• All based on the mapfwk.jar API
Findings: the mapfwk.jar API
• Saves hundreds of hours per year for BEC
  • In development time
  • Reduced number of errors
  • Naming conventions are enforced
  • Sessions are generated as well (important)
• We have added many other features
  • E.g., generating DDL for target tables
• Steep learning curve
• Necessary when running ~1,000 tables through PowerExchange and PowerCenter 8.6.1
A sketch of what such generation code can look like follows.
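For illustration only: a minimal mapping/session/workflow generator in the style of the Informatica Design API (mapfwk.jar) samples. The class and method names follow the publicly shipped SDK examples, but exact signatures vary by version, and the object names (m_DSA_CUSTOMER etc.) are hypothetical; treat this as an assumption-laden outline, not BEC's actual generator:

    // Sketch in the style of the Design API (mapfwk.jar) samples; names
    // and signatures are illustrative and version-dependent.
    import com.informatica.powercenter.sdk.mapfwk.core.*;

    public class DsaMappingGenerator {
        // Generates one insert-only DSA mapping plus its session and workflow.
        public void generate(Folder folder, Source db2Source, Target dsaTarget)
                throws Exception {
            Mapping mapping = new Mapping("m_DSA_CUSTOMER", "m_DSA_CUSTOMER",
                    "Generated insert-only DSA mapping");
            TransformHelper helper = new TransformHelper(mapping);
            // Source qualifier feeds the target directly; a real generator
            // also enforces naming conventions, emits target DDL, etc.
            RowSet dsqRowSet = (RowSet) helper.sourceQualifier(db2Source)
                    .getRowSets().get(0);
            mapping.writeTarget(dsqRowSet, dsaTarget);
            folder.addMapping(mapping);

            // Generating the session and workflow too is what saves the hours.
            Session session = new Session("s_m_DSA_CUSTOMER", "s_m_DSA_CUSTOMER",
                    "Generated session");
            session.setMapping(mapping);
            Workflow workflow = new Workflow("wf_DSA_CUSTOMER", "wf_DSA_CUSTOMER",
                    "Generated workflow");
            workflow.addSession(session);
            folder.addWorkFlow(workflow);
        }
    }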
Scaling PowerCenter
• Identify peak hours
• pmcmd getservicedetails (see the example after this list)
  • Waiting sessions
  • Running sessions
• What are my deadlines per data mart?
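A sketch of polling the Integration Service for running and waiting sessions with pmcmd; the service, domain, and credential values are placeholders, and the filter flags are as documented for PowerCenter-era pmcmd:

    pmcmd getservicedetails -sv IS_PROD -d Domain_BEC -u Administrator -p secret -running
    pmcmd getservicedetails -sv IS_PROD -d Domain_BEC -u Administrator -p secret -scheduled

Sampling these counts over the day is what produces the running-versus-waiting chart on the next slide.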
Scaling PowerCenter
[Chart: number of running sessions (blue) versus waiting sessions (yellow) over the day, with new low-priority workload highlighted.]
Options
1. Use job dependencies (dependency chain)
  • One big DSA runs first
  • One big EDW waits for the DSA and runs second
  • The max-priority data mart waits for the EDW and runs first
  • The min-priority data mart waits for all others and runs last
2. Use service levels (dependency tree)
  • 1 per solution per data stage
  • DSA, EDW and data marts each have different priorities
  • Data marts ordered by deadline
Obviously "solution" 1 stinks.
Service Levels versus Job Dependencies
• Slightly more difficult to implement
  • You need a dependency tree rather than a dependency chain
  • But: maintaining correct dependencies is easier with a dependency tree in the long run
• Extremely robust
  • If an important job requires manual intervention, the CPUs keep running at 100% while the problem is being fixed
• Very well suited for a multi-tenancy set-up
  • No customer wants to be delayed because of "wrong" data created by another customer
Service Levels in PowerCenter
• Defined at the domain level
  • Priority 1 runs first
  • Priority 10 is the max
• Referenced from the WF by name
• If not defined in the target REPO, the job runs at the "Default" service level
  • "Default" = potentially wait forever
BEC ROI Findings
• Offloading mainframe MIPS is the key
• CDC when it makes sense
  • Condense costs are significant: both extra MIPS and stability issues
• 10+ mappings per day per developer is only possible with mapfwk.jar
• The PowerCenter grid engine with intense use of service levels is powerful
Optimized Mainframe
• Consolidation for optimal performance and operations
• Future: looking at the heavy costs of SAS on the mainframe

By using Informatica
• Provisioning data, a single view of the customer, etc.
ROI Mainframe
• Efficient mainframe operation without the overhead of BU reporting needs
• SAS will provide additional annual $ savings

ROI Informatica
• Codeless, re-usable
• Developers can focus on BU needs rather than infrastructure "clutter"
• Less reliance on skilled (mainframe) people
• Quicker to bring BU requirements to market
Summary: Financial Services (Lars Grønkjær Olsen, BEC)
• Business Value: maximizing mainframe data within a cost-conscious business.
• Perspectives & Best Practices: "Mainframes provide the business with a robust, reliable computing landscape. My job is to keep the business agile with innovative and cost-effective solutions."
• ROI: "Using Informatica we are providing 'live' MI capability as well as $million savings on annual costs through helping to drive IT optimization."
Narendra Joshi
Informatica Administrator/ETL Architect
BNSF
What does BNSF do?
• BNSF is headquartered in Fort Worth, Texas, with about 40,000 employees.
• BNSF operates one of the largest railroad networks in North America.
• BNSF plays a vital role in the US economy, hauling the products consumers use every day and the raw materials manufacturers need to make those products.
• BNSF is a critical link connecting consumers with the global marketplace.
• Length of network: 32,000 route miles.
• States in network: 28; Canadian provinces in network: 2.
• Capital investment since 2000: $36 billion.
• Packages shipped on time during a typical holiday season: 50 million.
• Carloads shipped in 2011: 9.4 million.
• Distance BNSF hauls 1 ton of freight on 1 gallon of diesel fuel: 500 miles.
Who am I?
• Informatica Administrator/ETL Architect at BNSF.
• A proud alumnus of SC&SS, JNU, New Delhi.
• 20+ years of IT experience in application development and data management.
• Associated with many successful high-profile projects (over $10 million).
• BI dashboard architect in a previous avatar.
• I work closely with my compadre Scott Solomon.

About Scott Solomon
• Informatica Administrator/DBA/ETL Architect at BNSF.
• 20+ years of IT experience in application development and data management.
Mainframe environment at BNSF
• Mission-critical customer-centric applications.
• DB2 z/OS-centric applications.
• A mix of legacy applications, with very high online activity for some tables.
• Real-time Informatica PowerExchange CDC.

Mid-tier environment at BNSF
• A mix of DB2 UDB, DB2 pureScale and Teradata databases.
• Focus on complex event processing and enterprise bus architecture.
• Packaged applications like SAP, GIS and Oracle CRM.
Informatica environment at BNSF
• Mission-critical customer-focused applications.
• IBM AIX V5.x servers, DB2 V9 repository.
• Active plan to migrate to version 9.1.
• 5,000+ sessions and counting.
• Data integration with SAP, GIS and other applications.
Data integration initiatives at BNSF
• Major data integration initiative started in 2004.
• BNSF had a requirement to seamlessly replicate data from highly active DB2 z/OS OLTP tables to the mid-tier environment.
• Total updates/inserts: 27,000 per minute.
• Total: 40-45 million inserts/updates every day.
• Some tables have 10,000+ inserts/updates every minute.

Business Value
• Minimize impact to OLTP users/customers.
• Reduce storage/application development cost.
• Reduce mainframe CPU utilization.
Perspective: Why Informatica?
• The Informatica PowerExchange architecture utilizes the DB2 archive log files.
• This architecture ensures minimal impact to OLTP systems.
• The alternative to PowerExchange real-time CDC would have been other tools or in-house application design/development.
• Informatica PowerExchange/PowerCenter was chosen because of its robust architecture and restartability.
• Limited capabilities on the mainframe side.
• Informatica PowerCenter provides virtually any-source, any-target capabilities.
• Service-oriented architecture, positioned in the Gartner Leaders Quadrant.
PowerExchange Architecture
[Diagram: in the mainframe environment, the DB2 subsystem's change stream is read from the DB2 logs via IFI 306; PowerExchange registrations and data maps drive the Collector, and the Listener serves the captured data records over SQL to Informatica PowerCenter and other tools (ETL, EAI, BI) used by user applications.]
BNSF PowerExchange processes/interaction
[Diagram of the BNSF PowerExchange processes and their interaction.]
Primary objective of PowerExchange
• The mainframe does provide stability, a reliable architecture and a robust platform.
• Reduce the footprint of legacy applications.
• Provide an equally reliable & robust architecture on the mid-tier side for data integration.
• Active plan to explore offloading mainframe processes to the mid-tier with version 9.1.
PowerExchange best practices at BNSF
Some of the PowerExchange best practices at BNSF:
• 1) Resource configuration file: a separate folder on a shared drive to import/export the same data maps in different tracks of a PowerExchange development project.
• 2) Automated migration: automated migration of data maps from one environment to another using PowerExchange utilities.
Some of the PowerExchange best practices at BNSF, continued
• 1) Resource configuration file: under the Options tab, we can select the "Resource Configuration" option. This option allows us to specify a different location on a shared disk. PowerExchange data maps can be exported/imported to/from this location.
Some of the PowerExchange best practices at BNSF, continued
• 2) Automated migration: automated migration of data maps from one environment to another.
• The DTLURDMO utility can be used to migrate the following objects from one environment to another:
  • PowerExchange data maps
  • PowerExchange capture registrations
  • PowerExchange extraction maps
• DTLURDMO statements are of the following types:
  • Global statements: for example, username, password, etc.
  • Copy statements, which specify the type of copy to be performed:
    • DM_COPY copies data maps.
    • REG_COPY copies capture registrations and, optionally, extraction maps.
    • XM_COPY copies extraction maps.
  • Optional statements
Some of the best practices at BNSF, continued
• Example JCL (an illustrative reconstruction follows)
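The original slide showed the JCL as a screenshot, which did not survive extraction. A minimal sketch of what a DTLURDMO data map copy job can look like, assuming placeholder dataset names and credentials; control statement syntax varies by PowerExchange version, so treat the details as illustrative rather than exact:

    //DTLURDMO JOB (ACCT),'COPY DATAMAPS',CLASS=A,MSGCLASS=X
    //* Illustrative reconstruction: library and credential values are
    //* placeholders, not BNSF's actual names.
    //STEP1    EXEC PGM=DTLURDMO
    //STEPLIB  DD DSN=PWX.V861.LOADLIB,DISP=SHR
    //SYSPRINT DD SYSOUT=*
    //SYSIN    DD *
    USER DTLUSR
    PWD XXXXXXXX
    DM_COPY
    /*

The SYSIN stream pairs the global statements (user, password) with one of the copy statements listed on the previous slide (DM_COPY here); REG_COPY and XM_COPY jobs follow the same shape.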
PowerExchange Utilities
• PowerExchange does provide various other utilities. Some of the frequently used PowerExchange utilities and their purposes are listed below.
• The DTLUAPPL utility can be used to generate or print restart tokens for CDC sessions.
• The DTLUCBRG utility is designed to facilitate bulk capture registration and is useful for new implementations of change capture technology.
• The DTLUCUDB utility performs the following functions:
  • Creates a DB2 catalog snapshot to initialize the PowerExchange capture catalog table.
  • Generates diagnostic information. Informatica Support might request this to resolve DB2 capture problems.
• The DTLUTSK utility can be used to:
  • List all current tasks.
  • Stop the task specified by the TASKID parameter.
  • List all current locations.
  • List allocated datasets.
Return on Investment at BNSF
• Data from mainframe DB2 z/OS was offloaded to the mid-tier. This allowed new applications to source data from the mid-tier ODS rather than the M/F.
• This resulted in offloading some applications from the M/F and reduced long-term M/F CPU usage and cost. We monitor daily M/F CPU usage for PowerExchange.
• Seamless integration of PowerExchange data maps into PowerCenter.
• We saw tangible and intangible benefits from not sourcing data from the M/F.
• ROI is justified over the long term rather than the short term.
• There are various intangible benefits of using PowerExchange.
Summary: Transportation and Logistics (Narendra Joshi, BNSF)
• Business Value: reduce cost, with minimal impact to OLTP users/customers.
• Perspectives & Best Practices: "Migrate high volumes of data from the mainframe to mid-tier databases by choosing an equally reliable, robust solution."
• ROI: "Choosing Informatica PWX/PC provided alternatives to sourcing data from the M/F, which resulted in savings of millions."
Scott Trometer
Sr. Director, Information Technology, Data Management & Integration
Wyndham Exchange & Rentals
Introduction
• 15+ years in IT
• Application development background
• Database administration background
• Currently lead the Data Management & Integration practice
• Also responsible for application support
• Build practices, drive technology transformation
Wyndham Worldwide: A Dynamic Collection of Hospitality Brands
• Worldwide leader in vacation exchange
• European leader in vacation rentals
• World's largest vacation ownership business
• Leading provider of points-based timeshare
• World's largest franchise company; 10% of U.S. hotel room inventory
Our Beginning
[Diagram: sales channels (global call centers, multi-lingual internet and mobile, business partners, 24x7) served by custom apps & databases (DB2, IMS, SQL, other) running as legacy apps on z/OS and UNIX with a variety of databases, plus web & middleware on Oracle, tied together with EAI.]
The Problem: A Desire to Drive Significant Channel Shift
• Web sales channel revenue share: 13%
• U.S. online leisure sales expected to double between 2005 and 2009
[Chart: U.S. online leisure sales in billions, 2003-2010. Source: eMarketer, March 2007.]
The Problem: A Desire to Reduce Computing Costs
• Hosting fees doubled from 2003 to 2007, and were projected to double again within 5 years
• Capacity requirements increasing 27% per year
• An estimated 60% of costs attributed to online search
[Chart: hosting fees in millions, 2003-2015, actual versus projected.]
The Problem: Opposing Forces
• Increasing demand vs. increasing costs
• Attempts to meet business needs and drive channel shift result in unsustainable costs
• Time to step back and take a strategic approach
Solution Objectives

Business objectives
• Enhance the online customer experience
• Grow internet revenue share

Technology objectives
• Reduce mainframe computing costs
• Develop modern architectural frameworks:
  • Service-oriented architecture
  • Contemporary search engine
  • Contemporary rules engine
  • Enterprise data management framework
• Incremental legacy migration
The Solution
[Diagram: the same sales channels now reach shared services (enterprise search, centralized BRMS) on web & middleware, alongside the custom apps & databases (DB2, IMS, SQL, other) on z/OS and UNIX and Oracle, connected through ETL and batch.]
• Integrated user & customer experience
• Distributed high-cost processing
• Standard data integration framework (CDC, batch)
Data Integration Solution
• PowerCenter Standard Edition
• PowerExchange with CDC
• Informatica 9.1
• Solaris 10
Data Integration Solution
• CDC sources: 71 tables/databases
• Changes: 1.2M changes per day
• Latency:
  • IMS: 0-2 seconds (synchronous)
  • DB2: 0-2 seconds
  • Oracle: 4-7 seconds (Logger, Groups)
• Batch jobs: 785 (IMS, DB2, Oracle, SQL Server, other)
• Volume: 340M records

Notes
• Isolated batch and CDC staging areas
• Integrated enterprise scheduler (Autosys)
• JMS messaging
Results
• Online revenue share: up more than 240%
• Mainframe computing costs: down more than 60%
  • MIPS reduced by 30%
  • Saving millions
Summary: Hospitality (Scott Trometer, Wyndham Exchange & Rentals)
• Business Value: enable online channel shift and technology transformation.
• Perspectives & Best Practices: "Operational integration to reduce costs and enable technology transformation. Frameworks for data warehousing and syndication."
• ROI: "Migrating key use cases has allowed us to save millions while helping to drive an online channel shift."
Performance Statistics

Financial Services: 10 years of Informatica @ BEC
• Mainframe -> SQL Server
• ~18,000 Informatica sessions
• ~1,100 DB2 tables being captured
• ~n MB of CDC data
• Data transfer times between?
• Projected savings > $millions

Transportation and Logistics: 8+ years of Informatica @ BNSF
• Mainframe -> DB2 UDB/Teradata
• 5,000+ Informatica sessions
• ~200 DB2 tables being captured
• ~27,000 upd/ins per minute
• ~40-45 million upd/ins per day
• Tables with >10,000 ins/upd per minute
• Projected savings > $millions

Hospitality: 7+ years of Informatica @ WER
• Mainframe -> Oracle
• 1,000+ Informatica sessions
• ~71 IMS/DB2 tables being captured
• ~1.2M CDC records per day
• ~340M batch records
• Savings > millions
Thank You