40
Leading Practices and Insights for Managing Data Integration Initiatives May 7, 2010

Managing Data Integration Initiatives

Embed Size (px)

DESCRIPTION

This powerpoint slide deck is the presentation given at the Microsoft center in Waltham, MA titled Leading Practices and Insights for Managing Data Integration Initiatives.Topics covered include:Key DriversApproaches and StrategyTools and ProductsUseful Case StudiesSuccess Factors

Citation preview

Page 1: Managing Data Integration Initiatives

Leading Practices and Insights for Managing Data Integration

Initiatives

May 7, 2010

Page 2: Managing Data Integration Initiatives

04/12/232

Agenda

Introductions

Overview

Key Drivers

Approaches / Strategy

Tools

Case Studies

Success Factors / Lessons Learned

Q&A

Page 3: Managing Data Integration Initiatives

04/12/233

About Allin Wakefield, MA, based service provider of enterprise-quality solutions

and services to small and large companies Consulting practice provides technical and business expertise that

helps businesses implement strategic solutions, integrate key functions, and extend technical capability. Services include:

Integrative Application Development and Performance Tuning Systems Integration and Management Data Migration to comprehensive Design Services.

Solution focused in two key areas:

Microsoft® SharePoint® Virtualization software from Microsoft® and VMware®

Mark Bramhall, CTO Over 35 years of experience working with small and large firms Designed and implemented numerous data integration solutions Experienced technologist and integrator

Page 4: Managing Data Integration Initiatives

04/12/234

About Optima Middleton, MA, based technology consulting firm providing IT

leadership for both strategic and tactical projects

Consulting practice specializes in advising small- to mid-sized firms on optimizing their technology investments and improving business processing. Key services include:

Application Implementations

Data Integrations

Technology Optimization

Business Process Transformation

Irving Burday, President

Experienced technology leader (CIO at several companies)

Led several data integration and data warehouse projects

Directed / managed many complex integration efforts

Page 5: Managing Data Integration Initiatives

04/12/235

Data Integration Overview

Definition - combining data residing in different sources and providing users with a unified view of these dataDefinition - combining data residing in different sources and providing users with a unified view of these data

Mediated Schema Example Data Warehouse Example

Page 6: Managing Data Integration Initiatives

04/12/236

Business / Technical Drivers

Application Integrations

Legacy to new system migration and conversion

Legacy system feeds to new system

New application / Web site processing

Retirement of legacy application

Business Intelligence / Analytics

Data aggregation for data mining

Supporting predictive models and analytics

Feeding decision support and other business intelligence needs

Page 7: Managing Data Integration Initiatives

04/12/237

Business / Technical Drivers

Data Architecture Improvements

Managing feeds to/from Data Warehouse

Reorganization of Operational Data Stores and Marts

Coordinating feeds to/from Legacy systems

Coordinating feeds to/from External entities

External Factors

Supporting Acquisitions / Divestitures

Managing Mergers

Facilitating new Channels and/or Sales Growth

Page 8: Managing Data Integration Initiatives

04/12/238

Key Variables / Considerations Business Related

Scale of Effort

Availability / Skills of Resources

Funding

Timing

Technical Factors Platforms and operating systems Data management software Data models, schemas, and data semantics Middleware User interfaces

Frequency of integrations Business rules and integrity constraints

Page 9: Managing Data Integration Initiatives

Approaches / Strategies

04/12/239

Page 10: Managing Data Integration Initiatives

04/12/2310

Integration approach depends on architectural level

Approaches / Strategies

Page 11: Managing Data Integration Initiatives

04/12/2311

Several strategies for integrating data: Manual Integration – users directly interact with all relevant

information systems and manually integrate data. Requires users to have detailed knowledge of logical data representation and data semantics as well as dealing with different interfaces and query languages.

Application Specific – modifying applications or layering application specific code around an application to enable it to take data from or give data to external data stores.

Data Propagation - involves replicating data in different locations from different sources. Technologies include replication, database log scrapers and change data capture software.

Data Federation - enables a single unified virtual view of one or more source data files. Data federation technique normally employs a metadata reference file to connect related customer information together based on a common key.

Approaches

Page 12: Managing Data Integration Initiatives

12

Application Specific Solutions Application specific tools and utilities are frequently

provided by vendors to integrate and manage data. Key considerations for utilizing this approach:

Developing and using requires deep system knowledge Best results for special-purpose applications

A new data source requires new code to be written Usually optimal for one-time conversions / migrations

Data cleanup frequently requires multiple human interventions

Fragile if updating / changing the underlying data sources (may affect the application)

Can be expensive - in terms of time and skills

Page 13: Managing Data Integration Initiatives

04/12/231313

Data Propagation

Data Propagation is the distribution of data from one or more source data warehouses to one or more local access databases. Data Propagation methods include:

Bulk Extract – utilizes copy management tools or unload utilities to extract all or a subset of the operational relational database. The data which has been extracted may be transformed to the format used by the target on the host or target server.

File Compare – process compares the newly extracted operational data to the previous version. After that, a set of incremental change records is created and are applied.

Change Data Propagation – captures and records the changes to the file as part of the application change process. Techniques include triggers, log exits, or DBMS extensions. A file of incremental changes is created to contain the captured changes.

Page 14: Managing Data Integration Initiatives

14

Data Federation

Links data from two or more physically different locations, making the access/linkage appear transparent as if the data was co-located (versus a data warehouse that houses data in one location). Key elements of a data federated approach:

Middleware consisting of database management system

Uniform access to number of heterogeneous data sources

Provides query language used to combine, contrast, analyze and manipulate the data

Data integration can be done through database integration

Combine data from multiple sources with a single SQL statement

Create a master system that relates data elements from all line of business systems

Page 15: Managing Data Integration Initiatives

Data Integration Tools

04/12/2315

Page 16: Managing Data Integration Initiatives

04/12/2316

ETL

ETL: Extract, Transform and Load

ETL tools extract data from a chosen source(s), transform it into new formats according to business rules, and then load it into target data structure(s)

Enables rules and processes for managing diverse data sources and processing of high volumes of data

Direct insight into source data before; provides data profiling and quality control capabilities

Provides ability to map physical data items with a unique metadata description or create an abstraction layer of common business definitions to map all similar data items to the same definition

Page 17: Managing Data Integration Initiatives

ETL Integrated Architecture

XYZ Corp Systems

Data Extraction & Integration

Business Process LayerInformation Management

Presentation LayerAccessible throughout the organization

DistributionAd-hoc Analysis

To

ols

Core Business Systems

• Financial Systems• CRM Systems• Business Line systems• Other…• External Sources

Arc

hit

ec

ture

Portals/Dashboards Production Reports Ad-hoc reports Query Extracts Other data analysis and reporting tools

Reporting

Data Storage

Transformation

Subject Areas

Transformation

Extract, Transformation, & Load (ETL) Layer

The “Unified Business Model” and Information Management

Analytics and Reporting Tools

OLAP Cubes & Predictive Models

• Conformed facts and shared dimensions

• Pre-aggregated data stored in OLAP models to support variance analysis, exception reporting and drill-down

• Real-time drill through into relational storage

• Structured and unstructured content

ExceptionNotifications

Mining

• Extract, Transformation, and Load (ETL)

• Message Broker• Integration of actual data • Metadata (structured data)

management• Security Management• Administration

17

Page 18: Managing Data Integration Initiatives

04/12/2318

EAI

EAI: Enterprise Application Integration

An integration framework composed of a collection of technologies and services

A centralized broker that handles security, access, and communication

An independent data model based on a standard data structure (e.g., XML)

A connector or agent model where each vendor, application, or interface can build a single component that can speak natively to that application and communicate with the centralized broker

A system model that defines the APIs, data flow and rules of engagement to the system such that components can be built to interface with it in a standardized way

Page 19: Managing Data Integration Initiatives

04/12/2319

EAI Architecture

Page 20: Managing Data Integration Initiatives

04/12/2320

Tool Comparison

ETL versus EAI – what’s the difference?

Page 21: Managing Data Integration Initiatives

04/12/2321

ETL Tools - Key Features Architecture

Parallel Processing Scalability

− Job Distribution, Pipelining, Partitioning

− Common Warehouse Model (CWM) compliant

− Version Control

ETL Functionality Managing Data Streams (multiple targets,

splitting) Pivoting, De-pivoting, Unions Lookups Scheduling Error Handling

Data Extraction & Integration

Data Storage

Transformation

Transformation

Extract, Transformation, & Load (ETL) Layer

• Extract, Transformation, and Load (ETL)

• Message Broker• Integration of actual

data • Metadata (structured

data) management• Security Management• Administration

Page 22: Managing Data Integration Initiatives

04/12/2322

ETL Tools – Key Features (continued) Reusability

Reuse of components Decomposition

Debugging Step by step, row by row, breakpoints Compiler / Validater

Connectivity Native connections support (ODBC, OLE DB,

Flat Files) Integration to package / application meta data Data Quality, Data Validation

Ease-of-Use WYSIWYG Documentation

Data Extraction & Integration

Data Storage

Transformation

Transformation

Extract, Transformation, & Load (ETL) Layer

• Extract, Transformation, and Load (ETL)

• Message Broker• Integration of actual

data • Metadata (structured

data) management• Security Management• Administration

Page 23: Managing Data Integration Initiatives

Challenges

04/12/2323

Page 24: Managing Data Integration Initiatives

04/12/2324

Data Preparation / Quality

Completeness / Accuracy of Data Records Duplicates Half Match-able Data Freshness of Data

Technology Issues

Multiple and Mixed Data Formats Disparate Operating systems and processing platforms Source system constraints

Organizational

Business and IT Politics Ownership / Stewardship of Source Data Dedication of IT resource to manage daily functions

Challenges

Page 25: Managing Data Integration Initiatives

04/12/2325

Level of automation

Time based Event based

Frequency

Error handling

Reporting Requirements

Ownership of Error Remediation

− Technical failures

− Data failures

Auto-correction versus manual updating

Batch integrity

Challenges (continued)

Page 26: Managing Data Integration Initiatives

04/12/2326

Data Handling

Transformations Manipulations Transmission

Magnitude of Effort

Number of systems Volume of data Number of runs

Integration Challenges (continued)

Page 27: Managing Data Integration Initiatives

Case Studies

04/12/2327

Page 28: Managing Data Integration Initiatives

04/12/2328

Business Case 1

Service Company – Revenue: $200m, Size: 300 FTEsService Company – Revenue: $200m, Size: 300 FTEs

Client’s Business Challenge:Integrating data from customer web sites / CRM systems into operational and financial systems

Client’s objective was to build a one-time solution to manage data migrations

Solution:Use SSIS to develop a data migration framework that would allows transformation of data

Build custom stored procedure scripts to extract data from legacy applications

Lessons Learned:Data rules and manipulations required extensive analysis and documentation in order to streamline future state process

Created cross tabular map of legacy application tables to facilitate data mappings and data handling procedures during conversion and testing activities

Page 29: Managing Data Integration Initiatives

04/12/2329 04/12/2329

Education Company – Revenue: $500m, Size: 300 FTEsEducation Company – Revenue: $500m, Size: 300 FTEs

Client’s Business Challenge:Integrating data from a set of outsourced-function partners

Integration needed to be real time as clients transited Web sites, but could not fail in the face of network outages, system failures, etc

Solution:• Understand who is the data master and who only keep shadow copies for each type of data

• Design a way to uniquely identify data, even if multiple sources can create it• Deploy a publish / subscribe solution using reliable, persistent message queuing

Lessons Learned:• You cannot know your data too well; subtle relationships must become explicit

• Multi-partner integration requires extremely simple interfaces and definitions

Business Case 2

Page 30: Managing Data Integration Initiatives

04/12/2330

Healthcare Company – Revenue: $150m, Size: 200 FTEsHealthcare Company – Revenue: $150m, Size: 200 FTEs

Client’s Business Challenge:Integrating data feeds from source systems into new data warehouse

Implementing a data hub to manage data feeds from external entities (e.g., customers, banks) into financial and customer support systems

Solution:• Select / implement a full featured ETL tool to manage and handle data warehouse and miscellaneous data feeds

• Created data extracts from sources to manage data extract requirements and file formats

• Deploy a data quality program that cleansed incoming and transferred data prior to loading into destination system

Lessons Learned:• Error handling required additional time and effort to define error cases and remediation actions

• Data ownership required executive intervention to staff and manage data management process

Business Case 3

Page 31: Managing Data Integration Initiatives

Conclusions

04/12/2331

Page 32: Managing Data Integration Initiatives

The Value of a Data Quality Effort Data Remediation

Data management processes can not allow junk data to be loaded, migrated or transported into a target system

Data remediation procedures should be designed into every solution

Key Performance Indicators: Data Quality Compliance

Data quality indicators should be defined and monitored at all times

The KPIs should be used by the data management operations team to manage data processing and testing

The KPIs for management must be business focused and should show how poor data quality is financially effecting their business

32 04/12/23

Page 33: Managing Data Integration Initiatives

The Value of a Data Quality Effort Check Twice, Load Once

Data should be checked for validity prior to being loaded into target

Designers and developer MUST log the exact data quality errors and issues that are present in the data being processed

The data quality errors and issues must be summarized and reported on. Reports can be used by operations and source data owners to remediate the data and drive data compliance

33 04/12/23

Page 34: Managing Data Integration Initiatives

Importance of Governance

Poor Governance and Lack of Communication account for over 85% of the issues in a data integration project

SizeIncorrect hardware or software

7%

2%

Failure to define objectives

17%

Unfamiliarity with scopeand complexity

17%

Lack of communication

20%

InadequateProject Management

32%

Other

5%

34 04/12/23

Page 35: Managing Data Integration Initiatives

Data Stewardship

Data stewards act as the conduit between IT and the business and accept accountability for the data management process.

IT

Business

Business

Data Stewards

Domain Values Data Standards Business Rule Specifications Data Ownership Rules Data Quality Rules Security Requirements Data Retention Criteria

Data Stewards play a the central role in the management of data across the organization and in assuring its usefulness for the business.35 04/12/23

Data stewards become the “public face” for data and have the following responsibilities:

Page 36: Managing Data Integration Initiatives

Success Factors

Establish and agree upon scope, high level requirements, expected benefits, and architecture

Benefits need to be emphasized from the top down and understood from the bottom up

Data integrity and Data cleansing cannot be over-emphasized

Even well-documented systems are usually prone to poor data quality

Common definitions and mapping is crucial

A complex business is not made any less complex by documenting the data and putting it in a operational store

Knowledgeable use of the data will still require knowledgeable users

36 04/12/23

Page 37: Managing Data Integration Initiatives

Success Factors (continued)

Technology is only part of the answer No mater how sophisticated the implementation, significant process

change will be required

But, technology is key to success Having a key partner who has done this before will minimize risk

Much can be learned from similar efforts

This effort requires a full-time dedicated set of highly-skilled resources

Both technical and business knowledge are required

37 04/12/23

Page 38: Managing Data Integration Initiatives

Appendices

04/12/2338

Page 39: Managing Data Integration Initiatives

04/12/2339

ETL Vendors

ETL Vendors ETL ToolsMicrosoft  SQL Server Integration Services Oracle  Oracle Warehouse Builder (OWB)SAP Business Objects Data Integrator & Data Services IBM IBM Information Server (Datastage)IBM Data Manager/Decision Stream (Cognos)SAS Institute SAS Data Integration StudioInformatica  PowerCenterAb Initio Co>Operating SystemInformation Builders Data MigratorAdeptia Adeptia Integration ServerCastIron Systems OmniConnect PlatformPitney Bowes Business Insight DataFlow ManagerPervasive Data IntegratorElixir Elixir RepertoireJavlin  Clover ETLPentaho  Pentaho Data Integration Talend Talend Open Studio

Page 40: Managing Data Integration Initiatives

04/12/2340

ETL / EAI - Tool Strengths

ETL EAI Excels at bulk data movement Limited in data movement

capabilities

Provides for complex transformations, aggregation from multiple sources and sophisticated business rules.

Offer less sophisticated transformation and extraction functions

Assumes data delays. Operates in real time

Are batch-oriented, making them fast and simple for one-time projects and testing

Work better with continuously interacting systems

Offers little in the way of workflow Workflow-oriented at the core

Works primarily at the session layer Works primarily at the transport layer