
Second Quarter 2009

By Philip Russom

Operational Data Integration: A New Frontier for Data Management

www.tdwi.org

TDWI Best Practices Report

Research Sponsors

DataFlux

expressor software

GoldenGate Software

IBM

Informatica Corporation

SAP BusinessObjects

Silver Creek Systems

Sybase

Talend


© 2009 by TDWI (The Data Warehousing InstituteTM), a division of 1105 Media, Inc. All rights reserved. Reproductions in whole or part prohibited except by written permission. E-mail requests or feedback to [email protected]. Product and company names mentioned herein may be trademarks and/or registered trademarks of their respective companies.

Table of Contents


Research Methodology and Demographics

Introduction to Operational Data Integration
  The Three Broad Practice Areas of Data Integration
  The Three Main Practice Areas within OpDI
  Why Care about Operational Data Integration Now?

The State of Operational Data Integration
  Growth of the OpDI Practice
  OpDI in Support of Business Initiatives and Technical Projects
  Benefits of Operational Data Integration
  Barriers to Operational Data Integration

Best Practices in Operational Data Integration
  Data Migration
  Data Synchronization
  Business-to-Business (B2B) Data Exchange

Organizational Issues for Operational Data Integration
  Staffing Operational Data Integration Practices
  Competency Centers and Similar Organizational Structures
  OpDI Team Collaboration and Cross-Functional Communication

Technology Requirements and Vendor Tools for OpDI
  Preferred Technologies for OpDI
  Additional Technologies for OpDI

Recommendations


About the Author

PHILIP RUSSOM is the senior manager of TDWI Research at The Data Warehousing Institute (TDWI), where he oversees many of TDWI’s research-oriented publications, services, and events. Before joining TDWI in 2005, Russom was an industry analyst covering BI at Forrester Research, Giga Information Group, and Hurwitz Group. He also ran his own business as an independent industry analyst and BI consultant and was a contributing editor with Intelligent Enterprise and DM Review magazines. Before that, Russom worked in technical and marketing positions for various database vendors. You can reach him at [email protected].

About TDWI

The Data Warehousing Institute, a division of 1105 Media, Inc., is the premier provider of in-depth, high-quality education and training in the business intelligence and data warehousing industry. TDWI is dedicated to educating business and information technology professionals about the strategies, techniques, and tools required to successfully design, build, and maintain data warehouses. It also fosters the advancement of data warehousing research and contributes to knowledge transfer and the professional development of its Members. TDWI sponsors and promotes a worldwide Membership program, quarterly educational conferences, regional educational seminars, onsite courses, solution provider partnerships, awards programs for best practices and leadership, resourceful publications, an in-depth research program, and a comprehensive Web site (www.tdwi.org).

About the TDWI Best Practices Reports Series

This series is designed to educate technical and business professionals about new business intelligence technologies, concepts, or approaches that address a significant problem or issue. Research for the reports is conducted via interviews with industry experts and leading-edge user companies and is supplemented by surveys of business intelligence professionals.

To support the program, TDWI seeks vendors that collectively wish to evangelize a new approach to solving business intelligence problems or an emerging technology discipline. By banding together, sponsors can validate a new market niche and educate organizations about alternative solutions to critical business intelligence issues. Please contact TDWI Research Director Wayne Eckerson ([email protected]) to suggest a topic that meets these requirements.

Acknowledgments

TDWI would like to thank the many people who contributed to this report. First, we appreciate the many users who responded to our survey, especially those who responded to our requests for phone interviews. Second, we thank our report sponsors, who diligently reviewed outlines, survey questions, and report drafts. Finally, we would like to recognize TDWI’s production team: Jennifer Agee, Bill Grimmer, Denelle Hanlon, and Deirdre Hoffman.

Sponsors

DataFlux, expressor software, GoldenGate Software, IBM, Informatica Corporation, SAP BusinessObjects, Silver Creek Systems, Sybase, and Talend sponsored the research for this report.


Research Methodology and Demographics

Report Scope. This report is designed for business and technical executives who are responsible for planning and implementing programs for operational data integration (OpDI). This report will help organizations worldwide understand the current state of OpDI, as well as where it’s going. The report drills into the business initiatives, technical implementations, and cross-functional organizational structures relevant to OpDI, as well as common starting points and success factors.

Research Methodology. Most of the market statistics presented in this report are based on a research survey. In November 2008, TDWI sent an invitation via e-mail to the data management professionals in its database, asking them to complete an Internet-based survey. The invitation was also distributed via Web sites, newsletters, and conferences from TDWI and other firms. There were 401 respondents to the survey, though not all of them answered every question. From these, we excluded the respondents who identified themselves as academics or vendor employees, leaving the surveys of 336 respondents as the core data sample for this report.

Seventy-five percent of these respondents report having worked on OpDI projects, so statistics based on their responses accurately represent real-world experiences and plans.

TDWI Research also conducted numerous telephone interviews with technical users, business sponsors, and recognized experts in the field of OpDI. TDWI received product briefings from vendors that offer products and services related to the best practices under discussion.

Survey Demographics. The vast majority of survey respondents are corporate IT professionals (69%), whereas the remainder consists of consultants (20%) or business sponsors/users (11%). We asked consultants to fill out the survey with a recent client in mind.

The financial services (15%), consulting (14%), and insurance (11%) industries dominate the respondent population, followed by software (8%), healthcare (7%), manufacturing (7%), telecommunications (5%), education (4%), and miscellaneous other industries. Most respondents reside in the United States (66%) or Europe (14%). Respondents are fairly evenly distributed across all sizes of companies and other organizations.

Position

Corporate IT professional 69%

Consultants 20%

Business sponsors/users 11%

Industry

Financial services 15%

Consulting/professional services 14%

Insurance 11%

Software/Internet 8%

Healthcare 7%

Manufacturing (non-computers) 7%

Telecommunications 5%

Education 4%

Government (all levels) 3%

Media 3%

Retail/wholesale/distribution 3%

Transportation/logistics 3%

Utilities 3%

Other* 14%

Geography

United States 66%

Europe 14%

Canada 7%

Asia 5%

Australia 4%

Africa 2%

Central/South America 1%

Other 1%

Company Size by Revenue

Less than $100 million 16%

$100–500 million 15%

$500 million–$1 billion 12%

$1–5 billion 16%

$5–10 billion 14%

More than $10 billion 17%

Don’t know 10%

Based on 336 survey respondents.

* The “other” category consists of multiple industries, each represented by 2% or less of respondents.


Introduction to Operational Data Integration

The amount and diversity of work done by data integration specialists has exploded since the turn of the twenty-first century. Analytic data integration continues to be a vibrant and growing practice that’s applied most often to data warehousing and business intelligence initiatives. But a lot of the growth comes from the emerging practice of operational data integration, which is usually applied to the migration, consolidation, or synchronization of operational databases, plus business-to-business data exchange. Analytic and operational data integration are both growing; yet, the latter is growing faster in some sectors.

But growth comes at a cost. Many corporations have staffed operational data integration by borrowing data integration specialists from data warehouse teams, which puts important BI work in peril. Others have gone to the other extreme by building new teams and infrastructure that are redundant with analytic efforts. In many firms, operational data integration’s contributions to the business are limited by legacy, hand-coded solutions that are in dire need of upgrade or replacement. And the best practices of operational data integration on an enterprise scale are still coalescing, so confusion abounds.

The purpose of this report is to identify the best practices and common pitfalls involved in starting and sustaining a program for operational data integration. The report defines operational data integration in terms of its relationship to other data integration practices, as well as by its most common project types. Along the way, we’ll look at staffing and other organizational issues, followed by a list of technical requirements and vendor products that apply to operational data integration projects.

The Three Broad Practice Areas of Data Integration

TDWI’s position is that diverse data integration practices are distinguished by the larger technical projects or business initiatives they support. So, this report defines data integration practices by their associated projects and initiatives. Figure 1 summarizes these projects, initiatives, and practices (plus relationships among them) in a visual taxonomy. There are three broad practice areas for data integration:

• Analytic data integration (AnDI) is applied most often to data warehousing (DW) and business intelligence (BI), where the primary goal is to extract and transform data from operational systems and load it into a data warehouse. It also includes related activities like report and dashboard refreshes and the generation of data marts or cubes. Most AnDI work is executed by a team set up explicitly for DW or BI work.

• Operational data integration (OpDI) is more diverse and less focused than AnDI, making it harder to define. For this reason, many of the users TDWI interviewed for this report refer to OpDI as the “non-data-warehouse data integration work” they do.1 To define it more positively, OpDI is: “the exchange of data among operational applications, whether in one enterprise or across multiple ones.” OpDI involves a long list of project types, but it usually manifests itself as projects for the migration, consolidation, collocation, and upgrade of operational databases. These projects are usually considered intermittent work, unlike the continuous, daily work of AnDI. Even so, some OpDI work can also be continuous, as seen in operational database synchronization (which may operate 24x7) and business-to-business data exchange (which is critical to daily operations in industries as diverse as manufacturing and retail or financials and insurance). OpDI work is regularly assigned to the database administrators and application developers who work on the larger initiatives with which OpDI is associated. More and more, however, DW/BI team members are assigned OpDI work.


1 For another definition of operational data integration, see the article “Operational Data Integration” by Philip Russom in TDWI’s What Works, volume 23. www.tdwi.org/publications/whatworks


• Hybrid data integration (HyDI) practices fall in the middle ground somewhere between AnDI and OpDI practices. HyDI includes master data management (MDM) and similar practices like customer data integration (CDI) and product information management (PIM). In a lot of ways, these are a bridge between analytic and operational practices. In fact, in the way that many organizations implement MDM, CDI, and PIM, they are both analytic and operational.

Figure 1. A taxonomy of data integration practices. The top layer shows business initiatives and technology implementations; the bottom layer shows the practices they map to: data warehousing (DW) and business intelligence (BI) map to analytic data integration (AnDI); customer data integration (CDI), product information management (PIM), and master data management (MDM) map to hybrid data integration (HyDI); and data migration, data synchronization, and business-to-business (B2B) data exchange map to operational data integration (OpDI).

As a quick aside, let’s remember that data integration is accomplished via a variety of techniques, including enterprise application integration (EAI), extract, transform, and load (ETL), data federation, replication, and synchronization. IT professionals implement these techniques with technique-specific vendor tools, hand coding, functions built into database management systems (DBMSs) and other platforms, or all of these. And all the techniques and tool types under the broad rubric of data integration operate similarly in that they copy data from a source, merge data coming from multiple sources, and alter the resulting data model to fit the target system that data will be loaded into. Because of the similar operations, industrious users can apply just about any technique (or combination of these) to any data integration implementation, initiative, project, or practice—including those for OpDI.
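To make that shared pattern concrete, here is a minimal sketch in Python of the copy, merge, and transform flow common to all these techniques. Every source, field name, and transformation rule below is hypothetical and chosen only for illustration; real DI tools wrap this pattern in connectivity, scheduling, and metadata management that a sketch can't show.

```python
# Hypothetical copy-merge-transform flow shared by DI techniques.

def extract(source_rows):
    """Copy data from a source (here, rows already fetched)."""
    return list(source_rows)

def transform(row):
    """Alter a record to fit the target system's data model."""
    return {
        "customer_name": row.get("name", "").strip().title(),
        "country_code": row.get("country", "")[:2].upper(),
    }

def load(target, rows):
    """Append transformed rows to the target store."""
    target.extend(rows)

# Two operational sources with slightly different conventions.
crm_rows = [{"name": "ada lovelace", "country": "GBR"}]
erp_rows = [{"name": " alan turing ", "country": "gb"}]

target = []
merged = extract(crm_rows) + extract(erp_rows)   # merge multiple sources
load(target, [transform(r) for r in merged])
print(target)
```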

The Three Main Practice Areas within OpDI

Now that we’ve defined AnDI, HyDI, and OpDI, we can dive into the real topic of this report: OpDI and its three main practice areas, as seen in the bottom layer of Figure 1.

DATA MIGRATION2

Although it’s convenient to call this practice area “data migration,” it actually includes four distinct but related project types. Data migrations and consolidations are the most noticeable project types, although these are sometimes joined by similar projects for data collocation or database upgrade. Note that all four of these project types are often associated with applications work. In other words, when applications are migrated, consolidated, upgraded, or collocated, the applications’ databases must also be migrated, consolidated, upgraded, or collocated.

Migration. Data migrations typically abandon an old platform in favor of a new one, as when migrating data from a legacy hierarchical database platform to a modern relational one. Sometimes the abandoned database platform isn’t really a “legacy”; it simply isn’t the corporate standard.


2 For a discussion of development processes for data migration, see the TDWI Monograph Best Practices in Data Migration, by Philip Russom, online at www.tdwi.org/research/monographs.



Consolidation. Many organizations have multiple customer databases that require consolidation to provide a single view of customers. Data mart consolidation is a common example in the BI world. And consolidating multiple instances of a packaged application into one involves consolidating the databases of the instances.

Upgrade. Upgrading a packaged application for ERP or CRM can be complex when users have customized the application and its database. Likewise, upgrading to a recent version of a database management system is complicated when users are two or more versions behind.

Collocation. This is often a first step that precedes other data migration or consolidation project types. For example, you might collocate several data marts in the enterprise data warehouse before eventually consolidating them into the warehouse data model. In a merger and acquisition, data from the acquired company may be collocated with that of the acquiring company before data from the two are consolidated.

These four data migration project types are related because they all involve moving operational data from database to database or application to application. They are also related because users commonly apply one or more of these project types together. Also, more and more users apply the tools and techniques of data integration to all four. But beware, because migration projects are intrusive—even fatal—in that they kill off older systems after their data has been moved to another database platform.

DATA SYNCHRONIZATION

Killing off a database or other platform—the way data migrations and consolidations do—isn’t always desirable or possible. Sometimes it’s best to avoid the risk, cost, and disruption of data migration and leave redundant applications and databases in place. When these IT systems share data in common—typically about business entities like customers, products, or financials—it may be necessary to synchronize data across the redundant systems so the view of these business entities is the same from each application and its database. For example, data synchronization regularly syncs customer data across multiple CRM and CDI solutions, and it syncs a wide range of operational data across ERP applications and instances. Furthermore, when a data migration project moves data, applications, and users in multiple phases, data sync is required to keep the data of old and newly migrated systems synchronized.

Note that true synchronization moves data in two or more directions, unlike the one-way data movement seen in migrations and consolidations. When each database involved in synchronization is subject to frequent inserts and updates, it’s inevitable that some data values will conflict when multiple systems are compared. For this reason, synchronization technology must include rules for resolving conflicting data values.

Hence, synchronization is a distinct practice that’s separate from migration, consolidation, and other similar OpDI practices, because—unlike them—it leaves original systems in place, is multi-directional, and can resolve conflicting values on the fly. Furthermore, migrations and consolidations tend to be intermittent work, whereas data synchronization is a permanent piece of infrastructure that runs daily for years before reaching the end of its useful life.

BUSINESS-TO-BUSINESS (B2B) DATA EXCHANGE

For decades now, partnering businesses have exchanged data with each other, whether the partners are independent firms or business units of the same enterprise. A minority of corporations use applications based on electronic data interchange (EDI) operating over value-added networks (VANs).


These are currently falling from favor, because EDI is expensive and limited in the amount and format of data it supports. In most B2B situations, the partnering businesses don’t share a common LAN or WAN, so they share data in an extremely loosely coupled way, as flat files transported through file transfer protocol (FTP) sites. Exchanging flat files over FTP is very affordable (especially compared to EDI), but rather low-end in terms of technical functionality. In recent years, some organizations have upgraded their B2B data exchange solutions to support XML files and HTTP; this is a small step forward, leaving plenty of room for improvement.
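As a concrete illustration of the flat-files-over-FTP pattern, here is a hedged sketch in Python using the standard library's csv and ftplib modules. The host, credentials, and file layout are placeholders, not real endpoints; a production solution would add error handling, encryption, and acknowledgment processing.

```python
# Sketch: write a pipe-delimited flat file and push it to a partner's FTP site.
import csv
import io
from ftplib import FTP

orders = [("PO-1001", "WIDGET-A", 250), ("PO-1002", "WIDGET-B", 75)]

buf = io.StringIO()
writer = csv.writer(buf, delimiter="|")
writer.writerow(["po_number", "sku", "qty"])   # header layout agreed with partner
writer.writerows(orders)

ftp = FTP("ftp.partner.example.com")           # placeholder host
ftp.login(user="acme", passwd="secret")        # placeholder credentials
ftp.storbinary("STOR orders_20090401.txt",
               io.BytesIO(buf.getvalue().encode("utf-8")))
ftp.quit()
```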

B2B data exchange is a mission-critical application in industries that depend on an active supply chain that shares a lot of product information, such as manufacturing and retail. It’s also critical to industries that share information about people and money, like financials and insurance. Despite being mission-critical, B2B data exchange in most organizations remains a low-tech affair based on generating and processing lots of flat files. According to users TDWI Research interviewed for this report, most B2B data exchange solutions (excepting those based on EDI) are hand-coded legacies that need replacing. Most of these have little or no functionality for data quality or master data management, and they lack any recognizable architecture or modern features like Web services. Hence, in the user interviews, TDWI found that organizations are currently planning their next-generation B2B data exchange solutions, which will be built atop vendor tools for data integration with support for data quality, master data, services, business intelligence, and many other modern features.

USER STORY: DIFFERENT INDUSTRIES HAVE DIFFERENT OpDI NEEDS.

“I worked in financial services for years, where I was unpredictably bombarded with system migration and consolidation work, as the fallout of periodic mergers and acquisitions. Now that I work in e-commerce, my operational data integration work focuses on upgrades and restructuring of our e-commerce applications, largely to give them greater speed and scalability.”

Why Care about Operational Data Integration Now?

There are many reasons organizations need to revisit their OpDI solutions now to be sure they address current and changing business requirements:

• OpDI is a growing practice. More organizations are doing more OpDI; it is an increasing percent of the workload of data integration specialists and other IT personnel. Despite the increase in OpDI work, few organizations are staffing it appropriately.

• OpDI solutions are in serious need of improvement or replacement. Many are hand-coded legacies that need to be replaced by modern solutions built atop vendor tools.

• OpDI solutions tend to be feature poor. They need to be augmented with functions they currently lack for data quality, master data, scalability, maintainable architecture, Web services, and modern tools for better developer productivity and collaboration.

• OpDI and AnDI have different goals and sponsors. Hence, the two have different technical requirements and organizational support. Don’t assume you can do both with the same team, tools, and budget.

• OpDI addresses real-world problems and supports mission-critical applications. So you should make sure it succeeds and contributes to the success of the initiatives and projects it supports.

In short, operational data integration is a quickly expanding practice that user organizations need to focus on now—to foster its growth, staff it properly, provide it with appropriate technical infrastructure, and assure collaboration with other business and technology teams. The challenge is to develop the new frontier of operational data integration without scavenging unduly from similar efforts in analytic data integration.

USER STORY: FOR OpDI SUCCESS, CREATE AN IDEAL DESIGN AND STICK WITH IT.

“Our success in operational data integration is due to a best practice that we have developed,” said Harpreet Sohal, the enterprise application architect at Wescorp (a credit union company). “When a new project comes along—like the B2B analytic Web site we’re working on now—we brainstorm and produce an ideal design, as if we have massive resources—which we don’t! Then we diligently stay as close to the ideal design as possible, diluting it only when there’s a truly compelling business or technology reason. In my experience, this approach provides more resources in the short term, and yields a more extendable and scalable solution for the business in the long term.”

The State of Operational Data Integration

Growth of the OpDI Practice

One of TDWI Research’s positions concerning OpDI is that it is growing as a practice, and the growth can be defined in different ways. For example, as discussed later in this report, OpDI is growing in capabilities and architectural design as organizations replace their hand-coded legacies with OpDI solutions built atop modern vendor tools. As another example of growth, OpDI is an increasing percentage of an organization’s overall workload for data integration, which is one of the main reasons organizations need to revisit the resources they’re providing for OpDI.

To quantify the state of the latter form of growth, this report’s survey asked: “With data integration (DI) usage in your organization, what is the approximate percentage split between analytic DI applied to business intelligence (BI) and data warehousing (DW) versus operational DI applied to database migrations, consolidations, synchronizations, and so on?” In fact, TDWI had asked this question four times before in Technology Surveys distributed at TDWI World Conferences. If we bring together the responses from all these surveys—plus a similar one this author conducted years ago at Forrester Research—a pattern emerges. (See Figure 2.)

Over time, the percentage split has shifted from 81% AnDI and 19% OpDI in 2004 to 51% AnDI and 49% OpDI in 2008. More simply put (and with numbers rounded), AnDI versus OpDI usage has progressed from an 80/20 split to a 50/50 split. This is a dramatic shift, given that it took only four years. Needless to say, this has been one of the most influential trends in data integration in recent memory—and it’s not over.

The surveys summarized in Figure 2 corroborate the anecdotal evidence TDWI Research has been hearing from data management professionals for years: resources are increasingly being diverted to OpDI as organizations are forced to upgrade legacy OpDI solutions or as organizations seize business and technology opportunities that succeed only when supported by excellence in OpDI. The next two sections of this report will present several examples of these opportunities.

It’s important to note that the shifting percentage split does not mean that growth in OpDI is forcing a contraction in AnDI. Instead, both are growing, and OpDI’s increasing percentage of data integration workload indicates that OpDI is growing a bit faster than AnDI.


With data integration (DI) usage in your organization, what is the approximate percentage split between analytic DI applied to business intelligence (BI) and data warehousing (DW) versus operational DI applied to database migrations, consolidations, synchronizations, and so on?

                                                 AnDI    OpDI
November 2004 (Source: Forrester Wave report)     81%     19%
February 2006 (Source: TDWI Tech Survey)          75%     25%
February 2007 (Source: TDWI Tech Survey)          61%     39%
November 2007 (Source: TDWI Tech Survey)          75%     25%
August 2008 (Source: TDWI Tech Survey)            63%     37%
December 2008 (Source: TDWI Research Survey)      51%     49%

Figure 2. Percentage splits of AnDI versus OpDI usage over time.

OpDI in Support of Business Initiatives and Technical Projects

One of this report’s assertions is that OpDI is not performed in a vacuum, nor should it be. Instead, it supports larger business initiatives and technology projects. To get a sense of what these initiatives and projects are and which involve OpDI most often, TDWI Research asked respondents to “Rate how often data integration (either operational or analytic) is applied to the following business initiatives.” (See Figure 3.) A similar question asked respondents to rate OpDI’s involvement with technical projects. (See Figure 4.) There are many ways to interpret respondents’ answers, but here are a few highlights.3

BUSINESS INITIATIVES

Broad and common initiatives involve OpDI most often. Thirty-eight percent of survey respondents report regularly applying OpDI to improvements in business operations, followed by “data as an enterprise asset” programs (34%). These are broad and common programs, which is probably why they bubbled up to the top of Figure 3.

Compliance and governance are tightly coordinated initiatives. And that’s probably why they rated so similarly in the survey; OpDI is applied occasionally to both by 35% of respondents.

Technology-focused business initiatives rely on OpDI. Roughly 40% of respondents (in Figure 3) apply OpDI occasionally to IT platform standardization, legacy platform modernization, and IT centralization.

Some important business events don’t occur often. According to the survey, OpDI is seldom or occasionally applied to initiatives for mergers and acquisitions, corporate reorganizations, and data center renovations—but not regularly or constantly, because these events occur sporadically.


3 Note that Figures 3 and 4 sort survey responses by the “regularly” column to reveal where OpDI is most often applied. Other sorts would obviously put initiatives and projects in different orders, thus revealing other attributes, like where OpDI is least often applied.


Rate how often data integration (either operational or analytic) is applied to the following business initiatives.

                                      Never   Seldom   Occasionally   Regularly   Constantly
Improvements to business operations     6%      9%        37%           38%         10%
Data as an enterprise asset program     9%     18%        30%           34%          9%
Compliance                              7%     17%        35%           31%         10%
IT platform standardization            10%     21%        38%           27%          4%
Governance                             12%     23%        35%           26%          4%
Legacy platform modernization          10%     23%        40%           23%          4%
IT centralization                      12%     22%        39%           23%          4%
Mergers and acquisitions               24%     21%        28%           20%          7%
Partner programs                       28%     26%        29%           15%          2%
Corporate reorganizations              20%     32%        32%           13%          3%
Data center renovation                 21%     33%        34%            9%          3%
Other                                  41%     24%        28%            5%          2%

Figure 3. Based on 336 respondents. Sorted by the “Regularly” column.

TECHNICAL PROJECTS

Data integration is most often applied to DW/BI. An impressive 42% of respondents (in Figure 4) reported “integrating data for data warehousing and business intelligence” regularly and 38% reported doing it constantly. This isn’t surprising given that the AnDI solutions typical of DW/BI are permanent infrastructure that runs daily (sometimes multiple times daily), unlike the temporary and intermittent use of DI resources typical of many OpDI projects. On a related point, consolidating BI data stores was also revealed to be a common project (30% do it regularly).

Data sync fared well in the survey. Respondents reported regularly synchronizing customer data (36%) and synchronizing product data (34%). (See Figure 4.) Similar practices are not as regular, but still prominent, namely customer data integration (28%), product data integration (23%), and business-to-business data exchange (24%).

System consolidations are fairly regular, whereas migrations are relatively rare. Thirty-five percent of respondents admit to regularly consolidating operational or transactional databases, followed by consolidating homegrown applications (28%) and consolidating packaged applications (22%). Similar projects are not nearly as regular (because the need is rarer), namely migrating a database from a legacy platform to a relational one (19%) and migrating a database from one relational brand to another (10%).



Rate how often data integration (either operational or analytic) is applied to the following technical projects.

                                                            Never   Seldom   Occasionally   Regularly   Constantly
Integrating data for DW and BI                                4%      5%        11%           42%         38%
Synchronizing customer data across systems                    6%     18%        25%           36%         15%
Consolidating operational or transactional databases          5%     20%        28%           35%         12%
Synchronizing product data across systems                    11%     17%        26%           34%         12%
Consolidating BI data stores (like data marts, DWs)           6%     18%        31%           30%         15%
Customer data integration (for CRM, not data warehouse)      17%     25%        24%           28%          6%
Consolidating homegrown applications                          7%     21%        39%           28%          5%
Business-to-business data exchange                           19%     24%        27%           24%          6%
Product data integration (for supply chain, not DW)          25%     21%        25%           23%          6%
Consolidating packaged apps (like ERP, CRM, financials)      12%     26%        33%           22%          7%
Master data management hub                                   30%     21%        23%           19%          7%
Migrating databases from legacy to relational platforms      13%     29%        35%           19%          4%
Migrating a database from one relational brand to another    19%     40%        29%           10%          2%

Figure 4. Based on 336 respondents. Sorted by the “Regularly” column.

Benefits of Operational Data Integration

When asked whether OpDI projects yield significant benefits, a whopping 81% of survey respondents answered “yes.” Given that benefits clearly exist in users’ minds, it now behooves us to determine what those benefits are and which seem to be the most prominent. And that’s why the survey asked: “What are the leading benefits of operational data integration?” (See Figure 5.) OpDI’s benefits fall into three categories.

What are the leading benefits of operational data integration? (Select one to three.)

Improves data’s quality, standards, models, metadata, and so on 64%

Improves efficiency of business operations 52%

Provides a more complete view of the business 38%

Increases visibility into business operations 36%

Eases data management tasks 28%

Resolves contradictory data 25%

Reduces IT administrative costs 15%

Supports organizational changes in the business 11%

Modernizes data and applications 9%

Increases visibility across supply chains 7%

Figure 5. Based on 959 responses from 336 respondents.

Improvements to data are the leading benefits of OpDI. More than any other answer to the question about OpDI benefits, 64% of survey respondents chose “improves data’s quality, standards, models, metadata, and so on.” Related benefits concern how OpDI resolves contradictory data (25%, a common goal of data sync) and modernizes data and applications (9%, as seen in most system migrations).



OpDI yields business benefits, too. High percentages of survey respondents feel that OpDI improves the efficiency of business operations (52%), provides a more complete view of the business (38%), and increases visibility into business operations (36%). OpDI is rated less beneficial in other business areas, such as supporting organizational changes (11%) and increasing visibility across supply chains (7%).

The management of data is also a beneficiary of OpDI. In this regard, OpDI eases data management tasks (28%) and reduces IT administrative costs (15%).

Barriers to Operational Data Integration

As we just saw, OpDI has its benefits. But OpDI also faces many barriers to its success. TDWI Research asked survey respondents to “list some problems that have inhibited your operational DI projects.” Representative complaints from users are summarized below so you can anticipate these problems and make plans to avoid or solve them.

• Poor quality of operational data: “Poor quality of data,” “duplicate data,” “non-standard, incomplete, or inaccurate data,” “lack of corporate-wide data definitions,” “application data creation is poorly coded,” “non-compliance with agreed enterprise data standards,” and “third-party source [data is] outside of our control.”

• Ineffective ownership or sponsorship: “Unclear business ownership of data,” “conflicting data ownership,” “identifying realistic ownership responsibilities,” “obtain[ing] executive-level sponsorship,” “weak sponsorship,” and “lack of corporate sponsorship and funding.”

• Personnel aren’t allowed enough time: “Lack of lead time,” “just-in-time problem recognition,” “lack of time,” “limited time and budget,” and “time and resource constraints.”

• OpDI’s ROI is difficult to quantify: “Operational data integration doesn’t have the direct appearance of improving business performance,” “cost of tools,” “cost to implement,” “cost of integration and questionable ROI,” “costs with hard to quantify returns,” and “[we lack] a sponsor who knows, understands, and owns the business value.”

• Lack of cross-functional collaboration: “Technical team unaware of business impact,” “business team unaware of technical complexity,” “IT driving the project; not listening to the business,” “lack of timely customer input,” “organization barriers,” “lack of centralized data governance,” and “lack of cross-organizational support.”

USER STORY: MODERNIZING B2B DATA EXCHANGE USUALLY ENTAILS VENDOR TOOLS.

“Our current B2B data exchange solution works reliably, but—being a hand-coded legacy—it lacks features we need and it’s hard to expand and maintain,” said Mike Romano, the manager of power supply information systems at the Central Vermont Public Service Corporation. “So, my department is aggressively migrating the solution to a vendor’s tool for extract, transform, and load. We feel that the tool gives us greater productivity than hand coding, especially in areas like reuse, testing, and the deployment process. And the tool is part of a suite that includes other tools that we need to modernize our operational data integration solution, namely tools for data quality, master data management, and Web services. Since the data integration tool comes from our BI vendor, we are tying operational data into operational dashboards in ways we couldn’t do before. Overall, we think the tool-based approach will help our operational data integration solution grow to support our changing power distribution business.”



Best Practices in Operational Data Integration

As explained in this report’s introduction, there are three main practice areas within the broader practice of OpDI: data migration, data sync, and business-to-business (B2B) data exchange. Let’s look in more detail at the project types that define each of these practice areas.

Data Migration

As pointed out earlier, this practice area includes four related project types. It’s mostly about data migrations and consolidations, but it also includes related project types like database collocations and database upgrades.

DATA MIGRATION

In most migrations, a database (defined as a collection of data, not a vendor’s database management system [DBMS]) is moved from one platform to another. Typical examples include migrating a database from a mainframe to a less expensive open system running Linux or UNIX, from a legacy platform to a more easily supported one, or from a non-standard DBMS brand to one that’s a corporate standard. Migrating to a newer or standard database platform increases the flow of information among systems by making data access and integration easier. This way, database migration contributes to advanced integration goals, like real-time or on-demand information delivery. And, in many cases, the new platform is less expensive to acquire and maintain.

Migration is intrusive (like its close cousin consolidation) because it kills off the original system, forcing changes to the business processes and applications that depend on that system. For this reason, technical personnel must not work independently. You’re not just migrating data; you’re also migrating applications, business units, and their end users. So, you have to collaborate with business managers and others to establish a plan for migrating data, people, and businesses in phases.

With database migrations off of legacy platforms (especially the mainframe), you need to check contracts before committing to a plan. Sometimes issues of licensing, leasing, depreciation, or amortization can halt the retirement of an old or non-standard system.

Be aware that database migration is something of a myth; it’s more like new development when the target system doesn’t exist and must be designed and built from scratch.4 In almost all projects for data migration—and consolidation and upgrade, too—data must be transformed as it is moved, because the data models of the source and target platforms are different. If the database being migrated includes stored procedures and other in-database procedural logic, these will need to be redeveloped on the new platform (though some SQL routines may be portable). And there are the usual quality, master, and metadata issues. On the one hand, developing the target platform for a data migration project is a lot of work. On the other hand, it’s an opportunity to greatly improve a database’s quality, metadata, model, and so on.

4 For more information about migrations, see Ten Mistakes to Avoid When Migrating Databases, available to TDWI Members at www.tdwi.org/Publications/TenMistake.
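To see why migration resembles new development, consider this minimal sketch, which uses SQLite purely for convenience. The legacy and target schemas are hypothetical: the legacy table stores a combined name field and a text date, while the target model splits the name and expects ISO dates, so every row must be transformed in flight.

```python
# Sketch: transform data as it moves because source and target models differ.
import sqlite3

legacy = sqlite3.connect(":memory:")
legacy.execute("CREATE TABLE cust (id INTEGER, name TEXT, opened TEXT)")
legacy.execute("INSERT INTO cust VALUES (1, 'LOVELACE, ADA', '31-DEC-98')")

target = sqlite3.connect(":memory:")
target.execute("""CREATE TABLE customer (
    id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, opened TEXT)""")

MONTHS = {"DEC": "12"}  # abbreviated month lookup, truncated for the example

for cid, name, opened in legacy.execute("SELECT id, name, opened FROM cust"):
    last, first = [part.strip().title() for part in name.split(",")]
    day, mon, yy = opened.split("-")
    iso = f"19{yy}-{MONTHS[mon]}-{day}"   # naive century rule, sketch only
    target.execute("INSERT INTO customer VALUES (?, ?, ?, ?)",
                   (cid, first, last, iso))

print(target.execute("SELECT * FROM customer").fetchall())
```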

DATA CONSOLIDATION

This is where IT personnel consolidate multiple, similar databases (often with redundant content) into a single database with a single data model. If a single database isn’t a realistic goal, then data is consolidated into fewer databases. The most common project is probably the consolidation of redundant CRM or ERP applications and their databases. As another example, many organizations have multiple mid-tier databases containing slightly different views of customer data, and these are ripe for consolidation. On the analytic side of data integration, database consolidations are a popular way to reduce the number of redundant data marts.

The upside of database consolidation is that it reduces IT maintenance costs by consolidating data into fewer servers. Furthermore, it increases visibility into business processes by putting data “eggs” in fewer baskets. As with migration, database consolidation requires more development work than you might suppose, but it’s a great opportunity for improving data and its structure.

The downside is that consolidation is extremely intrusive (like migration) in that it kills off the consolidated systems. Their owners, sponsors, and users may resist passing control, typically to a central organization, and these people are usually forced to change business processes associated with the consolidated systems. Again, technical personnel must coordinate the consolidation of data and applications with managers of the affected business units.

DATABASE UPGRADE

This is where version X of a vendor’s DBMS brand is upgraded to version X+1 of the same brand. A database upgrade can be a stand-alone project or an interim step within larger OpDI implementations. For instance, it’s a good idea to upgrade all databases to the same release before consolidating them. And upgrading a legacy or non-standard database to the most recent version first sometimes makes it easier to migrate. Furthermore, when a new version of a DBMS changes how it stores data, a database upgrade can require transformational processing to get the data into a model optimized for the new version. On occasion, multiple upgrades are required, say from version 6 to 7, then version 7 to 8.
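A small sketch of the multi-hop idea follows: each upgrade step transforms the database into the shape the next version expects, and steps are chained until the target version is reached. The step functions below are hypothetical stand-ins for real upgrade scripts.

```python
# Sketch: chained upgrades, e.g., version 6 to 7, then 7 to 8.
def upgrade_6_to_7(db):
    db["schema_version"] = 7          # e.g., split a column, rebuild indexes
    return db

def upgrade_7_to_8(db):
    db["schema_version"] = 8          # e.g., convert the storage format
    return db

UPGRADE_PATH = {6: upgrade_6_to_7, 7: upgrade_7_to_8}

def upgrade(db, target_version):
    while db["schema_version"] < target_version:
        step = UPGRADE_PATH[db["schema_version"]]
        db = step(db)
    return db

print(upgrade({"schema_version": 6}, 8))   # runs both hops in order
```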

A different kind of database upgrade—focused on data model, not DBMS—may be needed when the customization of a packaged application alters its database’s schema. In these cases, upgrading the application requires some form of OpDI to remodel the database into a schema conducive to the new application version.

DATABASE COLLOCATION

Collocation and consolidation are similar, but they address different OpDI goals:

• Database consolidation is where diverse data models from multiple databases are merged into a single data model in a single database managed by a single DBMS instance. Database consolidation can be time-consuming and risky work, so sometimes it’s faster and easier to collocate databases.

• Database collocation simply copies multiple databases into a single DBMS instance (typically on a single hardware server) without merging the data models. In other words, one DBMS instance manages multiple databases. While collocation reduces IT maintenance costs (by centralizing previously disparate databases), it doesn’t solve the information silo problem like database consolidation does.

Database collocation (like database upgrade) is usually an independent project, but it can also be an interim step within a larger OpDI project. For instance, if the databases to be consolidated are on diverse DBMSs, collocating them on the same DBMS instance might help you with the eventual consolidation. Likewise, many data migration projects require multiple OpDI techniques, executed in multiple project phases. To add to the complexity, each phase is iterative in that you run many tests before the final migration.
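A minimal sketch of the distinction, using SQLite's ATTACH command for convenience: collocation places multiple databases under one DBMS instance without merging their data models, while consolidation would additionally merge them into a single model. The file names and table layouts are hypothetical.

```python
# Sketch: collocating two databases under one instance without merging models.
import sqlite3

for path, row in [("east.db", (1, "Ada")), ("west.db", (1, "Alan"))]:
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS cust (id INTEGER, name TEXT)")
    db.execute("INSERT INTO cust VALUES (?, ?)", row)
    db.commit()
    db.close()

hub = sqlite3.connect(":memory:")
hub.execute("ATTACH DATABASE 'east.db' AS east")   # one instance now manages
hub.execute("ATTACH DATABASE 'west.db' AS west")   # both databases, two models
print(hub.execute("SELECT * FROM east.cust").fetchall())
print(hub.execute("SELECT * FROM west.cust").fetchall())
```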


This is why you should distinguish the related OpDI practices of migration, consolidation, upgrade, and collocation. Basic project design depends on identifying the OpDI types needed and the order in which they should be done. This is a good engineering practice that breaks the migration of data into manageable chunks instead of one risky big bang.

USER STORY: OpDI IN SUPPORT OF AN M&A IS COMPLEX AND BUSINESS DRIVEN.

“The bank I work for is in the process of acquiring another bank,” said an audit executive at a very large American bank. “Upper management is working out a multi-phase, two-year plan for the merger-and-acquisition, which covers the migration and consolidation of people and processes, as well as their IT systems. The business leads in the merger plan. So, we have to let the lines of business shake out what they need before we can make decisions about information lifecycle management. Later phases of the plan involve rolling over user populations as systems are readied for them. As part of the big picture, we’re taking advantage of the merger to rethink our distributed IT model. So far, we like the centralized model of the bank we’re acquiring, so we may adopt it, which would deeply affect how we migrate and consolidate applications and data.”

Data Synchronization

The types of operational data integration we’ve seen so far—migrations, consolidations, collocations, and upgrades—relocate data in one-off projects that rarely require that integration infrastructure be left in place. Data synchronization—or simply data sync—is different in that it leaves databases in place and exchanges data among them, which requires a permanent integration infrastructure for daily or continuous data feeds. In fact, when it’s not possible or desirable to consolidate or migrate databases, managing data that’s redundant across them usually requires synchronization. Thus, data sync can be an alternative to more invasive OpDI practices like data migrations. Yet, data sync can also make migrations more appealing by reducing downtime in the switch-over process. For example, when a newly migrated system runs simultaneously with the old system, bidirectional data sync keeps the two systems synchronized and highly available.

The downside of synchronization is the cost of the infrastructure and its maintenance. The upside, however, is that data sync is non-invasive because it leaves database investments and the business processes that depend on them intact. And most forms of data sync move data continuously, providing the freshest data values possible.

The most common example of data synchronization involves customer data. As TDWI Research explained in a recent report on customer data integration (CDI),5 many organizations have multiple applications for customer relationship management (CRM) or similar customer-facing functions like sales force automation (SFA), call center, order entry, billing, shipping, and so on. All of these share common information about the organization’s customers, and business units are increasingly under pressure to have as complete a view as possible of all customer activities across an enterprise. Hence, many CDI solutions synchronize customer data (whether for operational or analytic purposes) across multiple CRM and CRM-ish applications and their databases.

Many firms have ERP applications from multiple vendors or multiple instances of one vendor’s application. Again, data sync helps end users of individual applications see a more complete image of business processes that reach across multiple ERP applications. In cases with multiple instances, data sync can make all the instances look like one global instance. Other examples include synchronizing master data across multiple applications, synchronizing mobile devices with enterprise databases, and synchronizing primary and shadow databases for the purpose of high availability or scalability.

5 See the TDWI Best Practices Report Customer Data Integration: Managing Customer Information as an Organizational Asset, by Philip Russom, available at www.tdwi.org/research/reportseries.


THE OVERLAP OF DATA SYNC AND REPLICATION

Synchronization and replication overlap considerably in that they use similar technologies and achieve similar results. For this reason, the two are regularly compared and confused. Sorting them out is worthwhile, because to really appreciate synchronization’s advanced capabilities, you have to understand how it differs from simpler configurations of replication.

For example, replication is the most common enabling technology for database high availability (HA). Various HA configurations are possible, but in a basic master-slave configuration, replication copies data from a primary database to a secondary one that can substitute for the primary in case of failure. This is literally “replication,” meaning that data is copied without transformation, a task that replication excels at. Since the secondary system is not fully utilized, the trend in user practices is to deploy a so-called active-active or multi-master configuration. This is where applications update both databases, which then update each other—resulting in database synchronization.

When you get to an advanced configuration of replication like this, you see that replication overlaps with data synchronization. Hence, one possible definition of data synchronization is that it’s an advanced configuration of replication. But this is a bit misleading. While it’s true that many users’ data synchronization solutions are built with replication technology, other technologies can also implement or contribute to data sync, including EAI, ETL, and hand coding.

ADVANCED REQUIREMENTS FOR DATA SYNC

Whether you call it replication or synchronization—or whether you implement your data sync solution with replication or some other technology—there are prominent data sync project requirements that you should keep in mind, because they tend to sort basic configurations from more advanced ones:

• Direction of data flow. Although most replication configurations move data one way (this is typical of HA), data sync is inherently bidirectional or multi-directional. By definition, it moves data in two or more directions among multiple databases, files, applications, and so on.

• Conflict resolution. The multi-directional nature of data synchronization gives it the added burden of resolving conflicting data values. After all, if data sources and targets are being updated regularly, it’s inevitable that some data values will conflict when they are compared during synchronization. Note that the development of a data sync solution usually entails defining rules for resolving data conflicts (a simple sketch follows this list).

• Heterogeneous sources and targets. In straightforward database HA and data distribution applications, users most often use the replication capabilities built into a DBMS brand. Support among the DBMS brands varies, but most work best with their own brand, plus a few other common data sources and targets (like flat files, SAP, Microsoft SQL Server, and so on). OpDI projects, however, regularly involve interfacing with a more diverse collection of data sources and targets. In these cases, synchronizing heterogeneous sources and targets is best done with an independent or third-party product.

• Data transformation. Basic replication configurations only need to copy data unaltered from a data source to a target. However, heterogeneous data environments demand data transformation capabilities, if for no reason other than normalizing and merging complex data coming from diverse schema. Again, this advanced capability (commonly required in OpDI) is best achieved via a third-party product.
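Here is a minimal sketch of the first two requirements, multi-directional flow and conflict resolution, using a last-write-wins rule keyed on an update timestamp. The record layout is hypothetical, and real synchronization tools support far richer, user-defined resolution rules.

```python
# Sketch: bidirectional sync of two {key: (value, timestamp)} stores.
def sync(db_a, db_b):
    for key in set(db_a) | set(db_b):
        rec_a, rec_b = db_a.get(key), db_b.get(key)
        if rec_a is None:
            db_a[key] = rec_b                 # flow B -> A
        elif rec_b is None:
            db_b[key] = rec_a                 # flow A -> B
        elif rec_a != rec_b:
            # Conflict: both sides hold different values. Last write wins.
            winner = max(rec_a, rec_b, key=lambda rec: rec[1])
            db_a[key] = db_b[key] = winner

crm = {"cust1": ("Ada Lovelace", 100)}
erp = {"cust1": ("A. Lovelace", 250), "cust2": ("Alan Turing", 90)}
sync(crm, erp)
print(crm, erp)   # both now hold the newer cust1 value, and crm gains cust2
```

Last write wins is only one possible policy; source-of-record precedence and field-level merge rules are equally common choices in practice.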


USER STORY: DATA SYNC CAN ENHANCE MIGRATIONS/CONSOLIDATIONS.

“I’ve always used ETL tools with data migrations and consolidations, because ETL gives me a lot of options for handling the hefty data transformation requirements of these projects. Right now, I’m working on a migration of our core e-commerce application. This is a multi-phase migration in which the old and new application platforms will be running simultaneously for at least a year. We’ll do the heavy lifting and data transformation work in ETL, augmented with replication tools to keep the old and new platforms synchronized.”

Business-to-Business (B2B) Data Exchange

A growing area within operational data integration is inter-organizational data integration. This usually takes the form of documents or files containing data that are exchanged between two or more organizations. Depending on how the organizations are related, data exchange may occur between business units of the same company or between companies that are partners in a business-to-business (B2B) relationship. Either way, this OpDI practice is called B2B data exchange.

DATA STANDARDS ARE CRITICAL SUCCESS FACTORS FOR B2B DATA EXCHANGE

One of the interesting architectural features of B2B data exchange is that it links together organizations that are incapable of communicating directly. That’s because data is flowing from the operational and transactional IT systems of one organization to those of another, and the systems of the two are very different in terms of their inherent interfaces and data models. Hence, data exchange almost always requires a third, neutral data model in the middle of the architecture that exists purely for the sake of B2B communication and collaboration via data. (See Figure 6.)

Figure 6. Basic data flow (left to right) for B2B data exchange: applications at Business A pass data through data integration and data quality functions into intermediate data models; data integration and data quality functions on the other side then deliver the data to Business B’s applications.

As an analogy, consider the French language. In the seventeenth, eighteenth, and nineteenth centuries, French was the language of diplomacy—literally a lingua franca—through which people from many nations communicated, regardless of their primary languages.

Data exchange very often transports operational data tagged with extensible markup language (XML). Although the X in XML stands for extensible, it might as well stand for exchange, because the majority of uses of XML involve B2B data exchange. In many ways, B2B data exchange is XML’s killer app, and XML has become a common B2B lingua franca.
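As a concrete (and deliberately simplified) illustration of writing to a neutral model, the following Python sketch maps one business's native record into an invented canonical XML document. The element and field names are illustrative only, not drawn from any real standard.

    import xml.etree.ElementTree as ET

    def to_canonical(native_record):
        # Build the agreed-upon neutral document from Business A's native fields.
        order = ET.Element("Order")
        ET.SubElement(order, "OrderID").text = native_record["ord_no"]
        ET.SubElement(order, "PartnerID").text = native_record["cust"]
        ET.SubElement(order, "Amount", currency="USD").text = str(native_record["amt"])
        return ET.tostring(order, encoding="unicode")

    print(to_canonical({"ord_no": "A-1001", "cust": "ACME", "amt": 250.0}))
    # <Order><OrderID>A-1001</OrderID><PartnerID>ACME</PartnerID>
    # <Amount currency="USD">250.0</Amount></Order>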


B2B OpDI hinges on intermediate data models.

B2B data exchange is XML's killer app.


6 For a discussion of data stewardship requirements in B2B OpDI projects, see the TDWI Monograph Complex Data: A New Challenge for Data Integration, by Philip Russom, online at www.tdwi.org/research/monographs.

7 For a discussion of product data quality issues in B2B scenarios, see the TDWI Monograph The Unique Requirements of Product Data Quality, by Philip Russom, online at www.tdwi.org/research/monographs.

For the data model in the middle of a data exchange architecture to be effective as a lingua franca, all parties must be able to read and write it. This is why open standards are so important to data exchange, including industry-specific standards like ACORD for insurance, HIPAA and HL7 for healthcare, MISMO for mortgages, RosettaNet for manufacturing, SWIFT and NACHA for financial services, and EDI for procurement across industries. From this list, you can see that data exchange is inherently linked to standards that model data with a fair amount of complexity, in the sense of semi-structured data (as with all XML) and hierarchically structured data (as in ACORD and MISMO). Even so, some organizations forgo XML-based open standards in favor of unique, proprietary data models that all parties agree to comply with.

Note, however, that intermediary standards specify the format or model for data exchange, but they rarely specify all content combinations. This is especially the case with product data, because product attributes have so many possible (and equivalent) values, like blue, turquoise, cyan, aquamarine, and so on. Therefore, it behooves the receiving business to unilaterally enforce its own data content standards through a kind of “re-standardization” process.
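Here is a minimal Python sketch of such content "re-standardization" on the receiving side. The synonym table is invented for illustration; a production solution would drive it from managed reference data rather than hard-coding it.

    # Map equivalent partner-supplied attribute values to one canonical value.
    COLOR_SYNONYMS = {
        "turquoise": "blue", "cyan": "blue", "aquamarine": "blue",
        "scarlet": "red", "crimson": "red",
    }

    def standardize_color(value):
        v = value.strip().lower()
        # Pass through values that are already canonical.
        return COLOR_SYNONYMS.get(v, v)

    assert standardize_color("Cyan") == "blue"
    assert standardize_color("blue") == "blue"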

SPECIAL REQUIREMENTS FOR B2B DATA EXCHANGE

A few situations demanding special requirements for B2B data exchange deserve note:

• Some B2B OpDI scenarios require special data stewardship functions. Relevant to B2B data exchange, a data steward may need to handle exceptions that a tool cannot process automatically. For instance, a data steward may need to approve and direct individual transactions or review and correct mappings between ambiguous product descriptions. To enable this functionality in an OpDI solution, it probably needs to include a data integration or data quality tool that supports special interactive functions for data governance and remediation, designed for use by data stewards.6

• Centralizing B2B data exchange via a hub reduces complexity and cost. Each industry has a variety of document formats and industry standards required for data exchange. Companies must comply with the latest formats and standards in order to maintain their competitive edge, to avoid regulatory penalties, and to prevent a loss of data. Staying current with standards forces users with point-to-point interfaces to invest in constant coding and maintenance. A centralized, hub-based B2B data exchange solution—built atop a vendor tool that is updated as standards change—reduces architectural complexity and maintenance costs.

• B2B data exchange sometimes handles complex data models. All B2B data exchange solutions handle flat files, where every line of the file is in the same simple record format. But some solutions must also handle hierarchical data models, which are more complex and less predictable. Hierarchical models are typical of XML and EDI documents. (A sketch of flattening one such document appears after this list.)

• Product data may be unstructured. In product-oriented industries, B2B data exchange regularly handles product data, which often entails textual descriptions of products and product attributes. Processing textual data may require a tool that supports natural language processing or semantic processing, in addition to the specialized data stewardship described earlier.7
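As promised above, here is a minimal Python sketch that flattens one small hierarchical document into flat records. The document's shape is invented for the example; real XML and EDI standards nest far more deeply.

    import xml.etree.ElementTree as ET

    doc = ET.fromstring(
        '<Invoice number="INV-7"><Line sku="X1" qty="2"/><Line sku="Y9" qty="5"/></Invoice>'
    )

    # Flatten: one output row per Line element, repeating the parent invoice number.
    rows = [
        {"invoice": doc.get("number"), "sku": line.get("sku"), "qty": int(line.get("qty"))}
        for line in doc.findall("Line")
    ]
    assert rows[1] == {"invoice": "INV-7", "sku": "Y9", "qty": 5}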

Some OpDI solutions need special data stewardship functions, a hub architecture, and complex data models.

XML and various data standards provide a lingua franca for data exchange.


USER STORY: B2B Data Exchange Standards Vary by Industry

"Early in my career, I worked in financial services, where de jure data standards—especially SWIFT—are prominently used for B2B data exchange. But now I've been in manufacturing for several years, where data standards are mostly de facto, typically 'made up' by one or more business partners. Either way, supporting data exchange 'standards' and ad hoc formats is an important part of operational data integration work."

Organizational Issues for Operational Data Integration

Staffing Operational Data Integration Practices

One of the most pressing issues in operational data integration today concerns how it's staffed and which team manages the staff. As OpDI practices have grown in terms of workload, staff numbers, and sophistication of solutions, opinions have shifted about who should be doing the work and where they belong on the organizational chart. In an ideal world, OpDI would have its own personnel, organizational structure, tools, budget, and so on. Alas, the reality is pretty much the opposite in most organizations today. But there's still hope. As technical personnel increase the sophistication of OpDI practices and as business people increasingly realize that OpDI is mission-critical, OpDI becomes more visible and therefore gets more resources.

To get a handle on the groups staffing OpDI today, TDWI Research asked: “For DI outside data warehousing, who does the work most often?” (See Figure 7.) Responses show that OpDI is performed by a range of consultants and organizational units.

For DI outside data warehousing, who does the work most often?

System integrator or consultant 25%

Data management group 22%

Data warehouse team 20%

Other 13%

Data integration competency center 12%

Not applicable 8%

Figure 7. Based on 336 respondents.

• System integrator or consultant (25%). At the top of the list, consultants perform a lot of OpDI work. This is a good strategy when OpDI work is intermittent and it’s hard to justify permanent staff. Data migrations and consolidations are probably the most intermittent of OpDI work, and therefore are the most appropriate for consulting. But interviewees for this report talked about hiring consultants to set up new business partners for B2B data exchange or new data sync solutions, which were then maintained by regular IT staff.

• Data management group (22%). The in-house organization most often doing OpDI is the data management group. Database administrator (DBA) is a common job title among people implementing and maintaining OpDI solutions, regardless of the OpDI project type. This makes perfect sense, considering that DBAs have the appropriate database experience and most data management groups collaborate regularly with application groups.

OpDI is struggling to get recognition and resources.

20 TDWI rese arch

OPERAT IONAl DATA IN TEGRAT ION

• Data warehouse team (20%). It makes sense that members of a data warehouse (DW) team would be called upon to perform OpDI work. Almost all DW specialists have considerable experience in AnDI, even if they specialize in other areas like data modeling or report design. However, scavenging from the DW team should be avoided, because it delays important work in data warehousing and business intelligence.

• Other (13%). A lot of the comments entered by survey respondents point to application teams, like “application development groups,” “operational application developers,” “application teams,” and “IT applications groups.” This indicates that OpDI is often implemented by application specialists, not database or integration specialists. TDWI suspects that this is most true of B2B data exchange, partially true of data sync, and less true of data migration practices.

• Data integration competency center (12%). To give OpDI a home and to avoid scavenging from AnDI resources, some companies have founded data integration competency centers. The next section of this report looks at these in detail.

One of the complaints voiced by survey respondents is that OpDI is seldom staffed with enough people. In an effort to quantify OpDI staffing numbers today, TDWI Research asked: "How many full-time people were assigned to your most recent operational DI project?" (See Figure 8.) The question was posed to people experienced with OpDI, so their responses reflect real-world situations.

How many full-time people were assigned to your most recent operational DI project?

1 or a fraction 23%

2 25%

3 19%

4 14%

5 or more 19%

Figure 8. Based on 252 respondents who’ve had personal experience with OpDI projects.

• One to four employees is the norm for OpDI. Responses to the question show that roughly half (48%) of OpDI projects are staffed by only one or two full-time employees (FTEs). Another third (33%) of OpDI projects are staffed by three or four FTEs.

• Larger OpDI teams average 13 employees. In the survey, users who selected "5 or more" were asked to enter a number. Almost all respondents reported having 30 or fewer FTEs assigned to OpDI. A few respondents entered disproportionately large numbers, like 50 and 85. If we exclude these as outliers, then large teams of "5 or more" include 13 FTEs on average, or 10 at the median.

USER STORY: Though It Usually Isn't, OpDI Can Be Tightly Linked to BI

"There's more than one way to define operational data integration," said Kyle Province, a business analysis manager with Dell Financial Services (DFS), the financing group within Dell Computers. "For us, it's about feeding fresh operational data into the executive dashboards and other reports of our operational business intelligence system. The combination of operational data integration and operational business intelligence enables DFS executives to frequently monitor such things as debt delinquency and reserve balance. Since operational data integration is closely tied to operational business intelligence, both are staffed by a single business intelligence competency center."

Most OpDI projects have 1 to 4 people; some reach 13 or more.


Competency Centers and Similar Organizational Structures

TDWI Research both champions operational data integration and fears it. The problem is that TDWI has seen many organizations accelerate OpDI work by diverting people, tools, and other resources from data warehousing and business intelligence teams. In the process, mission-critical BI work is delayed or derailed completely. To test whether these fears are real, TDWI Research asked: "Have you seen work in business intelligence or data warehousing disrupted because its team was assigned to work in operational DI?" A resounding 58% said "yes," confirming that TDWI's fears are well founded. So, a challenge organizations face is to grow OpDI by incorporating best practices in data integration learned from related disciplines like AnDI, DW, and BI, but without scavenging resources from those disciplines. Managers of data administration groups have voiced similar concerns, so it seems that the growth of OpDI is stretching resources in many places.

A knee-jerk reaction to this problem is to find funding and give OpDI its own dedicated staff, tools, and other resources. The strongest argument against dedicated staffing is that a lot of OpDI work is intermittent or seasonal, so it’s difficult to keep permanent staff engaged continuously. Indeed, this is true in some organizations, but not in all. To get a sense of how continuous OpDI work is, TDWI Research asked: “In your experience, which of the following best describes the frequency of operational DI work?” (See Figure 9.)

In your experience, which of the following best describes the frequency of operational DI work?

Continuous stream of operational DI projects 46%

Sporadic projects, appearing somewhat predictably 28%

Sporadic projects, appearing unpredictably 24%

Don’t know 2%

Figure 9. Based on 252 respondents who’ve had personal experience with OpDI projects.

• Half of organizations are experiencing continuous OpDI work. Nearly half (46%) report having a “continuous stream of operational DI projects.”

• The other half is experiencing sporadic OpDI work. For a quarter of survey respondents (28%), OpDI projects are sporadic but predictable. Another quarter (24%) handles projects that are both sporadic and unpredictable. Put them together and over half (52%) are experiencing sporadic OpDI work.

Another argument against dedicated staffing for OpDI is that a dedicated OpDI team would be redundant with the AnDI resources of a DW team. This approach worries managers who prefer the leanest and most efficient workforce possible.

So what's the best way to staff operational data integration work? There are multiple options, but many firms are settling the issue by founding a competency center (sometimes called a center of excellence). A competency center is an organizational unit that is centralized, typically owned and funded by a central entity like the CIO's office. Charge-back accounting is sometimes involved. The competency center provides shared services that can be allocated to a variety of projects and departments, as workloads demand. In IT, each competency center is usually focused on a single technical discipline. For example, germane to this discussion, a data integration competency center obviously focuses on data integration, and it provides staff, tools, infrastructure, and other resources for all data integration practices, including OpDI, AnDI, and HyDI. Since data integration is closely associated with other disciplines, shared services and resources for data integration might come from a data management or business intelligence competency center.

OpDI has a history of scavenging personnel from BI and DBA teams.

When OpDI work is seasonal, permanent staff is hard to justify.

A data integration competency center (or similar organizational structure) offers advantages:

• Provides a collaborative environment. This is good for the emerging practices of OpDI, which need to absorb the data integration best practices of mature AnDI practices.

• Copes with intermittent or seasonal work. The shared services model enables managers to freely allocate personnel to projects as they come and go.

• Avoids redundancy. One organization avoids the inevitable overlap of two. It also fosters reuse and keeps diverse teams from reinventing the wheel.

• Avoids scavenging. On the upside, it’s all one big team under the same management. On the downside, business units used to owning their own team must cede control.

Despite the advantages of a general data integration competency center, some situations need a different approach:

• Outsource sporadic OpDI work. At one end of the spectrum, when OpDI work is extremely intermittent and unpredictable, the best recourse is probably to outsource it to system integrators, consultants, or vendors’ professional services departments.

• Give OpDI its own competency center. At the other end of the spectrum, when OpDI work is undeniably continuous and mission-critical, it may merit its own OpDI competency center. This makes even more sense when OpDI work is also diverse and spread across all three OpDI practices: data migration, data sync, and B2B data exchange.

USER STORY: Multiple Tool Types and a Competency Center Enable Global B2B Data Exchange

"The business I work in provides inventory management for over 60 major accounts, most of them retail chains," says Raja Musunuru, the director of enterprise data architecture at Sony Pictures Entertainment. "To enable vendor-managed inventory, large volumes of business-to-business data are exchanged between us and our client accounts. Since we're a global business, data is exchanged 24-by-7 across the supply chain, and many data feeds come and go in near real time. We use a combination of ETL and EAI tools, where ETL processes large or complex data sets in batch and EAI moves time-sensitive data quickly.

"Obviously, our business depends heavily on operational data integration. So, about three years ago, I founded a data integration competency center to provide shared services for this and other data integration work. Staffing varies with workload cycles, but the competency center typically includes five to seven full-time employees and around 10 contractors, all focused on operational data integration."

Now that the data integration competency center is up and running, Raja has recently turned his attention to architectural issues. "In the short term, we need to rearchitect some of Sony Pictures' operational data integration solutions to make them more efficient, automated, and intelligent with data movement. And we'll continue to beef up master data management. In the long term, however, we look forward to consolidating multiple solutions into a hub-based solution with services. That will give our business centralized control and visibility, plus the ability to process more data in real time and add new business partners quickly."

Competency centers avoid redundancy and scavenging while balancing workloads.


OpDI Team Collaboration and Cross-Functional Communication

Operational data integration is inherently collaborative, whether it involves data migration, data sync, B2B data exchange, or a combination of these.8

Data migration involves life-and-death decisions. People involved with data migrations, consolidations, collocations, and upgrades decide which databases, DBMSs, and platforms live, die, or retire. A data architect or system architect may take the lead to drive the decision to consensus. But these life-and-death decisions are best made through a collaboration of the people who depend on the data in question, including upper managers who lead enterprise initiatives, the CIO’s office, line of business managers, people from relevant technical teams (like data warehousing or database administration), and—finally—data integration specialists. Furthermore, all forms of data migration take away old systems and introduce new ones. Technical personnel can’t do that without collaborating with the business units that use, own, and fund those systems. The collaboration should gather business requirements up front to guide the actual migration, then produce a plan for coordinating application and database switch-overs and hand-offs later.

Data synchronization makes people decide how data should be described and used. For example, data sync most often operates on customer data, and in some enterprises every department has its own process-specific definition of customer. Before you can effectively sync customer data across departmental applications and databases—in pursuit of the elusive 360-degree view—departments need to agree how customers will be defined and described in those systems. Hence, improvements to metadata and master data are common prerequisites for successful data sync. And that’s a tough collaborative process that forces line-of-business managers to decide how customers must be defined and forces technical staff to change databases and applications accordingly. Such collaboration is beyond the authority of data integration specialists, and must be driven by a strong executive sponsor. As another example, let’s recall that data sync is often used in the switch-over phases of data migration, which adds another layer of collaboration to those projects.

B2B data exchange relies on data standards and cross-business coordination. And these involve varying degrees of collaboration. When your industry has chosen a data standard—the way financial services chose SWIFT—then the decision is made for you. Likewise, an influential business partner may demand a certain standard. Even when a partner foists its standard on you, you can't support it accurately and optimally without some discussion and coordination with that partner. In most cases, however, data standards are developed or modified on the fly for specific business situations, which demands collaboration among the concerned parties. With B2B data exchange, technical and business collaboration is critical to setting up new partnerships and maintaining old ones. And B2B data exchange is usually just one piece of a larger business application, which demands the usual database-to-application and IT-to-business collaborations.

8 For an in-depth discussion of OpDI’s collaborative requirements, see the TDWI Monograph Second Generation Collaborative Data Integration, by Philip Russom, online at www.tdwi.org/research/monographs.

OpDI touches many business and technical disciplines in a collaborative way.


Technology Requirements and Vendor Tools for OpDI

This final section of the report discusses the available technologies and tool types for OpDI, following the order shown in Figures 10 and 11. All of these tool types are available as products from software vendors, so representative vendors and products are mentioned in the following discussion.9

Preferred Technologies for OpDI

TDWI Research asked survey respondents whether they've had personal experience with OpDI projects, and an impressive 75% said "yes." The survey then posed a question to only the experienced individuals: "In your organization, what is the preferred technology for most operational DI projects?" Based on the preferences of experienced users, a priority order of technologies is revealed. (See Figure 10.) Let's discuss each of these technologies and their relative popularity among users.

In your organization, what is the preferred technology for most operational DI projects?

ETL 57%

Hand-coded solution 18%

EAI 10%

Other 8%

Replication 6%

Not applicable 1%

Figure 10. Based on 252 respondents who’ve had personal experience with OpDI projects.

EXTRACT, TRANSFORM, AND LOAD (ETL)

Fifty-seven percent of survey respondents reported that ETL is the preferred technology for OpDI, way ahead of hand-coded solutions (18%), EAI (10%), and replication (6%). Anecdotal evidence suggests that data integration specialists and other technical users prefer ETL due to its unique ability to handle the extreme requirements of data integration, including multi-pass data transformations, handling of complex data, terabyte-scale data sets, deep data profiling, interoperability with data quality tools, and many-to-many data integration capabilities.

ETL offers different benefits to different OpDI project types. For example, data migrations and consolidations require complex data transformations, which the T in ETL is designed for. B2B data exchange demands support for both ad hoc and adjudicated data standards, which most ETL tools support. More advanced ETL tools include built-in capabilities for data sync, changed data capture, master and metadata management, data quality, profiling, federation, and connectors for just about any data source or target—and all these are relevant to one or more forms of OpDI.
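For readers who want a concrete picture of the pattern, here is a minimal extract-transform-load sketch in Python, using SQLite connections to stand in for the source and target systems. The database files, table, and column names are invented, and the source table is assumed to exist already; a real ETL tool would provide this (and far more) declaratively.

    import sqlite3

    src = sqlite3.connect("legacy.db")        # assumed source database
    tgt = sqlite3.connect("new_platform.db")  # assumed target database
    tgt.execute("CREATE TABLE IF NOT EXISTS customer (id INTEGER, name TEXT, country TEXT)")

    # Extract from the legacy schema (cust_master is assumed to exist there).
    rows = src.execute("SELECT cust_id, cust_nm, ctry_cd FROM cust_master")

    # Transform: trim and title-case names; normalize country codes to upper case.
    cleaned = [(cid, name.strip().title(), ctry.upper()) for cid, name, ctry in rows]

    # Load into the new platform's model.
    tgt.executemany("INSERT INTO customer VALUES (?, ?, ?)", cleaned)
    tgt.commit()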

Most of the users interviewed for this report pointed out that ETL is also the preferred data integration tool type for AnDI. Since they already know ETL from their work in data warehousing and business intelligence, they can leverage their skills with a specific ETL tool (although an additional product license is usually required by the vendor for OpDI). Besides the capabilities of ETL tools already mentioned, interviewees talked about how they can create easily maintained hub-and-spoke architectures with an ETL tool, which is a giant step forward compared to the plague of point-to-point interfaces and spaghetti code typical of hand-coded OpDI solutions.

9 The vendors and products mentioned here are representative, and the list is not intended to be comprehensive.

ETL is by far the preferred technology for OpDI.


Representative ETL products include expressor semantic data integration system, IBM InfoSphere DataStage, Informatica PowerCenter, SAP BusinessObjects Data Integrator, and Talend Integration Suite.

HAND-CODED SOLUTIONS

Despite developers’ preference for ETL, most organizations are loath to outfit their OpDI projects with vendor tools for ETL or another data integration approach. As we saw earlier, a common barrier to OpDI is a lack of personnel, tools, budgets, and other resources. Organizations are often in denial, refusing to admit that OpDI requires its own resources—even though it’s a recurring task that supports mission-critical applications and business processes. Ideally, OpDI should have adequate resources, whether it has its own people, tools, and infrastructure for data integration or shares these through a data integration competency center or similar organizational structure.

Without data integration tools, technical people fall back on hand coding (in SQL or a similar language) or mostly manual methods (using a hodgepodge of utilities built into DBMSs, operating systems, and so on). This helps explain why hand coding has persisted so long in resource-poor OpDI development. These techniques are non-productive and feature-poor. They get the job done, but slowly, without reusability, and at a lower depth of data improvement than an OpDI solution should deliver.

Studies have shown that tool-based data integration development and maintenance is far more productive (and, therefore, more economical) than hand-coded solutions.10 Yet hand coding persists because developers can't wean themselves off it, consultants use it as an excuse to rack up billable hours, and short-sighted managers won't spend in the near term to get the long-term cost reductions of tool productivity. It's time for everyone to do the math and recognize the economic superiority of tool use—at least for data integration projects.

EXPERT COMMENT: Hand Coding Is Slowly Fading Away

"In my experience with a broad range of clients, I'd say that the era of hand coding is slowly coming to an end," said Mark Madsen, president of Third Nature and a recognized expert in data integration. "This is true for all data integration practices, but the trend is moving faster in some areas than others. Analytic data integration has moved faster into building solutions atop a vendor's tool than operational data integration, probably because the data transformation requirements of AnDI are far tougher to code than the relatively straightforward data unpack and load requirements of OpDI."

ENTERPRISE APPLICATION INTEGRATION (EAI)

When you think of data integration, enterprise application integration (EAI) and similar forms of message-oriented middleware (MOM) probably don't leap to mind. It's true that EAI tools cannot satisfy the extreme volume, transformation, and data quality requirements of AnDI. But they can satisfy some of the special requirements of OpDI practices, especially those of B2B data exchange. In many cases, information arriving at a business through B2B data exchange will describe high-dollar-value or time-sensitive business events, like transactions, trades, invoices, payments, orders, new product announcements, and so on. Since these need special attention and handling, it's important to get them into the appropriate internal business process as quickly and accurately as possible. This is why many process-oriented OpDI solutions (typically for B2B data exchange) must triage incoming data and (when appropriate) push a message onto a queue or otherwise interoperate with EAI infrastructure (often via JMS, which is data friendly). (A similar solution would be to push data directly to the target system via its API, thereby circumventing EAI.) While EAI is not the primary technology for building the OpDI solution, OpDI can integrate with EAI to leverage its process management and real-time capabilities.

10 "For the wide majority of cases, developing a solution atop a vendor's data integration tool achieves monetary and time savings over hand coding." According to the Forrester Research publication, "The Total Economic Impact of Deploying Informatica PowerCenter," January 2004, page 7.

Interoperability with EAI infrastructure satisfies OpDI requirements for process management and real time.

Hand-coded data integration solutions are feature-poor, non-productive, and not as cheap as you think.
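The Python sketch below illustrates the queue-based triage pattern just described. The built-in queue module stands in for real EAI middleware (such as a JMS provider), and the triage rule, document shape, and file name are invented for illustration.

    import json
    import queue

    URGENT_TYPES = {"payment", "trade", "order"}
    eai_queue = queue.Queue()  # stand-in for a JMS queue or other EAI middleware

    def archive_for_batch(document):
        # Non-urgent data waits for the scheduled batch (ETL) run.
        with open("batch_inbox.jsonl", "a") as f:
            f.write(json.dumps(document) + "\n")

    def triage(document):
        if document.get("type") in URGENT_TYPES:
            eai_queue.put(json.dumps(document))  # hand off for real-time processing
        else:
            archive_for_batch(document)

    triage({"type": "payment", "amount": 125000, "partner": "ACME"})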

REPLICATION

Almost all data management professionals are conversant in replication. That’s because a fair amount of replication functionality comes at no additional charge with a relational DBMS license, and many third-party replication tools have been available for years. Furthermore, replication can ably handle many OpDI project types, especially data migrations, consolidations, collocations, and database upgrades—as long as the data transformation needs of these projects are relatively light and data sources and targets are fairly homogeneous. It’s surprising that replication isn’t used more for data integration, given that it’s a familiar technology, readily available, capable, and flexibly configured to run in real time or at any level of latency. (Replication overlaps with data synchronization, as discussed elsewhere in this report.)

OTHER PREFERRED TECHNOLOGIES FOR OpDI

In the survey, a few respondents selected “other” and entered descriptions of the technologies they use or prefer. Here are some highlights from their comments:

• Combinations of tool types. OpDI solutions commonly include multiple tool types. Users report combining ETL and replication, ETL and EAI, and ETL and hand coding.

• ELT instead of ETL. Several users say they prefer ELT over ETL. This leverages the power of the target DBMS by processing data transformations on the target after loading. (See the sketch after this list.)

• Service-oriented architecture (SOA). A few forward-looking users are applying services to OpDI, probably in lieu of interfacing with EAI. As one user put it: “One of our 2009 goals is to tap the SOA capabilities of our ETL tool suite and interface directly with apps.”
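To illustrate the ELT pattern mentioned above, here is a minimal Python sketch against SQLite: raw rows are loaded into a staging table first, and a set-based SQL statement inside the DBMS performs the transformation. The tables and cleansing rule are invented for the example.

    import sqlite3

    db = sqlite3.connect("warehouse.db")
    db.execute("CREATE TABLE IF NOT EXISTS stage_orders (id INTEGER, amt TEXT)")
    db.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER, amt REAL)")

    # Load: raw, untransformed rows land in a staging table.
    db.executemany("INSERT INTO stage_orders VALUES (?, ?)", [(1, " 10.50 "), (2, "7")])

    # Transform: executed inside the DBMS, not in the integration tool.
    db.execute("INSERT INTO orders SELECT id, CAST(TRIM(amt) AS REAL) FROM stage_orders")
    db.commit()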

USER STORY: Some OpDI Projects Require Near-Real-Time Data Handling

"The part of the company I work in handles real-estate tax escrow," says an ETL architect at a financial services company. "The operational data integration work I do is mostly linked to business-to-business data exchange that moves a lot of information about real-estate taxes. We exchange a lot of tax data and related information with banks, governments, tax authorities, and other business partners, usually through file transfers, but sometimes through tapes. Most of these partners have their own unique data formats that we have to support. And most transfers involve flat files, although we've moved on to XML documents, wherever possible. We handle thousands of files a day, and they vary from kilobytes to gigabytes in size. Most of the information is time sensitive—though not transactional—so most of the file processing needs to run at near real time. Our operational data involves a lot of volume, speed, and complexity, and most of it flows through our ETL tool, which in recent years has replaced hand-coded COBOL programs."

Additional Technologies for OpDI

Besides asking about preferred technologies, this report's survey also asked about additional technologies and tool types that data integration specialists are using for OpDI. (See Figure 11.) This question was directed solely at survey respondents who reported having personal experience with OpDI projects, so their responses represent real-world tool usage. Again, survey responses reveal a priority order of technologies and tool types, which we'll discuss.

Replication is familiar and available, plus it can satisfy a lot of OpDI requirements.


In addition to the preferred technology you selected in the last question, which of the following techniques and tool types were used? (Select all that apply.)

Data modeling 57%

Data quality 48%

Data profiling 43%

Changed data capture 42%

Data synchronization 41%

Metadata management 38%

Master data management 30%

Data federation 17%

Tools licensed as open source 9%

Tools provided in a software-as-a-service (SaaS) model 5%

Other 4%

Figure 11. Based on 844 responses from 252 respondents who’ve had personal experience with OpDI projects (3.3 responses per respondent, on average).

DATA MODELING

More than any other tool type, data modeling was selected by 57% of survey respondents as a technology they're applying to OpDI. This isn't surprising, given that popular OpDI projects require a fair amount of data modeling. For example, developing a data model for the target database is a lot of the work involved in database migrations, consolidations, and upgrades.

With data migration and consolidation projects, instead of just recreating the old data model on a newer platform, look for ways to improve it, perhaps to optimize it for a modern distributed computing environment. Likewise, databases nowadays are tied together far more often through data integration infrastructure than those designed 10 or more years ago. So, consider adjusting the new data model to include tables or materialized views that make the new database easier to integrate with others.

Representative data modeling products include CA ERwin Data Modeler and Sybase PowerDesigner.

DATA QUALITY

OpDI practices—like all variants of data integration—expose data quality issues. Some of these are problems requiring a fix, while others are opportunities for enriching data. Hence, an OpDI solution can benefit from a vendor’s data quality tool that can perform functions like name-and-address cleansing, deduplication, and standardization. So it’s good to see that data quality came in second in the survey (48%).

Data migrations and consolidations often operate on legacy operational data sets that were subject to data entry (the leading origin of defective data), and so need cleansing and standardization. With ERP migrations and consolidations, product data can be improved by appending D-U-N-S numbers to supplier records, just as customer data can be improved by appending consumer demographics acquired from a third-party data provider. Furthermore, database consolidations involving customer data can benefit from data quality tools that identify and match consumers with high accuracy. In B2B data exchange, you must cope with data from an external source whose data quality, format standards, and content standards are beyond your control.

Like all data integration practices, OpDI reveals data quality issues.

Data modeling tools are important to OpDI, which often requires new target data models.

In all OpDI projects, you should strive to improve operational data, not just move it. The altruistic goal of all data management professionals should be to add value to data, not just manage it. The assumption is that improving data improves the applications and business processes that depend on the data, which in turn improves the experiences of customers, colleagues, and partners. Improvements involve data’s quality, model, semantics, and accessibility. Hence, all types of OpDI projects are significant opportunities for data improvement that should be seized, not merely dispatched.

Many software vendors offer suites of DQ tools, including DataFlux dfPower Studio, Informatica Data Quality, IBM InfoSphere QualityStage, Informatica Identity Resolution (formerly Identity Systems), SAP BusinessObjects Data Quality, Silver Creek Systems DataLens System, and Talend Data Quality.11

DATA PROFILING

Data profiling (43%) came in third place in the survey after the related capability of data quality. In many ways, OpDI is a specialized form of data integration. So it makes sense that data quality and profiling—which go hand-in-hand with integration—fared well in the OpDI survey. But there are special issues to consider with data profiling in the context of OpDI:

Profile deeply up front or suffer unpredictable setbacks later. It’s hard to rationalize the time and resources committed to data profiling, since this is not the actual deliverable. But profiling is imperative with migration and consolidation projects, especially those that involve legacy databases, which invariably lack documentation and metadata. Profiling is also critical when a B2B data exchange solution incorporates sources in new data standards.

Use profiling tools instead of manual methods. Manual methods are inferior because of the time-consuming and error-prone process of moving profile information from a query to the documentation to the OpDI solution. When possible, users should profile with a vendor tool to get greater accuracy, repeatability, and productivity.

Re-profile to assure that OpDI produced desired results. Profiling is not just for source data, but also for newly migrated or consolidated data. Once the new platform has live data and active end users, monitor it to gauge success and to prioritize areas of needed improvement. Monitoring may count the number of end users (to measure adoption), outages (to quantify high availability), data defects (to assure improvement), and volume metrics (to assist with scalability).
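As a concrete illustration of the most basic profiling measures, the following Python sketch computes a row count plus per-column null counts, distinct counts, and min/max values. Real profiling tools go much deeper (value patterns, cross-column dependencies); the sample data is invented.

    def profile(rows, columns):
        # Collect per-column null counts and the set of non-null values.
        stats = {c: {"nulls": 0, "values": set()} for c in columns}
        n = 0
        for row in rows:
            n += 1
            for c in columns:
                v = row.get(c)
                if v is None or v == "":
                    stats[c]["nulls"] += 1
                else:
                    stats[c]["values"].add(v)
        # Summarize: distinct count and min/max per column.
        return n, {
            c: {
                "nulls": s["nulls"],
                "distinct": len(s["values"]),
                "min": min(s["values"]) if s["values"] else None,
                "max": max(s["values"]) if s["values"] else None,
            }
            for c, s in stats.items()
        }

    data = [{"id": 1, "zip": "98057"}, {"id": 2, "zip": ""}, {"id": 3, "zip": "98057"}]
    count, report = profile(data, ["id", "zip"])
    # report["zip"] == {"nulls": 1, "distinct": 1, "min": "98057", "max": "98057"}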

Data profiling functions are built into all the aforementioned products for ETL and data quality.

USER STORY: OpDI Uncovers Data Quality Issues That Need Attention

"You do the operational data integration project a disservice if you don't improve the data as you migrate and consolidate it," said Andy Ashta, a data architect at Cars.com. "Even so, there can be tough trade-offs, because you have to avoid scope creep, and not all improvements have a high return for the effort. That's why each operational data integration project should involve a cost-to-benefit analysis regarding these data quality corrections. With most projects, improvements to data quality yield substantial benefits, whereas improvements to master and metadata may have to be incorporated into other projects."

11 For a complete survey of data quality vendors and tools, see the TDWI Technology Market Report Enterprise Data Quality Tools (Q2 2006), available to TDWI Members at www.tdwi.org/research.

Improve operational data, don’t just move it.

Profile early and often, to better scope OpDI projects and measure success later.


CHANGED DATA CAPTURE (CDC)

CDC identifies changes (like table inserts, updates, and deletes) that have occurred since the last time data was extracted from a source database, so that data integration jobs needn’t scan the entire database the next time they need to extract data. This speeds up data extraction, makes extraction less intrusive, and reduces the amount of extracted data as compared to other approaches (which, in turn, aids scalability). CDC is relevant to OpDI’s data sync practice, because CDC helps speed up and scale up synchronization, plus keep it non-intrusive.
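A common low-tech form of CDC is the high-water-mark query, sketched below in Python. It assumes the source table carries a reliable last_modified column (an assumption for the example, not a given); log-based CDC tools avoid even this query, and the table and database names are invented.

    import sqlite3

    def extract_changes(conn, last_high_water):
        # Pull only rows modified since the last successful run.
        rows = conn.execute(
            "SELECT id, name, last_modified FROM customer WHERE last_modified > ?",
            (last_high_water,),
        ).fetchall()
        # Advance the high-water mark to the newest timestamp seen.
        new_high_water = max((r[2] for r in rows), default=last_high_water)
        return rows, new_high_water

    conn = sqlite3.connect("source.db")  # assumed source database
    changes, hwm = extract_changes(conn, "2009-03-01T00:00:00")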

CDC is usually a feature of replication and data synchronization products, as listed below.

DATA SYNCHRONIZATION

As noted earlier, data sync, by definition, involves bidirectional or multi-directional data flows, which in turn demand rule-based data-value conflict resolution. Advanced configurations of data sync may also need support for heterogeneous data sources and targets, data transformations, and changed data capture. Although most DBMS brands have sturdy replication capabilities built in, you need a third-party or independent product to get the advanced features of true data synchronization.

Data sync and replication share a number of capabilities that are useful for OpDI and other data integration projects. Both can be configured to move data frequently, which is great for time-sensitive data or high availability. Both can operate on data sets of different sizes, like transactions, logs, tables, or whole databases.

Representative high-end data sync products include GoldenGate Transactional Data Management (TDM), IBM InfoSphere Change Data Capture (formerly DataMirror Transformation Server), and Sybase Replication Server. Those tools are replication-based, but it’s possible to effect a form of data sync with other technologies, including when ETL tools interface with EAI tools, as seen in expressor semantic data integration system, Informatica PowerCenter Real Time Edition, and Talend Integration Suite.

Related functionality is available from data verification products. These can audit data that was integrated by a data sync product and confirm that data arrived with integrity. Representative products include GoldenGate Veridata and Informatica PowerCenter Real Time Edition.

METADATA MANAGEMENT AND MASTER DATA MANAGEMENT (MDM)

OpDI projects for migration and consolidation often deal with databases from older platforms, where the data dictionary may be weak, inaccurate, or even missing. This is an opportunity to develop master data and metadata as you develop the data model of the new target database, which in turn can make the new database easier to administer and to access via various applications and integration processes. Likewise, all OpDI projects suffer from the generally sad state of meta- and master data in operational systems, according to comments users entered into this report’s survey.

Metadata management tools and repositories are built into all the ETL products mentioned in this report. Master data management tools are available as modules of data integration tool suites from IBM, Informatica, SAP BusinessObjects, Silver Creek Systems, and other vendors.

Operational systems’ poor meta- and master data challenges OpDI.

CDC makes data sync faster, less intrusive, and more scalable.

Data sync has advanced features that replication doesn’t.


TOOLS LICENSED AS OPEN SOURCE

Data integration tools based on open source are still quite new, so it’s not bad that 9% of survey respondents are using them. These tools address real-world issues that are quite glaring in OpDI practices. To be honest, OpDI projects aren’t funded nearly as well as AnDI projects. The fact that open source data integration tools can be licensed for a quarter of the cost of market-leading ETL tools makes them a good financial fit for OpDI. Furthermore, many OpDI projects are executed by database administrators or application developers, whose priorities lie outside data integration. They don’t have the time or patience to learn the hundreds of features of a large, complex ETL tool, so they appreciate the streamlined user interface and straightforward data management functions of an open source tool.

Representative open source data integration vendors include Apatar, Jitterbit, and Talend.

TOOLS PROVIDED IN A SOFTWARE-AS-A-SERVICE (SaaS) MODEL

A lot of OpDI work is intermittent, especially data migrations, consolidations, and upgrades. So it's hard to rationalize spending a lot of money on a continuing license for a tool that you use only here and there for a few months at a time. Akin to the creative licensing of open source data integration tools, software-as-a-service (SaaS) is another new approach to licensing that's now being applied to data integration tools. Relevant to OpDI, SaaS allows technical users to "rent" a tool for a limited amount of time. And the tool is hosted over the Internet, so users needn't install it. Data integration tools available through SaaS are potentially good for the intermittent work of OpDI. But today they're so new and rare that the jury is still out.

Open source and SaaS licensing aside, other innovative licenses are available. For example, expressor software only charges for the number of processor channels—i.e., units of data parallelism—you wish to run in parallel via one or more expressor applications. All tools, metadata repositories, and data connectors come free of charge with the purchase of a channel.

OTHER ADDITIONAL TECHNOLOGIES FOR OpDI

In the survey, a few respondents selected “other” and entered descriptions of additional technologies they’re using for OpDI. Here are some of the technologies they mentioned:

• Data standards. As mentioned earlier, supporting multiple data standards is part of B2B data exchange and occasionally other OpDI projects. This is harder for a tool to support than it sounds. Even when partners are using the same open standard, they will use different versions and create slight variations to accommodate unique data requirements for schema, format, and content standards. Sometimes a partner demands the use of its proprietary data format as a requirement for doing business. Consequently, data integration infrastructure must support open standard data models, versions and variants of the standards, and ad hoc or proprietary data models.

• Exception processing. This is important to all serious data integration implementations, and it has ramifications for OpDI. For example, consider a B2B data exchange solution (in the manufacturing or retail industries), where most of the incoming data is from product catalogs. All organizations—even partnering companies—have unique ways of describing their products, so it's difficult to map incoming data about a supplier's or manufacturer's product to how your internal system describes that product. Hence, there are many exceptions—sometimes hundreds or thousands at a time—that a human must map, route, approve, edit, or process in some way.

Some OpDI projects have special needs for data standards and exception processing.

Open source data integration tools have an appealing price and simplicity.

Tools via SaaS may fit OpDI's intermittent nature.


For an end user to survive a high volume of exceptions, he or she needs an exception processing tool that can automate as many remediation tasks as possible, complemented by an easy-to-use drag-and-drop user interface for those cases that truly need human intervention. And that tool needs to be integrated into the OpDI solution.
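The following Python sketch illustrates the basic routing logic behind such a tool: map automatically what a reference table can resolve, and queue everything else for a data steward. The mapping table, descriptions, and SKU codes are invented for illustration.

    # Known partner descriptions mapped to internal SKUs (illustrative only).
    KNOWN_MAPPINGS = {"wdgt-2in-blu": "SKU-10482", "gizmo lg": "SKU-20077"}

    steward_queue = []

    def map_product(partner_description):
        key = partner_description.strip().lower()
        sku = KNOWN_MAPPINGS.get(key)
        if sku:
            return sku  # processed automatically, no human needed
        steward_queue.append(partner_description)  # a steward must map this one
        return None

    map_product("WDGT-2in-BLU")   # -> 'SKU-10482'
    map_product("Widget 2 inch")  # -> None, routed to the steward queue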

Representative products include Informatica B2B DT, Silver Creek Systems Data Governance Studio, and SAP BusinessObjects Data Governance Visualization.

• Dedicated OpDI tools and packages. There aren’t many of these, but representative products include Informatica B2B DX, Rever DB-MAIN, and SAP Data Migration Services.

USER STORY: Open Source Data Integration Tools Have a Foothold in OpDI

DAI is a global development firm that specializes in helping emerging societies and economies become more prosperous, safe, and just. "Many of our projects need a software application if the project is to achieve its goals and our clients are to have a tool to work with after we've exited the project," said Andrew Ross, a principal developer and GIS specialist with DAI. "The catch is that many of our clients are organizations inside third-world countries. They need simple, low-tech applications, because that's all they can afford or maintain. Yet, they also need applications that leverage freely accessible data in an innovative way. That's why many of our applications are 'mashups' that creatively combine data from multiple free sources and overlay the information on maps and other visual backgrounds. Furthermore, most clients don't have budgets for ongoing licenses for enterprise software, but they can afford open source software. And that's why we use open source data integration software for integrating operational data into mashups and other applications."

Recommendations

Expect to expand your operational data integration practice. According to TDWI Research, the two broad practices of data integration—analytic and operational—are both growing, but the operational practice has grown faster than the analytic one in recent years. This trend will continue, driven by the increasing number of initiatives that require operational data integration, including ERP initiatives, application consolidation initiatives, IT centralization, data center renovations, legacy decommissions, data-as-an-enterprise-asset programs, business integration and transformation initiatives, corporate reorganizations, and mergers and acquisitions.

Revamp your OpDI solutions for future success. Many of these are hand-coded legacies that need to be replaced by modern solutions built atop vendor tools. Others are feature-poor and need to be augmented with functions they currently lack for data quality, master data, scalability, architecture, Web services, and modern tools for better developer productivity and collaboration.

Recognize how important B2B data exchange is. It’s an important layer in mission-critical applications. And it’s the medium through which you communicate with valuable partners. Without it, your firm would come to a halt. So give it the staff and resources it needs to grow.

Apply multiple OpDI project types in tandem. For example, database collocations and upgrades are typical prerequisites for database migrations and consolidations—or vice versa, in some cases. And some databases may need synchronization after they’ve been migrated or consolidated.

Realize that data migrations and consolidations aren’t always possible. After all, these are very invasive. But database collocation and synchronization get similar results less intrusively.

Consider a renewed investment in resources for operational DI.


Avoid staffing OpDI from the data warehouse team. Yes, they have great skills, but you put important BI work on the back burner when you scavenge from the data warehouse team.

Consider a data integration competency center. It can provide staff, shared services, and other resources for all DI practices.

Consider an OpDI competency center. This is most likely with B2B data exchange in firms with active supply chains. But it could also apply to firms that have regular mergers and acquisitions, as in financial services. If your firm continually performs work in all three of OpDI’s practice areas, then OpDI definitely merits its own competency center.

When OpDI work is intermittent, outsource it. Instead of permanent staff, hire system integrators, consultants, or vendors’ professional services departments.

Continue the trend away from hand coding and toward vendor tools. A tool-based approach can give all OpDI practices richer features, modern functions, and more productive development and administration environments.

Depend on ETL for OpDI's heavy lifting. Most organizations do, because the T in ETL is very useful with the complex transformations typical of data migrations and consolidations. Beyond heavy lifting, complement ETL with other tools that satisfy specialized needs for unstructured data, synchronization, real-time data delivery, and so on.

Expect OpDI—like all variants of integration—to expose data quality issues. Some of these are problems requiring a fix, while others are opportunities for enriching data. Either way, don't just move data and metadata; improve them.

Profile source data carefully. Otherwise suffer unpredictable problems during development and deployment. For greatest productivity, profile with a tool, instead of manual methods.

Be aware of the overlap between replication and synchronization. Realize that data sync, by definition, is multi-directional, whereas most replication configurations are just unidirectional. Likewise, be aware that advanced data sync configurations may need to support data transformations and heterogeneous data sources and targets.

Data sync and MDM go hand in hand. Improving master data and metadata is often a prerequisite to a successful data synchronization project.

Foster data standards. These are the bread and butter of B2B data exchange, and a successful implementation must support the appropriate de jure and de facto standards, plus variants and versions of these based on flat-file and XML formats.

Enforce system-specific data standards. Beyond unpacking incoming data, a business or IT system receiving data must also convert data formats and contents to meet requirements specific to its systems. This may include converting units of measure, standardizing product attributes, or translating from one natural language to another. (A minimal conversion sketch follows.)
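As a tiny illustration of receiving-side conversion, consider this Python sketch that normalizes lengths to an internal standard unit. The conversion table and the choice of meters as the internal standard are assumptions made for the example.

    # Conversion factors to the assumed internal standard unit (meters).
    UOM_TO_METERS = {"m": 1.0, "cm": 0.01, "in": 0.0254, "ft": 0.3048}

    def to_internal_length(value, uom):
        return value * UOM_TO_METERS[uom.lower()]

    assert to_internal_length(12, "in") == 0.3048  # 12 inches -> meters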

Provide appropriate tools for data stewards and business analysts. There are lots of exceptions that must be processed quickly and accurately in OpDI projects, especially B2B data exchange. Stewards, analysts, brand managers, and others need a business-friendly tool where they can apply their domain expertise to mapping, routing, approving, and editing data.

Use multiple tool types, if that gets you the functions OpDI needs.

Review your approach to OpDI staffing.

Improve OpDI solutions by recreating them atop vendor tools.

Research Sponsors

DataFlux
www.dataflux.com
DataFlux enables organizations to analyze, improve, and control their data through an integrated technology platform. With DataFlux enterprise data quality and data integration products, organizations can more effectively and efficiently build a unified view of customers, products, suppliers, or any other corporate data asset. A wholly owned subsidiary of SAS (www.sas.com), DataFlux helps customers rapidly assess and improve problematic data, and build the foundation for enterprise data governance. Effective data governance delivers high-quality information that can fuel successful enterprise efforts such as risk management, operational efficiency, and master data management (MDM).

expressor software
www.expressor-software.com
expressor software tackles the complexity and cost of enterprise IT projects with data integration software that delivers breakthrough development productivity and data processing performance at a significant price/performance advantage. expressor's patent-pending semantic data integration system is based on common business terms to enable collaborative, role-based team development, business rule reuse, and end-to-end project lifecycle management that reduce total data integration costs by 40% or more.

GoldenGate Software
www.goldengate.com
GoldenGate Software is a leader in solutions for real-time data integration and high availability. GoldenGate's technology enables real-time integration and synchronization of transactional data between operational and analytical systems with very low impact. With over 500 customers and 4,000 solutions deployed globally, GoldenGate supports mission-critical business systems at companies including Visa, Bank of America, DIRECTV, HSN, AT&T, US Bank, UBS, Sabre Holdings, Mayo Foundation, and Overstock.com.

IBM
www.ibm.com
IBM Information Management has the end-to-end capabilities to help you manage your data and content, pull together trusted information that cuts across diverse silos, and also gain valuable insights to optimize your business. Key offerings include the IBM InfoSphere software portfolio, which consists of: InfoSphere Information Server, InfoSphere MDM Server, InfoSphere Warehouse, IBM Industry Models, and InfoSphere Foundation Tools. The InfoSphere software portfolio provides a complete, integrated, and easy-to-deploy platform to address a wide range of information needs. Please visit: www.ibm.com/software/data/information/trust.html.

Informatica Corporation
www.informatica.com
Informatica Corporation provides data integration software and services that empower your organization to access, integrate, and trust all its information assets, giving your organization a competitive advantage in today's global information economy. As the independent data integration leader, Informatica has a proven track record of success helping the world's leading companies leverage all their information assets to grow revenues, improve profitability, and increase customer loyalty. That is why Informatica is known as the data integration company.

SAP BusinessObjects
www.sap.com
SAP is the world's leading provider of business software, offering applications and services that enable companies of all sizes and in more than 25 industries to become best-run businesses. SAP has more than 82,000 customers in over 120 countries. The SAP® BusinessObjects™ portfolio transforms the way the world works by connecting people, information, and businesses. With open, heterogeneous solutions in the areas of business intelligence; information management; governance, risk, and compliance; and enterprise performance management, the SAP BusinessObjects portfolio enables organizations to close the gap between business strategy and execution.

Silver Creek Systems
www.silvercreeksystems.com
Silver Creek Systems®, the leader in product data mastering, is recognized as the "go to" vendor for product data problems. Their patented semantic-based toolset, the DataLens System™, enables enterprisewide standardization and integration of product data—a prerequisite to any successful MDM/PIM, systems migration, data quality, or governance project. Drawing from elements of data integration, data quality, and data governance, their toolset uses next-generation semantic technology to standardize, enrich, match, and repurpose product data from any source—reducing implementation time, cost, and risk while improving the effectiveness of applications from search and merchandising to global data synchronization, inventory management, and procurement.

Sybase
www.sybase.com
Sybase Replication Server supports enterprise data management environments by replicating and synchronizing copies of Sybase ASE, Oracle®, SQL Server, and other databases. It provides high-performance, secure, guaranteed delivery of data across the enterprise to meet the challenges of data distribution, synchronization, system migrations, and business continuity. Replication Server combines the benefits of real-time, heterogeneous, bidirectional data replication with integrated modeling, development, and administration. It supports reporting and analytics business applications, IT process and technology initiatives, and resource consolidation.

Talend
www.talend.com
Talend is the recognized market leader in open source data integration. After three years of intense research and development investment, and with solid financial backing from leading investment firms, Talend revolutionized the world of data integration when it released the first version of Talend Open Studio in 2006. Talend's solutions are used primarily for integration between operational systems, as well as for ETL (extract, transform, load) for business intelligence and data warehousing, for migration, and for data quality management. Unlike proprietary, closed solutions, which can only be afforded by the largest and wealthiest organizations, Talend makes data integration solutions available to organizations of all sizes, and for all integration needs.


1201 Monster Road SW, Suite 250, Renton, WA 98057
T 425.277.9126 | F 425.687.2842 | E [email protected]
www.tdwi.org

TDWI Research provides research and advice for BI professionals worldwide. TDWI Research focuses exclusively on BI/DW issues and teams up with industry practitioners to deliver both broad and deep understanding of the business and technical issues surrounding the deployment of business intelligence and data warehousing solutions. TDWI Research offers reports, commentary, and inquiry services via a worldwide Membership program and provides custom research, benchmarking, and strategic planning services to user and vendor organizations.
