Transcript

WHITE PAPER

Data Integration Platforms for Big Data and the Enterprise

Customer Perspectives on IBM, Informatica, and Oracle

April 2015


Contents

Executive Summary

Introduction

Research Methodology

    Company Research Profiles

    A Framework for Evaluating Data Integration

Overview of Research Findings

    Vendor Platform Architecture Differences

    Data Integration for Analytics

        Traditional BI and Data Warehousing

        Big Data Integration

    Data Integration for Enterprise Systems

    Information Availability

Conclusions and Other Considerations

THIS WHITE PAPER WAS SPONSORED BY ORACLE. THE RESEARCH AND ANALYSIS WERE CONDUCTED INDEPENDENTLY BY DAO RESEARCH.


Executive Summary

Data integration solutions have evolved over time from dedicated on-premise tools for moving and transforming data to comprehensive platforms supporting a broad spectrum of use cases for today’s data-driven enterprises. Now data integration solutions must include heterogeneous, real-time change data capture (CDC) and distribution, data transformation, data quality, data validation, and metadata management, all delivered across traditional on-premise, private cloud, and public cloud scenarios.

Big Data integration further demands data integration platforms that keep pace with market innovations and support massive scale. The architecture and design of a data integration platform are critical for not only solution performance and developer productivity but also high-level business concerns such as time-to-market, cost of ownership, scalability, and innovation.

IBM, Informatica (soon to be acquired by an international private-equity consortium in a leveraged buyout), and Oracle are all considered leaders in the data integration market, each with its own suite of products addressing a variety of use cases and deployment options. However, some differences in the offerings of IBM and Informatica as compared to those of Oracle are fundamental. To explore these differences and their impact on customer organizations using each vendor’s offerings, Dao Research reviewed public solution information and customer use cases and conducted primary research with customers of IBM, Informatica, and Oracle. The key takeaways from this research and analysis are the following:

Oracle supports a significant range of data integration use cases across analytics, enterprise integration, and information availability with the most unified platform, including real-time and bulk data integration, data quality and governance, and rapidly evolving options for on-premise, private, and public cloud deployment scenarios.

Oracle enables Big Data and cloud data integration through a single platform and common tooling, with native support for various Big Data processing environments. IBM and Informatica, by contrast, offer this integration through separate platforms or add-ons that require additional hardware and software investment.

All three vendors offer components for enterprise data quality, but Oracle provides integration and automation with a unified development environment, where users can execute data quality rules from data integration processes.

Customers cited 30% to 60% greater developer productivity using Oracle versus traditional Extract-Transform-Load (ETL) tools from Informatica and IBM, based on key architectural differences like native push-down processing, the separation of logical and physical layers, and easy extensibility.

Oracle’s data integration cost of ownership is lower because it is based on a unified platform with fewer add-on options (versus multiple platforms with many editions and add-ons), leverages source and target processing, improves developer productivity, enables faster implementation, and requires less ongoing management of middle-tier integration infrastructure.

Oracle’s data integration offerings provide flexibility to support rapidly evolving use cases such as Big Data without customers having to wait for major release cycles, most notably due to the flexibility and extensibility of its products.


“As an ODI developer, I am a Big Data developer without having to understand the underpinnings of Big Data. That's pretty powerful capability.”

Integration Architect for Large Financial Services Provider


Introduction

As with many other IT solution categories, the scope of data integration has broadened considerably over the past decade. Due to historically limited database processing capacity, the traditional approach for batch transformation was to extract data from source databases during finite windows of off-peak time and process it in large batches on a middle-tier transformation engine dedicated to ETL processes. There was a natural latency inherent in this process before the data was made operational in the target system, compounded by the risk associated with shrinking batch windows and ever-increasing amounts of data.

More recently, key trends have resulted in a broader definition of data integration to include integral components beyond ETL such as real-time CDC and distribution, replication, data services, data virtualization, metadata management, data quality, data validation, and more:

Data volumes have grown dramatically, and there is increasing pressure to reduce data latency from near real-time to the ultimate goal of allowing key decisions to be made based on real-time data.

The need for data quality and governance has grown based on an increase in the number of data integration projects within an organization, expanding data sources and types of data from legacy systems, heterogeneous databases, and enterprise messaging systems.

Database processing power has increased on a massive scale, such that it can increasingly be leveraged for data transformation purposes.

According to a leading analyst firm, Big Data strategies are not complete without Big Data integration, and “enterprises are spending more time on integrating Big Data sources than on any other Big Data management function. Simplifying and automating Big Data integration is important to reduce time-to-value and leads to successful Big Data projects.”

Lastly, for consumers and users of data integration solutions, the deployment styles have widened based on the evolution of private and public cloud architectures.

These trends in the data integration market create an interesting dynamic among some of the leading companies in this space, namely IBM, Informatica, and Oracle. The purpose of this paper is to summarize research into the hypothesis that Oracle’s data integration platform provides some fundamental business benefits as compared to traditional ETL-architected solutions and to review the differentiating features and benefits in a vast array of use cases for data integration for analytics, data integration for enterprise systems, and information availability.


Research Methodology

Exploring the research hypothesis, Dao Research conducted a multilevel research project on the relative value of data integration solutions and the impact of the choice of vendor platform on the IT environment and the broader business. Beyond the architectural and tool capability analysis, our research framework sought to distinguish elements such as implementation, developer productivity, manageability, performance, and the resulting impact on overall cost of ownership and the ability to innovate. Our research process and methods were as follows:

Reviewed publicly available information and secondary research regarding Oracle, Informatica, and IBM data integration solutions, capabilities, use cases, and key value drivers.

Identified and qualified 13 interviewees who participated in detailed primary research and data gathering.

Conducted interviews with industry data integration experts, Oracle product management, and solution consulting staff to collect existing customer data and further validate benefits and value drivers as experienced by current customers and partners.

Developed this paper to summarize the research findings.

Company Research Profiles

Table 1 lists the companies analyzed and interviewed in the data-gathering phase of the research project.

Table 1. Companies and Vendor Solutions Included in Primary Research

Company Industries Represented | Solution/Deployment Notes

Financial Services | Oracle Data Integrator, GoldenGate

Financial Services | Informatica PowerCenter; Oracle Data Integrator*

Systems Integrator | Oracle Data Integrator, GoldenGate; Informatica PowerCenter

Telecomm (Southeast Asia) | IBM DataStage; Oracle Data Integrator

Retailer | Oracle Data Integrator, GoldenGate; Informatica PowerCenter; IBM DataStage

Pharma (Clinical Trials) | Oracle Data Integrator; Informatica PowerCenter

Business Services | IBM DataStage; Oracle Data Integrator, GoldenGate (with Exadata)


Global Media Company | Oracle GoldenGate; Informatica PowerCenter

Retailer | Informatica PowerCenter

Retailer | Oracle Data Integrator

Telecomm | Informatica PowerCenter

Financial Services (Europe) | Oracle Data Integrator (with Exadata); IBM DataStage*

Pharma | Oracle Data Integrator; Informatica PowerCenter*

*Denotes evaluation or prior use of vendor tool.

A Framework for Evaluating Data Integration

Table 2 presents the analytic framework and areas of investigation employed for evaluating data integration solutions, based on the major functions and activities involved in a variety of deployment scenarios.

Each of the areas in Table 2 was explored in both the secondary research and customer interviews and data collection. In the customer interviews, a variety of use cases were analyzed with this framework and the findings are provided in the next section.

Table 2. Data Integration Solution Evaluation Framework

Platform/Solution Comparison: Including an architectural comparison, an “apples-to-apples” solution comparison across various deployment scenarios, and the level of integration among solution components where possible.

Implementation: The effort and time required to “stand up” the integration platform and the various components, as well as changing it as requirements evolve.

Developer Productivity & Time-to-Market: The impact of the solution and tooling on the common day-to-day activities of integration developers, including the estimated effort and complexity level of the major activities, in addition to an evaluation of skill level requirements and training or ramp-up time.

Manageability: A comparison of the effort involved in managing and administering the solution environment and the impact of centralized management capability.

Cost of Ownership: Based on the other elements of the framework, the impact (where possible to analyze) on overall cost of ownership.

Performance, Scalability & Reliability: Discussion of any findings relative to processing speed, ability to scale as data volumes grow, and fault tolerance/resilience to network outages or process interruption.


Overview of Research Findings

The first part of the analysis pertained to overall product and solution architecture differences and how those differences are reflected in the research framework defined in the previous section. After exploring the general differences, we looked deeper into the key use cases of data integration for analytics, including Big Data integration, data integration for the enterprise, and information availability.

Vendor Platform Architecture Differences

The analysis framework begins with a look at the foundation of the vendor platforms. Specifically, the IBM and Informatica data integration platforms represent a traditional ETL architecture: a middle-tier transformation engine and associated tooling extracts data from source systems, applies an extensive set of transformations in the dedicated middle-tier engine, and loads the resulting data into the target system. In contrast, Oracle’s data integration platform represents an ELT architecture with a distributed, agent-based approach, where transformation processing happens in the database engine (source and/or target database) where the data resides. It utilizes the native processing capabilities of the underlying database infrastructure rather than moving data to a transformation engine. This difference in architecture is illustrated in Figure 1.

Figure 1. Platform Architecture Difference of ETL (IBM, Informatica) vs. ELT (Oracle)
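To make the ELT pattern illustrated in Figure 1 concrete, the following is a minimal, hypothetical SQL sketch (not ODI-generated code) of the kind of set-based statement an ELT tool pushes down to the target database, so that transformation work runs where the data already resides instead of on a middle-tier engine. The table and column names are illustrative assumptions.

    -- Source rows have already been bulk-loaded into a staging table (the "E" and "L" steps).
    -- The transformation ("T") is expressed as set-based SQL executed by the target
    -- database engine itself; no rows pass through a middle-tier server.
    INSERT INTO dw_sales_fact (sale_id, customer_key, sale_date, net_amount)
    SELECT s.sale_id,
           c.customer_key,
           CAST(s.sale_ts AS DATE),
           s.gross_amount - s.discount_amount      -- derive the measure in-database
    FROM   stg_sales s
    JOIN   dim_customer c
           ON c.source_customer_id = s.customer_id
    WHERE  s.load_batch_id = 42;                   -- process only the current batch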


This fundamental architectural difference between traditional ETL and ELT has a profound impact when applying the analytical framework employed in the research. With Oracle, as compared to IBM and Informatica, customers found there was no need for middle-tier ETL transformation engines and the associated server hardware and software required. This fact impacts multiple facets of our analysis, including performance, productivity, agility, and cost of ownership.

IBM and Informatica customers interviewed cited the need for powerful servers because source data is often extracted in bulk without a granular filtering capability. In some cases with Informatica and IBM, there are also multiple instances of the middle tier for development, test, and production, further adding to acquisition and management costs. Specific to IBM, in high-volume environments where the InfoSphere DataStage grid solution is deployed, the configuration also requires multiple servers in a clustered environment, which from a hardware perspective scales out in a linear fashion as the load increases.

According to a senior integration developer with deployment experience with both Oracle Data Integrator (ODI) and Informatica across several industries, “Today, we have a database server and we are just reusing the power of that server. No additional server is needed for ODI. For Informatica, it is a separate cost, a big cost actually. This server has to be powerful for Informatica to have good performance. You have to have people managing that server. Then, in your whole architecture, you usually have four environments like dev, test, UAT, and production, so that means four servers and that’s a big cost.”

In Table 3, the analysis of the platform is broadened to include other key components of the data integration suites. Oracle’s architecture and design, spanning Oracle GoldenGate, Oracle Enterprise Data Quality, and ODI, yields a streamlined and unified offering for a broad range of use cases and deployment scenarios. The product set is also easy to understand from a buyer perspective. In contrast, with both IBM and Informatica, supporting traditional analytics, Big Data integration, and cloud integration in a truly enterprise scenario involves many more products, a significant and overlapping array of options and add-ons, and entirely different platforms to purchase and manage. For Informatica alone, there are 13 “editions” of its products for ETL, data quality, and cloud integration. The streamlined architecture of Oracle’s data integration solution supports public and private cloud integration without adding more complexity.

In terms of real-time data capture and movement, both IBM and Informatica have separate product lines for data replication, differentiated from their CDC products. In contrast, Oracle GoldenGate’s single platform can efficiently capture and move data across a broad range of sources in analytic and enterprise integration scenarios, as well as in information availability scenarios, which will be explored in a later section.



Table 3. Broader Enterprise Integration Platform Comparison of Oracle, IBM, and Informatica

Complex Transformation & Bulk Data Movement

Oracle: Oracle Data Integrator
Informatica: Platform #1. PowerCenter Editions: Standard, Advanced, Premium
IBM: InfoSphere DataStage Editions: Standard, Multiple Virtual Storage

Real-Time Data

Oracle: Oracle GoldenGate
Informatica: PowerCenter Real Time Edition (add-on); PowerExchange; Informatica Data Replication; Informatica Fast Clone
IBM: InfoSphere Data Replication; InfoSphere Change Data Capture (CDC); InfoSphere Change Data Delivery; InfoSphere Classic Replication Server for z/OS; InfoSphere Q Replication

Data Quality

Oracle: Oracle Enterprise Data Quality
Informatica: Informatica Data Quality Editions: Standard, Advanced, Governance, Big Data
IBM: InfoSphere QualityStage; IBM InfoSphere Information Server for Data Quality; InfoSphere Information Analyzer; InfoSphere Discovery

Big Data

Oracle: Oracle Data Integrator; Oracle GoldenGate for Big Data; Oracle Big Data Connectors
Informatica: Platform #2. PowerCenter Big Data Editions (add-on): Standard, Governance
IBM: InfoSphere Information Server Editions: Workgroup, Enterprise, Hypervisor

Cloud Integration

Oracle: Oracle Data Integrator (bulk); Oracle GoldenGate (real time); Oracle Integration Cloud Service (app integration)
Informatica: Platform #3. Informatica Cloud Integration Platform Editions: Professional, Standard, Advanced, Premium; Platform #4. Informatica Cloud Platform (software as a service [SaaS])
IBM: Platform #2. IBM Cast Iron Editions: Hypervisor (on-premise), Appliance XH40 (on-premise), Express (SaaS), Live (SaaS); IBM Bluemix; IBM DataWorks

Table 3 also highlights the impact of architecture differences from a software licensing perspective. Oracle customers noted that they can avoid purchasing optional capabilities, such as the push-down option offered by Informatica. Informatica customers often realized the need for this capability well into the deployment but found the cost prohibitive relative to the value it provided over and above the middle tier. Some, like a large financial services company using Informatica, ended up coding all of their own scripts to execute transformations on the database. As a result, they shared, “We use Informatica very much as a shell, and a lot of the mapping and workflows you create within Informatica directly call on just PL/SQL scripts rather than using the transformation functionalities that come built with Informatica.”

In regard to licensing, Oracle customers did note the differences in pricing models versus those of IBM and Informatica, with the Oracle license based on the underlying database versus the middle-tier server(s). While for some this model raised concerns, particularly for those who were not under an enterprise licensing agreement, for others it proved to be a more stable investment because they do not need to increase the size of the middle tier as data volumes grow.

Because there is no need to “stand up” the middle tier, the data integration platform implementation is significantly faster for Oracle versus IBM and Informatica, a “third to half of the time” to get into production according to study participants.

Also associated with the lack of a middle tier, customers found Oracle much easier to manage, requiring less administrative effort than managing, monitoring, and upgrading a dedicated server environment. With Oracle, customers cited very little reliance on operations or system administration resources. On average, customers with experience with ODI and either Informatica or IBM found it “half the cost” to manage Oracle.

The other major difference between Oracle and both IBM and Informatica, also tying back to the architectural difference, lies in the tooling itself and its impact on best practices, resources, and developer productivity. ODI generally relies on common SQL-based knowledge because it uses the underlying databases for transformations; ODI actually generates native code that runs on the relational database or the Big Data environment. As such, customers cited the ease of finding and ramping up resources without previous integration “tool” experience. By contrast, those bringing on resources for IBM and Informatica deployments required prior experience with the vendor-specific integration tools. Study participants related that with common SQL-based skills it was easier to achieve and enforce code consistency and have more transparency into data integration. As a partner in a systems integration firm related, “For the traditional ETL tools like Informatica, when we are looking at resources to bring on we look for deep experience with PowerCenter. With ODI, we find we can bring those with SQL experience on, whether they have been a DBA or database developer, and ramp up very quickly.”

Data Integration for Analytics

The most significant use of data integration historically has been the area of analytics, including business intelligence (BI), data warehousing and, increasingly, Big Data analytics. Most of the customers included in the research were using the integration platforms for these use cases, which involve the basic elements for getting data from source to target systems and driving toward real-time integration.

Traditional BI and Data Warehousing

IBM, Informatica, and Oracle all have a substantial installed base of customers utilizing their data integration solutions for BI and data warehousing scenarios, with the core offerings being IBM InfoSphere DataStage, Informatica PowerCenter, and ODI, respectively. From a pure transformation capability perspective, customers found that IBM and Informatica each have a rich set of data integration transformations, handling simple to very complex data integration projects. This transformation capability was noted as an advantage over Oracle until the release of ODI 12c, which expanded the transformation portfolio to be on par, if not more comprehensive, due to the expansion of knowledge modules and the flexibility afforded by its software development kit.



As with the architectural differences, there are some foundational differences in how tool capability is delivered in ODI 12c versus PowerCenter and DataStage, and these have a dramatic impact on integration developer productivity. Participants in the study, many versed in multiple vendor platforms, found developer productivity with ODI ranged from 30% to 60% higher than with PowerCenter and DataStage due to the following:

With ODI, Oracle separates the logical and physical layers of the integration and provides the related notion of context, which reduces the potential for errors and makes activities such as switching between environments very simple. It also enables the segregation of integration duties across senior and junior developer resources.

ODI provides knowledge modules which are predefined and reusable components for common transformations and connectivity to many different data sources and targets. This was noted by virtually all study participants as a significant factor in both productivity and platform flexibility. As described by an integration architect with ODI and Informatica experience, “Basically, because I have these knowledge modules set up I can bring in somebody off the street tomorrow and within a few days of training they can be using ODI. Even better, I can set up tasks and be very confident that they’re not affecting performance of my data warehouse and ETL window.”

Our secondary research also showed that ODI provides the ability to automate data quality processes from within ODI Studio by using Check Knowledge Modules or by invoking processes from Oracle Enterprise Data Quality, which provides prebuilt rules for both customer and product data domains. The integration between ODI and Oracle Enterprise Data Quality enables productivity when developing comprehensive solutions that combine data quality with data movement and transformation.
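As an illustration of the kind of set-based validation such an automated check step can perform before data reaches the target, the following is a minimal, hypothetical SQL sketch. It is not the actual code produced by ODI's Check Knowledge Modules or Oracle Enterprise Data Quality; the table names and rules are assumptions.

    -- Divert rows that violate simple quality rules into an error table for review,
    -- then load only the rows that pass; both steps run inside the database.
    INSERT INTO err_customer (customer_id, error_reason, load_batch_id)
    SELECT customer_id,
           'Missing email or invalid country code',
           load_batch_id
    FROM   stg_customer
    WHERE  load_batch_id = 42
      AND  (email IS NULL
            OR country_code NOT IN (SELECT country_code FROM dim_country));

    INSERT INTO dim_customer_stage (customer_id, email, country_code)
    SELECT s.customer_id, s.email, s.country_code
    FROM   stg_customer s
    WHERE  s.load_batch_id = 42
      AND  NOT EXISTS (SELECT 1
                       FROM   err_customer e
                       WHERE  e.customer_id   = s.customer_id
                         AND  e.load_batch_id = s.load_batch_id);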

Overall, there is significantly less custom scripting in ODI vis-à-vis PowerCenter and DataStage, and there is a fundamental difference in the processing of scripts. For IBM and Informatica, the processing takes place on the middle-tier server, referred to by several participants as a “black box” process and, as such, it can be more challenging to identify, troubleshoot, and resolve issues. In contrast, Oracle executes transformations using common SQL on the database, making it easier to diagnose performance bottlenecks. In the words of the director of data integration for a business services provider, “With ODI, we treat our mappings that we create as SQL and then we code ODI as if you’re creating SQL and then interestingly enough ODI generates SQL, which allows you to tune your jobs like you would tune any other Oracle-based process or SQL standard. Database tuning is part of the process with ODI. That’s really the advantage over DataStage, which generates a proprietary language underneath it—unreadable, immutable objects that you have to figure out how to influence.”
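Because the generated transformations are ordinary SQL, they can be examined and tuned with standard Oracle database tooling rather than a vendor-specific profiler. A minimal sketch of that workflow, using a hypothetical generated statement, is the familiar EXPLAIN PLAN pattern:

    -- Inspect the execution plan of a generated INSERT ... SELECT exactly as you
    -- would for hand-written SQL, then adjust indexes, statistics, or partitioning.
    EXPLAIN PLAN FOR
      INSERT INTO dw_sales_fact (sale_id, customer_key, sale_date, net_amount)
      SELECT s.sale_id, c.customer_key, CAST(s.sale_ts AS DATE),
             s.gross_amount - s.discount_amount
      FROM   stg_sales s
      JOIN   dim_customer c ON c.source_customer_id = s.customer_id;

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);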

In terms of performance, scalability, and reliability, study participants noted key aspects of Oracle that differentiated it from IBM and, even more so, Informatica. Customers of IBM DataStage did tout the advantages of its parallel processing capabilities in terms of performance, but these were primarily IBM environments that still utilized the middle tier. Three major factors were highlighted by study participants:

Using the power of the source and target databases with ODI greatly improved batch processing time. One study participant, a business services provider, cited the performance difference in the ETL process for an internal data warehouse being migrated from DataStage to ODI. He shared, “Our weekly processor data warehouse runs over the weekend but I can tell you when it’s converted to ODI it won’t take all weekend—from the stuff that’s already converted on average it’s taking a quarter of the time.”



Tight integration of ODI with Oracle GoldenGate for CDC was cited as a very significant differentiator for Oracle compared to IBM and Informatica as companies drive toward real-time data integration. Specific points brought up by study participants include the following:

GoldenGate has extremely low impact on underlying data stores and is a single product that can efficiently move data from and to a variety of heterogeneous data stores. Because Informatica does not offer a comparative product (though there is a CDC option in PowerCenter), many Informatica customers in our analysis were actually using GoldenGate in conjunction with PowerCenter. In terms of IBM, participants cited the capability of InfoSphere CDC (formerly a DataMirror product) in this capacity but also mentioned there were several different versions depending on the source database and use was limited primarily to IBM-centric environments.

The tight integration between GoldenGate and ODI includes a robust journalizing capability cited as a significant advantage for those utilizing the products in combination.

GoldenGate provides capabilities that assist in production load testing in preproduction environments. As explained by a large financial services provider, “One of the nice things about using GoldenGate from a real-time perspective, we weren't having to wait for it to get to production to see where data issues are. We installed GoldenGate at the beginning of the project in production, and it creates what they call trail files. We were then able to take weeks and weeks of trail files and push them down to dev and pump those through the integration so that you can then do our testing on a whole restructured dataset within a matter of hours.”

A key point regarding performance was highlighted by Oracle customers using ODI in conjunction with Exadata in support of more critical integration scenarios. ODI is the only bulk data integration tool that runs natively on Exadata, providing “orders of magnitude” performance improvements with a very light footprint. Several very large global mobile telecomm companies cited the benefits of Exadata, one noting the ability to publish their “month-end data in half the time” and “improve end-user reporting performance by three to five times.”

A final point related to data integration, and an important signal to the market in general, is Oracle’s recent migration of its BI applications from Informatica to Oracle’s own ODI. Even though Informatica is embedded in the BI applications and not used directly by end customers for integration development, study participants did cite manageability and ongoing cost-of-ownership savings in replacing the middle-tier transformation engine with the distributed, agent-based ODI. They also cited improvements in performance, with ODI more optimized for the BI applications than Informatica. This move by Oracle was viewed by study participants as an important signal that both demonstrates the value of ODI and expands the skill base for ODI.


“We find significant value in GoldenGate in terms of both high performance and the flexibility to support many different data integration scenarios from real-time analytics, Big Data, Active-Active and disaster recovery.”


Big Data Integration

Big Data integration scenarios are becoming more prevalent as the volume of data continues to grow and as more types of data are included in analytics for better insight and business value. Similar to the core BI and data warehousing scenarios, the architectural difference of Oracle’s ELT versus IBM’s and Informatica’s ETL readily enables Big Data scenarios by natively utilizing the processing power of distributed environments such as Hadoop. Figure 2 highlights the solution platform difference and the implication of running natively “on” Hadoop versus natively “in” Hadoop.

Figure 2. Traditional ETL Platforms vs. Oracle ODI for Big Data Integration

In the area of Big Data integration, the analysis and customer research discovered the following:

In Big Data integration scenarios, the need for the traditional middle-tier ETL transformation engine is further obviated; in traditional IBM and Informatica deployments, the middle tier acts more as a shell and repository for scripts in Big Data deployments. Both vendors do offer Big Data-specific integration solutions, but these represent distinct platforms compared to their core data integration offerings. From a cost-of-ownership perspective, customers of IBM and Informatica doing both Big Data and “small” data integration projects will likely purchase and manage several integration platforms. This can produce a multiplier effect on the cost of ownership of data integration.

With Oracle, study participants cited no need for a separate platform or deployment footprint for Big Data. Compared to the IBM and Informatica offerings, Oracle’s solution is unique in being fully “aware” of the cluster environment and enabling execution inside the Hadoop footprint. Oracle’s Big Data solution leverages the same products as its core data integration offering, namely ODI and GoldenGate, and these products support heterogeneous Big Data systems as they do in relational database integration. That being said, Oracle does offer surrounding products specific to Big Data deployments, including connectors for Big Data and options such as the Oracle Big Data Appliance, which provides a preloaded and configured Cloudera Hadoop deployment on an optimized Sun x86 rack system. While Oracle’s data integration offering is optimized for engineered systems for Big Data analytics, it is not limited to such solution architectures.


From a developer productivity perspective, study participants related that ODI developers can easily transition to Big Data developers, due to the architecture and tooling of ODI. The same benefits seen in traditional integration projects apply here, because ODI generates the logic that runs natively in the Big Data platform. According to the large financial services participant, “As an ODI developer, I am a Big Data developer without having to understand the underpinnings of Big Data. That's pretty powerful capability.”

The nature of Oracle’s data integration offering provides faster innovation and more flexibility to support emerging Big Data deployments by using and extending knowledge modules rather than depending on major platform release cycles, as was indicated with both IBM and Informatica. A system integration firm noted, “There’s been a lot of new code issued by Oracle the last few months to support other Hadoop technologies as Hadoop itself evolves; it’s moving more in memory and so on. You’re getting a benefit from the fact that it’s very easy to write support in for other technologies, whereas Informatica would have to almost hard code that in. You’re also getting the benefit of those Big Data technologies that you’re using themselves improving. You’re getting double benefit, really, from the approach that was taken there.”

GoldenGate provides extreme flexibility and excellent performance in moving data for a variety of Big Data integration scenarios. A global media company is using GoldenGate in numerous deployment scenarios, including:

Capturing change data with GoldenGate and loading it into a Teradata data warehouse, from which it is batch-loaded into Hadoop using Sqoop.

Capturing billing data with GoldenGate and using GoldenGate’s flat file converter to inject it directly into Hadoop processing engines.

Loading change data with GoldenGate through Apache Kafka into Apache HBase and ultimately into Hadoop processing engines.

Oracle very recently announced support for streaming real-time transactional data into Apache Flume, Apache HDFS, Apache Hive, and Apache HBase with the Oracle GoldenGate for Big Data offering. The product includes GoldenGate for Java, which enables extending the target to other Big Data systems such as Oracle NoSQL, Apache Kafka, and Apache Spark, among others.

Also recently, Oracle released advanced features for Big Data integration with ODI. Native Pig, Hive, and Oozie execution capability has been added to ODI, which will significantly reduce data and network overhead, increase speed and reduce cost for Big Data customer projects, and free customers from having to decide which technology to adopt as a standard for their Big Data projects (a minimal sketch of this native-execution pattern follows this list).

Lastly, Big Data and cloud integration will invariably intersect as the market evolves. Due to the variety of data sources included in Big Data integration projects, preparation and data quality are very important in realizing the output value of Big Data. To drive innovation in this area, Oracle will soon deliver the Oracle Big Data Preparation Cloud Service. This web-based service enables business users to participate in improving data quality and enrichment for analytics, resulting in a significant cost advantage for data-intensive projects by reducing the amount of time and resources required to ingest and prepare new datasets for downstream IT processes.
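To illustrate the native-execution point referenced above, the following is a minimal, hypothetical Hive SQL sketch of the same ELT pattern applied inside Hadoop: the transformation is expressed as a statement the cluster itself executes where the data lives, rather than pulling data out to a middle-tier engine. The table names are assumptions, and this is not code generated by ODI.

    -- The set-based transformation pattern expressed in Hive SQL and executed
    -- by the Hadoop cluster (for example, as MapReduce or Tez jobs).
    INSERT OVERWRITE TABLE dw_sales_fact
    SELECT s.sale_id,
           c.customer_key,
           to_date(s.sale_ts),
           s.gross_amount - s.discount_amount
    FROM   stg_sales s
    JOIN   dim_customer c
           ON c.source_customer_id = s.customer_id
    WHERE  s.load_batch_id = 42;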

Data Integration for Enterprise Systems

As data integration deployments mature in organizations, the need arises to look holistically across the organization and establish data integration best practices and capabilities for accessing and transforming data across transactional and analytical systems in a consistent, automated, and repeatable way. According to the participants in our study, a solution to this need is becoming more critical as companies increasingly form integration silos and have disparate integration projects. Some organizations have adopted shared services architectures to streamline integration practices, such as with service integration, and include data integration as part of that initiative.


Data integration for enterprise systems also includes normalizing data, improving data quality, and providing governance over data and data integration services. As the number of data integration projects in customer organizations grows, so does the value of streamlining the data integration architecture by standardizing on a single platform and using integrated solutions both horizontally and vertically.

Horizontal integration refers to how unified the solutions are within the broader integration portfolio including governance, data quality, replication, master data, and metadata management. Although our primary research was not explicitly focused on this use case, the responses that were collected in this area suggest that IBM, Informatica, and Oracle all offer the breadth of requisite solution components that complement each vendor’s core data integration offering. The primary research conducted did not find a significant difference in the horizontal solution integration among the three vendors.

Where there is significant differentiation, particularly comparing Oracle to Informatica, is in the area of vertical, or “stack level,” integration of technologies. As an integration software company, Informatica is limited in the level of integration that can be achieved with other layers of the overall IT stack, from application services down to the hardware. IBM also has a vertical stack presence similar to Oracle’s, but it does not include applications. Additionally, its application integration and data integration products live in different product families, WebSphere and InfoSphere, respectively. Partially due to this, IBM professional services are more involved in integration projects. For Oracle, all middleware and integration products are part of the single Fusion Middleware product line.

Oracle’s data integration platform is embedded in or can be easily used with other elements of the Oracle stack, including the enterprise applications themselves, BI apps for packaged applications, SOA Suite for application integration, WebLogic for middleware, Oracle Exadata and Exalogic for engineered systems, and Oracle Enterprise Manager for unified manageability, to name a few. This vertical integration results in fast time-to-market and robust “engineered” solutions for customers.

Many of the study participants using Oracle’s data integration platform created reference architectures based on the various components of the broader Oracle IT stack. One of those participants was a large U.S.-based telecomm and wireless provider using ODI in combination with WebLogic to support high availability and clustering for a shared data services implementation framework using Oracle SOA Suite. In addition to a 50% improvement in large-batch data processing, they cited improved data quality and consistency with data correction, monitoring, and audit capabilities. According to the data architect, “ODI being on WebLogic provides excellent performance and high scalability via WebLogic’s clustering to support high availability and load balancing. Our operations team can easily process large data batch loads automatically across virtually all enterprise applications and their data warehouse.”

A Latin American cable services provider also demonstrated an example of building a single corporate real-time integration platform in a standards-based private cloud architecture with a high level of standardization. The private cloud environment was implemented on Oracle Exadata and Oracle Exalogic, and for enterprise-wide real-time integration they are using ODI, Oracle GoldenGate, and Oracle SOA Suite. The benefits achieved are transformational in regard to their IT and business strategy, including:

A 60% reduction in development time.

A 50% reduction in manageability cost through native monitoring and error handling via Oracle Enterprise Manager.

An 80% reduction in data quality problems through data validation, an innovative ODI data quality “firewall.”



Information Availability

For today’s data-driven organizations, a key requirement is information availability. Global operations and customer expectations require access to critical information on a 24/7 basis. Data integration, especially with near-real-time latency, goes hand in hand with continuous data availability: if data is critical, it needs both low-latency distribution across different enterprise systems and high availability and protection.

Oracle’s real-time data integration offering is geared to support this closely linked data access requirement of business critical systems. As described previously, research shows that the Oracle GoldenGate offering provides support for a wide range of use cases for information availability, many of which can be used in cloud or hybrid cloud environments. The most popular use cases include:

Zero-downtime upgrade and migration

Active-active high availability

Disaster recovery for non-Oracle databases

Query and report offloading

Data synchronization within the enterprise

Our primary research touched on this in part and provided examples of data migration comparing GoldenGate to IBM InfoSphere Change Data Capture (formerly known as DataMirror). According to a business services provider, who is a customer of both Oracle and IBM, “GoldenGate was head and shoulders above any of the other data capture tools. We brought in DataMirror—that might have been the only other one that made it into the building, but that actually fell down and wasn't even able to finish the POC as we laid it out. GoldenGate had very low latency on the source system: You can run GoldenGate off of your active data.”

While the core of the primary research was not focused on information availability, secondary research was employed to identify the public case studies for each of the vendors as a proxy for the types and volume of deployments. On the Oracle website, it is pretty straightforward to find the variety of customer case studies for data integration products including GoldenGate. As of the time of publication, there are more than 50 stories specifically covering the areas of real-time data synchronization, zero-downtime migrations, and active-active high availability.

A good GoldenGate use case highlighting its performance and data availability capabilities is a leading social networking company’s use of the product to replicate data among three major active datacenters simultaneously. The result is that if a status is updated on a mobile device and the data resides in one datacenter, that information is immediately accessible to a web user reaching the site from a datacenter across the country. This active-active replication architecture supports both enterprise data synchronization for timely data access and continuous database availability across regions.

For Informatica and IBM, the exercise of determining the type and quantity of use cases is not as straightforward because the solutions are less clearly mapped to products or specific use cases. For Informatica, the Data Replication product web page references only a handful of customer case studies and, of those, one is focused on enterprise and cloud integration and one is focused on real-time analytics.[1] For IBM, the picture is even more obscure given its branding and vast product portfolio. IBM has several product lines, including InfoSphere Data Replication and InfoSphere Change Data Capture, each of which consists of two products, a general heterogeneous one and another specific to DB2 for z/OS. There is no direct linkage to customer case studies from the product pages, so it is not easy to determine what types of and how many use cases they support. That said, given IBM’s installed base there is undoubtedly some level of deployment for information availability-related use cases, but the extent is unknown.

Not surprisingly, cloud environments and cloud-based services are prime spots for information availability use cases. Our research uncovered Oracle Managed Cloud Services’ (OMCS) use of Oracle’s data integration portfolio for its customers in different solutions, including:

Zero-downtime upgrade and patching using a primary and disaster recovery instance of Siebel 8.x.

Near-zero-downtime upgrade of JD Edwards EnterpriseOne.

Offloading of the PeopleSoft ETL process to OMCS to maintain high performance.

According to OMCS management, the ODI products’ tight vertical integration across the Oracle stack, with certified solutions for Oracle applications, makes the OMCS environment easier to manage and more reliable for its customers.

[1] Informatica Data Replication product web page.


Conclusions and Other Considerations

The data integration market and solution requirements have increased in scope and complexity, with demands relating to the volume, sources, and types of data; the diversity of data integration scenarios; the drive toward real-time analytics; and the increasing solution consumption options, including the cloud. Big Data integration has further emphasized the pendulum shift from centralized, vendor tool-centric processing to a distributed model leveraging a growing number of native, purpose-built processing environments such as Hadoop.

Oracle’s data integration offering, by architectural design and inherent capability, is well suited to this shift with unified and streamlined solutions that support a broad array of integration scenarios, the prerequisite efficient movement of data, and massive-scale performance via enterprise stack integration and engineered systems such as Exadata. IBM and Informatica, as leaders in the traditional ETL market, are challenged to address the emerging paradigm with their existing platforms of DataStage and PowerCenter, respectively. To address that gap, they market new platforms for Big Data and cloud integration and, in turn, add to their already broad and often overlapping platform and product portfolios.

Revisiting the elements of the framework utilized in this analysis, the net result for an Oracle data integration customer is the following:

Greater flexibility to support a growing array of data integration scenarios with a streamlined product set.

Higher developer productivity, on the order of 30% to 60%, based on fundamental differences in architecture and capabilities such as knowledge modules and declarative design that have no parallel in IBM and Informatica.

Lower cost of ownership for core data integration scenarios such as BI and data warehousing, and multiplied cost savings for Big Data and cloud integration, derived from the unified platform, flexibility, and reliability of ODI and GoldenGate.

Faster time-to-market and innovation based on seizing emerging technology advancements.


© 2015 Dao Research. All rights reserved.

