Filing Information: February 2012, IDC #233348, Volume: 1, Tab: Vendors

Database Management and Data Integration Software: Insight

INSIGHT

Oracle's All-Out Assault on the Big Data Market: Offering Hadoop, R, Cubes, and Scalable IMDB in Familiar Packages

Carl W. Olofson

IDC OPINION

The Big Data space is rapidly evolving. The first wave of adoption involved Web-based companies such as online retailers, service providers, and social media firms. These companies adopted open source technologies such as Apache Hadoop and used considerable in-house technical expertise to build business solutions on top of these open source foundations. The second wave will involve businesses that both lack technical teams of the same size and depth as the Web-based companies and are averse to the risk and cost associated with large investments in original software development. These businesses will be attracted to finished products from established companies that offer short paths to business analytic solutions using Big Data technologies. Oracle is seeking to appeal to such firms with:

- Well-packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances

- Products offered in a way that enables users to integrate them into their existing Oracle Database and Fusion Middleware environment

- Technologies that include the Big Data capabilities in highest demand, including Hadoop, support for the R language, and scalable in-memory database (IMDB) functionality

IN THIS INSIGHT

This IDC Insight considers a number of key product announcements made by Oracle in January and February 2012 as well as their role in the company's strategy with respect to Big Data and their likely impact on the software markets associated with Big Data technology. The most recent announcement concerns Oracle Advanced Analytics, an option of Oracle Database 11g. This announcement aligns strategically with the following three product announcements that establish comprehensive Oracle coverage of the Big Data space:

- Oracle Exalytics In-Memory Machine

- Oracle Big Data Appliance

- Oracle TimesTen In-Memory Database 11g Release 2

Global Headquarters: 5 Speen Street, Framingham, MA 01701 USA   P.508.872.8200   F.508.935.4015   www.idc.com

Taken together, these products address three key Big Data areas: advanced and large-scale analytics, Hadoop-based data classification and extraction, and scalable in-memory database (IMDB) technology.

SITUATION OVERVIEW

Highlights

On February 8, 2012, Oracle announced general availability of Oracle Advanced Analytics. This option of Oracle Database 11g Enterprise Edition includes Oracle Data Mining and a new component called Oracle R Enterprise, which embeds R analytic capability in the database server. Previously, Oracle announced the Oracle Exalytics In-Memory Machine and the Oracle Big Data Appliance at Oracle OpenWorld 2011. In mid-January 2012, the company announced pricing and general availability for these two products plus a greatly enhanced version of the in-memory relational database management system (RDBMS), Oracle TimesTen. Taken together, this database option and these three products address key areas of the Big Data space and represent a significant move by Oracle to establish itself as a major Big Data player. IDC identifies three key areas of Big Data as:

- Large-scale advanced analytics

- Hadoop-driven Big Data processing

- Scalable in-memory database management

This combination represents a comprehensive approach to the Big Data problem space. This Insight considers each area in turn, focusing on how Oracle is addressing it.

Analysis

Oracle describes its approach to the Big Data space as encompassing four key stages:

- Acquire: Collect, ingest, and format data for analysis

- Organize: Put data into an order that supports either deep analysis or integration into a larger structured data collection, such as a data warehouse

- Analyze: Perform either standard query-based/online analytical processing (OLAP) analysis or deep statistical analysis on the resulting data set

- Decide: Yield results that can drive both tactical and strategic business decisions

The Oracle Big Data Appliance takes the user from the Acquire to the Organize stage, the Oracle Exadata Database Machine (or other Oracle Database 11g Enterprise Edition installation) with the Oracle Advanced Analytics option takes the user from the Organize to the Analyze stage, and the Oracle Exalytics In-Memory Machine takes the user from the Analyze to the Decide stage.

These products (note that Oracle Exalytics In-Memory Machine includes Oracle TimesTen) fall into the three functional areas described in this Insight as key elements of the Big Data space.

Large-Scale Advanced Analytics

This functional area includes the ability to accumulate large amounts of data in a scalable space for high-performance deep analysis.

Oracle is addressing this area with two product offerings:

- Oracle Advanced Analytics is an Oracle Database 11g Enterprise Edition option that includes Oracle Data Mining and Oracle R Enterprise for those who wish to perform deep data mining and analytics driven by the R language, with those analytics executing in the database engine.

- Oracle Exalytics In-Memory Machine is for those seeking an engineered system that is preconfigured to support classic online analytical processing using in-memory cubes powered by Oracle Essbase, or relational data held in memory by Oracle TimesTen for fast execution. (Note that Exalytics can support large data sets that extend beyond the main memory capacity of the system by sending SQL queries to a back-end database such as Oracle running on Exadata.)

Oracle Advanced Analytics

This option of Oracle Database 11g Enterprise Edition has two components: Oracle Data Mining and Oracle R Enterprise. The former is an upgraded version of the data mining option that Oracle has offered for a number of years. The latter is a capability embedded in the database engine that allows the user to build R analytics that execute in the database, close to the data, for better performance. The system allows R users to access table data within the database using the familiar variables and other constructs of the R language. Data retrieval, statistical and predictive analysis operations, and advanced numerical computations expressed in R are converted into SQL and executed under the covers, so the R programmer does not need expertise in relational database technology or in the structure of the database in question. The role of this option is to allow "quants" who prefer R as their means of doing deep analytics to use that language in a high-performance way directly against the database data rather than as an external facility that requires considerable configuration to set up.
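The pattern of converting a client-side expression into SQL that runs inside the database can be illustrated with a minimal sketch. The following Python example is not Oracle R Enterprise and does not use its API: the `TableProxy` class, its methods, and the `sales` table are hypothetical, and SQLite stands in for the database server. It only shows the general idea that the analyst works with an object that behaves like a local data set while the aggregation is translated to SQL and executed where the data lives.

```python
import sqlite3


class TableProxy:
    """Hypothetical stand-in for a database table that looks like a local data set."""

    def __init__(self, conn, table, where=None, params=()):
        self.conn = conn
        self.table = table
        self.where = where
        self.params = tuple(params)

    def filter(self, column, op, value):
        # Only record the condition; nothing is fetched from the database yet.
        # Column and table names are trusted in this sketch; values are bound.
        if op not in ("=", "<", ">", "<=", ">="):
            raise ValueError("unsupported operator: " + op)
        clause = f"{column} {op} ?"
        where = clause if self.where is None else f"{self.where} AND {clause}"
        return TableProxy(self.conn, self.table, where, self.params + (value,))

    def to_sql(self, expression):
        sql = f"SELECT {expression} FROM {self.table}"
        if self.where is not None:
            sql += f" WHERE {self.where}"
        return sql

    def mean(self, column):
        # The aggregate runs inside the database; only one number comes back.
        row = self.conn.execute(self.to_sql(f"AVG({column})"), self.params).fetchone()
        return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 300.0), ("west", 50.0)],
)

sales = TableProxy(conn, "sales")
east = sales.filter("region", "=", "east")
print(east.to_sql("AVG(amount)"))  # the SQL generated under the covers
print(east.mean("amount"))         # prints: 200.0
```

Because only the aggregate result crosses the connection, the cost of moving rows to the client is avoided, which is the performance argument the paragraph above makes for in-database execution.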

Because Oracle Advanced Analytics is a database option, it can be used with any installation of Oracle Database 11g Enterprise Edition, including within the Oracle Exadata Database Machine. When Oracle Advanced Analytics is used with the Oracle Real Application Clusters (RAC) option of Oracle Database, or within the Oracle Exadata Database Machine (which includes RAC), the user also takes advantage of the scalability of parallel SQL execution, which IDC also considers a key Big Data characteristic for relational databases.

Exalytics In-Memory Machine

This product is used to perform deep analysis of large amounts of business intelligence (BI) data quickly. It combines Oracle Business Intelligence Enterprise Edition (OBIEE) with enhanced visualization capabilities and performance optimizations, an optimized version of Oracle TimesTen In-Memory Database with analytic extensions, and an optimized version of Oracle Essbase for analyzing OLAP cubes in memory. It is delivered as an engineered system, with the hardware configured specifically for the Oracle TimesTen In-Memory Database for Exalytics and Oracle Business Intelligence Foundation software, which includes Oracle Business Intelligence Enterprise Edition and Oracle Essbase.

The idea, as with all of Oracle's engineered systems, is to deliver a product that can be set up and used with a minimum of effort, involving virtually no installation and only the tuning and configuration necessary for the specific analysis required by the user. Other products that feature IMDB functionality with analytics require considerable installation and configuration before use.

Hadoop-Driven Big Data Processing

This is the most mature of the new technology areas in the Big Data space. It involves the ability to accept either complex, heterogeneous (or unstructured) data or high-volume streams of machine-generated data; analyze the data for elements of value or for meaningful patterns; and provide analytical results or structured output, or both, generally leading to further analysis. This capability is generally addressed using the MapReduce paradigm, and the most common form of that paradigm is the open source Apache Hadoop set of technologies.
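The MapReduce paradigm can be sketched in a few lines. The following single-process Python example is not Hadoop and does not use any Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` and the sample log lines are illustrative assumptions. It shows the three steps that a Hadoop job distributes across a cluster: mapping records to key/value pairs, grouping the pairs by key, and reducing each group to a result.

```python
from collections import defaultdict


def map_phase(record):
    # Emit one (key, 1) pair per word in the record.
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Collapse all values seen for one key into a single count.
    return key, sum(values)


def run_job(records):
    pairs = [pair for record in records for pair in map_phase(record)]
    groups = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in groups.items())


logs = ["error disk full", "warning disk slow", "error network down"]
counts = run_job(logs)
print(counts["error"], counts["disk"], counts["network"])  # prints: 2 2 1
```

In a real cluster the map and reduce calls run in parallel on different nodes and the shuffle moves data over the network; the structured output (here, a dictionary of counts) is the kind of result that is then handed to further analysis.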

Oracle Big Data Appliance

Oracle Big Data Appliance is an engineered system that provides a preconfigured installation of Cloudera's distribution that includes Apache Hadoop and associated project software. Oracle provides frontline support for this software, with back-end support from Cloudera, and enables the user to choose between standard Hadoop HDFS-based HBase and the Oracle NoSQL Database (developed from Berkeley DB) as the data management engine for query and analysis. (It should be noted that Oracle is among a number of vendors offering faster, more flexible alternatives to HBase for Hadoop users.) Hadoop applications can be integrated into an Oracle environment using the Oracle Big Data Connectors (a package that includes optimized integration into the database), Oracle Loader for Hadoop, Oracle Data Integration Application Adapter for Hadoop, Oracle R Connector for Hadoop, and Oracle Direct Connector for HDFS.

The Hadoop installation is a full Cloudera distribution that includes Cloudera Manager, all fully supported by Oracle, with Cloudera providing level 2 and 3 support. It also includes an open source distribution of R and the Oracle NoSQL Database Community Edition. All are packaged in an appliance format on a machine with 216 CPU cores, 864GB of RAM, and 648TB of raw disk storage, connected internally via a 40Gbps InfiniBand network.

Scalable IMDB Management

It is well understood that in-memory data management yields orders of magnitude better performance than any disk-based alternative. The Big Data dimension of this approach, and the one that really sets up IMDB as the future of database management generally, is the use of clustered servers on a high-speed network with peer-to-peer background replication to deliver nearly limitless scalability with solid recoverability. A number of IMDB technologies have been moving in this direction for a while, though most were nonrelational.

Oracle TimesTen 11g Release 2

The sleeper announcement of the year may be that of Oracle TimesTen 11g Release 2, which includes a scalable cache grid for in-memory relational database management that can scale to a larger size than can be supported in a single server's main memory space while retaining the high-performance characteristics of memory-based data management. Currently, such scaling can be accomplished by deployment within the Oracle Exalogic machine and using its built-in high-speed network that can support up to eight nodes. Further scaling can be achieved by linking multiple Oracle Exalogic machines together with InfiniBand connections. This configuration is normally applied to the use of Oracle TimesTen as a cache for Oracle Database and so is called the TimesTen In-Memory Database Cache Grid. Logically, however, it could be used as a standalone database with a similar configuration, either within Exalogic or on user-configured hardware. Recoverability is assured by transaction replication from the executing server to standby or subscribing servers. Further recoverability with reduced latency is achieved by the writing of parallel logs.

Oracle TimesTen can be optimized for either OLTP or analytic workloads. The analytic workload optimization includes columnar data management. When used as a cache for Oracle Database, TimesTen can be configured for either read/write caching with parallel replication of transactions and parallel write-through to the database or read-only caching with multistream refresh of transactions from the database and parallel replication of the refresh transactions to standby nodes. As was previously mentioned, TimesTen is also the in-memory RDBMS component of the Exalytics In-Memory Machine.
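The read/write caching mode with write-through described above can be illustrated with a minimal single-node sketch. The following Python example is not TimesTen and does not use its API: the `WriteThroughCache` class and the `kv` table are hypothetical, SQLite stands in for the back-end database, and replication to standby nodes and parallel streams are omitted. It shows only the basic contract: reads are served from memory after the first access, and every write reaches the database in the same call.

```python
import sqlite3


class WriteThroughCache:
    """Hypothetical in-memory cache in front of a backing database table."""

    def __init__(self, conn):
        self.conn = conn
        self.memory = {}

    def get(self, key):
        # Serve from memory when possible; load from the database on a miss.
        if key not in self.memory:
            row = self.conn.execute(
                "SELECT value FROM kv WHERE key = ?", (key,)
            ).fetchone()
            if row is None:
                return None
            self.memory[key] = row[0]
        return self.memory[key]

    def put(self, key, value):
        # Write-through: commit to the database, then update the cache.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
            )
        self.memory[key] = value


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO kv VALUES ('a', 'from-db')")

cache = WriteThroughCache(conn)
print(cache.get("a"))  # loaded from the database on first access
cache.put("b", "written")
print(conn.execute("SELECT value FROM kv WHERE key = 'b'").fetchone()[0])
```

A read-only cache would drop `put` and instead refresh `memory` from the database on a schedule; that variant is not shown here.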

Competitive Landscape

Oracle's comprehensive approach, both to the Big Data landscape overall and to how its products fit together, represents a formidable challenge to any vendor hoping to offer end-to-end business-oriented Big Data solutions. There are, however, clear competitors in each of the Big Data areas.

FUTURE OUTLOOK

Big Data is a fast-moving space, and it is reasonable to expect that various combinations of products, old and new, will form to challenge Oracle in each of the Big Data areas described in this Insight. Some will be narrow, deep technologies that perform certain analytic functions very well. Others will be broad based. Oracle's approach, based on both software functionality and Oracle's engineered systems strategy, can nevertheless become well entrenched in user sites as long as Oracle strives to move forward with these technologies.

ESSENTIAL GUIDANCE

Actions to Consider

The Big Data space remains bewildering both for those in the business of making technical solutions and for users of those solutions. Some things to consider going forward are discussed in the sections that follow.

Advice for Buyers

Big Data is a fast-moving space, and approaches that seem "standard" today may not be so tomorrow. Oracle's products offer a variety of approaches to Big Data management and analysis. This variety gives buyers options, but one should regard the purchase of an engineered system or appliance as an investment in the future, not just a short-term solution. Buyers should therefore be circumspect and work out their long-range plans for the proper exploitation of Big Data before making significant commitments.

Advice for Other Vendors

Oracle's Big Data offerings are well packaged and fairly complete. Competing vendors must first decide if they want to concentrate on certain Big Data analytic or management problems, or if they want to compete on a level of breadth similar to that of Oracle. If they choose the latter, they should seek to be as comprehensive, either on their own or through partners, and should look for details of the Oracle products that represent opportunities to win through differentiation.

Copyright Notice

This IDC research document was published as part of an IDC continuous intelligence service, providing written research, analyst interactions, telebriefings, and conferences. Visit www.idc.com to learn more about IDC subscription and consulting services. To view a list of IDC offices worldwide, visit www.idc.com/offices. Please contact the IDC Hotline at 800.343.4952, ext. 7988 (or +1.508.988.7988) or [email protected] for information on applying the price of this document toward the purchase of an IDC service or for information on additional copies or Web rights.

Copyright 2012 IDC. Reproduction is forbidden unless authorized. All rights reserved.