Filing Information: February 2012, IDC #233348, Volume: 1, Tab: Vendors
Database Management and Data Integration Software: Insight
INSIGHT
Oracle's All-Out Assault on the Big Data Market: Offering Hadoop, R, Cubes, and Scalable IMDB in Familiar Packages
Carl W. Olofson
IDC OPINION
The Big Data space is rapidly evolving. The first wave of adoption involved Web-
based companies such as online retailers, service providers, and social media firms.
These companies adopted open source technologies such as Apache Hadoop and
used considerable in-house technical expertise to build business solutions on top of
these open source foundations. The second wave will involve businesses that both
lack technical teams of the same size and depth as the Web-based companies and
are averse to the risk and cost associated with large investments in original software
development. These businesses will be attracted to finished products from
established companies that offer short paths to business analytic solutions using Big
Data technologies. Oracle is seeking to appeal to such firms with:
Well-packaged sets of preinstalled, integrated, and optimized software on select
hardware in the form of engineered systems and appliances
Products offered in a way that enables users to integrate them into their existing
Oracle Database and Fusion Middleware environment
Technologies that include the Big Data capabilities in highest demand, including
Hadoop, support for the R language, and scalable in-memory database
functionality (IMDB)
IN THIS INSIGHT
This IDC Insight considers a number of key product announcements made by Oracle
in January and February 2012 as well as their role in the company's strategy with
respect to Big Data and their likely impact on the software markets associated with
Big Data technology. The most recent announcement concerns Oracle Advanced
Analytics, an option of Oracle Database 11g. This announcement aligns strategically
with the following three product announcements that establish comprehensive Oracle
coverage of the Big Data space:
Oracle Exalytics In-Memory Machine
Oracle Big Data Appliance
Oracle TimesTen In-Memory Database 11g Release 2
Global Headquarters: 5 Speen Street Framingham, MA 01701 USA P.508.872.8200 F.508.935.4015 www.idc.com
2 #233348 ©2012 IDC
Taken together, these products address three key Big Data areas: advanced and
large-scale analytics, Hadoop-based data classification and extraction, and scalable
in-memory database (IMDB) technology.
SITUATION OVERVIEW
Highlights
On February 8, 2012, Oracle announced general availability of Oracle Advanced
Analytics. This option of Oracle Database 11g Enterprise Edition includes Oracle
Data Mining and a new component called Oracle R Enterprise, which embeds R
analytic capability in the database server. Previously, Oracle announced the Oracle
Exalytics In-Memory Machine and the Oracle Big Data Appliance at Oracle
OpenWorld 2011. In mid-January 2012, the company announced pricing and general
availability for these two products plus a greatly enhanced version of the in-memory
relational database management system (RDBMS), Oracle TimesTen. Taken
together, this database option and these three products address key areas of the Big
Data space and represent a significant move by Oracle to establish itself as a major
Big Data player. IDC identifies three key areas of Big Data as:
Large-scale advanced analytics
Hadoop-driven Big Data processing
Scalable in-memory database management
This combination represents a comprehensive approach to the Big Data problem
space. This Insight considers each area in turn, focusing on how Oracle is addressing
it.
Analysis
Oracle describes its approach to the Big Data space as encompassing four key
stages:
Acquire: Collect, ingest, and format data for analysis
Organize: Put data into an order that supports either deep analysis or integration
into a larger structured data collection, such as a data warehouse
Analyze: Perform either standard query-based/online analytical processing
(OLAP) analysis or deep statistical analysis on the resulting data set
Decide: Yield results that can drive both tactical and strategic business decisions
The Oracle Big Data Appliance takes the user from the Acquire to the Organize
stage, the Oracle Exadata Database Machine (or other Oracle Database 11g
Enterprise Edition installation) with the Oracle Advanced Analytics option takes the
user from the Organize to the Analyze stage, and the Oracle Exalytics In-Memory
Machine takes the user from the Analyze to the Decide stage.
These products (note that Oracle Exalytics In-Memory Machine includes Oracle
TimesTen) fall into the three functional areas described in this Insight as key
elements of the Big Data space.
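The four stages Oracle describes can be pictured as a chain of functions, each handing its output to the next. The following is a minimal sketch for illustration only: the function names, the "customer,amount" record format, and the spending threshold are hypothetical and do not correspond to any Oracle product interface.

```python
def acquire(raw_lines):
    # Acquire: collect raw records, dropping blank lines
    return [line.strip() for line in raw_lines if line.strip()]


def organize(records):
    # Organize: parse each "customer,amount" record into a structured row
    rows = []
    for record in records:
        customer, amount = record.split(",")
        rows.append({"customer": customer, "amount": float(amount)})
    return rows


def analyze(rows):
    # Analyze: total the amounts per customer
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


def decide(totals, threshold):
    # Decide: list the customers whose total meets the threshold
    return sorted(c for c, total in totals.items() if total >= threshold)


raw = ["a,10", "", "b,5", "a,7.5"]
flagged = decide(analyze(organize(acquire(raw))), threshold=15)
```

In this toy run, customer "a" totals 17.5 and is the only one flagged.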
Large-Scale Advanced Analytics
This functional area includes the ability to accumulate large amounts of data in a
scalable space for high-performance deep analysis.
Oracle is addressing this area with two product offerings:
Oracle Advanced Analytics is an Oracle Database 11g Enterprise Edition
option that includes Oracle Data Mining and Oracle R Enterprise for those who
wish to perform deep data mining and analytics driven by the R language, with
those analytics executing in the database engine.
Oracle Exalytics In-Memory Machine is for those seeking an engineered
system that is preconfigured to support classic online analytical processing using
in-memory cubes powered by Oracle Essbase, or relational data held in memory
by Oracle TimesTen for fast execution. (Note that Exalytics can support large
data sets that extend beyond the main memory capacity of the system by
sending SQL queries to a back-end database such as Oracle running on
Exadata.)
Oracle Advanced Analytics
This option of Oracle Database 11g Enterprise Edition has two components: Oracle
Data Mining and Oracle R Enterprise. The former is an upgraded version of the data
mining option that Oracle has offered for a number of years. The latter is a capability
embedded in the database engine that allows the user to build R analytics that
execute in the database close to the data for better performance. The system allows
R users to access table data within the database using the familiar variables and
other constructs of the R language. Data retrieval, statistical and predictive analysis
operations, and advanced numerical computations expressed in R are converted into
SQL and executed under the covers, so the R programmer does not need to have
expertise in relational database technology or the structure of the database in
question. The role of this option is to allow "quants" who prefer to use R as their
means of doing deep analytics to use that language in a high-performance way
directly against the database data rather than as an external facility that requires
considerable configuration to set up.
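The general idea of translating language-level operations into SQL that runs inside the database can be illustrated with a small sketch. This is not Oracle R Enterprise and does not use R: it is a Python analogy built on the standard sqlite3 module, and the TableProxy class, its methods, and the sales table are all hypothetical names invented for the illustration.

```python
import sqlite3


class TableProxy:
    """Hypothetical wrapper that exposes a database table as an ordinary object."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table

    def mean(self, column):
        # The call is translated into SQL, so the aggregation runs in the
        # database engine rather than on rows pulled into the client.
        # Identifiers are assumed to be trusted in this sketch.
        sql = f'SELECT AVG("{column}") FROM "{self.table}"'
        return self.conn.execute(sql).fetchone()[0]

    def mean_by(self, column, group_column):
        # A grouped aggregate is likewise pushed down as a GROUP BY query
        sql = (
            f'SELECT "{group_column}", AVG("{column}") '
            f'FROM "{self.table}" GROUP BY "{group_column}"'
        )
        return dict(self.conn.execute(sql).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("west", 50.0), ("east", 30.0)],
)

sales = TableProxy(conn, "sales")
overall_mean = sales.mean("amount")
regional_means = sales.mean_by("amount", "region")
```

The caller writes no SQL and never sees the table layout beyond column names, which is the property the paragraph above attributes to the R integration.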
It should be noted that Oracle Advanced Analytics is a database option and so can be
used with any installation of Oracle Database 11g Enterprise Edition. This also means
that it can be used within the Oracle Exadata Database Machine. When Oracle
Advanced Analytics is used with the Oracle Real Application Clusters (RAC) option of
Oracle Database, or within the Oracle Exadata Database Machine (which includes
RAC), the user also takes advantage of the scalability of parallel SQL execution,
which IDC also considers a key Big Data characteristic for relational databases.
Exalytics In-Memory Machine
This product is used to perform deep analysis of large amounts of business
intelligence (BI) data quickly. It combines Oracle Business Intelligence Enterprise
Edition (OBIEE) with enhanced visualization capabilities and performance
optimizations, an optimized version of Oracle TimesTen In-Memory Database with
analytic extensions, and an optimized version of Oracle Essbase for analyzing OLAP
cubes in memory. It is delivered as an engineered system, with the hardware
configured specifically for the Oracle TimesTen In-Memory Database for Exalytics
and Oracle Business Intelligence Foundation software, which includes Oracle
Business Intelligence Enterprise Edition and Oracle Essbase.
The idea, as with all Oracle's engineered systems, is to deliver a product that can be
set up and used with a minimum of effort, involving virtually no installation and only
the tuning and configuration necessary for the specific analysis required by the user.
Other products that feature IMDB functionality with analytics require considerable
installation and configuration before use.
Hadoop-Driven Big Data Processing
This is the most mature of the new technology areas in the Big Data space. It involves
the ability to accept either complex, heterogeneous (or unstructured) data or high-
volume streams of machine-generated data; analyze the data for elements of value or
for meaningful patterns; and provide analytical results or structured output, or both,
generally leading to further analysis. This capability is generally addressed using the
MapReduce paradigm, and the most common form of that paradigm is the open
source Apache Hadoop set of technologies.
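For readers unfamiliar with the MapReduce paradigm, the following is a minimal single-process sketch in Python. It does not use Hadoop or any Hadoop API; the function names and the sample log lines are illustrative assumptions. It shows only the three logical steps: map each record to key/value pairs, group the pairs by key, and reduce each group to a result.

```python
from collections import defaultdict


def map_phase(record):
    # Map: emit a (word, 1) pair for every word in the record
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    # Shuffle: group all emitted values by their key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: collapse the values for one key into a single count
    return key, sum(values)


def map_reduce(records):
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


log_lines = ["error disk full", "warning disk slow", "error network down"]
counts = map_reduce(log_lines)
```

In a real Hadoop deployment the map and reduce steps run in parallel across many machines and the shuffle moves data between them; here everything runs in one process to keep the logic visible.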
Oracle Big Data Appliance
Oracle Big Data Appliance is an engineered system that provides a preconfigured
installation of Cloudera's distribution that includes Apache Hadoop and associated
project software. Oracle provides frontline support for this software, with back-end
support from Cloudera, and enables the user to choose between standard Hadoop
HDFS-based HBase and the Oracle NoSQL Database (developed from Berkeley DB)
as the data management engine for query and analysis. (It should be noted that
Oracle is among a number of vendors offering faster, more flexible alternatives to
HBase for Hadoop users.) Hadoop applications can be integrated into an Oracle
environment using the Oracle Big Data Connectors, a package that provides
optimized integration into the database and includes Oracle Loader for Hadoop,
Oracle Data Integrator Application Adapter for Hadoop, Oracle R Connector for
Hadoop, and Oracle Direct Connector for HDFS.
The Hadoop installation is a full Cloudera distribution that includes Cloudera
Manager, all fully supported by Oracle, with Cloudera providing level 2 and 3 support.
It also includes an open source distribution of R and the Oracle NoSQL Database
Community Edition. All are packaged in an appliance format on a machine with 216
CPU cores, 864GB of RAM, and 648TB of raw disk storage, connected internally
via a 40Gbps InfiniBand network.
Scalable IMDB Management
It is well understood that in-memory data management yields orders of magnitude
better performance than any disk-based alternative. The Big Data dimension of this
approach, and the one that really sets up IMDB as the future of database
management generally, is the use of clustered servers on a high-speed network with
peer-to-peer background replication to deliver nearly limitless scalability with solid
recoverability. A number of IMDB technologies have been moving in this direction for
a while, though most were nonrelational.
Oracle TimesTen 11g Release 2
The sleeper announcement of the year may be that of Oracle TimesTen 11g Release
2, which includes a scalable cache grid for in-memory relational database
management that can scale to a larger size than can be supported in a single server's
main memory space while retaining the high-performance characteristics of memory-
based data management. Currently, such scaling can be accomplished by
deployment within the Oracle Exalogic machine and using its built-in high-speed
network that can support up to eight nodes. Further scaling can be achieved by
linking multiple Oracle Exalogic machines together with InfiniBand connections. This
configuration is normally applied to the use of Oracle TimesTen as a cache for Oracle
Database and so is called the TimesTen In-Memory Database Cache Grid. Logically,
however, it could be used as a standalone database with a similar configuration,
either within Exalogic or on user-configured hardware. Recoverability is assured by
transaction replication from the executing server to standby or subscribing servers.
Further recoverability with reduced latency is achieved by the writing of parallel logs.
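The two ideas in this paragraph, spreading data across the memory of several servers and replicating each write to a standby, can be shown with a toy model. The sketch below is a generic illustration and makes no claim about how TimesTen implements its cache grid: the Node and Grid classes, the hash-based placement, and the choice of the next node as standby are all assumptions made for the example.

```python
import zlib


class Node:
    """One server's in-memory store (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Grid:
    """Toy grid: each key lives on an owning node and is copied to one standby."""

    def __init__(self, node_names):
        self.nodes = [Node(name) for name in node_names]

    def _owner_index(self, key):
        # Deterministic placement: hash the key onto one of the nodes
        return zlib.crc32(key.encode("utf-8")) % len(self.nodes)

    def _replicas(self, key):
        i = self._owner_index(key)
        return self.nodes[i], self.nodes[(i + 1) % len(self.nodes)]

    def put(self, key, value):
        active, standby = self._replicas(key)
        active.data[key] = value
        standby.data[key] = value  # replicate the write to the standby

    def get(self, key, failed=()):
        # Read from the owner, falling back to the standby if the owner is down
        for node in self._replicas(key):
            if node.name not in failed and key in node.data:
                return node.data[key]
        raise KeyError(key)


grid = Grid(["n1", "n2", "n3"])
grid.put("order:1", 100)
owner = grid.nodes[grid._owner_index("order:1")].name
value_after_failure = grid.get("order:1", failed={owner})
```

Total capacity grows with the number of nodes because each node holds only part of the data, and a read still succeeds after the owning node is marked as failed because the standby holds a copy.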
Oracle TimesTen can be optimized for either OLTP or analytic workloads. The
analytic workload optimization includes columnar data management. When used as a
cache for Oracle Database, TimesTen can be configured for either read/write caching
with parallel replication of transactions and parallel write-through to the database or
read-only caching with multistream refresh of transactions from the database and
parallel replication of the refresh transactions to standby nodes. As was previously
mentioned, TimesTen is also the in-memory RDBMS component of the Exalytics In-
Memory Machine.
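The difference between the two caching modes can be sketched in a few lines of Python. This is a simplified single-process illustration that uses the standard sqlite3 module as the back-end database. The class names and the kv table are hypothetical, and the sketch omits the parallel replication and multistream refresh described above.

```python
import sqlite3


class WriteThroughCache:
    """Read/write cache: every write goes to memory and through to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}

    def write(self, key, value):
        self.cache[key] = value
        self.conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
        )
        self.conn.commit()

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        row = self.conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        self.cache[key] = row[0]  # load on first miss
        return row[0]


class ReadOnlyCache:
    """Read-only cache: contents change only when refreshed from the database."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}

    def refresh(self):
        self.cache = dict(self.conn.execute("SELECT k, v FROM kv").fetchall())

    def read(self, key):
        return self.cache[key]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")

rw = WriteThroughCache(conn)
rw.write("status", "open")

ro = ReadOnlyCache(conn)
ro.refresh()
before = ro.read("status")

rw.write("status", "closed")  # reaches the database immediately
stale = ro.read("status")     # the read-only cache has not refreshed yet
ro.refresh()
after = ro.read("status")
```

The read-only cache returns the old value until its next refresh, which is the trade-off between the two modes: write-through keeps the database current on every transaction, while read-only caching depends on how often changes are refreshed from the database.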
Competitive Landscape
Oracle's comprehensive approach, both to the Big Data landscape overall and to how
its products fit together, represents a formidable challenge to any vendor hoping
to offer end-to-end business-oriented Big Data solutions. There are, however, clear
competitors in each of the Big Data areas.
FUTURE OUTLOOK
Big Data is a fast-moving space, and it is reasonable to expect that various
combinations of products, old and new, will form to challenge Oracle in each of the
Big Data areas described in this Insight. Some will be narrow, deep technologies that
perform certain analytic functions very well. Others will be broad based. Oracle's
approach, based on both software functionality and Oracle's engineered systems
strategy, can become well entrenched in user sites, however, as long as Oracle
strives to move forward with these technologies.
ESSENTIAL GUIDANCE
Actions to Consider
The Big Data space remains bewildering both for those in the business of making
technical solutions and for users of those solutions. Some things to consider going
forward are discussed in the sections that follow.
Advice for Buyers
Big Data is a fast-moving space, and approaches that seem "standard" may not be so
tomorrow. Oracle's products offer a variety of approaches to Big Data management
and analysis. This offers options, but one should regard the purchase of an
engineered system or appliance as an investment in the future, not just a short-term
solution. So, buyers should be circumspect and work out their long-range plans for
the proper exploitation of Big Data for the foreseeable future before making significant
commitments.
Advice for Other Vendors
Oracle's Big Data offerings are well packaged and fairly complete. Competing
vendors must first decide if they want to concentrate on certain Big Data analytic or
management problems, or if they want to compete on a level of breadth similar to that
of Oracle. If they choose the latter, they should seek to be as comprehensive, either
on their own or through partners, and should look for details of the Oracle products
that represent opportunities to win through differentiation.
Copyright Notice
This IDC research document was published as part of an IDC continuous intelligence
service, providing written research, analyst interactions, telebriefings, and
conferences. Visit www.idc.com to learn more about IDC subscription and consulting
services. To view a list of IDC offices worldwide, visit www.idc.com/offices. Please
contact the IDC Hotline at 800.343.4952, ext. 7988 (or +1.508.988.7988) or
[email protected] for information on applying the price of this document toward the
purchase of an IDC service or for information on additional copies or Web rights.
Copyright 2012 IDC. Reproduction is forbidden unless authorized. All rights reserved.