Oracle's Big Data Solutions
Jean-Philippe Breysse, Oracle Suisse
Slide 3
The following is intended to outline our general product
direction. It is intended for information purposes only, and may
not be incorporated into any contract. It is not a commitment to
deliver any material, code, or functionality, and should not be
relied upon in making purchasing decisions. The development,
release, and timing of any features or functionality described for
Oracle's products remain at the sole discretion of Oracle.
Slide 4
Copyright 2012, Oracle and/or its affiliates. All rights reserved.
USE CASE 3: SERVER LOG ANALYSIS
Short description: Daily log analysis
Issues:
  Find correlations in what leads to failures
  Log files are stored as flat files
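As a first step toward the correlation question in this use case, the sketch below counts which events appear shortly before a failure on the same server. It is only an illustration: the log line layout (timestamp, server, level, event), the event names, and the function `events_before_failures` are all assumptions, since the slides do not describe the actual log format.

```python
from collections import Counter

# Hypothetical flat-file log format: "<timestamp> <server> <level> <event>"
SAMPLE_LOG = """\
2012-05-01T10:00:00 srv1 INFO disk_check
2012-05-01T10:00:05 srv1 WARN high_io_wait
2012-05-01T10:00:09 srv1 ERROR failure
2012-05-01T10:01:00 srv2 INFO disk_check
2012-05-01T10:01:30 srv2 INFO backup_start
2012-05-01T10:02:00 srv1 WARN high_io_wait
2012-05-01T10:02:04 srv1 ERROR failure
"""


def events_before_failures(log_text, window=2):
    """Count events seen in the last `window` lines before each failure, per server."""
    recent = {}          # server -> events seen since that server's last failure
    counts = Counter()   # event -> number of times it preceded a failure
    for line in log_text.splitlines():
        _timestamp, server, _level, event = line.split()
        history = recent.setdefault(server, [])
        if event == "failure":
            counts.update(history[-window:])
            history.clear()
        else:
            history.append(event)
    return counts


precursors = events_before_failures(SAMPLE_LOG)
print(precursors.most_common())
```

Counting co-occurrence like this only suggests candidates; on real daily logs the same per-server grouping maps naturally onto a MapReduce job keyed by server name.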
Slide 5
Oracle Technology Mapped to the Analytics Landscape
Stages: Acquire, Organize, Analyze, Decide
Data types: Structured, Semi-structured, Unstructured
Data categories: Master & Reference; Transactions; Machine Generated; Text, Image, Video, Audio
Products shown: Oracle 12g, Files, Oracle NoSQL, Oracle Hadoop HDFS, Oracle Data Integrator, Oracle Hadoop MapReduce, Oracle Essbase, Oracle R Enterprise & Oracle Data Mining, Oracle BI Enterprise, Oracle Real Time Decisions, Oracle Endeca Information Discovery, Oracle TimesTen, Endeca MDEX, Oracle GoldenGate
Slide 6
Agenda
  Big Data Solution Spectrum
  Inside the Big Data Appliance
  Big Data Applications Software
  Big Data Analytics
  Conclusions
Slide 7
Big Data: Why Everyone Should Care
Slide 8
Tapping into Diverse Data Sets
Information architectures today: decisions based on database data
Big Data: decisions based on all your data
Data sets shown: Transactions, Video and Images, Machine-Generated Data, Social Data, Documents
Slide 9
A bit of history...
Hadoop: developed initially by Doug Cutting (Nutch, an open-source web search engine) and Yahoo; inspired by Google's papers on GFS and MapReduce (2003-2004), which resulted in Apache Hadoop (2006).
Amazon Dynamo (2007): distributed systems technologies.
Cassandra: developed at Facebook (2008) to power its Inbox Search feature (a column-oriented distributed database), based initially on Dynamo and Bigtable (built by Google).
Voldemort: a distributed data store designed as a key-value store, used by LinkedIn for high-scalability storage (NoSQL key-value).
Cloudera: contributes to Hadoop and related Apache projects and provides a commercial distribution of Hadoop.
Slide 10
So What Is Big Data Anyway?
It's a matter of perspective. Big Data is both:
LARGE AND VARIABLE DATASETS that are difficult for traditional database tools to manage easily, including datasets that once seemed unimportant or too problematic to deal with. Big Data datasets include:
  Extremely large files of unstructured or semi-structured data
  Large and highly distributed datasets that are otherwise difficult to manage as a single unit of information
A NEW SET OF TECHNOLOGIES that can economically capture, store, manage, and extract value from Big Data datasets, thus facilitating better, more informed business decisions.
Structured Data vs. Unstructured Data
Relational databases work best with structured data: data that has an underlying structure (schema) and a size that fits easily within the confines of database columns and rows. Unstructured data is highly variable, lacks a fixed structure, and is often too large for an RDBMS to handle easily.
Source: IDC Digital Universe Study, "Extracting Value from Chaos", June 2011 (sponsored by EMC)
Slide 11
Drive Value from Big Data: Building a Big Data Platform
Hadoop to Oracle: Bridging the Gap
Stages: Acquire, Organize, Analyze
Components shown: Hadoop MapReduce, HDFS, Cassandra, RDBMS (OLTP), RDBMS (DW), Advanced Analytics, ETL, Oracle Loader for Hadoop
Axis labels: Data Variety, Schema-less, Unstructured, Schema
Slide 14
Oracle Integrated Software Solution
Stages: Acquire, Organize, Analyze
Components shown: Hadoop HDFS, Oracle NoSQL DB, Oracle Data Integrator, Oracle Loader for Hadoop, Oracle (OLTP), Oracle (DW), Oracle Analytics (Data Mining, R, Spatial, Graph, MapReduce), OBI EE
Axis labels: Data Variety, Schema-less, Unstructured, Schema
Slide 15
Inside the Big Data Appliance: Overview
Slide 16
Copyright 2011, Oracle and/or its affiliates. All rights reserved.
Oracle Engineered Solutions
Stages: Acquire, Organize, Analyze
Components shown: Oracle NoSQL DB, HDFS, Hadoop, Oracle Data Integrator, Oracle Loader for Hadoop, Oracle Database (OLTP), Oracle Database (DW), In-DB Analytics (R, Mining, Text, Graph, Spatial), Oracle BI EE
Axis labels: Data Variety, Information Density, Unstructured, Schema
Big Data Appliance: Hadoop, NoSQL Database, Oracle Loader for Hadoop, Oracle Data Integrator
Oracle Exadata: OLTP & DW, Data Mining & Oracle R, Semantics, Spatial
Exalytics: Speed of Thought Analytics
Slide 17
Big Data Appliance Usage Model
Stages: Acquire, Organize, Analyze & Visualize
Components shown: Stream, Oracle Big Data Appliance, Oracle Exadata, Oracle Exalytics, linked by InfiniBand
Slide 18
Why Build a Hadoop Appliance?
  Time to build?
  Required expertise?
  Cost and difficulty of maintaining?
Slide 19
Oracle Big Data Appliance Hardware
18 Sun X4270 M2 servers
  48 GB memory per node = 864 GB memory
  12 Intel cores per node = 216 cores
  24 TB storage per node = 432 TB storage
40 Gb/s InfiniBand
10 Gb/s Ethernet
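The rack totals on this slide are the per-node figures multiplied by 18 nodes, which the short calculation below reproduces. The last step is an assumption added for illustration: it applies the triple replication mentioned elsewhere in the deck to estimate an upper bound on unique data, and it ignores operating system, temporary space, and other overhead.

```python
NODES = 18
MEMORY_GB_PER_NODE = 48
CORES_PER_NODE = 12        # the deck lists 2 six-core CPUs per node
STORAGE_TB_PER_NODE = 24

total_memory_gb = NODES * MEMORY_GB_PER_NODE
total_cores = NODES * CORES_PER_NODE
total_storage_tb = NODES * STORAGE_TB_PER_NODE

# Assumption for illustration: every block is stored three times
# (triple replication), and no other overhead is counted.
REPLICATION_FACTOR = 3
max_unique_data_tb = total_storage_tb / REPLICATION_FACTOR

print(f"memory:  {total_memory_gb} GB")
print(f"cores:   {total_cores}")
print(f"storage: {total_storage_tb} TB raw, at most {max_unique_data_tb:.0f} TB unique data")
```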
Slide 20
Big Data Appliance
Cluster of industry-standard servers for Hadoop and NoSQL Database
Focus on scalability and availability at low cost
Compute and Storage
  18 high-performance, low-cost servers acting as Hadoop nodes
  24 TB capacity per node
  2 6-core CPUs per node
  Hadoop triple replication
  NoSQL Database triple replication
10GigE Network
  8 10GigE ports
  Datacenter connectivity
InfiniBand Network
  Redundant 40 Gb/s switches
  IB connectivity to Exadata
Slide 21
Scale Out to Infinity
  Scale out by connecting racks to each other using InfiniBand
  Expand up to eight racks without additional switches
  Scale beyond eight racks by adding an additional switch
Slide 22
Oracle Big Data Appliance Software
  Oracle Enterprise Linux 5.6
  Oracle HotSpot Java VM
  Cloudera's Distribution including Apache Hadoop
  Cloudera Manager
  Open Source Distribution of R
  Oracle NoSQL Database Community Edition
Slide 23
Why Open-Source Apache Hadoop?
Fast evolution in critical features
  Built by the Hadoop experts in the community
  Practical instead of esoteric
  Focus on what is needed for large clusters
Proven at very large scale
  In production at all the large consumers of Hadoop
  Extremely stable in those environments
  Well understood by practitioners
Slide 24
Software Layout
Node 1: M: Name Node, Balancer & HBase Master; S: HDFS Data Node, NoSQL DB Storage Node
Node 2: M: Secondary Name Node, Management, Zookeeper, MySQL Slave; S: HDFS Data Node, NoSQL DB Storage Node
Node 3: M: JobTracker, MySQL Master, ODI Agent, Hive Server; S: HDFS Data Node, NoSQL DB Storage Node
Nodes 4-18: S: HDFS Data Nodes, Task Tracker, HBase Region Server, NoSQL DB Storage Nodes
Your MapReduce runs here!
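The layout above can be written down as a small lookup table, which makes it easy to ask where a given service runs. This is only a sketch of the slide's content as a data structure: the names `layout` and `nodes_running` are made up for illustration and do not correspond to any appliance configuration file or tool.

```python
# Services that the slide lists on every node.
COMMON = ["HDFS Data Node", "NoSQL DB Storage Node"]

# node number -> services, transcribed from the slide
layout = {
    1: ["Name Node", "Balancer", "HBase Master"] + COMMON,
    2: ["Secondary Name Node", "Management", "Zookeeper", "MySQL Slave"] + COMMON,
    3: ["JobTracker", "MySQL Master", "ODI Agent", "Hive Server"] + COMMON,
}
for node in range(4, 19):  # nodes 4 through 18
    layout[node] = COMMON + ["Task Tracker", "HBase Region Server"]


def nodes_running(service):
    """Return the sorted node numbers on which `service` is listed."""
    return sorted(n for n, services in layout.items() if service in services)


task_tracker_nodes = nodes_running("Task Tracker")
print("HDFS Data Node on", len(nodes_running("HDFS Data Node")), "nodes")
print("Task Tracker on nodes", task_tracker_nodes[0], "to", task_tracker_nodes[-1])
```

Because the Task Trackers sit on nodes 4 to 18, MapReduce tasks run next to the HDFS data on those nodes, while nodes 1 to 3 carry the master services.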
Slide 25
Big Data Application Software: Acquire New Information
Slide 26
Key-Value Store Workloads
Large dynamic schema based data repositories
Data capture
  Web applications (click-through capture)
  Online retail
  Sensor/statistics/network capture (factory automation, for example)
  Backup services for mobile devices
Data services
  Scalable authentication
  Real-time communication (MMS, SMS, routing)
  Personalization
  Social networks
Slide 27
Oracle NoSQL DB
A distributed, scalable key-value database
Simple data model
  Key-value pair with major + sub-key paradigm
  Read/insert/update/delete operations
Scalability
  Dynamic data partitioning and distribution
  Optimized data access via intelligent driver
High availability
  One or more replicas
  Disaster recovery through location of replicas
  Resilient to partition master failures
  No single point of failure
Transparent load balancing
  Reads from master or replicas
  Driver is network topology & latency aware
Elastic (planned for Release 2)
  Online addition/removal of Storage Nodes
  Automatic data redistribution
Diagram labels: Application, NoSQLDB Driver, Storage Nodes (Data Center A), Storage Nodes (Data Center B)
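To make the major + sub-key idea concrete, here is a toy in-memory store. It is not the Oracle NoSQL DB API: the class `ToyKVStore`, its method names, and the MD5-based partition choice are assumptions made purely for illustration. The sketch assumes the partition is picked by hashing only the major key, so all sub-keys under one major key land in the same partition and can be fetched together.

```python
import hashlib


class ToyKVStore:
    """Toy in-memory key-value store; NOT the Oracle NoSQL DB API."""

    def __init__(self, num_partitions=4):
        self.partitions = [dict() for _ in range(num_partitions)]

    def partition_for(self, major):
        # Assumption: only the major key decides the partition.
        digest = hashlib.md5("/".join(major).encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self.partitions)

    def put(self, major, sub, value):
        self.partitions[self.partition_for(major)][(tuple(major), tuple(sub))] = value

    def get(self, major, sub):
        return self.partitions[self.partition_for(major)].get((tuple(major), tuple(sub)))

    def delete(self, major, sub):
        self.partitions[self.partition_for(major)].pop((tuple(major), tuple(sub)), None)

    def multi_get(self, major):
        """Return every sub-key/value stored under one major key."""
        part = self.partitions[self.partition_for(major)]
        return {sub: v for (maj, sub), v in part.items() if maj == tuple(major)}


store = ToyKVStore()
store.put(["user", "42"], ["profile"], {"name": "Ada"})
store.put(["user", "42"], ["inbox", "1"], "hello")
store.put(["user", "7"], ["profile"], {"name": "Bob"})

user_42 = store.multi_get(["user", "42"])
print(sorted(user_42))
```

Grouping related records under one major key is what lets a single partition answer a request such as "everything for user 42" without contacting other partitions.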
Slide 28
Oracle NoSQL DB Differentiation
Commercial-grade software and support
  General-purpose
  Reliable: based on proven Berkeley DB JE HA
  Easy to install and configure
Scalable throughput, bounded latency
Simple programming and operational model
  Simple major + sub-key and value data structure
  ACID transactions
  Configurable consistency & durability
Easy management
  Web-based console, API accessible
  Manages and monitors: topology; load; performance; events; alerts
Completes Oracle large-scale data storage offerings
Slide 29
Big Data Application Software: Organizing Data for Analysis
Slide 30
Oracle Loader for Hadoop Features
Load data into a partitioned or non-partitioned table
  Single-level, composite, or interval partitioned table
  Support for scalar datatypes of Oracle Database
  Load into Oracle Database 11g Release 2
Runs as a Hadoop job and supports standard options
Pre-partitions and sorts data on Hadoop
Online and offline load modes
Slide 31
Oracle Loader for Hadoop
Diagram labels: INPUT 1, INPUT 2, MAP, SHUFFLE/SORT, REDUCE, ORACLE LOADER FOR HADOOP
Slide 32
Oracle Loader for Hadoop: Online Option
Diagram labels: MAP, SHUFFLE/SORT, REDUCE, ORACLE LOADER FOR HADOOP
  Connect to the database from reducer nodes, load into database partitions in parallel
  Read target table metadata from the database
  Perform partitioning, sorting, and data conversion
Slide 33
Oracle Loader for Hadoop: Offline Option
Diagram labels: MAP, SHUFFLE/SORT, REDUCE, ORACLE LOADER FOR HADOOP
  Read target table metadata from the database
  Perform partitioning, sorting, and data conversion
  Write from reducer nodes to Oracle Data Pump files
  Import into the database in parallel using the external table mechanism
Slide 34
Oracle Loader for Hadoop Advantages
Offload database server processing to Hadoop:
  Convert input data to final database format
  Compute table partition for row
  Sort rows by primary key within a table partition
  Generate binary Data Pump files
  Balance partition groups across reducers
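The offloaded steps listed above can be imitated in a few lines of plain Python to show what work moves off the database server. This is a sketch under stated assumptions, not the product's implementation: the CSV input layout, the partition-by-year rule, the function names, and the greedy balancing strategy are all hypothetical, and writing Data Pump files is left out.

```python
from collections import defaultdict


def convert(raw):
    """Convert one hypothetical CSV line "order_id,year,amount" to typed values."""
    order_id, year, amount = raw.split(",")
    return {"order_id": int(order_id), "year": int(year), "amount": float(amount)}


def table_partition(row):
    """Assumed rule: the target table is partitioned by year."""
    return f"P{row['year']}"


def assign_reducers(partition_sizes, num_reducers):
    """Greedy stand-in for balancing: biggest partition goes to the least loaded reducer."""
    loads = [0] * num_reducers
    assignment = {}
    for part, size in sorted(partition_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        target = loads.index(min(loads))
        assignment[part] = target
        loads[target] += size
    return assignment


def preprocess(raw_lines, num_reducers=2):
    by_partition = defaultdict(list)
    for raw in raw_lines:                       # convert and compute partition per row
        row = convert(raw)
        by_partition[table_partition(row)].append(row)
    for rows in by_partition.values():          # sort by primary key within a partition
        rows.sort(key=lambda r: r["order_id"])
    sizes = {part: len(rows) for part, rows in by_partition.items()}
    return dict(by_partition), assign_reducers(sizes, num_reducers)


raw_lines = ["3,2011,9.5", "1,2011,20.0", "2,2012,5.0", "5,2011,1.0", "4,2012,7.25"]
partitions, assignment = preprocess(raw_lines)
print({part: [r["order_id"] for r in rows] for part, rows in partitions.items()})
print(assignment)
```

When rows arrive already converted, grouped by partition, and sorted, the database has far less to do at load time, which is the point of running these steps on Hadoop.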
Slide 35
Selecting an Output Option for a Use Case

Output Option                         Use Case Characteristics
Oracle Loader for Hadoop
  Online load with JDBC               The simplest use case for non-partitioned tables
  Online load with Direct Path        Fast online load for partitioned tables
  Offline load with Data Pump files   Fastest load method for external tables
On Oracle Big Data Appliance
  Direct HDFS                         Leave data on HDFS; parallel access from database; import into database when needed
Slide 36
Automate Usage of Oracle Loader for Hadoop
Oracle Data Integrator (ODI)
ODI has knowledge modules to
  Generate data transformation code to run on Hive/Hadoop
  Invoke Oracle Loader for Hadoop
Use the drag-and-drop interface in ODI to
  Include invocation of Oracle Loader for Hadoop in any ODI packaged flow
Slide 38
Big Data Analytics: Real Time Analytics Platform
Slide 39
R Statistical Programming Language
  Open source language and environment
  Used for statistical computing and graphics
  Strength in easily producing publication-quality plots
  Highly extensible with open source community R packages
Slide 40
Drive Value from Big Data: Conclusions
Slide 41
Big Data Appliance: Big Data for the Enterprise
Optimized and Complete
  Everything you need to store and integrate your lower information density data
Integrated with Oracle Exadata
  Analyze all your data
Easy to Deploy
  Risk-free, quick installation and setup
Single Vendor Support
  Full Oracle support for the entire system and software set
Slide 42
Oracle Integrated Solution Stack for Big Data
ACQUIRE: Oracle NoSQL Database, HDFS, Enterprise Applications
ORGANIZE: Hadoop (MapReduce), Oracle Loader for Hadoop, Oracle Data Integrator
ANALYZE: In-Database Analytics, Data Warehouse
DECIDE: Oracle Analytic Applications
Slide 43
Oracle: Big Data for the Enterprise
The Most Comprehensive Solution
  Includes everything needed to acquire, organize, and analyze all your data
Optimized for Extreme Analytics
  Deepest analytics portfolio with access to all data
Engineered to Work Together
  Eliminate deployment risk and support risk
Enterprise Ready
  Deliver extreme performance and scalability