1 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle NoSQL Database Overview
David Segleau
Director Product Management
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Agenda
• NoSQL Overview
• Oracle NoSQL Database
– Architecture
– Technical Overview
– Benchmark Results
– Use Cases
NoSQL – A Brief History
• Early 2000s, Web 2.0 companies started looking for “RDBMS alternatives”
• 2003: memcached (cached k-v store to reduce load on RDBMS)
• 2004: Google published MapReduce distributed processing paper
• 2006: Google published BigTable distributed database paper
• 2007: Amazon published Dynamo paper
• 2008+: Several open source projects are launched to productize NoSQL solutions
• 2009+: Local meetings to discuss and share RDBMS alternatives
• 2010+: Enterprises start to investigate NoSQL solutions
RDBMS vs NoSQL
• RDBMS
– High value, high density, complex data
– Complex data relationships
– Schema-centric
– Designed to scale up & out
– Lots of general purpose features/functionality
→ High overhead ($ per operation)
• NoSQL architectures
– Low value, low density, simple data
– Very simple relationships
– Schema-free, unstructured or semi-structured data
– Distributed storage and processing
– Stripped down, special purpose data store
→ Lower overhead ($ per operation)
What is NoSQL?
• Not-only-SQL (2009)
• Broad class of non-relational DBMS systems that typically
– Provide horizontal/distributed scalability
– Avoid joins
– Have relaxed consistency guarantees
– Don’t require a structured schema
– Are application/developer-centric
• No standards
– Rapidly evolving set of solutions (100+ on nosql-database.org)
– Highly variable feature set
– UnQL launched in July 2011, still a thought experiment
• Majority are open source
What problems does NoSQL try to address?
• Cost
– TBs to PBs of low/unknown value, simple/unstructured data
– Lower $ per operation (hardware and RDBMS license fees)
• Scalability
– Scale out, don’t scale up
• Flexible schema
– Diverse, changing data sets
• Performance
– High rate of data capture
– High volume of simple queries
– Eliminate ORM overhead
• Availability
– Low cost, highly available, distributed data store
– Move CAP more towards AP rather than CA
Agenda
• NoSQL Overview
• Oracle NoSQL Database
– Architecture
– Technical Overview
– Benchmark Results
– Use Cases
The NoSQL Challenge
Where to Start
• New, rapidly emerging database technology
• Simple data storage, typically non-SQL or Not-only-SQL
• Distributed (Cloud) storage
• Large amounts of data (Terabyte – Petabyte range)
• Solution categories
– Storage for “Web Service” applications (our focus is here)
– ETL Processing (MR & Hadoop) (… and we integrate here)
• Common data models
– Key-Value (our focus is here)
– Document, Columnar, Graph
Oracle NoSQL Database Target Use Cases
SIMPLE QUERIES
DYNAMIC SCHEMA
High-throughput event processing
Customer profile management
Click-through data processing
Sensor & statistics data capture
HIGH VOLUME
DATA INTERACTION
Social networks
Personalization
Mobile application backend infrastructure
Authentication & Content management
Archiving
Customer-Driven Requirements
• Terabytes to petabytes of unstructured or semi-structured data
• No single point of failure
• Cost effective, distributed storage. Scalable on commodity hardware
• Fast, predictable response time to simple queries
• Fast, reliable transactions
• Simple administration, enterprise support
• Commercial-grade NoSQL solution
– Real 24x7 support
– Real database expertise
– Large vendor & dedicated resources building & testing the software
Oracle NoSQL Database A Distributed, Scalable Key-Value Database
Simple Data Model
Small, distributed footprint
Highly scalable, available
Application
NoSQL Database Driver
Application
NoSQL Database Driver
Transparent load balancing
Integrates with Oracle Stack
Storage Nodes (Datacenter B)
Storage Nodes (Datacenter A)
Architecture Summary
Scalable, Highly Available, Optimized
• Scalability
– Dynamic data partitioning and distribution
– Optimized data access via intelligent driver
• High availability
– One or more replicas
– Resilient to partition master failures
– No single point of failure
– Disaster recovery through location of replicas
• Transparent load balancing
– Reads from master or replicas
– Driver is network topology & latency aware
Simple Data Model
Key-value pairs
• Simple data model – key-value pair (major+minor-key paradigm)
• Simple operations – read/insert/update/delete, read-modify-write (RMW) support
• Scope of transaction – records within a major key, single API call
• Unordered scan of all data (non-transactional)
Major key (Strings): userid
Minor key (Strings): address, subscriptions
Value (Byte Array): email id, phone #, expiration date
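To make the major+minor key paradigm concrete, here is a minimal Java sketch. It is an illustration only: `RecordKey`, its fields, and the `/major/-/minor` display format are assumptions made for this example and are not the product's own key class.

```java
import java.util.List;

// Illustrative model of a major+minor key; not the oracle.kv.Key class.
final class RecordKey {
    final List<String> major; // e.g. ["user42"]; identifies the group of related records
    final List<String> minor; // e.g. ["address"]; distinguishes records under one major key

    RecordKey(List<String> major, List<String> minor) {
        if (major.isEmpty()) {
            throw new IllegalArgumentException("a major key component is required");
        }
        this.major = List.copyOf(major);
        this.minor = List.copyOf(minor);
    }

    // Two records can share a transaction only if their major keys match.
    boolean sameMajorKey(RecordKey other) {
        return major.equals(other.major);
    }

    // Renders the key as /major/-/minor (a display format chosen for this sketch).
    @Override
    public String toString() {
        String path = "/" + String.join("/", major);
        return minor.isEmpty() ? path : path + "/-/" + String.join("/", minor);
    }

    public static void main(String[] args) {
        RecordKey address = new RecordKey(List.of("user42"), List.of("address"));
        RecordKey subs = new RecordKey(List.of("user42"), List.of("subscriptions"));
        System.out.println(address + " and " + subs
                + " share a major key: " + address.sameMajorKey(subs));
    }
}
```

In this model the value stored under each key would simply be an opaque byte array that the application serializes and deserializes itself.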
Simple Data Model
ACID Transactions
• ACID transactions by default
• Transaction Scope
– Single API call
– All records must have the same major key
– Support for multiple operations within a transaction
• Can be relaxed for increased performance on a per-operation basis
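The transaction scope above (one call, several operations, one major key) can be sketched with a small in-memory model. This is a hedged illustration under stated assumptions: `MajorKeyStore`, `Op`, and the use of String values are hypothetical names and simplifications for this example, not the product's implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of "all operations in one call share a major key".
final class MajorKeyStore {
    // A put (value != null) or a delete (value == null) of one minor key.
    static final class Op {
        final String majorKey;
        final String minorKey;
        final String value;

        Op(String majorKey, String minorKey, String value) {
            this.majorKey = majorKey;
            this.minorKey = minorKey;
            this.value = value;
        }
    }

    private final Map<String, TreeMap<String, String>> data = new HashMap<>();

    // Applies every operation or none; rejects batches that span major keys.
    synchronized void execute(List<Op> ops) {
        if (ops.isEmpty()) {
            return;
        }
        String major = ops.get(0).majorKey;
        for (Op op : ops) {
            if (!op.majorKey.equals(major)) {
                throw new IllegalArgumentException("all operations must share one major key");
            }
        }
        // Work on a copy so a failure part-way through leaves the store untouched.
        TreeMap<String, String> working = new TreeMap<>();
        TreeMap<String, String> current = data.get(major);
        if (current != null) {
            working.putAll(current);
        }
        for (Op op : ops) {
            if (op.value == null) {
                working.remove(op.minorKey);
            } else {
                working.put(op.minorKey, op.value);
            }
        }
        data.put(major, working); // publish the whole batch at once
    }

    synchronized String get(String majorKey, String minorKey) {
        TreeMap<String, String> records = data.get(majorKey);
        return records == null ? null : records.get(minorKey);
    }
}
```

Validating the whole batch before touching any record is what keeps the sketch all-or-nothing; a batch that mixes two major keys is rejected without side effects.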
Simple Data Model
ACID Transactions – Configurability
• Configurable Durability Policy
• Configurable Consistency Policy
Scalability and Availability
• Replicated Application Servers
• Driver is linked into each Application
• Storage Nodes kept current via replication (Berkeley DB Java Edition HA)
• Storage Nodes across Data Centers
• Automatic Storage Node (SN) failure handling
– Graceful degradation
– Automatic recovery
→ No Single Point of Failure
High Availability
• Automatic log-based replication
• Storage Node Failure
– Node failures automatically detected, system continues to function
– Rejoining nodes automatically synchronize with the master
– Isolated nodes can still service reads
• Master Failover
– Automatic election of new master, distributed 2-phase election algorithm (Paxos)
– Master election based on highest LSN (log sequence number)
• Multi-node or Shard (replication group) failure
– System continues to function using remaining replication groups
• System automatically maintains group membership and status
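The "highest LSN wins" rule for master failover can be illustrated with a short sketch. It is deliberately simplified and is not the distributed two-phase protocol itself: the `Node` type and the majority-quorum check are assumptions made for this example.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Simplified selection rule only; the real election is a distributed protocol.
final class MasterElection {
    static final class Node {
        final String name;
        final long lsn;          // log sequence number: how far this node's log has advanced
        final boolean reachable; // whether the node can take part in the election

        Node(String name, long lsn, boolean reachable) {
            this.name = name;
            this.lsn = lsn;
            this.reachable = reachable;
        }
    }

    // Returns the reachable node with the highest LSN, or empty when fewer than
    // a majority of the group is reachable (the quorum rule is an assumption here).
    static Optional<Node> elect(List<Node> group) {
        long reachable = group.stream().filter(n -> n.reachable).count();
        if (reachable * 2 <= group.size()) {
            return Optional.empty();
        }
        return group.stream()
                .filter(n -> n.reachable)
                .max(Comparator.comparingLong((Node n) -> n.lsn));
    }
}
```

Choosing the most up-to-date replica means the new master is the node that has to discard or re-fetch the least data.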
Transparent Load Balancing
Application → NoSQL DB Driver: Operation + Key[M,m] + Value + Transaction Policy
• Hash Major Key to determine Partition ID
• Partition Map maps Partition ID to a shard
• State Table maps a shard to Storage Node(s)
• Load Balancer selects best eligible Storage Node
• Contact Storage Node directly
Storage Node → NoSQL DB Driver:
• Operation result
• Partition Map Changes
• Storage Node stats
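The routing steps above can be sketched as a small Java class. This is a minimal illustration under stated assumptions: `RequestRouter`, the hash function, and the use of observed latency as the "best node" criterion are all hypothetical choices for this example, not the driver's actual algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Sketch of driver-side routing: major key -> partition -> shard -> storage node.
final class RequestRouter {
    private final int[] partitionToShard;                  // partition map
    private final Map<Integer, List<String>> shardToNodes; // state table
    private final Map<String, Long> latencyMicros;         // per-node stats

    RequestRouter(int[] partitionToShard,
                  Map<Integer, List<String>> shardToNodes,
                  Map<String, Long> latencyMicros) {
        this.partitionToShard = partitionToShard.clone();
        this.shardToNodes = shardToNodes;
        this.latencyMicros = latencyMicros;
    }

    // Hashes the major key into a partition ID (the hash itself is an assumption).
    static int partitionId(String majorKey, int partitionCount) {
        int hash = Arrays.hashCode(majorKey.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, partitionCount);
    }

    // Picks the storage node to contact directly for this major key.
    String route(String majorKey) {
        int partition = partitionId(majorKey, partitionToShard.length);
        int shard = partitionToShard[partition];
        List<String> nodes = shardToNodes.get(shard);
        if (nodes == null || nodes.isEmpty()) {
            throw new IllegalStateException("no storage nodes known for shard " + shard);
        }
        // "Best eligible" is modeled as lowest observed latency; unknown nodes sort last.
        return nodes.stream()
                .min(Comparator.comparingLong(
                        (String n) -> latencyMicros.getOrDefault(n, Long.MAX_VALUE)))
                .orElseThrow(() -> new IllegalStateException("no eligible node"));
    }
}
```

Because the number of partitions is fixed while the partition-to-shard mapping can change, data can be moved between shards without rehashing keys; the driver only needs the updated partition map returned with operation results.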
Simple Administration
• Web-based console and CLI commands
• Manages and Monitors
– Configuration changes
– Load: Number of operations, data size
– Performance: Latency, throughput. Min, max, average, trailing, …
– Events: Failover, recovery, load distribution
– Alerts: Failure, poor performance, …
Oracle NoSQL Database Differentiation
• Commercial Grade Software and Support
– General Purpose
– Reliable: based on proven Berkeley DB JE HA
– Easy to Install & Configure
• Simple Data Model
– Simple Major + Minor key and Value data structure
– ACID transactions
– Configurable consistency and durability
• Simple Administration
– Web-based Console and CLI commands
– Manages and Monitors: Topology, Load, Performance, Events, Alerts
• Scalability and Availability
– Intelligent Oracle NoSQL DB Driver: evenly distributes data, sends operation to fastest node, bounded network hops for all operations
– Automatic replication and failover
• Integrates seamlessly with Oracle Stack (ODI, CEP, OLH)
Benchmarking
• 1.6 billion records
• 94K insert/sec
• 25K read/update/sec
• Low latency
• Linear scalability
Oracle NoSQL Database Use Cases
Success Stories
• provides PaaS for deploying applications over the cloud.
– Oracle NoSQL Database exposed as a service through their cloud infrastructure.
• , Oracle Platinum Partner, built an online gaming application for their customer using Oracle NoSQL Database.
Oracle Confidential
Oracle NoSQL DB Use Cases
Cloud e-mailing Service
• Problem: Manage e-mail accounts for tens of millions of customers and hundreds of terabytes of data.
• Requirements:
– Fast, scalable, flexible data management solution
– Highly available, easy to manage & monitor
• Solution: NoSQL DB
Oracle NoSQL DB Use Cases
Cloud Architecture Services
• Problem: Cloud-based infrastructure requires support services like Authentication, Authorization, Event Tracking
• Requirements:
– Real time performance and high throughput
– Simple data structures
• Solution: NoSQL DB
Oracle NoSQL DB Use Cases
Customer data aggregation, trend analysis
• Problem: Need to preserve OCEP event history. Aggregated customer experience data can be used to identify trends, offer promotions, provide better insight and customer service.
• Requirements:
– Rich, flexible customer profile
– Aggregate and store discrete OCEP event data over time
• Solution: OCEP + NoSQL DB
Oracle NoSQL Database
• Easy to use, easy to manage
• Scalable, Available, Predictable Latency
• A NoSQL Database from a vendor you trust
Oracle NoSQL DB Resources
Support
• Support via OTN forums and Oracle Support process
• OTN Forum:
– Forum Home » Big Data » NoSQL Database
– forums.oracle.com/forums/forum.jspa?forumID=1388
• Oracle.com:
– www.oracle.com/us/products/database/nosql/overview/index.html
• OTN (including documentation and download):
– www.oracle.com/technetwork/products/nosqldb/overview/index.html
Oracle NoSQL DB Resources
Documentation
• On OTN and in download
– docs.oracle.com/cd/NOSQL/html/index.html
• Getting Started Guides
• Programmatic API
• Installation & Release Notes
• FAQ
Oracle Big Data DB Resources
External
• Big Data on O.com: http://www.oracle.com/us/technologies/big-data/index.html
• Big Data on OTN: http://www.oracle.com/technetwork/topics/bigdata/learnmore/index.html
– Start here: “Big Data Essentials” webinar series
Q&A
Questions
APPENDIX
What is Big Data?
VOLUME, VELOCITY, VARIETY, VALUE
Example sources: geodata, blogs, smart meters
Why is Big Data important?
• US health care: increase industry value per year by $300 B
• US retail: increase net margin by 60+%
• Manufacturing: decrease dev., assembly costs by 50%
• Global personal location data: increase service provider revenue by $100 B
• Europe public sector admin: increase industry value per year by €250 B
“In a big data world, a competitor that fails to sufficiently develop its capabilities will be left behind.”
Source: McKinsey Global Institute: Big Data – The next frontier for innovation, competition and productivity (May 2011)
Make Better Decisions Using Big Data
Big Data Lifecycle
ACQUIRE → ORGANIZE → ANALYZE → DECIDE
Big Data Appliance, Exadata, Exalytics
Oracle Big Data Software Platform
Stages: ACQUIRE → ORGANIZE → ANALYZE → DECIDE
Components shown: Oracle NoSQL Database, Hadoop, Open Source R, Oracle Big Data Connectors, Oracle Data Integrator, Oracle Database, Data Warehouse, In-Database Analytics, Oracle Advanced Analytics, Analytic Applications, Web Services, BI Abstraction, Applications, InfiniBand
Outputs: Alerts, Dashboards, MD-Analysis, Reports, Query
Oracle Engineered Systems for Big Data
Big Data Appliance, Exadata, Exalytics
ACQUIRE → ORGANIZE → ANALYZE → DECIDE
Big Data Use Cases
• Healthcare
– Today’s challenge: Expensive office visits
– New data: Remote patient monitoring
– What’s possible: Preventive care, reduced hospitalization
• Manufacturing
– Today’s challenge: In-person support
– New data: Product sensors
– What’s possible: Automated diagnosis, support
• Location-Based Services
– Today’s challenge: Based on home zip code
– New data: Real time location data
– What’s possible: Geo-advertising, traffic, local search
• Utilities
– Today’s challenge: Complex distribution grid
– New data: Detailed consumption statistics
– What’s possible: Increased availability, reduced cost, tiered metering plans
• Retail
– Today’s challenge: One size fits all marketing
– New data: Social media
– What’s possible: Sentiment analysis, segmentation
Big Data Characteristics
• Batch-Oriented
– Process data to use
– Bulk storage
– Write once, read all
• Real-Time
– Deliver a service
– Fast access to specific record
– Read, write, delete, update
Big Data Storage Choices
• Hadoop Distributed File System (HDFS)
– File System
– Parallel scanning
– No inherent structure
– High volume writes
– Batch Oriented
• Oracle NoSQL Database
– Database
– Indexed storage
– Simple data structure
– High volume random reads and writes
– Real-Time
Early Adopter Dilemma
• Time to Build?
• Expertise?
• Cost and Difficulty Maintaining?
• Product Support?
Oracle Big Data Appliance Hardware
Full Rack Configuration Only
• 18 Sun X4270 M2 Servers per BDA
– 864 GB memory
– 216 cores
– 648 TB storage
• 40 Gb/s InfiniBand Fabric
– Inter-rack Connectivity
– Inter-node Connectivity
• 10 Gb/s Ethernet Connectivity
– Data center connectivity
Oracle NoSQL DB Licensing
Community VS Enterprise Edition
• Two versions
– Oracle NoSQL Database Community Edition. Open Source. AGPL license.
– Oracle NoSQL Database Enterprise Edition. Closed Source. Standard Oracle License.
• Community Edition has all of the basic functionality and APIs. Gets you started. Competes with other open source NoSQL solutions.
• Enterprise Edition for large, production, multi-data center, Oracle integration centric customers and/or non-GPL compliant customers.
Benchmarking Configurations
• YCSB-based benchmark (Yahoo! Cloud Serving Benchmark)
– Key ~13 bytes, data ~1.1 KB
• Configurations of 3 (1 x 3) to 192 (64 x 3) storage nodes
– Replication factor of 3 (master + 2 replicas)
– 100m to 2.1b records, 100m-400m records per storage node
– Intel Systems: 2.93 GHz Intel Westmere (wds024c) model X5670, dual socket with 6 cores/socket, 24 GB of memory, single 300 GB local disk and RedHat 2.6.18-164.11.1.el5.crt1
– Cisco Systems: UCS C200 M2 & UCS C210 M2 systems (Intel 5600s), dual socket with 6 cores/socket, 18 GB of memory, 4, 8 or 16 disks for total of 8-16 TB.
Benchmarking Configurations
Systems configured to minimize I/O overhead
• Btree fits in memory → one I/O per record read
• Writes are buffered + log structured storage system → fast write throughput
• GC and File System tuning to optimize throughput
Oracle NoSQL DB API
CRUD Operations
[...] indicates optional args
put(Key, Value, [Durability, timeout])
putIfAbsent(K, V, [Durability, timeout])
get(Key, [Consistency, timeout])
putIfPresent(K, V, [Durability, timeout])
putIfVersion(K, V, Version, [Durability, timeout])
delete(Key, [Durability, timeout])
deleteIfVersion(Key, Version, [Durability, timeout])
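The conditional variants (putIfAbsent, putIfPresent, putIfVersion, deleteIfVersion) are what make read-modify-write safe without locks. The sketch below models their semantics in memory. It is an assumption-laden illustration: `VersionedStore` is a hypothetical class, versions are plain long counters, a failed condition is reported as -1 or false, and the optional Durability, Consistency, and timeout arguments are omitted.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory model of versioned, conditional CRUD; not the product's store.
final class VersionedStore {
    static final class Entry {
        final String value;
        final long version;

        Entry(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<String, Entry> data = new HashMap<>();
    private long nextVersion = 1;

    // Unconditional write; returns the new version.
    synchronized long put(String key, String value) {
        long version = nextVersion++;
        data.put(key, new Entry(value, version));
        return version;
    }

    // Writes only if the key does not exist yet; -1 means the condition failed.
    synchronized long putIfAbsent(String key, String value) {
        return data.containsKey(key) ? -1 : put(key, value);
    }

    // Writes only if the key already exists.
    synchronized long putIfPresent(String key, String value) {
        return data.containsKey(key) ? put(key, value) : -1;
    }

    // Writes only if the stored version still equals the one the caller read.
    synchronized long putIfVersion(String key, String value, long expectedVersion) {
        Entry current = data.get(key);
        return current != null && current.version == expectedVersion ? put(key, value) : -1;
    }

    synchronized Entry get(String key) {
        return data.get(key);
    }

    // Returns true if a record was removed.
    synchronized boolean delete(String key) {
        return data.remove(key) != null;
    }

    // Removes the record only if its version still matches.
    synchronized boolean deleteIfVersion(String key, long expectedVersion) {
        Entry current = data.get(key);
        if (current == null || current.version != expectedVersion) {
            return false;
        }
        data.remove(key);
        return true;
    }
}
```

A typical read-modify-write loop reads an entry, computes the new value, and calls putIfVersion with the version it read, retrying from the read if the call reports that another writer got there first.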
Oracle NoSQL DB API
Iteration Operations
iterator(Direction, int batchSize, [Key parentKey, KeyRange subRange, Depth, [Consistency, timeout]])
→ Iterator<KeyValueVersion>
keysIterator(Direction, int batchSize, [Key parentKey, KeyRange subRange, Depth, [Consistency, timeout]])
→ Iterator<Key>
Oracle NoSQL DB API
Sub-key “Multi” Operations
execute(List<Operation>, [Durability, timeout])
→ List<OperationResult>
multiGet(K, KeyRange, Depth, [Consistency, timeout])
→ SortedMap<K, V>
multiGetKeys(K, KeyRange, [Consistency, timeout])
→ SortedSet<K>
multiDelete(K, KeyRange, Depth, [Durability, timeout])
→ int
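The "multi" operations work on the sorted set of minor keys under a single major key. A rough in-memory analogue is shown below. It is only a sketch: `MinorKeyRange` is a hypothetical helper, the records under one major key are modeled as a `TreeMap`, the key range is treated as inclusive at both ends, and the Depth, Consistency, and Durability arguments are left out.

```java
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

// Range operations over the minor keys stored under one major key.
final class MinorKeyRange {
    // Returns a sorted copy of the records whose minor key lies in [start, end].
    static SortedMap<String, String> multiGet(
            TreeMap<String, String> recordsUnderMajorKey, String start, String end) {
        return new TreeMap<>(recordsUnderMajorKey.subMap(start, true, end, true));
    }

    // Deletes the records whose minor key lies in [start, end]; returns how many were removed.
    static int multiDelete(
            TreeMap<String, String> recordsUnderMajorKey, String start, String end) {
        NavigableMap<String, String> range = recordsUnderMajorKey.subMap(start, true, end, true);
        int removed = range.size();
        range.clear(); // clearing the view removes the entries from the backing map
        return removed;
    }
}
```

Because every record involved shares one major key, such range reads and deletes stay within the single-major-key transaction scope described earlier in the deck.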
Oracle NoSQL DB API
Hadoop Integration
• KVInputFormat class: Hadoop InputFormat class for reading data from Oracle NoSQL DB
• Static Methods:
– setKVHelperHosts(String[] kvHelperHosts)
– setKVStoreName(String kvStoreName)
– setParentKey(Key parentKey)
– setBatchsize(int batchSize)
– setConsistency(Consistency consistency)
– setDepth(Depth depth)
– setDirection(Direction direction)
– setSubRange(KeyRange subRange)