the NewSQL database you’ll never outgrow
Taming the Big Data Fire Hose
John Hugg, Sr. Software Engineer, VoltDB
VoltDB 2
Big Data Defined
Velocity
+ Moves at very high rates (think sensor-driven systems)
+ Valuable in its temporal, high velocity state
Volume
+ Fast-moving data creates massive historical archives
+ Valuable for mining patterns, trends and relationships
Variety
+ Structured (logs, business transactions)
+ Semi-structured and unstructured
Example Big Data Use Cases
Data source | High-frequency operations | Lower-frequency operations
Capital markets | Write/index all trades, store tick data | Show consolidated risk across traders
Call initiation request | Real-time authorization | Fraud detection/analysis
Inbound HTTP requests | Visitor logging, analysis, alerting | Traffic pattern analytics
Online game | Rank scores: defined intervals, player “bests” | Leaderboard lookups
Real-time ad trading systems | Match form factor, placement criteria, bid/ask | Report ad performance from exhaust stream
Mobile device location sensor | Location updates, QoS, transactions | Analytics on transactions
Big Data and You
Incoming data streams are different than traditional business apps
+ You need to write data quickly and reliably, but …
It’s not just about high speed writes
+ You need to validate in real-time
+ You need to count and aggregate
+ You need to analyze in real-time
+ You need to scale on demand
+ You may need to transact
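To make "validate, count and aggregate" at write time concrete, here is a minimal sketch. It assumes a sensor-style event shaped like `{"device": ..., "value": ...}`; the event shape, the `is_valid` rule and the `IngestStats` name are all hypothetical and do not come from the slides.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class IngestStats:
    """Running counters updated as each event is written (hypothetical type)."""
    accepted: int = 0
    rejected: int = 0
    total_by_device: dict = field(default_factory=lambda: defaultdict(int))


def is_valid(event):
    # Assumed validation rule: a device name and a non-negative integer reading.
    return (
        isinstance(event.get("device"), str)
        and isinstance(event.get("value"), int)
        and event["value"] >= 0
    )


def ingest(events, stats):
    """Validate each event, then count and aggregate it in the same pass."""
    for event in events:
        if not is_valid(event):
            stats.rejected += 1
            continue
        stats.accepted += 1
        stats.total_by_device[event["device"]] += event["value"]
    return stats
```

The point of the sketch is that validation and aggregation happen in the write path itself, so the counts are current as soon as the write completes.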
Big Data Management Infrastructure
Data sources: online gaming, ad serving, sensor data, Internet commerce, SaaS and Web 2.0, mobile platforms, financial trade
High Velocity
NewSQL: structured data, ACID guarantees, relational/SQL, real-time analytics
NoSQL: unstructured data, eventual consistency, schemaless (KV, document)
High Volume
Analytic datastore
Other OLAP data stores
High Velocity Data Management
High Velocity DBMS Requirements
Ingest at very high speeds and rates
Scale easily to meet growth and demand peaks
Support integrated fault tolerance
Support a wide range of real-time (or “near-time”) analytics
Integrate easily with high volume analytic datastores
High Speed Data Ingestion
Support millions of write operations per second at scale
Read and write latencies below 50 milliseconds
Provide ACID-level consistency guarantees (maybe)
Support one or more well-known application interfaces
+ SQL
+ Key/Value
+ Document
Scale to Meet Growth and Demand
Scale-out on commodity hardware
Built-in database partitioning
+ Manual sharding and/or add-on solutions are brittle, require apps to do “heavy lifting”, and can be an operational nightmare
Database must automatically implement defined partitioning strategy
+ Application should “see” a single database instance
Database should encourage scalability best practices
+ For example, replication of reference data minimizes need for multi-partition operations
A Look Inside Partitioning
table orders (partitioned): customer_id (partition key), order_id, product_id
table products (replicated): product_id, product_name
Partition 1
orders: (1, 101, 2), (1, 101, 3), (4, 401, 2)
products: (1, knife), (2, spoon), (3, fork)
Partition 2
orders: (2, 201, 1), (5, 501, 3), (5, 502, 2)
products: (1, knife), (2, spoon), (3, fork)
Partition 3
orders: (3, 201, 1), (6, 601, 1), (6, 601, 2)
products: (1, knife), (2, spoon), (3, fork)
single-partition: select count(*) from orders where customer_id = 5
multi-partition: select count(*) from orders where product_id = 3
single-partition: insert into orders (customer_id, order_id, product_id) values (3,303,2)
multi-partition: update products set product_name = 'spork' where product_id = 3
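The single-partition versus multi-partition distinction can be sketched in a few lines. This is an illustration only: the modulo placement in `partition_for` is an assumption chosen to reproduce the row layout on the slide, not VoltDB's actual partitioning function, and the `Cluster` class and its methods are hypothetical names. Each method returns the list of partitions it had to touch.

```python
NUM_PARTITIONS = 3


def partition_for(customer_id):
    # Assumed placement rule (not VoltDB's): customers 1 and 4 land on
    # partition index 0, customers 2 and 5 on index 1, customers 3 and 6 on 2.
    return (customer_id - 1) % NUM_PARTITIONS


class Cluster:
    def __init__(self, products):
        # orders is partitioned; products is replicated to every partition.
        self.orders = [[] for _ in range(NUM_PARTITIONS)]
        self.products = [dict(products) for _ in range(NUM_PARTITIONS)]

    def insert_order(self, customer_id, order_id, product_id):
        p = partition_for(customer_id)  # partition key known: one partition
        self.orders[p].append((customer_id, order_id, product_id))
        return [p]

    def count_by_customer(self, customer_id):
        p = partition_for(customer_id)  # filter on partition key: one partition
        n = sum(1 for row in self.orders[p] if row[0] == customer_id)
        return n, [p]

    def count_by_product(self, product_id):
        # No partition key in the filter, so every partition must be scanned.
        n = sum(1 for part in self.orders for row in part if row[2] == product_id)
        return n, list(range(NUM_PARTITIONS))

    def rename_product(self, product_id, name):
        # Writing a replicated table must update every copy.
        for copy in self.products:
            copy[product_id] = name
        return list(range(NUM_PARTITIONS))


cluster = Cluster({1: "knife", 2: "spoon", 3: "fork"})
for row in [(1, 101, 2), (1, 101, 3), (4, 401, 2),
            (2, 201, 1), (5, 501, 3), (5, 502, 2),
            (3, 201, 1), (6, 601, 1), (6, 601, 2)]:
    cluster.insert_order(*row)
```

Reads of the replicated `products` table can be served from any one partition, which is why replicating reference data keeps joins against it single-partition, while writes to it become multi-partition.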
Integrated Fault Tolerance
Database should transparently support built-in “Tandem-style” HA
+ Users should be able to easily increase/decrease fault tolerance levels
Database should be easily and quickly recoverable in the event of severe hardware failures
Database should be able to automatically detect and manage a variety of partition fault conditions
Downed nodes should be “rejoinable” without the need for service windows
Partition Detection & Recovery
Network fault protection (Server A, Server B, Server C)
+ Detects partition event
+ Determines which side of fault to disable
+ Snapshots and disables orphaned node(s)
Live node rejoin (Server A, Server B, Server C)
+ Allows “downed” nodes to rejoin live cluster
+ Automatically re-synchs all node data
+ Coordinates transactions during re-synch
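The slide does not say how the database determines which side of a fault to disable. One common approach, shown here purely as an assumption, is to keep the side holding the majority of nodes and break ties deterministically. The function name `choose_survivors` and the tie-break rule (keep the side containing the lowest node name) are hypothetical.

```python
def choose_survivors(side_a, side_b):
    """Pick which side of a network partition stays live.

    Assumed rule, not taken from the slides: the larger side survives;
    on a tie, the side containing the lowest node name survives.
    Both sides must be non-empty and disjoint.
    """
    a, b = set(side_a), set(side_b)
    if not a or not b or a & b:
        raise ValueError("sides must be non-empty and disjoint")
    if len(a) != len(b):
        keep = a if len(a) > len(b) else b
    else:
        keep = a if min(a) < min(b) else b
    # Orphaned nodes would snapshot their data and then be disabled.
    orphaned = (a | b) - keep
    return keep, orphaned
```

For the three-server picture on the slide, a fault that isolates Server C would keep `{"A", "B"}` live and orphan `{"C"}`; once the fault clears, C is a candidate for live rejoin.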
Real-time Analytics
Database should support a wide variety of high performance reads
+ High-frequency single-partition
+ Lower-frequency multi-partition
Common analytic queries should be optimized in the database
+ Multi-partition aggregations, limits, etc.
Database should accommodate a flexible range of relational data operations
+ Particularly relevant to structured data
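A multi-partition aggregation with a limit can be sketched as a scatter-gather: each partition computes partial counts locally, and a coordinator merges them before applying the limit. This is a sketch under assumptions, not a description of any database's planner. The function names are hypothetical, and the rows are (customer_id, order_id, product_id) tuples in the style of the partitioning example.

```python
from collections import Counter


def local_counts(rows):
    """Partial aggregate computed inside one partition: orders per product."""
    return Counter(product_id for _customer_id, _order_id, product_id in rows)


def top_products(partitions, limit):
    """Merge per-partition partial counts, then apply ordering and limit.

    The limit is applied only after merging, because a product's rows
    can be spread across several partitions.
    """
    total = Counter()
    for rows in partitions:
        total.update(local_counts(rows))
    # Sort by descending count, then by product_id for a stable result.
    return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


partitions = [
    [(1, 101, 2), (1, 101, 3), (4, 401, 2)],
    [(2, 201, 1), (5, 501, 3), (5, 502, 2)],
    [(3, 201, 1), (6, 601, 1), (6, 601, 2)],
]
```

Pushing the partial aggregation down to each partition keeps the data sent to the coordinator proportional to the number of groups rather than the number of rows.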
Integration with Analytic Datastores
Database should offer high performance, transactional export
Export should allow a wide variety of common data enrichment operations
+ Normalize and de-normalize
+ De-duplicate
+ Aggregate
Architecture should support loosely-coupled integrations
+ Impedance mismatches
+ Durability
VoltDB Export Data Flow
Loosely-coupled, asynchronous
Queue must be durable
Bi-directional durability
Diagram label: High Velocity Database Cluster
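One way to read "queue must be durable" is that exported records and the consumer's progress both survive a restart on either side. The sketch below illustrates that idea with a file-backed queue. It is not VoltDB's export mechanism; the class name `DurableQueue`, the file names and the acknowledgement scheme are all assumptions made for illustration.

```python
import json
import os


class DurableQueue:
    """Append-only export log plus a persisted count of acknowledged records."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "export.log")
        self.ack_path = os.path.join(directory, "export.ack")

    def append(self, record):
        # Producer side: force the record to disk before returning.
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def _acked(self):
        try:
            with open(self.ack_path, encoding="utf-8") as f:
                return int(f.read() or 0)
        except FileNotFoundError:
            return 0

    def pending(self):
        # Consumer side: everything written but not yet acknowledged.
        if not os.path.exists(self.log_path):
            return []
        with open(self.log_path, encoding="utf-8") as f:
            records = [json.loads(line) for line in f]
        return records[self._acked():]

    def ack(self, count):
        # Persist progress atomically: write a temp file, then rename it.
        new_total = self._acked() + count
        tmp_path = self.ack_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            f.write(str(new_total))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.ack_path)
```

Because the producer only appends and the consumer only advances its acknowledged count, the two sides stay loosely coupled: the consumer can fall behind or restart and resume from `pending()` without the producer being involved.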
Summary
Big Data infrastructures will usually require more than one engine
+ High velocity engine for “fast” data
+ Analytic engine for “deep” data
Data characteristics will often determine which high velocity engine to use
+ NewSQL is often well-suited to structured data
+ NoSQL is often a good fit for unstructured data
Choose solutions that suit your needs and are designed for interoperability