Webinar presentation delivered by Dr. Michael Stonebraker and Scott Jarr of VoltDB on December 11, 2012. www.voltdb.com The design decisions you make today will have a huge performance impact down the line. Until recently, when it came to databases, the choice was easy. Essentially, you had one option: the RDBMS. Today, there's a new universe of databases being thrown into production — and not always with the greatest success. How do you make the right choice for your next application? Database pioneer Dr. Michael Stonebraker and VoltDB co-founder Scott Jarr have some thoughts.
Navigating the Database Universe
Dr. Michael Stonebraker and Scott Jarr
About Our Presenters
Mike Stonebraker
Co-founder & CTO, VoltDB
A pioneer of database research and technology for more than a quarter of a century, and the main architect of the Ingres relational DBMS and the object-relational DBMS PostgreSQL
Scott Jarr
Co-founder & Chief Strategy Officer, VoltDB
More than 20 years of experience building, launching and growing technology companies from inception to market leadership in the search, mobile, security, storage and virtualization markets
Agenda
• The (proper) design of DBMSs – presented by Dr. Michael Stonebraker
• The database universe
• Where the future value comes from
We Believe…
• “Big Data” is a rare, transformative market
• Velocity is becoming the cornerstone
• Specialized databases (working together) are the answer
• Products must provide tangible customer value… fast
THE (PROPER) DESIGN OF THE DBMS
Dr. Michael Stonebraker
Lessons from 40 Years of Database Design
1. Get the user interaction right
– Bet on a small number of easy-to-understand constructs
– Plus standards
2. Get the implementation right
– Bet on a small number of easy-to-understand constructs
3. One size does not fit all
– At least not if you want fast, big or complex
“Those who don’t learn from history are destined to repeat it.”
– Winston Churchill
#1: Get the User Interaction Right
Historical Lesson: RDBMS vs. CODASYL vs. OODB

Winner: RDBMS
• Simple data model (tables)
• Simple access language (SQL)
• ACID (transactions)
• Standards (SQL)

Loser: CODASYL
• Complicated data model (records participate in “sets”; a set has one owner and, perhaps, many members, etc.)
• Messy access language (a sea of “cursors”, some -- but not all -- of which move on every command; navigation programming)

Loser: OODBs
• Complex data model (hierarchical records, pointers, sets, arrays, etc.)
• Complex access language (navigation through this sea)
• No standards
Interaction Take Away − Simple is Good
• ACID was easy for people to understand
• SQL provided a standard, high-level language and made people productive (transportable skills)
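To make the contrast concrete, here is a small runnable sketch (using Python's sqlite3; the table and data are invented for illustration) of the declarative style that won versus the record-at-a-time navigation that lost:

```python
# Hypothetical illustration: the same query, declaratively in SQL vs.
# CODASYL-style record-at-a-time navigation. Table and column names
# are invented for the example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "acme", 120.0), (2, "acme", 80.0), (3, "globex", 50.0)])

# Relational style: say *what* you want; the optimizer decides how.
total = conn.execute(
    "SELECT SUM(total) FROM orders WHERE customer = 'acme'").fetchone()[0]

# Navigational style: the programmer moves a cursor record by record,
# hand-coding the access path (roughly what CODASYL required).
nav_total = 0.0
for _, customer, amount in conn.execute("SELECT * FROM orders"):
    if customer == "acme":
        nav_total += amount

print(total, nav_total)  # both 200.0
```

The one-line query also survives schema and access-path changes; the hand-written loop is the part that had to be rewritten every time.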
#2: Get the Implementation Right
Historical Winners

• Leverage a few simple ideas: early relational implementations
  – System R storage system dropped links
  – Views (protection, schema modification, performance)
  – Cost-based optimizer
• Leverage a few simple ideas: Postgres
  – User-defined data types and functions (adopted by nearly everybody)
  – Rules/triggers
  – No-overwrite storage
• Leverage a few simple ideas: Vertica
  – Store data by column
  – Aggressive compression throughout
  – Parallel load without compromising ACID
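The Vertica ideas above can be sketched in a few lines (a toy model, not Vertica's actual code): store data by column, then compress each column, here with simple run-length encoding, which works well on sorted columns:

```python
# Toy column store: rows pivoted into columns, each column
# run-length encoded. All names are invented for illustration.
def rle_encode(column):
    """Run-length encode a column: [[value, run_length], ...]."""
    runs = []
    for v in column:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

# Row-oriented data ...
rows = [("US", 2012), ("US", 2012), ("US", 2011), ("EU", 2011)]
# ... stored by column instead:
columns = {"region": [r[0] for r in rows], "year": [r[1] for r in rows]}

encoded = {name: rle_encode(col) for name, col in columns.items()}
# A scan that touches only one column never reads the other,
# and runs can be aggregated without decompressing each value.
us_count = sum(n for v, n in encoded["region"] if v == "US")
print(encoded["region"])  # [['US', 3], ['EU', 1]]
print(us_count)           # 3
```

The payoff compounds: columnar layout makes values adjacent and similar, which is exactly what makes the compression effective.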
#3: One Size Does NOT Fit All
• OSFA is an old technology with hundreds of bags hanging off it
• It breaks 100% of the time when under load
• Load = size, speed, or complexity
• Load is increasing at a startling rate
• Purpose-built systems will exceed OSFA by 10x to 100x
• History has not been completely written yet… but let’s look at VoltDB as an example
“…specialized systems can each be a factor of 50 faster than the single ‘one size fits all’ system… A factor of 50 is nothing to sneeze at.”
– My Top 10 Assertions About Data Warehouses, 2010
Example: VoltDB
• Get the interface right
  – SQL
  – ACID
• Implementation: leverage a few simple ideas
  – Main memory
  – Stored procedures
  – Deterministic scheduling
• Specialization
  – OLTP focus allowed for the above implementation choices
Proving the Theory
• Challenge: OLTP performance
– TPC-C CPU cycles
– On the Shore DBMS prototype
– Elephants should be similar
– Where the cycles go: recovery 24%, latching 24%, buffer pool 24%, locking 24%, useful work only 4%
Implementation Construct #1: Main Memory
• Main-memory format for data
  – Disk format gets you buffer-pool overhead
• What happens if data doesn’t fit?
  – Return to the disk buffer-pool architecture (slow), or
  – Anti-caching:
    • Keep the main-memory format for data
    • When memory fills up, bundle together elderly tuples and write them out
    • Run a transaction in “sleuth mode”: find the required records and move them back to main memory (and pin them)
    • Then run the transaction normally
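The anti-caching steps above can be sketched as a toy model (the LRU definition of “elderly”, the class name, and a dict standing in for disk are assumptions for illustration, not VoltDB's implementation):

```python
# Toy anti-caching store: evict eldest tuples when memory fills,
# and pull evicted tuples back in a "sleuth" pass before running
# the transaction purely against main memory.
from collections import OrderedDict

class AntiCacheStore:
    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = OrderedDict()   # tuples in main-memory format, LRU order
        self.evicted = {}             # "disk": bundled-out elderly tuples

    def put(self, key, tup):
        self.memory[key] = tup
        self.memory.move_to_end(key)
        while len(self.memory) > self.capacity:
            old_key, old_tup = self.memory.popitem(last=False)  # eldest first
            self.evicted[old_key] = old_tup

    def run_transaction(self, keys, txn):
        # "Sleuth mode": find required records that were evicted and
        # move them back into main memory (pinning elided here).
        for k in keys:
            if k not in self.memory and k in self.evicted:
                self.put(k, self.evicted.pop(k))
        # Then run the transaction normally, purely against memory.
        return txn({k: self.memory[k] for k in keys})

store = AntiCacheStore(capacity=2)
for i in range(4):
    store.put(i, {"id": i})           # keys 0 and 1 get evicted
result = store.run_transaction([0, 3], lambda recs: sorted(recs))
```

The key property is that, unlike a buffer pool, data never exists in two formats: everything a transaction touches is in main-memory format before it runs.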
Implementation Construct #2: Stored Procedures
• Round trip to the DBMS is expensive
– Do it once per transaction
– Not once per command
– Or even once per cursor move
• Ad-hoc queries supported
– Turn them into dynamic stored procedures
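A toy model of the round-trip argument (the `FakeServer` class and its names are invented; in a real client each call crosses a network, which is what makes it expensive):

```python
# Simulated client/server: count round trips for a two-statement
# transaction issued per-statement vs. as one stored procedure.
class FakeServer:
    def __init__(self):
        self.round_trips = 0
        self.balances = {"a": 100, "b": 0}
        self.procedures = {}

    def execute(self, sql_or_proc, *args):
        self.round_trips += 1                 # every call crosses the "network"
        if sql_or_proc in self.procedures:    # stored procedure: runs server-side
            return self.procedures[sql_or_proc](self, *args)
        # otherwise treat it as one ad-hoc statement (execution elided)

def transfer(server, src, dst, amount):
    # The whole transaction runs inside the server: one round trip total.
    server.balances[src] -= amount
    server.balances[dst] += amount

server = FakeServer()
server.procedures["transfer"] = transfer

# Per-statement style: one round trip per command.
server.execute("UPDATE accounts SET bal = bal - 10 WHERE id = 'a'")
server.execute("UPDATE accounts SET bal = bal + 10 WHERE id = 'b'")
per_statement_trips = server.round_trips               # 2

# Stored-procedure style: the same transaction, one round trip.
server.execute("transfer", "a", "b", 10)
proc_trips = server.round_trips - per_statement_trips  # 1
```

With dozens of statements (or cursor moves) per transaction, the per-call network latency dominates, which is why the savings compound.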
Implementation Construct #3: Deterministic and Non-deterministic Scheduling
• Non-deterministic (can’t tell order until commit time)
– MVCC
– Dynamic locking
• Deterministic
– Time stamp order
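A minimal sketch of deterministic timestamp-order scheduling (the names are invented; a real system like VoltDB also partitions data and replicates the ordered log, both elided here):

```python
# Deterministic scheduling: every transaction gets a global order
# number on arrival, and execution proceeds serially in that order,
# so the serial order is known *before* execution -- no locks, and
# every replica that applies the same log reaches the same state.
import itertools

class DeterministicScheduler:
    def __init__(self):
        self.clock = itertools.count()   # monotonically increasing timestamps
        self.queue = []

    def submit(self, txn):
        self.queue.append((next(self.clock), txn))

    def run(self, state):
        # Execute strictly in timestamp order, one transaction at a time.
        for ts, txn in sorted(self.queue):
            txn(state)
        self.queue.clear()
        return state

sched = DeterministicScheduler()
sched.submit(lambda s: s.__setitem__("x", s["x"] + 1))
sched.submit(lambda s: s.__setitem__("x", s["x"] * 2))
state = sched.run({"x": 1})   # (1 + 1) * 2 = 4, the same on every run
```

Contrast with the non-deterministic approaches above (MVCC, dynamic locking), where the effective serial order emerges only at commit time.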
Result of Design Principles: VoltDB Example
• Good interface decisions – made developers more productive
– SQL & ACID
• Leveraging a few simple implementation ideas – made VoltDB wicked fast
– Main memory
– Stored procedures
– Deterministic scheduling
Proving the Theory
• Answer: OLTP performance
– 3 million transactions per second
– 7x Cassandra
– 15 million SQL statements per second
– 100,000+ transactions per commodity server
“…we are heading toward a world with at least 5 (and probably more) specialized engines and the death of the ‘one size fits all’ legacy systems.”
– The End of an Architectural Era (It’s Time for a Complete Rewrite), 2007
THE DATABASE UNIVERSE
Scott Jarr
Technology Meets the Market
Believe
– “Big Data” is a rare, transformative market
– Velocity is becoming the cornerstone
– Specialized databases (working together) are the answer
– Products must provide tangible customer value… Fast
Observations
– Noisy, crowded and new – kinda like Christmas shopping at the mall
– Everyone wants to understand where the pieces fit
– Analysts build maps on technology NOT use cases
What we need is…
Data Value Chain
• Interactive (milliseconds): place trade, serve ad, enrich stream, examine packet, approve transaction
• Real-time analytics (hundredths of seconds): calculate risk, leaderboard, aggregate, count
• Record lookup (second(s)): retrieve click stream, show orders
• Historical analytics (minutes): backtest algo, BI, daily reports
• Exploratory analytics (hours): algo discovery, log analysis, fraud pattern match
(x-axis: age of data)
Data Value Chain
[Chart: the same data value chain, overlaid with a data-value curve – the value of an individual data item is highest when the data is young, while aggregate data value dominates as data ages]
The Database Universe
[Chart: systems mapped by application complexity (simple to complex) against data value (value of individual data item to aggregate data value), across the interactive, real-time analytics, record lookup, historical analytics, and exploratory analytics stages, split into transactional and analytic halves. The traditional RDBMS occupies the simple/slow/small region; modern workloads push toward fast, complex, and large.]
The Database Universe
[Chart: the same map with product categories placed on it – NewSQL at the fast, transactional, high-velocity end; data warehouses, Hadoop, and NoSQL systems across the analytic half]
Closed-loop Big Data
[Diagram: high-velocity feeds – logins, sensors, impressions, orders, authorizations, clicks, trades – flow into interactive & real-time analytics, which connects to historical reports & analytics and exploratory analytics]
Closed-loop Big Data
• Make the most informed decision every time there is an interaction
• Real-time decisions are informed by operational analytics and past knowledge
[Diagram: the same pipeline, with knowledge flowing back from historical and exploratory analytics into interactive & real-time analytics]
The Velocity Use Case
What does it look like?
– High-throughput, relentless data feeds
– Fast decisions on high-value data
– Real-time, operational analytics provide immediate visibility

What’s the big deal?
– Batch converted to real time = efficiency
– Decisions made at the time of the event = better decisions
– The ability to micro-segment, target, and personalize = conversion and satisfaction
– More data is coming at you; use it to improve your business
QUESTIONS AND ANSWERS
Next Up
THANK YOU
www.voltdb.com