The Future of Big Data is Relational (or why you can't escape SQL)



Twitter: @tobrien

Thursday, February 28, 13



In this session...

• Ouroboros
• Copernican Revolution
• Ptolemaic Entrenchment
• Janus
• A two minute summary of the last 15 years
• Google Magic
• The Future of SQL


Tim O’Brien

I’m a developer who also writes







Revolution


Remember all that Big Data stuff?


Remember when we all thought it was time to give up schemas?

Man, wasn’t that a lot of work.


What if the relational database “catches up”?

What then?


How we market Big Data:

Big Data == Paradigm Shift

“singularity” > “disruptor”




“Big Data” is to “Traditional Databases” as...

Copernicus is to Ptolemy


Out with the “old”

In with the “new”


Copernicus’ model, 1543 AD

Claudius Ptolemy, ~150 AD


Google’s BigTable Paper - 2006

Edgar F. Codd, “A Relational Model of Data for Large Shared Data Banks”, 1970

Hadoop - 2007




Codd + Hadoop (2007) =

Google F1, Spanner, Translattice, Impala, Drawn-to-Scale, NuoDB, Akiban, and many more NewSQL products




Youth: Looking Forward

Age: Looking Backward


Whatever.

Haven’t you heard?

Databases don’t scale.

Let’s create a schema.

Ok?


And, both are right...







2000 In the beginning...

Proprietary app servers

Big Oracle database


2001


More traffic?

Specialized application servers

Throw hardware at the database


2002-2005 More traffic?

Specialized application servers

Throw hardware at the database


2005 Even More Traffic?

Sharding.... ugh.

Everything else was scaling horizontally except the database.
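
To make the “ugh” concrete, here is a minimal sketch of application-level sharding. The shard names and the `shard_for` and `shards_for_query` helpers are hypothetical, invented for illustration; they are not from any particular product. The application, not the database, decides where each row lives.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to database hosts.
SHARDS = ["users_db_0", "users_db_1", "users_db_2", "users_db_3"]

def shard_for(user_id, shards=SHARDS):
    """Pick a shard by hashing the key (stable across processes, unlike hash())."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]

def shards_for_query(user_ids, shards=SHARDS):
    """A query touching many users must fan out to every shard that owns one."""
    return sorted({shard_for(u, shards) for u in user_ids})
```

Routing a single key is the easy part. Joins and transactions that span shards become the application's problem, and changing the number of shards under this modulo scheme reassigns most keys.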



2006 - New Reality of Big Data


Hadoop - 2007

Q: What would Google do?

A: Not use an RDBMS


2006

Big Data for a few

vs.

RDBMS for most


2007

Who needs Foreign Keys? Transactions? Just Simplify

• The rise of Database “Luddites”


2007

• The rise of Database “Luddites”

Rails hacked away @ database “orthodoxy”

Opened the door to alternative approaches

• Although, Basecamp is still a single RDBMS…


2007 - present == Alternatives

• Documents
  – MongoDB – Started in 2007, OSS in 2009
  – CouchDB – Started in 2005
• Graphs
  – Neo4j
• Key-Value Stores
  – Cassandra
  – Riak
  – Tokyo Cabinet
• Memory
  – Memcached / Redis
• Tabular
  – HBase


2012

Q: What database do you use?

A: All of them

Oracle, Mongo, MySQL, Impala, Riak, some memcache, and some Hadoop thrown in for fun



Big Data a Necessity at Largest Scale

Most development still RDBMS

“A certain kind of developer at a certain kind of company”


• There’s this company that sells advertising
  – ~96% of revenue came from advertising in 2011
  – ~75% of the US Search Advert Market in 2011
  – ~44% share of the overall online ad market

• One of the most important applications at Google ran on MySQL
  – AdWords missed the NoSQL revolution


Digging into the evolution of Storage at Google

• Google’s BigTable – 2006
  – Tabular
  – Sparse, distributed, multi-dimensional sorted map



• Google’s BigTable – 2006

  – “New users [...] uncertain of how to best use the BigTable interface, particularly if they are accustomed to using relational databases that support general-purpose transactions.”



• Google’s Megastore – 2010
  – Hierarchical “schemas”
  – Positioned as a NoSQL store
  – ACID within partitions



• Google’s Megastore – 2010

  – “Supports two-phase commit for atomic updates [...] these transactions have much higher latency and increase the risk of contention, we generally discourage applications from using the feature.”
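
Why the extra latency and contention? A minimal sketch of generic two-phase commit, assuming a hypothetical `Participant` class standing in for a partition (this is the textbook protocol, not Megastore's code): every participant holds a lock from its vote until the coordinator's decision arrives, and the update costs two round trips.

```python
class Participant:
    """Hypothetical partition that can vote on and then apply an update."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.locked = False
        self.state = "idle"

    def prepare(self):
        # Phase 1: lock the data and vote. The lock is held until phase 2 arrives.
        self.locked = True
        self.state = "prepared" if self.can_commit else "refused"
        return self.can_commit

    def finish(self, commit):
        # Phase 2: apply or roll back, then release the lock.
        self.state = "committed" if commit else "aborted"
        self.locked = False


def two_phase_commit(participants):
    """Return True only if every participant voted yes and all were told to commit."""
    votes = [p.prepare() for p in participants]  # round trip 1: collect votes
    decision = all(votes)
    for p in participants:                       # round trip 2: broadcast decision
        p.finish(decision)
    return decision
```

A single “no” vote aborts the update everywhere, and anything else that needs the locked rows waits for both phases to finish.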


Digging into the evolution of Storage at Google

• Google’s Spanner & F1 – 2012
• Paper published in 2012
  – Hierarchical, Semi-relational Schemas
  – ACID across continents possible - 14ms transaction overhead in a data-center with clock uncertainty of 1ms.
  – SQL
  – Focus on Performance
    • Gated by Clock Uncertainty
    • Consensus: Paxos


What Differentiates Google Spanner?

• Transactions are only possible because of Paxos
• Forget NTP, Google has “Reified Clock Uncertainty”
• Epsilon, clock uncertainty, is the gating factor for gaining consensus on transaction timestamps.
• It’s all about Time
• “as the underlying system enforces tighter bounds on clock uncertainty, the overhead of the stronger semantics decreases. As a community, we should no longer depend on loosely synchronized clocks and weak time APIs in designing distributed algorithms.”
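
A rough sketch of why epsilon gates transaction latency, loosely following the “commit wait” rule described in the Spanner paper. `TrueTimeSketch`, `TTInterval`, and `commit` are hypothetical names for a toy model, not Google's API: the clock returns an interval instead of a point, and a commit waits until its timestamp is definitely in the past.

```python
import time
from dataclasses import dataclass


@dataclass
class TTInterval:
    earliest: float
    latest: float


class TrueTimeSketch:
    """Toy clock that admits its own uncertainty; epsilon is in seconds (assumed)."""

    def __init__(self, epsilon, clock=time.monotonic):
        self.epsilon = epsilon
        self.clock = clock

    def now(self):
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)


def commit(tt, sleep=time.sleep):
    """Return (commit_timestamp, seconds_waited) for one simulated commit."""
    start = tt.clock()
    s = tt.now().latest             # pick a timestamp no earlier than true time
    while tt.now().earliest <= s:   # commit wait: until s has definitely passed
        sleep(tt.epsilon / 10)
    return s, tt.clock() - start
```

In this model every commit waits roughly two epsilons, so a tighter clock bound translates directly into a cheaper transaction.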


Let me reiterate: Google has Mastered Time


What Differentiates Google Spanner?

• Hierarchical, Schematized Tables

• Similar to Akiban’s approach.

• Leads to some interesting possibilities.

• Nested Subqueries and Tree Results
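
One way to picture hierarchical tables and tree results: child rows carry their parent's key as a prefix, so a sorted store keeps a parent and its children next to each other. The sketch below assumes a hypothetical customers/orders schema and an in-memory dict standing in for a sorted key-value store; it illustrates the idea only, not Spanner's or Akiban's storage format.

```python
# In-memory stand-in for a sorted key-value store (hypothetical).
store = {}


def put(key, value):
    """Store a row under a tuple key; child keys start with the parent's key."""
    store[key] = value


def scan_prefix(prefix):
    """Return rows whose key starts with prefix, in key order (parent first)."""
    n = len(prefix)
    return [(k, store[k]) for k in sorted(store) if k[:n] == prefix]


def customer_tree(customer_id):
    """Assemble a nested result from one contiguous range of keys."""
    prefix = ("customer", customer_id)
    result = None
    for key, value in scan_prefix(prefix):
        if key == prefix:
            result = dict(value, orders=[])   # the parent row
        elif result is not None:
            result["orders"].append(value)    # interleaved child rows
    return result
```

Because the parent and its children share a key range, the “join” is a single range read, which is what makes nested, tree-shaped results cheap to produce.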


What Differentiates Google Spanner?

To reiterate:

* hierarchical, schematized tables
* distributed “compute fabric” for data
* Google has mastered Time
* Google built a warp reactor


As goes Google, so goes the world...

Translattice, Drawn-to-Scale, Akiban, Impala

Several NewSQL companies quickly jumped on this train:
- NuoDB
- VoltDB

Yes, we’ve had Hive for a while, but these new initiatives represent a more robust effort.


Translattice

Translattice identifies itself as a database that resembles F1.

It is a hosted database service which provides distributed transactions.

Translattice uses Paxos.

They’ve extended PostgreSQL and emphasize customer control over data. A distributed, cloud-based database.


Akiban

Akiban’s approach to storage almost *exactly* matches the strategy Google uses in Spanner.

Akiban lacks the distributed transaction capability of Spanner and F1, but they are working on developing the capability.

Akiban has implemented a query parser, optimizer, and execution engine atop a hierarchical approach to storage.


Drawn-to-Scale

Reportedly the most similar to F1 on the market. Fault-tolerant in distributed environments.

Created a Query Parser + Optimizer + Execution Engine atop a distributed “compute fabric”

No Paxos or Transactions... yet. To be released, shortly. Stay tuned.

Drawn to Scale aims to be an “installable” database. Not going the hosted route.

Data stored in HDFS/HBase.


So there.

Big Data is turning into a Big Relational Database

