oreillystrata
The Future of Relational (or Why You Can't Escape SQL)
Twitter: @tobrien
Thursday, February 28, 13
In this session...
• Ouroboros
• Copernican Revolution
• Ptolemaic Entrenchment
• Janus
• A two minute summary of the last 15 years
• Google Magic
• The Future of SQL
Tim O’Brien
I’m a developer who also writes
Twitter: @tobrien
Revolution
Remember all that Big Data stuff?
Remember when we all thought it was time to give up schemas?
Man, wasn’t that a lot of work.
What if the relational database “catches up”?
What then?
How we market Big Data:
Big Data == Paradigm Shift
“singularity” > “disruptor”
“Big Data” is to “Traditional Databases” as...
Copernicus is to Ptolemy
Out with the “old”
In with the “new”
Copernicus’s model, 1543 AD
Claudius Ptolemy, ~150 AD
Google’s BigTable paper - 2006
Edgar F. Codd, “A Relational Model of Data for Large Shared Data Banks” - 1970
Hadoop - 2007
Google’s BigTable paper (2006) + Codd + Hadoop (2007) =
Google F1, Spanner, Translattice, Impala, Drawn-to-Scale, NuoDB, Akiban, and many more NewSQL products
Youth: Looking Forward
Age: Looking Backward
Whatever.
Haven’t you heard?
Databases don’t scale.
Let’s create a schema.
Ok?
And, both are right...
2000: In the beginning...
Proprietary app servers
Big Oracle database
2001: More traffic?
Specialized application servers
Throw hardware at the database
2002-2005: More traffic?
Specialized application servers
Throw hardware at the database
2005: Even more traffic?
Sharding... ugh.
Everything else was scaling horizontally except the database.
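To make the "ugh" concrete, here is a minimal sketch of what sharding asks of the application: picking a database by key. The shard names and the hash-modulo scheme are assumptions for illustration and do not come from the talk.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to connections.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_for(key, shards=SHARDS):
    """Pick a shard by hashing the key (stable across processes and runs)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def route(keys, shards=SHARDS):
    """Group keys by the shard that owns them."""
    placement = {name: [] for name in shards}
    for key in keys:
        placement[shard_for(key, shards)].append(key)
    return placement
```

With a scheme like this, any join or transaction that spans two shards becomes the application's problem, and changing the number of shards remaps most keys.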
2006: New Reality of Big Data
Google’s BigTable paper - 2006
Hadoop - 2007
Q: What would Google do?
A: Not use an RDBMS
2006
Big Data for a few
vs.
RDBMS for most
2007
Who needs foreign keys? Transactions? Just simplify.
• The rise of database “Luddites”
2007
• The rise of database “Luddites”
Rails hacked away at database “orthodoxy”
Opened the door to alternative approaches
• Although Basecamp is still a single RDBMS…
2007 - present == Alternatives
• Documents
  – MongoDB – Started in 2007, OSS in 2009
  – CouchDB – Started in 2005
• Graphs
  – Neo4j
• Key-Value Stores
  – Cassandra
  – Riak
  – Tokyo Cabinet
• Memory
  – Memcached / Redis
• Tabular
  – HBase
2012
Q: What database do you use?
A: All of them
Oracle, Mongo, MySQL, Impala, Riak, some memcache, and some Hadoop thrown in for fun
Big Data a Necessity at Largest Scale
Most development still RDBMS
“A certain kind of developer at a certain kind of company”
• There’s this company that sells advertising
  – ~96% of revenue came from advertising in 2011
  – ~75% of the US search advertising market in 2011
  – ~44% share of the overall online ad market
• One of the most important applications at Google ran on MySQL
  – AdWords missed the NoSQL revolution
Digging into the evolution of Storage at Google
• Google’s BigTable – 2006
  – Tabular
  – Sparse, distributed, multi-dimensional sorted map
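As a rough illustration of "sparse, multi-dimensional sorted map", the toy class below keeps values keyed by (row, column, timestamp) in sorted order on a single machine. The class and method names are hypothetical; this is not BigTable's API, and the distributed part is left out entirely.

```python
import bisect


class SparseSortedMap:
    """Toy in-memory map keyed by (row, column, timestamp), kept sorted."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._values = {}  # key tuple -> value

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)  # negate so the newest sorts first
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, row, column):
        """Return the newest value for (row, column), or None if absent."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan(self, start_row, end_row):
        """Yield (row, column, timestamp, value) for start_row <= row < end_row."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._values[self._keys[i]]
            i += 1
```

Only cells that were written take up space (sparse), and because keys are sorted by row, a range scan over adjacent rows is a contiguous read.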
Digging into the evolution of Storage at Google
• Google’s BigTable – 2006
  – “New users [...] uncertain of how to best use the BigTable interface, particularly if they are accustomed to using relational databases that support general-purpose transactions.”
Digging into the evolution of Storage at Google
• Google’s Megastore – 2010
  – Hierarchical “schemas”
  – Positioned as a NoSQL store
  – ACID within partitions
Digging into the evolution of Storage at Google
• Google’s Megastore – 2010
  – “Supports two-phase commit for atomic updates [...] these transactions have much higher latency and increase the risk of contention, we generally discourage applications from using the feature”
Digging into the evolution of Storage at Google
• Google’s Spanner & F1 – 2012
• Paper published in 2012
  – Hierarchical, semi-relational schemas
  – ACID across continents possible - 14ms transaction overhead in a data center with clock uncertainty of 1ms
  – SQL
  – Focus on performance
    • Gated by clock uncertainty
    • Consensus: Paxos
What Differentiates Google Spanner?
• Transactions are only possible because of Paxos
• Forget NTP, Google has “Reified Clock Uncertainty”
• Epsilon, the clock uncertainty, is the gating factor for gaining consensus on transaction timestamps.
• It’s all about Time
• “As the underlying system enforces tighter bounds on clock uncertainty, the overhead of the stronger semantics decreases. As a community, we should no longer depend on loosely synchronized clocks and weak time APIs in designing distributed algorithms.”
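The idea that epsilon gates transactions can be sketched as a "commit wait": choose a timestamp at the late edge of the clock's uncertainty interval, then hold the writes until that timestamp has definitely passed. The class and function names below are hypothetical, and a local monotonic clock stands in for a real time service; this is a simplified sketch of the idea, not Spanner's implementation.

```python
import time
from dataclasses import dataclass


@dataclass
class Interval:
    earliest: float
    latest: float


class UncertainClock:
    """Hypothetical clock that reports time as an interval of width 2 * epsilon."""

    def __init__(self, epsilon_seconds, source=time.monotonic):
        self.epsilon = epsilon_seconds
        self._source = source

    def now(self):
        t = self._source()
        return Interval(t - self.epsilon, t + self.epsilon)

    def after(self, timestamp):
        """True once the timestamp has definitely passed on this clock."""
        return self.now().earliest > timestamp


def commit(clock, apply_writes, sleep=time.sleep):
    """Pick a timestamp, then wait out the uncertainty before applying writes."""
    commit_ts = clock.now().latest
    while not clock.after(commit_ts):
        sleep(clock.epsilon / 4)
    apply_writes(commit_ts)
    return commit_ts
```

In this sketch the wait is roughly twice epsilon, so a tighter bound on clock uncertainty directly shortens every commit, which is the point the quoted passage makes.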
Let me reiterate: Google has mastered Time
What Differentiates Google Spanner?
• Hierarchical, schematized tables
• Similar to Akiban’s approach.
• Leads to some interesting possibilities.
• Nested subqueries and tree results
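One way to picture hierarchical tables and tree results: child rows are stored under their parent's key, so a parent and its children sit next to each other and fold naturally into a nested result. The customer and order tables below are hypothetical, and the sketch shows only the general idea, not Spanner's or Akiban's storage format.

```python
def interleave(customers, orders):
    """Lay out rows so each customer is followed by that customer's orders."""
    rows = {}
    for customer in customers:
        rows[(customer["customer_id"],)] = ("customer", customer)
    for order in orders:
        rows[(order["customer_id"], order["order_id"])] = ("order", order)
    # Sorting by key tuple places every child directly after its parent.
    return sorted(rows.items(), key=lambda item: item[0])


def as_tree(interleaved):
    """Fold interleaved rows into nested results, one dict per customer.

    Assumes every order refers to an existing customer (a foreign key).
    """
    tree = []
    for _key, (kind, row) in interleaved:
        if kind == "customer":
            tree.append({**row, "orders": []})
        else:
            tree[-1]["orders"].append(row)
    return tree
```

Because the children are already adjacent to the parent, building the tree is a single pass over the sorted rows with no separate join step.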
What Differentiates Google Spanner?
To reiterate:
* hierarchical, schematized tables
* distributed “compute fabric” for data
* Google has mastered Time
* Google built a warp reactor
As goes Google so does the world...
• Translattice
• Drawn-to-Scale
• Akiban
• Impala
Several NewSQL companies quickly jumped on this train:
- NuoDB
- VoltDB
Yes, we’ve had Hive for a while, but these new initiatives represent a more robust effort.
Translattice
Translattice identifies itself as a database that resembles F1.
It is a hosted database service that provides distributed transactions.
Translattice uses Paxos.
They’ve extended PostgreSQL and emphasize customer control over data.
A distributed, cloud-based database.
Akiban
Akiban’s approach to storage almost *exactly* matches the strategy Google uses in Spanner.
Akiban lacks the distributed transaction capability of Spanner and F1, but they are working on it.
Akiban has implemented a query parser, optimizer, and execution engine atop a hierarchical approach to storage.
Drawn-to-Scale
Reportedly the most similar to F1 on the market. Fault-tolerant in distributed environments.
Created a query parser, optimizer, and execution engine atop a distributed “compute fabric”.
No Paxos or transactions... yet. To be released shortly. Stay tuned.
Drawn to Scale aims to be an “installable” database, not going the hosted route.
Data is stored in HDFS/HBase.
So there.
Big Data is turning into a Big Relational Database