© 2012 Microsoft SQL AND NOSQL ARE TWO SIDES OF THE SAME COIN Michael Rys, Microsoft Corp. @SQLServerMike Strata 2012 Conference, March 2012

SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)


DESCRIPTION

Presentation for http://strataconf.com/strata2012/public/schedule/detail/22693 Many of the new online and device-oriented application models require a high degree of operational and development agility such as unlimited elastic scale and flexible data models. The nascent NoSQL market is aiming to address these requirements but is extremely fragmented, with many competing vendors and technologies. Programming, deploying, and managing NoSQL solutions requires specialized and low-level knowledge that does not easily carry over from one vendor’s product to another. The SQL market on the other hand has a high level of maturity and at least conceptual standardization, but relational database systems were not originally designed for these requirements. However, in contrast to common belief, the question of big versus small data is orthogonal to the question of SQL versus NoSQL. While the NoSQL model naturally supports extreme sharding, the fact that it does not require strong typing and normalization makes it attractive for “small” data as well. On the other hand, it is possible to scale relational SQL databases. In this presentation, I will provide a short introduction to some architectural patterns that SQL-based solutions have been using to achieve scale and operational agility, contrast them with the NoSQL paradigms and show how SQL can be augmented with NoSQL paradigms at the platform level by using SQL Azure Federations as an example. I will also show how NoSQL offerings can benefit from the lessons learned with SQL. What this all means is that NoSQL, BigData and SQL are not in conflict, like good and evil. Instead they are sometimes overlapping, but often complementary solutions that benefit from common paradigms addressing different requirements and can and will coexist.


Page 1: SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)

© 2012 Microsoft

SQL AND NOSQL ARE TWO SIDES OF THE SAME COIN

Michael Rys, Microsoft Corp. @SQLServerMike

Strata 2012 Conference, March 2012

Page 2: SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)

AGENDA

• Scaling out your business is important!
• NoSQL Paradigms and NoSQL Platforms
• SQL learns from NoSQL (with a demo of SQL Azure Federations)
• NoSQL learns from SQL
• Scalable Data Processing Platform of the Future

Page 3: SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)

THE WEB 2.0 BUSINESS ARCHITECTURE

Online Business Application

Attract Individual Consumers:
- Provide interesting service
- Provide mobility
- Provide social

Monetize Individual:
- Upsell service
- VIP
- Speed
- Extra Capabilities

Monetize the Social:
- Improve individual experience
- Re-sell Aggregate Data (e.g., Advertisers)

Page 4: SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)

SOCIAL NETWORKING: THE BUSINESS PROBLEM

• 100s of millions of users
• 10s of millions of users concurrently
• Terabytes to petabytes of data
  • Structured and unstructured
• Required (eventual) data consistency across users
  • E.g. show your updated state in your friends’ profile pages

Page 5: SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)

SOLUTION

• Shard/Partition user data across hundreds to thousands of SQL Databases
• Propagate data changes from one DB to other DBs using reliable, async Message Service
  • Managing routes from each DB to every other DB would be too complex
  • Global Transactions would hinder scale and availability
• Provide a caching layer for performance
• And also used for
  o Clean-up state (e.g. on account close)
  o Deploy business logic (stored procedures)
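The pattern on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used by any of the systems mentioned: the dicts stand in for SQL databases, `queue.Queue` stands in for the reliable async message service, and the names `ShardedStore`, `shard_for`, `update_status`, and `deliver_pending` are hypothetical.

```python
import queue
from collections import defaultdict

SHARD_SIZE = 1000  # assumption: each shard owns a contiguous block of 1000 user IDs


def shard_for(user_id: int) -> int:
    """Return the index of the shard that owns user_id (IDs start at 1)."""
    return (user_id - 1) // SHARD_SIZE


class ShardedStore:
    def __init__(self, shard_count: int):
        # Each dict stands in for one SQL database.
        self.shards = [defaultdict(dict) for _ in range(shard_count)]
        # One in-process queue stands in for the reliable, async message service.
        self.outbox = queue.Queue()

    def update_status(self, user_id, status, friend_ids):
        # Local transaction: only the owner's shard is written synchronously.
        self.shards[shard_for(user_id)][user_id]["status"] = status
        # Changes for friends are queued instead of using a global transaction.
        for friend_id in friend_ids:
            self.outbox.put((friend_id, user_id, status))

    def deliver_pending(self):
        """Drain the queue; each message is applied to a single shard."""
        delivered = 0
        while not self.outbox.empty():
            friend_id, user_id, status = self.outbox.get()
            row = self.shards[shard_for(friend_id)][friend_id]
            row.setdefault("friend_status", {})[user_id] = status
            delivered += 1
        return delivered


store = ShardedStore(shard_count=6)
store.update_status(1024, "at Strata", friend_ids=[17, 3500])
stale = store.shards[shard_for(17)][17].get("friend_status")  # not yet propagated
count = store.deliver_pending()
fresh = store.shards[shard_for(17)][17]["friend_status"][1024]
```

Between `update_status` and `deliver_pending` the friends' shards still show the old state, which is the eventual consistency the business problem tolerates.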

Page 6: SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)

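EXAMPLE ARCHITECTURE

[Diagram: a Web Tier with a Service Dispatcher in front of a Data Tier of six databases sharded by userId range (1-1000, 1001-2000, 2001-3000, 3001-4000, 4001-5000, 5001-6000). "I change my status" (userId=1024): my DB gets updated in TX1, and async messages propagate the change to other databases in separate transactions (TX2, TX3, TX4, TX5).]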

Page 7: SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)

MANY LARGE SCALE CUSTOMERS USING SIMILAR PATTERNS

• Patterns
  • Sharding and reliable messaging
  • Sharding and fan-out query layer
  • Caching layer
• Customer Examples
  • Social Networking: Facebook, MySpace, etc.
  • Online electronic stores (cannot give names)
  • Travel reservation systems (e.g. Choice International)
  • MSN Casual Gaming
  • etc.

Page 8: SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)

LESSONS LEARNED FROM THESE SCENARIOS

• Require high availability
• Be able to scale out:
  • Functional and Data Partitioning Architecture
  • Provide scale-out processing:
    o Function shipping
    o Fanout and Map/Reduce processing
• Be able to deal with failures:
  o Quorum
  o Retries
  o Eventual Consistency (similar to Read-consistent Snapshot Isolation)
• Be able to quickly grow and change:
  • Elastic scale
  • Flexible, open schema
  • Multi-version schema support

Move better support for these patterns into the Data Platform!
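As an illustration of the "deal with failures" bullets, here is a minimal Python sketch of a quorum read with retries. It is one possible reading of those bullets, not something shown in the talk: the replica interface (a callable returning a `(version, value)` pair or raising `ConnectionError`) and the names `quorum_read` and `make_replica` are assumptions made for the example.

```python
import time


def quorum_read(replicas, key, quorum, retries=3, delay=0.0):
    """Read key from all replicas; succeed once at least `quorum` have answered.

    Each replica is a callable returning (version, value) or raising
    ConnectionError. The answer with the highest version wins.
    """
    for _ in range(retries):
        answers = []
        for read in replicas:
            try:
                answers.append(read(key))
            except ConnectionError:
                continue  # a failed replica does not count toward the quorum
        if len(answers) >= quorum:
            return max(answers, key=lambda answer: answer[0])[1]
        time.sleep(delay)  # wait before the next attempt
    raise TimeoutError(f"no quorum for {key!r} after {retries} attempts")


def make_replica(data, fail_times=0):
    """Hypothetical replica that fails its first `fail_times` reads."""
    state = {"failures_left": fail_times}

    def read(key):
        if state["failures_left"] > 0:
            state["failures_left"] -= 1
            raise ConnectionError("replica unavailable")
        return data[key]

    return read


replicas = [
    make_replica({"status": (2, "at Strata")}),
    make_replica({"status": (1, "on my way")}),                # stale replica
    make_replica({"status": (2, "at Strata")}, fail_times=5),  # currently down
]
value = quorum_read(replicas, "status", quorum=2)
```

With two of three replicas answering, the read succeeds and returns the newer version even though one replica is stale and one is down.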

Page 9: SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)

WHAT IS NOSQL ABOUT?

• NoSQL = operational and developer agility at low CapEx and OpEx!
• Low Cost
  • Free Open Source Stores, Community Support
  • Scale CapEx cost below customer growth rate
  • Web friendly developer model and tool chain, ease of use
• Processing Paradigms
  • High Availability (scalable Replication, Fast Failover, DR/GeoDR, tunable latency)
  • Scale-out (Sharding, Map-Reduce, Elasticity)
  • Performance (tuned for specific workloads, Caching, co-located compute with partitioned state)
  • Tunable/Eventual Consistency
• Data Model Paradigms
  • Data first: Flexible Schema
  • Low-impedance mismatch between programming and data model:
    o Key-Documents and Objects (BLOBs, JSON, XML, POJO)
    o Key-Wide Sparse Column Sets
    o Graphs (e.g., RDF)
• Range from devices, over OLTP Web 2.0 applications to BigData Analytics

Page 10: SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)

DATA MODELS

Data Model | Example Stores (apologies to the ones I did not list)
Simple Key-Value Pairs | Memcache, Redis, Dynamo, Voldemort, LevelDB, Azure Caching
Wide Sparse Column Sets | HyperTable, Big Table, Cassandra, HBase, Hyperbase, Amazon DynamoDB, Windows Azure Tables, SQL Server/Azure Sparse columns
BLOBs | Amazon S3, Oracle Berkeley NoSQL, Windows Azure Blob Store, SQL Server RBS/FileTable
JSON Documents | MongoDB, Couchbase, Riak, RavenDB
Graph | Neo4j, GraphDB, HypergraphDB, Stig, Intellidimension
Objects and XML Documents | Versant, Oracle Berkeley NoSQL, MarkLogic, existDB, EMC HiveDB, SQL Server/Azure, Oracle, IBM DB2
Extended Relational | Oracle, EMC SQLFire, IBM DB2, MySQL, Postgres, SQL Server/Azure

Page 11: SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)

WHAT CAN SQL LEARN FROM NOSQL?

• Low CapEx, Low OpEx
• Built-in tunable High-Availability
• Data scale-out (Sharding)
• Processing scale-out (Map-Reduce, Fan-Out, tunable consistency)
• Flexible Data Models
  • JSON (& XML) support
  • Sparse columns/Column sets
• Integrate with BigData Analytics (e.g., Hadoop)

Many Relational Database Systems are incorporating these learnings!

Page 12: SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)

EXAMPLE: SQL AZURE FEDERATIONS

• Provides Data Partitioning/Sharding at the Data Platform
• Enables developers to build elastic scale-out applications
• Provides non-blocking SPLIT/DROP for shards (MERGE to come later)
• Auto-connect to right shard based on sharding key value
• Provides SPLIT resilient query mode

Page 13: SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)

SQL AZURE FEDERATION CONCEPTS

Federation
- Represents the data being sharded

Federation Root
- Database that logically houses federations, contains federation metadata

Federation Key
- Value that determines the routing of a piece of data (defines a Federation Distribution)

Atomic Unit
- All rows with the same federation key value: always together!

Federation Member (aka Shard)
- A physical container for a set of federated tables for a specific key range and reference tables

Federated Table
- Table that contains only atomic units for the member’s key range

Reference Table
- Non-sharded table

[Diagram: a Sharded Application connects through the Connection Gateway to an Azure DB with Federation Root (Federation Directories, Federation Users, Federation Distributions, …). Federation “Orders_Fed” (Federation Key: CustomerID) has three members: Member PK [min, 100) holding atomic units PK=5, 25, 35; Member PK [100, 488) holding atomic units PK=105, 235, 365; Member PK [488, max) holding atomic units PK=555, 2545, 3565.]
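The routing and SPLIT concepts can be modeled with a short Python sketch. This is a toy model under stated assumptions and not the SQL Azure Federations API: the class `Federation` and its methods `member_index`, `insert`, `split_at`, and `range_of` are hypothetical names, and members are plain dicts keyed by federation key value. The key values and split points are the ones from the diagram.

```python
import bisect


class Federation:
    """Toy model of a range-partitioned federation (not the SQL Azure API)."""

    def __init__(self, name, key_name):
        self.name = name
        self.key_name = key_name
        self.boundaries = []  # sorted split points, e.g. [100, 488]
        self.members = [{}]   # one dict of atomic units per member

    def member_index(self, key):
        # bisect_right sends key == boundary to the upper member: [low, high)
        return bisect.bisect_right(self.boundaries, key)

    def insert(self, key, row):
        # All rows with the same key (one atomic unit) land in the same member.
        self.members[self.member_index(key)].setdefault(key, []).append(row)

    def split_at(self, boundary):
        """Split the member containing `boundary` into two members."""
        if boundary in self.boundaries:
            raise ValueError("already a split point")
        i = self.member_index(boundary)
        old = self.members[i]
        low = {k: v for k, v in old.items() if k < boundary}
        high = {k: v for k, v in old.items() if k >= boundary}
        self.members[i:i + 1] = [low, high]
        bisect.insort(self.boundaries, boundary)

    def range_of(self, i):
        low = self.boundaries[i - 1] if i > 0 else "min"
        high = self.boundaries[i] if i < len(self.boundaries) else "max"
        return (low, high)


fed = Federation("Orders_Fed", key_name="CustomerID")
for customer_id in (5, 25, 35, 105, 235, 365, 555, 2545, 3565):
    fed.insert(customer_id, {"order_id": 1})
fed.split_at(100)
fed.split_at(488)
layout = [sorted(member) for member in fed.members]
```

After the two splits, `layout` matches the three members in the diagram, and an atomic unit is never divided because a split only moves whole key values.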

Page 14: SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)

DEMO: MAP-REDUCE SCALE-OUT OVER SQL AZURE FEDERATIONS

• Sharded GamesInfo table using SQL Azure Federations
• Use a C# library that implements a Map/Reduce processor on top of SQL Azure Federations
• Mapper and Reducer are specified using SQL
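The slides do not show the C# library itself, so the following Python sketch only illustrates the idea of a mapper and reducer that are both written in SQL. Everything concrete here is an assumption: in-memory `sqlite3` databases stand in for federation members, the `GamesInfo` columns are invented for the example, and `make_shard` and `map_reduce` are hypothetical helper names.

```python
import sqlite3


def make_shard(rows):
    """Create an in-memory database that stands in for one federation member."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE GamesInfo (player_id INTEGER, game TEXT, score INTEGER)")
    db.executemany("INSERT INTO GamesInfo VALUES (?, ?, ?)", rows)
    return db


def map_reduce(shards, mapper_sql, reducer_sql):
    """Run mapper_sql on every shard, then reducer_sql over the combined output."""
    combined = sqlite3.connect(":memory:")
    combined.execute("CREATE TABLE mapped (game TEXT, total INTEGER, plays INTEGER)")
    for shard in shards:
        # Map phase: each shard computes a partial aggregate locally.
        partial = shard.execute(mapper_sql).fetchall()
        combined.executemany("INSERT INTO mapped VALUES (?, ?, ?)", partial)
    # Reduce phase: merge the partial aggregates from all shards.
    return combined.execute(reducer_sql).fetchall()


MAPPER = "SELECT game, SUM(score), COUNT(*) FROM GamesInfo GROUP BY game"
REDUCER = (
    "SELECT game, SUM(total) * 1.0 / SUM(plays) AS avg_score "
    "FROM mapped GROUP BY game ORDER BY game"
)

shards = [
    make_shard([(5, "chess", 10), (25, "chess", 20), (35, "poker", 30)]),
    make_shard([(105, "chess", 30), (235, "poker", 50)]),
]
result = map_reduce(shards, MAPPER, REDUCER)
```

The mapper returns sums and counts rather than per-shard averages, so the reducer can compute a correct global average from the partial results.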

Page 15: SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)

WHAT CAN NOSQL LEARN FROM SQL?

• Flexible data is good, but:
  • Provide optional schema in data platform to help with constraints and optimizations
• Procedural Scale-Out processing is good, but:
  • Develop a declarative language suited for and across the data models (e.g., coSQL)
  • Standardize suitable abstractions and languages
• Eventual Consistency is good, but:
  • Provide users the choice
• Simple Queries are good, but:
  • Provide me with secondary indexes
  • It will be more efficient to join between two collections of JSON documents in the query engine than in the application layer

Many NoSQL Database Systems are starting to incorporate these learnings!
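To make the secondary-index request concrete, here is a minimal Python sketch of a key-document store that can answer a query on a non-key field either by scanning every document or through an index. It is an illustration only and does not describe any particular NoSQL product: `DocumentStore`, `create_index`, `put`, and `find` are hypothetical names, and indexed field values are assumed to be hashable scalars.

```python
import json
from collections import defaultdict


class DocumentStore:
    """Toy key-document store with optional secondary indexes."""

    def __init__(self):
        self.docs = {}     # primary key -> JSON text
        self.indexes = {}  # field name -> {field value -> set of primary keys}

    def create_index(self, field):
        index = defaultdict(set)
        for key, text in self.docs.items():
            value = json.loads(text).get(field)
            if value is not None:
                index[value].add(key)
        self.indexes[field] = index

    def put(self, key, doc):
        # Remove index entries of the document being overwritten, if any.
        old_text = self.docs.get(key)
        if old_text is not None:
            old_doc = json.loads(old_text)
            for field, index in self.indexes.items():
                if old_doc.get(field) is not None:
                    index[old_doc[field]].discard(key)
        self.docs[key] = json.dumps(doc)
        for field, index in self.indexes.items():
            if doc.get(field) is not None:
                index[doc[field]].add(key)

    def find(self, field, value):
        if field in self.indexes:
            keys = self.indexes[field].get(value, set())  # index lookup
        else:
            keys = [k for k, text in self.docs.items()     # full scan
                    if json.loads(text).get(field) == value]
        return sorted(keys)


store = DocumentStore()
store.put("u1", {"name": "Ann", "city": "Seattle"})
store.put("u2", {"name": "Bob", "city": "Zurich"})
store.put("u3", {"name": "Cy", "city": "Seattle"})
scan_result = store.find("city", "Seattle")   # no index yet: parses every document
store.create_index("city")
store.put("u2", {"name": "Bob", "city": "Seattle"})
index_result = store.find("city", "Seattle")  # answered from the index
```

Without the index every query parses all documents; with it the store does the extra bookkeeping on writes, which is the kind of trade-off relational engines have long managed on the user's behalf.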

Page 16: SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)

THE WEB 2.0 BUSINESS ARCHITECTURE

Online Business Application

Attract Individual Consumers:
- Provide interesting service
- Provide mobility
- Provide social

Monetize Individual:
- Upsell service
- VIP
- Speed
- Extra Capabilities

Monetize the Social:
- Improve individual experience
- Re-sell Aggregate Data (e.g., Advertisers)

Page 17: SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)

SCALE-OUT DATA PLATFORM ARCHITECTURE

[Diagram: a SQL or NoSQL Store made up of three Primary Shards, each with two Readable Replicas, serving three kinds of workloads. Arrows in the diagram are labeled Copy and Query.]

• OLTP Workloads
  • Highly Available, High Scale, High Flexibility
  • Mostly touching 1 to a low number of shards
• Dynamic OLAP Workloads
  • 3Vs (Volume, Velocity, Variety), Exploratory
  • Scale-out queries, often using eventually consistent scale-out frameworks like Hadoop
• Traditional OLAP Workloads
  • Known schema
  • Data warehouse, “Star joins”

Page 18: SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)

BIG DATA REQUIRES AN END-TO-END APPROACH


Page 19: SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)

CALL TO ACTION

• Familiarize yourself with the NoSQL genes in the Microsoft Online Platform
  • Free 3-Month Trial for Windows and SQL Azure: http://www.windowsazure.com
• Engage with us throughout Strata
• Download slides with additional information and related resources: http://www.slideshare.net/MichaelRys/presentations

Presentation | Speaker | Date and Time
Do We Have the Tools We Need to Navigate the New World of Data? | Dave Campbell | 2/29 9:00am PST
Onsite Interview * | Tim O’Reilly, Dave Campbell | 2/29 10:15am PST
Unleash Insights on All Data With Microsoft Big Data | Alexander Stojanovic | 2/29 11:30am PST
Office Hours (Q&A session) | Dave Campbell | 2/29 1:30pm PST
Hadoop + Javascript: What We Learned | Asad Khan | 2/29 2:20pm PST
Democratizing BI at Microsoft: 40,000 Users and Counting | Kirkland Barrett | 3/1 10:40am PST
Data Marketplaces For Your Extended Enterprise | Piyush Lumba | 3/1 2:20pm PST

Page 20: SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)


APPENDIX

Page 21: SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)

RELATED RESOURCES

• Scale-Out with SQL Databases
  • http://gigaom.com/cloud/facebook-shares-some-secrets-on-making-mysql-scale/
  • Windows Gaming Experience Case Study: http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=4000008310
  • Scalable SQL: http://cacm.acm.org/magazines/2011/6/108663-scalable-sql
  • http://www.slideshare.net/MichaelRys/scaling-with-sql-server-and-sql-azure-federations
• NoSQL and the Windows Azure Platform
  • Whitepaper: http://download.microsoft.com/download/9/E/9/9E9F240D-0EB6-472E-B4DE-6D9FCBB505DD/Windows%20Azure%20No%20SQL%20White%20Paper.pdf
  • SQL Federation blog: http://blogs.msdn.com/b/cbiyikoglu/archive/2011/03/03/nosql-genes-in-sql-azure-federations.aspx
• Contact me
  • @SQLServerMike
  • http://sqlblog.com/blogs/michael_rys/default.aspx