65
M D The CIO's Guide to NoSQL Dan McCreary July 2011 Version 5

The CIOs Guide to NoSQL

Embed Size (px)

Citation preview

Page 1: The CIOs Guide to NoSQL

M

D

The CIO's Guide to NoSQL

Dan McCrearyJuly 2011Version 5

Page 2: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

2

Agenda

• Historical Context• The Business Case for NoSQL• Terminology• How NoSQL is Different• Key NoSQL Products• Call to Action: The NoSQL Pilot Project• The Future of NoSQL

Page 3: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

3

Background for Dan McCreary

• Bell Labs• NeXT Computer (Steve Jobs)• Owner of Custom Object-

Oriented Software Consultancy• Federal data integration

(National Information Exchange Model)

• Native XML/XQuery – 2006• Advocate of NoSQL/XRX

systems

Page 4: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

4

NoSQL Training Areas

The CIO'sGuide toNoSQL

Project Manager'sGuide to NoSQL

Transitioningto NoSQL

ArchitecturalTradeoff Modeling

XQueryMapReduce HadoopFunctionalProgramming

Managers

Architects/Project

Managers

Developer

Track CourseYou Are

Here

Page 5: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

5

Sample of NoSQL Jargon

Document orientation

Schema free

MapReduce

Horizontal scaling

Sharding and auto-sharding

Brewer's CAP Theorem

Consistency

Reliability

Partition tolerance

Single-point-of-failure

Object-Relational mapping

Key-value stores

Column stores

Document-stores

Memcached

Indexing

B-Tree

Configurable durability

Documents for archives

Functional programming

Document Transformation

Document Indexing and Search

Alternate Query Languages

Aggregates

OLAP

XQuery

MDX

RDF

SPARQL

Architecture Tradeoff Modeling

ATAM

Note that within the context of NoSQL many of these terms have different meanings!

Page 6: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

6

Selecting a Database…

"Selecting the right data storage solution is no longer a trivial task."

Does it look like

document?

Use MicrosoftOffice

Use theRDBMS

Start

Stop

No

Yes

Page 7: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

7

Pressures on SQL Only Systems

SQLOLAP/BI/DataWarehouse

Social Networks

Scalability

AgileSchema

FreeDocument-D

ata

Large Data Sets Reliability

Linked Data

Page 8: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

8

Simplicity is a Virtue

• Many systems derive their strength by dramatically limiting the features in their system

• Simplicity allows database designers to focus on the primary business driver

• Examples:– Touch screen interfaces– Key/Value data stores

Page 9: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

9

Historical Context

Mainframe Era

• 1 CPU• COBOL and FORTRAN• Punchcards and flat files• $10,000 per CPU hour

Commodity Processors

• 10,000 CPUs• Functional programming• MapReduce "farms"• Pennies per CPU hour

Page 10: The CIOs Guide to NoSQL

M

D

Two Approaches to Computation

10Copyright 2010 Dan McCreary & Associates

Alonzo ChurchJohn Von Neumann

Manage state with a program counter. Make computations act like math functions.

Which is simpler? Which is cheaper? Which will scale to 10,000 CPUs?

1930s and 40s

Page 11: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

11

Standard vs. MapReduce Prices

http://aws.amazon.com/elasticmapreduce/#pricing

John's Way Alonzo's Way

Page 12: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

12

MapReduce CPUs Cost Less!

0

5

10

15

20

25

30

35

40Cost Per CPU Hour (Cents)

http://aws.amazon.com/elasticmapreduce/#pricing

Cuts cost from 32 to 6 cents per CPU hour!Perhaps Alanzo was right!

82% CostReduction!

Why? (hint: how "shareable" is this process)

Page 13: The CIOs Guide to NoSQL

M

DKelly-McCreary & Associates, LLC

13

Perspectives

Native XML

OLAPMDX

ObjectStores

GraphStores

NoSQL for Web 2.0

and BigData

Perspective depends on your context

Page 14: The CIOs Guide to NoSQL

M

DKelly-McCreary & Associates, LLC

14

Architectural Tradeoffs

"I want a fast car with good mileage."

"I want a scaleable database with low cost that runs well on the 1,000 CPUs in our data center."

Page 15: The CIOs Guide to NoSQL

M

DKelly-McCreary & Associates, LLC

15

Recent History

• The term NoSQL became re-popularized around 2009

• Used for conferences of advocates of non-relational databases

• Became a contagious idea "meme"• First of many "NoSQL meetups" in San

Francisco organized by Jon Oskarsson• Conversion from "No SQL" to "Not Only

SQL" in recent year

Page 16: The CIOs Guide to NoSQL

M

DKelly-McCreary & Associates, LLC

16

NoSQL on Google Trends

Page 17: The CIOs Guide to NoSQL

M

DKelly-McCreary & Associates, LLC

17

NoSQL and Web 2.0 Startups

• Many web 2.0 startups did not use Oracle or MySQL

• They built their own data stores influenced by Amazon’s Dynamo and Google’s BigTable in order to store and process huge amounts of data

• In the social community or cloud computing applications, most of these data stores became OpenSource software

Page 18: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

18

Google MapReduce

• 2004 paper that had huge impact of functional programming in the entire community

• Copied by many organizations, including Yahoo

Page 19: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

19

Google Bigtable Paper

• 2006 paper that gave focus to scaleable databases

• designed to reliably scale to petabytes of

data and thousands of machines

Page 20: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

20

Amazon's Dynamo Paper

• Werner Vogels• CTO - Amazon.com• October 2, 2007• Used to power Amazon's

S3 service• One of the most

influential papers in the NoSQL movement

Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall and Werner Vogels, “Dynamo: Amazon's Highly Available Key-Value Store”, in the Proceedings of the 21st ACM Symposium on Operating Systems Principles, Stevenson, WA, October 2007.

Page 21: The CIOs Guide to NoSQL

M

DKelly-McCreary & Associates, LLC

21

NoSQL "Meetups"

“NoSQLers came to share how they had overthrown the tyranny of slow, expensive relational databases in favor of more efficient and cheaper ways of managing data.”

Computerworld magazine, July 1st, 2009

Page 22: The CIOs Guide to NoSQL

M

DKelly-McCreary & Associates, LLC

22

Key Motivators

• Licensing RDBMS on multiple CPUs• The Thee "V"s

– Velocity – lots of data arriving fast– Volume – web-scale BigData– Variability – many exceptions

• Desire to escape rigid schema design• Avoidance of complex Object-Relational

Mapping (the "Vietnam" of computer science)

Page 23: The CIOs Guide to NoSQL

M

DCopyright 2008 Dan McCreary & Associates

23

Many Processes Today Are Driven By…

The constraints of yesterday…

Challenge:Ask ourselves the question…Do our current method of solving problems with tabular data…Reflect the storage of the 1950s…Or our actual business requirements?What structures best solve the actual business problem?

Page 24: The CIOs Guide to NoSQL

M

DCopyright 2008 Dan McCreary & Associates

24

No-Shredding!

• Relational databases take a single hierarchical document and shred it into many pieces so it will fit in tabular structures

• Document stores prevent this shredding

MyData

Page 25: The CIOs Guide to NoSQL

M

DCopyright 2008 Dan McCreary & Associates

25

Is Shredding Really Necessary?

• Every time you take hierarchical data and put it into a traditional database you have to put repeating groups in separate tables and use SQL “joins” to reassemble the data

Page 26: The CIOs Guide to NoSQL

M

DKelly-McCreary & Associates, LLC

26

Object Relational Mapping

• T1 – HTML into Objects• T2 –Objects into SQL Tables• T3 – Tables into Objects• T4 – Objects into HTML

T1

T3

T2

T4

Object MiddleTier

RelationalDatabaseWeb Browser

Page 27: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

27

"The Vietnam of Applications"

• Object-relational mapping has become one of the most complex components of building applications today

• A "Quagmire" where many projects get lost• Many "heroic efforts" have been made to

solve the problem:– Hibernate– Ruby on Rails

• But sometimes the way to avoid complexity is to keep your architecture very simple

Page 28: The CIOs Guide to NoSQL

M

D

Document Stores Need No Translation

• Documents in the database• Documents in the application• No object middle tier• No "shredding"• No reassembly• Simple!

28Copyright 2010 Dan McCreary & Associates

Application Layer Database

Document Document

Page 29: The CIOs Guide to NoSQL

M

D

Zero Translation (XML)

• XML lives in the web browser (XForms)• REST interfaces• XML in the database (Native XML, XQuery)• XRX Web Application Architecture• No translation!

29Copyright 2010 Dan McCreary & Associates

Web Browser XML database

XForms REST-Interfaces

Page 30: The CIOs Guide to NoSQL

M

D

"Schema Free"

• Systems that automatically determine how to index data as the data is loaded into the database

• No a priori knowledge of data structure• No need for up-front logical data modeling

– …but some modeling is still critical

• Adding new data elements or changing data elements is not disruptive

• Searching millions of records still has sub-second response time

30Copyright 2010 Dan McCreary & Associates

Page 31: The CIOs Guide to NoSQL

M

D

Monoculture and Mono-architecture

31

Copyright 2010 Dan McCreary & Associates

Image Source: Wikipedia

Page 32: The CIOs Guide to NoSQL

M

DKelly-McCreary & Associates, LLC

32

Eric Evans

“The whole point of seeking alternatives [to RDBMS systems] is that you need to solve a problem that relational databases are a bad fit for.”

Eric EvansRackspace

Page 33: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

33

Evolution of Ideas in OpenSource

• How quickly can new ideas be recombined into new database products?• OpenSource software has proved to be the most efficient way to quickly

recombine new ideas into new products

Product A

Product B

Product B

OpenSource

Proprietary SoftwareNew Database Ideas

Schema-free

MapReduceAuto-sharding

New Products

Cloud Computing

Page 34: The CIOs Guide to NoSQL

M

D 34Copyright 2010 Dan McCreary & Associates

Storage Architectural Patterns

Tables Trees

TriplesStars

Page 35: The CIOs Guide to NoSQL

M

D

Finding the Right Match

35

Copyright 2010 Dan McCreary & Associates

Schema-Free

Mature Query Language

Standards Compliant

Use CMU's Architectural Tradeoff and Modeling (ATAM) Process

Page 36: The CIOs Guide to NoSQL

M

DKelly-McCreary & Associates, LLC

36

Brewer's CAP Theorem

Consistency

Availability Partition Tolerance

You can not have all three so pick two!

Page 37: The CIOs Guide to NoSQL

M

DKelly-McCreary & Associates, LLC

37

Avoidance of Unneeded Complexity

• Relational databases provide a variety of features to ALWAYS support strict data consistency

• Rich feature set and the ACID properties implemented by RDBMSs might be more than necessary for particular applications and use cases

Page 38: The CIOs Guide to NoSQL

M

DKelly-McCreary & Associates, LLC

38

High Throughput

• Some NoSQL databases provide a significantly higher data throughput than traditional RDBMS

• Hypertable which pursues Google’s Bigtable approach allows the local search engine Zvent to store one billion data cells per day

• Google is able to process 20 petabytes a day stored in BigTable via it’s MapReduce approach

Page 39: The CIOs Guide to NoSQL

M

DKelly-McCreary & Associates, LLC

39

Complexity and Cost of Settingup Database Clusters

NoSQL databases are designedin a way that “PC clusters can be easily and cheaply expanded without the complexity and cost of ’sharding,’ which involves cutting up databases into multiple tables to run on large clusters or grids”.

Nati Shalom, CTO and founder of GigaSpaces

Page 40: The CIOs Guide to NoSQL

M

DKelly-McCreary & Associates, LLC

40

Compromising Reliability for Better Performance

• Shalom argues that there are “different scenarios where applications would be willing to compromise reliability for better performance.”

• Performance over reliability• Example: HTTP session data example

– “needs to be shared between various web servers but since the data is transient in nature (it goes away when the user logs off) there is no need to store it in persistent storage.”

Page 41: The CIOs Guide to NoSQL

M

DKelly-McCreary & Associates, LLC

41

"Once Size Fits…"

"One Size Does Not Fit All"James Hamilton Nov. 3rd, 2009

http://perspectives.mvdirona.com/CommentView,guid,afe46691-a293-4f9a-8900-5688a597726a.aspx

Page 42: The CIOs Guide to NoSQL

M

DKelly-McCreary & Associates, LLC

42

Different Thinking

Sequential Processing

• The output of any step can be used in the next step

• State must be carefully managed

Parallel Processing

• Each loop of XQuery FLOWR statements are independent thread (no side-effects)

Page 43: The CIOs Guide to NoSQL

M

DKelly-McCreary & Associates, LLC

43

Cloud Computing

• High scalability– Especially in the horizontal direction (multi

CPUs)

• Low administration overhead– Simple web page administration

Page 44: The CIOs Guide to NoSQL

M

DKelly-McCreary & Associates, LLC

44

Databases work well in the cloud

• Data warehousing specific databases for batch data processing and map/reduce operations

• Simple, scalable and fast key/value-stores• Databases containing a richer feature set

than key/value-stores fitting the gap with traditional

• RDBMS while offering good performance and scalability properties (such as document databases).

Page 45: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

45

Auto-Sharding

• When one database gets almost full it tells a "coordinator" system and the data automatically gets migrated to other systems

90% full

45% full

45% full

Before

After

Page 46: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

46

Scale Up vs. Scale Out

Scale Up• Make a single CPU as fast as

possible• Increase clock speed• Add RAM• Make disk I/O go faster

Scale Out• Make Many CPUs work

together• Learn how to divide your

problems into independent threads

Page 47: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

47

Functional Programming

• What does it mean to your IT staff?• What experience do they have in

functional programming?• Can they "unlearn" the habits of the

procedural world?

Page 48: The CIOs Guide to NoSQL

M

D

The NO-SQL Universe

48

Copyright 2010 Dan McCreary & Associates

Document StoresKey-Value Stores

Graph Stores

XML

Object Stores Column Stores

Page 49: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

49

Key Value Stores

• A table with two columns and a simple interface– Add a key-value– For this key, give me the

value– Delete a key

• Blazingly fast and easy to scale

Key Value

Page 50: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

50

Types of Key-Value Stores

• Eventually‐consistent Key‐Value store• Hierarchical Key-Value Stores• Key-Value Stores In RAM• Key Value Stores on Disk• Ordered Key-Value Stores

Page 51: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

51

Cassendra

• Apache open source project• Originally developed by Facebook• Designed for highly distributed high-

reliable systems• No single point of failure• Column-family data model

http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf

Page 52: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

52

Voldomort

• A distributed key-value system• Used at LinkedIn• 10K-20K node operations/CPU• Auto-sharding• Graceful server failure handling

Page 53: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

53

MongoDB

• Open Source License• Document/Collection centric• Sharding built-in, automatic• Stores data in JSON format• Query language is JSON• Can be 10x faster than MySQL• Many languages (C++, JavaScript, Java,

Perl, Python etc.)

Page 54: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

54

Hadoop/Hbase

• Open source implementation of MapReduce algorithm written in Java

• Initially created by Yahoo– 300 person-years development

• Column-oriented data store• Java interface• Hbase designed specifically to work with

Hadoop

Page 55: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

55

CouchDB

• Apache Document Store• Written in ERLANG• RESTful JSON API• Distributed, featuring robust, incremental

replication with bi-directional conflict detection and management

Page 56: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

56

Memcached

• Free & open source in-memory caching system• Designed to speeding up dynamic web applications by alleviating

database load• RAM resident key-value store for small chunks of arbitrary data

(strings, objects) from results of database calls, API calls, or page rendering

• Simple interface• Designed for quick deployment, ease of development• APIs in many languages

Page 57: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

57

MarkLogic

• Native XML database designed to used by Petabyte data stores

• ACID compliant• Heavy use by federal agencies, document

publishers and "high-variability" data• Arguably the most successful NoSQL

company

Page 58: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

58

eXist

• OpenSource native XML database• Strong support for XQuery and XQuery

extensions• Heavily used by the Text Encoding

Initiative (TEI) community and XRX/XForms communities

• Ideal for metadata management• Integrated Lucene search and structured

search

Page 59: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

59

Riak

• Community and Commercial licenses• A "Dynamo-inspired" database• Written in ERLANG• Query JSON or ERLANG

Page 60: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

60

Hypertable

• Open Source• Closely modeled after Google's Bigtable

project• High performance distributed data storage

system• Designed to support applications requiring

maximum performance, scalability, and reliability

• Hypertable Query Language (HQL) that is syntactically similar to SQL

Page 61: The CIOs Guide to NoSQL

M

D

Selecting a NoSQL Pilot Project

• The "Goldilocks Pilot Project Strategy"

• Not to big, not to small, just the right size

• Duration• Sponsorship• Importance• Skills• Mentorship

61Copyright 2010 Dan McCreary & Associates

Page 62: The CIOs Guide to NoSQL

M

DCopyright Kelly-McCreary & Associates, LLC

62

The Future of the NoSQL Movement

• Will data sets continue to grow at exponential rates?• Will new system options become more diverse?• Will new markets have different demands?• Will some ideas be "absorbed" into existing RDBMS vendors products?• Will the NoSQL community continue to be the place where new

database ideas and products are incubated?• Will the job of doing high-quality architectural tradeoffs analysis become

easier?

Growth Diversity

Page 63: The CIOs Guide to NoSQL

M

D

Start Finish

Using the Wrong Architecture

Credit: Isaac Homelund – MN Office of the Revisor

Page 64: The CIOs Guide to NoSQL

M

D

Using the Right Architecture

Start Finish

Find ways to remove barriers to empoweringthe non programmers on your team.

Page 65: The CIOs Guide to NoSQL

M

DKelly-McCreary & Associates, LLC

65

Questions

Dan McCreary

President, Kelly-McCreary & Associates

[email protected]