Database Choices Lynn Langit Jan 2014 – Startup Code Camp in the OC

Not only SQL - Database Choices


Deck from a talk at StartupCodeCamp at House of Devs in the OC, Jan 2014


Page 1: Not only SQL - Database Choices

Database Choices

Lynn Langit

Jan 2014 – Startup Code Camp in the OC

Page 2: Not only SQL - Database Choices

Data Expertise / Lynn Langit

• Industry awards
– Microsoft – MVP for SQL Server
– Google – GDE for Cloud Platform
– 10Gen – Master for MongoDB

• Practicing Architect

• Technical author / trainer
– Pluralsight – Google Cloud Series
– DevelopMentor – SQL Server 2012 Series
– 2 books on SQL Server BI
– Cloudera trainer (certified)

• Former MSFT FTE
– 4 years

Page 3: Not only SQL - Database Choices

Databases: Now a Menu of Choices

Page 4: Not only SQL - Database Choices

Data Pipeline

Clean Existing

Acquire New

Process All

Store Some

Query & Mine

Page 5: Not only SQL - Database Choices

Is Big Data = NoSQL and just Hadoop?

HUGE Hype factor since 2011

Apache Hadoop
• a software framework that supports data-intensive distributed applications under a free license
• enables applications to work with thousands of nodes and petabytes of data
• was inspired by Google's MapReduce and Google File System (GFS) papers

Page 6: Not only SQL - Database Choices

Hadoop in the Enterprise

Page 7: Not only SQL - Database Choices

How you ‘get’ Hadoop

Open source
• roll your own

Commercial distribution
• Cloudera
• MapR
• Hortonworks
• More…

Rent it via the cloud
• AWS
• HDInsight

Page 8: Not only SQL - Database Choices

Demo – AWS MapReduce

Page 9: Not only SQL - Database Choices

Working with Hadoop

Page 10: Not only SQL - Database Choices

About Hadoop MapReduce

Image from - https://developers.google.com/appengine/docs/python/images/mapreduce_mapshuffle.png
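The map, shuffle, and reduce stages pictured on this slide can be sketched in a few lines of plain Python. This is a single-process illustration of the programming model only, not Hadoop's API; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical and chosen for this example.

```python
from collections import defaultdict


def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run the three stages in order on an in-memory list of strings."""
    return reduce_phase(shuffle_phase(map_phase(documents)))


if __name__ == "__main__":
    docs = ["big data is big", "data is data"]
    print(word_count(docs))
```

In a real cluster the map and reduce functions run on many nodes and the shuffle moves data across the network, which is one reason the model favors batch work over interactive queries.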

Page 11: Not only SQL - Database Choices

The Hadoop on-premises market leader is Cloudera

Page 12: Not only SQL - Database Choices

Example Comparison: RDBMS vs. Hadoop

                     Traditional RDBMS        Hadoop / MapReduce
Data Size            Gigabytes (Terabytes)    Petabytes and greater
Access               Interactive and Batch    Batch – NOT Interactive
Updates              Read / Write many times  Write once, Read many times
Structure            Static Schema            Dynamic Schema
Integrity            High (ACID)              Low
Scaling              Nonlinear                Linear
Query Response Time  Can be near immediate    Has latency (due to batch processing)

Page 13: Not only SQL - Database Choices

“Small” BigData vs. “Big” BigData

On Premises: Hadoop, NoSQL, RDBMS

In the Cloud: Hadoop, NoSQL, RDBMS

Page 14: Not only SQL - Database Choices

But wait…

is there a relational database

that scales

that is cheap

that runs in the cloud?

Page 15: Not only SQL - Database Choices

DEMO - AWS Redshift

• About $1k per Terabyte per year - relational

Page 16: Not only SQL - Database Choices

Cloud-hosted NoSQL up to 50x CHEAPER

Page 17: Not only SQL - Database Choices

So many NoSQL options

• More than just the Elephant in the room
• Over 150 types of NoSQL databases

Page 18: Not only SQL - Database Choices

Flavors of NoSQL
• Key/Value Volatile
• Key/Value Persistent
• Wide-Column
• Document
• Graph

Page 19: Not only SQL - Database Choices

Key / Value Database
• Just keys and values
– No schema
• Persistent or Volatile
• Examples
– AWS DynamoDB
– Riak
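A minimal sketch of the key/value idea, assuming nothing beyond the Python standard library: values are looked up only by key, there is no schema, and the same interface can be volatile (memory only) or persistent (backed by a file). The class `KeyValueStore` and its JSON file format are hypothetical and are not how DynamoDB or Riak are implemented.

```python
import json
import os
import tempfile


class KeyValueStore:
    """Schema-less store: a string key maps to any JSON-serializable value."""

    def __init__(self, path=None):
        # path=None gives a volatile store; a file path makes it persistent.
        self.path = path
        self.data = {}
        if path and os.path.exists(path):
            with open(path) as f:
                self.data = json.load(f)

    def put(self, key, value):
        self.data[key] = value
        self._flush()

    def get(self, key, default=None):
        return self.data.get(key, default)

    def delete(self, key):
        self.data.pop(key, None)
        self._flush()

    def _flush(self):
        # Persistent mode rewrites the whole file on every change (simple, not fast).
        if self.path:
            with open(self.path, "w") as f:
                json.dump(self.data, f)


if __name__ == "__main__":
    cache = KeyValueStore()  # volatile
    cache.put("session:42", {"user": "lynn", "cart": [1, 2]})
    print(cache.get("session:42"))

    path = os.path.join(tempfile.mkdtemp(), "kv.json")
    KeyValueStore(path).put("greeting", "hello")  # persistent
    print(KeyValueStore(path).get("greeting"))    # survives a fresh instance
```

Note what is missing compared with an RDBMS: no joins, no secondary queries, no declared columns. That narrow interface is what makes this flavor easy to partition and scale.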

Page 20: Not only SQL - Database Choices

DEMO - AWS DynamoDB

• Key/Value store on the AWS cloud

Page 21: Not only SQL - Database Choices

File (BLOB) Storage Buckets in the Cloud

• Amazon – S3 or Glacier
• Google – Cloud Storage
• Microsoft Azure BLOBS

Page 22: Not only SQL - Database Choices

DEMO - Battle of the Buckets

• Google Cloud Storage VS.
• Windows Azure BLOBS VS.
• AWS S3 (Archiving) into AWS Glacier

Page 23: Not only SQL - Database Choices

Column Database

• Wide, sparse column sets
• Schema-light
• Examples:
– HBase w/Hadoop
– Google Cloud Datastore
– SQL Server Columnstore Indexes or SSAS Tabular Models

Page 24: Not only SQL - Database Choices

Types of Column Databases
• Column-families
– Non-relational
– Sparse
– Examples:
  • HBase
  • Cassandra
  • xVelocity (SQL 2012 Tabular)
• Column-stores
– Relational
– Dense
– Example:
  • SQL Server 2012 – Columnstore index
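The sparse versus dense distinction above can be illustrated with plain Python data structures. This is a conceptual sketch only: the sample data and the helper names `columns_present` and `total_by` are made up for this example and do not reflect the storage engines of HBase, Cassandra, or SQL Server.

```python
# Column-family style: each row key maps to its own sparse set of columns.
column_family = {
    "user:1": {"name": "Ada", "city": "Irvine"},
    "user:2": {"name": "Bo", "last_login": "2014-01-18"},
    "user:3": {"name": "Cy", "city": "Tustin", "plan": "pro"},
}


def columns_present(rows):
    """Return every column name used by at least one row."""
    return sorted({col for cols in rows.values() for col in cols})


# Column-store style: a dense relational table kept as one array per column.
column_store = {
    "order_id": [1, 2, 3, 4],
    "region": ["west", "east", "west", "east"],
    "amount": [120.0, 80.0, 50.0, 30.0],
}


def total_by(store, group_col, value_col):
    """Aggregate by scanning only the two columns the query needs."""
    totals = {}
    for group, value in zip(store[group_col], store[value_col]):
        totals[group] = totals.get(group, 0) + value
    return totals


if __name__ == "__main__":
    print(columns_present(column_family))
    print(total_by(column_store, "region", "amount"))
```

In the column-family case rows simply omit columns they do not have; in the column-store case every column has a value for every row, and an aggregate never touches `order_id` at all.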

Page 25: Not only SQL - Database Choices

DEMO – Google Cloud Datastore

Page 26: Not only SQL - Database Choices

DEMO – SQL Server ‘NoSQL’

• SQL Server 2012 Columnstore Index
• SQL Server 2012 Tabular Model (SSAS)

Page 27: Not only SQL - Database Choices

Document Database (MongoDB)
• document-oriented (collection of JSON documents) w/ semi-structured data
– Encodings include BSON, JSON, XML…
• binary forms (PDF, Microsoft Office documents such as Word, Excel…)
• Examples:
– MongoDB
– Couchbase
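To make "collection of JSON documents with semi-structured data" concrete, here is a small sketch using only Python lists, dicts, and the standard `json` module. The `products` collection and the helpers `find` and `find_with_tag` are hypothetical; they imitate the style of a document query but are not the MongoDB or Couchbase API.

```python
import json

# A "collection" is a list of documents; documents need not share fields.
products = [
    {"_id": 1, "name": "laptop", "tags": ["computer", "portable"],
     "specs": {"ram_gb": 16}},
    {"_id": 2, "name": "ebook", "format": "PDF"},
    {"_id": 3, "name": "tablet", "tags": ["portable"]},
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal every given criterion."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in criteria.items())]


def find_with_tag(collection, tag):
    """Return documents whose optional 'tags' array contains the tag."""
    return [doc for doc in collection if tag in doc.get("tags", [])]


if __name__ == "__main__":
    # Documents serialize directly to JSON, nested fields included.
    print(json.dumps(find(products, format="PDF"), indent=2))
    print([d["name"] for d in find_with_tag(products, "portable")])
```

Each document carries its own structure, so adding a field to one product requires no change to the others.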

Page 28: Not only SQL - Database Choices

Demo - MongoDB

Page 29: Not only SQL - Database Choices

Graph Databases

• a lot of many-to-many relationships
• recursive self-joins
• when your primary objective is quickly finding connections, patterns and relationships between the objects within lots of data
• Examples:
– Neo4j
– Google Freebase
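The "finding connections" use case can be sketched with an adjacency list and a breadth-first search. This is an illustration in plain Python under assumed sample data; the `knows` graph and the functions `shortest_path` and `friends_of_friends` are hypothetical, and a graph database such as Neo4j would express these as queries rather than hand-written traversals.

```python
from collections import deque

# Undirected "knows" relationships stored as an adjacency list.
knows = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat", "eve"},
    "eve": {"dan"},
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sorting neighbors makes the traversal order deterministic.
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


def friends_of_friends(graph, person):
    """People exactly two hops away: the relational self-join case."""
    direct = graph.get(person, set())
    two_hops = set().union(*(graph.get(f, set()) for f in direct))
    return two_hops - direct - {person}


if __name__ == "__main__":
    print(shortest_path(knows, "ann", "eve"))
    print(friends_of_friends(knows, "ann"))
```

In a relational schema each extra hop is another self-join on a relationship table, which is why deep traversals are the case where a graph store tends to pay off.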

Page 30: Not only SQL - Database Choices

DEMO – Neo4j

Page 31: Not only SQL - Database Choices

“Small” BigData vs. “Big” BigData

Hadoop

Key/Value or Column

Document or Graph

RDBMS

On Premise or In the Cloud

Page 32: Not only SQL - Database Choices

Cloud-hosted RDBMS

• AWS RDS – SQL Server, MySQL, Oracle
– Medium cost
– Solid feature set, e.g. backup, snapshot
– Use existing tooling

• Google – MySQL
– Lowest cost
– Most limited RDBMS functionality

• Microsoft – SQL Azure
– Highest cost

Page 33: Not only SQL - Database Choices

DEMO - AWS RDS

• SQL Server, MySQL or Oracle
• Essential to understand pricing models

Page 34: Not only SQL - Database Choices

Image - http://blog.outsourcing-partners.com/wp-content/uploads/2012/10/performance.png

Page 35: Not only SQL - Database Choices

NoSQL Applied

Use cases: Social Games, Product Catalogs, Social aggregators, Log Files, Line-of-Business

Database types (example products):
• Columnstore – HBase
• Key/Value – DynamoDB
• Document – MongoDB
• Graph – Neo4j
• RDBMS – SQL Server

Page 36: Not only SQL - Database Choices

Cloud Offerings – RDBMS AND NoSQL

                                AWS                               Google                               Microsoft
RDBMS                           RDS – all major                   MySQL                                SQL Azure
NoSQL buckets                   S3 or Glacier                     Cloud Storage                        Azure Blobs
NoSQL Key-Value                 DynamoDB                          Cloud Datastore                      Azure Tables
Streaming ML or (Mahout)        Custom EC2                        Prospective Search & Prediction API  StreamInsight
NoSQL Document or Graph         MongoDB on EC2                    Freebase                             MongoDB on Windows Azure
NoSQL – Column Hadoop (HBase)   Elastic MapReduce using S3 & EC2  none                                 HDInsight
Dremel/Warehousing              Redshift                          BigQuery                             none

Page 37: Not only SQL - Database Choices

But wait… how do I query NoSQL data?

Page 38: Not only SQL - Database Choices

Always MapReduce?

Page 39: Not only SQL - Database Choices

Can Excel help?

• Connector to Hadoop
• Data Explorer
• Data Quality Services
• Master Data Services
• Integration with Azure Data Market
• Visualize with PowerView
• Data Mining w/Predixion

Page 40: Not only SQL - Database Choices

Demo - Hadoop Connector to Excel

Page 41: Not only SQL - Database Choices

Other types of cloud data services

Hosting public datasets
• Pay to read
• Earn revenue by offering for read

Cleaning / matching (your) data
• ETL – Microsoft Data Explorer, Google Refine
• Data Quality – Windows Azure Data Market, InfoChimps, DataMarket.com

Page 42: Not only SQL - Database Choices

Collecting for “BigData”
• Sensors everywhere
• Structured, Semi-structured, Unstructured vs. Data Standards
• M2M
• Public Datasets
– Freebase
– Azure DataMarket
– Hilary Mason’s list

Page 43: Not only SQL - Database Choices

NoSQL To-Do List

Understand types of NoSQL databases
• Use NoSQL when business needs dictate
• Use the right type of NoSQL for your business problem

Try out NoSQL on the cloud
• Quick and cheap for behavioral data
• Mashup cloud datasets
• Good for specialized use cases, e.g. dev, test, training environments

Learn NoSQL access technologies & services
• New query languages, e.g. MapReduce, R, Infer.NET
• New query tools (vendor-specific) – Google Refine, Amazon Karmasphere, Microsoft Excel connectors, etc…
• Windows Azure Data Market, other public data markets

Page 44: Not only SQL - Database Choices

www.TeachingKidsProgramming.org
• Free Courseware (Java, Small Basic or C# [on Pluralsight])
• Do a Recipe Teach a Kid (Ages 10 ++)

• recipes)

Page 45: Not only SQL - Database Choices

Keep Learning
• Twitter: @LynnLangit
• YouTube: http://www.youtube.com/user/SoCalDevGal
• Hire me
– To help build your BI/Big Data solution
– To teach your team next gen BI
– To learn more about using NoSQL solutions