7 Databases in 70 Minutes: Overview of NoSQL in Azure


Lara Rubbelke
Technical Architect at Microsoft
Primary focus on data solutions in the cloud
@sqlgal
www.linkedin.com/in/lararubbelke/

Karen López #TEAMDATA
Karen has 20+ years of data and information architecture experience on large, multi-project programs.
She is a frequent speaker on data modeling, data-driven methodologies and pattern data models.
She wants you to love your data.

"The only reason for time is so that everything doesn't happen at once."
- Albert Einstein*

Session inspired by the book Seven Databases in Seven Weeks

Outcomes
We want you to leave here understanding:
• key concepts for hybrid database architectures
• database / datastore types
• reasons to go explore

This is NOT…
• a deep dive on any technology
• a comprehensive list
• a roadmap discussion

What We'll Cover
NoSQL 101
• Comparison to relational
• Not Only SQL (but really "Not SQL")
• Terminology: CAP, ACID, BASE, Schema, Cloud, Scale
Categories
• What they are
• Why you use them
• When you use them
• A little of how to use them

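Distributed Systems and the CAP Theorem
• Consistency
• Availability
• Partition Tolerance
A distributed data store cannot guarantee all three at once; when a network partition occurs, it must give up either consistency or availability.

Reading: Eric Brewer's CAP Theorem, and even better, "CAP Twelve Years Later"
Myth: Eric Brewer On Why Banks Are BASE Not ACID - Availability Is Revenue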
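The consistency-versus-availability choice under a partition can be made concrete with a toy model. This is a minimal sketch under stated assumptions, not a real replication protocol: two in-memory replicas and a `partitioned` flag. In the assumed "CP" mode a write that cannot reach the peer is refused; in "AP" mode it is accepted locally and the replicas diverge. `Replica` and `TwoReplicaStore` are hypothetical names.

```python
class Replica:
    """One copy of a tiny key-value store (hypothetical, for illustration)."""
    def __init__(self, name):
        self.name = name
        self.data = {}


class TwoReplicaStore:
    """Two replicas joined by a network link that can be partitioned."""
    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False

    def write(self, replica, key, value):
        """Write through `replica`; returns True if the write was accepted."""
        peer = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Peer unreachable: refuse the write so the copies never disagree.
                return False
            # AP: stay available and accept locally; the copies now disagree.
            replica.data[key] = value
            return True
        # No partition: apply the write to both copies synchronously.
        replica.data[key] = value
        peer.data[key] = value
        return True


cp = TwoReplicaStore("CP")
cp.partitioned = True
print(cp.write(cp.a, "price", 425.99))  # False: unavailable, but consistent

ap = TwoReplicaStore("AP")
ap.partitioned = True
print(ap.write(ap.a, "price", 425.99))  # True: available...
print(ap.b.data.get("price"))           # None: ...but replica b is stale
```

Without a partition both modes behave the same; the trade-off only appears when the link is down.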

BASE - ACID
BASE: Basically Available, Soft state, Eventually consistent
ACID: Atomic, Consistent, Isolated, Durable
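The "eventually consistent" half of BASE can be sketched in a few lines: two replicas accept writes independently (basically available, soft state) and converge once they exchange state. This is a minimal sketch assuming last-writer-wins by an integer timestamp supplied by the caller; that is only one possible conflict-resolution rule, and `LwwReplica` is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class LwwReplica:
    """A replica storing (timestamp, value) per key. Conflicts are resolved
    with last-writer-wins; integer timestamps are an assumption (real systems
    use clocks, vector clocks, or version stamps)."""
    data: dict = field(default_factory=dict)

    def put(self, key, value, ts):
        self.data[key] = (ts, value)

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other):
        """Anti-entropy step: adopt any entry that is newer than ours."""
        for key, (ts, value) in other.data.items():
            if key not in self.data or ts > self.data[key][0]:
                self.data[key] = (ts, value)


east, west = LwwReplica(), LwwReplica()
east.put("cart:42", ["camera"], ts=1)
west.put("cart:42", ["camera", "tripod"], ts=2)  # a later, concurrent write

print(east.get("cart:42"))  # ['camera']  (soft state: the replicas disagree)

# Exchange state in both directions; afterwards the replicas converge.
east.merge_from(west)
west.merge_from(east)
print(east.get("cart:42") == west.get("cart:42"))  # True
```

An ACID store would instead make the second write wait for (or fail against) the first; BASE trades that guarantee for availability.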

The And
Polyglot persistence
• Optimized for data
• Optimized for workload
Not all new
• EAV
• XML
• Architecture paradigm: OLAP/DW and OLTP

The Why
Polyschematic
• Multiple schemas over the same data
• Schema on read, not on write
• Data integrity may be managed elsewhere
* ALL DATA HAS STRUCTURE!
** EMBRACE DENORMALIZATION
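Schema on read means the raw data is stored untouched and each reader applies its own schema when it parses it, so several schemas can sit over the same data. A minimal sketch, assuming comma-separated telemetry lines; the sample rows (partly borrowed from the HBase examples later in this deck, partly hypothetical) and the two "view" functions are illustrative only.

```python
import csv
import io

# Raw data stored once, exactly as it arrived: no schema enforced on write.
# Hypothetical columns: trackingid, gender, age, devicestate, duration
RAW = """720,male,62,removed,100
721,male,40,replaced,50
722,female,,removed,75
"""


def demographics_view(raw):
    """One schema over the raw lines: tracking id, gender, age (age may be missing)."""
    for row in csv.reader(io.StringIO(raw)):
        yield {"trackingid": int(row[0]), "gender": row[1],
               "age": int(row[2]) if row[2] else None}


def interactions_view(raw):
    """A second schema over the same lines: tracking id, device state, duration."""
    for row in csv.reader(io.StringIO(raw)):
        yield {"trackingid": int(row[0]), "devicestate": row[3],
               "duration": int(row[4])}


print(list(demographics_view(RAW))[2])
# {'trackingid': 722, 'gender': 'female', 'age': None}
print(sum(r["duration"] for r in interactions_view(RAW)))  # 225
```

Nothing validated the data on the way in, so any integrity rules (such as "age must be present") live in the readers, which is the "managed elsewhere" point above.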

Kinect Telemetry Retail Application
• Reporting/Analysis: Hadoop batch processing
• Sensor Data: Column Family
• Price Check: Key-Value
• Product Catalog: Document Store { }

Data-Intensive Applications in the Cloud Computing World
• Activity Queue: Azure Storage
• Google Analytics Logs: Azure Storage
• Email DBs: SQL Azure x 16
• Username DBs: SQL Azure x 16
• User Profiles: SQL Azure x 400
• Activity Table: x 50 partitions, Azure Storage
• IIS Logs: Azure Storage
• Data Analysis: Staging Virtual Machine, Data Warehouse, Reporting Services
• Activity Processors: Worker Roles x 2
• Cache (Users and Friends Feed, Games and Leader Boards, Resources and References): Distributed Cache x 32
• Cache Tasks: Worker Roles x 4
• Back Office: Web Roles x 2
• Background Tasks DB, Utility DB, Content DB, Taxonomy DB: SQL Azure
• Web Application: Web Roles x 180
• Web Service/API: Web Roles x 2
• Moderation Service/Appliance
• CRISP/3rd Party

NoSQL, Not Only SQL
Relational, Key-Value, Column Family, Document, Hadoop, Graph
Relational: …Lots of other sessions to learn about this…

Key-Value Database
• Azure Tables
• Azure Redis Cache

Key-Value: Sample Use
Table: PriceCompare
LocationID (Partition Key) | ProductBySellerID (Row Key) | ProductProperties (Properties)
123 | 013803204131 | {Seller: "Camera Superstore", Price: 425.99, PriceDate: 2014-11-06, SellerType: "Online"}

Why Key-Value
Patterns/What Works
• Low cost, scalable, highly available and geo-redundant
• Flexible schema
• Fast reads and writes on single key values or partitioned key values
• Log data and cache
Anti-Pattern/Danger
Anything that requires:
• Joins
• Custom sorting
• Non-key filters
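Why key lookups are fast and non-key filters are a danger comes down to how entities are addressed. A minimal in-memory sketch, not the Azure Tables SDK: a dictionary of partitions keyed by PartitionKey, each holding entities keyed by RowKey, loaded with the PriceCompare row from the sample. `KeyValueTable` and its methods are hypothetical names.

```python
class KeyValueTable:
    """Hypothetical stand-in for a key-value table such as Azure Tables."""

    def __init__(self):
        # partition_key -> {row_key -> properties}; a partition is stored together.
        self._partitions = {}

    def upsert(self, partition_key, row_key, properties):
        self._partitions.setdefault(partition_key, {})[row_key] = dict(properties)

    def get(self, partition_key, row_key):
        """Point lookup on the full key: two dictionary accesses, no scan."""
        return self._partitions.get(partition_key, {}).get(row_key)

    def partition(self, partition_key):
        """Every entity in one partition, ordered by RowKey."""
        rows = self._partitions.get(partition_key, {})
        return [rows[rk] for rk in sorted(rows)]

    def scan(self, predicate):
        """A filter on a non-key property must visit every entity in every partition."""
        return [props
                for rows in self._partitions.values()
                for props in rows.values()
                if predicate(props)]


price_compare = KeyValueTable()
# Keys are strings, so the leading zero in the product id survives.
price_compare.upsert("123", "013803204131",
                     {"Seller": "Camera Superstore", "Price": 425.99,
                      "PriceDate": "2014-11-06", "SellerType": "Online"})

print(price_compare.get("123", "013803204131")["Price"])                   # 425.99
print(len(price_compare.scan(lambda p: p.get("SellerType") == "Online")))  # 1
```

The cost difference between `get` and `scan` grows with the table; that is why the slide lists non-key filters and custom sorting (anything other than RowKey order) as anti-patterns.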

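Azure Tables: LINQ Query

using System.Linq;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// 'account' is an existing CloudStorageAccount.
// Create a table client and get a reference to the PriceCompare table.
CloudTableClient tableClient = account.CreateCloudTableClient();
CloudTable priceCompareTable = tableClient.GetTableReference("pricecompare");

// Point query on the full key. PartitionKey and RowKey are strings,
// so both are compared as string literals (keeping the RowKey's leading zero).
IQueryable<DynamicTableEntity> query =
    from q in priceCompareTable.CreateQuery<DynamicTableEntity>()
    where q.PartitionKey == "123" && q.RowKey == "013803204131"
    select q;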

Document
• DocDB, MongoDB
• BSON & JSON
• Databases, Documents, Collections

Document: Persistence

Document Features
• Nested
• Arrays
• Keys & Values
• Text, text, text…
• Similar to XML patterns

Document: Query

http://docs.mongodb.org/manual/core/read-operations-introduction/

DocDB

MongoDB
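Document stores query by field paths into nested documents, and documents in one collection need not share a shape. A minimal pure-Python sketch of equality matching with dotted paths, loosely modeled on MongoDB query documents (the real semantics are in the MongoDB link above); `find`, `matches`, `get_path`, and the sample catalog are hypothetical.

```python
_MISSING = object()  # sentinel: "path not present in this document"


def get_path(doc, dotted):
    """Follow a dotted path such as 'specs.sensor.megapixels' through nested dicts."""
    current = doc
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def matches(doc, query):
    """True if every (path, value) pair in `query` matches the document.
    An array field matches when it contains the value (a simplified take on
    how MongoDB treats equality against arrays)."""
    for path, expected in query.items():
        actual = get_path(doc, path)
        if actual is _MISSING:
            return False
        if isinstance(actual, list):
            if expected not in actual and actual != expected:
                return False
        elif actual != expected:
            return False
    return True


def find(collection, query):
    return [doc for doc in collection if matches(doc, query)]


# Two products with different shapes living in the same collection.
catalog = [
    {"_id": 1, "name": "DSLR Camera", "tags": ["camera", "dslr"],
     "specs": {"sensor": {"megapixels": 24}, "weight_g": 765}},
    {"_id": 2, "name": "Tripod", "tags": ["accessory"],
     "specs": {"max_height_cm": 150}},
]

print([d["name"] for d in find(catalog, {"tags": "camera"})])               # ['DSLR Camera']
print([d["name"] for d in find(catalog, {"specs.sensor.megapixels": 24})])  # ['DSLR Camera']
```

Each document is read and returned whole, which is why document stores shine on "complete entity" reads and why joins and heavy aggregation fall to the application.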

Why Document
Patterns/What Works
• Variable data structures for the same type of entity
• Fast reads and writes on a complete entity set
• Highly nested data stories
• Partially completed workflows
• You love JavaScript
Anti-Pattern/Danger
Anything that requires:
• Joins
• Complex transactional needs
• Lots of aggregation

Document Use Cases
• Logs
• Pre-aggregated data
• Product Catalog
• Shopping Cart
• Travel Reservation

Column Family

Column Family Use Cases
• Sensor Data Analysis
• Real-time Query
• Web Indexer
• Message Systems
• Interactive Dashboards

Apache HBase Features
• Random and Consistent Real-Time Read/Write
• Automatic Sharding and Linear Scale
• Billions of Rows and Millions of Columns

Column Family Stores
A map of maps…
With Tables, Column Families, Rows, Columns, Values

[Diagram: "Don't Think About" / "Think About" / "Think About"]

Understanding BigTable
sparse | persistent | distributed | sorted | multidimensional

Person Table
Row Key | Columns
720 | gender -> male, age -> 62
721 | gender -> male, photo -> image
723 | video -> stream

{"trackingid": 720, "gender": "male", "age": 62}

Great Reference: Understanding HBase and Big Table

HBase: A map of maps…
{
  "720": {"age": "62", "gender": "male"},
  "721": {"age": "40", "gender": "male", "confidence": "0.65"},
  "722": {"gender": "female"},
  "723": {"age": "12", "gender": "female", "confidence": "0.65"},
  …
}
Row key: the outer keys. Sparse: each row stores only the columns it actually has.

HBase: Column Families
{
  "720": {
    "demographics": {"age": "62", "gender": "male"},
    "interactions": {"devicestate": "removed", "duration": "100"}
  },
  "721": {
    "demographics": {"age": "40", "gender": "male"},
    "interactions": {"devicestate": "replaced", "duration": "50"}
  }
  …
}
Multidimensional: each row groups its columns into the Demographics and Interactions column families.

HBase: Physical View of a Sorted Map
Sort order: Row Key, then Column Name, then Timestamp

telemetry
Row Key | Column Key | Timestamp | Value
720 | demographics:age | 1423234758774 | 62
720 | demographics:gender | 1423234758711 | male
721 | demographics:age | 1423234758946 | 22
721 | demographics:age | 1423234758725 | 32
721 | demographics:gender | 1423234758950 | female

Cell: {row, column, version} -> uninterpreted bytes
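The logical "map of maps" and the physical sorted cells can be tied together in a short sketch. This is an illustration under stated assumptions, not the HBase client API: a nested dictionary (row key, then "family:qualifier", then timestamp) loaded with the cells from the physical-view table, a "latest version" read, and a flattening step that sorts cells the way the slide shows (row, column, then newest timestamp first). Real HBase compares raw bytes; plain strings sort the same way for this data. `put`, `get_latest`, and `physical_view` are hypothetical names.

```python
from collections import defaultdict

# row key -> "family:qualifier" -> {timestamp: value}
table = defaultdict(lambda: defaultdict(dict))


def put(row, column, ts, value):
    table[row][column][ts] = value


def get_latest(row, column):
    """Newest version of a cell, i.e. the value with the highest timestamp."""
    versions = table.get(row, {}).get(column)
    if not versions:
        return None
    return versions[max(versions)]


def physical_view():
    """Flatten to (row, column, timestamp, value) cells: row key ascending,
    column ascending, timestamp descending (newest first)."""
    cells = [(row, col, ts, val)
             for row, cols in table.items()
             for col, versions in cols.items()
             for ts, val in versions.items()]
    return sorted(cells, key=lambda c: (c[0], c[1], -c[2]))


# Cells from the physical-view table (values kept as strings, like raw bytes).
put("720", "demographics:age", 1423234758774, "62")
put("720", "demographics:gender", 1423234758711, "male")
put("721", "demographics:age", 1423234758946, "22")
put("721", "demographics:age", 1423234758725, "32")
put("721", "demographics:gender", 1423234758950, "female")

print(get_latest("721", "demographics:age"))  # 22
for cell in physical_view():
    print(cell)
```

Row 721 holds two versions of `demographics:age`; a plain read returns the newer one, while the older version remains addressable by its timestamp.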

HBase: Query
And… HBase SDK for .NET

Apache Phoenix: SQL Skin over HBase
CREATE TABLE IF NOT EXISTS "kinecttelemetry" (
    "k" VARCHAR PRIMARY KEY,
    "age" VARCHAR,
    "gender" VARCHAR)
    DEFAULT_COLUMN_FAMILY='demographics';

Learn More: HBase on Azure
• Phoenix in 15 Minutes or Less
• Get started using HBase with Hadoop in HDInsight
• Analyze Real-Time Twitter Sentiment with HBase in HDInsight

Hadoop Zoo
• Distributed Storage (HDFS or Blob Storage)
• Distributed Processing (MapReduce)
• Scripting (Pig)
• SQL-like Query (HiveQL)
• SQL-like Query (Impala)
• Resource Scheduling (YARN)
• Real-Time (HBase)

Hadoop On Your Terms
• Cloudera Selects Microsoft Azure as a Preferred Cloud Platform
• Hortonworks Data Platform is now Microsoft Azure Certified
• Microsoft Azure HDInsight: 100% Apache Hadoop-based Service in the Cloud
• Qubole Partners with Microsoft Azure

Hadoop: Persistence
It's a text file…really

Hadoop Hive: External Table

Create Table
-- $containerName and $storageAccountName are placeholders, assumed to be
-- filled in by the calling script before the statement is submitted.
CREATE EXTERNAL TABLE irs_data_20082 (
    state string,
    zipcode string,
    agi_class int,
    n1 int,
    mars2 int,
    prep int,
    n2 int,
    numdep int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'wasb://$containerName@$storageAccountName.blob.core.windows.net/all/data/';

Query
SELECT state, zipcode, agi_class
FROM irs_data_20082;

Why Hadoop
Patterns/What Works
• Batch processing
• Map…and reduce
• Lots of aggregation
• Multiple schemas on same data
• Fast (high-throughput processing of large data sets)
Anti-Pattern/Danger
Anything that requires:
• Joins
• Complex transactional needs
• Granular security requirements
• Not a relational database replacement
• Not fast for low-latency, interactive queries
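The "Map…and reduce" pattern is easiest to see in miniature. A single-process sketch of the three stages (map, shuffle, reduce), not Hadoop's API: it totals the number of returns per state. The sample lines are hypothetical and use only the first four columns of the IRS table layout (state, zipcode, agi_class, n1).

```python
from collections import defaultdict
from itertools import chain

# Hypothetical comma-delimited lines: state, zipcode, agi_class, n1 (returns)
LINES = [
    "WA,98052,1,1200",
    "WA,98052,2,900",
    "MN,55401,1,700",
    "MN,55402,3,300",
]


def map_phase(line):
    """Emit (key, value) pairs: here (state, number of returns)."""
    fields = line.split(",")
    yield fields[0], int(fields[3])


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine the values for one key: here, a sum."""
    return key, sum(values)


pairs = chain.from_iterable(map_phase(line) for line in LINES)
result = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(result)  # {'WA': 2100, 'MN': 1000}
```

On a cluster the map calls run in parallel next to the data blocks and the shuffle moves data across the network, which is where both the throughput and the latency of batch jobs come from.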

Resources for Hadoop on Azure
• http://azure.microsoft.com/en-us/documentation/services/hdinsight/
• http://vision.cloudera.com/cloudera-on-azure/
• http://hortonworks.com/labs/microsoft/

Graph
• Neo4j
• Project Naiad (MSR to Open Source)

Graph Database
CREATE and Query examples: http://neo4j.com/docs/stable/cypherdoc-tv-shows.html

Why Graph
Patterns/What Works
• Highly connected data
• Relationships make the data story
• Paths through data
• Finding shortest/longest path
Anti-Pattern/Danger
• Loosely connected data (e.g., log data)
• A very high volume of updates on a regular basis
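"Paths through data" is the kind of question a graph store answers natively. A minimal sketch, assuming an in-memory adjacency list rather than any particular graph database: breadth-first search finds a shortest path (fewest hops) on an unweighted graph. The graph and names are hypothetical. Longest simple paths are a much harder problem and are not attempted here.

```python
from collections import deque

# Hypothetical social graph as an adjacency list of undirected "knows" edges.
KNOWS = {
    "Ann": ["Bob", "Cara"],
    "Bob": ["Ann", "Dev"],
    "Cara": ["Ann", "Dev", "Eli"],
    "Dev": ["Bob", "Cara", "Fay"],
    "Eli": ["Cara"],
    "Fay": ["Dev"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: on an unweighted graph, the first time `goal` is
    dequeued it was reached by a fewest-hop path. Returns None if unreachable."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


print(shortest_path(KNOWS, "Ann", "Fay"))  # ['Ann', 'Bob', 'Dev', 'Fay']
```

In a relational model each hop would be another self-join; in a graph store each hop follows stored relationships directly.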

Use Cases for Graph Databases
• FoaF (Social Graph)
• Market Basket Analysis
• Forensics
• Fraud Detection
• Recommendations

7 Reasons to Go Explore
• It's fun
• Database technologies aren't YES/NO decisions
• It's inexpensive to learn
• It's fast to spin up a learning environment
• A data professional needs to know more than one tool
• Using the right tool for the right job is key
• It's fun

Go Explore!
MicrosoftAzure.com
• MSDN Subscription Benefit
• Trial Accounts
