Azure DocumentDB: Deep Dive into Advanced Features Aravind Ramachandran Program Manager Azure DocumentDB @arkramac Andrew Liu Program Manager Azure DocumentDB @aliuy8

[PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Page 1: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Azure DocumentDB: Deep Dive into Advanced Features

Aravind Ramachandran, Program Manager, Azure DocumentDB, @arkramac

Andrew Liu, Program Manager, Azure DocumentDB, @aliuy8

Page 2: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

A Quick Recap…

Page 3: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

3 V’s of data : Endless possibilities

Learning

Gaming

Retail

Telematics

Mobile Apps

IoT

Velocity: High throughput with low latency

Volume: Massive amounts of data

Variety: Schema-freedom

Page 4: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

The 2x2s of database tradeoffs

[Figure: six 2x2 tradeoff charts, each pairing two axes]

• Latency (Low / High) vs. Durability (Low / High)
• Schema/Index Management (Agnostic / Required) vs. Query (Poor / Rich)
• Availability (Low / High) vs. Programmability (Low / High)
• Scale (Elastic / Static) vs. Distribution (Single DC / World)
• Scale (Low / High) vs. Txn Scope (Single item / Multiple items)
• Performance Isolation (Airtight / Noisy Neighbor) vs. TCO (Low / High)

Page 5: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

DocumentDB: Capabilities

Guaranteed low latency
• <10 ms reads / <15 ms writes at P99
• Requests are served from the local region
• Write-optimized, latch-free database engine designed for SSDs and low-latency access
• Synchronous and automatic document indexing at sustained ingestion rates

Elastic and limitless global scale
• Independently scale throughput and storage, locally and globally
• Transparent partition management and routing

Multiple consistency levels
• Multiple well-defined consistency levels
• Intuitive programming model for relaxed consistency models
• Clear PACELC tradeoffs and 99.99% availability SLAs

SQL and JavaScript: schema free
• Automatic tree-path-based indexing
• No schemas or secondary indices required upfront
• SQL and JavaScript language-integrated queries
• Hash, range, and spatial
• Multi-document, JavaScript language-integrated transactions

Page 6: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

DocumentDB resource model

1. Resources
• Identified by their logical and stable URI
• Represented as JSON documents
• Partitioned, and span machines, clusters and regions

2. Resource model
• Stateless interaction (HTTP and TCP)
• Hierarchical overlay atop the partitioning model

3. Partitioning model
• Grid partitioning: horizontal based on hash/range, and vertical across regions
• Each partition is made highly available via a replica set

[Diagram: replica sets form partitions (local distribution within a region); partition sets span the regions US-East, US-West and N Europe (global distribution)]

Page 7: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Accessing DocumentDB

[Diagram: clients reach the DocumentDB Service by two routes]

• DocumentDB client SDKs and tools (Java, .NET, …) and the DocumentDB Hadoop and Spark connectors: JSON, SQL, JavaScript over TCP/SSL or HTTPS
• MongoDB toolchain, MongoDB client drivers (Java, .NET, Ruby, …), Parse SDK: BSON over the MongoDB wire protocol

Page 8: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Let’s talk about…

• Modeling JSON Documents

• Collections and Scaling

• Query and Indexing

• Global Distribution

• Tips and Best Practices

Everything you need to know to build blazing fast, planet-scale applications!

Page 9: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Let’s talk about JSON documents

"With great power comes great responsibility" - Uncle Ben

DocumentDB gives you the power of true schema-freedom.

Generally de-normalize… but don't just do it blindly.

Page 10: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

How do approaches differ?

Page 11: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Data normalization

ORM

How do approaches differ?

Page 12: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Come as you are

Data normalization

ORM

How do approaches differ?

Page 13: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Modeling Data: The Relational Way

[Diagram: tables Person, Address, ContactDetail, ContactDetailType and PersonContactDetailLnk (PersonId, ContactDetailId), joined through their Id columns]

Page 14: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Modeling Data: The Document Way

[Diagram: a Person document (Id) embedding Addresses (Address…, Address…) and ContactDetails (ContactDetail…)]

{
  "id": "0ec1ab0c-de08-4e42-a429-...",
  "addresses": [
    { "street": "1 Redmond Way", "city": "Redmond", "state": "WA", "zip": 98052 }
  ],
  "contactDetails": [
    { "type": "home", "detail": "555-1212" },
    { "type": "email", "detail": "[email protected]" }
  ],
  ...
}

Page 15: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

To embed, or to reference, that is the question

Page 16: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Data modeling with denormalization

{
  "id": "1",
  "firstName": "Thomas",
  "lastName": "Andersen",
  "addresses": [
    { "line1": "100 Some Street", "line2": "Unit 1", "city": "Seattle", "state": "WA", "zip": 98012 }
  ],
  "contactDetails": [
    { "email": "[email protected]" },
    { "phone": "+1 555 555-5555", "extension": 5555 }
  ]
}

Try to model your entity as a self-contained document.

Generally, use embedded data models when:
• There are "contains" relationships between entities
• There are one-to-few relationships between entities
• Embedded data changes infrequently
• Embedded data won’t grow without bounds
• Embedded data is integral to data in a document

Denormalizing typically provides better read performance

Page 17: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Data modeling with referencing

In general, use normalized data models when:
• Write performance is more important than read performance
• Representing one-to-many relationships
• Representing many-to-many relationships
• Related data changes frequently

Referencing provides more flexibility than embedding, at the cost of more round trips to read data.

User document:
{ "id": "xyz", "username": "user xyz" }

Address document:
{ "id": "address_xyz", "userid": "xyz", "address": { … } }

Contact details document:
{ "id": "contact_xyz", "userid": "xyz", "email": "[email protected]", "phone": "555 5555" }

Normalizing typically provides better write performance
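The round-trip tradeoff between the two models can be sketched in a few lines of Python. This is only an illustration: `store`, `read_document` and the `stats` counter are hypothetical stand-ins for a collection and a point read, not part of any DocumentDB SDK, and the address value is made up.

```python
# In-memory stand-in for a collection, keyed by document id (hypothetical).
store = {
    "xyz": {"id": "xyz", "username": "user xyz"},
    "address_xyz": {"id": "address_xyz", "userid": "xyz", "address": {"city": "Seattle"}},
    "contact_xyz": {"id": "contact_xyz", "userid": "xyz", "phone": "555 5555"},
    "embedded_xyz": {
        "id": "embedded_xyz",
        "username": "user xyz",
        "address": {"city": "Seattle"},
        "contactDetails": {"phone": "555 5555"},
    },
}

stats = {"round_trips": 0}


def read_document(doc_id):
    """Simulate one point read against the service and count it."""
    stats["round_trips"] += 1
    return store[doc_id]


def load_user_referenced(user_id):
    """Normalized model: three documents, so three reads."""
    user = read_document(user_id)
    address = read_document("address_" + user_id)
    contact = read_document("contact_" + user_id)
    return {
        "username": user["username"],
        "address": address["address"],
        "phone": contact["phone"],
    }


def load_user_embedded(user_id):
    """Denormalized model: one self-contained document, so one read."""
    doc = read_document("embedded_" + user_id)
    return {
        "username": doc["username"],
        "address": doc["address"],
        "phone": doc["contactDetails"]["phone"],
    }
```

Both loaders return the same view of the user; the referenced version simply pays for two extra reads, while an update to the address touches only the small address document.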

Page 18: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Hybrid models

No magic bullet

Hybrid approach: model on a property level (as opposed to record level)

Optimize your data model for your workload… (as opposed to blindly following types)

Modeling impacts RU due to document size

Author document:
{
  "id": "1",
  "firstName": "Thomas",
  "lastName": "Andersen",
  "countOfBooks": 3,
  "books": [1, 2, 3],
  "images": [
    { "thumbnail": "http://....png" },
    { "profile": "http://....png" }
  ]
}

Book document:
{
  "id": 1,
  "name": "DocumentDB 101",
  "authors": [
    { "id": 1, "name": "Thomas Andersen", "thumbnail": "http://....png" },
    { "id": 2, "name": "William Wakefield", "thumbnail": "http://....png" }
  ]
}

Page 19: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Collections + Elastic Scale

Page 20: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Elastic scale

Page 21: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Measuring Throughput (Request Units)

• Each replica gets a fixed budget of request units
• Request Unit/sec (RU) is the normalized currency (% IOPS, % CPU, % Memory)
• Operations consume request units (RUs):
  READ: GET Document
  INSERT: POST Documents
  REPLACE: PUT Document
  Query: POST Documents
• Requests get rate limited if they exceed the SLA
• Customers pay for reserved request units by the hour

[Diagram: incoming requests plotted against Min RU/sec and Max RU/sec, with regions labeled Replica Quiescent, No throttling and Rate limit]
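The rate-limiting behavior can be pictured with a small Python sketch. It assumes a simple fixed one-second window and caller-supplied RU costs; the class name, the windowing scheme and the costs are illustrative assumptions, not how the service itself meters requests.

```python
class RequestUnitBudget:
    """Tracks RU consumption against a per-second provisioned budget (illustrative)."""

    def __init__(self, provisioned_ru_per_sec):
        self.provisioned = provisioned_ru_per_sec
        self.current_second = None
        self.used = 0.0

    def try_consume(self, ru_cost, now_sec):
        """Return True if the request fits this second's budget, False if rate limited."""
        second = int(now_sec)
        if second != self.current_second:
            # A new one-second window starts with a fresh budget.
            self.current_second = second
            self.used = 0.0
        if self.used + ru_cost > self.provisioned:
            return False
        self.used += ru_cost
        return True
```

A caller that receives `False` would typically back off and retry in a later window, or the collection would be provisioned with more RU/sec.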

Page 22: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

What are partitions?

[Diagram: a collection divided into Partition 1, Partition 2, …, Partition i, …, Partition n]

Page 23: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

What are partitions?

Partition Key = city

[Diagram: documents for London, Paris, New York, Houston, Chicago, New Delhi, Mumbai, Boston, Berlin, … distributed by city across Partition 1, Partition 2, …, Partition i, …, Partition n]
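One way to picture how a partition key value such as a city ends up in a partition is hash partitioning. The sketch below assumes a SHA-256 hash taken modulo a fixed partition count; the hash function, the modulo scheme and both function names are assumptions chosen for illustration, and the service's actual routing is not described in the slides.

```python
import hashlib


def partition_for(partition_key, partition_count):
    """Map a partition key value to a partition number in [0, partition_count)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % partition_count


def group_by_partition(documents, key_name, partition_count):
    """Place each document in the partition chosen by its partition key value."""
    partitions = {i: [] for i in range(partition_count)}
    for doc in documents:
        partitions[partition_for(doc[key_name], partition_count)].append(doc)
    return partitions
```

All documents that share a city land in the same partition, which is what makes single-partition queries and transactions on one key value possible.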

Page 24: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Partitioning patterns: Writes should scale across partition keys

[Diagram: Partition 1, Partition 2, …, Partition i, …, Partition n]

Page 25: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Partitioning patterns: Writes should scale across partition keys

[Diagram: Partition 1, Partition 2, …, Partition i, …, Partition n]

Page 26: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Partitioning patterns: Reads should minimize cross-partition lookups

[Diagram: Partition 1, Partition 2, …, Partition i, …, Partition n]

Page 27: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Recipe for Choosing Partition Key

• Start with the Workload – Is it Read vs Write heavy?

• Top Queries – Look for commonly filtered properties

• Transaction Boundary

• Avoid Storage + Performance Bottlenecks

• Aim for high cardinality… More partition key values = happiness

• Examples: Partition by TenantId or DeviceId… composite w/ Timestamp
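The last two bullets can be made concrete with a short Python sketch: a composite key built from a device id and a time bucket, plus a rough skew check. The day-sized bucket, the underscore separator and both helper names are assumptions for illustration only.

```python
from collections import Counter
from datetime import datetime, timezone


def composite_partition_key(device_id, timestamp):
    """Combine a device id with a UTC day bucket, e.g. 'device-42_2016-10-27'."""
    day = timestamp.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"{device_id}_{day}"


def largest_key_share(partition_keys):
    """Fraction of documents under the most common key (1.0 means one hot key)."""
    counts = Counter(partition_keys)
    return max(counts.values()) / len(partition_keys)
```

A share close to 1.0 signals a storage or throughput hot spot; adding the time bucket raises cardinality, at the cost of queries for one device now spanning several key values.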

Page 28: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Let's talk about Query and Indexing

Page 29: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Query and Indexing

Demo

Page 30: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

DocumentDB: SQL and JavaScript queries

Document 1:
{
  "locations": [
    { "country": "Germany", "city": "Berlin" },
    { "country": "France", "city": "Paris" }
  ],
  "headquarter": "Belgium",
  "exports": [ { "city": "Moscow" }, { "city": "Athens" } ]
}

Document 2:
{
  "locations": [ { "country": "Germany", "city": "Bonn", "revenue": 200 } ],
  "headquarter": "Italy",
  "exports": [
    { "city": "Berlin", "dealers": [ { "name": "Hans" } ] },
    { "city": "Athens" }
  ]
}

[Diagram: each document and the query result drawn as a tree, with property names and array indexes (0, 1) as interior nodes and values such as Germany, Berlin, Belgium, 200 and Hans as leaves]

SQL:
SELECT C.locations FROM company C WHERE C.headquarter = "Belgium"

JavaScript:
function businessLogic() {
  var country = "Belgium";
  __.filter(function(x) { return x.headquarter === country; });
}

Results:
{
  "results": [
    {
      "locations": [
        { "country": "Germany", "city": "Berlin" },
        { "country": "France", "city": "Paris" }
      ]
    }
  ]
}
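For readers who want to trace the query by hand, the same filter and projection can be mimicked over plain Python dicts. This is a sketch of the query's semantics on the two company documents from this slide, not of how the service executes it; the function name is hypothetical.

```python
# The two company documents from the slide.
documents = [
    {
        "locations": [
            {"country": "Germany", "city": "Berlin"},
            {"country": "France", "city": "Paris"},
        ],
        "headquarter": "Belgium",
        "exports": [{"city": "Moscow"}, {"city": "Athens"}],
    },
    {
        "locations": [{"country": "Germany", "city": "Bonn", "revenue": 200}],
        "headquarter": "Italy",
        "exports": [
            {"city": "Berlin", "dealers": [{"name": "Hans"}]},
            {"city": "Athens"},
        ],
    },
]


def select_locations_by_headquarter(docs, headquarter):
    """Keep documents whose headquarter matches, projecting only 'locations'."""
    return [
        {"locations": doc["locations"]}
        for doc in docs
        if doc.get("headquarter") == headquarter
    ]


results = {"results": select_locations_by_headquarter(documents, "Belgium")}
```

Only the first document has its headquarter in Belgium, so `results` holds a single entry containing the Berlin and Paris locations.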

Page 31: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Indexing under the hood

• Logically the index is a union of all the document trees
• Structure is contributed by the interior nodes; instance values are the leaves
• Structural information and instance values are normalized into a unifying concept of JSON-Path

Terms | Postings List
$/location/0/ | 1, 2
location/0/country/ | 1, 2
location/0/city/ | 1, 2
0/country/Germany | 1, 2
1/country/France | 2
… | …
0/city/Moscow | 2
0/dealers/0 | 2

[Diagram: the common structure shared by the document trees, with index paths highlighted for Range & ORDER BY queries, wildcard queries and spatial queries (coordinates)]

Dynamic encoding of postings list (E-WAH/differential)

Check out our VLDB paper, here!
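The idea of an index made of path terms and postings lists can be sketched with a tiny inverted index in Python. As a simplification, this sketch uses full root-to-leaf paths as terms rather than the shorter path segments shown in the terms table, and the two sample documents are trimmed versions of the company documents; none of this reflects the engine's actual encoding.

```python
from collections import defaultdict


def path_terms(node, prefix=""):
    """Yield one 'path/value' term per leaf, walking dicts and lists."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from path_terms(child, f"{prefix}{key}/")
    elif isinstance(node, list):
        for position, child in enumerate(node):
            yield from path_terms(child, f"{prefix}{position}/")
    else:
        yield f"{prefix}{node}"


def build_index(docs_by_id):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, doc in docs_by_id.items():
        for term in path_terms(doc):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs_by_id = {
    1: {"locations": [{"country": "Germany", "city": "Berlin"}], "headquarter": "Belgium"},
    2: {"locations": [{"country": "Germany", "city": "Bonn"}], "headquarter": "Italy"},
}
index = build_index(docs_by_id)
```

An equality filter then becomes a single lookup: `index["headquarter/Belgium"]` returns the ids of the matching documents without scanning them.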

Page 32: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Queries that use the index

• Equality: =
• Range: <, >, <=, >=
• ORDER BY
• String operators: STARTSWITH
• Spatial operators: ST_WITHIN and ST_DISTANCE
• Array operators: ARRAY_CONTAINS
• Schema operators: IS_DEFINED, IS_NUMBER, IS_STRING, …

Page 33: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Indexing Policies

Configuration | Level | Options
Automatic | Per collection | True (default) or False; override with each document write
Indexing Mode | Per collection | Consistent, Lazy, and None; None for KV workloads
Included and excluded paths | Per path | Individual path or recursive includes (? and *)
Indexing Type | Per path | Supports Hash, Range, and Spatial
Indexing Precision | Per path | Supports 1 to 100 per path (and max); trade off storage, query RUs and write RUs
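To show how these options fit together, here is a hedged sketch of an indexing policy written as a Python dict, with a small sanity check. The field names follow the JSON policy format as I understand it from the DocumentDB documentation and should be verified against the current reference; the `/rawPayload/*` path, the precision values and `validate_policy` are hypothetical.

```python
import json

# Illustrative policy: consistent mode, automatic indexing, one excluded subtree.
indexing_policy = {
    "indexingMode": "consistent",   # or "lazy" / "none"
    "automatic": True,
    "includedPaths": [
        {
            "path": "/*",           # recursive include of every path
            "indexes": [
                {"kind": "Range", "dataType": "Number", "precision": -1},  # -1 assumed to mean max
                {"kind": "Hash", "dataType": "String", "precision": 3},
            ],
        }
    ],
    "excludedPaths": [{"path": "/rawPayload/*"}],  # hypothetical property name
}

VALID_MODES = {"consistent", "lazy", "none"}
VALID_KINDS = {"Hash", "Range", "Spatial"}


def validate_policy(policy):
    """Check the mode and index kinds against the options listed in the table."""
    if policy["indexingMode"] not in VALID_MODES:
        raise ValueError("unknown indexing mode: " + policy["indexingMode"])
    for included in policy["includedPaths"]:
        for index in included["indexes"]:
            if index["kind"] not in VALID_KINDS:
                raise ValueError("unknown index kind: " + index["kind"])
    return True


policy_json = json.dumps(indexing_policy, indent=2)
```

Excluding a large, never-queried subtree is one way to trade query flexibility for lower storage and write RUs.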

Page 34: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Let’s talk about Planet-Scale

Page 35: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Guaranteed low latency

“I want my data wherever my users are.”

Page 36: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Guaranteed high availability

Globally. With policy based failover.

99.99%

Page 37: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Multi-region DocumentDB databases

[Diagram: a DocumentDB collection distributed across US-East, US-West and India. Within a region, each partition is a replica set (local distribution); across regions, matching partitions form a partition set (global distribution). One region holds the primary replica sets and the others hold secondary replica sets, each region provisioned with 2M RUs.]

Total RUs = Provisioned RUs x Number of regions

In this example: 2M RUs x 3 regions = 6M RUs
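The same arithmetic as a one-line Python helper, in case it is useful for capacity planning; the function name is hypothetical.

```python
def total_request_units(provisioned_rus, region_count):
    """Total RUs = provisioned RUs per region multiplied by the number of regions."""
    return provisioned_rus * region_count


example_total = total_request_units(2_000_000, 3)
```

With the slide's numbers, `example_total` is 6,000,000 RUs.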

Page 38: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Programmable data consistency

“It's hard to write distributed apps.”

Strong consistency, High latency

Eventual consistency, Low latency

Page 39: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Consistency Levels

• PACELC Theorem and the associated tradeoffs

Page 40: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Consistency Levels

• Strong, Eventual, Bounded Staleness, and Session

Strong, Bounded Staleness, Session, Eventual

Left to right: weaker consistency, better read scalability, lower write latency

[Diagram: for each level, a client talking to a replica set of one primary (P) and two secondaries (S)]

• Bounded Staleness: consistent prefix reads; reads lag behind writes by K prefixes or T interval
• Session: monotonic reads, monotonic writes and read-your-writes guarantee
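The read-your-writes guarantee of session consistency can be illustrated with a toy model in Python. It assumes each write gets an increasing sequence number and the client remembers the highest one it has written; the `Replica` and `SessionClient` classes, the sequence numbers and the fallback to the primary are all illustrative assumptions, not the service's actual protocol.

```python
class Replica:
    """A replica that applies writes in sequence-number order (toy model)."""

    def __init__(self):
        self.lsn = 0      # highest write sequence number applied so far
        self.data = {}

    def apply(self, lsn, key, value):
        self.data[key] = value
        self.lsn = lsn


class SessionClient:
    """Client that only reads from replicas caught up to its own writes."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = secondaries
        self.session_lsn = 0  # acts as this session's token

    def write(self, key, value):
        lsn = self.primary.lsn + 1
        self.primary.apply(lsn, key, value)
        self.session_lsn = lsn

    def read(self, key):
        for replica in self.secondaries:
            if replica.lsn >= self.session_lsn:
                return replica.data.get(key)
        # No secondary has caught up yet, so fall back to the primary.
        return self.primary.data.get(key)
```

A second client with a fresh session (token 0) may still read a stale secondary, which is the weaker, eventual behavior that other sessions are allowed to see.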

Page 41: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Global Distribution

Demo

Page 42: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

DocumentDB Recent Updates

• Automatic Expiration via Time-To-Live (TTL)

• Expanded Geo-Spatial support for Polygons and Lines

• Preview Support for
  • Local Emulator
  • IP Filtering
  • Self-Service Backup + Restore
  • Protocol Support for MongoDB

Page 43: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Q&A and more resources…

AskDocDB@microsoft

Follow @DocumentDB

Use #DocumentDB

documentdb.com

#azure-documentDB

Page 44: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Session Evaluations

Your feedback is important and valuable.

3 ways to access:
• Go to passSummit.com
• Download the GuideBook App and search: PASS Summit 2016
• Follow the QR code link displayed on session signage throughout the conference venue and in the program guide

Submit by 5pm Friday November 6th to WIN prizes

Page 45: [PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features

Thank You

Learn more from Azure [email protected] or follow @DocumentDB