Build High-Scale Applications
with Amazon DynamoDB
David Pearson Business Development AWS Database Services
Databases in the Cloud: data tier planning · workload prioritization · data store selection
Traditional Database Architecture
App/Web Tier
Client Tier
RDBMS
• key-value access
• complex queries
• transactions
• analytics
One Database for All Workloads
App/Web Tier
Client Tier
RDBMS
Cloud Data Tier Architecture
App/Web Tier
Client Tier
Data Tier
Search Cache Blob Store
RDBMS NoSQL Data Warehouse
Workload Driven Data Store Selection
Data Tier
Search Cache Blob Store
RDBMS NoSQL Data Warehouse
logging analytics
key/value simple query
rich search
transaction processing
hot reads
AWS Services for the Data Tier
Data Tier
Amazon DynamoDB
Amazon RDS
Amazon ElastiCache
Amazon S3
Amazon Redshift
Amazon CloudSearch
logging analytics
key/value simple query
rich search
transaction processing
hot reads
Simple Guide to Database Selection
Predominant Requirement Recommendation
Seamless scale and super availability Amazon DynamoDB
Complex query workloads and need relational capabilities
Amazon RDS
Caching Amazon ElastiCache
Deep analytics Amazon Redshift
Cases where these services are not the right fit
Build your own on EC2!
DynamoDB Fundamentals
original use case · tier one applications
RDBMS = Default Choice
• Amazon.com pages are composed of responses from 1000’s of independent services
• Query patterns differ from service to service:
  Catalog service is usually heavy key-value
  Ordering service is very write intensive (key-value)
  Catalog search has a different pattern for querying
Relational Era @ Amazon.com
RDBMS
Poor Availability · Limited Scalability · High Cost
Dynamo = NoSQL Technology
• Replicated DHT with consistency management
• Consistent hashing
• Optimistic replication
• “Sloppy quorum”
• Anti-entropy mechanisms
• Object versioning
Distributed Era @ Amazon.com
lack of strong consistency · every engineer needs to learn distributed systems · operational complexity
DynamoDB = NoSQL Cloud Service
Cloud Era @ Amazon.com
Non-Relational
Fast & Predictable Performance
Seamless Scalability
Easy Administration
Tier One Applications
= database service with automated operations · predictable performance · durable · low latency · cost effective
• Developers are freed from:
  Performance tuning (latency)
  Scalability (and scaling operations)
  Security inspections, patches, upgrades
  Software upgrades, patches
  Improving the underlying hardware
  …and lots of other stuff
• DynamoDB takes care of:
  Automatic 3-way multi-AZ replication
  Automatic hardware failover
Built to make life easier for developers
Provisioned Throughput
• Request-based capacity provisioning model
• Throughput is declared and updated via the API or the console
  CreateTable (foo, reads/sec = 100, writes/sec = 150)
  UpdateTable (foo, reads/sec = 10000, writes/sec = 4500)
• DynamoDB handles the rest
  Capacity is reserved and available when needed
  Scaling up triggers repartitioning and reallocation
  No impact to performance or availability
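The CreateTable / UpdateTable calls on the slide can be sketched as low-level API request payloads. This is a minimal sketch: the single string hash key `id` is an assumption (the slide only gives the table name and throughput numbers), and no network call is made.

```python
def create_table_request(name, reads, writes):
    """Build a CreateTable payload that declares read/write capacity up front."""
    return {
        "TableName": name,
        "KeySchema": [{"AttributeName": "id", "KeyType": "HASH"}],  # assumed key
        "AttributeDefinitions": [{"AttributeName": "id", "AttributeType": "S"}],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": reads,
            "WriteCapacityUnits": writes,
        },
    }

def update_table_request(name, reads, writes):
    """Build an UpdateTable payload that re-declares throughput on a live table."""
    return {
        "TableName": name,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": reads,
            "WriteCapacityUnits": writes,
        },
    }

# The two calls from the slide:
create = create_table_request("foo", 100, 150)
update = update_table_request("foo", 10000, 4500)
```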
Predictable Performance
WRITES: continuously replicated to 3 AZs · quorum acknowledgment · persisted to disk (custom SSD)
READS: strongly or eventually consistent · no trade-off in latency
Durable Low Latency
DynamoDB Primitives: simple API · rich query support · fast application development
DynamoDB Concepts
table
DynamoDB Concepts
table
items
DynamoDB Concepts
attributes
items
table
schema-less: schema is defined per attribute
DynamoDB Concepts
attributes
items
table
scalar data types: number, string, and binary
multi-valued types: string set, number set, and binary set
DynamoDB Concepts
hash
hash keys: mandatory for all items in a table · key-value access pattern
PutItem UpdateItem DeleteItem BatchWriteItem
GetItem BatchGetItem
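A hash-key write and read from the item API above, sketched as low-level request payloads. The `Users` table and its attributes are illustrative; the wire format types each attribute (S = string, N = number), and no network call is made.

```python
# PutItem: write a full item, keyed by its hash key.
put_item = {
    "TableName": "Users",            # hypothetical table keyed on UserId
    "Item": {
        "UserId": {"S": "1"},
        "UserName": {"S": "alice"},  # illustrative non-key attribute
    },
}

# GetItem: key-value read of the same item by hash key.
get_item = {
    "TableName": "Users",
    "Key": {"UserId": {"S": "1"}},
    "ConsistentRead": True,          # request a strongly consistent read
}
```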
Hash = Distribution Key
partition 1 .. N
hash keys: mandatory for all items in a table · key-value access pattern · determines data distribution
Hash = Distribution Key
large number of unique hash keys + uniform distribution of workload across hash keys
Range = Query
range
hash
range keys: model 1:N relationships · enable rich query capabilities · composite primary key
query support: all items for a hash key · ==, <, >, >=, <= · “begins with” · “between” · sorted results · counts · top / bottom N values · paged responses
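The range-key comparisons above can be sketched as one low-level Query payload. The table and attribute names here are hypothetical (hash = UserId, range = Date), and no network call is made.

```python
query = {
    "TableName": "UserActivity",  # hypothetical table: hash = UserId, range = Date
    # "between" on the range key, scoped to one hash key:
    "KeyConditionExpression": "UserId = :u AND #d BETWEEN :start AND :end",
    "ExpressionAttributeNames": {"#d": "Date"},  # alias the attribute name
    "ExpressionAttributeValues": {
        ":u": {"S": "1"},
        ":start": {"S": "2013-03-01"},
        ":end": {"S": "2013-03-29"},
    },
    "ScanIndexForward": True,  # sorted results by range key, ascending
    "Limit": 10,               # top-N / paged responses
}
```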
Index Options
local secondary indexes (LSI): alternate range key + same hash key · index and table data are co-located (same partition)
Projected Attributes
KEYS_ONLY INCLUDE ALL
Index Options
global secondary indexes (GSI)
any attribute indexed as new hash or range key
KEYS_ONLY INCLUDE ALL
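A GSI sketch: the index portion of a CreateTable payload, with its own key schema, projection, and throughput. The index name, indexed attribute, and capacity numbers are illustrative, not from the slide.

```python
gsi_spec = {
    "IndexName": "TypeGlobalIndex",  # hypothetical index name
    "KeySchema": [
        # Any attribute can serve as the new hash (or range) key:
        {"AttributeName": "Type", "KeyType": "HASH"},
    ],
    # INCLUDE projects the keys plus the listed non-key attributes:
    "Projection": {
        "ProjectionType": "INCLUDE",
        "NonKeyAttributes": ["Name"],
    },
    # GSIs define read/write units separately from the base table:
    "ProvisionedThroughput": {
        "ReadCapacityUnits": 5,
        "WriteCapacityUnits": 5,
    },
}
```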
Designing for Scale: access pattern modeling · use case walk-thru · under-rated features
• Method
  1. Describe the overall use case – maintain context
  2. Identify the individual access patterns of the use case
  3. Model each access pattern to its own discrete data set
  4. Consolidate data sets into tables and indexes
• Benefits
  Single-table fetch for each query
  Payloads are minimal for each access
Access Pattern Modeling
Multi-tenant application for file storing and sharing
• UserId is the unique identifier of each user
• FileId is the unique identifier of each file, owned by a user
Use Case Walk Thru
Good PK selection: UserId (hash) + FileId (range)
use case access patterns data design
1. Users should be able to query all the files they own
2. Search by File Name
3. Search by File Type
4. Search by Date Range
5. Keep track of Shared Files
6. Search by descending order of File Size
Use Case Walk Thru
additional (non-PK) attributes & index candidates
Users
• Hash key = UserId (S)
• Attributes = UserName (S), Email (S), Address (SS) ...
User_Files
• Hash key = UserId (S)
• Range key = FileId (S)
• Attributes = Name (S), Type (S), Size (N), Date (S), SharedFlag (S), S3key (S)
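The User_Files design above, sketched as a CreateTable payload with one of the walk-thru’s local secondary indexes (NameIndex) attached. The throughput numbers are placeholders, and no network call is made.

```python
create_user_files = {
    "TableName": "User_Files",
    "AttributeDefinitions": [
        {"AttributeName": "UserId", "AttributeType": "S"},
        {"AttributeName": "FileId", "AttributeType": "S"},
        {"AttributeName": "Name", "AttributeType": "S"},  # LSI range key
    ],
    "KeySchema": [
        {"AttributeName": "UserId", "KeyType": "HASH"},
        {"AttributeName": "FileId", "KeyType": "RANGE"},
    ],
    "LocalSecondaryIndexes": [
        {
            # Same hash key, alternate range key (Name), keys-only projection:
            "IndexName": "NameIndex",
            "KeySchema": [
                {"AttributeName": "UserId", "KeyType": "HASH"},
                {"AttributeName": "Name", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
}
```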
DynamoDB Data Model
Secondary Indexes
Table Name Index Name Attribute to Index Projected Attribute
User_Files NameIndex Name KEYS
User_Files TypeIndex Type KEYS + Name
User_Files DateIndex Date KEYS + Name
User_Files SharedFlagIndex SharedFlag KEYS + Name
User_Files SizeIndex Size KEYS + Name
example only – required data returned determines optimal projections
• Find all files owned by a user
Query User_Files table (UserId = 2)
Access Pattern 1
UserId (Hash)  FileId (Range)  Name  Date  Type  SharedFlag  Size  S3key
1 1 File1 2013-04-23 JPG 1000 bucket\1
1 2 File2 2013-03-10 PDF Y 100 bucket\2
2 3 File3 2013-03-10 PNG Y 2000 bucket\3
2 4 File4 2013-03-10 DOC 3000 bucket\4
3 5 File5 2013-04-10 TXT 400 bucket\5
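Access pattern 1 as a low-level Query payload, plus an in-memory sketch of the sample rows to show what the query would return. The payload is illustrative and no network call is made.

```python
# Query User_Files by hash key alone: all files owned by UserId = 2.
query_user_files = {
    "TableName": "User_Files",
    "KeyConditionExpression": "UserId = :u",
    "ExpressionAttributeValues": {":u": {"S": "2"}},
}

# The slide's sample items, keyed (UserId, FileId):
items = [
    {"UserId": "1", "FileId": "1", "Name": "File1"},
    {"UserId": "1", "FileId": "2", "Name": "File2"},
    {"UserId": "2", "FileId": "3", "Name": "File3"},
    {"UserId": "2", "FileId": "4", "Name": "File4"},
    {"UserId": "3", "FileId": "5", "Name": "File5"},
]

# A Query on hash key UserId = 2 touches only that user's items:
result = [i["Name"] for i in items if i["UserId"] == "2"]
# result == ["File3", "File4"]
```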
• Search by file Name
Query (IndexName = NameIndex, UserId = 1, Name = File1)
Access Pattern 2
UserId (hash)  Name (range)  FileId
1 File1 1
1 File2 2
2 File3 3
2 File4 4
3 File5 5
NameIndex
• Search for file name by file Type
Query (IndexName = TypeIndex, UserId = 2, Type = DOC)
Access Pattern 3
UserId (hash)  Type (range)  FileId  Name
1 JPG 1 File1
1 PDF 2 File2
2 DOC 4 File4
2 PNG 3 File3
3 TXT 5 File5
projection
TypeIndex
• Search for file name by Date range
Query (IndexName = DateIndex, UserId = 1, Date between 2013-03-01 and 2013-03-29)
Access Pattern 4
UserId (hash)  Date (range)  FileId  Name
1 2013-03-10 2 File2
1 2013-04-23 1 File1
2 2013-03-10 3 File3
2 2013-03-10 4 File4
3 2013-04-10 5 File5
DateIndex
projection
• Search for names of Shared files
Query (IndexName = SharedFlagIndex, UserId = 1, SharedFlag = Y)
Access Pattern 5
UserId (hash)  SharedFlag (range)  FileId  Name
1 Y 2 File2
2 Y 3 File3
SharedFlagIndex
projection
• Query for file names by descending order of file Size
Query (IndexName = SizeIndex, UserId = 1, ScanIndexForward = false)
Access Pattern 6
UserId (hash)  Size (range)  FileId  Name
1 100 2 File2
1 1000 1 File1
2 2000 3 File3
2 3000 4 File4
3 400 5 File5
SizeIndex projection
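Access pattern 6 as a Query payload: read from SizeIndex and reverse the range-key sort with ScanIndexForward. A sketch only; no network call is made.

```python
query_by_size_desc = {
    "TableName": "User_Files",
    "IndexName": "SizeIndex",                 # LSI with Size as the range key
    "KeyConditionExpression": "UserId = :u",
    "ExpressionAttributeValues": {":u": {"S": "1"}},
    "ScanIndexForward": False,                # descending order of Size
}
```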
• Users
• User_Files NameIndex
TypeIndex
DateIndex
SharedFlagIndex
SizeIndex
Use Case Walk Thru
Secondary Indexing Options
Local (LSI): strongly or eventually consistent reads · total storage limit per hash key (10 GB) · read and write units consumed from the table
Global (GSI): eventually consistent reads only · no storage limit · read and write units defined per GSI
• Consistent Reads: inventory and shopping cart applications
• Atomic Counters: increment and return the new value in the same operation
• Conditional Writes: expected value checked before write – fails on mismatch; “state machine” use cases
• Sparse Indexes: optimal for accessing boolean values; popular for identifying updated items for a background clean-up process
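The atomic counter and conditional write features above, sketched as UpdateItem payloads. The `Counters` and `Orders` tables, their attributes, and the state values are hypothetical; no network call is made.

```python
# Atomic counter: ADD increments in place and, with ReturnValues, hands back
# the new value in the same operation.
atomic_increment = {
    "TableName": "Counters",                    # hypothetical table
    "Key": {"CounterId": {"S": "page-views"}},
    "UpdateExpression": "ADD Hits :one",
    "ExpressionAttributeValues": {":one": {"N": "1"}},
    "ReturnValues": "UPDATED_NEW",              # new value in the same call
}

# Conditional write: the update only succeeds if the stored state matches the
# expected value, which is the "state machine" pattern from the slide.
conditional_write = {
    "TableName": "Orders",                      # hypothetical table
    "Key": {"OrderId": {"S": "42"}},
    "UpdateExpression": "SET OrderState = :next",
    "ConditionExpression": "OrderState = :expected",  # fails on mismatch
    "ExpressionAttributeValues": {
        ":next": {"S": "SHIPPED"},
        ":expected": {"S": "PACKED"},
    },
}
```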
Under-Rated Features
Moving to DynamoDB: leaving entity-relationship · denormalized NoSQL · live migrations
Migration Considerations
• Data Layer
  Single-table use case
  Denormalized preferred – reduces multi-object access; can be done on the target
  Minimal dependencies (FKs, triggers, procedures)
  Design the target for high-scale patterns
• Application Code
  Review and validate access patterns
  API / SQL mapping and conversion
  DynamoDB = simple APIs: reads (2), writes (4), rich query (1), scan (1)
• Design for Scale
  Optimal target environment
  Identify the minimal data unit to be migrated (user data)
• Plan migration activities
  Map source to target
  Identify transforms
  Update the app to work with the new target
  Test and verify
• Implement the migration
  Triggered by user login
  Login simulation (loop)
NoSQL – Live Migration Checklist (example)
Data Movement Templates
Export DynamoDB to S3 (archive) http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-dynamodbexportdataformat.html
Getting Started: zero cost dev & test · use case validation · design workshop
• DynamoDB Local Disconnected development with full API support
• No network
• No usage costs
• No SLA
http://aws.amazon.com/dynamodb/developer-resources/
• Use Case Assessment Want to test an idea, but unsure if your use case is a good fit?
• Design Call Ready to start coding (or scaling) and want to ensure optimal design?
Next Steps – Hands On
Register at: http://aws.amazon.com/about-aws/events/
Databases in the Cloud Webinar Series
• Data Modeling and Best Practices for Scaling your Application with Amazon DynamoDB
• February 27, 2014 10:00am PST
High-scale applications like social gaming, chat, and voting
Model these applications using DynamoDB, including how to use building blocks such as conditional writes, consistent reads, and batch operations.
Incorporate best practices such as index projections, item sharding, and parallel scan for maximum scalability
Next Steps – More Information
• The Mill DTLV – AWS Perk
• Feb 27 – DynamoDB Webinar (Advanced Schema Design)
• March 10 – SXSW AWS Sessions
• March 20 – Las Vegas Meetup – Redshift
• March 26 – AWS Summit – San Francisco
Upcoming AWS Events
Questions David Pearson [email protected]