AWS July Webinar Series - Getting Started with Amazon DynamoDB

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Nate Slater, AWS Solutions Architect

July 30, 2015

Introduction to DynamoDB

Agenda

• What is DynamoDB?
• DynamoDB Fundamentals
• Typical Workloads and Use Cases
• Demo

What is DynamoDB?

DynamoDB is a fully managed, NoSQL document and key-value data store.

What is NoSQL?

NoSQL is a term to describe data stores that trade full ACID compliance for high availability and scale.

• Atomicity: single row/single item only
• Consistency: eventual consistency
• Isolation: dirty reads
• Durability: data replication on commodity storage

Why NoSQL?

• Dirty reads?
• Eventual consistency?
• Single row transactions only?
• Why would anybody trade ACID compliance for this?

NoSQL – Availability and Scale

[Diagram: Traditional SQL, a primary DB with a secondary (scale up), versus NoSQL, a cluster of DB nodes (scale out)]

Scale Up vs Scale Out

[Chart: scale-up versus scale-out, compared on cost and complexity]

The CAP Theorem

Network partitions will happen in distributed systems:

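[Diagram: a cluster of DB nodes]

• Consistency (C)
• Availability (A)
• Partition Tolerance (P)

[Venn diagram of C, A, and P, with the pairwise overlaps labeled CA, AP, and CP]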

Why NoSQL?

• Horizontal scaling allows for practically unlimited scalability
• It is cheaper to scale out than to scale up
• A choice of full consistency or availability, either of which can survive a network partition
• Full ACID compliance is often not needed

What is a Managed Service?

• A managed service is a web service in which consumers of the service never need to interact directly with the underlying compute, storage, and network resources.

Why use a Managed Service?

DynamoDB is a Managed Service

• AWS runs all the database infrastructure for you!
• All the benefits and none of the operational overhead of running a distributed system:
• Infinitely scalable read and write I/O
• High availability within a region
• Data durably stored in 3 Availability Zones
• Cross-region replication
• Easily export data to S3
• Triggers using Lambda functions
• Tight integration with Kinesis, Lambda, EMR, and Redshift
• Pay only for what you use, when you need it

DynamoDB Fundamentals

DynamoDB Table

[Diagram: a table contains items, and each item is made up of attributes]

Hash key (mandatory)
• Key-value access pattern
• Determines data distribution

Range key (optional)
• Models 1:N relationships
• Enables rich query capabilities:
• All items for a hash key
• ==, <, >, >=, <=
• "begins with"
• "between"
• Sorted results
• Counts
• Top/bottom N values
• Paged responses

Data types

String (S)

Number (N)

Binary (B)

String Set (SS)

Number Set (NS)

Binary Set (BS)

Boolean (BOOL)

Null (NULL)

List (L)

Map (M)

List and Map are used for storing nested JSON documents
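The type codes listed above are the tags that wrap each value in DynamoDB's low-level JSON representation. As a rough sketch of how a nested document maps onto them, the helper below (`to_attribute_value` is a hypothetical name, not an SDK function) converts plain Python values into that tagged form. The AWS SDKs ship their own serializers, so this is only an illustration of the shape of the data.

```python
import base64
from decimal import Decimal

NUMBER_TYPES = (int, float, Decimal)


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, NUMBER_TYPES) and not isinstance(value, bool)


def _encode_binary(value):
    # Binary values are base64-encoded in the JSON form.
    return base64.b64encode(bytes(value)).decode("ascii")


def _set_to_attribute_value(values):
    # Sets must be non-empty and hold a single scalar type.
    if not values:
        raise ValueError("DynamoDB sets cannot be empty")
    if all(isinstance(v, str) for v in values):
        return {"SS": sorted(values)}
    if all(_is_number(v) for v in values):
        return {"NS": [str(v) for v in sorted(values)]}
    if all(isinstance(v, bytes) for v in values):
        return {"BS": [_encode_binary(v) for v in sorted(values)]}
    raise TypeError("set members must all be strings, numbers, or bytes")


def to_attribute_value(value):
    """Wrap a Python value in the matching DynamoDB type tag (hypothetical helper)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):
        return {"BOOL": value}
    if _is_number(value):
        return {"N": str(value)}  # numbers are sent as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (bytes, bytearray)):
        return {"B": _encode_binary(value)}
    if isinstance(value, (set, frozenset)):
        return _set_to_attribute_value(value)
    if isinstance(value, (list, tuple)):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {str(k): to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

For example, `to_attribute_value({"Id": 2, "Name": "Andy"})` yields a Map (`M`) whose members are tagged `N` and `S`; nested lists and dicts become `L` and `M` in the same way.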

Hash table
• Hash key uniquely identifies an item
• Hash key is used for building an unordered hash index
• Table can be partitioned for scale

Key space: 00 to FF, divided into the ranges 00 to 54, 55 to A9, and AA to FF

• Id = 1, Name = Jim: Hash(1) = 7B
• Id = 2, Name = Andy, Dept = Engg: Hash(2) = 48
• Id = 3, Name = Kim, Dept = Ops: Hash(3) = CD
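To make the key-space picture concrete, here is a minimal sketch that places a hash value into one of three partitions. The boundaries (00 to 54, 55 to A9, AA to FF) and the hash values 7B, 48, and CD are taken from the slide; the actual hash function DynamoDB uses is not given there, so the values are simply hard-coded. `partition_for_hash` is a hypothetical helper name.

```python
from bisect import bisect_left

# Inclusive upper bound of each partition's slice of the 00..FF key space,
# as drawn on the slide: 00-54, 55-A9, AA-FF.
PARTITION_UPPER_BOUNDS = [0x54, 0xA9, 0xFF]

# Hash values quoted on the slide for Id = 1, 2, 3. They are illustrative
# only; the real hash function is not specified in the deck.
SLIDE_HASHES = {1: 0x7B, 2: 0x48, 3: 0xCD}


def partition_for_hash(hash_value):
    """Return the 1-based partition number that owns a hash value."""
    if not 0x00 <= hash_value <= 0xFF:
        raise ValueError("hash value must fall inside the 00..FF key space")
    # bisect_left finds the first upper bound that is >= hash_value.
    return bisect_left(PARTITION_UPPER_BOUNDS, hash_value) + 1


placement = {item_id: partition_for_hash(h) for item_id, h in SLIDE_HASHES.items()}
```

With the slide's values, `placement` maps Id 2 to partition 1, Id 1 to partition 2, and Id 3 to partition 3, so consecutive Ids land on different partitions.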

Partitions are three-way replicated

[Diagram: Replica 1, Replica 2, and Replica 3 each hold a copy of Partition 1, Partition 2, ... Partition N, so the items Id = 1 (Jim), Id = 2 (Andy, Engg), and Id = 3 (Kim, Ops) each appear three times]

Hash-range table
• Hash key and range key together uniquely identify an item.
• Within the unordered hash index, data is sorted by the range key.
• No limit on the number of items (∞) per hash key.
• Unless you have local secondary indexes

Key space: 00:0 to FF:∞, divided at 54:∞ / 55 and A9:∞ / AA

Partition 1, Hash(2) = 48
• Customer# = 2, Order# = 10, Item = Pen
• Customer# = 2, Order# = 11, Item = Shoes

Partition 2, Hash(1) = 7B
• Customer# = 1, Order# = 10, Item = Toy
• Customer# = 1, Order# = 11, Item = Boots

Partition 3, Hash(3) = CD
• Customer# = 3, Order# = 10, Item = Book
• Customer# = 3, Order# = 11, Item = Paper
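The hash-range layout above can be imitated in memory to show why range-key queries are cheap: items under one hash key are kept sorted, so "between", sorted results, and top/bottom N are slices of one list. The class below is a toy model under that assumption, not the DynamoDB API; `HashRangeTable`, `put_item`, and `query` are hypothetical names.

```python
from bisect import bisect_left, bisect_right


class HashRangeTable:
    """Toy in-memory model of a hash-range table (not the real service)."""

    def __init__(self):
        # hash key -> list of (range_key, item), kept sorted by range key
        self._collections = {}

    def put_item(self, hash_key, range_key, item):
        rows = self._collections.setdefault(hash_key, [])
        keys = [rk for rk, _ in rows]
        i = bisect_left(keys, range_key)
        if i < len(rows) and rows[i][0] == range_key:
            rows[i] = (range_key, item)  # same full key: overwrite
        else:
            rows.insert(i, (range_key, item))

    def query(self, hash_key, low=None, high=None, ascending=True, limit=None):
        """Return items for one hash key, optionally bounded by range key."""
        rows = self._collections.get(hash_key, [])
        keys = [rk for rk, _ in rows]
        start = 0 if low is None else bisect_left(keys, low)
        stop = len(rows) if high is None else bisect_right(keys, high)
        selected = [item for _, item in rows[start:stop]]
        if not ascending:
            selected.reverse()
        return selected if limit is None else selected[:limit]


# Load the orders shown on the slide: (Customer#, Order#, Item).
orders = HashRangeTable()
for customer, order, name in [
    (2, 10, "Pen"), (2, 11, "Shoes"),
    (1, 10, "Toy"), (1, 11, "Boots"),
    (3, 10, "Book"), (3, 11, "Paper"),
]:
    orders.put_item(customer, order, {"Item": name})
```

Here `orders.query(2)` returns every order for customer 2 in Order# sequence, `orders.query(1, low=11, high=11)` is an equality match, and `orders.query(3, ascending=False, limit=1)` is a "top 1" lookup.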

Local Secondary Index (LSI)

Alternate range key + same hash key

Index and table data is co-located (same partition)

10 GB max per hash key, i.e. LSIs limit the # of range keys!

Global Secondary Index

any attribute indexed as new hash and/or range key

RCUs/WCUs provisioned separately for GSIs

Online indexing

LSI or GSI?

LSI can be modeled as a GSI

If data size in an item collection > 10 GB, use GSI

If eventual consistency is okay for your scenario, use GSI!

DynamoDB API

Table API
• CreateTable
• UpdateTable
• DeleteTable
• DescribeTable
• ListTables

Item API
• PutItem
• UpdateItem
• DeleteItem
• BatchWriteItem
• GetItem
• Query
• Scan
• BatchGetItem

Stream API (new)
• ListStreams
• DescribeStream
• GetShardIterator
• GetRecords

DynamoDB Streams and AWS Lambda

Emerging Architecture Pattern

Throughput

Provisioned at the table level
• Write capacity units (WCUs) are measured in writes of up to 1 KB per second
• Read capacity units (RCUs) are measured in reads of up to 4 KB per second
• RCUs measure strongly consistent reads
• Eventually consistent reads cost 1/2 of strongly consistent reads

Read and write throughput limits are independent
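A quick way to apply these units is to estimate the capacity a workload needs from its item size and request rate. The sketch below assumes item sizes are rounded up to the next 1 KB for writes and the next 4 KB for reads, which is my reading of how the units are counted rather than something stated on the slide; both function names are hypothetical.

```python
import math

WRITE_UNIT_BYTES = 1024      # one WCU covers a write of up to 1 KB
READ_UNIT_BYTES = 4 * 1024   # one RCU covers a strongly consistent read of up to 4 KB


def write_capacity_units(item_size_bytes, writes_per_second):
    """WCUs needed for a steady write rate (assumes per-item round-up)."""
    units_per_write = max(1, math.ceil(item_size_bytes / WRITE_UNIT_BYTES))
    return units_per_write * writes_per_second


def read_capacity_units(item_size_bytes, reads_per_second, strongly_consistent=True):
    """RCUs needed for a steady read rate; eventual consistency costs half."""
    units_per_read = max(1, math.ceil(item_size_bytes / READ_UNIT_BYTES))
    total = units_per_read * reads_per_second
    return total if strongly_consistent else math.ceil(total / 2)
```

For instance, 10 writes per second of 1,500-byte items needs 20 WCUs, while 100 reads per second of 6,000-byte items needs 200 RCUs strongly consistent or 100 RCUs eventually consistent.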

Partitioning example

Table size = 8 GB, RCUs = 5000, WCUs = 500

# of partitions (IO capacity) = 5000/3000 RCU + 500/1000 WCU = 2.17
# of partitions (storage) = 8/10 GB = 0.8
# of partitions = ceiling(max(2.17, 0.8)) = 3

RCUs per partition = 5000/3 = 1666.67
WCUs per partition = 500/3 = 166.67
Data/partition = 8/3 = 2.67 GB
RCUs and WCUs are uniformly spread across partitions
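The arithmetic in this example can be written as a short function. It uses the per-partition figures that appear in the slide's formulas (3000 RCUs, 1000 WCUs, 10 GB); `estimate_partitions` and `per_partition_share` are hypothetical helper names, and the result is an estimate of the slide's model rather than a guarantee of how the service allocates partitions.

```python
import math

RCUS_PER_PARTITION = 3000   # from the slide's IO-capacity formula
WCUS_PER_PARTITION = 1000   # from the slide's IO-capacity formula
GB_PER_PARTITION = 10       # from the slide's storage formula


def estimate_partitions(table_size_gb, rcus, wcus):
    """Partition count = ceiling of the larger of the IO and storage needs."""
    by_io = rcus / RCUS_PER_PARTITION + wcus / WCUS_PER_PARTITION
    by_storage = table_size_gb / GB_PER_PARTITION
    return max(1, math.ceil(max(by_io, by_storage)))


def per_partition_share(table_size_gb, rcus, wcus):
    """Throughput and data each partition receives when spread uniformly."""
    n = estimate_partitions(table_size_gb, rcus, wcus)
    return {
        "partitions": n,
        "rcus": round(rcus / n, 2),
        "wcus": round(wcus / n, 2),
        "data_gb": round(table_size_gb / n, 2),
    }
```

Calling `per_partition_share(8, 5000, 500)` reproduces the example: three partitions, each with about 1666.67 RCUs, 166.67 WCUs, and 2.67 GB of data. The takeaway is that provisioned throughput is divided across partitions, so a single hot hash key can use only its partition's share.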

Typical Workloads and Use-Cases

DynamoDB table examples

case class CameraRecord(
  cameraId: Int,          // hash key
  ownerId: Int,
  subscribers: Set[Int],
  hoursOfRecording: Int,
  ...
)

case class Cuepoint(
  cameraId: Int,          // hash key
  timestamp: Long,        // range key
  `type`: String,         // backticks: "type" is a reserved word in Scala
  ...
)

HashKey   RangeKey   Value
Key       Segment    1234554343254
Key       Segment1   1231231433235

Typical Workloads
• Ad-tech
• IoT
• Gaming
• Web Analytics
• Mobile Applications
• Large Scale Websites

…And much more!

Demo
