51
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. August 11, 2016 Getting Started with Amazon DynamoDB Padma Malligarjunan, Sr. Technical Account Manager, AWS Enterprise Support Yekesa Kosuru, V.P Engineering, DataXu

Getting Started with Amazon DynamoDB

Embed Size (px)

Citation preview

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

August 11, 2016

Getting Started with

Amazon DynamoDB

Padma Malligarjunan, Sr. Technical Account Manager, AWS Enterprise Support

Yekesa Kosuru, V.P Engineering, DataXu

Agenda

• Brief history of data processing

• Relational (SQL) vs. non-relational (NoSQL)

• Fully managed features of DynamoDB

• Demo–serverless applications

• DataXu Use of DynamoDB

Data volume since 2010

• 90% of stored data generated in

last 2 years

• 1 terabyte of data in 2010 equals

6.5 petabytes today

• Linear correlation between data

pressure and technical innovation

• No reason these trends will not

continue over time

Timeline of database technologyD

ata

Pre

ssu

re

Technology adoption and the hype curve

Relational (SQL) vs.

non-relational (NoSQL)

Relational vs. non-relational databases

Traditional SQL NoSQL

DB

Primary Secondary

Scale up

DB

DB

DBDB

DB DB

Scale out

SQL vs. NoSQL schema design

NoSQL design optimizes for

compute instead of storage

Why NoSQL?

Optimized for storage Optimized for compute

Normalized/relational Denormalized/hierarchical

Ad-hoc queries Instantiated views

Scale vertically Scale horizontally

Good for OLAP Built for OLTP at scale

SQL NoSQL

Amazon DynamoDB

Run your business, not your database

Fully managed

Fast, consistent performance

Highly scalable

Flexible

Event-driven programming

Fine-grained access control

DynamoDB benefits

Fully managed service = automated operations

DB hosted on premises DB hosted on Amazon EC2

Fully managed service = automated operations

DB hosted on premises DynamoDB

Consistently low latency at scale

PREDICTABLE

PERFORMANCE!

WRITES

Replicated continuously to 3

Availability Zones

Persisted to disk (custom SSD)

READS

Strongly or eventually consistent

No latency trade-off

Designed to

support 99.99%

of availability

Built for high

durability

High availability and durability

Customer use cases

RDBMSDynamoDB

Amazon’s Path to DynamoDB

MLBAM (MLB Advanced Media) is a full service solutions

provider, operating a powerful content delivery platform.

For the first time, we can

measure things we’ve never

been able to measure

before.

Joe Inzerillo

Executive Vice President and CTO, MLBAM

“ • MLBAM can scale to support many games on a

single day.

• DynamoDB powers queries and supports the fast

data retrieval required.

• MLBAM distributes 25,000 live events annually and

10 million streams daily.

Major League Baseball fields big data,

excitement with Amazon DynamoDB

Redfin is a full-service real estate company with local

agents and online tools to help people buy & sell homes.

We have billions of records

on DynamoDB being

refreshed daily or hourly or

even by seconds.

Yong Huang

Director, Big Data Analytics, Redfin

“ • Redfin provides property and agent details and

ratings through its websites and apps.

• With DynamoDB, latency for “similar” properties

improved from 2 seconds to just 12 milliseconds.

• Redfin stores and processes five billion items in

DynamoDB.

Redfin is revolutionizing home buying and

selling with Amazon DynamoDB

Expedia is a leader in the $1 trillion travel industry, with an

extensive portfolio that includes some of the world’s most

trusted travel brands.

With DynamoDB we were up

and running in a less than

day, and there is no need for

a team to maintain.

Kuldeep Chowhan

Principal Engineer, Expedia

“ • Expedia’s real-time analytics application collects data

for its “test & learn” experiments on Expedia sites.

• The analytics application processes ~200 million

messages daily.

• Ease of setup, monitoring, and scaling were key

factors in choosing DynamoDB.

Expedia’s real-time analytics application uses

Amazon DynamoDB

Nexon is a leading South Korean video game developer

and a pioneer in the world of interactive entertainment.

By using AWS, we

decreased our initial

investment costs, and only

pay for what we use.

Chunghoon Ryu

Department Manager, Nexon

“ • Nexon used DynamoDB as its primary game

database for a new blockbuster mobile game,

HIT.

• HIT became the #1 Mobile Game in Korea

within the first day of launch and has > 2M

registered users.

• Nexon’s HIT leverages DynamoDB to deliver

steady latency of less than 10ms to deliver a

fantastic mobile gaming experience for

170,000 concurrent players.

Nexon scales mobile gaming with Amazon

DynamoDB

Ad Tech Gaming MobileIoT Web

Scaling high-velocity use cases with DynamoDB

That sounds really good. How

do I get started?

Let’s create a table…

Products

Product_Id

DynamoDB table structureTable

Items

Attributes

Partitionkey

Sortkey

Mandatory

Key-value access pattern

Determines data distribution Optional

Model 1:N relationships

Enables rich query capabilities

All items for key==, <, >, >=, <=“begins with”“between”“contains”“in”sorted resultscountstop/bottom N values

Local secondary index (LSI)

Alternate sort key attribute

Index is local to a partition key

A1(partition)

A3(sort)

A2(item key)

A1(partition)

A2(sort)

A3 A4 A5

LSIs A1(partition)

A4(sort)

A2(item key)

A3(projected)

Table

KEYS_ONLY

INCLUDE A3

A1(partition)

A5(sort)

A2(item key)

A3(projected)

A4(projected)

ALL

10 GB maximum per

partition key; LSIs limit the

number of range keys!

Global secondary index (GSI)Alternate partition and/or sort key

Index is across all partition keys

A1(partition)

A2 A3 A4 A5

GSIs A5(partition)

A4(sort)

A1(item key)

A3(projected)

Table

INCLUDE A3

A4(partition)

A5(sort)

A1(item key)

A2(projected)

A3(projected) ALL

A2(partition)

A1(itemkey) KEYS_ONLY

Online indexing

Read capacity units

(RCUs) and write

capacity units (WCUs)

are provisioned

separately for GSIs

How do GSI updates work?

Table

Primary

tablePrimary

tablePrimary

tablePrimary

table

Global

secondary

index

Client

2. Asynchronous

update (in progress)

If GSIs don’t have enough write capacity, table writes are throttled!

LSI or GSI?

LSI can be modeled as a GSI

If data size in an item collection > 10 GB, use GSI

If eventual consistency is okay for your scenario, use

GSI!

Advanced topics in DynamoDB

• Data modeling

• Understanding partitions

• # of partitions depends on table throughput and size

• DynamoDB scaling

• Design patterns and best practices

To learn more, please attend:

Deep Dive on Amazon DynamoDB 4:45 p.m.– 5:45 p.m.

Rick Houlihan, Principal Solutions Architect

Demo

Serverless web apps with

DynamoDB, Amazon API

Gateway, and AWS Lambda

Simple serverless web application–use case

Architecture of a simple serverless web

application

AWS Identity &

Access

ManagementDynamoDBAPI

Gateway

JavaScript

users

Amazon

S3 Bucket

internet

Lambda

Architecture of a simple serverless web

application

IAM DynamoDBAPI

Gateway

JavaScript

users

S3 Bucket

internet

Lambda

Architecture of a simple serverless web

application

IAM DynamoDBAPI

Gateway

JavaScript

users

S3 Bucket

internet

Lambda

Architecture of a simple serverless web

application

IAM DynamoDBAPI

Gateway

JavaScript

users

S3 Bucket

internet

Lambda

Architecture of a simple serverless web

application

IAM DynamoDBAPI

Gateway

JavaScript

users

S3 Bucket

internet

Lambda

Demo

• Free Tier

25 GB of storage

25 reads per second

25 writes per second

• Pricing for additional usage in US East (N. Virginia)

$0.25 per GB per month

Write throughput: $0.0065 per hour for every 10 units of Write Capacity

Read throughput: $0.0065 per hour for every 50 units of Read Capacity

DynamoDB pricing & Free Tier

Resources

Padma Malligarjunan | [email protected]

Getting Started Guide: https://aws.amazon.com/dynamodb/

Deep Dive on Amazon DynamoDB 4:45 p.m.– 5:45 p.m.

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Yekesa Kosuru, V.P. Engineering

Aug 11th 2016

DataXu Use of DynamoDB

DataXu

• Who

• A petabyte scale digital marketing platform

• Spun out of MIT Labs

• One of the fastest growing companies in Inc. 5000

• What

• To help world’s most valuable brands understand and engage

with their customer

• Attribution: identifying set of

user actions that contributed to

a outcome and attributing

value to prior impressions

• Production workload hosted in

AWS

• Leverages multiple AWS

Services

• Processes billions of events

• Dynamo DB is used as Key

Value store

Attribution Use Case

Meta

S3EMR

Job

Cloud

Watch

Dynamo

DBData

Pipeline

Kinesis EC2

Instances

3rd

Party

Cloud

Front

Technology Stack

Event Chains

E AI E I A(time)

• Users generate millions of different types of events / second while using the

internet (e.g. impressions, company interactions, clicks, video events, online

purchases)

• All of these events arrive into the platform via different mechanisms (e.g. log

files and streaming data), and need to be correlated, based on time and by

user, to recreate a user’s interaction on the internet.

E

I

Event

E

Impression

A Attribution

DynamoDB Scaling

Avg. Read Capacity Avg. Write Capacity

Provisioned Throughput 60,000 160,000

Storage: 25TB

Average record size: 1-4KB

Partition Key: User

Sort Key: Timestamp

Fully Managed Benefits of Dynamo DB:

• Consistent read/write performance up to provisioned throughput

• Flexible elasticity, never need to overprovision

• Low operational overhead

• Fast & predictable reads with low latencies - millisecs

DataXu Optimizations

• Combined Reads and Writes with LZO compression

• Combined multiple rows that share the same hash key to the

same row (3X less puts)

• Table rotation to match attribution windows

• Drop entire table when it is no longer necessary

ThankYou

Yekesa Kosuru

[email protected]

www.dataxu.com

Remember to complete

your evaluations!

Thank you!