© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Rory Richardson, AWS Business Development April 19, 2016 Getting Started with Amazon DynamoDB

Getting Started with Amazon DynamoDB


Page 1: Getting Started with Amazon DynamoDB

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Rory Richardson, AWS Business Development

April 19, 2016

Getting Started with Amazon DynamoDB

Page 2: Getting Started with Amazon DynamoDB

Agenda

• Brief history of data processing

• Relational (SQL) vs. nonrelational (NoSQL)

• DynamoDB tables, API, data types, indexes

• Scaling

• Graph and search capabilities

• Pricing and Free Tier

• Customer use cases

Page 3: Getting Started with Amazon DynamoDB

Timeline of database technology

Page 4: Getting Started with Amazon DynamoDB

Data volume since 2010

• 90% of stored data generated in last 2 years

• 1 terabyte of data in 2010 equals 6.5 petabytes today

• Linear correlation between data pressure and technical innovation

• No reason these trends will not continue over time

Page 5: Getting Started with Amazon DynamoDB

Technology adoption and the hype curve

Page 6: Getting Started with Amazon DynamoDB

Relational (SQL) vs. nonrelational (NoSQL)

Page 7: Getting Started with Amazon DynamoDB

Amazon’s Path to DynamoDB

RDBMS → DynamoDB

Page 8: Getting Started with Amazon DynamoDB

Relational vs. nonrelational databases

[Figure: Traditional SQL (primary and secondary DB): scale up. NoSQL (many DB nodes): scale out.]

Page 9: Getting Started with Amazon DynamoDB

Why NoSQL?

SQL | NoSQL
Optimized for storage | Optimized for compute
Normalized/relational | Denormalized/hierarchical
Ad hoc queries | Instantiated views
Scale vertically | Scale horizontally
Good for OLAP | Built for OLTP at scale

Page 10: Getting Started with Amazon DynamoDB

SQL vs. NoSQL schema design

NoSQL design optimizes for compute instead of storage

Page 11: Getting Started with Amazon DynamoDB

SQL NoSQL

Evolution of databases

Page 12: Getting Started with Amazon DynamoDB

The Year of the Monkey

DynamoDB!

Page 13: Getting Started with Amazon DynamoDB

Amazon DynamoDB

Fully managed

Low cost

Predictable performance

Massively scalable

Highly available

Page 14: Getting Started with Amazon DynamoDB

Consistently low latency at scale

PREDICTABLE PERFORMANCE!

Page 15: Getting Started with Amazon DynamoDB

High availability and durability

WRITES

• Replicated continuously to 3 AZs

• Persisted to disk (custom SSD)

READS

• Strongly or eventually consistent

• No latency trade-off

Designed to support 99.99% availability

Built for high durability
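The choice between strongly and eventually consistent reads is made per request. The following is a minimal sketch that only builds the request arguments, assuming the `TableName`, `Key`, and `ConsistentRead` parameters of the GetItem API; the table name `Employees` and the helper `get_item_request` are hypothetical and used for illustration only.

```python
def get_item_request(table_name, key, strongly_consistent=False):
    """Build keyword arguments for a DynamoDB GetItem call."""
    return {
        "TableName": table_name,
        "Key": key,
        # False (the default) asks for an eventually consistent read;
        # True asks for a strongly consistent read.
        "ConsistentRead": strongly_consistent,
    }


# Hypothetical table and key, for illustration only.
request = get_item_request("Employees", {"Id": {"N": "1"}}, strongly_consistent=True)
print(request)
```

With an AWS SDK client available, a dictionary like this would typically be passed to the client's GetItem call; no network call is made in the sketch itself.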

Page 16: Getting Started with Amazon DynamoDB

How DynamoDB scales

[Figure: a table split into partitions 1 .. N]

DynamoDB automatically partitions data

• Partition key spreads data (and workload) across partitions

• Automatically partitions as data grows and throughput needs increase

Large number of unique hash keys + uniform distribution of workload across hash keys → high-scale apps
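Why many unique keys matter can be seen in a small simulation. This is only a sketch: DynamoDB's internal hash function and partition assignment are not described in these slides, so MD5 and a modulo over the partition count are stand-ins chosen for illustration, and the key names are hypothetical.

```python
import hashlib
from collections import Counter


def partition_for(partition_key, num_partitions):
    """Map a partition key to a partition index using a stand-in hash (MD5)."""
    digest = hashlib.md5(str(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def distribution(keys, num_partitions):
    """Count how many requests land on each partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# 30,000 requests spread over many unique keys vs. all on one hot key.
many_unique = distribution((f"user-{i}" for i in range(30000)), 3)
one_hot_key = distribution(("user-1" for _ in range(30000)), 3)
print("many unique keys:", many_unique)
print("single hot key:  ", one_hot_key)
```

With many unique keys the requests spread roughly evenly; with one hot key every request lands on the same partition, whatever capacity the other partitions have.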

Page 17: Getting Started with Amazon DynamoDB

Flexibility and low cost

[Figure: a table with separately configured reads per second and writes per second]

• Customers can configure a table for just a few RPS or for hundreds of thousands of RPS

• Customers only pay for how much they provision

• Provides maximum flexibility to adjust expenditure based on the workload

Page 18: Getting Started with Amazon DynamoDB

Fully managed service = automated operations

DB hosted on-premises vs. DB hosted on Amazon EC2

Page 19: Getting Started with Amazon DynamoDB

Fully managed service = automated operations

DB hosted on-premises vs. DynamoDB

Page 20: Getting Started with Amazon DynamoDB

DynamoDB tables and indexes

Page 21: Getting Started with Amazon DynamoDB

DynamoDB table structure

Table → Items → Attributes

Partition key

• Mandatory

• Key-value access pattern

• Determines data distribution

Sort key

• Optional

• Model 1:N relationships

• Enables rich query capabilities

Query capabilities: all items for key; ==, <, >, >=, <=; “begins with”; “between”; “contains”; “in”; sorted results; counts; top/bottom N values
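The partition key and optional sort key are declared when the table is created. The sketch below builds the arguments for a CreateTable request, assuming the `AttributeDefinitions`, `KeySchema` (with `HASH` for the partition key and `RANGE` for the sort key), and `ProvisionedThroughput` parameters of that API. The table and attribute names, the capacity defaults, and the helper `create_table_request` are hypothetical.

```python
def create_table_request(table_name, partition_key, sort_key=None, rcu=5, wcu=5):
    """Build CreateTable arguments.

    partition_key and sort_key are (attribute name, type) tuples,
    where type is "S" (string), "N" (number), or "B" (binary).
    """
    attribute_definitions = [
        {"AttributeName": partition_key[0], "AttributeType": partition_key[1]}
    ]
    key_schema = [{"AttributeName": partition_key[0], "KeyType": "HASH"}]
    if sort_key is not None:
        # The sort key is optional; only key attributes are declared up front.
        attribute_definitions.append(
            {"AttributeName": sort_key[0], "AttributeType": sort_key[1]}
        )
        key_schema.append({"AttributeName": sort_key[0], "KeyType": "RANGE"})
    return {
        "TableName": table_name,
        "AttributeDefinitions": attribute_definitions,
        "KeySchema": key_schema,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": rcu,
            "WriteCapacityUnits": wcu,
        },
    }


# Hypothetical table: one customer (partition key) has many orders (sort key).
orders = create_table_request("Orders", ("CustomerId", "N"), ("OrderId", "N"))
print(orders["KeySchema"])
```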

Page 22: Getting Started with Amazon DynamoDB

Partition keys

• Partition key uniquely identifies an item

• Partition key is used for building an unordered hash index

• Allows table to be partitioned for scale

Example items and their hashes:

• Id = 1, Name = Jim: Hash (1) = 7B

• Id = 2, Name = Andy, Dept = Eng: Hash (2) = 48

• Id = 3, Name = Kim, Dept = Ops: Hash (3) = CD

Key space: 00 to 54, 55 to A9, AA to FF

Page 23: Getting Started with Amazon DynamoDB

Partition:Sort key

• Partition:Sort key uses two attributes together to uniquely identify an item

• Within unordered hash index, data is arranged by the sort key

• No limit on the number of items (∞) per partition key

• Except if you have local secondary indexes

Partition 1 (00:0 to 54:∞), Hash (2) = 48

• Customer# = 2, Order# = 10, Item = Pen

• Customer# = 2, Order# = 11, Item = Shoes

Partition 2 (55 to A9:∞), Hash (1) = 7B

• Customer# = 1, Order# = 10, Item = Toy

• Customer# = 1, Order# = 11, Item = Boots

Partition 3 (AA to FF:∞), Hash (3) = CD

• Customer# = 3, Order# = 10, Item = Book

• Customer# = 3, Order# = 11, Item = Paper
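The layout above (items grouped by partition key and ordered by sort key) can be modeled with a toy in-memory structure. This is an illustrative sketch, not DynamoDB's storage engine; the class `SortedTable` and its methods are hypothetical, and the data is taken from the customer/order example on the slide.

```python
import bisect
from collections import defaultdict


class SortedTable:
    """Toy model: items grouped by partition key, kept ordered by sort key."""

    def __init__(self):
        # partition key -> list of (sort key, item), sorted by sort key
        self._partitions = defaultdict(list)

    def put(self, partition_key, sort_key, item):
        rows = self._partitions[partition_key]
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, sort_key)
        if i < len(rows) and rows[i][0] == sort_key:
            rows[i] = (sort_key, item)  # same primary key: overwrite
        else:
            rows.insert(i, (sort_key, item))

    def query_between(self, partition_key, low, high):
        """Return items for one partition key with low <= sort key <= high."""
        rows = self._partitions.get(partition_key, [])
        keys = [k for k, _ in rows]
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        return [item for _, item in rows[start:end]]


table = SortedTable()
table.put(2, 11, "Shoes")  # Customer# = 2, Order# = 11
table.put(2, 10, "Pen")
table.put(1, 10, "Toy")
table.put(1, 11, "Boots")

print(table.query_between(2, 10, 11))  # results come back in sort-key order
print(table.query_between(1, 11, 20))
```

A range condition on the sort key only ever scans the rows of one partition key, which is what makes "between" and "begins with" style queries cheap in this model.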

Page 24: Getting Started with Amazon DynamoDB

Partitions are three-way replicated

[Figure: Partition 1, Partition 2, ... Partition N hold the example items (Id = 1, Name = Jim; Id = 2, Name = Andy, Dept = Eng; Id = 3, Name = Kim, Dept = Ops). Each partition is stored on Replica 1, Replica 2, and Replica 3.]

Page 25: Getting Started with Amazon DynamoDB

Local secondary index (LSI)

• Alternate sort key attribute

• Index is local to a partition key

Table: A1 (partition), A2 (sort), A3, A4, A5

LSIs:

• KEYS_ONLY: A1 (partition), A3 (sort), A2 (item key)

• INCLUDE A3: A1 (partition), A4 (sort), A2 (item key), A3 (projected)

• ALL: A1 (partition), A5 (sort), A2 (item key), A3 (projected), A4 (projected)

10 GB maximum per partition key; LSIs limit the number of range keys!
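The three projection types determine which attributes are copied into an index. The sketch below reproduces the A1..A5 example from the slide in plain Python; the function `project` is hypothetical and models only which attributes end up in the index, not how DynamoDB stores them.

```python
def project(item, table_keys, index_keys, projection_type, include=()):
    """Return the attributes of `item` that an index with this projection holds."""
    # Index keys plus table keys are always present (order kept, no duplicates).
    key_attrs = list(dict.fromkeys(list(index_keys) + list(table_keys)))
    if projection_type == "ALL":
        names = list(item)
    elif projection_type == "INCLUDE":
        names = key_attrs + [a for a in include if a not in key_attrs]
    elif projection_type == "KEYS_ONLY":
        names = key_attrs
    else:
        raise ValueError(f"unknown projection type: {projection_type}")
    return {name: item[name] for name in names if name in item}


item = {"A1": "p", "A2": 1, "A3": "x", "A4": "y", "A5": "z"}
table_keys = ("A1", "A2")  # A1 = partition key, A2 = sort key

keys_only = project(item, table_keys, ("A1", "A3"), "KEYS_ONLY")
include_a3 = project(item, table_keys, ("A1", "A4"), "INCLUDE", include=("A3",))
everything = project(item, table_keys, ("A1", "A5"), "ALL")
print(list(keys_only), list(include_a3), list(everything))
```

Smaller projections keep the index small, at the price of extra reads from the table when a query needs attributes the index does not hold.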

Page 26: Getting Started with Amazon DynamoDB

Global secondary index (GSI)

• Alternate partition and/or sort key

• Index is across all partition keys

Table: A1 (partition), A2, A3, A4, A5

GSIs:

• INCLUDE A3: A5 (partition), A4 (sort), A1 (item key), A3 (projected)

• ALL: A4 (partition), A5 (sort), A1 (item key), A2 (projected), A3 (projected)

• KEYS_ONLY: A2 (partition), A1 (item key)

Online indexing

Read capacity units (RCUs) and write capacity units (WCUs) are provisioned separately for GSIs
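Online indexing means a GSI can be added to an existing table, with its own RCUs and WCUs. The following sketch builds the arguments for an UpdateTable request, assuming that API's `GlobalSecondaryIndexUpdates` parameter with a `Create` action. The helper `add_gsi_request`, the table and index names, and the capacity values are hypothetical.

```python
def add_gsi_request(table_name, index_name, partition_key, sort_key=None,
                    include=None, rcu=5, wcu=5):
    """Build UpdateTable arguments that create a GSI on an existing table.

    partition_key and sort_key are (attribute name, type) tuples.
    """
    attribute_definitions = [
        {"AttributeName": partition_key[0], "AttributeType": partition_key[1]}
    ]
    key_schema = [{"AttributeName": partition_key[0], "KeyType": "HASH"}]
    if sort_key is not None:
        attribute_definitions.append(
            {"AttributeName": sort_key[0], "AttributeType": sort_key[1]}
        )
        key_schema.append({"AttributeName": sort_key[0], "KeyType": "RANGE"})

    if include:
        projection = {"ProjectionType": "INCLUDE", "NonKeyAttributes": list(include)}
    else:
        projection = {"ProjectionType": "KEYS_ONLY"}

    return {
        "TableName": table_name,
        "AttributeDefinitions": attribute_definitions,
        "GlobalSecondaryIndexUpdates": [{
            "Create": {
                "IndexName": index_name,
                "KeySchema": key_schema,
                "Projection": projection,
                # Capacity for the index, separate from the table's own.
                "ProvisionedThroughput": {
                    "ReadCapacityUnits": rcu,
                    "WriteCapacityUnits": wcu,
                },
            }
        }],
    }


# Mirrors the "INCLUDE A3" index on the slide: A5 (partition), A4 (sort).
gsi = add_gsi_request("Table", "A5-A4-index", ("A5", "S"), ("A4", "S"),
                      include=["A3"], rcu=10, wcu=10)
print(gsi["GlobalSecondaryIndexUpdates"][0]["Create"]["Projection"])
```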

Page 27: Getting Started with Amazon DynamoDB

How do GSI updates work?

[Figure: Client → primary table → global secondary index. 2. Asynchronous update (in progress)]

If GSIs don’t have enough write capacity, table writes will be throttled!

Page 28: Getting Started with Amazon DynamoDB

LSI or GSI?

LSI can be modeled as a GSI

If data size in an item collection > 10 GB, use GSI

If eventual consistency is okay for your scenario, use GSI!

Page 29: Getting Started with Amazon DynamoDB

Scaling

Page 30: Getting Started with Amazon DynamoDB

Scaling

Throughput

• Provision any amount of throughput to a table

Size

• Add any number of items to a table

• Maximum item size is 400 KB

• LSIs limit the number of range keys due to 10 GB limit

Scaling is achieved through partitioning

Page 31: Getting Started with Amazon DynamoDB

Throughput

Provisioned at the table level

• Write capacity units (WCUs) are measured in 1 KB per second

• Read capacity units (RCUs) are measured in 4 KB per second

• RCUs measure strongly consistent reads

• Eventually consistent reads cost 1/2 of strongly consistent reads

Read and write throughput limits are independent
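The capacity-unit rules above translate into simple arithmetic. The sketch below assumes that 1 KB is 1,024 bytes, that item sizes are rounded up to the next 4 KB (reads) or 1 KB (writes), and that fractional totals are rounded up; the function names are hypothetical.

```python
import math


def read_capacity_units(item_size_bytes, reads_per_second, strongly_consistent=True):
    """RCUs needed: one unit per 4 KB per strongly consistent read per second."""
    units_per_read = math.ceil(item_size_bytes / 4096)
    total = units_per_read * reads_per_second
    # Eventually consistent reads cost half as much.
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(item_size_bytes, writes_per_second):
    """WCUs needed: one unit per 1 KB per write per second."""
    return math.ceil(item_size_bytes / 1024) * writes_per_second


# A 6 KB item read 100 times per second, and a 1.5 KB item written 10 times per second.
print(read_capacity_units(6 * 1024, 100))                             # strongly consistent
print(read_capacity_units(6 * 1024, 100, strongly_consistent=False))  # eventually consistent
print(write_capacity_units(1536, 10))
```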

Page 32: Getting Started with Amazon DynamoDB

Partitioning math

In the future, these details might change…

Number of partitions

By capacity: (Total RCU / 3000) + (Total WCU / 1000)

By size: Total Size / 10 GB

Total partitions: CEILING(MAX(Capacity, Size))

Page 33: Getting Started with Amazon DynamoDB

Partitioning example

Table size = 8 GB, RCUs = 5000, WCUs = 500

Number of partitions

By capacity: (5000 / 3000) + (500 / 1000) = 2.17

By size: 8 / 10 = 0.8

Total partitions: CEILING(MAX(2.17, 0.8)) = 3

RCUs and WCUs are uniformly spread across partitions:

RCUs per partition = 5000/3 = 1666.67

WCUs per partition = 500/3 = 166.67

Data/partition = 8/3 = 2.67 GB
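The partitioning arithmetic can be checked with a few lines of code. This sketch uses only the constants quoted on the slides (3000 RCUs, 1000 WCUs, and 10 GB per partition), which the deck itself warns might change; the function name `estimate_partitions` is hypothetical.

```python
import math


def estimate_partitions(rcu, wcu, size_gb):
    """Estimate the partition count from provisioned capacity and table size."""
    by_capacity = rcu / 3000 + wcu / 1000
    by_size = size_gb / 10
    return math.ceil(max(by_capacity, by_size))


rcu, wcu, size_gb = 5000, 500, 8
partitions = estimate_partitions(rcu, wcu, size_gb)
print("partitions:", partitions)
# Capacity and data are assumed to be spread uniformly across partitions.
print("RCUs per partition:", round(rcu / partitions, 2))
print("WCUs per partition:", round(wcu / partitions, 2))
print("GB per partition:", round(size_gb / partitions, 2))
```

Note the consequence of the uniform split: a single partition key can use at most the per-partition share of the table's throughput, not the full provisioned amount.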

Page 34: Getting Started with Amazon DynamoDB

To learn more, please attend:

Deep Dive on DynamoDB Room E450a, 11:45 a.m.–12:45 p.m.

Rick Houlihan, principal solutions architect

Page 35: Getting Started with Amazon DynamoDB

Integration capabilities

DynamoDB Triggers

• Implemented as AWS Lambda functions

• Your code scales automatically

• Java, Node.js, and Python

DynamoDB Streams

• Stream of table updates

• Asynchronous

• Exactly once

• Strictly ordered

• 24-hr lifetime per item
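A trigger is a Lambda function that receives batches of stream records. The sketch below is a hypothetical Python handler, assuming the stream event shape in which each entry of `Records` carries an `eventName` (`INSERT`, `MODIFY`, or `REMOVE`) and a `dynamodb` section with the item's `Keys`. The sample event is hand-written for illustration.

```python
def handler(event, context):
    """Hypothetical Lambda entry point: summarize a batch of stream records."""
    counts = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    touched_keys = []
    for record in event.get("Records", []):
        name = record.get("eventName")
        if name in counts:
            counts[name] += 1
        # Keys identify the item that changed, in DynamoDB's typed JSON form.
        touched_keys.append(record.get("dynamodb", {}).get("Keys"))
    return {"counts": counts, "keys": touched_keys}


# Hand-written sample event; a real event carries more fields per record.
sample_event = {
    "Records": [
        {"eventName": "INSERT", "dynamodb": {"Keys": {"Id": {"N": "1"}}}},
        {"eventName": "MODIFY", "dynamodb": {"Keys": {"Id": {"N": "2"}}}},
        {"eventName": "REMOVE", "dynamodb": {"Keys": {"Id": {"N": "3"}}}},
    ]
}
result = handler(sample_event, None)
print(result["counts"])
```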

Page 36: Getting Started with Amazon DynamoDB

Integration capabilities

• Amazon Elasticsearch Service integration

• Full-text queries

Add search to mobile apps

Monitor IoT sensor status codes

App telemetry pattern discovery using regular expressions

• Fine-grained access control by using AWS Identity and Access Management (IAM)

• Table-, item-, and attribute-level access control
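Item- and attribute-level access control is expressed through conditions in an IAM policy. The sketch below builds such a policy as a Python dictionary, assuming the `dynamodb:LeadingKeys`, `dynamodb:Attributes`, and `dynamodb:Select` condition keys and the Login with Amazon identity variable `${www.amazon.com:user_id}`. The table ARN, attribute names, and the helper `per_user_policy` are hypothetical.

```python
import json


def per_user_policy(table_arn, allowed_attributes):
    """Allow a user to read only items whose partition key is their own user ID,
    and only the listed attributes of those items."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": table_arn,  # table-level control
            "Condition": {
                "ForAllValues:StringEquals": {
                    # item-level control: partition key must match the caller
                    "dynamodb:LeadingKeys": ["${www.amazon.com:user_id}"],
                    # attribute-level control
                    "dynamodb:Attributes": list(allowed_attributes),
                },
                "StringEqualsIfExists": {
                    "dynamodb:Select": "SPECIFIC_ATTRIBUTES"
                },
            },
        }],
    }


# Hypothetical table ARN and attribute names.
policy = per_user_policy(
    "arn:aws:dynamodb:us-west-2:123456789012:table/GameScores",
    ["UserId", "GameTitle", "TopScore"],
)
print(json.dumps(policy, indent=2))
```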

Page 37: Getting Started with Amazon DynamoDB

Connect to other AWS data stores

Page 38: Getting Started with Amazon DynamoDB

Customer use cases

Page 39: Getting Started with Amazon DynamoDB

• Over 200 million users

• Over 4 billion items stored

• Millions of ads per month

• Cross-device ad solutions

• 130+ million new users in 1 year

• 150+ million messages per month

• Process requests in milliseconds

• High-performance ads

• Statcast uses burst scalability for many games on a single day

• Flexibility for fast growth

• Web clickstream insights

• Specialty online and retail stores

• Over 5 billion items processed daily

• About 200 million messages processed daily

• Cognitive training

• Job-matching platform

• 5+ million registered users

• Mobile game analytics

• 10M global users

• Home security

• Wearable and IoT solutions

• 170,000 concurrent players

Page 40: Getting Started with Amazon DynamoDB

The Climate Corporation (TCC) scales with Amazon DynamoDB

The Climate Corporation is a San Francisco-based company that examines weather data to help farmers optimize their decision-making.

“The elasticity of DynamoDB read/write ops made DynamoDB the fastest and most efficient solution to achieve our high ingest rate.”

Mohamed Ahmed

Director of Engineering, Site Reliability Engineering and Data Analytics, The Climate Corporation

• Climate is digitizing agriculture, helping farmers increase their yields and productivity using scientific and mathematical models on top of massive amounts of data.

• Weather and satellite imagery is one large source of data used in TCC’s calculations.

• TCC uses DynamoDB to ingest a burst of data and satellite images retrieved from third parties before processing them.

• TCC goes from a few read/write operations to thousands each day to keep up with the bursts of data written to and read from its main DynamoDB tables.

Page 41: Getting Started with Amazon DynamoDB

Thank you!

Page 42: Getting Started with Amazon DynamoDB

Mobile use case

Redfin is revolutionizing home buying and selling with DynamoDB

Redfin is a full-service real estate company with local agents and online tools to help people buy and sell homes.

“We have billions of records on DynamoDB being refreshed daily or hourly or even by seconds.”

Yong Huang

Director, Big Data Analytics, Redfin

• Redfin provides property and agent details and ratings through its websites and apps.

• With DynamoDB, latency for “similar” properties improved from 2 seconds to just 12 milliseconds.

• Redfin stores and processes 5 billion items daily.

Page 43: Getting Started with Amazon DynamoDB

Gaming use case

Nexon delivers unparalleled mobile gaming with DynamoDB

Nexon is a leading South Korean video game developer and a pioneer in the world of interactive entertainment.

“By using AWS, we decreased our initial investment costs, and only pay for what we use.”

Chunghoon Ryu

Department Manager, Nexon

• Nexon used DynamoDB as its primary game database for a new blockbuster mobile game, HIT.

• HIT became the #1 mobile game in Korea within the first day of launch and has more than 2 million registered users.

• Nexon’s HIT leverages DynamoDB to deliver steady latency of less than 10 milliseconds to deliver a fantastic mobile gaming experience for 170,000 concurrent players.

Page 44: Getting Started with Amazon DynamoDB

Analytics use case

Expedia’s real-time analytics application uses DynamoDB

Expedia is a leader in the $1 trillion travel industry, with an extensive portfolio that includes some of the world’s most trusted travel brands.

“With DynamoDB, we were up and running in less than a day, and there is no need for a team to maintain.”

Kuldeep Chowhan

Engineering Manager, Expedia

• Expedia’s real-time analytics application collects data for its “test and learn” experiments on Expedia sites.

• The analytics application processes ~200 million messages daily.

• Ease of setup, monitoring, and scaling were key factors in choosing DynamoDB.

Page 45: Getting Started with Amazon DynamoDB

Major League Baseball fields big data, excitement with DynamoDB

MLBAM (MLB Advanced Media) is a full-service solutions provider, operating a powerful content delivery platform.

“For the first time, we can measure things we’ve never been able to measure before.”

Joe Inzerillo

Executive Vice President and CTO, MLBAM

• MLBAM can scale to support many games on a single day.

• DynamoDB powers queries and supports the fast data retrieval required.

• MLBAM distributes 25,000 live events annually and 10 million streams daily.

Page 46: Getting Started with Amazon DynamoDB

MediaTek delivers IoT databases to developers with DynamoDB

MediaTek designs and develops silicon wafers for wireless communications and digital multimedia solutions.

“We have been able to have scalability, consistency, and availability, which is really helpful with our customers and our internal stakeholders.”

Marc Naddell

Vice President, MediaTek Labs

• MediaTek LinkIt One developers can access their database by using mobile devices or a desktop browser.

• MediaTek uses DynamoDB as part of its highly scalable, consistent, and available cloud solution.

• MediaTek has expanded globally and reduced development time by 50 percent with AWS.

Page 47: Getting Started with Amazon DynamoDB

Mapbox leverages DynamoDB for fast global map delivery

Mapbox provides building blocks that make it easy to integrate location into any mobile or online application.

“Our customers want very fast maps. DynamoDB saves a lot of time in the back end.”

Ian Ward

Software Engineer, Mapbox

• Mapbox maps reach over 200 million users worldwide.

• With DynamoDB, Amazon ElastiCache, and Amazon S3, Mapbox cut end-user latency in half.

• Mapbox found AWS was 85 percent less expensive than alternatives.

• DynamoDB integrates with AWS CloudTrail and AWS Identity and Access Management (IAM).

Page 48: Getting Started with Amazon DynamoDB

Duolingo scales to store over 4 billion items using AWS

Duolingo is a free language learning service where users help translate the web and rate translations.

“Using AWS, we can handle traffic spikes that expand up to seven times the amount of normal traffic.”

Severin Hacker

CTO, Duolingo

• Duolingo stores data about each user to be able to generate personalized lessons.

• The MySQL database couldn’t keep up with Duolingo’s rate of growth.

• By using the scalable database service, data store capacity increased from 100 million to more than 4 billion items.

• Duolingo has the capacity to scale to support over 8 million active users.

Page 49: Getting Started with Amazon DynamoDB

Peak innovates and scales quickly with DynamoDB

Peak is a London-based startup focused on improving cognitive skills for its millions of users each month.

“The ability to grow from two to four services or increase read/write capacity of a database without spending more than five minutes on it from an IT management perspective is amazing.”

Bertrand Lamarque

Director of Engineering, Peak

• Peak designs cognitive training games.

• DynamoDB provides fast access to data and massive scalability.

• Moving faster with fewer resources is a significant advantage for Peak.

• DynamoDB enables the product agility needed to keep enhancing its product.

Page 50: Getting Started with Amazon DynamoDB

Tigerspike reduces latency, improves uptime with DynamoDB

Tigerspike is one of the foremost providers of personal media technology consulting.

“Since moving to AWS, we’ve seen latency in some instances drop by up to 300 milliseconds and we’ve achieved 99.95 percent uptime.”

Dean Jezard

Chief Technology Officer, Tigerspike

• Tigerspike won the Deloitte Technology Fast 50 Australia award for 7 consecutive years.

• An alternative provider would cost up to 75 percent more than AWS.

• The AWS architecture supports Tigerspike’s existing operating system and language platforms.

Page 51: Getting Started with Amazon DynamoDB

DoApp minimizes costs, achieves scale with Amazon DynamoDB

DoApp is a mobile and web app development company whose apps can be customized for organizations.

“AWS provides Fortune 500 infrastructure, reliability, and scalability at a reasonable cost to a company with just 12 people.”

Ryan Pendergast

Software Developer, DoApp

• DoApp provides more than 460 mobile apps.

• Its most popular product is a news app that can be customized for clients, including Gannett.

• DoApp operates its own advertising network to customers around the world.

• Amazon DynamoDB enables high transaction rates at scale for its advertising platform.

Page 52: Getting Started with Amazon DynamoDB

Myriad Group accelerates growth with DynamoDB

Myriad Group is a French-Swiss software company that builds and markets software for mobile operators.

“DynamoDB has allowed us to grow in a way perhaps we wouldn’t have been able to before.”

Bruce Jackson

Chief Technology Officer, Myriad Group

• Myriad Group develops software for mobile operators.

• Myriad also offers a direct-to-consumer social content sharing service called Versy.

• Myriad grew its Versy user base from 38 million users to 170 million users in about one year.

• Amazon DynamoDB allows Myriad to focus on the business problem and not worry about infrastructure.

Page 53: Getting Started with Amazon DynamoDB

Beatpacking chose DynamoDB over Cassandra and HBase

The Beatpacking Company is a Seoul-based startup that provides a free streaming radio service.

“I think DynamoDB was the best choice since we can provision appropriate throughput and control pricing.”

Minyoung Jeong

Chief Technology Officer, The Beatpacking Company

• Beatpacking provides a free streaming radio service called Beat that focuses on Korean K-pop music.

• The Beat service has grown to over 2 million users within 7 months of its launch.

• Beatpacking typically stores 10,000 to 20,000 events per second, with peaks of 100,000 events per second.

• DynamoDB allows Beatpacking to easily provision throughput capacity while the service is running.

Page 54: Getting Started with Amazon DynamoDB

JustGiving scales event storage for viral giving campaigns with DynamoDB

London-based JustGiving is one of the world’s largest online social platforms for charitable fundraising.

“When you look at concepts such as #NoMakeupSelfie or #IceBucketChallenge … they’re extremely viral. … To size up for those spikes would have been impractical.”

Richard Atkinson

Chief Information Officer, JustGiving

• JustGiving is one of the world’s largest online social platforms for charitable fundraising.

• The London-based organization’s 24 million users have helped raise $3.5 billion for over 13,000 causes.

• JustGiving uses DynamoDB to store website clickstream events for its big-data analytics platform.

• JustGiving can make near–real-time improvements to its product based on billions of annual site visits.

Page 55: Getting Started with Amazon DynamoDB

jobandtalent uses DynamoDB to power its new microservice

Europe-based jobandtalent is a job-matching platform that uses proprietary algorithms to connect candidates with openings.

“We recently started to use Amazon DynamoDB with one of our microservices.”

Teo Ruiz

Infrastructure and Platform Architect, jobandtalent

• More than 5 million users depend on jobandtalent to connect skills with job openings.

• jobandtalent uses a real-time recommendation engine to match 4 million openings and 5 million candidates.

• With DynamoDB, jobandtalent can scale to 60,000 requests per minute during peak traffic periods.

Page 56: Getting Started with Amazon DynamoDB

Infraware uses DynamoDB to improve Polaris Office for its 10M users

Infraware is a software development company that makes Polaris Office for desktops and mobile devices.

“We use Amazon RDS, Amazon DynamoDB for database, and Amazon EMR, and Amazon Redshift for data analysis.”

Seongtaek Kim

Team Leader, Infraware

• Infraware delivers the Polaris Office mobile app to more than 10 million global users.

• Infraware saves time building infrastructure by taking advantage of AWS.

• Infraware now delivers much faster service.

• Polaris Office is improving with events stored in DynamoDB.

Page 57: Getting Started with Amazon DynamoDB

Unalis uses DynamoDB to store game events for its analytics platform

Headquartered in Taiwan, Unalis provides mobile gaming content, apps, and gaming analytics for developers.

“By using our UniCloud analytics platform, developers can easily obtain and understand campaign tracking information as well as consumption patterns, game analytics, and product data.”

Simpson Chou

Director, Game Development Department, Unalis

• Unalis provides mobile gaming content, apps, and gaming analytics for developers through UniCloud.

• UniCloud captures about 200 megabytes of raw data from users in different countries daily.

• Tens of thousands of players are online at peak periods.

• Unalis uses DynamoDB to store large volumes of gaming data for deeper analytics.

Page 58: Getting Started with Amazon DynamoDB

Canary scaled with DynamoDB to process 150M+ videos per day

Canary is a fast-growing New York startup that makes the Canary in-home security system.

“People have checked into our app from 185 countries and they’re using our devices in 140 countries, so there’s an incredible amount of demand for our services. We can scale to meet that demand by running on the AWS cloud.”

Chris Rill

Cofounder and Chief Technology Officer, Canary

• The Canary in-home security system consists of a wireless device about the size of a tall coffee mug.

• The Canary system sends a notification along with video directly to its owner’s smartphone through an app.

• DynamoDB lets Canary handle 150+ million videos every day.

• Canary can innovate and execute much faster with AWS.