© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Rory Richardson, AWS Business Development
April 19, 2016
Getting Started with
Amazon DynamoDB
Agenda
• Brief history of data processing
• Relational (SQL) vs. nonrelational (NoSQL)
• DynamoDB tables, API, data types, indexes
• Scaling
• Graph and search capabilities
• Pricing and Free Tier
• Customer use cases
Timeline of database technology
Data volume since 2010
• 90% of stored data generated in
last 2 years
• 1 terabyte of data in 2010 equals
6.5 petabytes today
• Linear correlation between data
pressure and technical innovation
• No reason these trends will not
continue over time
Technology adoption and the hype curve
Relational (SQL) vs.
nonrelational (NoSQL)
Amazon’s Path to DynamoDB
RDBMS to DynamoDB
Relational vs. nonrelational databases
Traditional SQL: primary and secondary DB (scale up)
NoSQL: many DB nodes (scale out)
Why NoSQL?
SQL                      NoSQL
Optimized for storage    Optimized for compute
Normalized/relational    Denormalized/hierarchical
Ad hoc queries           Instantiated views
Scale vertically         Scale horizontally
Good for OLAP            Built for OLTP at scale
SQL vs. NoSQL schema design
NoSQL design optimizes for compute instead of storage
[Figure: SQL schema compared with NoSQL schema]
Evolution of databases
The Year of the Monkey
DynamoDB!
Amazon DynamoDB
Fully managed
Low cost
Predictable performance
Massively scalable
Highly available
Consistently low latency at scale
PREDICTABLE
PERFORMANCE!
High availability and durability
WRITES
Replicated continuously to 3 AZs
Persisted to disk (custom SSD)
READS
Strongly or eventually consistent
No latency trade-off
Designed to support 99.99% availability
Built for high durability
How DynamoDB scales
Table: partitions 1 .. N
DynamoDB automatically partitions data
• Partition key spreads data (and workload) across
partitions
• Automatically partitions as data grows and throughput
needs increase
Large number of unique hash keys + uniform distribution of workload across hash keys = high-scale apps
Flexibility and low cost
[Diagram: a table with provisioned reads per second and writes per second]
• Customers can configure a table
for just a few RPS or for
hundreds of thousands of RPS
• Customers only pay for how
much they provision
• Provides maximum flexibility to
adjust expenditure based on the
workload
Fully managed service = automated operations
DB hosted on-premises vs. DB hosted on Amazon EC2
DB hosted on-premises vs. DynamoDB
DynamoDB tables and indexes
DynamoDB table structure
Table
Items
Attributes
Partition key
Mandatory
Key-value access pattern
Determines data distribution
Sort key
Optional
Model 1:N relationships
Enables rich query capabilities
All items for key
==, <, >, >=, <=
“begins with”
“between”
“contains”
“in”
sorted results
counts
top/bottom N values
Key space: 00 to 54 | 55 to A9 | AA to FF
Partition keys
Partition key uniquely identifies an item
Partition key is used for building an unordered hash index
Allows table to be partitioned for scale
Id = 1
Name = Jim
Hash (1) = 7B
Id = 2
Name = Andy
Dept = Eng
Hash (2) = 48
Id = 3
Name = Kim
Dept = Ops
Hash (3) = CD
Key Space
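The mapping from a partition key to a partition can be sketched as follows. This is a toy model only: the one-byte key space and the 00-54 / 55-A9 / AA-FF ranges mirror the figure, while `toy_hash` (MD5) and both function names are assumptions made for illustration, since the slides do not say which hash function DynamoDB uses internally.

```python
import bisect
import hashlib

# Inclusive upper bound of each partition's slice of a one-byte key space,
# mirroring the 00-54 / 55-A9 / AA-FF split drawn on the slides.
PARTITION_UPPER_BOUNDS = [0x54, 0xA9, 0xFF]


def toy_hash(partition_key: str) -> int:
    """Map a key to one byte (0x00-0xFF).

    Assumption: MD5 is only a stand-in; the slides do not name the
    hash function DynamoDB uses internally.
    """
    return hashlib.md5(partition_key.encode("utf-8")).digest()[0]


def partition_for_hash(hash_value: int) -> int:
    """Return the 1-based partition whose key range contains hash_value."""
    return bisect.bisect_left(PARTITION_UPPER_BOUNDS, hash_value) + 1


# Hash values taken from the slide: Id 1 -> 7B, Id 2 -> 48, Id 3 -> CD.
placement = {item_id: partition_for_hash(h)
             for item_id, h in {1: 0x7B, 2: 0x48, 3: 0xCD}.items()}
```

With the slide's hash values, the three items land on three different partitions, which is the even spread a well-chosen partition key is meant to produce.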
Partition:Sort key
Partition:Sort key uses two attributes together to uniquely identify an item
Within the unordered hash index, data is arranged by the sort key
No limit on the number of items (∞) per partition key
• Except if you have local secondary indexes
00:0 FF:∞
Hash (2) = 48
Customer# = 2
Order# = 10
Item = Pen
Customer# = 2
Order# = 11
Item = Shoes
Customer# = 1
Order# = 10
Item = Toy
Customer# = 1
Order# = 11
Item = Boots
Hash (1) = 7B
Customer# = 3
Order# = 10
Item = Book
Customer# = 3
Order# = 11
Item = Paper
Hash (3) = CD
Partition 1 (00:0 to 54:∞), Partition 2 (55 to A9:∞), Partition 3 (AA to FF:∞)
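A small in-memory model may help show why the sort key enables range queries. It is a sketch, not DynamoDB's implementation: `ToyTable`, `put_item`, and `query_between` are hypothetical names, and the data is the Customer#/Order# example from the slide.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with a partition key and a sort key."""

    def __init__(self):
        # partition key -> list of (sort key, item), kept ordered by sort key
        self._collections = defaultdict(list)

    def put_item(self, partition_key, sort_key, item):
        rows = self._collections[partition_key]
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, sort_key)
        if i < len(rows) and rows[i][0] == sort_key:
            rows[i] = (sort_key, item)  # same composite key: overwrite
        else:
            rows.insert(i, (sort_key, item))

    def query_between(self, partition_key, low, high):
        """Return items of one partition key with low <= sort key <= high."""
        rows = self._collections.get(partition_key, [])
        keys = [k for k, _ in rows]
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        return [item for _, item in rows[start:end]]


table = ToyTable()
for customer, order, item in [(2, 11, "Shoes"), (2, 10, "Pen"), (1, 10, "Toy"),
                              (1, 11, "Boots"), (3, 10, "Book"), (3, 11, "Paper")]:
    table.put_item(customer, order, item)

orders_for_customer_2 = table.query_between(2, 10, 11)
```

Because items under one partition key stay ordered by sort key, a "between" query is a contiguous slice and results come back sorted.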
Partitions are three-way replicated
Replica 1, Replica 2, and Replica 3 each hold a copy of Partition 1 through Partition N.
Example items stored in every replica:
Id = 1, Name = Jim
Id = 2, Name = Andy, Dept = Engg
Id = 3, Name = Kim, Dept = Ops
Local secondary index (LSI)
Alternate sort key attribute
Index is local to a partition key
Table: A1 (partition), A2 (sort), A3, A4, A5
LSIs:
KEYS_ONLY: A1 (partition), A3 (sort), A2 (item key)
INCLUDE A3: A1 (partition), A4 (sort), A2 (item key), A3 (projected)
ALL: A1 (partition), A5 (sort), A2 (item key), A3 (projected), A4 (projected)
10 GB maximum per partition key; LSIs limit the number of range keys!
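The three LSIs in the figure could be written as a table definition along these lines. The field names follow the DynamoDB CreateTable request; the table name, index names, string attribute types, capacity values, and the `local_index` helper are assumptions made for illustration.

```python
def local_index(name, sort_attribute, projection):
    """Build one LocalSecondaryIndexes entry (hypothetical helper).

    An LSI keeps the table's partition key (A1) and swaps in another sort key.
    """
    return {
        "IndexName": name,
        "KeySchema": [
            {"AttributeName": "A1", "KeyType": "HASH"},
            {"AttributeName": sort_attribute, "KeyType": "RANGE"},
        ],
        "Projection": projection,
    }


table_definition = {
    "TableName": "ExampleTable",  # assumed name
    # Every attribute used as a key somewhere needs a type; "S" is assumed.
    "AttributeDefinitions": [
        {"AttributeName": name, "AttributeType": "S"}
        for name in ("A1", "A2", "A3", "A4", "A5")
    ],
    "KeySchema": [
        {"AttributeName": "A1", "KeyType": "HASH"},   # partition key
        {"AttributeName": "A2", "KeyType": "RANGE"},  # sort key
    ],
    "LocalSecondaryIndexes": [
        local_index("ByA3", "A3", {"ProjectionType": "KEYS_ONLY"}),
        local_index("ByA4", "A4", {"ProjectionType": "INCLUDE",
                                   "NonKeyAttributes": ["A3"]}),
        local_index("ByA5", "A5", {"ProjectionType": "ALL"}),
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
}
```

With boto3 installed and credentials configured, a dictionary of this shape could be passed as `boto3.client("dynamodb").create_table(**table_definition)`; the sketch itself makes no network call.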
Global secondary index (GSI)
Alternate partition and/or sort key
Index is across all partition keys
Table: A1 (partition), A2, A3, A4, A5
GSIs:
INCLUDE A3: A5 (partition), A4 (sort), A1 (item key), A3 (projected)
ALL: A4 (partition), A5 (sort), A1 (item key), A2 (projected), A3 (projected)
KEYS_ONLY: A2 (partition), A1 (item key)
Online indexing
Read capacity units (RCUs) and write capacity units (WCUs) are provisioned separately for GSIs
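As a rough illustration of separately provisioned GSIs, the figure's three indexes might be declared like this. Field names follow the DynamoDB CreateTable request; the table name, index names, attribute types, capacity numbers, and the `global_index` helper are all assumptions.

```python
def global_index(name, partition_attr, sort_attr, projection, rcu, wcu):
    """Build one GlobalSecondaryIndexes entry (hypothetical helper)."""
    key_schema = [{"AttributeName": partition_attr, "KeyType": "HASH"}]
    if sort_attr is not None:
        key_schema.append({"AttributeName": sort_attr, "KeyType": "RANGE"})
    return {
        "IndexName": name,
        "KeySchema": key_schema,
        "Projection": projection,
        # Each GSI carries its own throughput, independent of the table's.
        "ProvisionedThroughput": {"ReadCapacityUnits": rcu,
                                  "WriteCapacityUnits": wcu},
    }


gsi_table_definition = {
    "TableName": "ExampleTable",  # assumed name
    # Only attributes used in some key schema are declared (A3 is never a key).
    "AttributeDefinitions": [
        {"AttributeName": name, "AttributeType": "S"}
        for name in ("A1", "A2", "A4", "A5")
    ],
    "KeySchema": [{"AttributeName": "A1", "KeyType": "HASH"}],
    "GlobalSecondaryIndexes": [
        global_index("ByA5A4", "A5", "A4",
                     {"ProjectionType": "INCLUDE", "NonKeyAttributes": ["A3"]},
                     rcu=10, wcu=5),
        global_index("ByA4A5", "A4", "A5", {"ProjectionType": "ALL"},
                     rcu=10, wcu=5),
        global_index("ByA2", "A2", None, {"ProjectionType": "KEYS_ONLY"},
                     rcu=5, wcu=5),
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 20, "WriteCapacityUnits": 10},
}
```

Keeping index throughput separate from table throughput is what makes it possible to size a GSI for its own read pattern.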
How do GSI updates work?
[Diagram: client, table with primary table partitions, and global secondary index]
2. Asynchronous update (in progress)
If GSIs don’t have enough write capacity, table writes will be throttled!
LSI or GSI?
LSI can be modeled as a GSI
If data size in an item collection > 10 GB, use GSI
If eventual consistency is okay for your scenario, use GSI!
Scaling
Throughput
• Provision any amount of throughput to a table
Size
• Add any number of items to a table
• Maximum item size is 400 KB
• LSIs limit the number of range keys due to 10 GB limit
Scaling is achieved through partitioning
Throughput
Provisioned at the table level
• Write capacity units (WCUs) are measured in 1 KB per second
• Read capacity units (RCUs) are measured in 4 KB per second
• RCUs measure strongly consistent reads
• Eventually consistent reads cost 1/2 of strongly consistent reads
Read and write throughput limits are independent
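The capacity-unit rules above can be turned into a small estimator. This is a sketch under stated assumptions: the function names are hypothetical, and rounding each item up to the next full 1 KB or 4 KB unit is an assumption about how partial units are counted.

```python
import math

WCU_BYTES = 1024      # one WCU covers a write of up to 1 KB per second
RCU_BYTES = 4 * 1024  # one RCU covers a strongly consistent read of up to 4 KB per second


def write_capacity_units(item_size_bytes, writes_per_second):
    """WCUs needed to write items of this size at this rate."""
    return math.ceil(item_size_bytes / WCU_BYTES) * writes_per_second


def read_capacity_units(item_size_bytes, reads_per_second, strongly_consistent=True):
    """RCUs needed; eventually consistent reads cost half as much."""
    units = math.ceil(item_size_bytes / RCU_BYTES) * reads_per_second
    if strongly_consistent:
        return units
    return math.ceil(units / 2)


# A 6 KB item read 10 times per second, and a 1.5 KB item written 10 times per second.
strong_rcus = read_capacity_units(6 * 1024, 10)
eventual_rcus = read_capacity_units(6 * 1024, 10, strongly_consistent=False)
wcus = write_capacity_units(1536, 10)
```

Under these assumptions the 6 KB reads need 20 RCUs when strongly consistent and 10 when eventually consistent, and the 1.5 KB writes need 20 WCUs.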
Partitioning math
In the future, these details might change…
Number of partitions
By capacity: (Total RCU / 3000) + (Total WCU / 1000)
By size: Total Size / 10 GB
Total partitions: CEILING(MAX(Capacity, Size))
Partitioning example: table size = 8 GB, RCUs = 5000, WCUs = 500
Number of partitions
By capacity: (5000 / 3000) + (500 / 1000) = 2.17
By size: 8 / 10 = 0.8
Total partitions: CEILING(MAX(2.17, 0.8)) = 3
RCUs and WCUs are uniformly spread across partitions
RCUs per partition = 5000/3 = 1666.67
WCUs per partition = 500/3 = 166.67
Data per partition = 8/3 = 2.67 GB
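The partitioning math can be reproduced in a few lines. The constants (3000 RCUs, 1000 WCUs, and 10 GB per partition) come from the slide, which itself warns that these details might change; the function name is hypothetical.

```python
import math

RCU_PER_PARTITION = 3000  # from the slide; may change in the future
WCU_PER_PARTITION = 1000  # from the slide; may change in the future
GB_PER_PARTITION = 10     # from the slide; may change in the future


def estimate_partitions(total_rcu, total_wcu, table_size_gb):
    """Apply CEILING(MAX(by capacity, by size)) from the slide."""
    by_capacity = total_rcu / RCU_PER_PARTITION + total_wcu / WCU_PER_PARTITION
    by_size = table_size_gb / GB_PER_PARTITION
    return math.ceil(max(by_capacity, by_size))


# The slide's example: 8 GB table, 5000 RCUs, 500 WCUs.
partitions = estimate_partitions(5000, 500, 8)
rcu_per_partition = 5000 / partitions
wcu_per_partition = 500 / partitions
```

Because throughput is divided evenly, each partition in this example receives only a third of the table's provisioned reads and writes, which is why a hot key can throttle well below the table-level limit.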
To learn more, please attend:
Deep Dive on DynamoDB Room E450a, 11:45 a.m.–12:45 p.m.
Rick Houlihan, principal solutions architect
Integration capabilities
DynamoDB Triggers
Implemented as AWS
Lambda functions
Your code scales
automatically
Java, Node.js, and Python
DynamoDB Streams
Stream of table updates
Asynchronous
Exactly once
Strictly ordered
24-hr lifetime per item
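A trigger written in Python might look like the sketch below. The `Records`, `eventName`, and `dynamodb` fields follow the shape of a DynamoDB Streams event delivered to AWS Lambda; the handler's logic and the sample event are made up for illustration.

```python
from collections import defaultdict


def handler(event, context):
    """Group the keys of changed items by change type (INSERT, MODIFY, REMOVE)."""
    changed = defaultdict(list)
    for record in event.get("Records", []):
        change = record["dynamodb"]
        changed[record["eventName"]].append(change["Keys"])
    return dict(changed)


# Hand-written sample event in the shape of a stream batch (not real data).
sample_event = {
    "Records": [
        {"eventName": "INSERT",
         "dynamodb": {"Keys": {"Id": {"N": "1"}},
                      "NewImage": {"Id": {"N": "1"}, "Name": {"S": "Jim"}}}},
        {"eventName": "REMOVE",
         "dynamodb": {"Keys": {"Id": {"N": "2"}}}},
    ]
}

summary = handler(sample_event, None)
```

Calling the handler locally with a sample event, as above, is a convenient way to exercise trigger logic before attaching it to a table's stream.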
Integration capabilities
• Amazon Elasticsearch Service integration
• Full-text queries
Add search to mobile apps
Monitor IoT sensor status codes
App telemetry pattern discovery using regular expressions
• Fine-grained access control by using AWS Identity and Access Management (IAM)
• Table-, item-, and attribute-level access control
Connect to other AWS data stores
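Item- and attribute-level access control is typically expressed as IAM policy conditions. The sketch below builds such a policy as a Python dictionary. The condition keys `dynamodb:LeadingKeys`, `dynamodb:Attributes`, and `dynamodb:Select` are IAM condition keys for DynamoDB; the account ID, Region, table name, attribute names, and the choice of Login with Amazon as the identity provider are placeholders assumed for illustration.

```python
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            # Table-level control: placeholder account, Region, and table name.
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/GameScores",
            "Condition": {
                "ForAllValues:StringEquals": {
                    # Item-level control: only items whose partition key
                    # equals the caller's own user ID.
                    "dynamodb:LeadingKeys": ["${www.amazon.com:user_id}"],
                    # Attribute-level control: only these attributes.
                    "dynamodb:Attributes": ["UserId", "GameTitle", "TopScore"],
                },
                # Require callers to name the attributes they request.
                "StringEqualsIfExists": {"dynamodb:Select": "SPECIFIC_ATTRIBUTES"},
            },
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
```

The resulting JSON document is what would be attached to an IAM role; nothing in this sketch contacts AWS.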
Customer use cases
Over 200 million users
Over 4 billion items stored
Millions of ads per month
Cross-device ad solutions
130+ million new users in 1 year
150+ million messages per month
Process requests in milliseconds
High-performance ads
Statcast uses burst scalability for many games on a single day
Flexibility for fast growth
Web clickstream insights
Specialty online and retail stores
Over 5 billion items processed daily
About 200 million messages processed daily
Cognitive training
Job-matching platform
5+ million registered users
Mobile game analytics
10M global users
Home security
Wearable and IoT solutions
170,000 concurrent players
The Climate Corporation (TCC) scales with Amazon DynamoDB
The Climate Corporation is a San Francisco-based
company that examines weather data to help farmers
optimize their decision-making.
“The elasticity of DynamoDB
read/write ops made
DynamoDB the fastest and
most efficient solution to
achieve our high ingest rate.”
Mohamed Ahmed
Director of Engineering,
Site Reliability Engineering and Data Analytics
The Climate Corporation
• Climate is digitizing agriculture, helping
farmers increase their yields and productivity
using scientific and mathematical models on
top of massive amounts of data.
• Weather and satellite imagery is one large
source of data used in TCC’s calculations.
• TCC uses DynamoDB to ingest a burst of
data and satellite images retrieved from third
parties before processing them.
• TCC goes from a few read/write operations to
thousands each day to keep up with the
bursts of data written to and read from its main
DynamoDB tables.
Thank you!
Mobile use case
Redfin is revolutionizing home buying and selling with DynamoDB
Redfin is a full-service real estate company with local
agents and online tools to help people buy and sell homes.
“We have billions of records
on DynamoDB being
refreshed daily or hourly or
even by seconds.”
Yong Huang
Director, Big Data Analytics, Redfin
• Redfin provides property and agent
details and ratings through its websites
and apps.
• With DynamoDB, latency for “similar”
properties improved from 2 seconds to
just 12 milliseconds.
• Redfin stores and processes 5 billion
items daily.
Gaming use case
Nexon delivers unparalleled mobile gaming with DynamoDB
Nexon is a leading South Korean video game developer
and a pioneer in the world of interactive entertainment.
“By using AWS, we
decreased our initial
investment costs, and only
pay for what we use.”
Chunghoon Ryu
Department Manager, Nexon
• Nexon used DynamoDB as its primary
game database for a new blockbuster
mobile game, HIT.
• HIT became the #1 mobile game in
Korea within the first day of launch and
has more than 2 million registered
users.
• Nexon’s HIT uses DynamoDB to
maintain steady latency of less than 10
milliseconds and deliver a fantastic mobile
gaming experience for 170,000
concurrent players.
Analytics use case
Expedia’s real-time analytics application uses DynamoDB
Expedia is a leader in the $1 trillion travel industry, with an
extensive portfolio that includes some of the world’s most
trusted travel brands.
“With DynamoDB, we were up
and running in less than a
day, and there is no need for
a team to maintain.”
Kuldeep Chowhan
Engineering Manager, Expedia
• Expedia’s real-time analytics
application collects data for its “test
and learn” experiments on Expedia
sites.
• The analytics application processes
~200 million messages daily.
• Ease of setup, monitoring, and
scaling were key factors in
choosing DynamoDB.
Major League Baseball fields big data, excitement with DynamoDB
MLBAM (MLB Advanced Media) is a full-service solutions
provider, operating a powerful content delivery platform.
“For the first time, we can
measure things we’ve never
been able to measure
before.”
Joe Inzerillo
Executive Vice President and CTO, MLBAM
• MLBAM can scale to support many
games on a single day.
• DynamoDB powers queries and
supports the fast data retrieval required.
• MLBAM distributes 25,000 live events
annually and 10 million streams daily.
MediaTek delivers IoT databases to developers with DynamoDB
MediaTek designs and develops silicon wafers for
wireless communications and digital multimedia solutions.
“We have been able to have
scalability, consistency, and
availability, which is really
helpful with our customers and
our internal stakeholders.”
Marc Naddell
Vice President, MediaTek Labs
• MediaTek LinkIt One developers
can access their database by using
mobile devices or a desktop
browser.
• MediaTek uses DynamoDB as part
of its highly scalable, consistent,
and available cloud solution.
• MediaTek has expanded globally
and reduced development time by
50 percent with AWS.
Mapbox leverages DynamoDB for fast global map delivery
Mapbox provides building blocks that make it easy to
integrate location into any mobile or online application.
“Our customers want very fast
maps. DynamoDB saves a
lot of time in the back end.”
Ian Ward
Software Engineer, Mapbox
• Mapbox maps reach over 200
million users worldwide.
• With DynamoDB, Amazon
ElastiCache, and Amazon S3,
Mapbox cut end-user latency in
half.
• Mapbox found AWS was 85 percent
less expensive than alternatives.
• DynamoDB integrates with AWS
CloudTrail and AWS Identity and
Access Management (IAM).
Duolingo scales to store over 4 billion items using AWS
Duolingo is a free language learning service where
users help translate the web and rate translations.
“Using AWS, we can handle
traffic spikes that expand up
to seven times the amount of
normal traffic.”
Severin Hacker
CTO, Duolingo
• Duolingo stores data about each user to
be able to generate personalized
lessons.
• The MySQL database couldn’t keep up
with Duolingo’s rate of growth.
• By using the scalable database service,
data store capacity increased from 100
million to more than 4 billion items.
• Duolingo has the capacity to scale to
support over 8 million active users.
Peak innovates and scales quickly with DynamoDB
Peak is a London-based startup focused on improving
cognitive skills for its millions of users each month.
“The ability to grow from two
to four services or increase
read/write capacity of a
database without spending
more than five minutes on it
from an IT management
perspective is amazing.”
Bertrand Lamarque
Director of Engineering, Peak
• Peak designs cognitive training
games.
• DynamoDB provides fast access to
data and massive scalability.
• Moving faster with fewer resources
is a significant advantage for Peak.
• DynamoDB enables the product
agility needed to keep enhancing
its product.
Tigerspike reduces latency, improves uptime with DynamoDB
Tigerspike is one of the foremost providers of personal
media technology consulting.
“Since moving to AWS, we’ve
seen latency in some instances
drop by up to 300 milliseconds
and we’ve achieved 99.95
percent uptime.”
Dean Jezard
Chief Technology Officer, Tigerspike
• Tigerspike won the Deloitte Technology
Fast 50 Australia award for 7
consecutive years.
• An alternative provider would cost up to
75 percent more than AWS.
• The AWS architecture supports
Tigerspike’s existing operating system
and language platforms.
DoApp minimizes costs, achieves scale with Amazon DynamoDB
DoApp is a mobile and web app development company
whose apps can be customized for organizations.
“AWS provides Fortune 500
infrastructure, reliability, and
scalability at a reasonable
cost to a company with just
12 people.”
Ryan Pendergast
Software Developer, DoApp
• DoApp provides more than 460 mobile
apps.
• Its most popular product is a news app
that can be customized for clients,
including Gannett.
• DoApp operates its own advertising
network to customers around the world.
• Amazon DynamoDB enables high
transaction rates at scale for its
advertising platform.
Myriad Group accelerates growth with DynamoDB
Myriad Group is a French-Swiss software company that
builds and markets software for mobile operators.
“DynamoDB has allowed us
to grow in a way perhaps we
wouldn’t have been able to
before.”
Bruce Jackson
Chief Technology Officer, Myriad Group
• Myriad Group develops software for
mobile operators.
• Myriad also offers a direct-to-consumer
social content sharing service called
Versy.
• Myriad grew its Versy user base from
38 million users to 170 million users in
about one year.
• Amazon DynamoDB allows Myriad to
focus on the business problem and not
worry about infrastructure.
Beatpacking chose DynamoDB over Cassandra and HBase
The Beatpacking Company is a Seoul-based startup that
provides a free streaming radio service.
“I think DynamoDB was the
best choice since we can
provision appropriate
throughput and control
pricing.”
Minyoung Jeong
Chief Technology Officer,
The Beatpacking Company
• Beatpacking provides a free streaming
radio service called Beat that focuses
on Korean K-pop music.
• The Beat service has grown to over 2
million users within 7 months of its
launch.
• Beatpacking typically stores 10,000 to
20,000 events per second, with peaks of
100,000 events per second.
• DynamoDB allows Beatpacking to
easily provision throughput capacity
while the service is running.
JustGiving scales event storage for viral giving campaigns with DynamoDB
London-based JustGiving is one of the world’s largest
online social platforms for charitable fundraising.
“When you look at concepts
such as #NoMakeupSelfie or
#IceBucketChallenge …
they’re extremely viral. …
To size up for those spikes
would have been impractical.”
Richard Atkinson
Chief Information Officer, JustGiving
• JustGiving is one of the world’s largest
online social platforms for charitable
fundraising.
• The London-based organization’s 24
million users have helped raise $3.5
billion for over 13,000 causes.
• JustGiving uses DynamoDB to store
website clickstream events for its big-
data analytics platform.
• JustGiving can make near–real-time
improvements to its product based on
billions of annual site visits.
jobandtalent uses DynamoDB to power its new microservice
Europe-based jobandtalent is a job-matching platform that
uses proprietary algorithms to connect candidates with
openings.
“We recently started to use
Amazon DynamoDB with one
of our microservices.”
Teo Ruiz
Infrastructure and Platform Architect, jobandtalent
• More than 5 million users depend on
jobandtalent to connect skills with job
openings.
• jobandtalent uses a real-time
recommendation engine to match 4
million openings and 5 million
candidates.
• With DynamoDB, jobandtalent can
scale to 60,000 requests per minute
during peak traffic periods.
Infraware uses DynamoDB to improve Polaris Office for its 10M users
Infraware is a software development company that makes
Polaris Office for desktops and mobile devices.
“We use Amazon RDS,
Amazon DynamoDB for
database, and Amazon EMR,
and Amazon Redshift for
data analysis.”
Seongtaek Kim
Team Leader, Infraware
• Infraware delivers the Polaris Office
mobile app to more than 10 million
global users.
• Infraware saves time building
infrastructure by taking advantage of
AWS.
• Infraware now delivers much faster
service.
• Polaris Office is improving with events
stored in DynamoDB.
Unalis uses DynamoDB to store game events for its analytics platform
Headquartered in Taiwan, Unalis provides mobile gaming
content, apps, and gaming analytics for developers.
“By using our UniCloud analytics
platform, developers can easily
obtain and understand campaign
tracking information as well as
consumption patterns, game
analytics, and product data.”
Simpson Chou
Director, Game Development Department, Unalis
• Unalis provides mobile gaming content,
apps, and gaming analytics for
developers through UniCloud.
• UniCloud captures about 200
megabytes of raw data from users in
different countries daily.
• Tens of thousands of players are online
at peak periods.
• Unalis uses DynamoDB to store large
volumes of gaming data for deeper
analytics.
Canary scaled with DynamoDB to process 150M+ videos per day
Canary is a fast-growing New York startup that makes the
Canary in-home security system.
“People have checked into our
app from 185 countries and
they’re using our devices in 140
countries, so there’s an
incredible amount of demand
for our services. We can scale
to meet that demand by
running on the AWS cloud.”
Chris Rill
Cofounder and Chief Technology Officer, Canary
• The Canary in-home security system
consists of a wireless device about the
size of a tall coffee mug.
• The Canary system sends a notification
along with video directly to its owner’s
smartphone through an app.
• DynamoDB lets Canary handle 150+
million videos every day.
• Canary can innovate and execute much
faster with AWS.