© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Nate Slater, AWS Solutions Architect
July 30, 2015
Introduction to DynamoDB
Agenda
• What is DynamoDB?
• DynamoDB Fundamentals
• Typical Workloads and Use Cases
• Demo
What is DynamoDB?
DynamoDB is a fully managed, NoSQL document and key-value data store.
What is NoSQL?
NoSQL is a term to describe data stores that trade full ACID compliance for high availability and scale.
• Atomicity: single row/single item only
• Consistency: eventual consistency
• Isolation: dirty reads
• Durability: data replication on commodity storage
Why NoSQL?
• Dirty reads?
• Eventual consistency?
• Single row transactions only?
• Why would anybody trade ACID compliance for this?
NoSQL – Availability and Scale
[Figure: Traditional SQL scales up a single database (primary and secondary); NoSQL scales out across many database nodes.]
Scale Up vs Scale Out
[Figure: scale-up and scale-out compared by cost and complexity.]
The CAP Theorem
Network partitions will happen in distributed systems:
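[Figure: a cluster of database nodes split by a network partition.]
[Figure: Consistency (C), Availability (A), and Partition Tolerance (P), with the pairwise combinations CA, AP, and CP.]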
Why NoSQL?
• Horizontal scaling allows for virtually unlimited scalability
• Scaling out is cheaper than scaling up
• Choose either full consistency or full availability, and still survive a network partition
• Full ACID compliance is often not needed
What is a Managed Service?
• A managed service is a web service in which consumers of the service never need to interact directly with the underlying compute, storage, and network resources.
Why use a Managed Service?
DynamoDB is a Managed Service
• AWS runs all the database infrastructure for you!
• All the benefits and none of the operational overhead of running a distributed system:
  • Infinitely scalable read and write I/O
  • High availability within a region
  • Data durably stored in 3 availability zones
  • Cross-region replication
  • Easily export data to S3
  • Triggers using Lambda functions
  • Tight integration with Kinesis, Lambda, EMR, and Redshift
  • Pay only for what you use, when you need it
DynamoDB Fundamentals
DynamoDB Table
[Figure: a table containing items, each made up of attributes.]
Hash key
• Mandatory
• Key-value access pattern
• Determines data distribution
Range key
• Optional
• Models 1:N relationships
• Enables rich query capabilities:
  • All items for a hash key
  • ==, <, >, >=, <=
  • “begins with”
  • “between”
  • Sorted results
  • Counts
  • Top/bottom N values
  • Paged responses
Data types
String (S)
Number (N)
Binary (B)
String Set (SS)
Number Set (NS)
Binary Set (BS)
Boolean (BOOL)
Null (NULL)
List (L)
Map (M)
Used for storing nested JSON documents
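To make the type codes above concrete, here is a minimal sketch of how a nested document could be encoded with them. The function names are hypothetical, and the details are assumptions modeled on the low-level attribute-value JSON format: numbers are carried as strings, and binary values are kept as raw bytes rather than the base64 text used on the wire.

```python
from decimal import Decimal

NUMBER_TYPES = (int, float, Decimal)


def to_attribute_value(value):
    """Encode a plain Python value using the type codes S, N, B, SS, NS, BS, BOOL, NULL, L, M."""
    if value is None:
        return {"NULL": True}
    # bool must be tested before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return {"BOOL": value}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, NUMBER_TYPES):
        return {"N": str(value)}  # assumption: numbers travel as strings
    if isinstance(value, (bytes, bytearray)):
        return {"B": bytes(value)}
    if isinstance(value, (set, frozenset)):
        return _encode_set(value)
    if isinstance(value, (list, tuple)):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {key: to_attribute_value(v) for key, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def _encode_set(values):
    """Sets must be non-empty and hold a single scalar type."""
    if not values:
        raise ValueError("sets must not be empty")
    if all(isinstance(v, str) for v in values):
        return {"SS": sorted(values)}
    if all(isinstance(v, NUMBER_TYPES) and not isinstance(v, bool) for v in values):
        return {"NS": [str(v) for v in sorted(values)]}
    if all(isinstance(v, bytes) for v in values):
        return {"BS": sorted(values)}
    raise TypeError("set members must all be strings, numbers, or bytes")


# A hypothetical nested document, stored as a Map (M) with a List (L) inside it.
camera = {
    "cameraId": 42,
    "active": True,
    "tags": {"indoor", "hd"},
    "location": {"floor": 2, "rooms": ["lobby", None]},
}
encoded = to_attribute_value(camera)
```

The top-level document becomes a Map, and each nested dictionary or list is encoded recursively, which is how List and Map can hold arbitrary JSON-like structures.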
Hash table
• Hash key uniquely identifies an item
• Hash key is used for building an unordered hash index
• Table can be partitioned for scale
[Figure: key space from 00 to FF, split into the ranges 00 to 54, 55 to A9, and AA to FF.]
• Id = 1, Name = Jim: Hash(1) = 7B
• Id = 2, Name = Andy, Dept = Engg: Hash(2) = 48
• Id = 3, Name = Kim, Dept = Ops: Hash(3) = CD
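The mapping from hash key to partition can be sketched as a lookup into contiguous slices of the key space. This is only an illustration: the partition boundaries (00 to 54, 55 to A9, AA to FF) are read off the slide, the function names are hypothetical, and MD5 is a stand-in because the hash function DynamoDB uses internally is not given here.

```python
import bisect
import hashlib

# Upper bound of each partition's slice of the 00..FF key space.
PARTITION_UPPER_BOUNDS = [0x54, 0xA9, 0xFF]


def key_position(hash_key):
    """Hash a key to one byte (0x00..0xFF). MD5 is only a stand-in here."""
    return hashlib.md5(str(hash_key).encode("utf-8")).digest()[0]


def partition_for(position):
    """Return the 1-based partition whose key range contains the position."""
    return bisect.bisect_left(PARTITION_UPPER_BOUNDS, position) + 1


# Hash values taken from the slide rather than from MD5.
slide_hashes = {1: 0x7B, 2: 0x48, 3: 0xCD}
placement = {item_id: partition_for(h) for item_id, h in slide_hashes.items()}
# placement == {1: 2, 2: 1, 3: 3}
```

Because placement depends only on the hash of the key, items with adjacent key values (Id = 1, 2, 3) end up scattered across partitions, which is why the hash index is unordered.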
Partitions are three-way replicated
[Figure: each partition (Partition 1, Partition 2, ... Partition N) stores its items on three replicas (Replica 1, Replica 2, Replica 3).]
Hash-range table
• Hash key and range key together uniquely identify an item.
• Within the unordered hash index, data is sorted by the range key.
• No limit on the number of items (∞) per hash key.
  • Unless you have local secondary indexes
[Figure: key space from 00:0 to FF:∞, split into three partitions at 54:∞/55 and A9:∞/AA.]
Partition 1, Hash(2) = 48
• Customer# = 2, Order# = 10, Item = Pen
• Customer# = 2, Order# = 11, Item = Shoes
Partition 2, Hash(1) = 7B
• Customer# = 1, Order# = 10, Item = Toy
• Customer# = 1, Order# = 11, Item = Boots
Partition 3, Hash(3) = CD
• Customer# = 3, Order# = 10, Item = Book
• Customer# = 3, Order# = 11, Item = Paper
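The hash-range layout can be illustrated with a small in-memory model: items are grouped by hash key and kept sorted by range key, so range conditions, sorted results, and top/bottom N reads become slices of a sorted list. This is a sketch for intuition only. The class and its parameters are hypothetical and are not the DynamoDB API.

```python
import bisect
from collections import defaultdict


class HashRangeTable:
    """Toy model: unordered lookup by hash key, sorted order by range key."""

    def __init__(self):
        self._range_keys = defaultdict(list)  # hash key -> sorted range keys
        self._items = {}                      # (hash key, range key) -> item

    def put_item(self, hash_key, range_key, item):
        # The (hash key, range key) pair is unique, so a repeat overwrites.
        if (hash_key, range_key) not in self._items:
            bisect.insort(self._range_keys[hash_key], range_key)
        self._items[(hash_key, range_key)] = item

    def query(self, hash_key, low=None, high=None, limit=None, forward=True):
        """Return items for one hash key with low <= range key <= high."""
        keys = self._range_keys.get(hash_key, [])
        start = 0 if low is None else bisect.bisect_left(keys, low)
        stop = len(keys) if high is None else bisect.bisect_right(keys, high)
        selected = keys[start:stop]
        if not forward:
            selected = selected[::-1]
        if limit is not None:
            selected = selected[:limit]
        return [self._items[(hash_key, rk)] for rk in selected]


orders = HashRangeTable()
# Inserted out of order on purpose; reads still come back sorted by Order#.
orders.put_item(2, 11, "Shoes")
orders.put_item(2, 10, "Pen")
orders.put_item(1, 10, "Toy")
orders.put_item(1, 11, "Boots")
orders.put_item(3, 10, "Book")
orders.put_item(3, 11, "Paper")

all_for_customer_2 = orders.query(2)                       # ["Pen", "Shoes"]
from_order_11 = orders.query(1, low=11)                    # ["Boots"]
latest_order = orders.query(3, limit=1, forward=False)     # ["Paper"]
```

Every query names exactly one hash key, which mirrors the slide: the hash key selects the partition, and the range key orders the items within it.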
Local Secondary Index (LSI)
• Alternate range key + same hash key
• Index and table data are co-located (same partition)
• 10 GB max per hash key, i.e. LSIs limit the # of range keys!
Global Secondary Index
any attribute indexed as new hash and/or range key
RCUs/WCUs provisioned separately for GSIs
Online indexing
LSI or GSI?
LSI can be modeled as a GSI
If data size in an item collection > 10 GB, use GSI
If eventual consistency is okay for your scenario, use GSI!
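The rule of thumb above can be written down as a tiny decision helper. This is a hedged sketch: the function name and parameters are hypothetical, and falling back to an LSI when neither condition applies is an assumption, since the slide only says when to prefer a GSI.

```python
ITEM_COLLECTION_LIMIT_GB = 10  # LSI limit per hash key, from the slide


def choose_index(item_collection_gb, needs_strong_consistency):
    """Apply the slide's rule of thumb for picking an index type."""
    if item_collection_gb > ITEM_COLLECTION_LIMIT_GB:
        return "GSI"  # an LSI cannot hold this item collection
    if not needs_strong_consistency:
        return "GSI"  # eventual consistency is acceptable
    return "LSI"      # assumption: the remaining case


small_and_strict = choose_index(item_collection_gb=2, needs_strong_consistency=True)
large_collection = choose_index(item_collection_gb=25, needs_strong_consistency=True)
relaxed_reads = choose_index(item_collection_gb=2, needs_strong_consistency=False)
```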
DynamoDB API
Table API
• CreateTable
• UpdateTable
• DeleteTable
• DescribeTable
• ListTables
Item API
• PutItem
• UpdateItem
• DeleteItem
• BatchWriteItem
• GetItem
• Query
• Scan
• BatchGetItem
Stream API (new)
• ListStreams
• DescribeStream
• GetShardIterator
• GetRecords
DynamoDB Streams and AWS Lambda
Emerging Architecture Pattern
Throughput
Provisioned at the table level
• Write capacity units (WCUs) are measured in 1 KB per second
• Read capacity units (RCUs) are measured in 4 KB per second
  • RCUs measure strongly consistent reads
  • Eventually consistent reads cost 1/2 of consistent reads
Read and write throughput limits are independent
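As a rough worked example of these units, the sketch below estimates the capacity needed for a given item size and request rate. The function names are hypothetical, and it assumes each item is rounded up to the next whole unit (1 KB for writes, 4 KB for reads) before multiplying by the request rate.

```python
import math

WRITE_UNIT_BYTES = 1024      # 1 KB per write capacity unit
READ_UNIT_BYTES = 4 * 1024   # 4 KB per read capacity unit


def write_capacity_units(item_size_bytes, writes_per_second):
    """WCUs needed, assuming item size rounds up to whole 1 KB units."""
    return math.ceil(item_size_bytes / WRITE_UNIT_BYTES) * writes_per_second


def read_capacity_units(item_size_bytes, reads_per_second, strongly_consistent=True):
    """RCUs needed; eventually consistent reads cost half as much."""
    units = math.ceil(item_size_bytes / READ_UNIT_BYTES) * reads_per_second
    if strongly_consistent:
        return units
    return math.ceil(units / 2)


# A 3 KB item written and read 100 times per second.
wcus = write_capacity_units(3 * 1024, 100)                                   # 300
rcus_strong = read_capacity_units(3 * 1024, 100)                             # 100
rcus_eventual = read_capacity_units(3 * 1024, 100, strongly_consistent=False)  # 50
```

The same item costs three units to write but only one to read, because the read unit is four times larger.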
Partitioning example
Table size = 8 GB, RCUs = 5000, WCUs = 500
# of partitions (IO capacity) = 5000/3000 RCU + 500/1000 WCU = 2.17
# of partitions (storage) = 8/10 GB = 0.8
# of partitions = ceiling(max(2.17, 0.8)) = 3
RCUs per partition = 5000/3 = 1666.67
WCUs per partition = 500/3 = 166.67
Data/partition = 8/3 = 2.67 GB
RCUs and WCUs are uniformly spread across partitions
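The arithmetic in this example can be reproduced with a short sketch. The per-partition limits (3000 RCUs, 1000 WCUs, 10 GB) are the ones used on the slide. The function name is hypothetical, and the formula is the slide's approximation rather than a documented guarantee.

```python
import math

RCUS_PER_PARTITION = 3000
WCUS_PER_PARTITION = 1000
GB_PER_PARTITION = 10


def partition_count(table_size_gb, rcus, wcus):
    """Partitions needed: the larger of the throughput and storage estimates, rounded up."""
    by_throughput = rcus / RCUS_PER_PARTITION + wcus / WCUS_PER_PARTITION
    by_storage = table_size_gb / GB_PER_PARTITION
    return math.ceil(max(by_throughput, by_storage))


# The slide's example: 8 GB table, 5000 RCUs, 500 WCUs.
partitions = partition_count(table_size_gb=8, rcus=5000, wcus=500)   # 3

# Throughput and data are spread uniformly across the partitions.
rcus_per_partition = round(5000 / partitions, 2)   # 1666.67
wcus_per_partition = round(500 / partitions, 2)    # 166.67
gb_per_partition = round(8 / partitions, 2)        # 2.67
```

Here throughput, not storage, drives the partition count, so each partition ends up holding well under its 10 GB limit.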
Typical Workloads and Use-Cases
DynamoDB table examples

case class CameraRecord(
  cameraId: Int,            // hash key
  ownerId: Int,
  subscribers: Set[Int],
  hoursOfRecording: Int
  // ...
)

case class Cuepoint(
  cameraId: Int,            // hash key
  timestamp: Long,          // range key
  `type`: String            // backticks needed: `type` is a reserved word in Scala
  // ...
)

HashKey   RangeKey   Value
Key       Segment    1234554343254
Key       Segment1   1231231433235
Typical Workloads
• Ad-tech
• IoT
• Gaming
• Web Analytics
• Mobile Applications
• Large Scale Websites
…And much more!
Demo