Build High-Scale Applications
with Amazon DynamoDB
David Pearson Business Development AWS Database Services
Databases in the Cloud: data tier planning · workload prioritization · data store selection
Traditional Database Architecture
App/Web Tier
Client Tier
RDBMS
• key-value access
• complex queries
• transactions
• analytics
One Database for All Workloads
App/Web Tier
Client Tier
RDBMS
Cloud Data Tier Architecture
App/Web Tier
Client Tier
Data Tier
Search Cache Blob Store
RDBMS NoSQL Data Warehouse
Workload Driven Data Store Selection
Data Tier
Search Cache Blob Store
RDBMS NoSQL Data Warehouse
logging analytics
key/value simple query
rich search
transaction processing
hot reads
AWS Services for the Data Tier
Data Tier
Amazon DynamoDB
Amazon RDS
Amazon ElastiCache
Amazon S3
Amazon Redshift
Amazon CloudSearch
logging analytics
key/value simple query
rich search
transaction processing
hot reads
Simple Guide to Database Selection
Predominant Requirement Recommendation
Seamless scale and super availability Amazon DynamoDB
Complex query workloads and need relational capabilities
Amazon RDS
Caching Amazon ElastiCache
Deep analytics Amazon Redshift
Cases where these services are not the right fit
Build your own on EC2!
DynamoDB Fundamentals
original use case · tier one applications
RDBMS = Default Choice
• Amazon.com pages are composed of responses from 1000’s of independent services
• Query patterns differ from service to service:
  Catalog service is usually heavy key-value
  Ordering service is very write intensive (key-value)
  Catalog search has a different pattern for querying
Relational Era @ Amazon.com
RDBMS
Poor Availability · Limited Scalability · High Cost
Dynamo = NoSQL Technology
• Replicated DHT with consistency management
• Consistent hashing
• Optimistic replication
• “Sloppy quorum”
• Anti-entropy mechanisms
• Object versioning
Distributed Era @ Amazon.com
lack of strong consistency · every engineer needs to learn distributed systems · operational complexity
DynamoDB = NoSQL Cloud Service
Cloud Era @ Amazon.com
Non-Relational
Fast & Predictable Performance
Seamless Scalability
Easy Administration
Tier One Applications
= database service with automated operations · predictable performance · durable · low latency · cost effective
• Developers are freed from:
  Performance tuning (latency)
  Scalability (and scaling operations)
  Security inspections, patches, upgrades
  Software upgrades, patches
  Improving the underlying hardware
  …and lots of other stuff
• DynamoDB takes care of:
  Automatic 3-way multi-AZ replication
  Automatic hardware failover
Built to make life easier for developers
Provisioned Throughput
• Request-based capacity provisioning model
• Throughput is declared and updated via the API or the console
  CreateTable (foo, reads/sec = 100, writes/sec = 150)
  UpdateTable (foo, reads/sec = 10000, writes/sec = 4500)
• DynamoDB handles the rest
  Capacity is reserved and available when needed
  Scaling up triggers repartitioning and reallocation
  No impact to performance or availability
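The CreateTable / UpdateTable calls on the slide can be sketched as low-level API request payloads. This is a minimal sketch: the single string hash key `id` is an assumption (the slide only gives the table name and throughput numbers), and no network call is made.

```python
def create_table_request(name, reads, writes):
    """Build a CreateTable payload that declares read/write capacity up front."""
    return {
        "TableName": name,
        "KeySchema": [{"AttributeName": "id", "KeyType": "HASH"}],  # assumed key
        "AttributeDefinitions": [{"AttributeName": "id", "AttributeType": "S"}],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": reads,
            "WriteCapacityUnits": writes,
        },
    }

def update_table_request(name, reads, writes):
    """Build an UpdateTable payload that re-declares throughput on a live table."""
    return {
        "TableName": name,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": reads,
            "WriteCapacityUnits": writes,
        },
    }

# The two calls from the slide:
create = create_table_request("foo", 100, 150)
update = update_table_request("foo", 10000, 4500)
```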
Predictable Performance
WRITES: continuously replicated to 3 AZs · quorum acknowledgment · persisted to disk (custom SSD)
READS: strongly or eventually consistent · no trade-off in latency
Durable Low Latency
DynamoDB Primitives: simple API · rich query support · fast application development
DynamoDB Concepts
table
DynamoDB Concepts
table
items
DynamoDB Concepts
attributes
items
table
schema-less: schema is defined per attribute
DynamoDB Concepts
attributes
items
table
scalar data types: number, string, and binary
multi-valued types: string set, number set, and binary set
DynamoDB Concepts
hash
hash keys: mandatory for all items in a table · key-value access pattern
PutItem UpdateItem DeleteItem BatchWriteItem
GetItem BatchGetItem
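A hash-key write and read from the item API above, sketched as low-level request payloads. The `Users` table and its attributes are illustrative; the wire format types each attribute (S = string, N = number), and no network call is made.

```python
# PutItem: write a full item, keyed by its hash key.
put_item = {
    "TableName": "Users",            # hypothetical table keyed on UserId
    "Item": {
        "UserId": {"S": "1"},
        "UserName": {"S": "alice"},  # illustrative non-key attribute
    },
}

# GetItem: key-value read of the same item by hash key.
get_item = {
    "TableName": "Users",
    "Key": {"UserId": {"S": "1"}},
    "ConsistentRead": True,          # request a strongly consistent read
}
```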
Hash = Distribution Key
partition 1 .. N
hash keys: mandatory for all items in a table · key-value access pattern · determines data distribution
Hash = Distribution Key
large number of unique hash keys + uniform distribution of workload across hash keys
Range = Query
range
hash
range keys: model 1:N relationships · enable rich query capabilities · composite primary key
query support: all items for a hash key · ==, <, >, >=, <= · “begins with” · “between” · sorted results · counts · top / bottom N values · paged responses
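The range-key comparisons above can be sketched as one low-level Query payload. The table and attribute names here are hypothetical (hash = UserId, range = Date), and no network call is made.

```python
query = {
    "TableName": "UserActivity",  # hypothetical table: hash = UserId, range = Date
    # "between" on the range key, scoped to one hash key:
    "KeyConditionExpression": "UserId = :u AND #d BETWEEN :start AND :end",
    "ExpressionAttributeNames": {"#d": "Date"},  # alias the attribute name
    "ExpressionAttributeValues": {
        ":u": {"S": "1"},
        ":start": {"S": "2013-03-01"},
        ":end": {"S": "2013-03-29"},
    },
    "ScanIndexForward": True,  # sorted results by range key, ascending
    "Limit": 10,               # top-N / paged responses
}
```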
Index Options
local secondary indexes (LSI): alternate range key + same hash key · index and table data are co-located (same partition)
Projected Attributes
KEYS_ONLY INCLUDE ALL
Index Options
global secondary indexes (GSI)
any attribute indexed as new hash or range key
KEYS_ONLY INCLUDE ALL
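A GSI sketch: the index portion of a CreateTable payload, with its own key schema, projection, and throughput. The index name, indexed attribute, and capacity numbers are illustrative, not from the slide.

```python
gsi_spec = {
    "IndexName": "TypeGlobalIndex",  # hypothetical index name
    "KeySchema": [
        # Any attribute can serve as the new hash (or range) key:
        {"AttributeName": "Type", "KeyType": "HASH"},
    ],
    # INCLUDE projects the keys plus the listed non-key attributes:
    "Projection": {
        "ProjectionType": "INCLUDE",
        "NonKeyAttributes": ["Name"],
    },
    # GSIs define read/write units separately from the base table:
    "ProvisionedThroughput": {
        "ReadCapacityUnits": 5,
        "WriteCapacityUnits": 5,
    },
}
```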
Designing for Scale: access pattern modeling · use case walk-thru · under-rated features
• Method
  1. Describe the overall use case – maintain context
  2. Identify the individual access patterns of the use case
  3. Model each access pattern to its own discrete data set
  4. Consolidate data sets into tables and indexes
• Benefits
  Single-table fetch for each query
  Payloads are minimal for each access
Access Pattern Modeling
Multi-tenant application for file storing and sharing
• UserId is the unique identifier of each user
• FileId is the unique identifier of each file, owned by a user
Use Case Walk Thru
Good PK selection: UserId (hash) + FileId (range)
use case access patterns data design
1. Users should be able to query all the files they own
2. Search by File Name
3. Search by File Type
4. Search by Date Range
5. Keep track of Shared Files
6. Search by descending order of File Size
Use Case Walk Thru
additional (non-PK) attributes & index candidates
Users
• Hash key = UserId (S)
• Attributes = UserName (S), Email (S), Address (SS) ...
User_Files
• Hash key = UserId (S)
• Range key = FileId (S)
• Attributes = Name (S), Type (S), Size (N), Date (S), SharedFlag (S), S3key (S)
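The User_Files design above, sketched as a CreateTable payload with one of the walk-thru’s local secondary indexes (NameIndex) attached. The throughput numbers are placeholders, and no network call is made.

```python
create_user_files = {
    "TableName": "User_Files",
    "AttributeDefinitions": [
        {"AttributeName": "UserId", "AttributeType": "S"},
        {"AttributeName": "FileId", "AttributeType": "S"},
        {"AttributeName": "Name", "AttributeType": "S"},  # LSI range key
    ],
    "KeySchema": [
        {"AttributeName": "UserId", "KeyType": "HASH"},
        {"AttributeName": "FileId", "KeyType": "RANGE"},
    ],
    "LocalSecondaryIndexes": [
        {
            # Same hash key, alternate range key (Name), keys-only projection:
            "IndexName": "NameIndex",
            "KeySchema": [
                {"AttributeName": "UserId", "KeyType": "HASH"},
                {"AttributeName": "Name", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
}
```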
DynamoDB Data Model
Secondary Indexes
Table Name Index Name Attribute to Index Projected Attribute
User_Files NameIndex Name KEYS
User_Files TypeIndex Type KEYS + Name
User_Files DateIndex Date KEYS + Name
User_Files SharedFlagIndex SharedFlag KEYS + Name
User_Files SizeIndex Size KEYS + Name
example only – required data returned determines optimal projections
• Find all files owned by a user
Query User_Files table (UserId = 2)
Access Pattern 1
UserId (Hash)  FileId (Range)  Name  Date  Type  SharedFlag  Size  S3key
1 1 File1 2013-04-23 JPG 1000 bucket\1
1 2 File2 2013-03-10 PDF Y 100 bucket\2
2 3 File3 2013-03-10 PNG Y 2000 bucket\3
2 4 File4 2013-03-10 DOC 3000 bucket\4
3 5 File5 2013-04-10 TXT 400 bucket\5
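Access pattern 1 as a low-level Query payload, plus an in-memory sketch of the sample rows to show what the query would return. The payload is illustrative and no network call is made.

```python
# Query User_Files by hash key alone: all files owned by UserId = 2.
query_user_files = {
    "TableName": "User_Files",
    "KeyConditionExpression": "UserId = :u",
    "ExpressionAttributeValues": {":u": {"S": "2"}},
}

# The slide's sample items, keyed (UserId, FileId):
items = [
    {"UserId": "1", "FileId": "1", "Name": "File1"},
    {"UserId": "1", "FileId": "2", "Name": "File2"},
    {"UserId": "2", "FileId": "3", "Name": "File3"},
    {"UserId": "2", "FileId": "4", "Name": "File4"},
    {"UserId": "3", "FileId": "5", "Name": "File5"},
]

# A Query on hash key UserId = 2 touches only that user's items:
result = [i["Name"] for i in items if i["UserId"] == "2"]
# result == ["File3", "File4"]
```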
• Search by file Name
Query (IndexName = NameIndex, UserId = 1, Name = File1)
Access Pattern 2
UserId (hash)  Name (range)  FileId
1 File1 1
1 File2 2
2 File3 3
2 File4 4
3 File5 5
NameIndex
• Search for file name by file Type
Query (IndexName = TypeIndex, UserId = 2, Type = DOC)
Access Pattern 3
UserId (hash)  Type (range)  FileId  Name
1 JPG 1 File1
1 PDF 2 File2
2 DOC 4 File4
2 PNG 3 File3
3 TXT 5 File5
projection
TypeIndex
• Search for file name by Date range
Query (IndexName = DateIndex, UserId = 1, Date between 2013-03-01 and 2013-03-29)
Access Pattern 4
UserId (hash)  Date (range)  FileId  Name
1 2013-03-10 2 File2
1 2013-04-23 1 File1
2 2013-03-10 3 File3
2 2013-03-10 4 File4
3 2013-04-10 5 File5
DateIndex
projection
• Search for names of Shared files
Query (IndexName = SharedFlagIndex, UserId = 1, SharedFlag = Y)
Access Pattern 5
UserId (hash)  SharedFlag (range)  FileId  Name
1 Y 2 File2
2 Y 3 File3
SharedFlagIndex
projection
• Query for file names by descending order of file Size
Query (IndexName = SizeIndex, UserId = 1, ScanIndexForward = false)
Access Pattern 6
UserId (hash)  Size (range)  FileId  Name
1 100 2 File2
1 1000 1 File1
2 2000 3 File3
2 3000 4 File4
3 400 5 File5
SizeIndex projection
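Access pattern 6 as a Query payload: read from SizeIndex and reverse the range-key sort with ScanIndexForward. A sketch only; no network call is made.

```python
query_by_size_desc = {
    "TableName": "User_Files",
    "IndexName": "SizeIndex",                 # LSI with Size as the range key
    "KeyConditionExpression": "UserId = :u",
    "ExpressionAttributeValues": {":u": {"S": "1"}},
    "ScanIndexForward": False,                # descending order of Size
}
```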
• Users
• User_Files NameIndex
TypeIndex
DateIndex
SharedFlagIndex
SizeIndex
Use Case Walk Thru
Secondary Indexing Options
Local (LSI): strongly or eventually consistent reads · total storage limit per hash key (10 GB) · read and write units consumed from the table
Global (GSI): eventually consistent reads only · no storage limit · read and write units defined per GSI
• Consistent Reads: inventory and shopping cart applications
• Atomic Counters: increment and return the new value in the same operation
• Conditional Writes: expected value checked before write – fails on mismatch; “state machine” use cases
• Sparse Indexes: optimal for accessing boolean values; popular for identifying updated items for a background clean-up process
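The atomic counter and conditional write features above, sketched as UpdateItem payloads. The `Counters` and `Orders` tables, their attributes, and the state values are hypothetical; no network call is made.

```python
# Atomic counter: ADD increments in place and, with ReturnValues, hands back
# the new value in the same operation.
atomic_increment = {
    "TableName": "Counters",                    # hypothetical table
    "Key": {"CounterId": {"S": "page-views"}},
    "UpdateExpression": "ADD Hits :one",
    "ExpressionAttributeValues": {":one": {"N": "1"}},
    "ReturnValues": "UPDATED_NEW",              # new value in the same call
}

# Conditional write: the update only succeeds if the stored state matches the
# expected value, which is the "state machine" pattern from the slide.
conditional_write = {
    "TableName": "Orders",                      # hypothetical table
    "Key": {"OrderId": {"S": "42"}},
    "UpdateExpression": "SET OrderState = :next",
    "ConditionExpression": "OrderState = :expected",  # fails on mismatch
    "ExpressionAttributeValues": {
        ":next": {"S": "SHIPPED"},
        ":expected": {"S": "PACKED"},
    },
}
```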
Under-Rated Features
Moving to DynamoDB: leaving entity-relationship · denormalized NoSQL · live migrations
Migration Considerations
• Data Layer
  Single-table use case
  Denormalized preferred – reduces multi-object access; can be done on the target
  Minimal dependencies (FKs, triggers, procedures)
  Design the target for high-scale patterns
• Application Code
  Review and validate access patterns
  API / SQL mapping and conversion
  DynamoDB = simple APIs: reads (2), writes (4), rich query (1), scan (1)
• Design for Scale
  Optimal target environment
  Identify the minimal data unit to be migrated (user data)
• Plan migration activities
  Map source to target
  Identify transforms
  Update the app to work with the new target
  Test and verify
• Implement the migration
  Triggered by user login
  Login simulation (loop)
NoSQL – Live Migration Checklist (example)
Data Movement Templates
Export DynamoDB to S3 (archive) http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-dynamodbexportdataformat.html
Getting Started: zero cost dev & test · use case validation · design workshop
• DynamoDB Local Disconnected development with full API support
• No network
• No usage costs
• No SLA
http://aws.amazon.com/dynamodb/developer-resources/
• Use Case Assessment Want to test an idea, but unsure if your use case is a good fit?
• Design Call Ready to start coding (or scaling) and want to ensure optimal design?
Next Steps – Hands On
Register at: http://aws.amazon.com/about-aws/events/
Databases in the Cloud Webinar Series
• Data Modeling and Best Practices for Scaling your Application with Amazon DynamoDB
• February 27, 2014 10:00am PST
High-scale applications like social gaming, chat, and voting
Model these applications using DynamoDB, including how to use building blocks such as conditional writes, consistent reads, and batch operations.
Incorporate best practices such as index projections, item sharding, and parallel scan for maximum scalability
Next Steps – More Information
• The Mill DTLV – AWS Perk
• Feb 27 – DynamoDB Webinar (Advanced Schema Design)
• March 10 – SXSW AWS Sessions
• March 20 – Las Vegas Meetup – Redshift
• March 26 – AWS Summit – San Francisco
Upcoming AWS Events
Questions David Pearson [email protected]