DESCRIPTION
Amazon DynamoDB is a fully managed, highly scalable distributed database service. Global Secondary Indexes (GSIs) give you the flexibility to query your DynamoDB tables in new and powerful ways. In this session, we will:
• Describe how GSIs work under the covers to ensure consistent low latency at any scale.
• Walk through various access patterns so that you learn how to take full advantage of GSIs and implement best-practice designs that scale efficiently and cost-effectively.
This session is designed for developers and architects seeking to build rich applications that require performance and availability with absolute data durability.
© 2011 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc.
Optimize Your Database for the Cloud
with DynamoDB
A Deep Dive into
Global Secondary Indexes (GSI)
David Pearson
Siva Raghupathy
database service =
• automated operations
• predictable performance
• durable, low latency
• cost effective
Durable Low Latency
WRITES
• Continuously replicated to 3 AZs
• Quorum acknowledgment
• Persisted to disk (custom SSD)
READS
• Strongly or eventually consistent
• No trade-off in latency
Recent Announcements
Secondary Indexes (Local and Global)
DynamoDB Local
• Disconnected development with full API support
• No network
• No usage costs
• No SLA
Fine-Grained Access Control
• Direct-to-DynamoDB access for mobile devices
Geospatial and Transaction Libraries
DynamoDB Concepts
attributes
items
table
schema-less
schema is defined per attribute
DynamoDB Concepts
hash
hash keys
• mandatory for all items in a table
• key-value access pattern
DynamoDB Concepts
partition 1 .. N
hash keys
• mandatory for all items in a table
• key-value access pattern
• determines data distribution
DynamoDB Concepts
range
hash
range keys
• model 1:N relationships
• enable rich query capabilities
• composite primary key
rich query capabilities
• all items for a hash key
• ==, <, >, >=, <=
• “begins with”
• “between”
• sorted results
• counts
• top / bottom N values
• paged responses
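As a rough illustration of the range-key queries listed above, the following toy model keeps the items under one hash key sorted by range key. It is an in-memory sketch, not the DynamoDB API; the class and method names are hypothetical, and the sample values only echo names used elsewhere in the deck.

```python
from bisect import bisect_left, bisect_right


class ItemCollection:
    """All items sharing one hash key, kept sorted by range key (toy model)."""

    def __init__(self):
        self._keys = []   # sorted range-key values
        self._items = []  # items, parallel to _keys

    def put(self, range_key, item):
        i = bisect_left(self._keys, range_key)
        if i < len(self._keys) and self._keys[i] == range_key:
            self._items[i] = item  # same composite key: overwrite
        else:
            self._keys.insert(i, range_key)
            self._items.insert(i, item)

    def between(self, low, high):
        """Items whose range key lies in [low, high], in sorted order."""
        return self._items[bisect_left(self._keys, low):bisect_right(self._keys, high)]

    def begins_with(self, prefix):
        """Items whose (string) range key starts with prefix."""
        start = bisect_left(self._keys, prefix)
        out = []
        for key, item in zip(self._keys[start:], self._items[start:]):
            if not key.startswith(prefix):
                break
            out.append(item)
        return out

    def top(self, n):
        """The n items with the largest range keys, largest first."""
        return self._items[::-1][:n]


table = {}  # hash key -> ItemCollection
for user, date, name in [("bob", "2013-03-10", "File2"),
                         ("bob", "2013-04-23", "File1"),
                         ("fred", "2013-03-10", "File3")]:
    table.setdefault(user, ItemCollection()).put(date, name)
```

Because the items under a hash key are already sorted, "between", "begins with" and top/bottom N all reduce to locating a start position and reading a contiguous run.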
DynamoDB Concepts
local secondary indexes (LSI)
• alternate range key + same hash key
• index and table data is co-located (same partition)
LSI Attribute Projections
Table: A1 (hash), A2 (range), A3, A4, A5
LSIs:
• KEYS_ONLY: A1 (hash), A3 (range), A2 (table key)
• INCLUDE A3: A1 (hash), A4 (range), A2 (table key), A3 (projected)
• ALL: A1 (hash), A4 (range), A2 (table key), A3 (projected), A5 (projected)
DynamoDB Concepts
global secondary indexes (GSI)
any attribute indexed as new hash and/or range key
Local Secondary Index (LSI) vs. Global Secondary Index (GSI)
1. LSI: Key = hash key and a range key
   GSI: Key = hash or hash-and-range
2. LSI: Hash key is the same attribute as that of the table. Range key can be any scalar table attribute
   GSI: The index hash key and range key (if present) can be any scalar table attributes
3. LSI: For each hash key, the total size of all indexed items must be 10 GB or less
   GSI: No size restrictions for global secondary indexes
4. LSI: Query over a single partition, as specified by the hash key value in the query
   GSI: Query over the entire table, across all partitions
5. LSI: Eventual consistency or strong consistency
   GSI: Eventual consistency only
6. LSI: Read and write capacity units consumed from the table
   GSI: Every global secondary index has its own provisioned read and write capacity units
7. LSI: Query will automatically fetch non-projected attributes from the table
   GSI: Query can only request projected attributes. It will not fetch any attributes from the table
GSI Attribute Projections
Table: A1 (hash), A2, A3, A4, A5
GSIs:
• KEYS_ONLY: A5 (hash), A3 (range), A1 (table key)
• INCLUDE A3: A5 (hash), A4 (range), A1 (table key), A3 (projected)
• ALL: A4 (hash), A5 (range), A1 (table key), A2 (projected), A3 (projected)
• KEYS_ONLY: A2 (hash), A1 (table key)
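The three projection types can be sketched as a small function that computes which attributes of a table item end up in an index entry. This is an in-memory illustration under the attribute names A1..A5 from the slide; the function name `project` and its parameters are hypothetical and not part of any DynamoDB SDK.

```python
def project(item, table_key_attrs, index_key_attrs, projection_type, include=()):
    """Return the copy of `item` that an index would hold, or None if the item
    lacks an index key attribute and therefore does not appear in the index."""
    if any(attr not in item for attr in index_key_attrs):
        return None
    if projection_type == "ALL":
        return dict(item)
    # Index keys and table keys are always present in the index entry.
    keep = set(table_key_attrs) | set(index_key_attrs)
    if projection_type == "INCLUDE":
        keep |= set(include)
    elif projection_type != "KEYS_ONLY":
        raise ValueError(f"unknown projection type: {projection_type}")
    return {attr: value for attr, value in item.items() if attr in keep}


item = {"A1": "k", "A2": 2, "A3": 3, "A4": 4, "A5": 5}
keys_only = project(item, ["A1"], ["A5", "A3"], "KEYS_ONLY")
include_a3 = project(item, ["A1"], ["A5", "A4"], "INCLUDE", include=["A3"])
everything = project(item, ["A1"], ["A4", "A5"], "ALL")
```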
GSI Query Pattern
Query covered by GSI
• Query the GSI and get the attributes
Query not covered by GSI
• Query the GSI to get the table key(s)
• BatchGetItem/GetItem from the table
• 2 or more round trips to DynamoDB
Tip: If you need very low latency, project all required attributes into the GSI
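The difference between a covered and a non-covered query can be sketched with plain dictionaries standing in for the index and the table. This is an illustration only: the function name is hypothetical, the table key is simplified to FileId alone, and a real non-covered lookup would issue GetItem/BatchGetItem calls (possibly more than one, hence "2 or more" round trips).

```python
def query_via_gsi(index_rows, table, wanted_attrs, table_key="FileId"):
    """Return (items, round_trips) for a query answered through a GSI.

    index_rows: rows already matched in the index; each holds only projected attributes
    table:      dict mapping table key -> full item, standing in for the base table
    """
    if all(attr in row for row in index_rows for attr in wanted_attrs):
        # Covered: every wanted attribute is projected into the index.
        return [{a: row[a] for a in wanted_attrs} for row in index_rows], 1
    # Not covered: use the table keys from the index to fetch the full items.
    items = [table[row[table_key]] for row in index_rows]
    return [{a: item.get(a) for a in wanted_attrs} for item in items], 2


table = {"4": {"UserId": "2", "FileId": "4", "Name": "File4", "Type": "DOC", "Size": 3000}}
type_index_rows = [{"UserId": "2", "Type": "DOC", "FileId": "4", "Name": "File4"}]

covered = query_via_gsi(type_index_rows, table, ["Name"])
not_covered = query_via_gsi(type_index_rows, table, ["Name", "Size"])
```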
How do GSI updates work
[Diagram: Client, Table (four primary table partitions), Global Secondary Index. Step 2: asynchronous update (in progress)]
Table operation | Number of GSI updates
• Item not in index before or after update | 0
• Update introduces a new indexed attribute | 1
• Update deletes the indexed attribute | 1
• Update changes the value of an indexed attribute from A to B | 2
1 table update = 0, 1 or 2 GSI updates
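The cases above can be written down as a small function. This is a sketch under stated assumptions: the index has a single key attribute, and a write that leaves the indexed value unchanged also touches no projected attribute (a case the slide does not cover, counted as 0 here). The function name is hypothetical.

```python
def gsi_update_count(old_item, new_item, indexed_attr):
    """Number of index writes caused by one table write.

    old_item / new_item are dicts; None means the item does not exist.
    """
    old = (old_item or {}).get(indexed_attr)
    new = (new_item or {}).get(indexed_attr)
    if old is None and new is None:
        return 0  # item not in the index before or after the update
    if old is None or new is None:
        return 1  # indexed attribute introduced or deleted
    if old != new:
        return 2  # old index entry removed, new index entry written
    return 0      # indexed value unchanged (see assumption above)
```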
GSI EXAMPLES
Example 1: Multi-tenant application for storing and sharing files
Access Patterns
1. Users should be able to query all the files they own
2. Search by File Name
3. Search by File Type
4. Search by Date Range
5. Keep track of Shared Files
6. Search by descending order of File Size
DynamoDB Data Model
Users
• Hash key = UserId (S)
• Attributes = User Name (S), Email (S), Address (SS), etc.
User_Files
• Hash key = UserId (S) – This is also the tenant id
• Range key = FileId (S)
• Attributes = Name (S), Type (S), Size (N), Date (S), SharedFlag (S), S3key (S)
Global Secondary Indexes
Table Name | Index Name | Attribute to Index | Projected Attributes
User_Files | NameIndex | Name | KEYS
User_Files | TypeIndex | Type | KEYS + Name
User_Files | DateIndex | Date | KEYS + Name
User_Files | SharedFlagIndex | SharedFlag | KEYS + Name
User_Files | SizeIndex | Size | KEYS + Name
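As a sketch of how two of these indexes might be declared, the following builds the parameter dictionary for a CreateTable request (for example, to pass as `boto3.client("dynamodb").create_table(**create_table_params)`). The helper `gsi_definition` and all capacity numbers are placeholders chosen for illustration; the index key layout (UserId as hash, the indexed attribute as range) follows the access-pattern slides.

```python
def gsi_definition(index_name, hash_attr, range_attr, non_key_attrs, rcu, wcu):
    """Build one entry of the GlobalSecondaryIndexes list for CreateTable."""
    if non_key_attrs:
        projection = {"ProjectionType": "INCLUDE", "NonKeyAttributes": list(non_key_attrs)}
    else:
        projection = {"ProjectionType": "KEYS_ONLY"}
    return {
        "IndexName": index_name,
        "KeySchema": [
            {"AttributeName": hash_attr, "KeyType": "HASH"},
            {"AttributeName": range_attr, "KeyType": "RANGE"},
        ],
        "Projection": projection,
        # Each GSI carries its own provisioned throughput (placeholder values).
        "ProvisionedThroughput": {"ReadCapacityUnits": rcu, "WriteCapacityUnits": wcu},
    }


create_table_params = {
    "TableName": "User_Files",
    # Every attribute used as a table or index key must be declared here.
    "AttributeDefinitions": [
        {"AttributeName": "UserId", "AttributeType": "S"},
        {"AttributeName": "FileId", "AttributeType": "S"},
        {"AttributeName": "Name", "AttributeType": "S"},
        {"AttributeName": "Type", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "UserId", "KeyType": "HASH"},
        {"AttributeName": "FileId", "KeyType": "RANGE"},
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
    "GlobalSecondaryIndexes": [
        gsi_definition("NameIndex", "UserId", "Name", [], rcu=5, wcu=5),
        gsi_definition("TypeIndex", "UserId", "Type", ["Name"], rcu=5, wcu=5),
    ],
}
```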
Access Pattern 1
Find all files owned by a user
• Query (UserId = 2)
UserId (Hash) | FileId (Range) | Name | Date | Type | SharedFlag | Size | S3key
1 | 1 | File1 | 2013-04-23 | JPG | | 1000 | bucket\1
1 | 2 | File2 | 2013-03-10 | PDF | Y | 100 | bucket\2
2 | 3 | File3 | 2013-03-10 | PNG | Y | 2000 | bucket\3
2 | 4 | File4 | 2013-03-10 | DOC | | 3000 | bucket\4
3 | 5 | File5 | 2013-04-10 | TXT | | 400 | bucket\5
Access Pattern 2
Search by file name
• Query (IndexName = NameIndex, UserId = 1, Name = File1)
NameIndex
UserId (hash) | Name (range) | FileId
1 | File1 | 1
1 | File2 | 2
2 | File3 | 3
2 | File4 | 4
3 | File5 | 5
Access Pattern 3
Search for file name by file Type
• Query (IndexName = TypeIndex, UserId = 2, Type = DOC)
TypeIndex
UserId (hash) | Type (range) | FileId | Name (projected)
1 | JPG | 1 | File1
1 | PDF | 2 | File2
2 | DOC | 4 | File4
2 | PNG | 3 | File3
3 | TXT | 5 | File5
Access Pattern 4
Search for file name by date range
• Query (IndexName = DateIndex, UserId = 1, Date between 2013-03-01 and 2013-03-29)
DateIndex
UserId (hash) | Date (range) | FileId | Name (projected)
1 | 2013-03-10 | 2 | File2
1 | 2013-04-23 | 1 | File1
2 | 2013-03-10 | 3 | File3
2 | 2013-03-10 | 4 | File4
3 | 2013-04-10 | 5 | File5
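The date-range query above might look roughly like this as request parameters for the Query operation (for example, `boto3.client("dynamodb").query(**params)`). The sketch uses the expression-based syntax, which postdates this deck, and the helper name `date_range_query` is hypothetical. `Date` is written through the placeholder `#d`, which is always safe and avoids any clash with reserved words.

```python
def date_range_query(user_id, start, end):
    """Build Query parameters for files of one user within [start, end]."""
    return {
        "TableName": "User_Files",
        "IndexName": "DateIndex",
        "KeyConditionExpression": "UserId = :u AND #d BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#d": "Date"},
        "ExpressionAttributeValues": {
            ":u": {"S": user_id},
            ":start": {"S": start},
            ":end": {"S": end},
        },
    }


params = date_range_query("1", "2013-03-01", "2013-03-29")
```

Storing dates as ISO 8601 strings (type S), as the sample data does, means string comparison orders them chronologically, which is what makes the range condition work.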
Access Pattern 5
Search for names of Shared files
• Query (IndexName = SharedFlagIndex, UserId = 1, SharedFlag = Y)
SharedFlagIndex
UserId (hash) | SharedFlag (range) | FileId | Name (projected)
1 | Y | 2 | File2
2 | Y | 3 | File3
Access Pattern 6
Query for file names by descending order of file size
• Query (IndexName = SizeIndex, UserId = 1, ScanIndexForward = false)
SizeIndex
UserId (hash) | Size (range) | FileId | Name (projected)
1 | 100 | 2 | File2
1 | 1000 | 1 | File1
2 | 2000 | 3 | File3
2 | 3000 | 4 | File4
3 | 400 | 5 | File5
Example 2: Find top score for game G1
Game-scores-table
Id (hash key) | User | Game | Score | Date
1 | Bob | G1 | 1300 | 2012-12-23 18:00:00
2 | Bob | G1 | 1450 | 2012-12-23 19:00:00
3 | Jay | G1 | 1600 | 2012-12-24 20:00:00
4 | Mary | G1 | 2000 | 2012-10-24 17:00:00
5 | Ryan | G2 | 123 | 2012-03-10 15:00:00
6 | Jones | G2 | 345 | 2012-03-20 15:00:00
GameScoresIndex
Game-scores-table: Id (hash key), User, Game, Score, Date
Game-scores-index: Game (hash), Score (range), Id (table key), User (projected), Date (projected)
Game-scores-index
Game (Hash) | Score (Range) | Id | User | Date
G1 | 2000 | 4 | Mary | 2012-10-24 17:00:00
G1 | 1600 | 3 | Jay | 2012-12-24 20:00:00
G1 | 1450 | 2 | Bob | 2012-12-23 19:00:00
G1 | 1300 | 1 | Bob | 2012-12-23 18:00:00
G2 | 345 | 6 | Jones | 2012-03-20 15:00:00
G2 | 123 | 5 | Ryan | 2012-03-10 15:00:00
Query: Find top score for game G1
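The transcript does not preserve the query itself. One plausible shape, sketched here as an assumption, is a Query on the index with the hash key fixed to the game, results read in descending Score order, and a limit of one item. The helper names are hypothetical; `simulate` only mimics the ordering in memory using the sample rows from the deck.

```python
def top_scores_query(game, n=1):
    """Build Query parameters for the n highest scores of one game."""
    return {
        "TableName": "Game-scores-table",
        "IndexName": "Game-scores-index",
        "KeyConditionExpression": "#g = :g",
        "ExpressionAttributeNames": {"#g": "Game"},
        "ExpressionAttributeValues": {":g": {"S": game}},
        "ScanIndexForward": False,  # read the range key (Score) in descending order
        "Limit": n,
    }


# (Game, Score, User) rows from the sample data, used only for the simulation.
index_rows = [("G1", 1300, "Bob"), ("G1", 1450, "Bob"), ("G1", 1600, "Jay"),
              ("G1", 2000, "Mary"), ("G2", 123, "Ryan"), ("G2", 345, "Jones")]


def simulate(rows, game, n=1):
    """In-memory stand-in for the query: filter by hash key, sort descending, cut."""
    matching = sorted((r for r in rows if r[0] == game), key=lambda r: r[1], reverse=True)
    return matching[:n]


params = top_scores_query("G1")
top = simulate(index_rows, "G1")
```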
DATA MODELING WITH GSI
Modeling 1:1 relationships
Use a table with a Hash key or a GSI with a hash key
Example:
• Users
Hash key = UserID
• Users-email-GSI
Hash key = Email
Users Table
Hash key | Attributes
UserId = bob | Email = [email protected], JoinDate = 2011-11-15
UserId = fred | Email = [email protected], JoinDate = 2011-12-01, Sex = M
Users-email-GSI
Hash key | Attributes
Email = [email protected] | UserId = bob, JoinDate = 2011-11-15
Email = [email protected] | UserId = fred, JoinDate = 2011-12-01, Sex = M
Modeling 1:N relationships
Use a table with Hash and Range key or GSI
Example:
• One (1) User can play many (N) Games
User-Games-GSI
Hash Key | Range key | Attributes
UserId = bob | GameId = Game1 | HighScore = 10500, ScoreDate = 2011-10-20
UserId = fred | GameId = Game2 | HighScore = 12000, ScoreDate = 2012-01-10
UserId = bob | GameId = Game3 | HighScore = 20000, ScoreDate = 2012-02-12
User-Games-Table
Hash Key | Attributes
UserId = bob | GameId = Game1, HighScore = 10500, ScoreDate = 2011-10-20
UserId = fred | GameId = Game2, HighScore = 12000, ScoreDate = 2012-01-10
UserId = bob | GameId = Game3, HighScore = 20000, ScoreDate = 2012-02-12
Modeling N:M relationships
Use GSI
• Example: 1 user plays multiple games and 1 game has multiple users
User-Games-Table
Hash Key | Range key
UserId = bob | GameId = Game1
UserId = fred | GameId = Game2
UserId = bob | GameId = Game3
Game-Users-GSI
Hash Key | Range key
GameId = Game1 | UserId = bob
GameId = Game2 | UserId = fred
GameId = Game3 | UserId = bob
Best Practices
Choose a GSI Hash Key with high cardinality
Employee-Table: Id (hash), Name, Sex, DOB, Address
SexDOB-GSI: Sex (hash), DOB, Id, Name, Address
Cardinality of Sex = 2 (M/F)
Solution: Generate aliases for M/F by suffixing a known range of integers (say 1 to 100), and query for each value M_1 to M_100
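A minimal sketch of the suffixing idea follows. The slide does not say how the suffix is chosen; deriving it from a hash of the employee Id is an assumption made here so that the value is stable across rewrites of the same item (a random suffix would also spread the load). Function names are hypothetical.

```python
import hashlib

SHARDS = 100  # the "known range of integers" (1 to 100) from the slide


def sharded_key(sex, employee_id, shards=SHARDS):
    """GSI hash key value such as 'M_17', derived deterministically from the Id."""
    digest = hashlib.md5(str(employee_id).encode("utf-8")).hexdigest()
    suffix = int(digest, 16) % shards + 1
    return f"{sex}_{suffix}"


def all_shard_keys(sex, shards=SHARDS):
    """Every hash key value to query (one Query per value) to cover one Sex."""
    return [f"{sex}_{i}" for i in range(1, shards + 1)]
```

Writes store `sharded_key(...)` in the indexed attribute instead of the bare M/F value; reads issue one Query per entry of `all_shard_keys(...)` and merge the results on the client.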
Best Practices
Take advantage of Sparse Indexes
Game-scores-table
Id (hash) | User | Game | Score | Date | Award
1 | Bob | G1 | 1300 | 2012-12-23 |
2 | Bob | G1 | 1450 | 2012-12-23 |
3 | Jay | G1 | 1600 | 2012-12-24 |
4 | Mary | G1 | 2000 | 2012-10-24 | Champ
5 | Ryan | G2 | 123 | 2012-03-10 |
6 | Jones | G2 | 345 | 2012-03-20 |
Award-GSI
Award (hash) | Id | User | Score
Champ | 4 | Mary | 2000
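The sparse-index effect can be reproduced in memory: only items that actually carry the indexed attribute produce an index entry. This is a toy model with a hypothetical function name, using the sample rows above.

```python
def build_sparse_index(items, hash_attr, projected):
    """Group items by hash_attr, skipping items that do not have that attribute."""
    index = {}
    for item in items:
        if hash_attr not in item:
            continue  # no indexed attribute, so no index entry
        row = {attr: item[attr] for attr in projected if attr in item}
        index.setdefault(item[hash_attr], []).append(row)
    return index


scores = [
    {"Id": 1, "User": "Bob", "Game": "G1", "Score": 1300},
    {"Id": 2, "User": "Bob", "Game": "G1", "Score": 1450},
    {"Id": 3, "User": "Jay", "Game": "G1", "Score": 1600},
    {"Id": 4, "User": "Mary", "Game": "G1", "Score": 2000, "Award": "Champ"},
    {"Id": 5, "User": "Ryan", "Game": "G2", "Score": 123},
    {"Id": 6, "User": "Jones", "Game": "G2", "Score": 345},
]
award_gsi = build_sparse_index(scores, "Award", ["Id", "User", "Score"])
```

Six table items yield a single index entry, so the index stays small and cheap to query and maintain.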
Best Practices
Query GSI for quick item lookups
• Fewer read capacity units consumed
Mail Box-Table: ID (hash key), Timestamp (range key), Attribute1, Attribute2, Attribute3, …., LargeAttachment
Mail Box-lookup-GSI: ID (hash key), Timestamp (range key), Attribute1, Attribute2, Attribute3
Best Practices
Provision enough throughput for GSI
• One update to the table may result in two writes to an index
If GSIs do not have enough write capacity, table writes will eventually be throttled down to what the "slowest" index can consume
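A back-of-the-envelope way to size a GSI's write capacity, sketched under the assumption that every table write hits the index and that one write capacity unit covers one write per second of an item up to 1 KB. The function name and the sample figures are illustrative only.

```python
import math


def gsi_write_capacity_upper_bound(table_writes_per_sec, index_item_bytes, writes_per_update=2):
    """Worst-case write capacity units per second needed by one GSI.

    writes_per_update defaults to 2, the case where an update changes the
    value of an indexed attribute (one index write out, one index write in).
    """
    units_per_write = max(math.ceil(index_item_bytes / 1024), 1)
    return table_writes_per_sec * writes_per_update * units_per_write


worst_case = gsi_write_capacity_upper_bound(100, 300)  # small projected items
```

Projecting fewer or smaller attributes lowers `index_item_bytes`, and with it the capacity each index write consumes.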
Debugging Throughput Issues
ProvisionedThroughputExceededException (HTTP status code 400)
• "The level of configured provisioned throughput for one or more global secondary indexes of the table was exceeded. Consider increasing your provisioning level for the under-provisioned global secondary indexes with the UpdateTable API"
Debugging Throughput Issues
GSI CloudWatch Metrics
• ProvisionedReadCapacityUnits vs. ConsumedReadCapacityUnits
• ProvisionedWriteCapacityUnits vs. ConsumedWriteCapacityUnits
• ReadThrottleEvents
• WriteThrottleEvents
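To pull one of these metrics for a specific index, a request to CloudWatch's GetMetricStatistics operation could be assembled roughly as follows (for example, `boto3.client("cloudwatch").get_metric_statistics(**request)`). This is a sketch: the helper name is hypothetical, the table and index names come from the earlier example, and the dimension name `GlobalSecondaryIndexName` is what the author of this sketch believes DynamoDB publishes for per-index metrics, so verify it against the current documentation.

```python
from datetime import datetime, timedelta, timezone


def gsi_metric_request(table, index, metric, minutes=60, now=None):
    """Build GetMetricStatistics parameters for one metric of one GSI."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/DynamoDB",
        "MetricName": metric,
        "Dimensions": [
            {"Name": "TableName", "Value": table},
            {"Name": "GlobalSecondaryIndexName", "Value": index},  # assumed dimension name
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,           # one data point per minute
        "Statistics": ["Sum"],  # throttle events and consumed units are summed per period
    }


request = gsi_metric_request(
    "User_Files", "TypeIndex", "WriteThrottleEvents",
    now=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```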
Questions