ZAHID MIAN FEBRUARY 27, 2011 Amazon SimpleDB

Amazon SimpleDB

Page 1: Amazon SimpleDB

ZAHID MIAN

FEBRUARY 27, 2011

Amazon SimpleDB

Page 2: Amazon SimpleDB

Need for NoSQL

Avoid Overhead Associated with Traditional RDBMS

Scale Horizontally (significant) as well as Vertically

High Availability

Simplify data storage and model (make it efficient for storing and retrieving data)

Generally a Hash table

Page 3: Amazon SimpleDB

Tradeoffs SimpleDB vs. RDBMS

Simplicity

Lack of support for joins, views, constraints, transactions, stored procedures, etc.

Schema-less, type-less (all values are stored as text)

Simplified querying language (Select * …)

No fine-tuning necessary

Uses Web Services to access data

BASE implementation instead of ACID

Key is “Eventual” commits

Page 4: Amazon SimpleDB

Tradeoffs SimpleDB vs. RDBMS

Proprietary Query “language”

Designed to retrieve Items (not records)

Basic operations

Specific operations like CreateDomain, DeleteAttributes, PutAttributes, etc.

Storage Structure

One large Hash table

Each value is hashed, so it is automatically indexed

Little or No Infrastructure planning

Hosted by Amazon

Page 5: Amazon SimpleDB

Sample SimpleDB SOAP Message

Page 6: Amazon SimpleDB

SimpleDB Object Model

User Account (One Store per account)

Domain – equivalent to a Table

Item – equivalent to a Record

Attribute – equivalent to a Column

Value – equivalent to a column value

Multiple values per attribute are allowed
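
The hierarchy above can be sketched as a small in-memory structure. This is only a toy model for illustration, not Amazon's client library: the class `ToyStore`, its method names, and the sample data are assumptions made here. It shows domains holding items, items holding attributes, and each attribute holding a set of text values.

```python
class ToyStore:
    """Toy stand-in for the account > domain > item > attribute > values hierarchy."""

    def __init__(self):
        # domain name -> item name -> attribute name -> set of string values
        self.domains = {}

    def create_domain(self, domain):
        self.domains.setdefault(domain, {})

    def put_attributes(self, domain, item, attributes, replace=False):
        """attributes is a list of (name, value) pairs; values are stored as text."""
        stored = self.domains[domain].setdefault(item, {})
        if replace:
            # Replacing discards the existing values of the named attributes only.
            for name in {n for n, _ in attributes}:
                stored[name] = set()
        for name, value in attributes:
            stored.setdefault(name, set()).add(str(value))

    def get_attributes(self, domain, item):
        stored = self.domains[domain].get(item, {})
        return {name: sorted(values) for name, values in stored.items()}


# Hypothetical sample data: one attribute carrying two values.
store = ToyStore()
store.create_domain("Contacts")
store.put_attributes("Contacts", "1", [
    ("Name", "Example Person"),
    ("EmailAddress", "a@example.com"),
    ("EmailAddress", "b@example.com"),
])
```

Because values live in a set per attribute, adding a second email address does not require a new column or a child table.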

Page 7: Amazon SimpleDB

SimpleDB Model (diagram): User Account (Account/Authentication Info) > Domains > Items > Attributes > Values

Page 8: Amazon SimpleDB

Application Design Considerations

Normalized vs. Non-Normalized Storage

Data Caching at the Application level

Normalized Data

Contacts

ContactID   Name         DOB   Gender
1           Adam Smith   …     M
2           Sarjo T      …     M
3           Sarah K      …     F

ContactEmailAddresses

ContactID   EmailAddress
1           asmith1@...
1           asmith2@...
2           sarjot@...
3           sarah1@...
3           sarah2@...

Page 9: Amazon SimpleDB

Application Design Considerations

Non-Normalized Data in SimpleDB

Contacts

ContactID   Name         DOB   Gender   EmailAddress
1           Adam Smith   …     M        asmith1@...
                                        asmith2@...
                                        asmith3@...
2           Sarjo T      …     M        sarjot@...
3           Sarah K      …     F        sarah1@...
                                        sarah2@...

Contacts

ContactID   Name         DOB   Gender   EmailAddress1   EmailAddress2   EmailAddress3
1           Adam Smith   …     M        asmith1@...     asmith2@...     asmith3@...
2           Sarjo T      …     M        sarjot@...
3           Sarah K      …     F        sarah1@...      sarah2@...

Add Additional Attributes as needed

Null attributes don’t exist

Add Additional Values as needed
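
Folding child rows into a multi-valued attribute might look like the following. This is a minimal sketch: the function name `denormalize` and the plain-dictionary row format are assumptions for illustration, not part of any SimpleDB API.

```python
def denormalize(contacts, emails):
    """Fold child email rows into their parent contact as a multi-valued attribute."""
    items = {}
    for contact in contacts:
        # A None value is dropped: a "null" attribute is simply not stored.
        items[contact["ContactID"]] = {
            name: [str(value)]
            for name, value in contact.items()
            if name != "ContactID" and value is not None
        }
    for row in emails:
        # Each additional email becomes one more value of the same attribute.
        items[row["ContactID"]].setdefault("EmailAddress", []).append(row["EmailAddress"])
    return items
```

An item without any email rows ends up with no `EmailAddress` attribute at all, rather than an empty or null one.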

Page 10: Amazon SimpleDB

Application Design Considerations

Analytical Processing

No support for group by or aggregation

Application must implement appropriate functionality

Can be a costly operation at the data level

Bulk Operations

Little support for bulk updates

At least two trips (one to get the items, the other to send batch request)
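
Since there is no group by or aggregation, the application has to do that work on the items it retrieves. A minimal sketch, assuming items arrive as dictionaries of text values; the function name `group_sum` and the attribute names in the usage example are hypothetical.

```python
from collections import defaultdict


def group_sum(items, group_attr, value_attr):
    """Client-side equivalent of SELECT group_attr, SUM(value_attr) ... GROUP BY group_attr."""
    totals = defaultdict(int)
    for item in items:
        # Attributes are optional per item, so skip items missing either one.
        if group_attr in item and value_attr in item:
            # Values are stored as text and must be converted before arithmetic.
            totals[item[group_attr]] += int(item[value_attr])
    return dict(totals)


# Hypothetical items, as they might come back from a Select.
sales = [
    {"Region": "East", "Amount": "100"},
    {"Region": "West", "Amount": "250"},
    {"Region": "East", "Amount": "50"},
    {"Region": "North"},
]
totals = group_sum(sales, "Region", "Amount")
```

Every item has to be fetched before it can be summed, which is why this kind of processing becomes expensive on large data.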

Page 11: Amazon SimpleDB

Application Design Considerations

No Transactional Support

Application must “mimic” a transaction by guaranteeing commits

Support for Consistent Reads (discouraged)

Constraints

All constraints (type or data) must be handled by the Application
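
Application-side constraint checking might be sketched as below. The rules themselves (a required `Name`, a `DOB` that parses as a date) are assumptions chosen for illustration; only the 1,024-byte value limit comes from the slides.

```python
from datetime import date


def validate_contact(attrs):
    """Return a list of constraint violations; an empty list means the item may be saved."""
    errors = []
    # "NOT NULL" style constraint, enforced by the application.
    if not attrs.get("Name"):
        errors.append("Name is required")
    # Type constraint: the store only sees text, so the application checks the format.
    dob = attrs.get("DOB")
    if dob is not None:
        try:
            date.fromisoformat(dob)
        except ValueError:
            errors.append("DOB must be an ISO 8601 date (YYYY-MM-DD)")
    # Size constraint on every value.
    for name, value in attrs.items():
        if len(str(value).encode("utf-8")) > 1024:
            errors.append(f"{name} exceeds 1024 bytes")
    return errors
```

The check runs before every write, because nothing on the storage side will reject a malformed item.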

Page 12: Amazon SimpleDB

Application Design Considerations

Working With Data/Values

Value Size Limit of 1024 bytes

Possibly break into chunks of data

Lexicographical search creates problems

Negative Numbers Offset

Need to use an “offset” number to add to numeric values to handle negative values

Zero Padding

Pad all numbers with leading “0”

Dates

Convert all dates to ISO 8601 standard before saving
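
The encoding rules above can be sketched as follows. The constants `OFFSET` and `WIDTH` and all function names are assumptions for illustration; an application would choose an offset and width that cover its own value range.

```python
from datetime import datetime, timezone

OFFSET = 1_000_000  # assumed: no stored value is below -1,000,000
WIDTH = 10          # assumed: every shifted value fits in 10 digits


def encode_int(n):
    """Shift by OFFSET and zero-pad so that text order matches numeric order."""
    shifted = n + OFFSET
    if shifted < 0:
        raise ValueError("value is below -OFFSET; choose a larger offset")
    text = str(shifted)
    if len(text) > WIDTH:
        raise ValueError("value does not fit in WIDTH digits")
    return text.zfill(WIDTH)


def decode_int(text):
    return int(text) - OFFSET


def encode_datetime(dt):
    """ISO 8601 in UTC; expects a timezone-aware datetime."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def split_value(text, limit=1024):
    """Split text into chunks of at most `limit` UTF-8 bytes, on character boundaries."""
    chunks, current, size = [], [], 0
    for ch in text:
        n = len(ch.encode("utf-8"))
        if size + n > limit and current:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks


# Plain text comparison gets numbers wrong: "10" sorts before "9".
naive_is_wrong = "10" < "9"
ordered = [decode_int(s) for s in sorted(encode_int(n) for n in [10, -5, 2, -100])]
```

With the offset and padding applied, sorting the encoded strings gives the same order as sorting the numbers, and ISO 8601 timestamps in a single time zone sort chronologically for the same reason.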

Page 13: Amazon SimpleDB

Hosting Environment

Challenges to Consider

Data Privacy

Legal Requirements

No Backup Support

“Lock-in” Factor (can’t migrate from SimpleDB)

“Open Cash Register” Problem (rogue script/processing can be costly)

Difficult to Maintain DB for Application Development Lifecycle (unit test, dev, test, perf, prod)

Page 14: Amazon SimpleDB

Pros of Using SimpleDB

Item                Explanation
Infrastructure      Amazon hosts the environment, so virtually no cost to get started; no need for a local datacenter; “pay as you go” for processing
Simplicity          Extremely efficient storage and retrieval of data
Flexibility         Schema-less; type-less data; easy prototyping
Security            Data is stored with Amazon and accessible through authenticated requests only
High Availability   BASE implementation provides high availability
Fault-Tolerance     Data replicated across multiple nodes; managed by Amazon
Indexing            Hash table storage means all data is “automatically” indexed

Page 15: Amazon SimpleDB

Cons of Using SimpleDB

Item                   Explanation
Not RDBMS              Not an RDBMS substitute. Lacks features like stored procedures, referential integrity, views, datatypes, text search, schemas, granular security
Lacks “rich” SQL       Rudimentary search operations; cannot group by, aggregate, etc.
SLA                    Loosely defined SLA
Joins                  Joins can be performed at the application layer, but require multiple operations between client/server
Limits on data         10 GB per domain; 100 domains; 256 attribute name-value pairs per item; 1,024 bytes per attribute value
Limits on Operations   1 MB response size; 2,500 items returned per Select; 5 seconds maximum for operation
Limits on Predicates   20 maximum predicates per Select; cannot reference other attributes of the Item
Hosting                No local implementation makes it difficult to develop application (release management, performance testing, unit testing, etc.); no backup support; privacy issues
Migration              Limited options for migrating data
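
An application-layer join, as mentioned in the Joins row, might be sketched like this. It assumes both sets of items have already been fetched in separate requests and are held as lists of dictionaries; the function name `hash_join` is hypothetical.

```python
def hash_join(left_items, right_items, key):
    """Inner join of two item lists on a shared attribute, done in application memory."""
    # Build a lookup table over the left side.
    index = {}
    for item in left_items:
        index.setdefault(item[key], []).append(item)
    # Probe it with each right-side item.
    joined = []
    for right in right_items:
        for left in index.get(right[key], []):
            joined.append({**left, **right})
    return joined
```

Each side of the join costs at least one round trip to the service, which is the overhead the table refers to.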

Page 16: Amazon SimpleDB

Appropriate Use Cases

Type of Application              Explanation
Managing Data for Online Games   User scores and achievement data; user settings or preferences; user-generated content (comments, feedback, etc.); dynamic game content
Managing Session State           Applications like online games, web sites, and batch processes can manage the state of their process
“Static” Content                 Nightly builds from RDBMS (e.g., pre-configured Sales Per Region data)
Simple Collections               Any collections (e.g., URLs, contacts, etc.)

Page 17: Amazon SimpleDB

Inappropriate Use Cases

Type of Application                   Explanation
Analytical Processing                 Applications where data computation is required on large data
Highly Structured Data Requirements   Applications that require constraints and structures
Data Privacy                          If data privacy is an issue
Allowing Third-party Extensions       Makes it difficult since there is no schema
Data is core-competency               When data infrastructure is the core-competency; when data storage is what gives you leverage over others