Jan Borch - AWS Solutions Architect
Understanding Database Options on AWS
Jan Borch
#awssummit
Berlin
We want to make it easy for you to start
1. Zero to Application in ____ Minutes
2. Zero to Millions of users in ____ Days
3. Zero to “Profits!” ASAP
AWS can help
Totally up to you!
https://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems
http://nosql-database.org/
Spectrum of options on AWS
SQL NoSQL
Low Cost High Cost
Do-it-yourself Fully Managed
Not available on AWS
Spectrum of options on AWS: SQL
Do-it-yourself: MySQL, Oracle, SQL Server, PostgreSQL, your favorite RDBMS
Fully Managed: Amazon RDS (MySQL, Oracle, SQL Server)
Spectrum of options on AWS: NoSQL
Do-it-yourself: MongoDB, Cassandra, Redis, Memcached, …
Fully Managed: Amazon DynamoDB, Amazon ElastiCache
Thinking about the questions
Should I use SQL or NoSQL?
Should I use MySQL on EC2 or RDS?
Should I use MongoDB, Cassandra, or DynamoDB?
Should I use Redis, Memcached, or ElastiCache?
Actually, thinking about the right questions
What are my scale and latency needs?
What are my transactional and consistency needs?
What are my read/write, storage and IOPS needs?
What are my time to market and server control needs?
Option 1: Run your databases on EC2
“I need root access to the instance to do some custom configuration”
“My object persistence framework does not support Amazon DynamoDB”
“My team has strong PostgreSQL expertise”
m1.small: 1 virtual core, 1.7 GiB memory, moderate I/O performance
cc2.8xlarge: 32 virtual cores (2 x Intel Xeon), 60.5 GiB memory, 10 Gbit I/O
cr1.8xlarge: 32 virtual cores (2 x Intel Xeon), 240 GiB memory, 10 Gbit I/O, 240 GB SSD instance store
hi1.4xlarge: 16 virtual cores, 60.5 GiB memory, 10 Gbit I/O, 2 x 1 TB SSD instance store
hs1.8xlarge: 16 virtual cores, 117 GiB memory, 10 Gbit I/O, 24 x 2 TB instance store
Leverage AWS services
EBS storage Volumes with EBS Snapshots
S3 for backups (for example Oracle RMAN)
Automation with AWS API or CloudFormation
Option 2: Let AWS manage my databases
“I want to reduce the time developers spend on database administration tasks”
“I need a database that is simple to deploy and easy to scale”
Why Managed Databases?
[Chart: breakdown of database administration effort by task. Tasks: backup & recovery, data load & unload; performance tuning; scripting & coding; security planning; install, upgrade, patch and migrate; documentation, licensing & training. Slice values: 5%, 25%, 20%, 40%, 5%, 5%.]
Differentiated effort increases the uniqueness of an application.
We believe in choice: one size does not fit all
Traditional Apps, Relational DB Needs
High Performance, High Scale Data Warehouses
New Web Apps, Massive Scalability
Amazon RDS
Amazon ElastiCache
Amazon DynamoDB
Amazon Redshift
Option 2.1: Managed SQL database
“I have a complex data model and need integrity constraints”
“My business apps only understand SQL”
“I need complex transactions, joins, updates”
Amazon Relational Database Service (Amazon RDS)
RDS is a fully managed relational database service that is simple to deploy, easy to scale, reliable and cost-effective
Price reduction
High Availability: Multi-AZ Deployments
Multi AZ price reductions ranging from 15% to 32%
Horizontal Scaling with Read Replicas
New Features
• Endpoint renaming
• Read Replica to master promotion
Security
Oracle Native Network Encryption and Transparent Data Encryption on Oracle EE
SSL support for SQL Server and MySQL
Availability and performance options
Amazon RDS configuration goals: improve availability, increase throughput, reduce latency
Options: Push-Button Scaling, Multi-AZ, Read Replicas, Provisioned IOPS
[Diagram: a Multi-AZ deployment spanning two Availability Zones within a Region]
Who is succeeding with RDS?
Thousands of developers use RDS every single day
Gaming, Web Apps, Mobile/Social, Media
Option 2.2: Managed NoSQL database
“I have very low latency requirements”
“I do not require complex queries or transactions”
“I need to scale (now, or in future)”
“I want to eliminate administrative costs”
Single digit millisecond latency.
Backed by solid-state drives.
Consistent, predictable performance
Consistent, disk only writes.
Replication across data centers and availability zones.
Durable
Three clicks or one API call
Table name + primary key
Level of throughput
Optional: local secondary indexes
Pay per capacity unit
READ
Capacity units = size of item (KB) x reads per second
Consistent reads: $0.0065 for 50 read units
Eventually consistent reads: $0.0065 for 100 read units
WRITE
Capacity units = size of item (KB) x writes per second
$0.0065 for 10 write units
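The capacity arithmetic above can be sketched in a few lines of Python. The function names are hypothetical, rounding the item size up to a whole KB is an assumption, and the slide does not state the billing period the prices apply to, so the results are simply "per period".

```python
import math

PRICE_PER_BLOCK = 0.0065  # USD, as quoted on the slide (billing period not stated)
UNITS_PER_BLOCK = {
    "consistent_read": 50,   # $0.0065 buys 50 consistent read units
    "eventual_read": 100,    # $0.0065 buys 100 eventually consistent read units
    "write": 10,             # $0.0065 buys 10 write units
}


def capacity_units(item_size_kb, ops_per_second):
    """Capacity units = item size (KB) x operations per second."""
    # Assumption: a fraction of a KB is counted as a whole KB.
    return math.ceil(item_size_kb) * ops_per_second


def cost(units, kind):
    """Price of the given number of capacity units, pro rata per block."""
    return units / UNITS_PER_BLOCK[kind] * PRICE_PER_BLOCK


# Example: 2 KB items read 100 times per second.
units = capacity_units(item_size_kb=2, ops_per_second=100)
print(units, cost(units, "consistent_read"), cost(units, "eventual_read"))
```

In this sketch the same read workload costs half as much with eventually consistent reads, which follows directly from the 50 versus 100 units per price block.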
Transactions
Item level transactions only
Puts, updates and deletes are ACID
Atomic increment and decrement
Conditional writes
Optimistic concurrency control
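To make conditional writes and optimistic concurrency control concrete, here is a minimal in-memory sketch. `ToyTable`, `ConditionFailed`, `add_to_total` and the `version` attribute are all hypothetical names for illustration; this is not the DynamoDB API, only the read, modify, conditionally-write, retry pattern.

```python
class ConditionFailed(Exception):
    """Raised when a conditional write's expectation does not hold."""


class ToyTable:
    """In-memory stand-in for a table (hypothetical, NOT the DynamoDB API)."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        # Return a copy so callers cannot mutate stored state directly.
        return dict(self._items[key])

    def put(self, key, item, expected_version=None):
        """Write the item only if the stored version equals expected_version."""
        current = self._items.get(key)
        current_version = current["version"] if current else None
        if current_version != expected_version:
            raise ConditionFailed(key)
        self._items[key] = dict(item)


def add_to_total(table, key, amount, max_attempts=5):
    """Optimistic concurrency: read, modify, conditionally write, retry on conflict."""
    for _ in range(max_attempts):
        item = table.get(key)
        seen = item["version"]
        item["total"] += amount
        item["version"] = seen + 1
        try:
            table.put(key, item, expected_version=seen)
            return item
        except ConditionFailed:
            continue  # another writer got in first: re-read and try again
    raise RuntimeError("too many conflicting writers")


table = ToyTable()
table.put(101, {"total": 35.0, "version": 1})  # create: expects no existing item
updated = add_to_total(table, 101, 65.0)
print(updated)
```

A writer holding a stale version is rejected rather than silently overwriting newer data, which is the point of the conditional write.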
Read Consistency
Strong or eventually consistent reads
Same latency expectations for strongly consistent reads
Mix and match at ‘read time’
Data modeling
id = 100   date = 2012-05-16-09-00-10   total = 25.00
id = 101   date = 2012-05-15-15-00-11   total = 35.00
id = 101   date = 2012-05-16-12-00-10   total = 100.00
id = 102   date = 2012-03-20-18-23-10   total = 20.00
id = 102   date = 2012-03-20-18-23-10   total = 120.00
Table: the full set of items
Item: one row of the table
Attributes: the name = value pairs (id, date, total) within an item
Items are indexed by primary and secondary keys
Primary keys can be composite
Secondary keys are local to the table
Indexing
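A minimal in-memory sketch of this indexing model, assuming `id` is the hash part and `date` the range part of a composite primary key, and that a secondary index keeps the same hash key but orders items by another attribute (`total` here). `build_index` is a hypothetical helper, and the rows are taken from the data modeling example.

```python
from collections import defaultdict

ITEMS = [
    {"id": 100, "date": "2012-05-16-09-00-10", "total": 25.00},
    {"id": 101, "date": "2012-05-15-15-00-11", "total": 35.00},
    {"id": 101, "date": "2012-05-16-12-00-10", "total": 100.00},
    {"id": 102, "date": "2012-03-20-18-23-10", "total": 20.00},
]


def build_index(items, hash_attr, range_attr):
    """Group items by hash key; keep each group sorted by the range attribute."""
    index = defaultdict(list)
    for item in items:
        index[item[hash_attr]].append(item)
    for group in index.values():
        group.sort(key=lambda it: it[range_attr])
    return dict(index)


primary = build_index(ITEMS, "id", "date")    # composite primary key: id + date
by_total = build_index(ITEMS, "id", "total")  # secondary index: same hash key, ordered by total

print([it["date"] for it in primary[101]])
print([it["total"] for it in by_total[101]])
```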
Programming DynamoDB
Small but perfectly formed API.
Manage tables: CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables
“Select”, “insert”, “update” items: PutItem, GetItem, UpdateItem, DeleteItem
Bulk select or update (max 1MB): BatchGetItem, BatchWriteItem
Query specific items OR scan the full table: Query, Scan
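As a rough illustration of what item-level calls carry, the sketch below builds PutItem and GetItem request bodies as plain dictionaries, using DynamoDB's typed attribute values (`{"S": ...}` for strings, `{"N": ...}` for numbers sent as strings). No request is sent; the helper names and the table name `orders` are assumptions, and only strings and numbers are handled.

```python
import json


def to_attribute_value(value):
    """Encode a Python str or number as a typed attribute value."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def put_item_request(table_name, item):
    """Body for a PutItem call: table name plus the typed item."""
    return {
        "TableName": table_name,
        "Item": {name: to_attribute_value(v) for name, v in item.items()},
    }


def get_item_request(table_name, key, consistent_read=False):
    """Body for a GetItem call; consistency is chosen per read."""
    return {
        "TableName": table_name,
        "Key": {name: to_attribute_value(v) for name, v in key.items()},
        "ConsistentRead": consistent_read,
    }


put_body = put_item_request("orders", {"id": 101, "date": "2012-05-16-12-00-10", "total": 100})
get_body = get_item_request("orders", {"id": 101, "date": "2012-05-16-12-00-10"}, consistent_read=True)
print(json.dumps(put_body, indent=2))
print(json.dumps(get_body, indent=2))
```

The `ConsistentRead` flag on the read request is where the strong versus eventually consistent choice is made, one call at a time.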
Query patterns
Retrieve all items by hash key.
Range key conditions:
==, <, >, >=, <=, begins with, between.
Counts. Top and bottom n values.
Paged responses.
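These query patterns can be mimicked over an in-memory list to show how they compose. The `query`, `begins_with` and `between` helpers are hypothetical, not the service API, and the third item for id 101 is invented sample data added so that paging has something to page over.

```python
ITEMS = [
    {"id": 100, "date": "2012-05-16-09-00-10", "total": 25.00},
    {"id": 101, "date": "2012-05-15-15-00-11", "total": 35.00},
    {"id": 101, "date": "2012-05-16-12-00-10", "total": 100.00},
    {"id": 101, "date": "2012-06-01-08-30-00", "total": 12.50},  # invented row
]


def begins_with(prefix):
    return lambda value: value.startswith(prefix)


def between(low, high):
    return lambda value: low <= value <= high


def query(items, hash_value, condition=None, limit=None, start_after=None, ascending=True):
    """Return (page, last_key) for one hash key, ordered by the range key."""
    matches = sorted(
        (it for it in items if it["id"] == hash_value),
        key=lambda it: it["date"],
        reverse=not ascending,
    )
    if condition is not None:
        matches = [it for it in matches if condition(it["date"])]
    if start_after is not None:
        # Resume strictly after the last key of the previous page.
        if ascending:
            matches = [it for it in matches if it["date"] > start_after]
        else:
            matches = [it for it in matches if it["date"] < start_after]
    page = matches if limit is None else matches[:limit]
    more = len(matches) > len(page)
    last_key = page[-1]["date"] if (more and page) else None
    return page, last_key


first_page, last_key = query(ITEMS, 101, limit=2)            # paged response
second_page, end_key = query(ITEMS, 101, limit=2, start_after=last_key)
in_may, _ = query(ITEMS, 101, condition=begins_with("2012-05"))
newest, _ = query(ITEMS, 101, limit=1, ascending=False)      # "top n" with n = 1
print(len(in_may), newest[0]["date"])
```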
Option 2.3: Managed data warehouse database
“I need to query high volumes of data”
“I do primarily SQL analytic queries”
“I need high performance for my reporting queries”
OLTP <-> OLAP
-- OLTP: look up a single row
SELECT ProductID, Name
FROM Products
WHERE ProductID = 1234;

-- OLAP: aggregate over many rows
SELECT ProductID, count(*)
FROM Page_Hits
WHERE hour IN (12, 13)
GROUP BY ProductID;
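To try the two query shapes side by side, the sketch below runs them against an in-memory SQLite database from Python. The table layouts and sample rows are invented for illustration, and an `ORDER BY` is added to the aggregate query only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Page_Hits (ProductID INTEGER, hour INTEGER);
    INSERT INTO Products VALUES (1234, 'Widget'), (5678, 'Gadget');
    INSERT INTO Page_Hits VALUES (1234, 12), (1234, 13), (1234, 9), (5678, 12);
    """
)

# OLTP shape: indexed lookup of one row by primary key.
oltp = conn.execute(
    "SELECT ProductID, Name FROM Products WHERE ProductID = ?", (1234,)
).fetchall()

# OLAP shape: scan and aggregate many rows.
olap = conn.execute(
    "SELECT ProductID, count(*) FROM Page_Hits "
    "WHERE hour IN (12, 13) GROUP BY ProductID ORDER BY ProductID"
).fetchall()

print(oltp)
print(olap)
conn.close()
```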
OLTP <-> OLAP
Transactional Processing
• Transactional context – Get order total
• Latency
• Indexed access
• Random IO
• Disk seek times
Analytical Processing
• Global context – Daily revenue report
• Throughput
• Full table scans
• Sequential IO
• Disk transfer rates
Fast and powerful
Dramatically Reduce I/O: direct-attached storage, large data block sizes, column data store, data compression, zone maps
Parallelize and Distribute Everything: MPP, load, query, resize, backup, restore
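As a rough illustration of how zone maps reduce I/O, the sketch below keeps a (min, max) pair per block of a column and skips any block whose range cannot contain the value being searched for. This shows the general technique only, not Redshift's implementation; the function names, block size and sample data are assumptions.

```python
def build_zone_map(values, block_size):
    """Split a column into blocks and record each block's (min, max)."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    zone_map = [(min(block), max(block)) for block in blocks]
    return blocks, zone_map


def scan_equal(blocks, zone_map, target):
    """Return matching values and the number of blocks actually read."""
    hits, blocks_read = [], 0
    for block, (low, high) in zip(blocks, zone_map):
        if target < low or target > high:
            continue  # the zone map proves this block cannot match: skip it
        blocks_read += 1
        hits.extend(v for v in block if v == target)
    return hits, blocks_read


# An "hour" column stored in roughly sorted order, 4 values per block.
hours = [0, 1, 1, 2, 3, 3, 4, 5, 12, 12, 13, 13, 20, 21, 22, 23]
blocks, zone_map = build_zone_map(hours, block_size=4)
hits, blocks_read = scan_equal(blocks, zone_map, target=12)
print(hits, blocks_read, len(blocks))
```

Here only one of the four blocks has to be read. The benefit depends on the data being stored in an order that keeps each block's value range narrow.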
Fully Managed
Simplify Provisioning
• Create a cluster in minutes
• Automatic OS and software patching
• Scale up to 1.6PB with a few clicks and no downtime
Protect Operations
• Redshift data is always encrypted
• Continuously backed up to S3
• Automatic node recovery
• Transparent disk failure
Amazon Redshift architecture
[Diagram: SQL clients/BI tools connect over JDBC/ODBC to a leader node. The leader node fronts several compute nodes, each with 16 cores, 128GB RAM and 16TB disk, interconnected over 10 GigE (HPC). Ingestion, backup and restore go through Amazon S3.]