Amazon RDS for Microsoft SQL: Performance, Security, Best Practices (DAT303) | AWS re:Invent 2013


DESCRIPTION

Come learn about architecting high-performance applications and production workloads using Amazon RDS for SQL Server. Understand how to migrate your data to an Amazon RDS instance, apply security best practices, and optimize your database instance and applications for high availability.


© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

DAT303 - A Closer Look at Amazon RDS for Microsoft SQL Server: Deep Dive into Performance, Security, and Data Migration Best Practices

Sergei Sokolenko - Sr Product Manager, AWS

Allan Parsons - VP Operations, Viddy

November 13, 2013

Next Hour …

• Best Practices
  – Security
  – Performance
  – Data Migration
  – Data Durability
• Viddy’s Case

Security Best Practices

Control Access [diagram: Internet, IAM, VPC]

Encrypt Your Data

• “In transit” with SSL
  – Import the public Amazon RDS certificate into Windows: https://rds.amazonaws.com/doc/rds-ssl-ca-cert.pem
  – Add "encrypt=true" to your connection string
• “At rest” with Transparent Data Encryption
  – Encrypts data before writing it to storage
  – Decrypts it when reading
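A minimal Python sketch of what "encrypt=true" looks like inside a full connection string, assuming an ADO.NET SqlClient-style string (the `Encrypt` and `TrustServerCertificate` keywords). The endpoint, database, and credentials are hypothetical placeholders. With `TrustServerCertificate=False` the client validates the server certificate, which is why the RDS certificate is imported into the Windows trust store first.

```python
def build_connection_string(endpoint: str, port: int, database: str,
                            user: str, password: str) -> str:
    """Build a SqlClient-style connection string that requests an encrypted link.

    All argument values are placeholders. Values containing ';' or '='
    would need quoting, which this sketch does not handle.
    """
    settings = {
        "Server": f"{endpoint},{port}",     # SQL Server takes "host,port"
        "Database": database,
        "User ID": user,
        "Password": password,
        "Encrypt": "True",                  # the "encrypt=true" from the slide
        "TrustServerCertificate": "False",  # validate against the imported RDS cert
    }
    return "".join(f"{key}={value};" for key, value in settings.items())


# Hypothetical endpoint and credentials, for illustration only.
conn_str = build_connection_string(
    "mydb.example.us-east-1.rds.amazonaws.com", 1433, "UserDB1", "app_user", "secret"
)
print(conn_str)
```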

Performance Best Practices

High Performance Relational Databases [diagram: Amazon RDS configuration; increase throughput and reduce latency with Push-Button Scaling, DB Shards, and Provisioned IOPS]

Push-Button Scaling & Sharding

• Scale nodes vertically up or down
  – From m1.small (1 virtual core, 1.7GB)
  – To m2.4xlarge (8 virtual cores, 64GB)
• Scale out nodes horizontally
  – Shard based on data or workload characteristics

Production = Provisioned IOPS: Consistently Fast Performance

• 1 TB max storage per instance
• 10,000 Provisioned IOPS
• I/O-Optimized instances
• Check I/O blockers
  – Database contention
  – Locking

Data Migration Best Practices

• Replication + Switchover
  – Linked Servers
  – SSIS
• Bulk Migration
  – Import/Export Wizard
  – BCP Bulk Load

Migrating Data to Amazon RDS

One-time Bulk Migration [diagram: On Premises to AWS]

Migration Code Snippets

-- Run SSMS's "Generate and Publish Scripts" Wizard first to create the tables on Amazon RDS

-- .BAT script for export BCP commands (run against the source instance)

SELECT 'bcp ' + db_name() + '..' + name + ' out "C:\Data\' + name + '.txt" -E -n -S localhost -U usr -P pwd' FROM sysobjects WHERE type = 'U'

bcp dbname..table out "C:\Data\table.txt" -E -n -S localhost -U usr -P pwd

-- .BAT script for import BCP commands (load into the Amazon RDS endpoint)

SELECT 'bcp ' + db_name() + '..' + name + ' in "C:\Data\' + name + '.txt" -E -n -S RDSEndpoint,port -U usr -P pwd' FROM sysobjects WHERE type = 'U'

bcp dbname..table in "C:\Data\table.txt" -E -n -S endpoint,port -U usr -P pwd

More Info: Data Import Guide for SQL Server

Wizard settings (Tables Only):

• Script USE DATABASE = False
• Script Check Constraints = False
• Script Foreign Keys = False
• Script Primary Keys = False
• Script Unique Keys = False
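The SELECT-based generators emit one bcp command per user table. The same idea as a short Python sketch, assuming you already have the table list (for example from `sysobjects WHERE type = 'U'`). The database name, server strings, credentials, and the `C:\Data` folder are placeholders, and port 1433 is assumed as the SQL Server default.

```python
from pathlib import PureWindowsPath


def bcp_commands(database, tables, direction, server, user, password,
                 data_dir=r"C:\Data"):
    """Return one bcp command per table, like the SELECT-based generators.

    direction: "out" exports from the source, "in" loads into Amazon RDS.
    -E keeps identity values and -n uses native format, as on the slide.
    """
    if direction not in ("out", "in"):
        raise ValueError("direction must be 'out' or 'in'")
    commands = []
    for table in tables:
        data_file = PureWindowsPath(data_dir) / f"{table}.txt"
        commands.append(
            f'bcp {database}..{table} {direction} "{data_file}" '
            f"-E -n -S {server} -U {user} -P {password}"
        )
    return commands


# Hypothetical table list and connection details.
tables = ["Customers", "Orders"]
export_cmds = bcp_commands("UserDB1", tables, "out", "localhost", "usr", "pwd")
import_cmds = bcp_commands("UserDB1", tables, "in", "RDSEndpoint,1433", "usr", "pwd")
print("\n".join(export_cmds + import_cmds))
```

Saving each list to its own .BAT file follows the slide's workflow: run the export against the source, copy the data files, then run the import against the RDS endpoint.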

Ongoing Replication with Switchover [diagram: SourceINST (On Premises) to TargetINST (AWS) via Linked Server]

On Target Instance (Amazon RDS)

USE master;

CREATE LOGIN [repl_login] WITH PASSWORD=N'password01', DEFAULT_DATABASE=[master], DEFAULT_LANGUAGE=[us_english], CHECK_EXPIRATION=OFF, CHECK_POLICY=OFF;

USE UserDB1;

CREATE USER [repl_user] FOR LOGIN [repl_login];

EXEC sp_addrolemember 'db_datareader', [repl_user];

EXEC sp_addrolemember 'db_datawriter', [repl_user];

-- Assume the source DB has a table "Customers"; its recent changes are staged here

CREATE TABLE StageCustomers ( CustomerID int, UpdatedDate datetime );

On Source Instance (On-Premises)

USE master;

-- The linked server name is the network name "host,port" (no brackets inside the string)
EXEC sp_addlinkedserver N'TargetINST.amazonaws.com,port', N'SQL Server';

CREATE LOGIN [repl_login] WITH PASSWORD=N'password02', DEFAULT_DATABASE=[master], DEFAULT_LANGUAGE=[us_english], CHECK_EXPIRATION=OFF, CHECK_POLICY=OFF;

-- Map the local login to the SQL login created on the RDS instance
EXEC sp_addlinkedsrvlogin
    @rmtsrvname = N'TargetINST.amazonaws.com,port',
    @useself = 'FALSE',
    @locallogin = N'repl_login',
    @rmtuser = N'repl_login',
    @rmtpassword = N'password01';

USE UserDB1;

-- Copy rows changed in the last 2 days into the staging table on RDS
INSERT INTO [TargetINST.amazonaws.com,port].UserDB1.dbo.StageCustomers (CustomerID, UpdatedDate)
SELECT CustomerID, UpdatedDate FROM Customers WHERE UpdatedDate >= DATEADD(DD, -2, GETDATE());
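Because the job copies a rolling two-day window, running it repeatedly inserts overlapping rows into StageCustomers. One way to apply the staged rows before switchover, sketched here in Python as an assumption rather than the presenters' method, is to keep only the newest UpdatedDate per CustomerID (in T-SQL the same idea is usually written with ROW_NUMBER() or MERGE). The rows below are made up.

```python
from datetime import datetime


def latest_per_customer(stage_rows):
    """Collapse overlapping StageCustomers rows to the newest row per CustomerID.

    stage_rows: iterable of (CustomerID, UpdatedDate) tuples, as copied by
    repeated runs of the rolling two-day INSERT ... SELECT.
    """
    latest = {}
    for customer_id, updated in stage_rows:
        if customer_id not in latest or updated > latest[customer_id]:
            latest[customer_id] = updated
    return latest


# Made-up rows: customer 1 was copied twice by two different runs.
rows = [
    (1, datetime(2013, 11, 10, 9, 0)),
    (2, datetime(2013, 11, 11, 8, 30)),
    (1, datetime(2013, 11, 12, 14, 15)),
]
print(latest_per_customer(rows))
```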

Data Durability Best Practices

Backups and Disaster Recovery

• Automated Backups
  – Nightly system snapshots + transaction log backups
  – Enable point-in-time restore to any point in the retention period
  – Max retention period = 35 days
• DB Snapshots
  – User-driven snapshots of the database
  – Kept until explicitly deleted
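Point-in-time restore only works inside the retention window. A rough Python sketch of that check, assuming the window runs from `now - retention` up to a latest restorable time that trails `now` by a few minutes. The 5-minute lag is an illustrative assumption, not a documented constant.

```python
from datetime import datetime, timedelta

MAX_RETENTION_DAYS = 35  # maximum retention period quoted on the slide


def is_restorable(target, now, retention_days,
                  latest_lag=timedelta(minutes=5)):
    """Approximate check that `target` lies in the point-in-time restore window.

    Window: [now - retention_days, now - latest_lag]. latest_lag stands in
    for how far the latest restorable time trails the present (assumed).
    """
    if not 1 <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention period must be between 1 and 35 days")
    earliest = now - timedelta(days=retention_days)
    latest = now - latest_lag
    return earliest <= target <= latest


now = datetime(2013, 11, 13, 12, 0)
print(is_restorable(now - timedelta(days=3), now, retention_days=7))  # True
print(is_restorable(now - timedelta(days=8), now, retention_days=7))  # False
```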

Cross Region Snapshot Copy [diagram: snapshot copied from Region 1, AZ 1 to Region 2, AZ 1]

Viddy’s Case

Allan Parsons, Viddy

Scaling viddy.com on Amazon RDS for SQL Server

Vision

To entertain and connect people around the world by empowering mobile users to easily capture, beautify and share amazing videos to those who matter most.

Viddy By The Numbers

• Reach :: 41+ Million Registered Users
• Connections :: 250+ Million User Connections
• Media :: 6.0+ Million Unique Videos
• CDN Assets (encoded videos + images)
  – Videos :: 30+ Million Video Files
  – Images :: 2+ Billion Image Files
• Human Power
  – Executives & Support Staff :: 4
  – Software Engineers :: 6
  – DevOps Engineers :: 1
  – Database Administrators :: 0

What Powers Viddy

• Web / Front-End :: Windows / IIS (C# / .NET / MVC)
• Cache :: Linux / memcached (via Couchbase)
• Persistent Cache :: Linux / Redis (2x Master-Slave Environments)
• Source Control :: Team Foundation Server
• Continuous Integration & Build Automation :: Jenkins, PowerShell, MSBuild
• AWS & EC2 Tools
  – VPCs :: 1 VPC per Environment (Production, QA, Dev)
  – RDS :: 11 SQL Server Instances Housing 144 Databases (Production)
  – SNS / SQS :: Used for Eventual Consistency
  – Route 53 & ELBs :: DNS and Load Balancing
  – CloudWatch :: Monitoring & Trending
  – CloudSearch :: Media, Tag, and User Searching
  – S3 & CloudFront :: Asset Storage and Delivery

We’re a Technology-Agnostic Stack & Team

Early Technical Challenges

Wrong Cloud Ideology
• Inherited a PaaS Cloud Infrastructure

Difficulty in Caching Data
• Twitter-based Service Model

Underestimated the Power of Facebook
• Open Graph drove 1MM+ User Registrations in a 24-Hour Period

Very, Very Busy SQL Instance
• 1 Instance, 6 Databases
• Disabled Key Constraints to Improve Performance
• Too busy to take transactionally consistent backups

Inflexible Platform
• Adding machines would make inefficiencies worse
• On PaaS, more money != more scalability

Moving to AWS

VPC
• Guaranteed affinity between Web, Cache, and SQL
• Low Latency
• Better security

SQL
• Tremendous cleanup effort
• 144 RDS database shells, filled via ETL
• Engineered Eventual Consistency to Move Deltas

Build Automation
• Build Scripts dual-deployed to PaaS and IaaS
• Developers could build & test multiple times per hour on 2 providers

DNS
• Moved all zones to Route 53 & lowered TTLs
• Updated DNS entries on Christmas Eve 2012 (low traffic)

Goal: PaaS to IaaS with Zero Downtime

RDS Eventual Consistency

[1] :: API Servers Push Messages to an Amazon SNS Topic
[2] :: Amazon SNS Distributes Each Message to SQS Queues
[3] :: A Windows Service Monitors the Queues
[4] :: The Windows Service Pushes Each Message to a Shard

Advantages
:: Can lose the Windows Service, keep messages
:: Can lose a DB Shard, keep messages
:: Easy to Scale! More queues + more messages = more Windows Services / EC2 Machines
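The four steps and the "keep messages" advantages rest on the SQS visibility timeout: a received message is hidden, not removed, until the worker deletes it. A toy in-memory Python model of that flow follows. `InMemoryQueue`, `publish`, and `process_one` are hypothetical stand-ins, not the AWS SDK, and time is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare by identity so delete() removes this exact message
class Message:
    body: dict
    invisible_until: float = 0.0


@dataclass
class InMemoryQueue:
    """Toy stand-in for an SQS queue: a received message stays in the queue,
    hidden for `visibility_timeout` seconds, until it is explicitly deleted."""
    visibility_timeout: float
    messages: list = field(default_factory=list)

    def send(self, body):
        self.messages.append(Message(body))

    def receive(self, now):
        for msg in self.messages:
            if msg.invisible_until <= now:
                msg.invisible_until = now + self.visibility_timeout
                return msg
        return None  # nothing visible right now

    def delete(self, msg):
        self.messages.remove(msg)


def publish(subscribed_queues, body):
    """Steps [1]-[2]: the topic fans the message out to every subscribed queue."""
    for queue in subscribed_queues:
        queue.send(body)


def process_one(queue, write_to_shard, now):
    """Steps [3]-[4]: take one message and delete it only after the shard write
    succeeds; if the write fails, the message reappears after the timeout."""
    msg = queue.receive(now)
    if msg is None:
        return False
    try:
        write_to_shard(msg.body)
    except ConnectionError:
        return False  # shard offline: keep the message for a retry
    queue.delete(msg)
    return True


queue = InMemoryQueue(visibility_timeout=30)
publish([queue], {"user_id": "hypothetical-guid", "action": "follow"})

shard_online = False
written = []


def write_to_shard(body):
    if not shard_online:
        raise ConnectionError("shard offline")
    written.append(body)


first = process_one(queue, write_to_shard, now=0)    # shard down: message kept
shard_online = True
second = process_one(queue, write_to_shard, now=10)  # still hidden by the timeout
third = process_one(queue, write_to_shard, now=31)   # visible again: written, deleted
print(first, second, third, len(queue.messages))
```

A crashed Windows service behaves like the failed write here: it never calls delete, so the message becomes visible again once the timeout expires.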

Shards Based On UserID (GUID)
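The deck does not say how a UserID maps to a shard. One simple, stable choice, sketched here as an assumption, is to take the GUID's 128-bit integer modulo the shard count, with shards assigned to instances in contiguous blocks of 8 (also an assumption, chosen to match the 64-shard and 1-in-8 figures on the high-availability slide). The GUID below is made up.

```python
import uuid

NUM_SHARDS = 64          # shard count from the high-availability slide
SHARDS_PER_INSTANCE = 8  # assumption: 64 shards on 8 instances, in blocks of 8


def shard_for_user(user_id: str) -> int:
    """Map a UserID GUID (with or without braces/hyphens) to a shard in [0, 64)."""
    return uuid.UUID(user_id).int % NUM_SHARDS


def instance_for_shard(shard: int) -> int:
    """Return the hypothetical instance number that hosts `shard`."""
    return shard // SHARDS_PER_INSTANCE


user_id = "6f1c0a2e-3b4d-4e5f-8a9b-0c1d2e3f4a5b"  # made-up GUID
shard = shard_for_user(user_id)
print(shard, instance_for_shard(shard))  # 27 3
```

Modulo mapping is easy to reason about, but changing the shard count remaps most users, so it fits a fixed shard count like this one better than a growing one.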

Provisioning On RDS

SQL Edition

• SQL Server 2012 Standard (BizSpark)

Storage Allocation

• We took the max (1TB)

• Changing Storage = downtime

IOPS

• Busiest Instance (ViddyDB) has 7,000 provisioned IOPS

• Shards have no provisioned IOPS

• Occasional hotspots when celebrities post content

• Changing IOPS = downtime

Instance Size

• Busiest Instance (ViddyDB) has the largest size (m2.4xlarge)

• Shards run on m2.2xlarge

• Changing Instance Size = downtime

VPC Placement

• VPC guarantees node affinity (ours sit in a private segment)

• Changing VPC Placement = downtime

Goal: As Hands-Off As Possible (we don’t have a DBA)

Designing for High Availability

Amazon RDS In VPCs

• At the time we provisioned (Nov 2012), RDS for SQL Server had no data replication across AZs

• The Availability Zone is a single point of failure

• Running our own replication meant no RDS (and needing a DBA)

• RDS didn’t support SQL Server’s AlwaysOn Technology

Sharded Model

• Each user exists in 1 of 64 Consumer Shards & 1 of 64 Producer Shards

• Database goes down: 1/64 of users affected (~1.6%)

• Instance goes down: 1/8 of users affected (12.5%)
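The impact figures follow from each user living in exactly one shard of each kind. A worked check with Python's `fractions` module, assuming users are spread evenly and each instance hosts 8 consecutive shards (a hypothetical layout):

```python
from fractions import Fraction

TOTAL_SHARDS = 64
# Hypothetical layout: instance i hosts shards 8*i .. 8*i + 7.
layout = {i: range(8 * i, 8 * i + 8) for i in range(8)}


def affected_fraction(failed_shards, total_shards=TOTAL_SHARDS):
    """Share of users hit when `failed_shards` are offline, assuming users are
    spread evenly and each user lives in exactly one shard of this kind."""
    return Fraction(len(set(failed_shards)), total_shards)


one_db = affected_fraction([5])
one_instance = affected_fraction(layout[0])
print(one_db, float(one_db) * 100)              # 1/64 1.5625
print(one_instance, float(one_instance) * 100)  # 1/8 12.5
```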

Eventual Consistency

• Amazon SNS/SQS Guarantees Eventual Consistency

• Visibility Timeout gives us time to get a DB or Instance back online

• Sharded Amazon SQS = downtime on one shard won’t affect other shards

Snapshots

• Set it and forget it

• Reliably works

• Allows us to regularly refresh non-prod DBs via scripts.

Goal: Recover from Outages Easily & Quickly
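A sketch of the first step of a non-prod refresh script: picking the newest available snapshot of a production instance. It assumes the input is the `DBSnapshots` list returned by boto3's `describe_db_snapshots` (only the `DBInstanceIdentifier`, `DBSnapshotIdentifier`, `SnapshotCreateTime`, and `Status` keys are read); the instance and snapshot names are hypothetical and the sample data is made up.

```python
from datetime import datetime, timezone


def latest_available_snapshot(snapshots, db_instance_id):
    """Return the DBSnapshotIdentifier of the newest 'available' snapshot
    of `db_instance_id`, or None if there is none.

    `snapshots` is assumed to be shaped like the "DBSnapshots" list from
    boto3's describe_db_snapshots; only four keys are read.
    """
    candidates = [
        s for s in snapshots
        if s["DBInstanceIdentifier"] == db_instance_id and s["Status"] == "available"
    ]
    if not candidates:
        return None
    newest = max(candidates, key=lambda s: s["SnapshotCreateTime"])
    return newest["DBSnapshotIdentifier"]


# Made-up sample data; identifiers are hypothetical.
sample = [
    {"DBInstanceIdentifier": "prod-db", "DBSnapshotIdentifier": "rds:prod-db-2013-11-11-06-00",
     "SnapshotCreateTime": datetime(2013, 11, 11, 6, 0, tzinfo=timezone.utc), "Status": "available"},
    {"DBInstanceIdentifier": "prod-db", "DBSnapshotIdentifier": "rds:prod-db-2013-11-12-06-00",
     "SnapshotCreateTime": datetime(2013, 11, 12, 6, 0, tzinfo=timezone.utc), "Status": "available"},
    {"DBInstanceIdentifier": "prod-db", "DBSnapshotIdentifier": "rds:prod-db-2013-11-13-06-00",
     "SnapshotCreateTime": datetime(2013, 11, 13, 6, 0, tzinfo=timezone.utc), "Status": "creating"},
]
print(latest_available_snapshot(sample, "prod-db"))
```

The selected identifier could then be passed to boto3's `restore_db_instance_from_db_snapshot` to build the refreshed non-prod instance.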

Security Considerations

The Basics

• Application config files use separate restricted accounts (not SA)

• DBs sit in private VPC segment

• Port restrictions done at Security Group Level

• Viddy HQ is whitelisted

• Developers can connect remotely over OpenVPN

• Support staff gets read-only DB access if they know SQL

The Facebook Security Model

• Every developer has access to everything (we’re a team of 7)

• Less friction, empowers developers

• With great privilege comes great responsibility

Questions?

Try Amazon RDS for SQL Server!

• Start using Transparent Data Encryption (TDE): see the Amazon RDS for SQL Server documentation at http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/

• Try Cross Region Snapshot Copy
